Why does this code generate much more assembly than equivalent C++/Clang?
Compiling with the compiler flag -O (and with an added pub), I get this output (Link to Godbolt):
push rbp
mov rbp, rsp
xor dil, 1
or dil, sil
mov eax, edi
pop rbp
ret
A few things:
Why is it still longer than the C++ version?
The Rust version is exactly three instructions longer:
push rbp
mov rbp, rsp
[...]
pop rbp
These are instructions to manage the so-called frame pointer or base pointer (rbp). This is mainly required to get nice stack traces. If you keep the frame pointer in the C++ version via -fno-omit-frame-pointer, you get the same result. Note that this uses g++ instead of clang++ since I haven't found a comparable option for the clang compiler.
Why doesn't Rust omit the frame pointer?
Actually, it does. But Godbolt adds an option to the compiler to preserve the frame pointer. You can read more about why this is done here. If you compile your code locally with
rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel"
you get this output:
f1:
xor dil, 1
or dil, sil
mov eax, edi
ret
Which is exactly the output of your C++ version.
You can "undo" what Godbolt does by passing -C debuginfo=0 to the compiler.
Why -O instead of --release?
Godbolt uses rustc directly instead of cargo. The --release flag is a flag for cargo. To enable optimizations on rustc, you need to pass -O or -C opt-level=3 (or any other level between 0 and 3).
To get the same asm code, you need to disable debug info; this will remove the frame pointer pushes:
-C opt-level=3 -C debuginfo=0
(https://godbolt.org/g/vdhB2f)
Compiling with -C opt-level=3 in Godbolt gives:
example::f1:
push rbp
mov rbp, rsp
xor dil, 1
or dil, sil
mov eax, edi
pop rbp
ret
Which looks comparable to the C++ version. See Lukas Kalbertodt's answer for more explanation.
Note: I had to make the function pub extern to stop the compiler optimising it to nothing, as it is unused.
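For reference, here is a minimal sketch of a function that could be compiled this way. The question's source code is not reproduced in this thread, so the body below is an assumption inferred from the xor/or instruction pair in the listings (it computes "not a, or b"), and the "C" ABI is just one way of spelling pub extern.

```rust
// Hypothetical stand-in for the function discussed in the thread.
// The body is inferred from the assembly (`xor dil, 1` then `or dil, sil`),
// not taken from the original question.
//
// `pub extern` keeps the symbol in the library's output, so the optimizer
// cannot discard the function as unused.
pub extern "C" fn f1(a: bool, b: bool) -> bool {
    // Flip `a`, then OR the result with `b`.
    !a || b
}
```

Saved as foo.rs, this should build as a library with the rustc command quoted in the first answer, though the exact instructions emitted may vary between compiler versions.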