Consider this C code:
    void foo(void);

    long bar(long x) {
        foo();
        return x;
    }
When I compile it on GCC 9.3 with either -O3 or -Os, I get this:
    bar:
        push    r12
        mov     r12, rdi
        call    foo
        mov     rax, r12
        pop     r12
        ret
The output from clang is identical except for choosing rbx instead of r12 as the callee-saved register.
However, I want/expect to see assembly that looks more like this:
    bar:
        push    rdi
        call    foo
        pop     rax
        ret
In English, here's what I see happening:
- Push the old value of a callee-saved register to the stack
- Move x into that callee-saved register
- Call foo
- Move x from the callee-saved register into the return-value register
- Pop the stack to restore the old value of the callee-saved register
Why bother to mess with a callee-saved register at all? Why not do this instead? It seems shorter, simpler, and probably faster:
- Push x to the stack
- Call foo
- Pop x from the stack into the return-value register
Is my assembly wrong? Is it somehow less efficient than messing with an extra register? If the answer to both of those is "no", then why doesn't either GCC or clang do it this way?
Edit: Here's a less trivial example, to show it happens even if the variable is meaningfully used:
    long foo(long);

    long bar(long x) {
        return foo(x * x) - x;
    }
I get this:
    bar:
        push    rbx
        mov     rbx, rdi
        imul    rdi, rdi
        call    foo
        sub     rax, rbx
        pop     rbx
        ret
I'd rather have this:
    bar:
        push    rdi
        imul    rdi, rdi
        call    foo
        pop     rdi
        sub     rax, rdi
        ret
This time my version is only one instruction shorter rather than two, but the core concept is the same.