Is an extra move somehow faster when doing division-by-multiplication?

Consider this function:

    unsigned long f(unsigned long x) {
        return x / 7;
    }

With -O3, Clang turns the division into a multiplication, as expected:

    f:                                      # @f
            movabs  rcx, 2635249153387078803
            mov     rax, rdi
            mul     rcx
            sub     rdi, rdx
            shr     rdi
            lea     rax, [rdi + rdx]
            shr     rax, 2
            ret
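For reference, here is a rough C sketch of what that instruction sequence computes, step by step. It assumes a 64-bit `unsigned long` (as on x86-64 Linux) and a compiler that supports `unsigned __int128` (a GCC/Clang extension). The names `DIV7_MAGIC`, `mulhi64` and `div7_by_mul` are made up for illustration and do not come from any compiler output.

```c
#include <stdint.h>

/* Magic constant taken from the assembly above (0x2492492492492493). */
#define DIV7_MAGIC UINT64_C(2635249153387078803)

/* High 64 bits of the 128-bit product a * b, i.e. what `mul` leaves in rdx.
   Relies on unsigned __int128, which is a GCC/Clang extension. */
static uint64_t mulhi64(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Hypothetical helper mirroring the instruction sequence one step at a time. */
static uint64_t div7_by_mul(uint64_t x) {
    uint64_t hi = mulhi64(x, DIV7_MAGIC); /* mul: rdx = high half of product */
    uint64_t t = x - hi;                  /* sub rdi, rdx (cannot underflow: hi < x for x > 0) */
    t >>= 1;                              /* shr rdi */
    t += hi;                              /* lea rax, [rdi + rdx] */
    return t >> 2;                        /* shr rax, 2 */
}
```

Since the multiplication is commutative, `mulhi64` gives the same result whichever operand starts out in `rax`. The sketch only shows that the arithmetic is the same either way; it says nothing about which register assignment is faster.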

GCC does basically the same thing, except for using rdx where Clang uses rcx. But they both appear to be doing an extra move. Why not this instead?

    f:
            movabs  rax, 2635249153387078803
            mul     rdi
            sub     rdi, rdx
            shr     rdi
            lea     rax, [rdi + rdx]
            shr     rax, 2
            ret

In particular, they both put the numerator in rax, but if you put the magic number there instead, you avoid having to move the numerator at all. If this is actually better, I'm surprised that neither GCC nor Clang does it this way, since it feels so obvious. Is there some microarchitectural reason that their way is actually faster than my way?

Godbolt link.

