I want to compile a set of benchmark suites with the usual GCC optimizations (-O2 or -O3) and compare the results against the same benchmarks built without any cmov instructions. I have seen several posts and websites addressing this, but all are several years old and therefore refer to versions of GCC older than 9.2.0. The consensus among those answers is to use the following four flags (this summarizes everything I have found online):
-fno-if-conversion -fno-if-conversion2 -fno-tree-loop-if-convert -fno-tree-loop-if-convert-stores
Following this advice, I compile my benchmarks with the command below, which in theory should produce no cmov instructions:
g++-9.2.0 -std=c++11 -O2 -g -fno-if-conversion -fno-if-conversion2 -fno-tree-loop-if-convert -fno-tree-loop-if-convert-stores -fno-inline *.C -o bfs-nocmov
However, I still find instances where cmov is used. If I change the optimization flag to -O0, the cmov instructions are not generated, so I assume there must be some way to disable this in GCC without modifying the source code or the assembly.
Below is an example of what I am trying to disable (the last instruction is the cmov I want to avoid):
int mx = 0;
for (int i=0; i < n; i++)
41bc8a: 45 85 e4 test %r12d,%r12d
41bc8d: 7e 71 jle 41bd00 <_Z11suffixArrayPhi+0xe0>
41bc8f: 41 8d 44 24 ff lea -0x1(%r12),%eax
41bc94: 48 89 df mov %rbx,%rdi
.../suffix/src/ks.C:92
int mx = 0;
41bc97: 31 c9 xor %ecx,%ecx
41bc99: 48 8d 54 03 01 lea 0x1(%rbx,%rax,1),%rdx
41bc9e: 66 90 xchg %ax,%ax
.../suffix/src/ks.C:94
if (s[i] > mx) mx = s[i];
41bca0: 44 0f b6 07 movzbl (%rdi),%r8d
41bca4: 44 39 c1 cmp %r8d,%ecx
41bca7: 41 0f 4c c8 cmovl %r8d,%ecx
I cannot use any optimization level lower than -O2, and while manually editing the code is an option, I have a lot of benchmarks, so I would like to find a general solution. For comparison, here is the code generated for the same loop with -O0:
for (int i=0; i < n; i++)
c67: 8b 45 e8 mov -0x18(%rbp),%eax
c6a: 3b 45 d4 cmp -0x2c(%rbp),%eax
c6d: 7d 34 jge ca3 <_Z11suffixArrayPhi+0x105>
.../suffix/src/ks.C:94
if (s[i] > mx) mx = s[i];
c6f: 8b 45 e8 mov -0x18(%rbp),%eax
c72: 48 63 d0 movslq %eax,%rdx
c75: 48 8b 45 d8 mov -0x28(%rbp),%rax
c79: 48 01 d0 add %rdx,%rax
c7c: 0f b6 00 movzbl (%rax),%eax
c7f: 0f b6 c0 movzbl %al,%eax
c82: 39 45 e4 cmp %eax,-0x1c(%rbp)
c85: 7d 16 jge c9d <_Z11suffixArrayPhi+0xff>
.../suffix/src/ks.C:94 (discriminator 1)
c87: 8b 45 e8 mov -0x18(%rbp),%eax
c8a: 48 63 d0 movslq %eax,%rdx
c8d: 48 8b 45 d8 mov -0x28(%rbp),%rax
c91: 48 01 d0 add %rdx,%rax
c94: 0f b6 00 movzbl (%rax),%eax
c97: 0f b6 c0 movzbl %al,%eax
c9a: 89 45 e4 mov %eax,-0x1c(%rbp)
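In case it helps to reproduce this, the loop from both listings can be isolated into a small standalone function. This is only a sketch: the name maxByte is made up for illustration, and the parameter types are inferred from the mangled symbol _Z11suffixArrayPhi (which demangles to suffixArray(unsigned char*, int)) and from the movzbl loads, not copied from ks.C.

```cpp
// Hypothetical reproducer for the loop at ks.C:92-94.
// maxByte is an invented name; the unsigned char* / int parameter types
// are an assumption based on the mangled symbol _Z11suffixArrayPhi.
int maxByte(const unsigned char* s, int n) {
    int mx = 0;
    for (int i = 0; i < n; i++)
        if (s[i] > mx) mx = s[i];  // the conditional update under discussion
    return mx;
}
```

Compiling just this file with the same flags as above and -S (for example, g++ -O2 -S with the four -fno-... options) and searching the output for cmov should show whether the conversion still happens outside the full benchmark.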
Any advice or pointers on where to look would be much appreciated.