Channel: Active questions tagged gcc - Stack Overflow

Removing CMOV Instructions using GCC-9.2.0 (x86)


I am looking to compile a set of benchmark suites with the usual GCC optimizations (-O2/-O3) and compare the results with the same benchmarks compiled without any cmov instructions. I have seen several posts and websites addressing this issue, but all of them are several years old and therefore refer to versions of GCC older than 9.2.0. Essentially, their answer was to use the following four flags (this is a good summary of everything I've found online):

-fno-if-conversion -fno-if-conversion2 -fno-tree-loop-if-convert -fno-tree-loop-if-convert-stores

Following this advice, I am using the following command to compile my benchmarks (theoretically with no cmov instructions).

g++-9.2.0 -std=c++11 -O2 -g -fno-if-conversion -fno-if-conversion2 -fno-tree-loop-if-convert -fno-tree-loop-if-convert-stores -fno-inline *.C -o bfs-nocmov

However, I am still finding instances where cmov is being used. If I change the optimization flag to -O0, no cmov instructions are generated, so I assume there must be some way to disable them in GCC without modifying the source code or the assembly.
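To find the remaining instances across a whole benchmark suite, a small checker can count cmov mnemonics in the disassembly. The following is only a sketch: it assumes the input is the text printed by `objdump -d` (or `objdump -S`) for one binary, and the function names `isCmovMnemonic` and `countCmov` are hypothetical, chosen for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Returns true if `token` looks like a cmov mnemonic (cmovl, cmovge, cmovne, ...):
// the prefix "cmov" followed by one or more condition-code letters.
// x87 fcmov* instructions are deliberately not matched.
bool isCmovMnemonic(const std::string& token) {
    const std::string prefix = "cmov";
    if (token.size() <= prefix.size() || token.compare(0, prefix.size(), prefix) != 0)
        return false;
    for (std::size_t i = prefix.size(); i < token.size(); ++i)
        if (!std::isalpha(static_cast<unsigned char>(token[i])))
            return false;
    return true;
}

// Counts disassembly lines containing a cmov mnemonic.
// Assumption: `disasm` is objdump -d / -S text; interleaved source lines are
// scanned too, but they only match if a whitespace-separated word looks like cmov*.
std::size_t countCmov(const std::string& disasm) {
    std::istringstream lines(disasm);
    std::string line;
    std::size_t count = 0;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string token;
        while (fields >> token) {
            if (isCmovMnemonic(token)) {
                ++count;
                break;  // one instruction per line, so count each line at most once
            }
        }
    }
    return count;
}
```

A quicker but looser shell check is `objdump -d bfs-nocmov | grep -c cmov`, which also counts fcmov instructions and any source or symbol lines that happen to contain the string.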

Below is an example of what I am trying to disable, shown as disassembly interleaved with the source (the last instruction, cmovl, is the one I am looking to avoid):

  int mx = 0;
  for (int i=0; i < n; i++)
  41bc8a:   45 85 e4                test   %r12d,%r12d
  41bc8d:   7e 71                   jle    41bd00 <_Z11suffixArrayPhi+0xe0>
  41bc8f:   41 8d 44 24 ff          lea    -0x1(%r12),%eax
  41bc94:   48 89 df                mov    %rbx,%rdi
.../suffix/src/ks.C:92
  int mx = 0;
  41bc97:   31 c9                   xor    %ecx,%ecx
  41bc99:   48 8d 54 03 01          lea    0x1(%rbx,%rax,1),%rdx
  41bc9e:   66 90                   xchg   %ax,%ax
.../suffix/src/ks.C:94
    if (s[i] > mx) mx = s[i];
  41bca0:   44 0f b6 07             movzbl (%rdi),%r8d
  41bca4:   44 39 c1                cmp    %r8d,%ecx
  41bca7:   41 0f 4c c8             cmovl  %r8d,%ecx
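To experiment with flags on this one pattern in isolation, the loop at ks.C:92-94 can be pulled out into a standalone function. This is a hypothetical reduction: the name `maxSymbol` is made up, and the parameter types are inferred from the mangled name `_Z11suffixArrayPhi`, which demangles to `suffixArray(unsigned char*, int)`.

```cpp
// Hypothetical standalone reduction of the loop at ks.C:92-94.
// s points to n byte-sized symbols; returns the largest one (0 if n <= 0).
int maxSymbol(const unsigned char* s, int n) {
    int mx = 0;
    for (int i = 0; i < n; i++)
        if (s[i] > mx) mx = s[i];  // in the -O2 listing this became cmp + cmovl
    return mx;
}
```

It can be compiled on its own, for example with `g++-9.2.0 -std=c++11 -O2 -fno-if-conversion -fno-if-conversion2 -fno-tree-loop-if-convert -fno-tree-loop-if-convert-stores -S max_symbol.C -o -` (file name hypothetical), and the assembly searched for cmov. Because inlining and the surrounding code affect code generation, the isolated function may not produce exactly the same instructions as the full benchmark.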

Finally, here is the corresponding code generated at -O0, where the comparison is followed by a conditional branch (jge) instead of a cmov. I cannot use any optimization level lower than -O2, and while manually modifying the source is an option, I have a lot of benchmarks, so I would like to find a general solution.

  for (int i=0; i < n; i++) 
 c67:   8b 45 e8                mov    -0x18(%rbp),%eax
 c6a:   3b 45 d4                cmp    -0x2c(%rbp),%eax
 c6d:   7d 34                   jge    ca3 <_Z11suffixArrayPhi+0x105>
.../suffix/src/ks.C:94
    if (s[i] > mx) mx = s[i];
 c6f:   8b 45 e8                mov    -0x18(%rbp),%eax
 c72:   48 63 d0                movslq %eax,%rdx
 c75:   48 8b 45 d8             mov    -0x28(%rbp),%rax
 c79:   48 01 d0                add    %rdx,%rax
 c7c:   0f b6 00                movzbl (%rax),%eax
 c7f:   0f b6 c0                movzbl %al,%eax
 c82:   39 45 e4                cmp    %eax,-0x1c(%rbp)
 c85:   7d 16                   jge    c9d <_Z11suffixArrayPhi+0xff>
.../suffix/src/ks.C:94 (discriminator 1)
 c87:   8b 45 e8                mov    -0x18(%rbp),%eax
 c8a:   48 63 d0                movslq %eax,%rdx
 c8d:   48 8b 45 d8             mov    -0x28(%rbp),%rax
 c91:   48 01 d0                add    %rdx,%rax
 c94:   0f b6 00                movzbl (%rax),%eax
 c97:   0f b6 c0                movzbl %al,%eax
 c9a:   89 45 e4                mov    %eax,-0x1c(%rbp)
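As a reading aid, the control flow of the -O0 listing can be transcribed literally into C++ with gotos. This is a sketch: the function name is hypothetical, the stack slots are mapped as -0x18 = i, -0x1c = mx, -0x28 = s, -0x2c = n, and the increment and back-edge at c9d are not shown in the listing, so they are assumed here. It is not meant as a workaround, since GCC is not expected to treat goto-based control flow differently at -O2.

```cpp
// Literal transcription of the -O0 listing's control flow (addresses in comments).
int maxSymbolO0Shape(const unsigned char* s, int n) {
    int mx = 0;
    int i = 0;
loop_test:
    if (i >= n) goto done;       // c67-c6d: cmp n with i; jge to exit (ca3)
    if (mx >= s[i]) goto next;   // c6f-c85: load s[i]; cmp with mx; jge skips the store
    mx = s[i];                   // c87-c9a: reload s[i] and store it into mx
next:
    i++;                         // assumed: increment at c9d (not shown)
    goto loop_test;              // assumed: jump back to the loop test
done:
    return mx;
}
```

The two jge instructions are the conditional branches that the -O2 build replaced with the single cmovl.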

If anybody has advice or a direction to look in, it would be much appreciated.

