Channel: Active questions tagged gcc - Stack Overflow

Why is gcc's right-shift code different in C and C++ mode?


When ARM gcc 9.2.1 is given command line options -O3 -xc++ -mcpu=cortex-m0 [compile as C++] and the following code:

    unsigned short adjust(unsigned short *p)
    {
        unsigned short temp = *p;
        temp -= temp>>15;
        return temp;
    }

it produces the reasonable machine code:

    ldrh    r0, [r0]
    lsrs    r3, r0, #15
    subs    r0, r0, r3
    uxth    r0, r0
    bx      lr

which is equivalent to:

    unsigned short adjust(unsigned short *p)
    {
        unsigned r0,r3;
        r0 = *p;
        r3 = r0 >> 15;
        r0 -= r3;
        r0 &= 0xFFFFu;   // Returning an unsigned short requires...
        return r0;       //  computing a 32-bit unsigned value 0-65535.
    }

Very reasonable. The last "uxth" could actually be omitted in this particular case (temp>>15 is 1 only when temp is at least 32768, so the subtraction can never leave the range 0-65535), but it's better for a compiler that can't prove the safety of such optimizations to err on the side of caution than risk returning a value outside the range 0-65535, which could totally sink downstream code.
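A quick way to check the claim that the final mask is redundant here is to brute-force all 65536 inputs. This is only a sketch: `adjust_wide` and `count_mask_needed` are hypothetical helper names made up for the check, and `adjust_wide` models the 32-bit register arithmetic from the listing above rather than anything gcc itself does.

```c
#include <stdint.h>

/* Hypothetical helper: the arithmetic from adjust(), done in 32 bits
   and returned *before* narrowing to unsigned short. */
uint32_t adjust_wide(uint16_t temp)
{
    uint32_t r0 = temp;
    uint32_t r3 = r0 >> 15;   /* 0 or 1 */
    return r0 - r3;
}

/* Hypothetical helper: counts the inputs for which masking with
   0xFFFF (the uxth step) would change the 32-bit result. */
unsigned count_mask_needed(void)
{
    unsigned count = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
        uint32_t wide = adjust_wide((uint16_t)v);
        if (wide != (wide & 0xFFFFu))
            ++count;
    }
    return count;
}
```

If `count_mask_needed()` returns 0, the mask never changes the value for any possible `*p`, which is what makes the `uxth` removable in this one function.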

When using -O3 -xc -mcpu=cortex-m0 [identical options, except compiling as C rather than C++], however, the code changes:

    ldrh    r3, [r0]
    movs    r2, #0
    ldrsh   r0, [r0, r2]
    asrs    r0, r0, #15
    adds    r0, r0, r3
    uxth    r0, r0
    bx      lr

which is equivalent to:

    unsigned short adjust(unsigned short *p)
    {
        unsigned r0,r2,r3;
        r3 = *p;
        r2 = 0;
        r0 = ((short*)p)[r2];  // ldrsh: sign-extending reload of *p
        r0 = ((int)r0) >> 15;  // Effectively computes -((*p)>>15) with redundant load
        r0 += r3;
        r0 &= 0xFFFFu;     // Returning an unsigned short requires...
        return r0;         //  computing a 32-bit unsigned value 0-65535.
    }

I know that the defined corner cases for left-shift are different in C and C++, but I thought right shifts were the same. Is there something different about the way right-shifts work in C and C++ that would cause the compiler to use different code to process them?

Versions prior to 9.2.1 generate slightly less bad code in C mode:

    ldrh    r3, [r0]
    sxth    r0, r3
    asrs    r0, r0, #15
    adds    r0, r0, r3
    uxth    r0, r0
    bx      lr

equivalent to:

    unsigned short adjust(unsigned short *p)
    {
        unsigned r0,r3;
        r3 = *p;
        r0 = (short)r3;
        r0 = ((int)r0) >> 15; // Effectively computes -(temp>>15)
        r0 += r3;
        r0 &= 0xFFFFu;     // Returning an unsigned short requires...
        return r0;         //  computing a 32-bit unsigned value 0-65535.
    }

Not as bad as the 9.2.1 version, but still an instruction longer than a straightforward translation of the code would have been.

When using 9.2.1, declaring the argument as unsigned short volatile *p would eliminate the redundant load of *p, but I'm curious why gcc 9.2.1 would need a volatile qualifier to help it avoid the redundant load, or why such a bizarre "optimization" only happens in C mode and not C++ mode. I'm also somewhat curious why gcc would even consider adding ((short)temp) >> 15 instead of subtracting temp >> 15. Is there some stage in the optimization where that would seem to make sense?
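For reference, the two forms do agree on every input, which can be confirmed with a small exhaustive check. This is a minimal sketch, not a reconstruction of gcc's reasoning: `adjust_sub`, `adjust_add` and `count_mismatches` are hypothetical names, and the sign-extend-then-`asrs` step is modelled explicitly as 0 or -1 so the check does not rely on implementation-defined right shifts of negative values.

```c
#include <stdint.h>

/* Hypothetical helper: the source-level form, subtracting the
   logical shift temp >> 15 (which is 0 or 1). */
uint16_t adjust_sub(uint16_t temp)
{
    return (uint16_t)(temp - (temp >> 15));
}

/* Hypothetical helper: the form gcc emits in C mode. Sign-extending
   the halfword and shifting arithmetically by 15 yields 0 when bit 15
   is clear and -1 when it is set; that value is then added. */
uint16_t adjust_add(uint16_t temp)
{
    int32_t sign = (temp & 0x8000u) ? -1 : 0;
    return (uint16_t)((uint32_t)temp + (uint32_t)sign);
}

/* Hypothetical helper: counts inputs where the two forms disagree. */
unsigned count_mismatches(void)
{
    unsigned count = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
        if (adjust_sub((uint16_t)v) != adjust_add((uint16_t)v))
            ++count;
    }
    return count;
}
```

A result of 0 from `count_mismatches()` only shows that the transformation is valid; it says nothing about why gcc prefers it in C mode, which is the actual question.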

