Channel: Active questions tagged gcc - Stack Overflow

Inserting "marker" instructions into assembly without GCC reordering them

For performance analysis, it is useful to be able to tell which line of C code goes with which line of generated assembly. This can be very difficult once enough optimization passes get involved, so I devised the following scheme to make it easier (though it has a lot of caveats). I figured I would use inline assembly to insert an instruction that is effectively a nop but that the compiler would rarely or never generate itself. Then, when I look at the generated code, I can infer that the assembly between two inserted marker instructions probably comes from the C code between the corresponding inline assembly statements.

I came up with these candidates:

// Force insertion of an instruction that will only clobber
// flags and that the compiler hardly ever uses itself. Lie and say
// that it alters memory to try to prevent the compiler from moving
// code across it. Mark it volatile so the compiler can't remove it entirely.
#define ASSEMBLY_MARKER_0()                 \
    __asm__ volatile ("cld" : /* no outputs */ : /* no inputs */ : "memory", "cc")

#define ASSEMBLY_MARKER_1()                 \
    __asm__ volatile ("xorl %%eax,0" : /* no outputs */ : /* no inputs */ : "memory", "cc")
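One caveat about ASSEMBLY_MARKER_1 as written: in AT&T syntax the destination is the last operand and an immediate needs a `$` prefix, so `xorl %eax,0` is a read-modify-write of the memory at absolute address 0 rather than a no-op on the register (the Clang listing further down prints it as `xor dword ptr [0], eax`). That is harmless if the code is only inspected, but it would most likely fault if executed. Below is a minimal sketch of a register-only variant, assuming an x86 target and AT&T-syntax inline assembly. The names `ASSEMBLY_MARKER_2` and `marked_shift` are hypothetical and not part of the original code.

```c
/* Hypothetical register-only variant of ASSEMBLY_MARKER_1.
 * "$0" is the immediate source; %%eax, as the last operand, is the
 * destination. On x86-64 a 32-bit write zero-extends into rax, so eax
 * is listed as clobbered to stop the compiler from keeping a live
 * 64-bit value there across the marker. */
#if defined(__i386__) || defined(__x86_64__)
#define ASSEMBLY_MARKER_2()                 \
    __asm__ volatile ("xorl $0, %%eax" : /* no outputs */ : /* no inputs */ : "eax", "memory", "cc")
#else
/* Fallback for non-x86 targets: an empty statement that only acts as a
 * compiler-level memory barrier, so the sketch still compiles. */
#define ASSEMBLY_MARKER_2()                 \
    __asm__ volatile ("" : /* no outputs */ : /* no inputs */ : "memory")
#endif

/* Hypothetical test function: the markers bracket the shift. */
int marked_shift(int x)
{
    ASSEMBLY_MARKER_2();
    int j = x << 3;
    ASSEMBLY_MARKER_2();
    return j;
}
```

The extra `"eax"` clobber is one more way the marker can perturb register allocation around it, so this variant trades a safe-to-run marker for slightly more influence on the generated code.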

Then I decided to test whether the compiler would move instructions across these boundaries. Clang appears to do exactly what I want, but GCC is deterred neither by the memory clobber nor by the asm statement being volatile: it reorders instructions anyway. Is there any way to prevent this?

I know there are a lot of caveats to this method even if I get it to work -- I may heavily influence generated code around the markers. But I maintain that it would still be useful for finding things like accidental implicit conversions between integer widths, and other "wait that should never be necessary..." type problems.

You can see the difference between GCC and clang here: https://godbolt.org/z/ZtUPc9

C code:

int f(int x)
{
    __asm__ volatile ("xorl %%eax,0" : /* no outputs */ : /* no inputs */ : "memory", "cc");
    int j = x << 3;
    __asm__ volatile ("xorl %%eax,0" : /* no outputs */ : /* no inputs */ : "memory", "cc");
    return j;
}

GCC:

    xorl %eax,0
    xorl %eax,0
    lea     eax, [0+rdi*8]
    ret

Clang:

    xor     dword ptr [0], eax
    lea     eax, [8*rdi]
    xor     dword ptr [0], eax
    ret
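A possible explanation for the GCC listing: the `"memory"` clobber only constrains memory accesses, and `volatile` only keeps the asm statements themselves (and their order relative to each other). The shift `x << 3` lives entirely in registers, so nothing ties it to either marker. One commonly used way to pin a computation is to pass the value through the asm statement as a `"+r"` operand, which gives the compiler a data dependency it has to respect. The sketch below assumes that approach and uses the `cld` marker from ASSEMBLY_MARKER_0. The names `MARKER_INSN`, `ASSEMBLY_MARKER_DEP` and `f_dep` are hypothetical.

```c
#if defined(__i386__) || defined(__x86_64__)
#define MARKER_INSN "cld"   /* same marker instruction as ASSEMBLY_MARKER_0 */
#else
#define MARKER_INSN ""      /* non-x86 fallback so the sketch still compiles */
#endif

/* Hypothetical marker that claims to read and rewrite `var` in a register.
 * The template never references the operand; the constraint alone creates
 * the dependency the compiler has to honour. */
#define ASSEMBLY_MARKER_DEP(var)            \
    __asm__ volatile (MARKER_INSN : "+r"(var) : /* no inputs */ : "memory", "cc")

int f_dep(int x)
{
    ASSEMBLY_MARKER_DEP(x);   /* x is "modified" here, so x << 3 cannot be hoisted above */
    int j = x << 3;
    ASSEMBLY_MARKER_DEP(j);   /* j is an input here, so it must already be computed */
    return j;
}
```

This adds to the caveats already mentioned: the value is forced into a register at each marker, and the compiler can no longer propagate constants or known value ranges through it, so the code near the markers may differ more from the unmarked build than with the plain markers.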

Edits to answer questions in comments:

Why not nops? Because GCC often inserts those itself. The point is for the marker to stick out.

Why not move code into its own function? If you're doing this analysis on C++ template code, for example, there may be many layers of inlining before the compiler produces the function that actually goes into the executable, and the code may be very different if you turn off the inlining (e.g. the code may have been written with the assumption that constant folding, dead code elimination, etc. would get rid of trivial things).

