Channel: Active questions tagged gcc - Stack Overflow

Mystery: casting a GNU C label pointer to a function pointer, with inline asm to put a ret in that block. Block being optimized away?


First of all: this code is purely for fun, so please do not do anything like this in production. We will not be responsible for any harm caused to you, your company or your reindeer after compiling and executing this piece of code in any environment. The code below is not safe, not portable and is plainly dangerous. Be warned. Long post below. You were warned.

Now, after the disclaimer: Let's consider the following piece of code:

#include <stdio.h>

int fun()
{
    return 5;
}

typedef int(*F)(void) ;

int main(int argc, char const *argv[])
{

    void *ptr = &&hi;

    F f = (F)ptr;

    int  c = f();
    printf("TT: %d\n", c);

    if(c == 5) goto bye;
    //else goto bye;     /*  <---- This is the most important line. Pay attention to it */

hi:
    c = 5;
    asm volatile ("movl $5, %eax");
    asm volatile ("retq");

bye:
    return 66;
}

To begin with, we have the function fun, which I created purely as a reference for the generated assembly code.

Then we declare a function pointer type F for functions taking no parameters and returning an int.

Then we use the not so well known GCC extension https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html to get the address of the label hi; this works in clang too. Then we do something evil: we create a function pointer of type F called f and initialize it with the address of that label.
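For contrast, here is a minimal sketch of what that extension is documented for: jumping to a label with goto * from inside the same function. The names run_ops, OP_INC, OP_DBL and OP_HALT are made up for this illustration and have nothing to do with the program above.

```c
#include <assert.h>

/* Hypothetical opcodes, invented for this sketch only. */
enum { OP_INC, OP_DBL, OP_HALT };

/* Interprets a tiny opcode list using GNU C computed goto. */
int run_ops(const unsigned char *ops)
{
    /* Label addresses are only meaningful inside this function. */
    static void *const dispatch[] = { &&op_inc, &&op_dbl, &&op_halt };
    int acc = 0;

next:
    /* Guard the table lookup against opcodes we do not know. */
    assert(*ops < sizeof dispatch / sizeof dispatch[0]);
    goto *dispatch[*ops++];

op_inc:
    acc += 1;
    goto next;
op_dbl:
    acc *= 2;
    goto next;
op_halt:
    return acc;
}
```

Calling run_ops on the sequence OP_INC, OP_INC, OP_DBL, OP_HALT returns 4. The label addresses are only ever used as goto targets and are never called, which is the opposite of what the program in this question does.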

Then, worst of all, we actually call this "function", assign its return value to a local variable called c, and then print it out.

Next comes an if that checks whether the value assigned to c is actually the one we expect, and if so goes to bye so that the application exits normally, with exit code 66. If that can be considered a normal exit code.

The next line is commented out, but I can say this is the most important line in the entire application.

The piece of code after the label hi assigns 5 to c, followed by two lines of inline assembly that set eax to 5 and actually return from the "function" call. As mentioned, there is a reference function, fun, which generates the same code.
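For comparison, this is roughly what the well-defined version of "call something that returns 5 through an F" looks like. It is only a sketch; five and call_five are names invented for this illustration.

```c
typedef int (*F)(void);

/* Hypothetical stand-in for the hi block, written as a real function. */
static int five(void)
{
    return 5;
}

/* Calls through the pointer; the compiler emits the return sequence itself. */
int call_five(void)
{
    F f = five;
    return f();
}
```

Here the compiler knows that five is a function with its own entry point and its own ret, so nothing has to be smuggled in through inline assembly.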

And now we compile this application and run it on an online compiler platform: https://gcc.godbolt.org/z/K6z5Yc

It generates the following assembly (with -O1 turned on; -O0 gives a similar result, albeit a bit longer):

# else goto bye  is COMMENTED OUT
fun:
        mov     eax, 5
        ret
.LC0:
        .string "TT: %d\n"
main:
        push    rbx
        mov     eax, OFFSET FLAT:.L3
        call    rax
        mov     ebx, eax
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        cmp     ebx, 5
        je      .L4
.L3:
        movl $5, %eax
        retq
.L4:
        mov     eax, 66
        pop     rbx
        ret

The important lines are mov eax, OFFSET FLAT:.L3, where .L3 corresponds to our hi label, and the line after that, call rax, which actually calls it.

And it runs like this:

ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 66
    TT: 5

Now, let's revisit the most important line in the application and uncomment it.

With -O0 we get the following assembly, generated by gcc:

# else goto bye  is UNCOMMENTED
# even gcc -O0  "knows" hi: is unreachable.
fun:
        push    rbp
        mov     rbp, rsp
        mov     eax, 5
        pop     rbp
        ret
.LC0:
        .string "TT: %d\n"
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 48
        mov     DWORD PTR [rbp-36], edi
        mov     QWORD PTR [rbp-48], rsi
        mov     QWORD PTR [rbp-8], OFFSET FLAT:.L4
        mov     rax, QWORD PTR [rbp-8]
        mov     QWORD PTR [rbp-16], rax
        mov     rax, QWORD PTR [rbp-16]
        call    rax
        mov     DWORD PTR [rbp-20], eax
        mov     eax, DWORD PTR [rbp-20]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        cmp     DWORD PTR [rbp-20], 5
        nop
.L4:
        mov     eax, 66
        leave
        ret

and the following output:

ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 66

So, as you can see, our printf was never called. The culprit is the line mov QWORD PTR [rbp-8], OFFSET FLAT:.L4, where .L4 actually corresponds to our bye label.

And from what I can see in the generated assembly, none of the code after hi made it into the output.

But at least the application runs and at least has some code for comparing c to 5.

On the other hand, clang with -O0 generates the following nightmare, which by the way crashes:

# else goto bye  is UNCOMMENTED
# clang -O0 also doesn't emit any instructions for the hi: block
fun:                                    # @fun
        push    rbp
        mov     rbp, rsp
        mov     eax, 5
        pop     rbp
        ret
main:                                   # @main
        push    rbp
        mov     rbp, rsp
        sub     rsp, 48
        mov     dword ptr [rbp - 4], 0
        mov     dword ptr [rbp - 8], edi
        mov     qword ptr [rbp - 16], rsi
        mov     qword ptr [rbp - 24], 1
        mov     rax, qword ptr [rbp - 24]
        mov     qword ptr [rbp - 32], rax
        call    qword ptr [rbp - 32]
        mov     dword ptr [rbp - 36], eax
        mov     esi, dword ptr [rbp - 36]
        movabs  rdi, offset .L.str
        mov     al, 0
        call    printf
        cmp     dword ptr [rbp - 36], 5
        jne     .LBB1_2
        jmp     .LBB1_3
.LBB1_2:
        jmp     .LBB1_3
.LBB1_3:
        mov     eax, 66
        add     rsp, 48
        pop     rbp
        ret
.L.str:
        .asciz  "TT: %d\n"

If we turn on some optimization, for example -O1, we get this from gcc:

# else goto bye  is UNCOMMENTED
# gcc -O1
fun:
        mov     eax, 5
        ret
.LC0:
        .string "TT: %d\n"
main:
        sub     rsp, 8
        mov     eax, OFFSET FLAT:.L3
        call    rax
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
.L3:
        mov     eax, 66
        add     rsp, 8
        ret

and the application crashes, which is sort of understandable. Again, the compiler has entirely removed our hi section (mov eax, OFFSET FLAT:.L3 tiptoes to .L3, which now corresponds to our bye section) and unfortunately decided that it is a good idea to increase rsp before the ret, making sure we end up somewhere totally different from where we need to be.

And clang delivers something even more dubious:

# else goto bye  is UNCOMMENTED
# clang -O1
fun:                                    # @fun
        mov     eax, 5
        ret
main:                                   # @main
        push    rax
        mov     eax, 1
        call    rax
        mov     edi, offset .L.str
        mov     esi, eax
        xor     eax, eax
        call    printf
        mov     eax, 66
        pop     rcx
        ret
.L.str:
        .asciz  "TT: %d\n"

1? How on earth did clang end up with this?

To some extent I understand that the compiler decided that dead code after an if, where both the if and the else go to the same location, is not needed, but this is where my knowledge and insight stop.
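A stripped-down sketch of the control flow I mean, with the label-address trick and the inline assembly removed (both_branches is a name made up for this illustration):

```c
/* Both arms of the if jump to the same label. */
int both_branches(int c)
{
    if (c == 5)
        goto bye;
    else
        goto bye;

    /* No ordinary control-flow path reaches this assignment. */
    c = 5;

bye:
    return 66;
}
```

In this plain version, dropping the assignment cannot change what the function returns, so I would not be surprised to see it vanish. What I cannot explain is what happens once the address of a label inside that dead region has been taken and called.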

So now, dear C and C++ gurus, assembly aficionados and compiler crushers, here comes the question:

Why?

Why do you think the compiler decided that the two labels should be considered equivalent once we added the else branch? Why did clang put 1 there? And last but not least: could someone with a deep understanding of the C standard point out where this piece of code deviated so badly from normality that we ended up in this really, really weird situation?

