I am trying to learn more about assembly and which optimizations compilers can and cannot do.
I have a test piece of code for which I have some questions.
See it in action here: https://godbolt.org/z/pRztTT, or check the code and assembly below.
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        for (int j = 0; j < 100; j++) {
            if (argc == 2 && argv[1][0] == '5') {
                printf("yes\n");
            } else {
                printf("no\n");
            }
        }
        return 0;
    }
The assembly produced by GCC 10.1 with -O3:
    .LC0:
            .string "no"
    .LC1:
            .string "yes"
    main:
            push    rbp
            mov     rbp, rsi
            push    rbx
            mov     ebx, 100
            sub     rsp, 8
            cmp     edi, 2
            je      .L2
            jmp     .L3
    .L5:
            mov     edi, OFFSET FLAT:.LC0
            call    puts
            sub     ebx, 1
            je      .L4
    .L2:
            mov     rax, QWORD PTR [rbp+8]
            cmp     BYTE PTR [rax], 53
            jne     .L5
            mov     edi, OFFSET FLAT:.LC1
            call    puts
            sub     ebx, 1
            jne     .L2
    .L4:
            add     rsp, 8
            xor     eax, eax
            pop     rbx
            pop     rbp
            ret
    .L3:
            mov     edi, OFFSET FLAT:.LC0
            call    puts
            sub     ebx, 1
            je      .L4
            mov     edi, OFFSET FLAT:.LC0
            call    puts
            sub     ebx, 1
            jne     .L3
            jmp     .L4
It seems like GCC produces two versions of the loop: one with the argv[1][0] == '5' condition but without the argc == 2 condition, and one without any condition.
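For comparison, here is a hand-written sketch of what hoisting the whole condition out of the loop would look like in C. The helper name run_unswitched, its FILE* parameter and its return value are my own inventions for illustration; this is the shape I expected, not what GCC emits.

```c
#include <stdio.h>

/* Hypothetical helper: same output as the loop in main, but the whole
 * condition is evaluated once, before either loop starts.
 * Returns the number of "yes" lines written to `out`. */
static int run_unswitched(int argc, char *argv[], FILE *out)
{
    /* Evaluated a single time; argv[1] is only read when argc == 2. */
    const int is_five = (argc == 2 && argv[1][0] == '5');
    int yes_lines = 0;

    if (is_five) {
        /* Loop version with no condition inside: always "yes". */
        for (int j = 0; j < 100; j++) {
            fputs("yes\n", out);
            yes_lines++;
        }
    } else {
        /* Loop version with no condition inside: always "no". */
        for (int j = 0; j < 100; j++) {
            fputs("no\n", out);
        }
    }
    return yes_lines;
}
```

Calling run_unswitched(argc, argv, stdout) from main should print the same lines as the original program, as long as nothing changes argv[1][0] while the loop runs.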
My questions:
- What is preventing GCC from splitting away the full condition? It is similar to this question, but there is no chance for the code to get a pointer into argv here.
- In the loop without any condition (.L3 in the assembly), why is the loop body duplicated? Is it to reduce the number of jumps while still fitting in some sort of cache?
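To show what I mean by the duplicated body, this is my reading of the .L3 loop written back as C. It is only a sketch: the helper name print_no_unrolled, the FILE* parameter and the returned line count are hypothetical, and I assume the count is positive, as it is with the constant 100 here.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the shape of .L3: two writes per trip
 * around the loop, with the counter checked after each one.
 * Assumes count > 0. Returns the number of lines written to `out`. */
static int print_no_unrolled(int count, FILE *out)
{
    int remaining = count;  /* plays the role of ebx */
    int printed = 0;

    do {
        fputs("no\n", out);          /* first copy of the body */
        printed++;
        if (--remaining == 0)        /* sub ebx, 1 / je .L4 */
            break;
        fputs("no\n", out);          /* second copy of the body */
        printed++;
    } while (--remaining != 0);      /* sub ebx, 1 / jne .L3 */

    return printed;
}
```

With a count of 100 this writes 100 lines but takes the backward jump only on every second line, which is the effect I am asking about.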