I wrote a C program that just reads and writes a large array. I compiled it with the command `gcc -O0 program.c -o program`.
Out of curiosity, I disassembled the program with the `objdump -S` command.
The source of the `read_array` and `write_array` functions and the disassembly of `write_array` are attached at the end of this question.
I'm trying to understand how gcc compiles these functions. I used `//` to add my comments and questions.
Take this piece from the beginning of the assembly code of the `write_array()` function:
4008c1: 48 89 7d e8 mov %rdi,-0x18(%rbp) // this is the first parameter of the function
4008c5: 48 89 75 e0 mov %rsi,-0x20(%rbp) // this is the second parameter of the function
4008c9: c6 45 ff 01 movb $0x1,-0x1(%rbp) // comparing with the source code, I think this is the `char tmp` variable
4008cd: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp) // this should be the `int i` variable.
What I don't understand is:
1) `char tmp` is obviously defined after `int i` in the `write_array` function. Why does gcc reorder the memory locations of these two local variables?
2) From the offsets, `int i` is at `-0x8(%rbp)` and `char tmp` is at `-0x1(%rbp)`, which seems to indicate that `int i` takes 7 bytes? This is quite weird because `int i` should be 4 bytes on an x86-64 machine, shouldn't it? My speculation is that gcc is trying to do some alignment?
3) I found gcc's code generation choices quite interesting. Are there any good documents or books that explain how gcc works? (The third question may be off-topic; if you think so, please just ignore it. I'm just trying to see if there is a shortcut to learning the underlying mechanisms gcc uses for compilation. :-) )
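To make the arithmetic in question 2 concrete, here is a minimal sketch that recomputes the layout from the offsets in the disassembly. The helpers `gap_bytes`, `is_aligned` and `padding_between_i_and_tmp` are hypothetical names introduced only for illustration, and the sketch assumes `sizeof(int) == 4`, as on typical x86-64 Linux. It only measures the distance between the two stack slots; it does not explain why gcc picked them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of unused bytes between an object that starts
 * at %rbp + lo_off and occupies lo_size bytes, and the next object that
 * starts at %rbp + hi_off (both offsets negative, lo_off < hi_off). */
long gap_bytes(long lo_off, size_t lo_size, long hi_off)
{
    long lo_end = lo_off + (long)lo_size; /* first byte past the lower object */
    assert(lo_end <= hi_off);             /* the two objects must not overlap */
    return hi_off - lo_end;
}

/* Hypothetical helper: is the %rbp-relative offset a multiple of align? */
int is_aligned(long off, size_t align)
{
    return off % (long)align == 0;
}

/* Offsets copied from the disassembly: i at -0x8(%rbp), tmp at -0x1(%rbp).
 * Assumption: sizeof(int) is 4, as on typical x86-64 targets. */
long padding_between_i_and_tmp(void)
{
    return gap_bytes(-0x8, sizeof(int), -0x1);
}
```

Under that assumption, `padding_between_i_and_tmp()` returns 3: `i` would occupy bytes -0x8 through -0x5, bytes -0x4 through -0x2 would be unused, and `tmp` sits at -0x1. In other words, the 7-byte distance would be 4 bytes of `int` plus 3 unused bytes, not a 7-byte `int`.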
Below is the source code of the two functions:
#define CACHE_LINE_SIZE 64

static inline void
read_array(char* array, long size)
{
    int i;
    char tmp;
    for ( i = 0; i < size; i+= CACHE_LINE_SIZE )
    {
        tmp = array[i];
    }
    return;
}

static inline void
write_array(char* array, long size)
{
    int i;
    char tmp = 1;
    for ( i = 0; i < size; i+= CACHE_LINE_SIZE )
    {
        array[i] = tmp;
    }
    return;
}
Below is the disassembled code for `write_array`, from `gcc -O0`:
00000000004008bd <write_array>:
4008bd: 55 push %rbp
4008be: 48 89 e5 mov %rsp,%rbp
4008c1: 48 89 7d e8 mov %rdi,-0x18(%rbp)
4008c5: 48 89 75 e0 mov %rsi,-0x20(%rbp)
4008c9: c6 45 ff 01 movb $0x1,-0x1(%rbp)
4008cd: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
4008d4: eb 13 jmp 4008e9 <write_array+0x2c>
4008d6: 8b 45 f8 mov -0x8(%rbp),%eax
4008d9: 48 98 cltq
4008db: 48 03 45 e8 add -0x18(%rbp),%rax
4008df: 0f b6 55 ff movzbl -0x1(%rbp),%edx
4008e3: 88 10 mov %dl,(%rax)
4008e5: 83 45 f8 40 addl $0x40,-0x8(%rbp)
4008e9: 8b 45 f8 mov -0x8(%rbp),%eax
4008ec: 48 98 cltq
4008ee: 48 3b 45 e0 cmp -0x20(%rbp),%rax
4008f2: 7c e2 jl 4008d6 <write_array+0x19>
4008f4: 5d pop %rbp
4008f5: c3 retq
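For anyone who wants to reproduce the disassembly, a minimal driver along these lines may be enough. It is not the original program: `count_written` and the sizes passed to it are assumptions made up for this sketch; only `write_array` is copied from the question.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64

/* Copied from the question: writes 1 to the first byte of every
 * CACHE_LINE_SIZE-byte chunk of the array. */
static inline void
write_array(char* array, long size)
{
    int i;
    char tmp = 1;
    for ( i = 0; i < size; i+= CACHE_LINE_SIZE )
    {
        array[i] = tmp;
    }
    return;
}

/* Hypothetical driver (not part of the original program): allocates a
 * zeroed array of `size` bytes (size must be > 0), runs write_array on it,
 * and returns how many bytes ended up set to 1, or -1 if allocation failed. */
long count_written(long size)
{
    char *array = calloc((size_t)size, 1);
    long count = 0;
    long j;

    if (array == NULL)
        return -1;

    write_array(array, size);
    for (j = 0; j < size; j++)
        count += array[j];   /* every byte is either 0 or 1 */

    free(array);
    return count;
}
```

Calling `count_written(256)` from `main` should return 4 (indices 0, 64, 128 and 192). Built with the same `gcc -O0` command, `write_array` should still show up as a separate function in the `objdump` output, since gcc does not inline it at that optimization level.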