I've been fighting with this problem for quite some time, and I've been unable to find a solution or even an explanation for it. So sorry if the question is long, but bear with me as I just want to make it 100% clear in the hopes that someone more experienced than me will be able to figure it out.
I'm keeping the C syntax highlight on for all snippets because it makes them a little bit clearer even if not really correct.
What I want to do
I have a C program which uses some functions from a dynamic library (libzip
). Here it is boiled down to a minimal reproducible example (it basically does nothing, but it works just fine):
#include <zip.h>
int main(void) {
int err;
zip_t *myzip;
myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
zip_close(myzip);
return 0;
}
Normally, to compile it, I would simply do:
gcc -c prog.c
gcc -o prog prog.o -lzip
This creates, as expected, an ELF which requires libzip
to run:
$ ldd prog
linux-vdso.so.1 (0x00007ffdafb53000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f81eedc7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f81ef780000)
libzip.so.4 => /usr/lib/x86_64-linux-gnu/libzip.so.4 (0x00007f81ef166000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f81eebad000)
(libz
is just a dependency of libzip
)
What I really want to do though, is to load the library myself using dlopen()
. Pretty simple task, no? Well yes, or at least I thought.
To achieve this, I should just need to call dlopen
and let the loader do its job:
#include <zip.h>
#include <dlfcn.h>
int main(void) {
void *lib;
int err;
zip_t *myzip;
lib = dlopen("libzip.so", RTLD_LAZY | RTLD_GLOBAL);
if (lib == NULL)
return 1;
myzip = zip_open("myzip.zip", ZIP_CREATE | ZIP_TRUNCATE, &err);
if (myzip == NULL)
return 1;
zip_close(myzip);
return 0;
}
Of course, since I want to manually load the library myself, I will not link it this time:
# Create prog.o
gcc -c prog.c
# Do a dry-run just to make sure all symbols are resolved
gcc -o /dev/null prog.o -ldl -lzip
# Now recompile only with libdl
gcc -o prog prog.o -ldl -Wl,--unresolved-symbols=ignore-in-object-files
The flag --unresolved-symbols=ignore-in-object-files
tells ld
to not worry about my prog.o
having unresolved symbols at link time (I want to take care of that myself at runtime).
The problem
The above Should Just Work™, and indeed it does seem to... but I have two machines, and being the pedantic nerd I am I just thought "well, better make sure and compile it on both of them".
First machine
x86-64, Linux 4.9, Debian 9, gcc
6.3.0, ld
2.28. Here everything works as expected.
I can clearly see that the symbols are there:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 15 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
===> 4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_close
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen@GLIBC_2.2.5 (3)
===> 6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND zip_open
7: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
8: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
9: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
10: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 25 _edata
11: 0000000000201048 0 NOTYPE GLOBAL DEFAULT 26 _end
12: 0000000000201040 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
13: 00000000000006a0 0 FUNC GLOBAL DEFAULT 11 _init
14: 0000000000000924 0 FUNC GLOBAL DEFAULT 15 _fini
The PLT entries are also there as expected and look fine:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
00000000000006c0 <.plt>:
6c0: ff 35 42 09 20 00 push QWORD PTR [rip+0x200942] # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
6c6: ff 25 44 09 20 00 jmp QWORD PTR [rip+0x200944] # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
6cc: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
00000000000006d0 <zip_close@plt>:
6d0: ff 25 42 09 20 00 jmp QWORD PTR [rip+0x200942] # 201018 <zip_close>
6d6: 68 00 00 00 00 push 0x0
6db: e9 e0 ff ff ff jmp 6c0 <.plt>
00000000000006e0 <dlopen@plt>:
6e0: ff 25 3a 09 20 00 jmp QWORD PTR [rip+0x20093a] # 201020 <dlopen@GLIBC_2.2.5>
6e6: 68 01 00 00 00 push 0x1
6eb: e9 d0 ff ff ff jmp 6c0 <.plt>
00000000000006f0 <zip_open@plt>:
6f0: ff 25 32 09 20 00 jmp QWORD PTR [rip+0x200932] # 201028 <zip_open>
6f6: 68 02 00 00 00 push 0x2
6fb: e9 c0 ff ff ff jmp 6c0 <.plt>
And the program runs without any problem:
$ ./prog
$ echo $?
0
Even looking inside it with a debugger I can clearly see the symbols getting correctly resolved like any normal dynamic symbol:
0x55555555479b <main+43> lea rax, [rbp - 0x14]
0x55555555479f <main+47> mov rdx, rax
0x5555555547a2 <main+50> mov esi, 9
0x5555555547a7 <main+55> lea rdi, [rip + 0xc0] <0x7ffff7ffd948>
0x5555555547ae <main+62> call zip_open@plt <0x555555554620>
|
v ### PLT entry:
0x555555554620 <zip_open@plt> jmp qword ptr [rip + 0x200a02] <0x555555755028>
|
v
0x555555554626 <zip_open@plt+6> push 2
0x55555555462b <zip_open@plt+11> jmp 0x5555555545f0
|
v ### PLT stub:
0x5555555545f0 push qword ptr [rip + 0x200a12] <0x555555755008>
0x5555555545f6 jmp qword ptr [rip + 0x200a14] <0x7ffff7def0d0>
|
v ### Symbol gets correctly resolved
0x7ffff7def0d0 <_dl_runtime_resolve_fxsave> push rbx
0x7ffff7def0d1 <_dl_runtime_resolve_fxsave+1> mov rbx, rsp
0x7ffff7def0d4 <_dl_runtime_resolve_fxsave+4> and rsp, 0xfffffffffffffff0
0x7ffff7def0d8 <_dl_runtime_resolve_fxsave+8> sub rsp, 0x240
Second machine
x86-64, Linux 4.15, Ubuntu 18.04, gcc
7.4, ld
2.30. Here, something really strange is going on.
Compilation doesn't yield any warning or error, but I do not see the symbols:
$ readelf --dyn-syms prog
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND dlopen@GLIBC_2.2.5 (3)
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
6: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
The PLT entries are there, but they are filled with zeroes, and aren't even recognized by objdump
:
$ objdump -j .plt -M intel -d prog
Disassembly of section .plt:
0000000000000560 <.plt>:
560: ff 35 4a 0a 20 00 push QWORD PTR [rip+0x200a4a] # 200fb0 <_GLOBAL_OFFSET_TABLE_+0x8>
566: ff 25 4c 0a 20 00 jmp QWORD PTR [rip+0x200a4c] # 200fb8 <_GLOBAL_OFFSET_TABLE_+0x10>
56c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled of 0x0
# zip_close@plt should be here instead...
0000000000000580 <dlopen@plt>:
580: ff 25 42 0a 20 00 jmp QWORD PTR [rip+0x200a42] # 200fc8 <dlopen@GLIBC_2.2.5>
586: 68 00 00 00 00 push 0x0
58b: e9 d0 ff ff ff jmp 560 <.plt>
...
# ^^^
# Here, these three dots are actually hiding another 0x10+ bytes filled of 0x0
# zip_open@plt should be here instead...
When the program is run, dlopen()
works fine and loads libzip
into memory, but then when zip_open()
gets called, it just generates a segmentation fault:
$ ./prog
Segmentation fault (code dumped)
Taking a look with a debugger, the issue is even more obvious (in case it wasn't already obvious enough). The PLT entries filled with zeroes just end up decoding to a bunch of add
instructions dereferencing rax
, which contains an invalid address and makes the program segfault and die:
0x5555555546e5 <main+43> lea rax, [rbp - 0x14]
0x5555555546e9 <main+47> mov rdx, rax
0x5555555546ec <main+50> mov esi, 9
0x5555555546f1 <main+55> lea rdi, [rip + 0xc6]
0x5555555546f8 <main+62> call dlopen@plt+16 <0x555555554590>
|
v ### Broken PLT enrty (all 0x0, will cause a segfault):
0x555555554590 <dlopen@plt+16> add byte ptr [rax], al
0x555555554592 <dlopen@plt+18> add byte ptr [rax], al
0x555555554594 <dlopen@plt+20> add byte ptr [rax], al
0x555555554596 <dlopen@plt+22> add byte ptr [rax], al
0x555555554598 <dlopen@plt+24> add byte ptr [rax], al
0x55555555459a <dlopen@plt+26> add byte ptr [rax], al
0x55555555459c <dlopen@plt+28> add byte ptr [rax], al
0x55555555459e <dlopen@plt+30> add byte ptr [rax], al
### Next PLT entry...
0x5555555545a0 <__cxa_finalize@plt> jmp qword ptr [rip + 0x200a52] <0x7ffff7823520>
|
v
0x7ffff7823520 <__cxa_finalize> push r15
0x7ffff7823522 <__cxa_finalize+2> push r14
Questions
- So, first of all... why is this happening?
- I thought that this was supposed to work, isn't it? If not, why? And why only on one of the two machines?
- But most importantly: how can I fix this?
For question 3 I want to emphasize that the whole point of this is that I want to load the library myself, without linking it, so please refrain from just commenting that this is bad practice, or whatever else.