The Executable and Linking Format (ELF) is the standard file format for executables, object files, shared libraries and core dumps on Linux and other Unix-like systems. The newest debug information format compatible with ELF is DWARF 5.
Unless you are writing a new compiler or debugger, you do not need to understand every detail of these formats, but a few concepts and tools can help you write and manage your code.
Let's take a simple code example and compile it:
#include <stdio.h>

int a1 = 10, a2 = 20;

void f2()
{
    printf("X");
}

void f1()
{
    int i;

    for (i = 0; i < 100; i++) {
        if (i % 20 == 0)
            a1++;
        f2();
    }
}

void main()
{
    char *str = "hello have a good day.....";

    f1();
    puts(str);
    printf("hello %d\n", a1);
}
Compile with the default options:
# gcc -o app ./test.c
Now we can obtain the ELF information using the readelf(1) tool:
# readelf -a ./app
We can see:
- The ELF header
  - used by the file(1) command to display general information, such as the target hardware architecture
- Section headers
  - code, data, strings and more
- Program headers
  - the segments used at load time (loadable code and data, dynamic linking information, stack settings, etc.) with their permissions (read/write/execute)
- Dynamic sections
- and more
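As a rough illustration of the kind of information the ELF header carries, the sketch below decodes the first bytes of the header (the e_ident array) from a buffer. The function and struct names are made up for this example; the byte positions and values follow the ELF specification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this example. */
struct elf_ident_info {
    int bits;          /* 32 or 64, 0 if unknown */
    int little_endian; /* 1 = little endian, 0 = big endian, -1 if unknown */
};

/* Returns 1 if buf starts with the ELF magic and fills *out, 0 otherwise. */
int elf_describe_ident(const unsigned char *buf, size_t len,
                       struct elf_ident_info *out)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (buf == NULL || out == NULL || len < 16)
        return 0;
    if (memcmp(buf, magic, sizeof magic) != 0)
        return 0;

    /* e_ident[4] is EI_CLASS: 1 = 32-bit, 2 = 64-bit */
    out->bits = (buf[4] == 1) ? 32 : (buf[4] == 2) ? 64 : 0;
    /* e_ident[5] is EI_DATA: 1 = little endian, 2 = big endian */
    out->little_endian = (buf[5] == 1) ? 1 : (buf[5] == 2) ? 0 : -1;
    return 1;
}
```

To try it on a real file, read the first 16 bytes of ./app with fread() and pass the buffer to the function.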
The Section Headers
Section Headers:
[Nr] Name               Type       Address          Offset   Size             EntSize          Flags Link Info Align
[ 0]                    NULL       0000000000000000 0000000  0000000000000000 0000000000000000          0    0     0
[ 1] .interp            PROGBITS   0000000000400238 00000238 000000000000001c 0000000000000000 A        0    0     1
[ 2] .note.ABI-tag      NOTE       0000000000400254 00000254 0000000000000020 0000000000000000 A        0    0     4
[ 3] .note.gnu.build-i  NOTE       0000000000400274 00000274 0000000000000024 0000000000000000 A        0    0     4
[ 4] .gnu.hash          GNU_HASH   0000000000400298 00000298 000000000000001c 0000000000000000 A        5    0     8
[ 5] .dynsym            DYNSYM     00000000004002b8 000002b8 0000000000000090 0000000000000018 A        6    1     8
[ 6] .dynstr            STRTAB     0000000000400348 00000348 0000000000000062 0000000000000000 A        0    0     1
[ 7] .gnu.version       VERSYM     00000000004003aa 000003aa 000000000000000c 0000000000000002 A        5    0     2
[ 8] .gnu.version_r     VERNEED    00000000004003b8 000003b8 0000000000000030 0000000000000000 A        6    1     8
[ 9] .rela.dyn          RELA       00000000004003e8 000003e8 0000000000000018 0000000000000018 A        5    0     8
[10] .rela.plt          RELA       0000000000400400 00000400 0000000000000060 0000000000000018 AI       5   24     8
[11] .init              PROGBITS   0000000000400460 00000460 000000000000001a 0000000000000000 AX       0    0     4
[12] .plt               PROGBITS   0000000000400480 00000480 0000000000000050 0000000000000010 AX       0    0    16
[13] .plt.got           PROGBITS   00000000004004d0 000004d0 0000000000000008 0000000000000000 AX       0    0     8
[14] .text              PROGBITS   00000000004004e0 000004e0 00000000000002a2 0000000000000000 AX       0    0    16
[15] .fini              PROGBITS   0000000000400784 00000784 0000000000000009 0000000000000000 AX       0    0     4
[16] .rodata            PROGBITS   0000000000400790 00000790 000000000000001c 0000000000000000 A        0    0     4
[17] .eh_frame_hdr      PROGBITS   00000000004007ac 000007ac 0000000000000054 0000000000000000 A        0    0     4
[18] .eh_frame          PROGBITS   0000000000400800 00000800 0000000000000174 0000000000000000 A        0    0     8
[19] .init_array        INIT_ARRAY 0000000000600e10 00000e10 0000000000000008 0000000000000000 WA       0    0     8
[20] .fini_array        FINI_ARRAY 0000000000600e18 00000e18 0000000000000008 0000000000000000 WA       0    0     8
[21] .jcr               PROGBITS   0000000000600e20 00000e20 0000000000000008 0000000000000000 WA       0    0     8
[22] .dynamic           DYNAMIC    0000000000600e28 00000e28 00000000000001d0 0000000000000010 WA       6    0     8
[23] .got               PROGBITS   0000000000600ff8 00000ff8 0000000000000008 0000000000000008 WA       0    0     8
[24] .got.plt           PROGBITS   0000000000601000 00001000 0000000000000038 0000000000000008 WA       0    0     8
[25] .data              PROGBITS   0000000000601038 00001038 0000000000000018 0000000000000000 WA       0    0     8
[26] .bss               NOBITS     0000000000601050 00001050 0000000000000008 0000000000000000 WA       0    0     1
[27] .comment           PROGBITS   0000000000000000 00001050 0000000000000034 0000000000000001 MS       0    0     1
[28] .shstrtab          STRTAB     0000000000000000 00001084 00000000000000fc 0000000000000000          0    0     1
The build system places a different kind of content in each section. The most important sections are:
- .text – the compiled code
- .data – initialized data
- .bss – uninitialized data (filled with zeros when the program is loaded)
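To make the list concrete, here is a small sketch of where a typical GCC/Linux toolchain tends to place different C objects. All names are invented for this example, and the exact placement can vary with the compiler, its options and the platform, so it is worth checking your own build with nm or readelf -S.

```c
int counter_initialized = 10;    /* non-zero initializer: usually .data */
int counter_uninitialized;       /* no initializer: usually .bss, zero at startup */
static int hits;                 /* file-local, zero-initialized: usually .bss */
const char banner[] = "hello";   /* read-only data: usually .rodata */

/* The machine code of a function usually goes to .text */
int bump(void)
{
    hits++;
    counter_uninitialized += counter_initialized;
    return counter_uninitialized;
}
```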
.bss and .data
To see the difference, let's look at a simple example:
int arr[1000];

void main()
{
    arr[0] = 1;
    arr[1] = 2;
    ....
}
If we declare a global array without an initializer, the build system places it in the .bss section:
readelf output:
When the application is loaded, the .bss section is allocated in RAM and filled with zeros; it takes no space in the file. The ELF file size:
But if we add some values to initialize the array:
int arr[1024] = {1, 2};

void main()
{
    char *str = "hello have a good day.....";
    ...
}
We can see that the array is now in the .data section:
and the ELF file is now bigger:
This means that the ELF file contains the whole array, even though we initialized only 2 elements.
So if the same program declares the array with 1000000 elements:
int arr[1000000] = {1, 2};

void main()
{
    char *str = "hello have a good day.....";
    ...
}
the size is now:
i.e. the ELF file now consists almost entirely of zeros.
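The size difference follows from the section type: a NOBITS section such as .bss records only its size, while a PROGBITS section such as .data stores every byte in the file. The sketch below works through that arithmetic; the helper names and the EX_ prefixed constants are hypothetical, and the numeric type values are the ones defined by the ELF specification.

```c
#include <stdint.h>

/* Section type values from the ELF specification. */
#define EX_SHT_PROGBITS 1u  /* contents stored in the file (.text, .data) */
#define EX_SHT_NOBITS   8u  /* no contents stored in the file (.bss) */

/* Hypothetical helper: bytes a section occupies inside the ELF file. */
uint64_t section_file_bytes(uint32_t type, uint64_t size)
{
    return (type == EX_SHT_NOBITS) ? 0 : size;
}

/* Bytes needed for an int array with n elements on this platform. */
uint64_t int_array_bytes(uint64_t n)
{
    return n * sizeof(int);
}
```

Assuming a 4-byte int, section_file_bytes(EX_SHT_PROGBITS, int_array_bytes(1000000)) gives 4000000 bytes added to the file, while the same array in a NOBITS section adds none.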
Debugging Information
Debugging info is placed inside the ELF file to let the debugger know which source line corresponds to which machine code. The debugger loads the program and uses the debug info to decide where to place breakpoints. To set a breakpoint, it replaces the machine instruction with a trap (on x86, int 3); when execution reaches it, the CPU raises an exception and control passes to the debugger, which puts the original instruction back before resuming.
If we compile the code with debug info, we will see that the file gets bigger:
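The breakpoint mechanism can be sketched in a few lines. This is only an illustration on a plain byte array: a real debugger patches the memory of the traced process (on Linux typically through ptrace) rather than a local buffer, and the struct and function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_INT3 0xCC  /* one-byte x86 breakpoint instruction (int 3) */

/* Hypothetical bookkeeping for one software breakpoint. */
struct breakpoint {
    size_t  offset;     /* position of the patched byte in the code buffer */
    uint8_t saved_byte; /* original first byte of the instruction */
    int     enabled;
};

/* Save the original byte and overwrite it with int 3.
 * Returns 0 on success, -1 if offset is outside the buffer. */
int breakpoint_enable(uint8_t *code, size_t len,
                      struct breakpoint *bp, size_t offset)
{
    if (offset >= len)
        return -1;
    bp->offset = offset;
    bp->saved_byte = code[offset];
    bp->enabled = 1;
    code[offset] = X86_INT3;
    return 0;
}

/* Put the original byte back so the real instruction can run again. */
void breakpoint_disable(uint8_t *code, struct breakpoint *bp)
{
    if (bp->enabled) {
        code[bp->offset] = bp->saved_byte;
        bp->enabled = 0;
    }
}
```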
And the sections:
Other useful tools
nm – list the symbols with addresses:
developer@:~/testapp$ nm app2
0000000000601048 D a1
000000000060104c D a2
0000000000601050 B __bss_start
0000000000601050 b completed.7585
0000000000601038 D __data_start
0000000000601038 W data_start
.....
objdump – display information from object files:
- headers (-x)
- debugging information (-g)
- disassembly (-d)
- disassembly with source code (-S)
- and more …
# objdump -S ./app2
...
void main() {
  400626: 55                      push   %rbp
  400627: 48 89 e5                mov    %rsp,%rbp
  40062a: 48 83 ec 10             sub    $0x10,%rsp
char *str = "hello have a good day.....";
  40062e: 48 c7 45 f8 f4 06 40    movq   $0x4006f4,-0x8(%rbp)
  400635: 00
f1();
  400636: b8 00 00 00 00          mov    $0x0,%eax
  40063b: e8 87 ff ff ff          callq  4005c7 <f1>
puts(str);
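The callq line in the listing can be decoded by hand: e8 is the opcode of a near relative call, and the four bytes after it are a signed little-endian displacement that is added to the address of the next instruction. The sketch below does that calculation; the helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: target of an x86-64 "e8 rel32" call.
 * insn_addr is the address of the e8 byte, insn holds the 5 instruction bytes. */
uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t insn[5])
{
    /* Assemble the little-endian 32-bit displacement */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);

    /* Sign-extend the displacement to 64 bits by hand */
    uint64_t disp = raw;
    if (raw & 0x80000000u)
        disp |= 0xFFFFFFFF00000000ull;

    /* The displacement is relative to the next instruction (insn_addr + 5);
     * unsigned wrap-around gives the right result for negative values */
    return insn_addr + 5 + disp;
}
```

For the bytes e8 87 ff ff ff at address 0x40063b, the displacement is -0x79 and the next instruction is at 0x400640, so the result is 0x4005c7, the address objdump prints for f1.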
addr2line – display the source file and line number for a known address
This is very useful if the program crashed and we wrote a fault handler that prints the faulting address (program counter). We run the tool on an ELF file that contains debug info; the deployed program that crashed can be a stripped copy of it:
developer@:~/testapp$ addr2line -e app2 0x400630
/home/developer/testapp/./a.c:25
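A fault handler should avoid printf, which is not async-signal-safe, so the address is often formatted by hand before it is written out. The sketch below shows one way to produce the 0x... string that addr2line accepts. The function name is hypothetical, and obtaining the program counter inside the handler is left out because it is platform specific.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes addr as "0x<hex>" into out without calling libc.
 * Returns the number of characters written (excluding the terminator),
 * or 0 if the buffer is missing or too small. */
size_t format_hex_address(uintptr_t addr, char *out, size_t out_len)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0;
    size_t i;

    /* Collect the hex digits, least significant first */
    do {
        tmp[n++] = digits[addr & 0xf];
        addr >>= 4;
    } while (addr != 0);

    if (out == NULL || out_len < n + 3)  /* "0x" + digits + '\0' */
        return 0;

    out[0] = '0';
    out[1] = 'x';
    for (i = 0; i < n; i++)
        out[2 + i] = tmp[n - 1 - i];     /* reverse into place */
    out[2 + n] = '\0';
    return n + 2;
}
```

The handler can then emit the string with write(2), and the printed value can be passed to addr2line -e together with the unstripped binary.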
strip – remove symbols and debugging information from an ELF file
Before we deploy our app, we can remove all the symbols and debug info without the need to compile it again.
developer@:~/testapp$ strip -o appstripped ./app
developer@:~/testapp$ ls -l
total 44
-rw-rw-r-- 1 developer developer   290 dec 7 19:04 a.c
-rwxrwxr-x 1 developer developer 29864 dec 7 19:05 app
-rwxrwxr-x 1 developer developer  6336 dec 7 21:17 appstripped
objcopy – copy and translate object files
You can use objcopy to copy content from one ELF file to another. For example, to copy only the debug information to a separate file:
# gcc -g3 -o app ./a.c
# objcopy --only-keep-debug app app.debug
# strip -s ./app
Later, you can link the stripped binary to the separate debug file so that gdb can find the debug info:
# objcopy --add-gnu-debuglink=app.debug app
# gdb ./app
size – display section sizes
# size ./app
   text    data     bss     dec     hex filename
   1594     576       8    2178     882 ./app
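As a quick check of the columns above, dec is the sum of text, data and bss, and hex is the same total in hexadecimal. A one-line sketch (the function name is hypothetical):

```c
/* Hypothetical helper: the total shown in the "dec" column of size(1). */
unsigned long size_total(unsigned long text, unsigned long data,
                         unsigned long bss)
{
    return text + data + bss;
}
```

For the output above, size_total(1594, 576, 8) is 2178, which is 0x882.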