Linux – Inside the Build Process

The Executable and Linkable Format (ELF) is the standard file format for executables, object files, shared libraries and core dumps on Linux and other Unix-like systems. The debugging information format used with ELF is DWARF, and its newest version is DWARF 5.

If you are not writing a compiler or a debugger, you don't need to understand every bit of these formats, but some concepts and tools can help you write and manage your code.

Let's take a simple code example and compile it:

#include<stdio.h>

int a1=10,a2=20;

void f2()
{
	printf("X");
}

void f1()
{
	int i;
	for(i=0;i<100;i++)
	{
		if(i % 20 == 0)
			a1++;
		f2();
	}
}

int main(void)
{
	char *str = "hello have a good day.....";
	f1();
	puts(str);
	printf("hello %d\n", a1);
	return 0;
}

Compile with the default options:

# gcc -o app ./test.c

Now we can obtain the ELF information using the readelf(1) tool:

# readelf -a ./app

We can see:

  • The ELF header
    • used by the file(1) command to display general information, such as the target hardware architecture
  • Section headers
    • code, data, strings and more
  • Program headers
    • the segments mapped at load time (code, data, dynamic linking information, stack, etc.) with their permissions (Read/Write/Execute)
  • Dynamic section
  • and more
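To make the ELF header more concrete, here is a small C sketch that reads its first bytes: the word size, byte order and machine type, roughly the information file(1) reports. read_elf_ident is a hypothetical helper written for this illustration; it relies on the constants in glibc's <elf.h> (ELFMAG, EI_CLASS, EI_DATA, EM_X86_64).

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the start of an ELF header and report the
   class (32/64-bit), the byte order and e_machine.
   Returns 0 on success, -1 if the file can't be read or isn't ELF. */
int read_elf_ident(const char *path, int *is_64bit, int *is_little_endian,
                   unsigned *machine)
{
    unsigned char hdr[20];   /* e_ident (16) + e_type (2) + e_machine (2) */
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);

    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (n != sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;

    *is_64bit = hdr[EI_CLASS] == ELFCLASS64;
    *is_little_endian = hdr[EI_DATA] == ELFDATA2LSB;
    /* e_machine is at offset 18 in both ELF32 and ELF64 headers and is
       stored in the file's own byte order. */
    *machine = *is_little_endian ? (unsigned)(hdr[18] | hdr[19] << 8)
                                 : (unsigned)(hdr[18] << 8 | hdr[19]);
    return 0;
}
```

Called on /proc/self/exe on an x86-64 Linux machine, it should report a 64-bit, little-endian file with machine EM_X86_64 (62), the same facts that readelf -h prints in the ELF header section.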

The Section Headers

Section Headers:
  [Nr] Name              Type             Address           Offset    Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  0000000   0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400238  00000238  000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000400254  00000254  0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000400274  00000274  0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000400298  00000298  000000000000001c  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000004002b8  000002b8  0000000000000090  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000400348  00000348  0000000000000062  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           00000000004003aa  000003aa  000000000000000c  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          00000000004003b8  000003b8  0000000000000030  0000000000000000   A       6     1     8
  [ 9] .rela.dyn         RELA             00000000004003e8  000003e8  0000000000000018  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000400400  00000400  0000000000000060  0000000000000018  AI       5    24     8
  [11] .init             PROGBITS         0000000000400460  00000460  000000000000001a  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         0000000000400480  00000480  0000000000000050  0000000000000010  AX       0     0     16
  [13] .plt.got          PROGBITS         00000000004004d0  000004d0  0000000000000008  0000000000000000  AX       0     0     8
  [14] .text             PROGBITS         00000000004004e0  000004e0  00000000000002a2  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         0000000000400784  00000784  0000000000000009  0000000000000000  AX       0     0     4
  [16] .rodata           PROGBITS         0000000000400790  00000790  000000000000001c  0000000000000000   A       0     0     4
  [17] .eh_frame_hdr     PROGBITS         00000000004007ac  000007ac  0000000000000054  0000000000000000   A       0     0     4
  [18] .eh_frame         PROGBITS         0000000000400800  00000800  0000000000000174  0000000000000000   A       0     0     8
  [19] .init_array       INIT_ARRAY       0000000000600e10  00000e10  0000000000000008  0000000000000000  WA       0     0     8
  [20] .fini_array       FINI_ARRAY       0000000000600e18  00000e18  0000000000000008  0000000000000000  WA       0     0     8
  [21] .jcr              PROGBITS         0000000000600e20  00000e20  0000000000000008  0000000000000000  WA       0     0     8
  [22] .dynamic          DYNAMIC          0000000000600e28  00000e28  00000000000001d0  0000000000000010  WA       6     0     8
  [23] .got              PROGBITS         0000000000600ff8  00000ff8  0000000000000008  0000000000000008  WA       0     0     8
  [24] .got.plt          PROGBITS         0000000000601000  00001000  0000000000000038  0000000000000008  WA       0     0     8
  [25] .data             PROGBITS         0000000000601038  00001038  0000000000000018  0000000000000000  WA       0     0     8
  [26] .bss              NOBITS           0000000000601050  00001050  0000000000000008  0000000000000000  WA       0     0     1
  [27] .comment          PROGBITS         0000000000000000  00001050  0000000000000034  0000000000000001  MS       0     0     1
  [28] .shstrtab         STRTAB           0000000000000000  00001084  00000000000000fc  0000000000000000           0     0     1

The compiler and linker place a different kind of content in each section. The most important sections are:

  • .text – the compiled code
  • .data – initialized global and static data
  • .bss – uninitialized (zero-filled) global and static data
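As a rough illustration, the hypothetical file below has one object of each kind. The comments give the section GCC typically uses for each in an x86-64 Linux executable; check with nm or readelf -s rather than trusting the comments. With GCC versions before 10 (which defaulted to -fcommon), an uninitialized global such as scratch appears as a COMMON symbol in the object file and is only assigned to .bss at link time.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
int counter = 42;               /* initialized global         -> .data   */
int scratch[256];               /* no initializer             -> .bss    */
const char banner[] = "hello";  /* read-only constant         -> .rodata */

/* The machine code of this function goes to .text. */
int sum_scratch(void)
{
    static int calls;           /* static local, no initializer -> .bss */
    int total = 0;              /* automatic variable: lives on the stack, in no section */

    for (size_t i = 0; i < sizeof scratch / sizeof scratch[0]; i++)
        total += scratch[i];    /* .bss is zero-filled at load, so total stays 0 */
    calls++;
    return total + counter + calls;
}
```

In a program built from this file, nm would be expected to list counter as D, scratch as B, banner as R and sum_scratch as T.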

.bss and .data

To see the difference, let's look at a simple program:

int arr[1000];

int main(void)
{
	arr[0] = 1;
	arr[1] = 2;
	/* ... */
	return 0;
}

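If we declare a global array without an initializer, the compiler places it in the .bss section (readelf -S lists .bss with type NOBITS).

When the application is loaded, the .bss section is allocated in RAM and zero-filled, but it takes no space in the ELF file itself, so the array does not increase the file size.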
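One way to check this without relying on screenshots is to read the section header table directly. The C sketch below looks a section up by name and returns its header, including sh_type and sh_size. find_section is a hypothetical helper; it assumes a 64-bit ELF file in the host byte order, ignores extended section numbering, and uses the Elf64_* definitions from glibc's <elf.h>.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find a section by name (e.g. ".bss") in a 64-bit
   ELF file. Returns 0 and fills *out on success, -1 otherwise. */
int find_section(const char *path, const char *name, Elf64_Shdr *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int rc = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr names;   /* header of .shstrtab, the section-name string table */

    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        fseek(f, (long)(eh.e_shoff + (Elf64_Off)eh.e_shstrndx * eh.e_shentsize), SEEK_SET) != 0 ||
        fread(&names, sizeof names, 1, f) != 1)
        goto done;

    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        char buf[64];

        if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize), SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, f) != 1 ||
            fseek(f, (long)(names.sh_offset + sh.sh_name), SEEK_SET) != 0)
            goto done;

        /* Names in .shstrtab are NUL-terminated; read a chunk and compare. */
        size_t n = fread(buf, 1, sizeof buf - 1, f);
        buf[n] = '\0';
        if (strcmp(buf, name) == 0) {
            *out = sh;
            rc = 0;
            goto done;
        }
    }
done:
    fclose(f);
    return rc;
}
```

For .bss, sh_type is SHT_NOBITS and sh_size is the memory the loader reserves; for .data, sh_type is SHT_PROGBITS and its sh_size bytes really are stored in the file at sh_offset.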

 

But if we initialize some of the array elements:

int arr[1024] = {1,2};

int main(void)
{
	char *str = "hello have a good day.....";
	/* ... */
	return 0;
}

We can see that the array is now in the .data section, and the ELF file is bigger. The file contains the whole array (about 4 KB with 4-byte int), even though we initialized only 2 elements.

So if the same program declares the array with 1000000 elements:

int arr[1000000] = {1,2};

int main(void)
{
	char *str = "hello have a good day.....";
	/* ... */
	return 0;
}

the file size grows by roughly 4 MB (1000000 elements × 4 bytes), and almost all of it is zeros.
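If only a few elements need non-zero start values, one way to keep the file small is to drop the initializer, so the array stays in .bss, and assign those elements at run time. The sketch below assumes a hypothetical init_arr() that main calls before using the array. GCC also places objects whose initializer is all zeros, such as = {0}, in .bss by default (its -fzero-initialized-in-bss option).

```c
#define ARR_LEN 1000000   /* same element count as the example above */

int arr[ARR_LEN];         /* no initializer -> .bss: no space in the ELF file */

/* Hypothetical helper: set the few non-zero start values at run time.
   All other elements are already zero because .bss is zero-filled at load. */
void init_arr(void)
{
    arr[0] = 1;
    arr[1] = 2;
}
```

The trade-off is a couple of stores at startup instead of 4 MB of zeros on disk.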

 

Debugging Information

Debugging information is stored inside the ELF file so that the debugger can map machine code back to source lines. The debugger loads the program and uses the debug info to find where to place breakpoints: it replaces the first byte of the target instruction with a trap instruction (on x86, int 3). When the CPU executes the trap, it raises an exception and control passes to the debugger; before resuming the program, the debugger puts the original instruction back.
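The breakpoint bookkeeping can be modelled on a plain byte buffer. This is a simplified sketch, not how gdb is written: a real debugger reads and writes the traced process's memory with ptrace (PTRACE_PEEKTEXT/PTRACE_POKETEXT) and, after the trap fires, also moves the instruction pointer back by one byte. The struct and function names are hypothetical; the test bytes 55 48 89 e5 are the push %rbp / mov %rsp,%rbp prologue from the objdump listing of main.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_INT3 0xCC   /* one-byte x86 breakpoint instruction (int 3) */

/* Hypothetical per-breakpoint state a debugger keeps. */
struct breakpoint {
    size_t  offset;     /* position of the patched instruction */
    uint8_t saved;      /* original byte overwritten by int 3 */
    int     enabled;
};

/* Plant a breakpoint: remember the original byte, then write int 3 over it. */
void bp_insert(uint8_t *code, size_t offset, struct breakpoint *bp)
{
    bp->offset = offset;
    bp->saved = code[offset];
    code[offset] = X86_INT3;
    bp->enabled = 1;
}

/* Remove it: restore the original byte so the instruction can execute. */
void bp_remove(uint8_t *code, struct breakpoint *bp)
{
    if (bp->enabled) {
        code[bp->offset] = bp->saved;
        bp->enabled = 0;
    }
}
```

Because int 3 is a single byte, it can replace the first byte of any x86 instruction without overwriting the following one.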

If we compile the code with debug info (gcc -g), the file gets bigger, and new .debug_* sections such as .debug_info and .debug_line appear in the section headers.

Other useful tools

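nm – list the symbols with their addresses and types (D/d: .data, B/b: .bss, T/t: .text, W: weak symbol; uppercase means global, lowercase means local):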

developer@:~/testapp$ nm app2
0000000000601048 D a1
000000000060104c D a2
0000000000601050 B __bss_start
0000000000601050 b completed.7585
0000000000601038 D __data_start
0000000000601038 W data_start
.....

objdump – display information from object files:

  • headers (-x)
  • debugging information (-g)
  • disassembly (-d)
  • disassembly intermixed with source code (-S; requires debug info)
  • and more …
# objdump -S ./app2
...
void main()
{
  400626:	55                   	push   %rbp
  400627:	48 89 e5             	mov    %rsp,%rbp
  40062a:	48 83 ec 10          	sub    $0x10,%rsp
char *str = "hello have a good day.....";
  40062e:	48 c7 45 f8 f4 06 40 	movq   $0x4006f4,-0x8(%rbp)
  400635:	00 
f1();
  400636:	b8 00 00 00 00       	mov    $0x0,%eax
  40063b:	e8 87 ff ff ff       	callq  4005c7 <f1>
puts(str);

addr2line – convert an address into a source file name and line number

This is very useful when a program crashes and a fault handler prints the faulting address (the program counter). Run the tool on an ELF file that contains debug info; the binary that actually crashed can be a stripped copy of it:

developer@:~/testapp$ addr2line -e app2 0x400630
/home/developer/testapp/./a.c:25
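A fault handler of the kind described above could look like the following sketch. It installs a SIGSEGV handler with sigaction and SA_SIGINFO and prints the program counter taken from the saved register context. Reading REG_RIP is specific to x86-64 Linux with glibc (it needs _GNU_SOURCE); on other targets this sketch just prints 0. The function names are hypothetical, and output uses only write(2) because printf is not async-signal-safe.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: exposes REG_RIP */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* Program counter at the time of the fault, from the context the kernel
   passes to an SA_SIGINFO handler. Only x86-64 glibc is handled here. */
static uintptr_t fault_pc(const ucontext_t *uc)
{
#if defined(REG_RIP)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#else
    (void)uc;
    return 0;                   /* other architectures: not covered by this sketch */
#endif
}

/* Write "pc=0x...\n" to stderr using only async-signal-safe calls. */
static void write_pc(uintptr_t pc)
{
    char buf[3 + 2 + 2 * sizeof pc + 1];
    const char prefix[] = "pc=0x";
    size_t i = sizeof buf;

    buf[--i] = '\n';
    do {                        /* hex digits, least significant first */
        buf[--i] = "0123456789abcdef"[pc & 0xf];
        pc >>= 4;
    } while (pc != 0);
    for (size_t k = sizeof prefix - 1; k > 0; k--)
        buf[--i] = prefix[k - 1];
    ssize_t unused = write(STDERR_FILENO, buf + i, sizeof buf - i);
    (void)unused;
}

static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)info;                 /* info->si_addr is the faulting data address, not the PC */
    write_pc(fault_pc(ctx));
    _exit(128 + sig);
}

/* Install on_fault for SIGSEGV. Returns 0 on success, -1 on error. */
int install_fault_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call install_fault_handler() at the start of main, then pass the printed address to addr2line as shown above. For a position-independent executable (the default on many current distributions), subtract the executable's load address first, because addr2line expects link-time addresses such as 0x400630.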

strip – remove symbols and debugging information from ELF file

Before we deploy our app, we can remove all the symbols and debug info without the need to compile it again.

developer@:~/testapp$ strip -o appstripped ./app
developer@:~/testapp$ ls -l
total 44
-rw-rw-r-- 1 developer developer   290 dec  7 19:04 a.c
-rwxrwxr-x 1 developer developer 29864 dec  7 19:05 app
-rwxrwxr-x 1 developer developer  6336 dec  7 21:17 appstripped

objcopy – copy and translate object files

You can use objcopy to copy content from one ELF file to another. For example, to copy only the debug information to a separate file and then strip the binary, use:

# gcc -g3 -o app ./a.c 
# objcopy --only-keep-debug app app.debug
# strip -s ./app

Later, you can link the stripped binary to the separate debug file; gdb then finds and loads the debug info from app.debug:

# objcopy --add-gnu-debuglink app.debug app
# gdb ./app
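--add-gnu-debuglink does not copy the debug info back into app. It adds a small .gnu_debuglink section that names the debug file and records a CRC-32 of its contents, which gdb uses to locate app.debug and check that it matches. According to the GDB manual, the section holds the file name, a NUL byte, zero to three padding bytes up to a 4-byte boundary, and the 4-byte CRC. The sketch below computes both parts; the function names are hypothetical, and the bitwise loop is a slow but simple form of the standard CRC-32 that GDB documents.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
   A table-driven version would be faster. */
uint32_t debuglink_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Size of the .gnu_debuglink contents for a debug file name:
   name + NUL, padded to a multiple of 4, followed by the 4-byte CRC. */
size_t debuglink_section_size(const char *debug_name)
{
    size_t name_len = strlen(debug_name) + 1;     /* include the NUL */
    size_t padded = (name_len + 3) & ~(size_t)3;  /* round up to 4 */
    return padded + 4;
}
```

For the name app.debug (9 characters), the section is 16 bytes: 10 bytes of name and NUL, 2 bytes of padding and 4 bytes of CRC. If app.debug is rebuilt, its CRC no longer matches and gdb should refuse to use it.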

size – display section sizes

# size ./app
   text	   data	    bss	    dec	    hex	filename
   1594	    576	      8	   2178	    882	./app

 
