Linux – Inside the Build Process

The Executable and Linking Format (ELF) is the standard file format for executables, object files, shared libraries and core dumps on Linux and other Unix-like systems. The newest debug information format compatible with ELF is DWARF 5.

Unless you are writing a new compiler or debugger, you do not need to understand every bit of these formats, but some of the concepts and tools can help you write and manage your code.

Let's take a simple code example and compile it with the compiler defaults.

Now we can obtain the ELF information using the readelf(1) tool, for example with readelf -a on the compiled binary.

In its output we can see:

  • The ELF header
    • used by the file(1) command to display general information and the target hardware architecture
  • Section headers
    • code, data, strings and more
  • Program headers
    • the segments to load, dynamic linking information, stack, etc., with their permissions (Read/Write/Execute)
  • The dynamic section
  • and more

The Section Headers

The build system places a different kind of content in each section. The most important sections are:

  • .text – the compiled code
  • .data – initialized data
  • .bss – uninitialized (zero-filled) data

.bss and .data

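To see the difference, let's look at a simple program.

If we declare a global array without initialization, the build system places it in the .bss section, which is visible in the section headers printed by readelf.

When the application is loaded, the .bss section is allocated in RAM and filled with zeros. The file only records the size of the section, so the array adds nothing to the ELF file size.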

 

But if we add some values to initialize the array, it is placed in the .data section instead, and the ELF file gets bigger. The file now contains the whole array, even though we initialized only 2 elements.

So if the same program declares the array with 1000000 elements, the file grows by the full size of the array, i.e. the ELF file is now full of zeros.

 

Debugging Information

Debugging info is placed inside the ELF file to tell the debugger which source line corresponds to which machine code. The debugger loads the program and uses the debug info to know where to place breakpoints. To set a breakpoint, it replaces the machine instruction at that address with a trap (on x86, int 3). When the CPU reaches the trap it generates an exception that hands control to the debugger, which puts the original instruction back before resuming.

If we compile the code with debug info, the file gets bigger, and the section headers show additional sections that hold the debug data.

Other useful tools

nm – list the symbols with addresses:

objdump – display information from object files:

  • headers (-x)
  • debugging information (-g)
  • disassembly (-d)
  • disassembly with source code (-S)
  • and more …

addr2line – display the source file and line number for a known address

This is very useful if the program crashed and we wrote a fault handler that prints the faulting address (the program counter). We can run this tool on a copy of the ELF file that contains debug info, even if the binary that actually crashed was stripped.

strip – remove symbols and debugging information from an ELF file

Before we deploy our app, we can remove all the symbols and debug info without the need to compile it again.

objcopy – copy and translate object files

You can use objcopy to copy content from one ELF file to another. For example, you can copy only the debug information to a separate file.

And you can attach the debug info to the binary again later.
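The exact commands from the original post are not shown, so the sequence below follows the usual binutils recipe as an assumption: --only-keep-debug to extract, and --add-gnu-debuglink to point the binary back at the debug file. The name app.debug is a placeholder:

```shell
cd "$(mktemp -d)"
printf 'int main(void) { return 0; }\n' > main.c
gcc -g -o app main.c

objcopy --only-keep-debug app app.debug       # debug info only
strip --strip-debug app                       # drop it from the binary
objcopy --add-gnu-debuglink=app.debug app     # record where it went

readelf -S app | grep gnu_debuglink
```

The link is just a small section holding the file name and a checksum. A debugger such as gdb can use it to find app.debug automatically when it loads app.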

size – display section sizes

 
