Linux – Writing Fault handlers

Every Linux C/C++ developer is familiar with the message “Segmentation fault (core dumped)”. It appears when the program accesses an invalid memory address; similar crashes follow an arithmetic error such as a division by zero, an illegal instruction, and more.

To find the problem, you need to run the program under a debugger or examine the core dump.

Post mortem – core dump

While the program is loading, the system arranges the process memory: it puts the code in the .text section, initialized data in the .data section, and uninitialized data in .bss. It also maps pages for the stack, the heap, and every shared object. You can configure the system to dump all this information to a file when the process crashes.

The information includes:

  • .data section at the crash point
  • .bss section
  • heap
  • stack
  • register values
  • stack trace
  • and more

To create such a file, run:
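From a shell, the core file size limit is usually lifted by running ulimit -c unlimited before starting the program. The same limit can also be raised from inside the process. A minimal sketch, assuming a hypothetical helper named enable_core_dumps (the name is not from the original code):

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* Sanity check: the soft limit never exceeds the hard limit. */
    assert(lim.rlim_cur <= lim.rlim_max);

    lim.rlim_cur = lim.rlim_max;    /* an unprivileged process may go this far */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Call it early in main(). Where the file ends up, and what it is called, still depends on the system's core_pattern setting.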

Now, after a crash, you will find a file named core in the directory. Load it into the debugger together with your application to see the problem:

The debugger reports the signal (SIGSEGV) and the faulting address, along with the file and line number (myapp.c:67).

You can examine the registers (info registers) or the memory (x/20w $rsp), see the backtrace (bt), and more.

Problems with the core dump

There are two main problems with the generated core dump:

  1. It can be very large if the program consumed a lot of memory before crashing. This is a problem when the core dump is generated on a client machine and must be transferred to the developer. It is also a problem when storage is relatively small: an embedded system may have 1GB of RAM but only 256MB of flash. Sometimes the file system is even read-only (the root file system on an Android smartphone, for example).
  2. When we examine the core dump, we see only data relevant to the CPU. If, for example, hardware is mapped at a virtual address, we can’t read those addresses from the core dump. In other words, the core dump records only the CPU state and none of the other hardware registers (FPGA registers, …).

Writing a Fault handler

To solve the above problems you can write a signal handler for some of the signals caused by errors:

This helper function takes a function pointer and installs it as the signal handler for SIGSEGV (invalid memory access), SIGFPE (arithmetic error such as an integer divide by zero), SIGILL (illegal instruction) and SIGBUS (bus error, for example a misaligned access).
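A minimal sketch of such a helper, built on sigaction(2) with the SA_SIGINFO flag. The names set_fault_handler and fault_handler_t are assumptions made for this illustration, not the original code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Signature of a three-argument (SA_SIGINFO) signal handler. */
typedef void (*fault_handler_t)(int sig, siginfo_t *info, void *context);

/* Install `handler` for the four fault signals listed above.
 * Returns 0 on success, -1 if any sigaction() call fails. */
int set_fault_handler(fault_handler_t handler)
{
    static const int fault_signals[] = { SIGSEGV, SIGFPE, SIGILL, SIGBUS };
    struct sigaction sa;
    size_t i;

    assert(handler != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;   /* deliver the siginfo_t and context arguments */
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof fault_signals / sizeof fault_signals[0]; i++) {
        if (sigaction(fault_signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

SA_SIGINFO is what makes the kernel pass the second and third arguments; without it the handler receives only the signal number.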

Now let's write a simple handler and use it:

In this example we only print a message and abort, but the handler can dump whatever we like. For example, if we have already mapped a hardware region, we can dump it here.
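A handler along these lines might look like the sketch below. It uses write(2) rather than printf, because printf is not async-signal-safe. The fault is triggered in a forked child so the calling process can observe the result. All names (fault_handler, crash_in_child) are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Print a fixed message and abort. Only async-signal-safe calls are used. */
static void fault_handler(int sig, siginfo_t *info, void *context)
{
    static const char msg[] = "fault handler: fatal signal, aborting\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)n; (void)sig; (void)info; (void)context;
    /* Application state (for example a mapped hardware region) could be dumped here. */
    abort();
}

/* Fork a child that installs the handler and writes through a NULL pointer.
 * Returns the signal that terminated the child, or -1 on error. */
int crash_in_child(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        int *volatile p = NULL;     /* volatile: the compiler cannot fold the store */

        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        *p = 1;                     /* raises SIGSEGV */
        _exit(0);                   /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Because the handler ends with abort(), the child is reported as killed by SIGABRT, not SIGSEGV.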

You can also use the data provided in the siginfo_t parameter. For example, on SIGFPE the si_addr field holds the address of the faulting instruction.

Note that on SIGSEGV, si_addr is the faulting address (the address we tried to access, NULL in the above example).
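To see si_addr in action, the sketch below faults in a child process and sends the value its handler received back through a pipe. The helper names (report_fault_address, observed_fault_address) are made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;  /* write end of the pipe, set before the fault */

/* Send si_addr to the parent, then exit. write() and _exit() are async-signal-safe. */
static void report_fault_address(int sig, siginfo_t *info, void *context)
{
    void *addr = info->si_addr;
    ssize_t n = write(report_fd, &addr, sizeof addr);

    (void)n; (void)sig; (void)context;
    _exit(1);
}

/* Write one byte to `target` in a child process and return the si_addr its
 * SIGSEGV handler saw, or (void *)-1 if nothing was reported. */
void *observed_fault_address(void *target)
{
    void *seen = (void *)-1;
    int fds[2];
    pid_t pid;

    if (pipe(fds) != 0)
        return seen;
    pid = fork();
    if (pid == 0) {
        struct sigaction sa;
        char *volatile p = target;

        close(fds[0]);
        report_fd = fds[1];
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = report_fault_address;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        *p = 1;             /* faults if `target` is not writable */
        _exit(0);
    }
    close(fds[1]);
    if (pid > 0) {
        if (read(fds[0], &seen, sizeof seen) != (ssize_t)sizeof seen)
            seen = (void *)-1;
        waitpid(pid, NULL, 0);
    }
    close(fds[0]);
    return seen;
}
```

Calling observed_fault_address(NULL) mirrors the NULL access discussed above: the handler sees the address the program tried to write, not the address of the instruction that did it.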

The fault context

The most important thing about the fault handler is its third parameter. The prototype declares it as void *, but it actually points to a ucontext_t. It is declared as void * because the contents of ucontext_t are architecture dependent. You can find the structure in its header file and use it.

For example, to print the instruction pointer of the faulting instruction on 64-bit x86, we write:

The gregs array contains all the general-purpose registers.

We can use this method to access any register (RSP, RBP, RAX, …).

On the ARM architecture the code is different:

Again, we can access other registers this way: arm_sp, arm_lr, arm_r0, etc.

See the header file of the required architecture for details
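The two cases can be folded into one small helper. This is only a sketch: context_pc is a hypothetical name, the field names are those glibc uses on Linux, and any architecture other than x86-64 and 32-bit ARM simply yields 0 here:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes REG_RIP only with _GNU_SOURCE */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <ucontext.h>

/* Return the program counter saved in the fault context, or 0 on an
 * architecture this sketch does not cover. */
uintptr_t context_pc(void *context)
{
    ucontext_t *uc = (ucontext_t *)context;

    assert(context != NULL);
#if defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__arm__)
    return (uintptr_t)uc->uc_mcontext.arm_pc;
#else
    (void)uc;
    return 0;
#endif
}
```

Inside a handler, pass the third argument straight to context_pc() and print the result; that value is what you later feed to the debugger.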

If we run the above code we get:

We can see that siginfo reports the faulting address (NULL), while the RIP value is 0x400867.

Now we can load the program into the debugger to see the address content:

Use the list command with the address: (gdb) list *0x400867

As you can see, it points to the faulty file and line number (fa.c:67).

Integrating with the core dump

If you simply return from the fault handler, the faulting instruction runs again and the handler is called again and again. That's why we need to call abort or exit at the end. The abort function generates a core dump, but the dump is taken from the signal handler context. One trick is to restore the default signal handler before returning; this produces a core dump from the original fault point:

The output now:

And the core dump was generated on the faulty line (and not by the abort).
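A sketch of this trick, with hypothetical names (log_then_default, crash_with_default_action). The handler does its logging, resets the signal to SIG_DFL and returns, so the second fault is handled by the kernel's default action:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Log, restore the default action, and return to the faulting instruction. */
static void log_then_default(int sig, siginfo_t *info, void *context)
{
    static const char msg[] = "fault handler: restoring the default action\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)n; (void)info; (void)context;
    signal(sig, SIG_DFL);
    /* Returning re-runs the faulting instruction; with the default action back
     * in place the kernel terminates the process at the original fault point. */
}

/* Fault in a child process; returns the signal that killed it, or -1 on error. */
int crash_with_default_action(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        int *volatile p = NULL;

        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = log_then_default;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        *p = 1;         /* first fault runs the handler, second one kills us */
        _exit(0);       /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child is now reported as killed by SIGSEGV itself, which is why the core dump's backtrace starts at the faulty line and not inside abort.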

TIP – Use LD_PRELOAD to load the handler to any compiled program

You can build the fault handler into a shared library and use a constructor function to install it. Then you can inject the library into any process using LD_PRELOAD:

The code for the shared object:
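A minimal sketch of such a shared object, assuming GCC's constructor attribute. The function names are hypothetical, and the handler simply logs and falls back to the default action:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Handler shared by every program the library is preloaded into. */
void preload_fault_handler(int sig, siginfo_t *info, void *context)
{
    static const char msg[] = "preloaded fault handler: fatal signal caught\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)n; (void)info; (void)context;
    signal(sig, SIG_DFL);   /* let the default action produce the core dump */
}

/* Runs when the shared object is loaded, before the program's main(). */
__attribute__((constructor))
static void install_preload_handler(void)
{
    static const int fault_signals[] = { SIGSEGV, SIGFPE, SIGILL, SIGBUS };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = preload_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof fault_signals / sizeof fault_signals[0]; i++)
        sigaction(fault_signals[i], &sa, NULL);
}
```

With a hypothetical file name fault.c, this could be built with gcc -shared -fPIC fault.c -o libfault.so and injected with LD_PRELOAD=./libfault.so ./app. A program that installs its own handlers later will override the preloaded one.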

Compile it to a library:

The application (without any handler):

Simply compile it:

Running it “as is”:

Inject the shared object:

 

On error resume next

In some cases we need the program to continue running even after a fault. There are several ways to handle this:

  1. Continue and never return from the fault handler
  2. Use the execve(2) system call or one of its wrappers to run another main
  3. Change the instruction pointer in the context so the handler returns to different code:

Output:

4. Jump over the faulty line. This is dangerous and very platform dependent: you need to know how many bytes to skip to reach the next instruction. For example, on 64-bit Ubuntu:

Output:

As you can see from the output, the program jumped over the null pointer assignment and the divide by zero. Not so useful, but possible.
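Option 3 above can be sketched as follows for x86-64 with glibc. Everything here is an assumption made for illustration: the names, the exit code, and the stack adjustment, which is done so that the recovery function starts with the alignment it would have after a normal call. Because the function is entered by a jump, it has no return address and must end the process (or transfer control elsewhere) itself:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* glibc exposes REG_RIP and REG_RSP only with this */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <ucontext.h>
#include <unistd.h>

#define RECOVERED_EXIT_CODE 42  /* arbitrary value chosen for this sketch */

/* Alternate code path; it must never return. */
static void recovery_entry(void)
{
    static const char msg[] = "recovered: running the alternate code path\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)n;
    _exit(RECOVERED_EXIT_CODE);
}

/* Point the saved instruction pointer at recovery_entry(). */
static void redirect_to_recovery(int sig, siginfo_t *info, void *context)
{
    (void)sig; (void)info;
#if defined(__x86_64__)
    ucontext_t *uc = (ucontext_t *)context;
    greg_t sp = uc->uc_mcontext.gregs[REG_RSP];

    sp -= 128;                  /* stay clear of the interrupted frame's red zone */
    sp &= ~(greg_t)0xF;         /* 16-byte align ... */
    sp -= 8;                    /* ... then mimic the slot a call would push */
    uc->uc_mcontext.gregs[REG_RSP] = sp;
    uc->uc_mcontext.gregs[REG_RIP] = (greg_t)(uintptr_t)recovery_entry;
#else
    (void)context;
    _exit(1);                   /* other architectures are not covered here */
#endif
}

/* Fault in a child process; returns its exit status, or -1 on error. */
int run_with_recovery(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        int *volatile p = NULL;

        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = redirect_to_recovery;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        *p = 1;         /* the handler redirects execution to recovery_entry() */
        _exit(0);       /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The state the faulting code left behind (locks held, half-updated data) is not repaired by any of this, so the recovery path should assume as little as possible.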

You can see the full code example here

 
