Custom Kernel Debugging — Using GDB to debug dynamically loaded ELF files

Paulo Almeida
5 min read · Jun 21, 2021


Introduction

Whether you are a novice developer or an expert, debugging is the task that you will likely spend most of your time on.

Interestingly, and to my great surprise, developers pay very little attention to finding better ways to debug programs that behave unpredictably. It seems like debugging is an art lost and forgotten to time (unfortunately).

Yes, you might be wondering about the many advancements in the field and whether I have forgotten them. No, I have not. They are fantastic tools and methods, and there is nothing wrong with them, but I have a gut feeling that most developers do not know how to use them to reduce troubleshooting time... and that’s the scary part.

This blog post aims to document a recently fought battle to get gdb to allow me to properly debug my custom kernel the way I wanted.

Problem

When it comes to debugging C/C++ programs, GDB is still one of the only options that can do the job well. Not because it’s easy and friendly (it isn’t, at least initially), but because of all the existing integrations with a multitude of tools out there.

One of those integrations worth mentioning is the QEMU+GDB integration. It is very powerful but not very intuitive.

When debugging custom kernels, you can’t simply put a breakpoint in your code and expect GDB to ‘automagically’ have everything ready for you to correlate the C code with the assembly instructions generated.

This is mostly because, during the boot process, control is passed between different pieces of software, and by the time it reaches your ELF executable (the kernel), GDB has no idea what it is looking at. So you will generally get a screen like this:

[Screenshot: gdb layout src]

Sure, you can still get the assembly instructions and, trust me, this will come in handy a lot of times... but not this time.

After some time playing with assembly and C, you develop a sense of translation in your head that lets you predict what your C code will look like once the compiler is done with its job... but the harder the issue is, the more complex that translation becomes.

The bottom line: be proud of that ability, but don’t rely solely on it, otherwise you will take longer to fix what you need to fix, okay?

Solution

Things I will take for granted here to ensure the blog post is not too lengthy:

With that out of the way, these are the steps needed to put all the pieces together:

  1. Separate the debug symbols and debug-related ELF sections from the rest
  2. Find out at which memory address the C code will be loaded so we can set the breakpoint
  3. Initiate QEMU and GDB in debug mode
  4. Load the symbol file at the right time and tweak some of GDB’s ‘misunderstandings’

Separate the debug symbols and debug-related ELF sections from the rest

This assumes that you linked your assembly and C code with the right parameters, that is, with debug information enabled so that there is something to separate out.
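The exact build commands depend on your project, so the following is only a minimal sketch, assuming a GCC and GNU ld toolchain. The function name, the intermediate object file build/kmain.o and the arguments are hypothetical; the part that matters for this post is -g, which asks the compiler to emit debug information.

```shell
# Sketch only: compile one C file with debug info and link it into build/kernel.
# Hypothetical inputs: $1 = C source, $2 = linker script, remaining arguments =
# other object files (for example the assembled boot code).
build_kernel_with_debug_info() {
    src=$1
    linker_script=$2
    shift 2
    mkdir -p build
    # -g emits debug info; -ffreestanding because there is no hosted libc
    gcc -g -ffreestanding -c "$src" -o build/kmain.o
    # -T selects the linker script that decides where the code lands in memory
    ld -T "$linker_script" -o build/kernel "$@" build/kmain.o
}
```

With a hypothetical layout, it could be called as: build_kernel_with_debug_info src/kernel/kmain.c src/kernel/linker.ld build/head.o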

You will now have an ELF file called build/kernel. Let’s split the debug information out of it into a separate file, build/kernel.debug.
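One way to do the split is with objcopy from GNU binutils. This is a sketch rather than the exact commands used for this kernel; the function name is made up for illustration, while build/kernel and build/kernel.debug are the file names used in this post.

```shell
# Split an ELF file into a stripped binary and a separate debug file.
# $1 = ELF file (build/kernel in this post), $2 = output debug file.
split_debug_info() {
    # Keep only the debug sections and symbols in the debug file
    objcopy --only-keep-debug "$1" "$2"
    # Remove the debug sections from the original; .symtab is kept
    objcopy --strip-debug "$1"
}
```

Usage: split_debug_info build/kernel build/kernel.debug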

Find out at which memory address the C code will be loaded so we can set the breakpoint

In the ELF format, we have a symbol table called .symtab that can give us the address where a C function will be located in memory. In our case, we want to stop at the very first line of our custom kernel’s kmain function, located at memory address 0x201014.
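A quick way to read that address out of .symtab is nm from GNU binutils (readelf -s works too). The helper below is a sketch; its name is hypothetical, and it assumes nm’s default three-column output of address, type and name.

```shell
# Print the address of a symbol as listed in the ELF symbol table.
# $1 = ELF file, $2 = symbol name (kmain in this post).
symbol_address() {
    # nm prints lines such as "0000000000201014 T kmain"
    nm "$1" | awk -v sym="$2" '$3 == sym { print "0x" $1 }'
}
```

Usage: symbol_address build/kernel kmain. The value is printed with leading zeros, which GDB accepts as-is in a breakpoint expression.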

P.S.: Sure, there are a gazillion caveats when it comes to the absolute memory position of code, but they are outside the scope of this blog post. Otherwise, I would have to write a book instead, and you don’t want that, do you? :)

Initiate QEMU and GDB in debug mode

In two different terminal sessions, run these commands:
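As a sketch of what those two sessions could look like: the QEMU binary (an x86-64 guest is assumed here), the function names and the disk image path are assumptions, while port 8864 and ./gdb/debug_commands.txt come from this post.

```shell
GDB_PORT=8864                       # port used in this post
QEMU=${QEMU:-qemu-system-x86_64}    # assumption: an x86-64 guest

# Terminal 1: boot the image ($1) with the CPU halted and a gdbstub listening.
start_qemu_paused() {
    # -S    do not start the CPU until GDB tells it to continue
    # -gdb  wait for a GDB connection on the given TCP port
    "$QEMU" -S -gdb "tcp::${GDB_PORT}" -drive "format=raw,file=$1"
}

# Terminal 2: start GDB and run the commands stored in the command file.
start_gdb() {
    gdb -x ./gdb/debug_commands.txt
}
```

Run start_qemu_paused build/disk.img in the first terminal (the image path is hypothetical) and start_gdb in the second.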

Contents of ./gdb/debug_commands.txt:
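A minimal version of such a file might contain just two GDB commands. The snippet below writes it out from the shell; the choice of layout asm is an assumption, so pick whichever TUI layout you prefer.

```shell
# Write a minimal GDB command file (sketch; adjust to taste).
mkdir -p gdb
cat > gdb/debug_commands.txt <<'EOF'
# Connect to the gdbstub that QEMU opened on port 8864
target remote localhost:8864
# Show the disassembly window until the symbols are loaded
layout asm
EOF
```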

In short, we are telling QEMU and GDB to communicate through port 8864 and configuring GDB’s window layout.

Load the symbol file at the right time and tweak some of GDB’s ‘misunderstandings’

By now, you should have GDB ready to run and QEMU waiting for the green light. First, you have to tell GDB where to stop on a “C instruction”.

Assuming that you want to stop at the kmain function, execute the following in the GDB terminal:

(gdb) br *0x201014
Breakpoint 2 at 0x201014

(gdb) c
Continuing.

When GDB pauses again, the custom kernel has stopped at the desired location/instruction. That means it’s time to load the ELF debug file and tell GDB where the source code is.

Add symbol file and correct source path to the GDB session

(gdb) layout src
(gdb) set directories src/kernel/

(gdb) symbol-file build/kernel.debug
Reading symbols from build/kernel.debug...
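If you repeat this often, the manual steps above can be collected into a second GDB command file. This is a sketch: the file name gdb/kmain_session.txt is hypothetical, and the address must match whatever .symtab reports for your own build.

```shell
# Write the breakpoint and symbol-loading steps into a reusable GDB script.
mkdir -p gdb
cat > gdb/kmain_session.txt <<'EOF'
# Stop at the first instruction of kmain (address taken from .symtab)
br *0x201014
c
# The kernel is now in memory: switch to source view and load the symbols
layout src
set directories src/kernel/
symbol-file build/kernel.debug
EOF
```

Once GDB is connected to QEMU, run it with: (gdb) source gdb/kmain_session.txt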

Voilà! Now you can debug your custom kernel as close as possible to the way that you would debug any other application. Truth be told, when it comes to interrupts and NMIs, things will get hairier, but one problem at a time. Until next time ;)
