x86-64 Troubleshooting Tales: Why do only half of my interrupts work?

Paulo Almeida
4 min read · Jul 22, 2021

About the x86–64 Troubleshooting Tales series

x86–64 Troubleshooting Tales is a new side project of mine in which I describe problems I ran into while debugging AlmeidaOS (my pet project).

The idea is very simple:

I document the symptoms I observed and the internal monologues I had while pulling my hair out in a desperate attempt to figure things out.

You, on the other hand, learn a thing or two and can laugh at the situation (or at me).

I hope you enjoy it!

Disclaimer: This is not meant to be tutorial-level material so I will take a few things for granted and assume that you know what I’m talking about.

On the bright side, if you are a curious individual then you will know straight away what you need to learn more about after reading this blog post. Either way you win 😉

Introduction

After a lot of groundwork code written, I decided to implement the timer and keyboard interrupts in my OS. In order to do that, I had to configure the Programmable Interrupt Controller (PIC) and Programmable Interval Timer (PIT) chips.

Programmable Interrupt Controller (PIC)

The 8259 PIC controls the CPU’s interrupt mechanism, by accepting several interrupt requests and feeding them to the processor in order.

The OSDev website has a very interesting page about this chip with lots of historical facts. I highly recommend taking a look at https://wiki.osdev.org/8259_PIC

On an IBM PC, this is roughly what it looks like:

Conceptual view of how the 2 PIC chips fit into the picture

For the purposes of this series, I just want you to know where the Keyboard and Timer IRQ (Interrupt ReQuest) come into the picture:

List of IRQ assignment to PIC

It turns out that the x86–64 architecture defines 256 interrupt vectors (0–255). However, vectors 0–31 are reserved for CPU exceptions as shown below, so I had to configure the PIC to offset all IRQs by 32. That way IRQ 0 is delivered as vector 32, IRQ 1 as vector 33, and so on.

So this whole thing put together looks like this

Without further ado, here is the code to initialise both PIC chips
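For reference, here is a minimal sketch of the usual 8259 remapping sequence. It is not the exact AlmeidaOS code: `outb` is a stand-in that only records port writes so the sequence can be run as a normal program, and `pic_remap`, `irq_to_vector` and `port_log` are names made up for illustration. The port numbers and initialisation words are the standard ones for a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 8259 PIC I/O ports on a PC. */
#define PIC1_COMMAND 0x20
#define PIC1_DATA    0x21
#define PIC2_COMMAND 0xA0
#define PIC2_DATA    0xA1

/* Hypothetical stand-in for a real `out` instruction wrapper: it records
 * each write so the sequence can be inspected from user space. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write port_log[16];
static size_t port_log_len = 0;

static void outb(uint16_t port, uint8_t value)
{
    assert(port_log_len < sizeof(port_log) / sizeof(port_log[0]));
    port_log[port_log_len].port = port;
    port_log[port_log_len].value = value;
    port_log_len++;
}

/* IRQ n is delivered as vector 32 + n once the PICs are remapped. */
uint8_t irq_to_vector(uint8_t irq)
{
    return (uint8_t)(32 + irq);
}

/* Remap the master PIC to master_offset and the slave to slave_offset.
 * Real hardware may need a short I/O delay between writes. */
void pic_remap(uint8_t master_offset, uint8_t slave_offset)
{
    outb(PIC1_COMMAND, 0x11);       /* ICW1: start init, ICW4 will follow */
    outb(PIC2_COMMAND, 0x11);
    outb(PIC1_DATA, master_offset); /* ICW2: vector offset of the master  */
    outb(PIC2_DATA, slave_offset);  /* ICW2: vector offset of the slave   */
    outb(PIC1_DATA, 0x04);          /* ICW3: slave is wired to IRQ 2      */
    outb(PIC2_DATA, 0x02);          /* ICW3: slave cascade identity       */
    outb(PIC1_DATA, 0x01);          /* ICW4: 8086 mode                    */
    outb(PIC2_DATA, 0x01);
    outb(PIC1_DATA, 0xFC);          /* unmask only IRQ 0 (timer), IRQ 1 (keyboard) */
    outb(PIC2_DATA, 0xFF);          /* mask everything on the slave       */
}
```

Calling `pic_remap(32, 40)` places IRQs 0–7 at vectors 32–39 and IRQs 8–15 at vectors 40–47. In a real kernel, `outb` would be a thin wrapper around the `out` instruction instead of a logger.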

Programmable Interval Timer (PIT)

The Programmable Interval Timer (PIT) chip basically consists of an oscillator, a prescaler and 3 independent frequency dividers.

This chip has multiple modes and can be used in a multitude of scenarios, but most people end up using it to implement sleep routines. Once again, OSDev has a great article on it: https://wiki.osdev.org/Programmable_Interval_Timer

This is the code used to initialise the PIT.
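As a rough illustration of what programming the PIT involves (a sketch, not the exact AlmeidaOS code), the snippet below computes the divisor for a desired tick rate and the three bytes that would be written to the chip. The names `pit_divisor`, `pit_program` and `pit_program_for` are assumptions made for this example; the base frequency, port numbers and command byte are the standard PC values.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_BASE_HZ   1193182u /* PIT input clock, roughly 1.193182 MHz */
#define PIT_CHANNEL0  0x40     /* data port for channel 0 (wired to IRQ 0) */
#define PIT_COMMAND   0x43     /* mode/command register */

/* Channel 0, access mode lobyte/hibyte, mode 3 (square wave), binary. */
#define PIT_CMD_CH0_LOHI_MODE3 0x36

/* Bytes to send: `command` goes to PIT_COMMAND, then `low` and `high`
 * go to PIT_CHANNEL0, in that order. */
struct pit_program {
    uint8_t command;
    uint8_t low;
    uint8_t high;
};

/* Divisor that makes channel 0 fire roughly `hz` times per second. */
uint16_t pit_divisor(uint32_t hz)
{
    /* Below 19 Hz the divisor no longer fits in 16 bits. */
    assert(hz >= 19 && hz <= PIT_BASE_HZ);
    return (uint16_t)(PIT_BASE_HZ / hz);
}

struct pit_program pit_program_for(uint32_t hz)
{
    uint16_t divisor = pit_divisor(hz);
    struct pit_program p;
    p.command = PIT_CMD_CH0_LOHI_MODE3;
    p.low  = (uint8_t)(divisor & 0xFF);
    p.high = (uint8_t)(divisor >> 8);
    return p;
}
```

For a 100 Hz tick, the divisor is 1193182 / 100 = 11931 (0x2E9B), so the kernel would write 0x36 to port 0x43 followed by 0x9B and 0x2E to port 0x40.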

Putting it all together

At that point, I thought everything was good to go, so I built my project and ran QEMU. This is what my kernel entry looked like:

To my great surprise, this is what I was presented with:

The General-Protection exception is defined by Intel as vector 13 in the Interrupt Descriptor Table (IDT). The error dump you can see above comes from a rudimentary CPU dump routine I had written to help me debug on real hardware.

This is what my interrupt handler routine looked like:
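To give an idea of the shape of such a routine, here is a simplified sketch of a common handler that branches on the vector number. It is an assumption about the general structure, not the actual `interrupt_handler` from AlmeidaOS: `interrupt_dispatch` and `interrupt_stats` are hypothetical names, and the counters stand in for the real work (dumping CPU state, handling a tick, reading a scancode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC_VECTOR_BASE 32                    /* first vector used by the PIC */
#define VECTOR_TIMER    (PIC_VECTOR_BASE + 0) /* IRQ 0 */
#define VECTOR_KEYBOARD (PIC_VECTOR_BASE + 1) /* IRQ 1 */

/* Counters used here in place of real handling code. */
struct interrupt_stats {
    unsigned exceptions;
    unsigned timer_ticks;
    unsigned key_events;
    unsigned other;
};

void interrupt_dispatch(struct interrupt_stats *stats, uint64_t vector)
{
    assert(stats != NULL);

    if (vector < PIC_VECTOR_BASE) {
        /* CPU exception (e.g. 13 = #GP): a kernel would dump state here. */
        stats->exceptions++;
    } else if (vector == VECTOR_TIMER) {
        /* A kernel would also send an End Of Interrupt to the PIC here. */
        stats->timer_ticks++;
    } else if (vector == VECTOR_KEYBOARD) {
        stats->key_events++;
    } else {
        stats->other++;
    }
}
```

With a layout like this, seeing only the exception branch fire is a hint that the problem sits before the handler, in how the CPU locates the entries for vectors 32 and above.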

The fact that I could see the error on the console for reserved vectors (like #GP shown above) is proof that my interrupt_handler function was being called by the CPU correctly. On the other hand, this didn’t seem to be the case for PIC IRQs 🤯.

I cannot overstate how puzzling that was. Then again, I had configured all my IDT entries in the precise same way as shown below
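For context, a 64-bit IDT gate descriptor can be sketched as follows. The field layout follows the Intel manual for 64-bit mode, where each gate is 16 bytes; the names `idt64_entry` and `idt64_make_gate` are assumptions for this example and the real AlmeidaOS definitions may differ. The `packed` attribute is GCC/Clang syntax.

```c
#include <assert.h>
#include <stdint.h>

/* One interrupt/trap gate in 64-bit mode. */
struct idt64_entry {
    uint16_t offset_low;  /* handler address bits 0..15            */
    uint16_t selector;    /* code segment selector                 */
    uint8_t  ist;         /* bits 0..2: Interrupt Stack Table index */
    uint8_t  type_attr;   /* gate type, DPL and present bit        */
    uint16_t offset_mid;  /* handler address bits 16..31           */
    uint32_t offset_high; /* handler address bits 32..63           */
    uint32_t reserved;    /* must be zero                          */
} __attribute__((packed));

_Static_assert(sizeof(struct idt64_entry) == 16,
               "a 64-bit IDT gate must be exactly 16 bytes");

/* Build a present, ring-0, 64-bit interrupt gate for `handler`. */
struct idt64_entry idt64_make_gate(uint64_t handler, uint16_t selector)
{
    struct idt64_entry e;
    e.offset_low  = (uint16_t)(handler & 0xFFFF);
    e.selector    = selector;
    e.ist         = 0;    /* no dedicated stack from the IST */
    e.type_attr   = 0x8E; /* P=1, DPL=0, type=0xE (interrupt gate) */
    e.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    e.offset_high = (uint32_t)(handler >> 32);
    e.reserved    = 0;
    return e;
}
```

Every vector, whether a CPU exception or a remapped IRQ, gets an entry built the same way; only the handler address changes.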

Solution

After spending almost 2 nights agonizing over the problem and speculating about what I could have missed in how the IDT works, I finally managed to find the culprit.

Disclaimer: It’s dumb as f***, like any bug once you have found the solution =)

It turns out that when I configured the IDT pointer, I had done the following:

Basically, the limit field expects the size of your IDT in bytes, minus one. Don’t ask me why (I’m ashamed of it too), but I interpreted it as the number of entries in the IDT.

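So here is the thing: according to the Intel manual, each IDT entry occupies 16 bytes in 64-bit mode (8 bytes is the size in 32-bit protected mode). The value I specified was ARR_SIZE(idt64_table) - 1, which equals 255, so the CPU considered only the first 256 bytes of the table valid. If we do the math, 256 (bytes) / 16 (entry size) = 16 entries, which explains why the CPU could find the first exception vectors (0–15, including #GP at vector 13) but nothing from 32 onwards. A vector that falls beyond the IDT limit raises a general-protection exception, which matches the #GP I kept seeing. 🤦‍♂️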

In the end, this is how it should’ve been written to save me hours of debugging the issue.
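The difference between the two limit values can be checked with a small sketch like the one below. It assumes a 256-entry table of 16-byte gates (the 64-bit mode descriptor size); `ARR_SIZE` and `idt64_table` are the names used in this post, while `idt_pointer`, `idt_limit_buggy`, `idt_limit_fixed`, `idt_vectors_covered` and `idt_pointer_make` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ARR_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define IDT_ENTRIES 256

/* Opaque 16-byte gate descriptor, as used in 64-bit mode. */
struct idt64_entry { uint8_t raw[16]; };

/* Operand of the `lidt` instruction in 64-bit mode (10 bytes). */
struct idt_pointer {
    uint16_t limit; /* size of the table in bytes, minus one */
    uint64_t base;  /* linear address of the table           */
} __attribute__((packed));

static struct idt64_entry idt64_table[IDT_ENTRIES];

/* The mistake: number of entries minus one. */
uint16_t idt_limit_buggy(void)
{
    return (uint16_t)(ARR_SIZE(idt64_table) - 1);
}

/* The fix: number of bytes minus one. */
uint16_t idt_limit_fixed(void)
{
    return (uint16_t)(sizeof(idt64_table) - 1);
}

/* How many complete entries the CPU can reach for a given limit. */
unsigned idt_vectors_covered(uint16_t limit)
{
    return (unsigned)((limit + 1u) / sizeof(struct idt64_entry));
}

struct idt_pointer idt_pointer_make(void)
{
    struct idt_pointer p;
    p.limit = idt_limit_fixed();
    p.base  = (uint64_t)(uintptr_t)idt64_table;
    return p;
}
```

Under these assumptions the entry-count limit is 255 and reaches only 16 vectors, while the byte-count limit is 4095 and reaches all 256.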

That’s it, everyone. I hope you enjoyed this troubleshooting tale. Share this story with anyone you think may like this type of content.
