x86–64 4Kbyte Page translation design — is this the reason why pages are 4096 bytes aligned?

Paulo Almeida
May 13, 2021


Context

I lost 3 weeks of my life trying to wrap my head around the Intel x86–64/AMD64 manuals while implementing 4-Kbyte page translation for my hobby OS. The purpose of this short blog post is to document some of the things the manuals don’t make clear at all, especially for newcomers.

4-Kbyte Page Translation

You have probably seen the 4-Kbyte page translation picture from the manuals a billion times, and if you are reading this post you know that you need to create these page tables in memory before moving to 64-bit mode. They are the well-defined interface between the OS and the hardware for mapping memory in small chunks.

A few relevant things about those tables:

  • one entry in a PML4 can address 512GB
  • one entry in a PDP can address 1GB
  • one entry in a PD can address 2MB
  • one entry in a PT can address 4KB
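These sizes follow from how a virtual address is split: each of the four levels is indexed with 9 bits (512 entries per table), and the last 12 bits are the offset inside a 4-Kbyte page. A minimal Python sketch of that arithmetic (the names `bytes_per_entry` and `COVERAGE` are mine, purely for illustration):

```python
# Each paging level is indexed with 9 bits (512 entries per table);
# the low 12 bits of a virtual address are the offset inside a 4-Kbyte page.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9

def bytes_per_entry(level):
    """Bytes covered by one entry; level 0 = PT, 1 = PD, 2 = PDP, 3 = PML4."""
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * level)

# Hypothetical helper table, keyed by the table names used in this post.
COVERAGE = {
    name: bytes_per_entry(level)
    for level, name in enumerate(["PT", "PD", "PDP", "PML4"])
}

for name, size in COVERAGE.items():
    print(f"one {name} entry covers {size:#x} bytes")
```

Each step up multiplies the coverage by 512, which is where 4KB, 2MB, 1GB and 512GB come from.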

The not-so-easy to grasp hardware architecture design

Imagine that I want to create identity-mapped pages for the first 10MB of memory during the early stages of the boot process. This is useful for loading the kernel into memory before moving to long mode (64-bit).

In order to achieve that using 4-Kbyte pages I will need:

  • 1 entry in the PML4 table
  • 1 entry in the PDP table
  • 5 entries in the PD table
  • 2,560 PT entries (5 page tables of 512 entries each)
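Those counts can be double-checked with a few lines of arithmetic. This is only a sketch: `entries_needed` is a hypothetical helper, and it assumes the mapped region starts at address 0 so that no range straddles an extra table.

```python
PAGE_SIZE = 4096          # bytes mapped by one PT entry
ENTRIES_PER_TABLE = 512   # entries in every PML4/PDP/PD/PT table

def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)

def entries_needed(size_bytes):
    """Entries required at each level to map size_bytes starting at address 0."""
    pt = ceil_div(size_bytes, PAGE_SIZE)       # one PT entry per 4-Kbyte page
    pd = ceil_div(pt, ENTRIES_PER_TABLE)       # one PD entry per page table
    pdp = ceil_div(pd, ENTRIES_PER_TABLE)      # one PDP entry per page directory
    pml4 = ceil_div(pdp, ENTRIES_PER_TABLE)    # one PML4 entry per PDP table
    return {"PML4": pml4, "PDP": pdp, "PD": pd, "PT": pt}

print(entries_needed(10 * 1024 * 1024))
# {'PML4': 1, 'PDP': 1, 'PD': 5, 'PT': 2560}
```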

This is a piece of code that does exactly that:

There are two lines in particular that I want to point out:

.StdBits   equ 0x03
mov DWORD [PDE.Addr], (PTE.Addr) | .StdBits

If you substitute the actual values, it looks like this:

mov DWORD [0x00012000], 0x00013003 ; (0x00013000 bitwise OR 0x3)

The reason we do that is the expected format of a 4-Kbyte PDE entry.

So essentially 0x03 can be understood as:

0x3 -> 00000011 -> Present bit and Read/Write Bit set
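To make the two bits concrete, here is a small Python sketch that builds the same value as the assembly above and decodes its flags. The constant names `PTE_PRESENT` and `PTE_WRITABLE` and the helper `describe_flags` are my own labels, not names from the manuals; the addresses are the ones used in this post.

```python
PTE_PRESENT = 1 << 0    # bit 0: Present (P)
PTE_WRITABLE = 1 << 1   # bit 1: Read/Write (R/W)

PT_ADDR = 0x13000                       # page table address used in this post
STD_BITS = PTE_PRESENT | PTE_WRITABLE   # same value as .StdBits (0x03)

# Same operation as: mov DWORD [PDE.Addr], (PTE.Addr) | .StdBits
pde = PT_ADDR | STD_BITS

def describe_flags(entry):
    """Decode only the two flag bits discussed here (hypothetical helper)."""
    return {
        "present": bool(entry & PTE_PRESENT),
        "writable": bool(entry & PTE_WRITABLE),
    }

print(f"{pde:#010x}")       # 0x00013003
print(describe_flags(pde))  # {'present': True, 'writable': True}
```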

But if you look closely, you will find something very odd. We are led to believe that the Page-Table Base Address (0x13000) should be placed in bits 51:12, as per the picture above.

If that assumption is correct, then 0x13000 should be shifted left by 12 bits and become 0x13000000, so after setting the page flags the entry would effectively be 0x13000003. Which means that the code snippet I shared above is wrong, right?

Think like that if you want to spend 3 weeks trying to figure out why your OS doesn’t boot while questioning every career decision you’ve made in the meantime 😰

The train of thought isn’t wrong per se, but the x86–64 design works a bit differently from what we speculated.

Page data-structure tables are always aligned on 4-Kbyte boundaries, so only the address bits above bit 11 are stored in the translation-table base-address field. Bits 11:0 are assumed to be 0. (Taken from the AMD64 manual, page 150)

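What this means in practice is that if our page table is at address 0x13000 and the flag bits are 0x03, the entry is simply 0x13003. Because the table is 4-Kbyte aligned, the low 12 bits of its address are always zero, so the hardware ‘re-utilises’ those 12 bits to hold the flags and masks them off to recover the address. The address is stored in place, not shifted. The hardware also knows that adding 0x1000 to the current page address gets it to the next page. It’s hard not to speculate whether that alone is the whole reason why page tables are 4096-byte aligned (or 0x1000 😜)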
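The difference between the two readings is easy to see in a short Python sketch. It is a simplification that only looks at the address field (bits 51:12) and the low 12 bits; the names `ADDR_MASK`, `make_entry` and `table_address` are hypothetical and only serve this example.

```python
ADDR_MASK = 0x000F_FFFF_FFFF_F000   # bits 51:12, the base-address field
LOW_12_BITS = 0xFFF                 # bits 11:0, free for flags because of alignment

def make_entry(table_addr, flags):
    """OR the flags into a 4-Kbyte aligned address; no shifting involved."""
    if table_addr & LOW_12_BITS:
        raise ValueError("table address must be 4-Kbyte aligned")
    return table_addr | flags

def table_address(entry):
    """What the address field says: clear everything outside bits 51:12."""
    return entry & ADDR_MASK

correct = make_entry(0x13000, 0x03)   # 0x13003, what the assembly writes
wrong = (0x13000 << 12) | 0x03        # 0x13000003, the 'shifted' reading

print(f"{table_address(correct):#x}")  # 0x13000
print(f"{table_address(wrong):#x}")    # 0x13000000
```

With the shifted value, the entry points at 0x13000000 instead of 0x13000, where no page table was ever written.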

The problem is that although this is written in the manual, no emphasis is given to this single sentence, buried in the middle of a manual with thousands of pages 😫

Well, it took me a long time to get to this, and I’m sure that I will forget my name one day but won’t forget this battle with the x86–64 paging implementation.

I hope that if you are struggling with the same problem, this post reduces the time it takes you to figure it out. ❤️
