Hakutaku: SMP on x86-64
This is one post in my blog series about developing an Operating System for the
x86-64 platform in Rust.
I aim to cover a small topic in each post, along with some code samples to get you started if you are
also interested in doing such a project. The complete project is on GitHub.
Multi-core often sounds like a scary topic for most hobbyist OS developers. Today we will tackle this problem together in our Rust OS, Hakutaku.
X86-64 SMP Start-Up General Process
On x86-64, the system firmware (BIOS) selects one processor as the Bootstrap Processor,
or BSP, and the rest become Application Processors, or APs. All the
APs are put into a “waiting” state until an INIT/SIPI (Start-up IPI) sequence is
received. Without any special operations, all of your code will run on the
BSP. In order to enable all the other processors, we need to do a few things:
- Locate and identify all the other processors on the system.
- Send INIT and SIPI Inter-Processor Interrupts (IPIs) to each enabled processor.
- Set up a temporary GDT / IDT for the APs to bootstrap into
- Add the APs into the scheduler
APIC / LAPIC
APIC, or the Advanced Programmable Interrupt Controller, is the mechanism Intel introduced as part of the Intel Multi-Processor (MP) specification, and it is what modern processors use to manage interrupts.
The APIC system consists of two main components, the Local APIC (LAPIC) and the IO APIC. Each logical processor on the system has its own LAPIC, while the IO APIC is shared by all processors (most systems have a single IO APIC, though some have more than one). On boot, the IOAPIC should be configured to pass/emulate the Legacy (8259) PIC’s interrupts. For now we will only be configuring the LAPIC. The IOAPIC is relatively complicated, so we will leave it for a later post.
The Local APIC
The Local APIC is relatively simple. For our purposes it consists of three sets of registers: the Interrupt Command Register (ICR), the Local APIC Timer, and the Local Vector Table (LVT). Information on the register layout can be found in the Intel Manual (Vol. 3, Section 10.4.1, Table 10-1).
Interrupt Command Register (ICR)
This is probably the most important register when it comes to waking up the other cores. (Intel's manual calls it the ICR; it should not be confused with the 8259 PIC's initialization command words, or ICWs.) This is the register you write to in order to send IPIs. The ICR controls what kind of interrupt is sent, which vector it is delivered to, and which processor receives it.
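To make the layout of the interrupt command register more concrete, here is a small sketch of how its two 32-bit halves could be encoded for the INIT and Start-up IPIs. The function and type names are my own for illustration and are not taken from Hakutaku. It assumes the xAPIC (memory-mapped) layout, physical destination mode, and the level bit set to "assert".

```rust
/// Delivery modes used during AP start-up (bits 8..=10 of the low dword).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DeliveryMode {
    Init = 0b101,
    StartUp = 0b110,
}

/// Bit 14 of the low dword: level, set to 1 (assert).
const LEVEL_ASSERT: u32 = 1 << 14;

/// Low dword: vector in bits 0..=7, delivery mode in bits 8..=10.
pub fn icr_low(mode: DeliveryMode, vector: u8) -> u32 {
    LEVEL_ASSERT | ((mode as u32) << 8) | vector as u32
}

/// High dword: destination APIC ID in bits 24..=31 (xAPIC layout).
pub fn icr_high(apic_id: u8) -> u32 {
    (apic_id as u32) << 24
}
```

In xAPIC mode the high dword lives at offset 0x310 and the low dword at offset 0x300 from the LAPIC base. The IPI is sent when the low dword is written, so the high dword (the destination) has to be written first.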
Local APIC Timer
The Local APIC Timer, as the name suggests, is local to each logical processor and can operate in one of three modes: One-Shot, Periodic and TSC-Deadline. Since this timer will generate the periodic interrupt for our scheduler on each core, we will configure it in periodic mode.
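As a rough illustration, the timer's LVT entry packs the interrupt vector, a mask bit and the timer mode into one 32-bit value. The sketch below only computes that value; the names are hypothetical, and the vector number 0x20 in the usage note is just an example, not necessarily what Hakutaku uses.

```rust
/// Timer modes, stored in bits 17..=18 of the LVT timer register.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TimerMode {
    OneShot = 0b00,
    Periodic = 0b01,
    TscDeadline = 0b10,
}

/// Build the LVT timer register value:
/// vector in bits 0..=7, mask in bit 16, mode in bits 17..=18.
pub fn lvt_timer(vector: u8, mode: TimerMode, masked: bool) -> u32 {
    ((mode as u32) << 17) | ((masked as u32) << 16) | vector as u32
}
```

For example, `lvt_timer(0x20, TimerMode::Periodic, false)` describes an unmasked periodic timer that raises vector 0x20. The value goes into the LVT timer register at offset 0x320; the divide configuration (offset 0x3E0) and initial count (offset 0x380) registers then decide how often it fires.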
Local Vector Table
Unlike the Interrupt Vector Table, which determines what action to take on each interrupt vector, the LVT controls which interrupt vector is raised when a given local event fires. In our case we will need to configure the Spurious Interrupt vector and the Local Timer interrupt vector.
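The spurious interrupt vector is set through its own register (offset 0xF0 from the LAPIC base), which also carries the software-enable bit for the whole LAPIC. A minimal sketch of the value to write, with a function name of my own choosing:

```rust
/// Bit 8 of the spurious interrupt vector register: APIC software enable.
const APIC_SOFTWARE_ENABLE: u32 = 1 << 8;

/// Value for the spurious interrupt vector register:
/// the spurious vector in bits 0..=7, plus the enable bit.
pub fn spurious_vector_register(vector: u8) -> u32 {
    APIC_SOFTWARE_ENABLE | vector as u32
}
```

A common choice is vector 0xFF, which gives the value 0x1FF.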
Advanced Configuration and Power Interface (ACPI)
I am sure many of you have heard or seen this term before. ACPI exposes a set of tables that describe the configuration of the hardware our OS is currently running on. They include a lot of information about IO mapping, IRQ routing, processor topology and much more.
Although some of the tables in ACPI are quite hard to parse, the
MADT table we need right now is rather easy.
Multiple APIC Description Table (MADT)
MADT is a simple table that contains entries describing all the LAPICs and, by association, their logical processors. Each entry describes the state that the processor is in as well as its processor ID and LAPIC ID.
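To give an idea of how little parsing is involved, here is a sketch that walks the MADT entries in a byte slice and collects the Processor Local APIC entries (entry type 0). It runs on a plain `&[u8]` so it can be tried on the host; in a kernel you would point it at the mapped table instead. The struct and function names are hypothetical, and the sketch assumes the slice holds the whole table, including the 36-byte common ACPI header.

```rust
/// One Processor Local APIC entry (MADT entry type 0).
#[derive(Debug, PartialEq)]
pub struct LocalApic {
    pub processor_id: u8,
    pub apic_id: u8,
    pub enabled: bool,
}

/// Common ACPI table header length.
const SDT_HEADER_LEN: usize = 36;
/// Header + 4-byte LAPIC address + 4-byte flags; entries start here.
const MADT_ENTRIES_OFFSET: usize = SDT_HEADER_LEN + 8;

pub fn parse_madt(table: &[u8]) -> Vec<LocalApic> {
    let mut cpus = Vec::new();
    let mut offset = MADT_ENTRIES_OFFSET;
    // Every entry starts with a type byte followed by a length byte.
    while offset + 2 <= table.len() {
        let kind = table[offset];
        let len = table[offset + 1] as usize;
        if len < 2 || offset + len > table.len() {
            break; // malformed entry: stop instead of looping forever
        }
        if kind == 0 && len >= 8 {
            let flags = u32::from_le_bytes([
                table[offset + 4],
                table[offset + 5],
                table[offset + 6],
                table[offset + 7],
            ]);
            cpus.push(LocalApic {
                processor_id: table[offset + 2],
                apic_id: table[offset + 3],
                enabled: flags & 1 != 0, // bit 0: processor enabled
            });
        }
        offset += len; // skip entry types we do not care about
    }
    cpus
}
```

Entries of other types (IO APIC, interrupt source overrides and so on) are simply skipped by their length field, which is what makes this table pleasant to work with.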
To add multi-core capability to Hakutaku, we first need to locate all the other logical processors. There are multiple ways of doing this; at the time of writing, the most reliable and widely used method is to locate and parse the ACPI tables.
There are two main ways to locate the ACPI tables, depending on whether you are booting
from BIOS or EFI. In traditional BIOS boot mode, the ACPI
table pointer is placed in a certain region of system memory (the BIOS / BIOS
shadow area). You can simply search for the 8-byte string
“RSD PTR ” (note the trailing space; it is not null-terminated).
When you find the pointer, you can check its signature and its version
number to determine whether you are looking at an ACPI 1.0 structure (version number 0) or a later
one. From the RSD PTR you should be able to locate the remaining parts of the table
and find the previously mentioned
MADT section to list all the available
processors. Before you proceed further in waking up those processors, you should
always check that they are in Waiting-for-SIPI mode (i.e. not disabled).
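The search itself can be sketched as below. This is a host-testable version that scans a byte slice; in the kernel, the slice would be the mapped BIOS area. The function names are hypothetical. It relies on two facts about the ACPI 1.0 part of the structure: it always starts on a 16-byte boundary, and its first 20 bytes must sum to zero (mod 256).

```rust
/// The 8-byte signature, including the trailing space.
const SIGNATURE: &[u8; 8] = b"RSD PTR ";
/// Length of the ACPI 1.0 part of the structure, covered by the checksum.
const RSDP_V1_LEN: usize = 20;

/// Return the offset of the first valid RSD PTR structure in `region`.
pub fn find_rsdp(region: &[u8]) -> Option<usize> {
    let mut offset = 0;
    while offset + RSDP_V1_LEN <= region.len() {
        let candidate = &region[offset..offset + RSDP_V1_LEN];
        // All 20 bytes, checksum byte included, must add up to 0 mod 256.
        let sum = candidate.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
        if candidate.starts_with(SIGNATURE) && sum == 0 {
            return Some(offset);
        }
        offset += 16; // the structure is 16-byte aligned
    }
    None
}

/// The revision byte sits at offset 15: 0 means ACPI 1.0, 2 or more means later.
pub fn rsdp_revision(region: &[u8], rsdp_offset: usize) -> u8 {
    region[rsdp_offset + 15]
}
```

With revision 0 you follow the 32-bit RSDT address stored at offset 16; later revisions add a 64-bit XSDT address in the extended part of the structure.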
Now comes the interesting part: actually waking up the sleeping cores. To do this, I recommend starting with only one AP. This will help you detect problems in the process.
NOTE: When you wake up an application processor, it will boot into 16-bit real mode
and have access to only the first 1 MiB of physical memory. So you should plan
for that and place some code there for the APs to run initially. Here I just have a
halt loop and an output to the POST code hex display (often on I/O port 0x80).
    ; Intel syntax, NASM
    align 4096
    bits 16
    ap_boot:
        mov al, 0x45
        out 0x80, al        ; write a marker byte to the POST code display
    .loop:
        hlt
        jmp ap_boot.loop
Note that the entry point has to be aligned to 4K and located in the first MiB of memory.
According to the Intel manual, the protocol for waking up APs consists of the following:
- Send an INIT.
- Wait for 10ms.
- Send the first SIPI.
- Wait for 200 µs.
- If the processor is still not running, send a second SIPI.
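The steps above can be sketched in Rust roughly as follows. To keep the sketch independent of any particular LAPIC driver, the hardware access is hidden behind a hypothetical `ApStarter` trait: `send_ipi`, `delay_us` and `is_running` are names I made up for illustration, and how `is_running` is implemented (for example, a flag the AP sets in shared memory) is left open. The two constants assume the level bit is set to "assert".

```rust
/// Hypothetical hardware abstraction; none of these names come from Hakutaku.
pub trait ApStarter {
    /// Send an IPI to `apic_id` with the given low dword of the command register.
    fn send_ipi(&mut self, apic_id: u8, icr_low: u32);
    /// Busy-wait for roughly `micros` microseconds.
    fn delay_us(&mut self, micros: u32);
    /// Has the AP signalled that it is executing our code?
    fn is_running(&mut self, apic_id: u8) -> bool;
}

const IPI_INIT: u32 = 0x0000_4500; // delivery mode INIT, level assert
const IPI_STARTUP: u32 = 0x0000_4600; // delivery mode Start-up, level assert

/// Run the INIT / SIPI / SIPI sequence; returns true if the AP came up.
pub fn start_ap<S: ApStarter>(hw: &mut S, apic_id: u8, sipi_vector: u8) -> bool {
    hw.send_ipi(apic_id, IPI_INIT);
    hw.delay_us(10_000); // 10 ms

    hw.send_ipi(apic_id, IPI_STARTUP | sipi_vector as u32);
    hw.delay_us(200);

    // Only send the second SIPI if the first one did not do the job.
    if !hw.is_running(apic_id) {
        hw.send_ipi(apic_id, IPI_STARTUP | sipi_vector as u32);
        hw.delay_us(200);
    }
    hw.is_running(apic_id)
}
```

Putting the hardware behind a trait also means the sequence can be exercised on the host with a mock that just records the calls, which is handy while you are still bringing up the first AP.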
The SIPI Interrupt
The SIPI interrupt is a very special one: it kicks the processor out of its default “idling” state. Since we need to tell the processor where to begin execution, the vector field for this interrupt is special.
Instead of specifying an interrupt vector, the value in the vector field selects a 4K frame, counted from physical address 0, at which to begin execution. A vector of 0x08, for example, starts the AP at physical address 0x8000. This is why the code needs to be aligned on a 4K boundary below 1 MiB.
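In code, the relationship between the trampoline address and the SIPI vector is just a shift by 12 bits. A small sketch, with helper names of my own, that also rejects addresses the SIPI cannot express:

```rust
/// Convert the physical address of the AP entry point into a SIPI vector.
/// Returns None if the address is not 4K-aligned or not below 1 MiB.
pub fn sipi_vector(entry: u32) -> Option<u8> {
    if entry % 4096 != 0 || entry >= 0x10_0000 {
        return None;
    }
    Some((entry >> 12) as u8)
}

/// The physical address at which an AP starts for a given SIPI vector.
pub fn sipi_entry(vector: u8) -> u32 {
    (vector as u32) << 12
}
```

So if `ap_boot` ends up at physical address 0x8000, the vector to send is 0x08, and the AP begins executing with CS:IP = 0x0800:0x0000.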
Getting to Long Mode
Now that the cores are all awake and executing code, we can think about how to get them into long mode. If you happened to write your own bootloader, this would simply mean repeating the same steps, but without having to load the binary this time (it is already in memory). However, in my case the kernel is booted into Protected Mode by a multiboot-compatible bootloader, so there are quite a few steps to go.
Setup Protected Mode GDT
Although there is nothing preventing you from directly entering long mode from real mode, I decided to go through protected mode as a launch pad.
According to the Intel Manual, to enter protected mode, there are two main steps:
- Set the protected mode bit in CR0 (bit 0).
- Set up at least some sort of segmentation system (we will not be enabling paging in protected mode).
For the second step, the easiest way is to hard-code a GDT with two entries: one for CS (code segment) and one for DS (data segment). Then all we need to do is load this GDT and point ss, ds and cs at the correct selectors, using mov for the data segment registers and a far jump for cs.
    core_wakeup:
        cli                         ; Disable interrupts, we want to be left alone
        xor ax, ax
        mov ds, ax                  ; Set DS-register to 0 - used by lgdt
        lgdt [gdt_desc]             ; Load the GDT descriptor
        mov eax, cr0                ; Copy the contents of CR0 into EAX
        or eax, 1                   ; Set bit 0
        mov cr0, eax                ; Copy the contents of EAX into CR0
        jmp 08h:smp_protected_entry

    gdt:                            ; Address for the GDT
    gdt_null:                       ; Null Segment
        dd 0
        dd 0
    gdt_code:                       ; Code segment, read/execute, nonconforming
        dw 0xFFFF
        dw 0
        db 0
        db 10011010b
        db 11001111b
        db 0
    gdt_data:                       ; Data segment, read/write, expand up
        dw 0xFFFF
        dw 0
        db 0
        db 10010010b
        db 11001111b
        db 0
    gdt_end:                        ; Used to calculate the size of the GDT
    gdt_desc:                       ; The GDT descriptor
        dw gdt_end - gdt - 1        ; Limit (size)
        dd gdt                      ; Address of the GDT

    bits 32
    section .smp.protected
    smp_protected_entry:
        mov ax, 0x10
        mov ds, ax
        mov ss, ax
        jmp _ap_start
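If the raw `dw` / `db` values in the GDT above look opaque, the following sketch shows how a legacy 8-byte segment descriptor is assembled from its base, limit, access byte and flags. The function is hypothetical and exists only to explain the numbers; the AP trampoline itself still needs the table laid out in assembly as shown.

```rust
/// Pack a legacy segment descriptor into its 8-byte form.
/// `limit` is 20 bits wide, `flags` is the upper nibble (G, D/B, L, AVL).
pub fn segment_descriptor(base: u32, limit: u32, access: u8, flags: u8) -> u64 {
    let mut d = 0u64;
    d |= (limit & 0xFFFF) as u64; // limit bits 0..=15
    d |= ((base & 0x00FF_FFFF) as u64) << 16; // base bits 0..=23
    d |= (access as u64) << 40; // access byte (P, DPL, S, type)
    d |= (((limit >> 16) & 0xF) as u64) << 48; // limit bits 16..=19
    d |= ((flags & 0xF) as u64) << 52; // flags nibble
    d |= (((base >> 24) & 0xFF) as u64) << 56; // base bits 24..=31
    d
}
```

With a base of 0, a limit of 0xFFFFF and flags 0xC (4K granularity, 32-bit), an access byte of 0x9A (10011010b) gives the code segment and 0x92 (10010010b) gives the data segment, matching the bytes written out by hand in `gdt_code` and `gdt_data`.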
Jump to Long Mode
Now all that’s left is to set up the last bit of code to get us into Long Mode. Since we already have the initial GDT from the kernel’s initial boot process, we can simply reuse that structure and code. The process of getting to long mode is very similar to what we did above for protected mode, except this time we will be enabling paging and loading CR3 with the initial page table. (This will later be replaced with a page table assigned to each core.)
    _ap_start:
        ; load P4 to cr3 register (cpu uses this to access the P4 table)
        mov eax, p4_table
        mov cr3, eax

        ; enable PAE-flag in cr4 (Physical Address Extension)
        mov eax, cr4
        or eax, 1 << 5
        mov cr4, eax

        ; set the long mode bit in the EFER MSR (model specific register)
        mov ecx, 0xC0000080
        rdmsr
        or eax, 1 << 8
        wrmsr

        ; enable paging in the cr0 register
        mov eax, cr0
        or eax, 1 << 31
        mov cr0, eax

        ; JMP to long
        lgdt [gdt64.pointer]
        jmp gdt64.code:_ap_long_mode_start
    .loop:
        hlt
        jmp _ap_start.loop

    enable_paging:
        ; load P4 to cr3 register (cpu uses this to access the P4 table)
        mov eax, p4_table
        mov cr3, eax

        ; enable PAE-flag in cr4 (Physical Address Extension)
        mov eax, cr4
        or eax, 1 << 5
        mov cr4, eax

        ; set the long mode bit in the EFER MSR (model specific register)
        mov ecx, 0xC0000080
        rdmsr
        or eax, 1 << 8
        wrmsr

        ; enable paging in the cr0 register
        mov eax, cr0
        or eax, 1 << 31
        mov cr0, eax
        ret

    section .rodata.init
    gdt64:
        dq 0 ; zero entry
    .code: equ $ - gdt64 ; new
        dq (1<<43) | (1<<44) | (1<<47) | (1<<53) ; code segment
    .pointer:
        dw $ - gdt64 - 1
        dq gdt64
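The magic numbers in the listing above are easier to remember with names attached. The sketch below spells them out as Rust constants; the constant names are my own, while the bit positions are the ones used in the assembly.

```rust
/// Control bits flipped on the way to long mode.
pub const CR4_PAE: u32 = 1 << 5; // Physical Address Extension
pub const EFER_MSR: u32 = 0xC000_0080; // MSR number of EFER
pub const EFER_LME: u32 = 1 << 8; // Long Mode Enable
pub const CR0_PG: u32 = 1 << 31; // Paging

/// Bits of the 64-bit code segment descriptor in `gdt64`.
pub const EXECUTABLE: u64 = 1 << 43; // code (not data) segment
pub const CODE_OR_DATA: u64 = 1 << 44; // not a system descriptor
pub const PRESENT: u64 = 1 << 47;
pub const LONG_MODE: u64 = 1 << 53; // the L bit: 64-bit code

/// The same value as the `dq` in the `.code` entry.
pub fn long_mode_code_segment() -> u64 {
    EXECUTABLE | CODE_OR_DATA | PRESENT | LONG_MODE
}
```

The resulting descriptor is 0x0020980000000000. Base and limit are left at zero because they are ignored for code segments in 64-bit mode.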
The whole process sure looks daunting at first, but it is actually rather straightforward to implement if you have got this far in your kernel. Once you are in long mode, simply proceed to replace the page table and GDT / TSS with custom ones. Just be careful: you cannot reuse a TSS on multiple cores.
Hopefully this guide is helpful to you. Feel free to contact me about any typos or mistakes; you can find my contact information in the about tab.