Guide OS (2): Stage 2 and Loading Kernel

May 07, 2022
Operating Systems OSDev X86 GuideOS

Tip: you can click on the GuideOS tag above to see all posts in this series.

The code for this post and the next one can be found here:

https://github.com/Codetector1374/GuideOS/tree/post2

Recap and Overview

Before we dive into how to continue booting our future operating system, we will take a quick look back at where we left off last time.

Previously, we wrote our first freestanding binary in x86 assembly. It lives in the boot sector, which the BIOS loads for us. With that code, we were able to print the classic “Hello World”, except this time on bare metal, without any kind of operating system support. It is a great starting point; however, there are still a few problems left:

We only have access to 512 bytes of code space (that’s how much BIOS loads)
We are still in real mode with access to only the lowest 1MB of memory.

In this post we will address both of these problems. First we will write our second stage bootloader and load it, then we will enter protected (32-bit) mode from within the second stage bootloader. This will give us access to the first 4GB of memory.

Second Stage Bootloader

As we discussed in the previous post, we will be using a two-stage loader scheme, so now is a good time to load the second stage loader. To keep the loading simple, we will place the second stage bootloader binary directly after the first stage (the boot block). This makes it the second block on the disk (at offset 512).

BIOS Disk Calls and Loading Stage 2

In order to load our second stage bootloader, we will need to read data from disk. The easiest way to accomplish this right now is to use the BIOS interrupt calls. You can find a list of the available BIOS interrupt calls here. In fact, we already called one of these BIOS functions in our stage 1 bootloader to print that “Hello World.” BIOS functions are identified by an interrupt vector number and a function id that you place in the ah register. The function we will use for now is int 0x13 | ah=0x42: Extended Read. When calling this function, ah = 0x42, dl contains the drive number we want to read from, and ds:si points to a disk address packet that looks like this:

#include <stdint.h>

struct DiskAddressPacket {
	uint8_t size_of_packet; // 0x10 or 0x18 depending on whether the optional part is used
	uint8_t _padding;       // reserved, must be 0
	uint16_t number_of_blocks_to_transfer;
	// In Segment : Offset addressing. i.e: 0x1234:5678 would store 0x12345678
	uint32_t target_memory_address;
	uint64_t starting_disk_lba;
	// optional: 64-bit flat address of the target. (Not supported by all BIOS
	// and only used if target_memory_address = 0xffffffff)
	uint64_t long_flat_address;
};

After the int 0x13, the BIOS should return control to our program and set the carry flag if an error occurred. (An error code is also returned in ah, but I was not able to find good documentation for it.)

As a side note, you can also take a look at int 0x13 | ah = 0x41 to check whether the BIOS actually supports the ah=0x42 function. It should be present on most, if not all, modern PC systems, so I will not describe that process here.

Now that we have the ability to load sectors (blocks) from disk, we just need to call it with the correct arguments to load the next 64 sectors (512 bytes / sector * 64 sectors = 32KiB) into memory at 0x7E00, like we planned in the previous post.
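To make those arguments concrete, here is a minimal sketch in C of the packet we would hand to the BIOS for this read. The helper names (pack_segment_offset, make_stage2_packet) are made up for illustration and are not taken from the repository. The real stage 1 code builds the packet in assembly, and the packed attribute assumes GCC or Clang.

```c
#include <stdint.h>

/* Short form of the disk address packet (no optional 64-bit flat address). */
struct DiskAddressPacket {
    uint8_t  size_of_packet;
    uint8_t  _padding;
    uint16_t number_of_blocks_to_transfer;
    uint32_t target_memory_address;   /* segment in the high 16 bits, offset in the low 16 */
    uint64_t starting_disk_lba;
} __attribute__((packed));

_Static_assert(sizeof(struct DiskAddressPacket) == 0x10,
               "the short packet must be exactly 16 bytes");

/* Pack a real-mode segment:offset pair the way the packet expects it. */
static uint32_t pack_segment_offset(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 16) | offset;
}

/* Hypothetical helper: read 64 sectors starting at LBA 1 into 0x0000:7E00. */
static struct DiskAddressPacket make_stage2_packet(void) {
    struct DiskAddressPacket dap = {0};
    dap.size_of_packet = sizeof dap;                  /* 0x10 */
    dap.number_of_blocks_to_transfer = 64;            /* 64 * 512 bytes = 32KiB */
    dap.target_memory_address = pack_segment_offset(0x0000, 0x7E00);
    dap.starting_disk_lba = 1;                        /* LBA 0 is the boot block itself */
    return dap;
}
```

With this packet in memory, ds:si would point at it, ah would be 0x42, and dl would hold the drive number the BIOS passed to the boot block.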

Enabling A20

x86 has a special quirk called the A20 gate. In essence, by default address line 20 is forced to zero, so we cannot reliably access memory beyond 1M (roughly). This is because the original 8086 processor had only 20 address bits, and many programs back then relied on the bad practice of expecting accesses beyond the highest address to wrap around to the low addresses. E.g. accessing 0x123456 would actually access 0x23456, since the largest address that can be represented using 20 bits is 0xFFFFF. To make sure all this software kept working on later processors with wider address buses, such as the 80386 with its 32-bit address bus, the platform simply emulates this behavior on boot.
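The wraparound is easy to check with a little arithmetic. The sketch below is only a model of the address math, not code that runs in the bootloader, and both function names are made up for illustration. The first function models a true 20-bit bus, the second models a wider bus with the A20 gate disabled, where only address line 20 is forced to zero.

```c
#include <stdint.h>

/* Address as seen on a 20-bit bus (original 8086): bits 20 and up are dropped. */
static uint32_t wrap_20bit(uint32_t address) {
    return address & 0xFFFFFu;
}

/* Address as seen with the A20 gate disabled: only address line 20 is cleared. */
static uint32_t a20_disabled(uint32_t address) {
    return address & ~(1u << 20);
}
```

For the example above, both functions map 0x123456 to 0x23456. They differ for higher addresses: with the gate disabled, 0x200000 stays 0x200000, but 0x300000 also becomes 0x200000, which is why every odd megabyte is unreachable until A20 is enabled.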

We want to enable A20 here because stage 2 will likely need to access memory above 1M, since that is where we want to place our kernel.

There are many ways of enabling the “A20 gate”. The most reliable one seems to be through the keyboard controller. Feel free to use the code provided here, as this is just one of the “hoops” we have to jump through on the x86 platform.

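seta20.1:
  in      al, 0x64     # Read the keyboard controller status
  test    al, 0x2      # Bit 1 set: input buffer still full (busy)
  jnz     seta20.1

  mov     al, 0xd1     # Command 0xd1: write to the controller's output port
  out     0x64, al

seta20.2:
  in      al, 0x64     # Wait for not busy again
  test    al, 0x2
  jnz     seta20.2

  mov     al, 0xdf     # Output port value 0xdf has the A20 bit set
  out     0x60, al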
Transferring Control to Stage 2

This part is fairly straightforward: we simply need to perform a jmp to the predetermined entry point of the stage 2 bootloader, 0x0000:7E00.

The stage 2 bootloader binary setup is almost identical to stage 1. Feel free to copy the configuration I have in the repository so you can avoid going through all the CMake setup work again.

Loading Kernel

Loading the kernel works just like loading the second stage bootloader, except that we need to issue multiple BIOS disk reads. In my experience, some BIOSes have trouble loading more than 127 sectors in one call, so to be safe we will just execute multiple 64-sector loads. We will load the next 256K from disk to memory address 0x1000:0000, which translates to linear address 0x10000.

You can see this implemented in stage2.s.
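To see how the 256K splits into BIOS calls, here is a small sketch in C that computes the arguments of each read. It is an illustration of the arithmetic, not the code in stage2.s. The names are made up, and the starting LBA of 65 is an assumption: it follows from the boot block occupying LBA 0 and stage 2 occupying the 64 sectors after it.

```c
#include <stdint.h>

#define SECTOR_SIZE       512u
#define SECTORS_PER_READ  64u                /* safely below the 127-sector limit */
#define KERNEL_BYTES      (256u * 1024u)
#define KERNEL_LOAD_ADDR  0x10000u           /* 0x1000:0000 */
#define KERNEL_START_LBA  65u                /* assumption: boot block + 64 stage 2 sectors come first */
#define READ_COUNT        (KERNEL_BYTES / (SECTOR_SIZE * SECTORS_PER_READ))

/* Arguments for one Extended Read call. */
struct ReadPlan {
    uint64_t lba;       /* first sector to read */
    uint16_t segment;   /* target segment */
    uint16_t offset;    /* target offset within the segment */
};

/* Compute the arguments of the index-th 64-sector read (index starts at 0). */
static struct ReadPlan plan_read(uint32_t index) {
    uint32_t linear = KERNEL_LOAD_ADDR + index * SECTORS_PER_READ * SECTOR_SIZE;
    struct ReadPlan plan;
    plan.lba = KERNEL_START_LBA + (uint64_t)index * SECTORS_PER_READ;
    plan.segment = (uint16_t)(linear >> 4);   /* linear = segment * 16 + offset */
    plan.offset = (uint16_t)(linear & 0xFu);
    return plan;
}
```

READ_COUNT works out to 8 reads of 32KiB each. Moving the segment instead of the offset keeps every target address valid in real mode, and the last read ends at linear address 0x50000, well below the 1M limit.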

Entering Protected Mode

While we are in real mode, we have only limited access to memory beyond 1M, so one of the important goals of our stage 2 loader is to get us into protected (32-bit) mode.

From the Intel manual, we can see that to enter protected mode, we mainly need to supply the processor with a GDT containing the corresponding segment descriptors, then set the PE bit in the CR0 register.

Since we just want a simple, big, flat segment (we will not be using the segmentation system functionally), we only need to set up 2 segment descriptors, one for code and one for data. Both of them will span the entire 4GB memory space. Note that x86 requires that the first descriptor in the GDT is not used. It should be set to all zeros and is called the Null Descriptor.

You can find the implementation of protected mode entry in stage2_bootloader/src/asm/protected_mode.s.
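To show what the two flat descriptors look like as raw bits, here is a sketch in C that encodes them following the descriptor layout in the Intel manual. This is not how the repository does it (there the GDT is written out in assembly), and the function and macro names are made up for illustration.

```c
#include <stdint.h>

/* Encode one 8-byte segment descriptor.
 * limit is 20 bits wide, access is the access byte, flags are G, D/B, L, AVL. */
static uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags) {
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);               /* limit 15:0  -> bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;        /* base 23:0   -> bits 16..39 */
    d |= (uint64_t)access << 40;                    /* access byte -> bits 40..47 */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;    /* limit 19:16 -> bits 48..51 */
    d |= (uint64_t)(flags & 0xFu) << 52;            /* flags       -> bits 52..55 */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;    /* base 31:24  -> bits 56..63 */
    return d;
}

#define ACCESS_CODE 0x9Au   /* present, ring 0, code segment, readable */
#define ACCESS_DATA 0x92u   /* present, ring 0, data segment, writable */
#define FLAGS_4K_32 0xCu    /* G=1 (limit counted in 4KiB pages), D/B=1 (32-bit) */

/* Fill a 3-entry GDT: null descriptor, flat code, flat data. */
static void build_flat_gdt(uint64_t gdt[3]) {
    gdt[0] = 0;                                                /* Null Descriptor */
    gdt[1] = gdt_entry(0, 0xFFFFF, ACCESS_CODE, FLAGS_4K_32);  /* selector 0x08 */
    gdt[2] = gdt_entry(0, 0xFFFFF, ACCESS_DATA, FLAGS_4K_32);  /* selector 0x10 */
}
```

A limit of 0xFFFFF with 4KiB granularity covers 0x100000 pages of 4KiB, which is the full 4GB. The resulting code and data descriptors are 0x00CF9A000000FFFF and 0x00CF92000000FFFF.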

(Optional) Copying the Kernel to a Higher Address

Now that we are in protected mode, we can easily copy our kernel to a higher address. Here I chose to copy it from 0x10000 to 0x100000 (1M). There is no particular reason for doing this, other than that it leaves room for our kernel to grow in the future. It also helps with the Multiboot2 compatibility we will set up for our kernel later in this series.
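The copy itself is just a loop over memory. Below is a minimal sketch in C, assuming the byte count is a multiple of 4 and that the two regions do not overlap (both hold for 256K copied from 0x10000 to 0x100000). The function name is made up, and the actual stage 2 code may well do this in assembly instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `bytes` bytes (a multiple of 4) from src to dst, one 32-bit word at a time.
 * The two regions must not overlap. */
static void copy_words(volatile uint32_t *dst, const volatile uint32_t *src, size_t bytes) {
    for (size_t i = 0; i < bytes / 4; i++) {
        dst[i] = src[i];
    }
}

/* Hypothetical call from stage 2, valid only in flat 32-bit mode with A20 enabled:
 *   copy_words((volatile uint32_t *)0x100000,
 *              (const volatile uint32_t *)0x10000,
 *              256 * 1024);
 */
```

The pointers are volatile only so that a compiler does not replace the loop with a call to memcpy, which a freestanding bootloader does not have unless we provide it ourselves.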

Entering the Kernel

We just need to jump to 0x100000, which is the entry point of our kernel. If you did not copy the kernel to the 1M mark, you will jump to 0x10000 instead.

What’s next?

This post has gotten pretty long, so I will stop here for now. We can now load a binary of up to 256KB from disk into high memory (1M and above), and we have brought the system from its boot state to 32-bit mode. While we do not yet have a kernel to jump to, we will write one in the next post.