vmlinux: Anatomy of bzImage and how the
x86_64 processor is booted
Adrian Huang | May, 2021
* Based on kernel 5.11 (x86_64) – QEMU
* Legacy BIOS
Agenda
• bzImage: high-level overview
• Layout of bzImage
• ELF layout
• setup.bin and compressed vmlinux
• Physical memory layout
• Entry point of Linux – ‘start_of_setup’@0x10200 (physical memory)
• From viewpoint of GRUB and QEMU loader
• Initialization flow
• Compressed vmlinux
• ELF layout
• Physical memory layout
• Initialization flow
Requisite Knowledge
• CPU architecture knowledge
✓ Near call and far call
✓ Near jump and far jump
✓ Instruction opcode
• CPU Operation Mode
✓ Real mode, protected mode and long mode (64-bit mode)
➢ Memory addressing
• ELF
✓ Relocation, program header,…
• GNU assembly
bzImage: High-level Overview
bzImage = setup.bin + compressed vmlinux (protected-mode kernel) + CRC
• setup.bin: boot code (real mode -> protected mode)
• Compressed vmlinux: boot code (protected mode + paging -> long mode) + vmlinux.bin.gz
Layout of bzImage – setup.bin
bzImage = setup.bin + compressed vmlinux (protected-mode kernel) + CRC
ELF sections of setup.bin, in file order, with the sources shown on the slide
• Kernel boot section, 512 bytes (MBR), offsets 0x0 to 0x200: .bstext, .bsdata, part 1 of ‘.header’ (starts at 0x1F1) <- arch/x86/boot/header.S
• Part 2 of ‘.header’, .entrytext <- arch/x86/boot/header.S
• .inittext <- arch/x86/boot/tty.c
• .initdata, .text, .rodata, .data, .bss <- arch/x86/boot/*.c (code also comes from arch/x86/boot/bioscall.S, arch/x86/boot/copy.S and arch/x86/boot/pmjump.S)
• .text32 <- arch/x86/boot/pmjump.S
• .videocards <- arch/x86/boot/video-*.c
• .signature <- 4-byte signature
".header": Real-mode kernel header
Offset/Size   Name           Description
0x1F1/1       setup_sects    The size of the setup in sectors
0x1FE/2       boot_flag      Magic number: 0xAA55
0x200/2       jump           Jump instruction
0x214/4       code32_start   Boot loader hook: the address to jump to in protected mode. Default: 0x100000
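To make the ".header" offsets concrete, here is a minimal sketch that reads the four listed fields from the first bytes of an image. The function names and the demo values (30 sectors, the two jump bytes) are illustrative assumptions, not values taken from a real bzImage.

```python
import struct

# Offsets and sizes taken from the ".header" table in the slide.
SETUP_SECTS_OFF = 0x1F1   # 1 byte
BOOT_FLAG_OFF = 0x1FE     # 2 bytes, little-endian
JUMP_OFF = 0x200          # 2 bytes (raw instruction bytes)
CODE32_START_OFF = 0x214  # 4 bytes, little-endian


def parse_setup_header(image: bytes) -> dict:
    """Read the four listed fields from the start of a bzImage."""
    if len(image) < CODE32_START_OFF + 4:
        raise ValueError("image is too short to hold the setup header")
    (boot_flag,) = struct.unpack_from("<H", image, BOOT_FLAG_OFF)
    if boot_flag != 0xAA55:
        raise ValueError(f"unexpected boot_flag {boot_flag:#x}")
    (code32_start,) = struct.unpack_from("<I", image, CODE32_START_OFF)
    return {
        "setup_sects": image[SETUP_SECTS_OFF],
        "boot_flag": boot_flag,
        "jump": bytes(image[JUMP_OFF:JUMP_OFF + 2]),
        "code32_start": code32_start,
    }


def make_demo_image() -> bytes:
    """Synthetic header for illustration; the field values are made up."""
    buf = bytearray(CODE32_START_OFF + 4)
    buf[SETUP_SECTS_OFF] = 30                    # hypothetical sector count
    struct.pack_into("<H", buf, BOOT_FLAG_OFF, 0xAA55)
    buf[JUMP_OFF:JUMP_OFF + 2] = b"\xeb\x6a"      # hypothetical jump bytes
    struct.pack_into("<I", buf, CODE32_START_OFF, 0x100000)
    return bytes(buf)


if __name__ == "__main__":
    for name, value in parse_setup_header(make_demo_image()).items():
        print(name, value)
```

Against a real kernel image the same function could be fed the first 0x218 bytes of the file; only the synthetic builder would be dropped.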
Layout of bzImage – setup.bin: control transfers
Same layout as the previous slide, annotated with three control transfers:
1. Short jump: CPU real mode (16 bits)
2. Near call: CPU real mode
3. Long jump: CPU real mode -> CPU protected mode (32 bits)
Layout of bzImage – compressed vmlinux
bzImage = setup.bin (arch/x86/boot/setup.bin) + compressed vmlinux (protected-mode kernel) + CRC
Note
• ELF: arch/x86/boot/compressed/vmlinux
• Binary: arch/x86/boot/vmlinux.bin
How to pack vmlinux.bin.gz? vmlinux.bin -> vmlinux.bin.gz (arch/x86/boot/compressed)
ELF sections of the compressed vmlinux
• .head.text (0x0) <- arch/x86/boot/compressed/head_64.S
• .rodata..compressed (vmlinux.bin.gz) <- arch/x86/boot/compressed/piggy.S, created by arch/x86/boot/compressed/mkpiggy.c from arch/x86/boot/compressed/vmlinux.bin.gz
• .text, .rodata, .data, .bss <- arch/x86/boot/compressed/*.c, arch/x86/boot/compressed/head_64.S, arch/x86/boot/compressed/efi_thunk_64.S
• .pgtable <- arch/x86/boot/compressed/head_64.S
Layout of bzImage – compressed vmlinux.bin
* Symbol: Equivalent to using ‘.set’ directive
* https://sourceware.org/binutils/docs/as/Setting-Symbols.html
Why z_input_len/input and z_output_len/output_len?
* BFD: Binary File Descriptor library - https://www.gnu.org/software/binutils/
Memory layout of bzImage – Entry Point Address
Where is ‘X’?
Physical Memory (low to high addresses)
Start       Region
0x00000     BIOS use only
0x00600     Typically used by MBR
0x00800     Reserved for MBR/BIOS
0x01000     Boot loader (boot sector entry point 0000:7C00)
X           Kernel boot section (the kernel legacy boot sector), followed by the kernel setup code (the kernel real-mode/protected mode code)
X+0x08000   stack/heap (for use by the kernel real-mode/protected mode code)
X+0x10000   Command line, then reserved for BIOS
0xA0000     I/O memory hole
0x100000    Protected-mode kernel (compressed vmlinux)
Reference: Documentation/x86/boot.rst
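As a rough illustration of the table, the sketch below fills in ‘X’ and prints the resulting region boundaries. The function name is hypothetical, and X = 0x10000 is used only because the following slides show GRUB and the QEMU loader placing ‘setup.bin’ there.

```python
def legacy_layout(x: int) -> list:
    """Return (start address, region) pairs, lowest address first."""
    return [
        (0x000000, "BIOS use only"),
        (0x000600, "Typically used by MBR"),
        (0x000800, "Reserved for MBR/BIOS"),
        (0x001000, "Boot loader"),
        (x, "Kernel boot section + kernel setup code"),
        (x + 0x08000, "stack/heap"),
        (x + 0x10000, "Command line / reserved for BIOS"),
        (0x0A0000, "I/O memory hole"),
        (0x100000, "Protected-mode kernel (compressed vmlinux)"),
    ]


def check_layout(regions: list) -> bool:
    """A layout is only sensible if the start addresses strictly increase."""
    starts = [start for start, _name in regions]
    return all(a < b for a, b in zip(starts, starts[1:]))


if __name__ == "__main__":
    for start, name in legacy_layout(0x10000):
        print(f"{start:#08x}  {name}")
```

With X = 0x10000 the command line boundary lands at 0x20000, which matches the "Kernel Command Line 0x20000" label in the later console_init() slides.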
Entry Point of Linux - GRUB
Memory addressing in real mode
[GRUB] Get the memory address for real mode code
Registers configured by GRUB
1. gs = fs = es = ds = ss = 0x1000
2. sp = GRUB_LINUX_SETUP_STACK = 0x9000
3. cs = 0x1020, ip = 0
Physical Memory
• GRUB loads ‘setup.bin’ at address 0x10000
• 0x10000: kernel boot section (ds = es = fs = gs = ss)
• 0x10200: kernel setup code (cs)
• Stack: the diagram shows ss:sp = 0x1FFF0, which is the QEMU loader value; with GRUB, SS:SP = 1000:9000
Notes
1. QEMU loader and GRUB load ‘setup.bin’ at address 0x10000
2. QEMU loader sets SS:SP = 1000:FFF0, while GRUB sets SS:SP = 1000:9000
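The register values above can be checked with the real-mode rule that a physical address is the segment shifted left by four bits plus the offset. The helper below is a small sketch of that rule; its name is an assumption made for illustration.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Real-mode translation: physical = segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return (segment << 4) + offset


if __name__ == "__main__":
    # cs:ip set by both loaders: the entry point at 0x10200.
    print(hex(real_mode_address(0x1020, 0x0000)))
    # Stack top with the QEMU loader (SS:SP = 1000:FFF0).
    print(hex(real_mode_address(0x1000, 0xFFF0)))
    # Stack top with GRUB (SS:SP = 1000:9000).
    print(hex(real_mode_address(0x1000, 0x9000)))
```

The same rule explains why cs = 0x1020 with ip = 0 and ds = 0x1000 with offset 0x200 name the same byte.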
Entry Point of Linux: QEMU loader
Registers configured by QEMU loader
• ds = es = fs = gs = ss = segment_addr = 0x1000
• esp = stack_addr = cmdline_addr - setup_addr - 16 = 0x20000 - 0x10000 - 16 = 0x10000 - 16 = 0xfff0
• cs = 0x1020, ip = 0
Physical Memory
• QEMU loader loads ‘setup.bin’ at address 0x10000
• 0x10000: kernel boot section (ds = es = fs = gs = ss)
• 0x10200: kernel setup code (cs)
• Stack: ss:sp = 0x1FFF0
Entry Point of Linux: QEMU loader – Near and Far calls
• Step 7: prepare for far return
• Step 8: far return, which changes ‘cs’ by means of the CPU architecture itself
Entry Point of Linux: QEMU loader
• Make sure setup.bin is loaded at 0x10000 (address of setup.bin)
• Make sure vmlinux.bin is loaded at 0x100000 (address of vmlinux.bin)
• Stack: sp = 0x1FFF0 (ss:0xFFF0)
Entry Point of Linux: GNU Linker
Sources: arch/x86/boot/setup.ld, arch/x86/boot/header.S
[GNU Linker] ENTRY() command
* Entry point: the first instruction to execute in an output file
* ENTRY() is one of several ways of choosing the entry point. The linker tries the following in order and stops at the first one that succeeds:
-- the `-e' entry command-line option
-- the ENTRY(symbol) command in a linker control script
-- the value of the symbol start, if present
-- the address of the first byte of the .text section, if present
-- the address 0
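The selection order for the entry point can be sketched as a simple fall-through. This is a simplified model and not the linker's implementation: it treats an undefined symbol name as "this method did not succeed", and the symbol names and addresses in the demo are illustrative assumptions.

```python
from typing import Mapping, Optional


def choose_entry(e_option: Optional[str],
                 entry_command: Optional[str],
                 symbols: Mapping[str, int],
                 text_start: Optional[int]) -> int:
    """Pick the entry point by trying each method in the listed order."""
    # 1. -e option, 2. ENTRY(symbol) in the linker script, 3. symbol 'start'.
    for name in (e_option, entry_command, "start"):
        if name is not None and name in symbols:
            return symbols[name]
    # 4. First byte of the .text section, if there is one.
    if text_start is not None:
        return text_start
    # 5. Fall back to address 0.
    return 0


if __name__ == "__main__":
    # Hypothetical symbol table: '_start' at offset 0x200.
    print(hex(choose_entry(None, "_start", {"_start": 0x200}, 0x1000)))
```

In this model a `-e` option always wins over ENTRY(), which is why a script-level ENTRY() only takes effect when no override is given on the command line.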
Entry Point of Linux: start_of_setup - GDB
[Diagram: the boot loader loads ‘setup.bin’ at address 0x10000; kernel boot section at 0x10000 (gs = fs = es = ds = ss), kernel setup code at 0x10200 (cs), stack at sp = 0x1FFF0 (ss:0xFFF0)]
Entry Point of Linux: start_of_setup – short jump
• The ‘jump’ field of ".header" (offset 0x200, size 2) holds the jump instruction
• Displacement of the short jump: 0x26c – 0x202 = 0x6a, that is, the target offset minus the offset of the instruction that follows the 2-byte jump
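The displacement arithmetic follows from how an x86 short jump is encoded: opcode 0xEB followed by a signed 8-bit displacement that is relative to the next instruction. The sketch below decodes such a jump; the byte pair EB 6A is an assumption derived from the 0x6a displacement above, not bytes read from an actual setup.bin.

```python
def short_jmp_target(insn_offset: int, insn: bytes) -> int:
    """Return the target offset of a 2-byte short jump (opcode 0xEB)."""
    if len(insn) != 2 or insn[0] != 0xEB:
        raise ValueError("not a short jump (expected EB rel8)")
    # rel8 is signed and counted from the end of the 2-byte instruction.
    rel8 = int.from_bytes(insn[1:2], "little", signed=True)
    return insn_offset + 2 + rel8


if __name__ == "__main__":
    # Jump stored at offset 0x200 with displacement 0x6a.
    print(hex(short_jmp_target(0x200, bytes([0xEB, 0x6A]))))
```

Because rel8 is signed, the same two-byte form can also jump backwards by up to 128 bytes.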
Entry Point of Linux: start_of_setup
Call Path
lretw instruction: Far Return Operation
• ‘l’ prefix: far control transfer
• ‘w’ suffix: word (16 bits)
[Diagram: after the far return, gs = fs = es = ds = ss = cs; kernel boot section at 0x10000, kernel setup code at 0x10200, stack at sp = 0x1FFF0 (ss:0xFFF0)]
Entry Point of Linux: start_of_setup – Why align CS?
If cs is not aligned with ds, ds and es are incorrect after returning from ‘intcall’.
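A far return pops the new ip and then the new cs from the stack, so pushing ds and a target offset before `lretw` is one way to make cs equal to ds. The toy model below sketches that idiom; the class, the helper name and the offsets 0x00A0/0x02A0 are hypothetical, and this is not an emulator.

```python
class RealModeCpu:
    """Tiny model of cs:ip and a 16-bit stack; not a real emulator."""

    def __init__(self, cs: int, ip: int, ss: int, sp: int):
        self.cs, self.ip, self.ss, self.sp = cs, ip, ss, sp
        self.memory = {}  # linear address -> 16-bit word

    @staticmethod
    def linear(segment: int, offset: int) -> int:
        return (segment << 4) + offset

    def push16(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self.linear(self.ss, self.sp)] = value & 0xFFFF

    def pop16(self) -> int:
        value = self.memory[self.linear(self.ss, self.sp)]
        self.sp = (self.sp + 2) & 0xFFFF
        return value

    def lretw(self) -> None:
        """Far return with 16-bit operands: pop ip first, then cs."""
        self.ip = self.pop16()
        self.cs = self.pop16()


def normalize_cs(cpu: RealModeCpu, ds: int, target_offset: int) -> None:
    """Push the desired cs (= ds) and ip, then far-return to them."""
    cpu.push16(ds)
    cpu.push16(target_offset)
    cpu.lretw()


if __name__ == "__main__":
    cpu = RealModeCpu(cs=0x1020, ip=0x00A0, ss=0x1000, sp=0xFFF0)
    normalize_cs(cpu, ds=0x1000, target_offset=0x02A0)
    print(hex(cpu.cs), hex(cpu.ip), hex(cpu.linear(cpu.cs, cpu.ip)))
```

In the demo, 0x1020:0x00A0 and 0x1000:0x02A0 name the same physical byte, so execution continues at the same place while cs now matches the data segments.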
Entry Point of Linux: start_of_setup – data & bss section
Call Path
[Diagram, low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200), Data Section, BSS Section (__bss_start to __bss_end), _end, stack (sp = 0x1FFF0); gs = fs = es = ds = ss = cs]
Entry Point of Linux: start_of_setup -> main()
Call Path
Entry Point of Linux: start_of_setup -> main() -> copy_boot_params()
Call Path
• Copy the setup header into the boot parameter block (struct boot_params: arch/x86/include/uapi/asm/bootparam.h)
o `struct setup_header hdr` in boot_params
▪ Contains the same fields defined in the Linux boot protocol. Those fields are set by the boot loader or at kernel build time.
Entry Point of Linux: start_of_setup -> main() -> console_init()
Call Path
• console_init()
o Initialize the corresponding serial port if the command line has the ‘earlyprintk’ parameter
[Diagram, low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200), Data Section, BSS Section (__bss_start to __bss_end), _end, stack (sp = 0x1FFF0), Kernel Command Line (0x20000, placed by the QEMU loader); gs = fs = es = ds = ss = cs]
Entry Point of Linux: start_of_setup -> main() -> validate_cpu() & detect_memory()
Call Path
• init_heap()
o Discussed in the next few slides
• validate_cpu()
o Check CPU flags
o Check if long mode (x86_64) is available
o [AMD – K7 Processor] Turn SSE+SSE2 on if they are missing in CPU flags
• detect_memory()
o Use different programming interfaces (0xe820, 0xe801 and 0x88) for memory detection
o 0xe820
▪ Fill boot_params.e820_table based on the e820 map
[Diagram: same physical memory layout as the console_init() slide, with the Kernel Command Line at 0x20000]
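To illustrate what an e820 map looks like once it has been collected, the sketch below unpacks a list of entries and sums the usable RAM. The 20-byte entry layout (64-bit address, 64-bit size, 32-bit type, with type 1 meaning usable RAM) is the conventional BIOS format and is not spelled out in the slides; the sample map is invented for the demo.

```python
import struct

# One e820 entry: 64-bit address, 64-bit size, 32-bit type (packed, 20 bytes).
E820_ENTRY = struct.Struct("<QQI")
E820_TYPE_RAM = 1  # usable memory


def parse_e820(raw: bytes) -> list:
    """Split a raw table into (addr, size, type) tuples."""
    if len(raw) % E820_ENTRY.size:
        raise ValueError("table length is not a multiple of the entry size")
    return [E820_ENTRY.unpack_from(raw, off)
            for off in range(0, len(raw), E820_ENTRY.size)]


def usable_bytes(entries: list) -> int:
    """Total size of all entries marked as usable RAM."""
    return sum(size for _addr, size, kind in entries if kind == E820_TYPE_RAM)


def make_demo_table() -> bytes:
    """Invented three-entry map, for illustration only."""
    sample = [
        (0x00000000, 0x0009FC00, 1),  # low memory, usable
        (0x0009FC00, 0x00000400, 2),  # reserved
        (0x00100000, 0x07EE0000, 1),  # memory above 1 MB, usable
    ]
    return b"".join(E820_ENTRY.pack(*entry) for entry in sample)


if __name__ == "__main__":
    table = parse_e820(make_demo_table())
    print(len(table), hex(usable_bytes(table)))
```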
Entry Point of Linux: start_of_setup -> main() -> init_heap()
Call Path
• init_heap()
o Set up the heap space if the ‘CAN_USE_HEAP’ flag (0x80) is set in loadflags of the kernel setup header.
Layout in both cases (low to high addresses): kernel boot section (0x10000), kernel setup code (0x10200), Data Section, BSS Section (__bss_start to __bss_end), _end, stack (STACK_SIZE = 0x400, sp at its top); gs = fs = es = ds = ss = cs
• No heap: HEAP = heap_end = _end; the area between _end and the stack is unused
• Heap allocated (‘CAN_USE_HEAP’ flag (0x80) is set): the heap spans from HEAP = _end to heap_end, below the stack
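The two heap cases can be sketched as a small function. The flag value and STACK_SIZE come from the slides; the use of the boot protocol field heap_end_ptr, the 0x200 adjustment and the clamp against the bottom of the stack are assumptions about how the kernel's init_heap() behaves, and the demo numbers are invented.

```python
CAN_USE_HEAP = 0x80   # loadflags bit named in the slide
STACK_SIZE = 0x400    # stack size shown in the slide


def init_heap(loadflags: int, heap_end_ptr: int, esp: int, end: int) -> tuple:
    """Return (heap_start, heap_end) as offsets inside the setup segment.

    'end' stands for the _end symbol; heap_end_ptr is assumed to be
    supplied by the boot loader.
    """
    heap_start = end
    if not loadflags & CAN_USE_HEAP:
        # No heap: HEAP = heap_end = _end.
        return heap_start, heap_start
    stack_end = esp - STACK_SIZE          # lowest address used by the stack
    heap_end = heap_end_ptr + 0x200       # assumed protocol adjustment
    return heap_start, min(heap_end, stack_end)


if __name__ == "__main__":
    # Hypothetical values: _end at 0x4000, sp at 0xFFF0.
    print([hex(v) for v in init_heap(0x80, 0xFE00, 0xFFF0, 0x4000)])
    print([hex(v) for v in init_heap(0x00, 0xFE00, 0xFFF0, 0x4000)])
```

The clamp is what keeps a generous heap_end_ptr from letting the heap grow into the 0x400-byte stack area.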
go_to_protected_mode
Call Path
setup_gdt(): Set up a 4 GB memory space for CS/DS
Descriptor Table: boot_gdt (located through GDTR)
Index   Entry
0       NULL
1       NULL
2       GDT_ENTRY_BOOT_CS
3       GDT_ENTRY_BOOT_DS
4       GDT_ENTRY_BOOT_TSS
x86 Segmentation: Address Translation
• CS and DS: base address 0, covering system memory from 0 to 0xFFFFFFFF
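How a flat 4 GB segment is packed into an 8-byte descriptor can be sketched as follows. The bit layout is the standard x86 segment descriptor format; the flag values 0xc09b (code) and 0xc093 (data) and the limit 0xfffff are quoted from memory of the kernel's boot GDT and should be treated as assumptions here.

```python
def gdt_entry(flags: int, base: int, limit: int) -> int:
    """Pack flags, base and limit into a 64-bit segment descriptor."""
    return (((base & 0xFF000000) << (56 - 24))
            | ((flags & 0x0000F0FF) << 40)
            | ((limit & 0x000F0000) << (48 - 16))
            | ((base & 0x00FFFFFF) << 16)
            | (limit & 0x0000FFFF))


def segment_range(desc: int) -> tuple:
    """Return (first, last) linear address covered by a descriptor."""
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    granular = (desc >> 55) & 1           # G bit: limit counted in 4 KB units
    size = (limit + 1) * 4096 if granular else limit + 1
    return base, base + size - 1


if __name__ == "__main__":
    boot_cs = gdt_entry(0xC09B, 0, 0xFFFFF)   # assumed code-segment flags
    boot_ds = gdt_entry(0xC093, 0, 0xFFFFF)   # assumed data-segment flags
    print(hex(boot_cs), hex(boot_ds))
    print([hex(v) for v in segment_range(boot_cs)])
```

With the granularity bit set, the 20-bit limit 0xfffff counts 4 KB pages, which is how a base of 0 reaches 0xFFFFFFFF.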
protected_mode_jump (1/6)
protected_mode_jump – ljmpl instruction: ignore ‘.Lin_pm32’ relocation (2/6)
• setup.bin generation
• 0x30cc: jump (absolute address) to the wrong location
protected_mode_jump – ljmpl instruction - relocation (3/6)
• Relocation for the absolute address of ‘ljmpl’
protected_mode_jump – ljmpl instruction (4/6)
protected_mode_jump – ljmpl instruction: instruction format (5/6)
protected_mode_jump – ljmpl instruction: instruction format (6/6)
[Diagram on slides 2/6 to 4/6, low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200, containing ‘ljmpl’), Data Section, BSS Section (__bss_start to __bss_end), heap (HEAP = _end to heap_end), stack (STACK_SIZE = 0x400); gs = fs = es = ds = ss = cs]
Protected mode: ‘.Lin_pm32’ (1/2)
• [real mode] SP configuration: 0x1FF80 (SS:SP = 0x1000:0xFF80)
• [protected mode] SP configuration: `addl %ebx, %esp` in label “.Lin_pm32”, with ebx = 0x10000, gives esp = 0x1FF80
[Diagram, shown for both modes, low to high addresses: kernel boot section (0x10000), kernel setup code (0x10200), Data Section, BSS Section (__bss_start to __bss_end), heap (HEAP = _end to heap_end), stack]
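Both fix-ups on the way into protected mode add the real-mode segment base to a value that was only an offset. The sketch below reproduces that arithmetic; reading 0x30cc as the link-time offset of ‘.Lin_pm32’ is an assumption based on the earlier slide, and the function name is hypothetical.

```python
def protected_mode_fixups(cs: int, sp: int, link_time_target: int) -> tuple:
    """Return (ebx, patched ljmpl target, esp) for a flat 32-bit segment."""
    ebx = (cs & 0xFFFF) << 4                            # segment base
    target = (link_time_target + ebx) & 0xFFFFFFFF      # relocate the jump
    esp = (sp + ebx) & 0xFFFFFFFF                       # addl %ebx, %esp
    return ebx, target, esp


if __name__ == "__main__":
    ebx, target, esp = protected_mode_fixups(0x1000, 0xFF80, 0x30CC)
    print(hex(ebx), hex(target), hex(esp))
```

Without the first fix-up the far jump would land at 0x30cc, which lies below the loaded setup code at 0x10000.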
Protected mode: ‘.Lin_pm32’ (2/2)
Call Path
[Diagram, low to high addresses: kernel boot section (X = 0x10000), kernel setup code (0x10200), BSS Section (__bss_start to __bss_end), heap (HEAP = _end to heap_end), stack (esp = 0x1FF80), command line and reserved for BIOS (from X+0x10000), I/O memory hole (0xA0000), protected-mode kernel code (compressed vmlinux) at 0x100000]
• Step 5: `jmpl *%eax` transfers control to 0x100000
Compressed vmlinux: memory layout (1/10)
Memory Layout
• 0x100000 (ebp register): compressed vmlinux as loaded, starting with .head.text – startup_32 (32-bit entry point)
• 0x100200: .head.text – startup_64
• 0x100000 + _end: end of the loaded compressed vmlinux
• 0x1000000: decompressed vmlinux.bin.bz
• 0x1000000 + boot_param.init_size - _end (rbx register): compressed vmlinux (relocation)
• 0x1000000 + boot_param.init_size: end of the relocated compressed vmlinux
Sections of the compressed vmlinux, in order: .head.text, vmlinux.bin.gz (input_data to input_data_end), .text, .rodata, .data, .bss (starts at _bss), .pgtable, _end
Compressed vmlinux: boot_stack & boot_heap in .bss (2/10)
• .bss holds boot_heap (size: 0x10000) and boot_stack (size: 0x4000)
Compressed vmlinux: High-level Overview (3/10)
Why relocation
• Base address of the 32-bit Linux kernel entry point: 0x100000
• Default base address of the Linux kernel: CONFIG_PHYSICAL_START=0x1000000
• Use case
• kdump: a rescue kernel is loaded at a different address
• PIE (Position Independent Executable) and PIC (Position Independent Code)
Compressed vmlinux: startup_32: 32-bit entry point (4/10)
Compressed vmlinux: startup_32 (5/10)
• Get the loading address
Compressed vmlinux: startup_32 (6/10)
Compressed vmlinux: startup_32: Init 4-level page table (7/10)
Linear Address
Bits    Field
63-48   Sign-extend
47-39   Page Map Level-4 Offset (9 bits)
38-30   Page Directory Pointer Offset (9 bits)
29-21   Page Directory Offset (9 bits)
20-0    Physical Page Offset
Page tables
• CR3 -> Page Map Level-4 Table: PML4E #0
• Page Directory Pointer Table: PDPTE #0 to PDPTE #3
• Page Directory Tables: PDE #0 to #511, PDE #512 to #1023, PDE #1024 to #1535, PDE #1536 to #2047
• Each PDE maps a 2-MByte physical page
[Paging] Identity mapping for 0-4GB memory space
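The identity mapping can be modelled with one PML4 entry, four PDPT entries and 2048 page-directory entries of 2 MB each. The sketch below splits a linear address into its indices and translates it through such a table; it is a simplified model with hypothetical names that stores plain page base addresses instead of real page-table entries with flag bits.

```python
PAGE_2M = 1 << 21
PDES = 4 * 512  # PDE #0 .. #2047 cover 0-4GB


def build_page_directory() -> list:
    """PDE #n points at physical address n * 2MB (identity mapping)."""
    return [n * PAGE_2M for n in range(PDES)]


def split(addr: int) -> tuple:
    """Return (PML4 index, PDPT index, PD index, page offset)."""
    return ((addr >> 39) & 0x1FF,
            (addr >> 30) & 0x1FF,
            (addr >> 21) & 0x1FF,
            addr & (PAGE_2M - 1))


def translate(page_directory: list, addr: int) -> int:
    """Walk the model tables; only PML4E #0 and PDPTE #0..#3 exist."""
    pml4, pdpt, pd, offset = split(addr)
    if pml4 != 0 or pdpt > 3:
        raise LookupError(f"{addr:#x} is not mapped")
    return page_directory[pdpt * 512 + pd] + offset


if __name__ == "__main__":
    pd_table = build_page_directory()
    for addr in (0x100200, 0x1000000, 0xFFFFFFFF):
        print(hex(addr), "->", hex(translate(pd_table, addr)))
```

Because every PDE points at its own 2 MB frame, each mapped linear address translates to itself, which is what lets the boot code keep running at the same addresses once paging is enabled.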
Compressed vmlinux: startup_32: Init 4-level page table (8/10)
Reference: Section 4.1 “PAGING MODES AND CONTROL BITS”, Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3 (3A, 3B, 3C & 3D): System Programming Guide
Compressed vmlinux: startup_32: Init 4-level page table (9/10)
Compressed vmlinux: far return to startup_64 (10/10)
rva(startup_64) = 0x200
ebp = 0x100000
eax = 0x100000 + 0x200 = 0x100200
Compressed vmlinux: startup_64
Why reload CS? (Commit “34bb49229f19”)
When the pre-decompression code loads its first GDT in startup_64, it is still running on the CS value of the previous GDT. In the case of SEV-ES this is the EFI GDT. It can be anything depending on what has loaded the kernel (EFI, legacy boot code, container runtime, etc.)
Compressed vmlinux: [.text] .Lrelocated (1/5)
Why call initialize_identity_maps()?
Compressed vmlinux: [.text] .Lrelocated (2/5)
Why map boot_params and command line?
Compressed vmlinux: parse_elf (3/5)
Physical memory before parse_elf: decompressed vmlinux.bin.bz (vmlinux.bin – ELF format) at 0x1000000
• 0x1000000: ELF Header and program headers
• 0x1200000: program header #0 (.text, .rodata, .pci_fixup….)
• 0x1a00000: program header #1 (.data .vvar)
• 0x1ac2000: program header #2 (.init.text .altinstr_aux …)
• program header #3 (.notes): 0x18886b0
Physical memory after parse_elf
• 0x1000000: program header #0 (.text, .rodata, .pci_fixup….)
• 0x1800000: program header #1 (.data .vvar)
• 0x18c2000: program header #2 (.init.text .altinstr_aux …)
Compressed vmlinux: handle_relocations (4/5)
CONFIG_RELOCATABLE
• Retain relocation information (generate .rel.* or .rela.* sections) when building a kernel image, so it can be loaded somewhere other than the default address (CONFIG_PHYSICAL_START = 16MB).
• Use case: kdump kernel (recovery kernel)
handle_relocations() - Relocation if CONFIG_X86_NEED_RELOCS is set
• Depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
• Scan relocation tables (.rel.* or .rela.* sections) for symbol relocation
Compressed vmlinux: handle_relocations (5/5)
• vmlinux.bin.bz = vmlinux.bin + vmlinux.relocs
• vmlinux.relocs holds 64-bit relocation addresses and 32-bit relocation addresses, each group delimited by a 0 entry
• handle_relocations(): perform relocation backwards from the end of the decompressed vmlinux
objcopy options
• -R section_name: Remove any section matching section_name
• -S or --strip-all: Do not copy relocation and symbol information from the source file
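The backward walk performed by handle_relocations() can be sketched on a toy image. This is a deliberately simplified model: entries are treated as plain byte offsets into the image (the kernel stores link-time addresses and converts them), only the two groups named in the slide are handled, and all names and values are invented for the demo.

```python
import struct


def _apply_group(image: bytearray, pos: int, fmt: str, delta: int) -> int:
    """Walk one group backwards from 'pos' until its 0 terminator."""
    mask = (1 << (8 * struct.calcsize(fmt))) - 1
    while pos >= 0:
        (entry,) = struct.unpack_from("<I", image, pos)
        pos -= 4
        if entry == 0:
            return pos
        (value,) = struct.unpack_from(fmt, image, entry)
        struct.pack_into(fmt, image, entry, (value + delta) & mask)
    raise ValueError("relocation table has no zero terminator")


def apply_relocations(image: bytearray, delta: int) -> None:
    """Start at the last table word: 32-bit group first, then 64-bit."""
    pos = len(image) - 4
    pos = _apply_group(image, pos, "<I", delta)
    _apply_group(image, pos, "<Q", delta)


def make_demo_image() -> bytearray:
    """32 bytes of 'kernel' followed by the table: 0, 16, 0, 8."""
    data = bytearray(32)
    struct.pack_into("<I", data, 8, 0x01000010)           # 32-bit target
    struct.pack_into("<Q", data, 16, 0xFFFFFFFF81000020)  # 64-bit target
    return data + struct.pack("<4I", 0, 16, 0, 8)


if __name__ == "__main__":
    image = make_demo_image()
    apply_relocations(image, 0x200000)
    print(hex(struct.unpack_from("<I", image, 8)[0]))
    print(hex(struct.unpack_from("<Q", image, 16)[0]))
```

Storing the table after the kernel bytes and reading it from the end means no separate length field is needed; the zero entries are enough to find where each group stops.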
Recap
bzImage = setup.bin (arch/x86/boot/setup.bin) + compressed vmlinux (protected-mode kernel) + CRC
Note
• ELF: arch/x86/boot/compressed/vmlinux
• Binary: arch/x86/boot/vmlinux.bin
[More info] bzImage = vmlinuz
• On a physical machine
• Source code: arch/x86/boot/Makefile, arch/x86/boot/install.sh
Reference
• The Linux/x86 Boot Protocol, Documentation/x86/boot.rst
• Intel® 64 and IA-32 Architectures Software Developer’s Manual
• https://wdv4758h.github.io/notes/blog/linux-kernel-boot.html
• Linux insides, https://0xax.gitbooks.io/linux-insides/content/
Appendix
gdb: Preparation for debugging real-mode of Linux kernel (1/2, 2/2)
Github: https://github.com/AdrianHuang/gdb-linux-real-mode
initialize_identity_maps
• x86_mapping_info
o void *(*alloc_pgt_page)(void *)
o void *context
o unsigned long page_flag
o unsigned long offset
o bool direct_gbpages
o unsigned long kernpg_flag
• alloc_pgt_data
o unsigned char *pgt_buf
o unsigned long pgt_buf_size
o unsigned long pgt_buf_offset
UEFI booting flow – EFI boot stub: Entry point
• AddressOfEntryPoint (efi_pe_entry): 0x18d84a
• ImageBase = 0x1000000
• Physical address of AddressOfEntryPoint = 0x1000000 + 0x18d84a = 0x118d84a
UEFI booting flow – EFI Handover protocol
• At which address does the boot loader load bzImage?
UEFI booting: call path

More Related Content

PDF
Physical Memory Management.pdf
PDF
Decompressed vmlinux: linux kernel initialization from page table configurati...
PPTX
Slab Allocator in Linux Kernel
PDF
Process Address Space: The way to create virtual address (page table) of user...
PDF
Page cache in Linux kernel
PDF
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
PDF
Physical Memory Models.pdf
PDF
spinlock.pdf
Physical Memory Management.pdf
Decompressed vmlinux: linux kernel initialization from page table configurati...
Slab Allocator in Linux Kernel
Process Address Space: The way to create virtual address (page table) of user...
Page cache in Linux kernel
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Physical Memory Models.pdf
spinlock.pdf

What's hot (20)

PPTX
Linux Kernel Booting Process (2) - For NLKB
PPTX
Linux Kernel Booting Process (1) - For NLKB
PDF
Memory Mapping Implementation (mmap) in Linux Kernel
PDF
Arm device tree and linux device drivers
PDF
Kdump and the kernel crash dump analysis
PPTX
Linux Initialization Process (1)
PDF
Memory Management with Page Folios
PPTX
Linux Initialization Process (2)
PPTX
U-Boot presentation 2013
PDF
U-Boot - An universal bootloader
PPTX
Linux kernel debugging
PPT
U boot porting guide for SoC
PDF
Part 02 Linux Kernel Module Programming
PDF
malloc & vmalloc in Linux
PDF
Embedded Linux BSP Training (Intro)
PPT
Introduction to Linux Kernel by Quontra Solutions
PDF
systemd
PDF
Reverse Mapping (rmap) in Linux Kernel
PPT
U Boot or Universal Bootloader
PDF
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
Memory Mapping Implementation (mmap) in Linux Kernel
Arm device tree and linux device drivers
Kdump and the kernel crash dump analysis
Linux Initialization Process (1)
Memory Management with Page Folios
Linux Initialization Process (2)
U-Boot presentation 2013
U-Boot - An universal bootloader
Linux kernel debugging
U boot porting guide for SoC
Part 02 Linux Kernel Module Programming
malloc & vmalloc in Linux
Embedded Linux BSP Training (Intro)
Introduction to Linux Kernel by Quontra Solutions
systemd
Reverse Mapping (rmap) in Linux Kernel
U Boot or Universal Bootloader
Ad

Similar to Vmlinux: anatomy of bzimage and how x86 64 processor is booted (20)

PPTX
Linux Kernel Tour
PPTX
U-Boot Porting on New Hardware
PPT
How to build and load linux to embedded system
PDF
Grub2 Booting Process
PDF
Linux Porting
PPTX
Raspberry Pi tutorial
PPTX
“Linux Kernel CPU Hotplug in the Multicore System”
PPT
Bootstrap process of u boot (NDS32 RISC CPU)
PDF
005 skyeye
ODP
[Defcon] Hardware backdooring is practical
PDF
Launch the First Process in Linux System
PDF
Beagleboard xm-setup
ODP
Kernel compilation
PDF
LCU14 302- How to port OP-TEE to another platform
PPTX
Vagrant, Ansible, and OpenStack on your laptop
PPTX
PV-Drivers for SeaBIOS using Upstream Qemu
PPTX
망고100 보드로 놀아보자 7
PPT
Linux Booting Steps
PPT
Basic Linux Internals
PDF
Project ACRN configuration scenarios and config tool
Linux Kernel Tour
U-Boot Porting on New Hardware
How to build and load linux to embedded system
Grub2 Booting Process
Linux Porting
Raspberry Pi tutorial
“Linux Kernel CPU Hotplug in the Multicore System”
Bootstrap process of u boot (NDS32 RISC CPU)
005 skyeye
[Defcon] Hardware backdooring is practical
Launch the First Process in Linux System
Beagleboard xm-setup
Kernel compilation
LCU14 302- How to port OP-TEE to another platform
Vagrant, Ansible, and OpenStack on your laptop
PV-Drivers for SeaBIOS using Upstream Qemu
망고100 보드로 놀아보자 7
Linux Booting Steps
Basic Linux Internals
Project ACRN configuration scenarios and config tool
Ad

More from Adrian Huang (6)

PDF
Linux Synchronization Mechanism: RCU (Read Copy Update)
PPTX
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
PDF
semaphore & mutex.pdf
PDF
Memory Compaction in Linux Kernel.pdf
PDF
Linux Kernel - Virtual File System
PDF
Anatomy of the loadable kernel module (lkm)
Linux Synchronization Mechanism: RCU (Read Copy Update)
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
semaphore & mutex.pdf
Memory Compaction in Linux Kernel.pdf
Linux Kernel - Virtual File System
Anatomy of the loadable kernel module (lkm)

Recently uploaded (20)

PDF
System and Network Administration Chapter 2
PDF
Odoo Companies in India – Driving Business Transformation.pdf
PPTX
history of c programming in notes for students .pptx
PPTX
CHAPTER 2 - PM Management and IT Context
PDF
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
PPTX
ManageIQ - Sprint 268 Review - Slide Deck
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PDF
Design an Analysis of Algorithms II-SECS-1021-03
PPTX
Transform Your Business with a Software ERP System
PDF
How Creative Agencies Leverage Project Management Software.pdf
PDF
PTS Company Brochure 2025 (1).pdf.......
PPTX
ISO 45001 Occupational Health and Safety Management System
PPTX
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
PPTX
L1 - Introduction to python Backend.pptx
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PPTX
ai tools demonstartion for schools and inter college
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PDF
Digital Strategies for Manufacturing Companies
System and Network Administration Chapter 2
Odoo Companies in India – Driving Business Transformation.pdf
history of c programming in notes for students .pptx
CHAPTER 2 - PM Management and IT Context
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
ManageIQ - Sprint 268 Review - Slide Deck
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
Design an Analysis of Algorithms II-SECS-1021-03
Transform Your Business with a Software ERP System
How Creative Agencies Leverage Project Management Software.pdf
PTS Company Brochure 2025 (1).pdf.......
ISO 45001 Occupational Health and Safety Management System
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
L1 - Introduction to python Backend.pptx
Adobe Illustrator 28.6 Crack My Vision of Vector Design
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
ai tools demonstartion for schools and inter college
How to Migrate SBCGlobal Email to Yahoo Easily
Digital Strategies for Manufacturing Companies

Vmlinux: anatomy of bzimage and how x86 64 processor is booted

  • 1. vmlinux: Anatomy of bzimage and how x86_64 processor is booted Adrian Huang | May, 2021 * Based on kernel 5.11 (x86_64) – QEMU * Legacy BIOS
  • 2. Agenda • bzimage: high-level overview • Layout of bzImage • ELF layout • setup.bin and compressed vmlinux • Physical memory layout • Entry point of Linux – ‘start_of_setup’@0x10200 (physical memory) • From viewpoint of GRUB and QEMU loader • Initialization flow • Compressed vmlinux • ELF layout • Physical memory layout • Initialization flow
  • 3. Agenda • Layout of bzImage • ELF layout • setup.bin and compressed vmlinux • Physical memory layout • Entry point of Linux – ‘start_of_setup’@0x10200 (physical memory) • From viewpoint of GRUB and QEMU loader • Initialization flow • Compressed vmlinux • ELF layout • Physical memory layout • Initialization flow • CPU architecture knowledge ✓ Near call and far call ✓ Near jump and far jump ✓ Instruction opcode • CPU Operation Mode ✓ Real mode, protected mode and long mode (64-bit mode) ➢ Memory addressing • ELF ✓ Relocation, program header,… • GNU assembly Requisite Knowledge
  • 4. bzImage: High-level Overview (1/2) Boot code (Real mode -> protected mode) Compressed vmlinux Boot code (protected mode + paging -> long mode) vmlinux.bin.gz bzImage
  • 5. bzImage: High-level Overview (2/2) Boot code (Real mode -> protected mode) Compressed vmlinux Boot code (protected mode + paging -> long mode) vmlinux.bin.gz bzImage setup.bin Compressed vmlinux (Protected-mode kernel) CRC bzImage
  • 6. Layout of bzImage – setup.bin setup.bin Compressed vmlinux (Protected-mode kernel) .bstext CRC .bsdata Part 1 of ‘.header’ Part 2 of ‘.header’ .entrytext Kernel Boot Section: 512 bytes (MBR) Source : arch/x86/boot/header.S <- arch/x86/boot/header.S 0x0 0x200 Offset/Size Name Description 0x1F1/1 setup_sects The size of the setup in sectors 0x01FE/2 boot_flag magic number: 0xAA55 0x200/2 jump Jump instruction 0x214/4 code32_start Boot loader hook: The address to jump to in protected mode. Default: 0x100000 ".header": Real-mode kernel header 0x1F1 Part 1 of ‘header’ Part 2 of ‘header’ .inittext .initdata .text .text32 .rodata .videocards .data .signature .bss <- arch/x86/boot/header.S <- arch/x86/boot/header.S <- arch/x86/boot/tty.c <- arch/x86/boot/*.c arch/x86/boot/bioscall.S arch/x86/boot/copy.S arch/x86/boot/ pmjump.S <- arch/x86/boot/ pmjump.S <- arch/x86/boot/*.c <- arch/x86/boot/video-*.c <- arch/x86/boot/*.c <- 4-byte signature <- arch/x86/boot/*.c bzImage ELF sections
  • 7. Layout of bzImage – setup.bin setup.bin Compressed vmlinux (Protected-mode kernel) .bstext CRC .bsdata Part 1 of ‘.header’ Part 2 of ‘.header’ .entrytext Kernel Boot Section: 512 bytes (MBR) Source : arch/x86/boot/header.S <- arch/x86/boot/header.S 0x0 0x200 Offset/Size Name Description 0x1F1/1 setup_sects The size of the setup in sectors 0x01FE/2 boot_flag magic number: 0xAA55 0x200/2 jump Jump instruction 0x214/4 code32_start Boot loader hook: The address to jump to in protected mode. Default: 0x100000 ".header": Real-mode kernel header 0x1F1 Part 1 of ‘header’ Part 2 of ‘header’ short jump .inittext .initdata .text .text32 .rodata .videocards .data .signature .bss <- arch/x86/boot/header.S <- arch/x86/boot/header.S <- arch/x86/boot/tty.c <- arch/x86/boot/*.c arch/x86/boot/bioscall.S arch/x86/boot/copy.S arch/x86/boot/ pmjump.S <- arch/x86/boot/ pmjump.S <- arch/x86/boot/*.c <- arch/x86/boot/video-*.c <- arch/x86/boot/*.c <- 4-byte signature <- arch/x86/boot/*.c bzImage near call long jump 1 2 3 1 CPU Real Mode (16 bits) 2 CPU Real Mode 3 CPU Real Mode -> CPU Protected Mode (32 bits) ELF sections
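The header offsets listed on this slide can be read straight out of a bzImage file. The following is a minimal sketch, not the kernel's or any boot loader's code: `parse_setup_header` and `fake_image` are hypothetical helper names, and the sample buffer is synthetic (only the four fields from the table are filled in, with a made-up `setup_sects` value).

```python
import struct


def parse_setup_header(image: bytes) -> dict:
    """Read the four real-mode header fields listed in the table above."""
    (setup_sects,) = struct.unpack_from("<B", image, 0x1F1)   # 1 byte
    (boot_flag,) = struct.unpack_from("<H", image, 0x1FE)     # 2 bytes
    jump = bytes(image[0x200:0x202])                           # 2-byte jump
    (code32_start,) = struct.unpack_from("<I", image, 0x214)  # 4 bytes
    if boot_flag != 0xAA55:
        raise ValueError("boot_flag is %#x, expected 0xAA55" % boot_flag)
    return {
        "setup_sects": setup_sects,
        "boot_flag": boot_flag,
        "jump": jump,
        "code32_start": code32_start,
    }


def fake_image() -> bytes:
    """Synthetic 1 KiB buffer (assumption: values chosen for illustration)."""
    buf = bytearray(0x400)
    buf[0x1F1] = 31                                   # made-up setup_sects
    struct.pack_into("<H", buf, 0x1FE, 0xAA55)        # magic number
    buf[0x200:0x202] = bytes([0xEB, 0x6A])            # short jump opcode + rel8
    struct.pack_into("<I", buf, 0x214, 0x100000)      # default code32_start
    return bytes(buf)


if __name__ == "__main__":
    for name, value in parse_setup_header(fake_image()).items():
        print(name, value)
```

With a real kernel, the same function could be fed the first kilobyte of the bzImage file instead of the synthetic buffer.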
  • 8. Layout of bzImage – compressed vmlinux setup.bin (arch/x86/boot/setup.bin) Compressed vmlinux (Protected-mode kernel) Note ELF: arch/x86/boot/compressed/vmlinux Binary: arch/x86/boot/vmlinux.bin CRC bzImage vmlinux.bin vmlinux.bin.gz How to pack vmlinux.bin.gz? arch/x86/boot/compressed .head.text .rodata..compressed (vmlinux.bin.gz) .text .rodata .data arch/x86/boot/compressed/head_64.S 0x0 .bss .pgtable arch/x86/boot/compressed/piggy.S created by arch/x86/boot/compressed/mkpiggy.c arch/x86/boot/compressed/vmlinux.bin.gz arch/x86/boot/compressed/*.c arch/x86/boot/compressed/head_64.S arch/x86/boot/compressed/efi_thunk_64.S arch/x86/boot/compressed/head_64.S ELF Sections
  • 9. Layout of bzImage – compressed vmlinux Compressed vmlinux setup.bin (arch/x86/boot/setup.bin) Compressed vmlinux (Protected-mode kernel) Note ELF: arch/x86/boot/compressed/vmlinux Binary: arch/x86/boot/vmlinux.bin CRC bzImage .head.text .rodata..compressed (vmlinux.bin.gz) .text .rodata .data arch/x86/boot/compressed/head_64.S 0x0 .bss .pgtable arch/x86/boot/compressed/piggy.S created by arch/x86/boot/compressed/mkpiggy.c arch/x86/boot/compressed/vmlinux.bin.gz arch/x86/boot/compressed/*.c arch/x86/boot/compressed/head_64.S arch/x86/boot/compressed/efi_thunk_64.S arch/x86/boot/compressed/head_64.S ELF Sections
  • 10. Layout of bzImage – compressed vmlinux.bin * Symbol: Equivalent to using ‘.set’ directive * https://sourceware.org/binutils/docs/as/Setting-Symbols.html Why z_input_len/input and z_output_len/output_len? * BFD: Binary File Descriptor library - https://www.gnu.org/software/binutils/
  • 11. Memory layout of bzImage – Entry Point Address Where is ‘X’? BIOS use only Typically used by MBR Reserved for MBR/BIOS Boot loader 0x00000 0x00600 0x00800 0x01000 Kernel boot section stack/heap X X+0x08000 Reserved for BIOS Command line I/O memory hole Protected-mode kernel (Compressed vmlinux) X+0x10000 0x100000 0xA0000 Boot sector entry point 0000:7C00 The kernel legacy boot sector The kernel real-mode/protected mode code For use by the kernel real-mode/protected mode code Physical Memory Kernel setup code Reference: Documentation/x86/boot.rst
  • 12. Entry Point of Linux - GRUB Memory addressing in real mode [GRUB] Get the memory address for real mode code 1. gs = fs = es = ds = ss = 0x1000 2. sp = GRUB_LINUX_SETUP_STACK = 0x9000 3. cs = 0x1020, ip = 0 Registers configured by GRUB Kernel boot section 0x10000 0x10200 Physical Memory GRUB loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack ss:sp = 0x1FFF0 protected mode real mode Kernel setup code
  • 13. Entry Point of Linux - GRUB Memory addressing in real mode [GRUB] Get the memory address for real mode code 1. gs = fs = es = ds = ss = 0x1000 2. sp = GRUB_LINUX_SETUP_STACK = 0x9000 3. cs = 0x1020, ip = 0 Registers configured by GRUB Kernel boot section 0x10000 0x10200 Physical Memory GRUB loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack ss:sp = 0x1FFF0 protected mode real mode Kernel setup code 1. Both the QEMU loader and GRUB load ‘setup.bin’ at address 0x10000 2. The QEMU loader sets SS:SP = 1000:FFF0 while GRUB sets SS:SP = 1000:9000
  • 14. Entry Point of Linux: QEMU loader Kernel boot section 0x10000 0x10200 Physical Memory QEMU loader loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack ss:sp = 0x1FFF0 protected mode real mode Kernel setup code 1 2 3 4 5 6 7 ds = es = fs = gs = ss = segment_addr = 0x1000 esp = stack_addr = cmdline_addr - setup_addr – 16 = 0x20000 – 0x10000 – 16 = 0x10000 – 16 = 0xfff0 cs = 0x1020, ip = 0 Registers configured by QEMU loader 5 6 7 Prepare for far return 8 far return: change ‘cs’ by means of CPU arch itself
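The register values on the GRUB and QEMU slides follow from real-mode addressing, where a physical address is the segment shifted left by four bits plus the offset. A small sketch of that arithmetic, using only the numbers shown on the slides (the function name `linear` and the constant names are illustrative, not taken from either loader):

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


SETUP_ADDR = 0x10000     # where setup.bin is loaded
CMDLINE_ADDR = 0x20000   # kernel command line (QEMU loader slide)

segment_addr = SETUP_ADDR >> 4                 # ds = es = fs = gs = ss
qemu_sp = CMDLINE_ADDR - SETUP_ADDR - 16       # stack_addr on the QEMU slide
grub_sp = 0x9000                               # GRUB_LINUX_SETUP_STACK
entry_cs = segment_addr + 0x20                 # cs, with ip = 0

if __name__ == "__main__":
    print(hex(segment_addr), hex(entry_cs))
    print(hex(linear(entry_cs, 0)))            # entry point in physical memory
    print(hex(linear(segment_addr, qemu_sp)))  # QEMU stack top
    print(hex(linear(segment_addr, grub_sp)))  # GRUB stack top
```

The entry point works out to 0x10200 because cs = 0x1020 sits exactly 0x200 bytes (the 512-byte boot section) above the load address.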
  • 15. Entry Point of Linux: QEMU loader – Near and Far calls 3 4 5 6 7 Prepare for far return 8 far return: change ‘cs’ by means of CPU arch itself
  • 16. Entry Point of Linux: QEMU loader Kernel boot section 0x10000 0x10200 Physical Memory QEMU loader loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Make sure setup.bin is loaded at 0x10000 Make sure vmlinux.bin is loaded at 0x100000 Address of setup.bin Address of vmlinux.bin
  • 17. arch/x86/boot/setup.ld arch/x86/boot/header.S 1 2 Entry Point of Linux: GNU Linker [GNU Linker] ENTRY() command * First executable instruction in an output file → entry point * ENTRY() is one of several ways of choosing the entry point; the linker tries them in this order -- the `-e' entry command-line option -- the ENTRY(symbol) command in a linker control script -- the value of the symbol start, if present -- the address of the first byte of the .text section, if present; -- the address 0
  • 18. arch/x86/boot/setup.ld 1 Entry Point of Linux: GNU Linker [GNU Linker] ENTRY() command * First executable instruction in an output file → entry point * ENTRY() is one of several ways of choosing the entry point; the linker tries them in this order -- the `-e' entry command-line option -- the ENTRY(symbol) command in a linker control script -- the value of the symbol start, if present -- the address of the first byte of the .text section, if present; -- the address 0 Kernel boot section 0x10000 0x10200 Physical Memory QEMU loader loads ‘setup.bin’ at address 0x10000 0 ds = es = fs = gs = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code
  • 19. Entry Point of Linux: start_of_setup - GDB Kernel boot section 0x10000 0x10200 Physical Memory Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code
  • 20. Entry Point of Linux: start_of_setup - GDB Kernel boot section 0x10000 0x10200 Physical Memory Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code
  • 21. Entry Point of Linux: start_of_setup – short jump
      Physical memory: boot loader loads ‘setup.bin’ at address 0x10000; kernel boot section 0x10000, 0x10200 kernel setup code; gs = fs = es = ds = ss; cs; stack sp = 0x1FFF0 (ss:0xFFF0); protected mode / real mode
      ".header": real-mode kernel header
        Offset/Size | Name | Description
        0x1F1/1 | setup_sects | The size of the setup in sectors
        0x1FE/2 | boot_flag | Magic number: 0xAA55
        0x200/2 | jump | Jump instruction
        0x214/4 | code32_start | Boot loader hook: the address to jump to in protected mode. Default: 0x100000
  • 22. Entry Point of Linux: start_of_setup – short jump 0x26c – 0x202 = 0x6a
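The subtraction on this slide is the displacement of an x86 short jump: the one-byte operand is relative to the address of the next instruction, which is the jump's own address plus its two-byte length. A minimal sketch of decoding it (the function name is illustrative; the opcode 0xEB is the standard short `jmp rel8` encoding):

```python
def short_jump_target(addr: int, insn: bytes) -> int:
    """Target of a 2-byte short jump (opcode 0xEB, signed 8-bit displacement)."""
    if len(insn) < 2 or insn[0] != 0xEB:
        raise ValueError("not a short jump")
    rel8 = int.from_bytes(insn[1:2], "little", signed=True)
    next_insn = addr + 2          # the displacement is counted from here
    return next_insn + rel8


if __name__ == "__main__":
    # Jump field at offset 0x200 with displacement 0x6a, as on the slide.
    print(hex(short_jump_target(0x200, bytes([0xEB, 0x6A]))))
```

Going the other way, the displacement is target minus next instruction: 0x26c - 0x202 = 0x6a.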
  • 23. Entry Point of Linux: start_of_setup Call Path Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits)
  • 24. Entry Point of Linux: start_of_setup Call Path Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits) 1 1 2 2 3 3
  • 25. Entry Point of Linux: start_of_setup Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory Call Path lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits)
  • 26. Entry Point of Linux: start_of_setup – Why align CS? Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss = cs cs stack sp = 0x1FFF0 (ss:0xFFF0) protected mode real mode Kernel setup code Physical Memory Call Path lretw instruction: Far Return Operation ‘l’ prefix: far control transfer ‘w’ suffix: word (16 bits) If cs is not aligned with ds, ds and es are incorrect after returning from ‘intcall’.
  • 27. Entry Point of Linux: start_of_setup – data & bss section Call Path Kernel boot section 0x10000 0x10200 Boot loader loads ‘setup.bin’ at address 0x10000 0 gs = fs = es = ds = ss= cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Physical Memory
  • 28. Entry Point of Linux: start_of_setup -> main() Call Path
  • 29. Entry Point of Linux: start_of_setup -> main()
  • 30. Entry Point of Linux: start_of_setup -> main()
  • 31. Entry Point of Linux: start_of_setup -> main() -> copy_boot_params() Call Path • Copy the setup header into the boot parameter block (struct boot_params: arch/x86/include/uapi/asm/bootparam.h) o `struct setup_header hdr` in boot_params ▪ Contains the same fields defined in the Linux boot protocol. Those fields are set by the boot loader or at kernel build time
  • 32. Call Path • console_init() o Initialize the corresponding serial port if command line has ‘earlyprintk’ parameter Entry Point of Linux: start_of_setup -> main() -> console_init() – (1/2) Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Kernel Command Line 0x20000 QEMU Loader Physical Memory
  • 33. Call Path • console_init() o Initialize the corresponding serial port if command line has ‘earlyprintk’ parameter Entry Point of Linux: start_of_setup -> main() -> console_init() – (2/2) Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Kernel Command Line 0x20000 Physical Memory
  • 34. Call Path • init_heap() • Discussion in the next few slides • validate_cpu() o Check CPU flags o Check if long mode (x86_64) is available o [AMD – K7 Processor] Turn SSE+SSE2 on if they are missing in CPU flags • detect_memory() o Use different BIOS INT 0x15 interfaces (0xe820, 0xe801 and 0x88) for memory detection o 0xe820 ▪ Fill boot_params.e820_table based on e820 map Entry Point of Linux: start_of_setup -> main() -> validate_cpu() & detect_memory() Kernel boot section 0x10000 0x10200 0 gs = fs = es = ds = ss = cs stack sp = 0x1FFF0 protected mode real mode Kernel setup code BSS Section _end __bss_start __bss_end Data Section Kernel Command Line 0x20000 Physical Memory
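To make the e820 step more concrete, here is a sketch of how a filled-in table could be walked afterwards. It assumes the packed 20-byte entry layout (64-bit address, 64-bit size, 32-bit type) and type 1 meaning usable RAM; the sample map is invented for illustration and is not taken from the slides.

```python
import struct

E820_ENTRY = struct.Struct("<QQI")   # addr, size, type: 20 bytes, packed
E820_TYPE_RAM = 1                    # usable memory
E820_TYPE_RESERVED = 2


def usable_ram(table: bytes) -> int:
    """Sum the sizes of all usable-RAM entries in a raw e820 table."""
    total = 0
    for _addr, size, entry_type in E820_ENTRY.iter_unpack(table):
        if entry_type == E820_TYPE_RAM:
            total += size
    return total


# Hypothetical three-entry map, similar in shape to a small VM.
SAMPLE_MAP = b"".join(
    E820_ENTRY.pack(addr, size, entry_type)
    for addr, size, entry_type in [
        (0x00000000, 0x0009FC00, E820_TYPE_RAM),
        (0x0009FC00, 0x00000400, E820_TYPE_RESERVED),
        (0x00100000, 0x07EE0000, E820_TYPE_RAM),
    ]
)

if __name__ == "__main__":
    print(hex(usable_ram(SAMPLE_MAP)))
```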
  • 35. Call Path • init_heap o Set up the heap space if the ‘CAN_USE_HEAP’ flag (0x80) is set in loadflags of the kernel setup header. Entry Point of Linux: start_of_setup -> main() -> init_heap() (1/2)
  • 36. Call Path Entry Point of Linux: start_of_setup -> main() -> init_heap() (2/2) heap: allocate heap if the ‘CAN_USE_HEAP’ flag (0x80) is set No heap sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Unused Area __bss_start __bss_end HEAP = heap_end = _end Data Section sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs gs = fs = es = ds = ss = cs
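The two layouts on the init_heap() slides can be sketched as a small function. This is an approximation under stated assumptions: the `heap_end_ptr` header field and the `+ 0x200` adjustment come from my reading of the boot protocol and `arch/x86/boot/main.c` rather than from the slides, and the sample values for `_end`, `esp` and `heap_end_ptr` are hypothetical.

```python
CAN_USE_HEAP = 0x80   # bit in loadflags of the setup header
STACK_SIZE = 0x400    # as shown on the slide


def init_heap(loadflags: int, heap_end_ptr: int, esp: int, end: int):
    """Return (HEAP, heap_end) as offsets inside the setup segment."""
    heap_start = end                       # HEAP = _end in both layouts
    if not loadflags & CAN_USE_HEAP:
        return heap_start, heap_start      # no heap: HEAP = heap_end = _end
    stack_end = esp - STACK_SIZE           # keep STACK_SIZE bytes for the stack
    heap_end = min(heap_end_ptr + 0x200, stack_end)
    return heap_start, heap_end


if __name__ == "__main__":
    # Hypothetical values: _end = 0x4000, sp = 0xfff0, heap_end_ptr = 0xfe00.
    print([hex(v) for v in init_heap(0x80, 0xFE00, 0xFFF0, 0x4000)])
    print([hex(v) for v in init_heap(0x00, 0xFE00, 0xFFF0, 0x4000)])
```

The `min()` is what keeps the heap from growing into the region reserved for the stack.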
  • 37. go_to_protected_mode GDT_ENTRY_BOOT_DS GDT_ENTRY_BOOT_CS NULL NULL 0 1 2 3 GDT_ENTRY_BOOT_TSS 4 Descriptor Table: boot_gdt System Memory 0 0xFFFFFFFF limit Base Address GDTR x86 Segmentation: Address Translation setup_gdt(): Set up a flat 4 GB memory space for CS/DS Call Path
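How a descriptor ends up covering 0 to 0xFFFFFFFF can be checked by packing one by hand. The sketch below mirrors the bit layout of an x86 segment descriptor; the flag values 0xc09b (code) and 0xc093 (data) are quoted from memory of `arch/x86/boot/pm.c` and should be treated as an assumption, and the function names are illustrative.

```python
def gdt_entry(flags: int, base: int, limit: int) -> int:
    """Pack flags, base and limit into a 64-bit segment descriptor."""
    return (
        ((base & 0xFF000000) << 32)      # base[31:24]  -> bits 56..63
        | ((flags & 0x0000F0FF) << 40)   # access byte + G/D/L/AVL nibble
        | ((limit & 0x000F0000) << 32)   # limit[19:16] -> bits 48..51
        | ((base & 0x00FFFFFF) << 16)    # base[23:0]   -> bits 16..39
        | (limit & 0x0000FFFF)           # limit[15:0]  -> bits 0..15
    )


def segment_span(desc: int) -> int:
    """Number of bytes addressable through the descriptor."""
    limit = (desc & 0xFFFF) | ((desc >> 32) & 0xF0000)
    granular = (desc >> 55) & 1          # G bit: limit counted in 4 KiB units
    return (limit + 1) << 12 if granular else limit + 1


BOOT_CS = gdt_entry(0xC09B, 0, 0xFFFFF)  # assumed flags for the code segment
BOOT_DS = gdt_entry(0xC093, 0, 0xFFFFF)  # assumed flags for the data segment

if __name__ == "__main__":
    print(hex(BOOT_CS), hex(BOOT_DS), hex(segment_span(BOOT_CS)))
```

With base 0, limit 0xFFFFF and the granularity bit set, the span is 0x100000 pages of 4 KiB, which is the 4 GB space named on the slide.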
  • 39. protected_mode_jump – ljmpl instruction: ignore ‘.Lin_pm32’ relocation (2/6) 0x30cc Jump (absolute address) to the wrong location sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs setup.bin generation Physical Memory
  • 40. protected_mode_jump – ljmpl instruction - relocation (3/6) sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs Relocation for absolute address of ‘ljmpl’ ljmpl Physical Memory
  • 41. Relocation for absolute address of ‘ljmpl’ protected_mode_jump – ljmpl instruction (4/6) sp (STACK_SIZE = 0x400) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section gs = fs = es = ds = ss = cs ljmpl Physical Memory
  • 42. protected_mode_jump – ljmpl instruction: instruction format (5/6)
  • 43. protected_mode_jump – ljmpl instruction: instruction format (6/6)
  • 44. Protected mode: ‘.Lin_pm32’ (1/2) [real mode] SP configuration [protected mode] SP configuration `addl %ebx, %esp` in label “.Lin_pm32” 0x1FF80 (SS:SP = 0x1000:0xFF80) Kernel boot section 0x10000 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end esp = 0x1FF80 Kernel boot section 0x10000 (ebx) 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Data Section Data Section
  • 45. Data Section 1 2 4 3 Protected mode: ‘.Lin_pm32’ (2/2) X = 0x10000 esp = 0x1FF80 Kernel boot section 0x10200 stack protected mode real mode Kernel setup code BSS Section Heap __bss_start __bss_end HEAP = _end heap_end Reserved for BIOS command line I/O memory hole Protected-mode kernel code (compressed vmlinux) X+0x10000 0xA0000 0x100000 jmpl *%eax 5 Physical Memory Call Path
  • 46. Compressed vmlinux: memory layout (1/10) .head.text – startup_32 0x100000 (ebp register) 0x100200 decompressed vmlinux.bin.bz .head.text – startup_64 0x1000000 compressed vmlinux (Relocation) 0x1000000 + boot_param.init_size 0x1000000 + boot_param.init_size - _end (rbx register) vmlinux.bin.gz .text .rodata .data .bss .pgtable _end 0x100000 + _end boot_heap (size: 0x10000) boot_stack (size: 0x4000) … input_data input_data_end Memory Layout 32-bit entry point _bss
  • 47. Compressed vmlinux: boot_stack & boot_heap in .bss (2/10) .head.text – startup_32 0x100000 (ebp register) 0x100200 decompressed vmlinux.bin.bz .head.text – startup_64 0x1000000 compressed vmlinux (Relocation) 0x1000000 + boot_param.init_size 0x1000000 + boot_param.init_size - _end (rbx register) vmlinux.bin.gz .text .rodata .data .bss .pgtable _end 0x100000 + _end boot_heap (size: 0x10000) boot_stack (size: 0x4000) … input_data input_data_end Memory Layout 32-bit entry point _bss
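The addresses in this layout can be derived with a few lines of arithmetic. The sketch below assumes the relocatable-kernel path, where the load address is rounded up to the kernel alignment and raised to at least the default physical start; the alignment value, `init_size` and `_end` used here are hypothetical placeholders, since the slides do not give them, and the function names are illustrative.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def decompress_base(load_addr: int, kernel_alignment: int,
                    load_physical_addr: int) -> int:
    """Where the decompressed kernel will start."""
    return max(align_up(load_addr, kernel_alignment), load_physical_addr)


def relocated_copy_addr(base: int, init_size: int, image_end: int) -> int:
    """Where the compressed image is copied: base + init_size - _end."""
    return base + init_size - image_end


if __name__ == "__main__":
    base = decompress_base(0x100000, 0x200000, 0x1000000)
    # Hypothetical sizes: init_size = 0x2000000, _end = 0x900000.
    print(hex(base), hex(relocated_copy_addr(base, 0x2000000, 0x900000)))
```

Copying the compressed image to the end of the `init_size` window leaves room below it for the decompressed kernel to be written starting at the base address.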
  • 48. Compressed vmlinux: High-level Overview (3/10) Why relocation • Base address of 32-bit Linux kernel entry point: 0x100000 • Default base address of Linux kernel: CONFIG_PHYSICAL_START=0x1000000 • Use Case • kdump: a rescue kernel is loaded to a different address • PIE (Position Independent Executable) and PIC (Position Independent Code)
  • 49. Compressed vmlinux: startup_32: 32-bit entry point (4/10) 1 1
  • 50. Compressed vmlinux: startup_32 (5/10) 1 1 Get the loading address
  • 52. Compressed vmlinux: startup_32: Init 4-level page table (7/10) Sign-extend Page Map Level-4 Offset Page Directory Pointer Offset Page Directory Offset Physical Page Offset 0 30 21 39 20 38 29 47 48 63 PML4E #0 PDPTE #3 Data Page Map Level-4 Table Page Directory Pointer Table Page Directory Table 40 9 9 9 Linear Address CR3 PDPTE #2 PDPTE #1 PDPTE #0 PDE #1535 PDE #1024 . . PDE #2047 PDE #1536 . . PDE #511 PDE #0 . . PDE #1023 PDE #512 . . 2MBbyte Physical Page 40 40 31 21 [Paging] Identity mapping for 0-4GB memory space
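The identity mapping on this slide (one PML4 entry, four PDPT entries, 2048 page-directory entries of 2 MB each) can be modelled in a few lines. This is a simplified model, not the assembly in head_64.S: the upper two table levels are represented only by index checks, the flag bits are reduced to present, writable and page-size, and all names are illustrative.

```python
PAGE_2M = 1 << 21
PRESENT_RW = 0x3          # present + writable
PAGE_SIZE_BIT = 1 << 7    # PS: the PDE maps a 2 MB page directly


def split(addr: int):
    """Split a linear address into PML4, PDPT, PD indices and page offset."""
    return ((addr >> 39) & 0x1FF, (addr >> 30) & 0x1FF,
            (addr >> 21) & 0x1FF, addr & (PAGE_2M - 1))


def build_identity_pdes():
    """4 page directories x 512 entries, entry i maps physical i * 2 MB."""
    return [(i * PAGE_2M) | PRESENT_RW | PAGE_SIZE_BIT for i in range(4 * 512)]


def translate(pdes, addr: int) -> int:
    """Walk the simplified tables; only PML4E #0 and PDPTE #0..#3 exist."""
    pml4, pdpt, pd, offset = split(addr)
    if pml4 != 0 or pdpt > 3:
        raise ValueError("address outside the 0-4 GB identity map")
    pde = pdes[pdpt * 512 + pd]
    return (pde & ~(PAGE_2M - 1)) + offset


if __name__ == "__main__":
    pdes = build_identity_pdes()
    print(len(pdes), hex(translate(pdes, 0x100200)))
```

Because every entry maps the physical page with the same number as its index, any linear address below 4 GB translates to itself.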
  • 53. Compressed vmlinux: startup_32: Init 4-level page table (8/10) Reference: Section 4.1 “PAGING MODES AND CONTROL BITS”, Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C & 3D): System Programming Guide
  • 54. Compressed vmlinux: startup_32: Init 4-level page table (9/10)
  • 55. Compressed vmlinux: far return to startup_64 (10/10) rva(startup_64) = 0x200 ebp = 0x100000 eax = 0x100000 + 0x200 = 0x100200
  • 56. Compressed vmlinux: startup_64 2 3 Why to reload CS? (Commit “34bb49229f19”) When the pre-decompression code loads its first GDT in startup_64, it is still running on the CS value of the previous GDT. In the case of SEV-ES this is the EFI GDT. It can be anything depending on what has loaded the kernel (EFI, legacy boot code, container runtime, etc.)
  • 57. Compressed vmlinux: [.text] .Lrelocated (1/5) 4 5 Why to call initialize_identity_maps()?
  • 58. Compressed vmlinux: [.text] .Lrelocated (2/5) 4 5 Why to map boot_params and command line?
  • 59. Compressed vmlinux: parse_elf (3/5) 4 ELF Header 0x1000000 decompressed vmlinux.bin.bz (vmlinux.bin – ELF format) program headers program header #0 (.text, .rodata, .pci_fixup….) 0x1200000 program header #1 (.data .vvar) program header #2 (.init.text .altinstr_aux …) 0x1a00000 0x1ac2000 program header #3 (.notes) 0x18886b0 0x1000000 program header #0 (.text, .rodata, .pci_fixup….) 0x1800000 program header #1 (.data .vvar) program header #2 (.init.text .altinstr_aux …) 0x18c2000 Physical memory Physical memory
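What parse_elf does with the program headers can be sketched as follows: each loadable segment is moved from its file offset to its physical address, shifted by the difference between the actual and the default load address. The code is a simplified model, not the kernel function; `load_segments` and `tiny_elf` are hypothetical names, and the two-segment ELF image is synthetic, with physical addresses chosen to echo the slide.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")           # Elf64_Phdr, 56 bytes


def load_segments(image: bytes, load_physical_addr: int, base: int) -> dict:
    """Return {destination address: bytes} for every PT_LOAD segment."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    fields = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    placed = {}
    for i in range(e_phnum):
        (p_type, _flags, p_offset, _vaddr, p_paddr,
         p_filesz, _memsz, _align) = PHDR.unpack_from(
            image, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        dest = base + (p_paddr - load_physical_addr)
        placed[dest] = image[p_offset:p_offset + p_filesz]
    return placed


def tiny_elf() -> bytes:
    """Synthetic ELF64 image with two 4-byte PT_LOAD segments."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    data_off = EHDR.size + 2 * PHDR.size
    ehdr = EHDR.pack(ident, 2, 62, 1, 0x1000000, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, 2, 0, 0, 0)
    ph0 = PHDR.pack(PT_LOAD, 5, data_off, 0, 0x1000000, 4, 4, 0x200000)
    ph1 = PHDR.pack(PT_LOAD, 6, data_off + 4, 0, 0x1800000, 4, 4, 0x200000)
    return ehdr + ph0 + ph1 + b"TEXTDATA"


if __name__ == "__main__":
    for dest, data in load_segments(tiny_elf(), 0x1000000, 0x1000000).items():
        print(hex(dest), data)
```

Passing a `base` different from `load_physical_addr` shifts every segment by the same amount, which is the case a relocated kernel relies on.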
  • 60. Compressed vmlinux: handle_relocations (4/5) 4 CONFIG_RELOCATABLE • Retain relocation information (generate .rel.* or .rela.* sections) when building a kernel image, so it can be loaded someplace besides the default address (CONFIG_PHYSICAL_START = 16MB). • Use case: kdump kernel (recovery kernel) handle_relocations() - Relocation if CONFIG_X86_NEED_RELOCS is set • Depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE) • Scan relocation tables (.rel.* or .rela.* sections) for symbol relocation
  • 61. Compressed vmlinux: handle_relocations (5/5) 4 vmlinux.bin.bz vmlinux.bin vmlinux.relocs handle_relocations(): Perform relocation backwards from the end of the decompressed vmlinux 64-bit relocation address 0 32-bit relocation address 0 objcopy options: -R section_name: Remove any section matching section_name; -S or --strip-all: Do not copy relocation and symbol information from the source file
  • 62. Recap setup.bin (arch/x86/boot/setup.bin) Compressed vmlinux (Protected-mode kernel) Note ELF: arch/x86/boot/compressed/vmlinux Binary: arch/x86/boot/vmlinux.bin CRC bzImage
  • 63. [More info] bzImage = vmlinuz On a physical machine Source code: arch/x86/boot/Makefile, arch/x86/boot/install.sh
  • 64. Reference • The Linux/x86 Boot Protocol, Documentation/x86/boot.rst • Intel® 64 and IA-32 Architectures Software Developer’s Manual • https://wdv4758h.github.io/notes/blog/linux-kernel-boot.html • Linux insides, https://0xax.gitbooks.io/linux-insides/content/
  • 66. gdb: Preparation for debugging real-mode of Linux kernel (1/2) Github: https://github.com/AdrianHuang/gdb-linux-real-mode
  • 67. gdb: Preparation for debugging real-mode of Linux kernel (2/2) Github: https://github.com/AdrianHuang/gdb-linux-real-mode
  • 68. initialize_identity_maps x86_mapping_info void *(*alloc_pgt_page)(void *) void *context unsigned long page_flag unsigned long offset alloc_pgt_data unsigned char *pgt_buf unsigned long pgt_buf_size unsigned long pgt_buf_offset bool direct_gbpages unsigned long kernpg_flag
  • 69. UEFI booting flow – EFI boot stub: Entry point AddressOfEntryPoint (efi_pe_entry): 0x18d84a ImageBase = 0x1000000 Physical address of AddressOfEntryPoint = 0x1000000 + 0x18d84a = 0x118d84a
  • 70. UEFI booting flow – EFI Handover protocol
  • 71. UEFI booting flow – EFI Handover protocol
  • 72. UEFI booting flow – EFI Handover protocol Where does the boot loader load bzImage?