COMP9242 
Advanced Operating Systems 
S2/2014 Week 1: 
Introduction to seL4 
@GernotHeiser, NICTA and UNSW 
COMP9242 S2/2014 W01
Copyright Notice 
These slides are distributed under the Creative Commons 
Attribution 3.0 License 
• You are free: 
– to share—to copy, distribute and transmit the work 
– to remix—to adapt the work 
• under the following conditions: 
– Attribution: You must attribute the work (but not in any way that 
suggests that the author endorses you or your use of the work) as 
follows: 
• “Courtesy of Gernot Heiser, [Institution]”, where [Institution] is one of 
“UNSW” or “NICTA” 
The complete license text can be found at 
http://creativecommons.org/licenses/by/3.0/legalcode
©2011 Gernot Heiser UNSW/NICTA. Distributed under Creative Commons 2 Attribution License 
Monolithic Kernels vs Microkernels 
• Idea of microkernel: 
– Flexible, minimal platform 
– Mechanisms, not policies 
– Goes back to Nucleus [Brinch Hansen, CACM’70] 
[Figure: monolithic kernel vs microkernel. Monolithic: applications in user mode enter the kernel by syscall; the kernel-mode layers are VFS; IPC, file system; scheduler, virtual memory; device drivers, dispatcher; all on top of the hardware. Microkernel: applications, Unix server, file server and device driver all run in user mode and communicate by IPC; only IPC and virtual memory remain in kernel mode on top of the hardware.]
Microkernel Evolution 
First generation
• Eg Mach [’87]
• In kernel: memory objects, low-level FS, swapping, devices, kernel memory, scheduling, IPC, MMU abstr.
• 180 syscalls
• 100 kLOC
• 100 μs IPC
Second generation
• Eg L4 [’95]
• In kernel: kernel memory, scheduling, IPC, MMU abstr.
• ~7 syscalls
• ~10 kLOC
• ~1 μs IPC
Third generation
• seL4 [’09]
• In kernel: scheduling, IPC, MMU abstr.
• Memory-management library (user level)
• ~3 syscalls
• 9 kLOC
• 0.2–1 μs IPC
2nd-Generation Microkernels 
• 1st-generation kernels (Mach, Chorus) were a failure 
– Complex, inflexible, slow 
• L4 was first 2G microkernel [Liedtke, SOSP’93, SOSP’95] 
– Radical simplification & manual micro-optimisation 
– “A concept is tolerated inside the microkernel only if moving it outside 
the kernel, i.e. permitting competing implementations, would prevent the 
implementation of the system’s required functionality.” 
– High IPC performance 
• Family of L4 kernels: 
– Original Liedtke (GMD) assembler kernel (‘95) 
– Family of kernels developed by Dresden, UNSW/NICTA, Karlsruhe 
– Commercial clones (PikeOS, P4, CodeZero, …) 
– Influenced commercial QNX (‘82), Green Hills Integrity (‘90s) 
– Generated NICTA startup Open Kernel Labs (OK Labs) 
• large-scale commercial deployment (multiple billions shipped) 
L4: A Family of Microkernels 
[Figure: L4 family tree on a 1993–2013 timeline, with arrows for API inheritance and code inheritance. Kernels shown: L3 → L4, “X”, Hazelnut, Pistachio, L4/MIPS, L4/Alpha, L4-embed., seL4, OKL4 μKernel, OKL4 Microvisor, Codezero, Fiasco, Fiasco.OC, NOVA, P4 → PikeOS. Legend: GMD/IBM/Karlsruhe, UNSW/NICTA, Dresden, Commercial Clone, OK Labs.]
Issues of 2G Microkernels 
• L4 solved performance issue [Härtig et al, SOSP’97] 
• Left a number of security issues unsolved 
• Problem: ad-hoc approach to protection and resource management 
– Global thread name space ⇒ covert channels [Shapiro’03] 
– Threads as IPC targets ⇒ insufficient encapsulation 
– Single kernel memory pool ⇒ DoS attacks 
– Insufficient delegation of authority ⇒ limited flexibility, performance 
• Addressed by seL4 
– Designed to support safety- and security-critical systems 
seL4 Principles 
• Single protection mechanism: capabilities 
– Except for time!
• All resource-management policy at user level 
– Painful to use 
– Need to provide standard memory-management library 
• Results in L4-like programming model 
• Suitable for formal verification (proof of implementation correctness) 
– Attempted since ‘70s 
– Finally achieved by L4.verified project at NICTA [Klein et al, SOSP’09] 
seL4 Concepts 
• Capabilities (Caps) 
– mediate access 
• Kernel objects: 
– Threads (thread-control blocks, TCBs) 
– Address spaces (page table objects, PDs, PTs) 
– IPC endpoints (EPs, AsyncEPs) 
– Capability spaces (CNodes) 
– Frames 
– Interrupt objects 
– Untyped memory 
• System calls 
– Send, Wait (and variants) 
– Yield 
Capabilities (Caps) 
• Token representing privileges [Dennis & Van Horn, ‘66] 
– Cap = “prima facie evidence of right to perform operation(s)” 
• Object-specific ⇒ fine-grained access control 
– Cap identifies object ⇒ is an (opaque) object name 
– Leads to object-oriented API: 
err = method( cap, args ); 
– Privilege check at invocation time 
• Caps were used in microkernels before 
– KeyKOS (‘85), Mach (’87) 
– EROS (‘99): first well-performing cap system 
– OKL4 V2.1 (’08): first cap-based L4 kernel 
seL4 Capabilities 
• Stored in cap space (CSpace) 
– Kernel object made up of CNodes 
– each an array of cap “slots” 
• Inaccessible to userland 
– But referred to by pointers into CSpace (slot addresses) 
– These CSpace addresses are called CPTRs 
• Caps convey specific privilege (access rights) 
– Read, Write, Grant (cap transfer) [Yes, there should be Execute!] 
• Main operations on caps: 
– Invoke: perform operation on object referred to by cap 
• Possible operations depend on object type 
– Copy/Mint/Grant: create copy of cap with same/lesser privilege 
– Move/Mutate: transfer to different address with same/lesser privilege 
– Delete: invalidate slot 
• Only affects object if last cap is deleted 
– Revoke: delete any derived (eg. copied or minted) caps 
Inter-Process Communication (IPC) 
• Fundamental microkernel operation 
– Kernel provides no services, only mechanisms 
– OS services provided by (protected) user-level server processes 
– invoked by IPC 
• seL4 IPC uses a handshake through endpoints: 
– Transfer points without storage capacity 
– Message must be transferred instantly 
• One partner may have to block 
• Single copy user ➞ user by kernel 
• Two endpoint types: 
– Synchronous (Endpoint) and asynchronous (AsyncEP) 
[Figure: a client sends and a server receives an IPC message through seL4]
Synchronous Endpoint 
• Threads must rendez-vous for message transfer 
– One side blocks until the other is ready 
– Implicit synchronisation 
• Message copied from sender’s to receiver’s message registers 
– Message is combination of caps and data words 
• presently max 121 words (484B, incl message “tag”) 
[Figure: timelines of Thread1 and Thread2 alternating between Running and Blocked as they rendezvous: Send (ep1_cap, …) pairs with Wait (ep1_cap, …), then Send (ep2_cap, …) pairs with Wait (ep2_cap, …)]
Asynchronous Endpoint 
• Avoids blocking 
– send OR-s cap badge to AEP’s data word 
– no caps can be sent 
• Receiver can poll or wait 
– waiting returns and clears data word 
– polling just returns data word 
• Similar to interrupt (with small payload, like interrupt mask) 
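The data-word behaviour described above can be sketched as a toy model in plain C. This is not seL4 code: `async_ep_t`, `aep_notify` and `aep_wait` are hypothetical names introduced for illustration, the word is assumed to be 32 bits wide, and blocking is left out (a real Wait would block while the word is 0).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of an asynchronous endpoint: a single data word. */
typedef struct {
    uint32_t word;   /* accumulated badges; 0 means "nothing pending" */
} async_ep_t;

/* Notify: OR the sender's badge into the data word; never blocks. */
static void aep_notify(async_ep_t *aep, uint32_t badge)
{
    aep->word |= badge;
}

/* Wait, minus the blocking: return the data word and clear it.
 * A real Wait would block while the word is 0; this model returns 0. */
static uint32_t aep_wait(async_ep_t *aep)
{
    uint32_t w = aep->word;
    aep->word = 0;
    return w;
}
```

Because badges are OR-ed together, two notifications with the same badge are indistinguishable from one, which is why the slide compares the payload to an interrupt mask.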
[Figure: timelines of Thread1 and Thread2 using an asynchronous endpoint: w = Poll (ep_cap, …) and w = Wait (ep_cap, …) on the receiving side, two Send (ep_cap, …) calls on the sending side]
Receiving from Sync and Async Endpoints 
Server with synchronous and asynchronous interface 
• Example: file system 
– synchronous (RPC-style) client protocol 
– asynchronous notifications from driver 
• Could have separate threads waiting on endpoints 
– forces multi-threaded server, concurrency control 
• Alternative: allow single thread to wait on both EP types 
– Mechanism: 
• AsyncEP is bound to thread with BindAEP() syscall 
• thread waits on synchronous endpoint 
• async message is delivered as if the thread had been waiting on the AsyncEP
[Figure: a server receiving from both a client and a driver]
Sync Endpoints are Message Queues 
• EP has no sense of direction 
• May queue senders or receivers 
– never both at the same time! 
• Communication needs 2 EPs! 
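The queueing rule above (an endpoint holds senders or receivers, never both) can be sketched as a small state machine. This is a toy model under stated assumptions, not the kernel's implementation: `endpoint_t`, `ep_send`, `ep_recv` and the fixed queue size are all hypothetical, and threads are represented by plain integer ids.

```c
#include <assert.h>

#define EP_MAX_QUEUE 8

typedef enum { EP_IDLE, EP_SENDERS, EP_RECEIVERS } ep_state_t;

typedef struct {
    ep_state_t state;            /* which kind of thread is queued, if any */
    int queue[EP_MAX_QUEUE];     /* circular FIFO of blocked thread ids */
    int head, count;
} endpoint_t;

static void ep_enqueue(endpoint_t *ep, int tid)
{
    assert(ep->count < EP_MAX_QUEUE);
    ep->queue[(ep->head + ep->count) % EP_MAX_QUEUE] = tid;
    ep->count++;
}

static int ep_dequeue(endpoint_t *ep)
{
    assert(ep->count > 0);
    int tid = ep->queue[ep->head];
    ep->head = (ep->head + 1) % EP_MAX_QUEUE;
    if (--ep->count == 0)
        ep->state = EP_IDLE;
    return tid;
}

/* Returns the id of the partner the message is handed to,
 * or -1 if the caller has to block (it is now queued). */
static int ep_invoke(endpoint_t *ep, int tid, ep_state_t me, ep_state_t partner)
{
    if (ep->state == partner)
        return ep_dequeue(ep);   /* rendezvous with the oldest waiter */
    ep->state = me;              /* idle, or same direction already queued */
    ep_enqueue(ep, tid);
    return -1;
}

static int ep_send(endpoint_t *ep, int tid)
{
    return ep_invoke(ep, tid, EP_SENDERS, EP_RECEIVERS);
}

static int ep_recv(endpoint_t *ep, int tid)
{
    return ep_invoke(ep, tid, EP_RECEIVERS, EP_SENDERS);
}
```

The single `state` field is what enforces "never both at the same time": a thread is only ever queued when no partner of the opposite direction is waiting.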
[Figure: inside the kernel, an EP with TCB1 and TCB2 queued on it. The first invocation (Client1) queues the caller; further callers of the same direction (Client2) queue behind. The server receives from the EP.]
Client-Server Communication 
• Asymmetric relationship: 
– Server widely accessible, clients not 
– How can server reply back to client (distinguish between them)? 
• Client can pass (session) reply cap in first request 
– server needs to maintain session state 
– forces stateful server design 
• seL4 solution: Kernel provides single-use reply cap 
– only for Call operation (Send+Wait) 
– allows server to reply to client 
– cannot be copied/minted/re-used but can be moved 
– one-shot (automatically destroyed after first use) 
[Figure: Client1 and Client2 both communicating with one server]
Call RPC Semantics 
[Figure: message sequence of a Call between client, kernel and server]
• Client: Call(ep,…)
• Kernel: mint rep, deliver to server
• Server: Wait(ep,&rep) returns, process, Send(rep,…)
• Kernel: deliver to client, destroy rep
• Client: process
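The one-shot nature of the reply cap in this sequence can be sketched as a toy model. The names `reply_slot_t`, `deliver_call` and `do_reply` are hypothetical and only illustrate the rule; the model assumes one reply-cap slot per server thread, which matches the later remark that the reply cap is overwritten by the next receive.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one reply-cap slot per server thread. */
typedef struct {
    bool valid;    /* is a reply cap currently present? */
    int  client;   /* thread blocked waiting for the reply */
} reply_slot_t;

/* Kernel side of Call: create the reply cap while delivering the request.
 * Any reply cap still in the slot is overwritten. */
static void deliver_call(reply_slot_t *slot, int client)
{
    slot->valid = true;
    slot->client = client;
}

/* Kernel side of Reply: succeeds at most once per Call.
 * Returns the client that is resumed, or -1 if no reply cap is present. */
static int do_reply(reply_slot_t *slot)
{
    if (!slot->valid)
        return -1;
    slot->valid = false;   /* one-shot: destroyed on first use */
    return slot->client;
}
```

A server that receives a second request before replying to the first loses the first client's reply cap in this model, which is the situation that saving the reply cap into the CSpace avoids.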
Identifying Clients 
Stateful server serving multiple clients 
• Must respond to correct client
– Ensured by reply cap
• Must associate request with correct state
• Could use separate EP per client 
– endpoints are lightweight (16 B) 
– but requires mechanism to wait on a set of EPs (like select) 
• Instead, seL4 allows caps to the same EP to be individually marked (“badged”)
– server provides individually badged caps to clients 
– server tags client state with badge 
– kernel delivers badge to receiver on invocation of badged caps 
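On the server side, the badge scheme above amounts to a lookup from badge to per-client state. The following is a minimal sketch, assuming badges are small integers handed out by the server and that 0 is reserved to mean "unbadged"; `client_state_t`, `client_register` and `client_lookup` are hypothetical names, not part of any seL4 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS 4

typedef struct {
    int in_use;
    int requests_served;   /* stand-in for real per-client session state */
} client_state_t;

static client_state_t clients[MAX_CLIENTS];

/* Register a client and return the badge to mint into its endpoint cap.
 * Badges start at 1 so that 0 can mean "unbadged" (an assumption of this
 * sketch). Returns 0 when the table is full. */
static uint32_t client_register(void)
{
    for (uint32_t i = 0; i < MAX_CLIENTS; i++) {
        if (!clients[i].in_use) {
            clients[i].in_use = 1;
            clients[i].requests_served = 0;
            return i + 1;
        }
    }
    return 0;
}

/* Map the badge delivered with a message back to the client's state. */
static client_state_t *client_lookup(uint32_t badge)
{
    if (badge == 0 || badge > MAX_CLIENTS || !clients[badge - 1].in_use)
        return NULL;
    return &clients[badge - 1];
}
```

In a real server the value passed to `client_lookup` would be the badge returned by the receive call; since clients cannot forge badges, the lookup result can be trusted.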
[Figure: Client1 and Client2 invoke the same server, which keeps separate Client1 state and Client2 state]
IPC Mechanics: Virtual Registers 
• Like physical registers, virtual registers are thread state 
– context-switched by kernel 
– implemented as physical registers or thread-local memory location 
• Message registers 
– contain message transferred in IPC 
– architecture-dependent subset mapped to physical registers 
• 5 on ARM, 3 on x86 
– library interface hides details 
– 1st message register is special, contains message tag 
• Reply cap 
– overwritten by next receive! 
– can move to CSpace with cspace_save_reply_cap() 
IPC Message Format 
[Figure: layout of an IPC message]
• Tag, consisting of:
– Label: meaning defined by IPC protocol (kernel or user)
– Caps unwrapped: bitmap indicating caps which had badges extracted
– # Caps: caps sent or received
– Msg length
• Message: raw data
• Caps (on Send) / Badges (on Receive)
• CSpace reference for receiving caps (Receive only)
Note: Don’t need to deal with this explicitly for project
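The four tag fields can be pictured as bit fields packed into one 32-bit word. The sketch below assumes the widths used on 32-bit platforms (label 20 bits, caps unwrapped 3 bits, number of caps 2 bits, length 7 bits) and the argument order of `seL4_MessageInfo_new`; the `tag_*` helpers are hypothetical stand-ins, and the real accessors in libsel4 should be used in actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout, low bits first:
 * length (7 bits), # caps (2 bits), caps unwrapped (3 bits), label (20 bits). */
#define TAG_LENGTH_BITS    7
#define TAG_NCAPS_BITS     2
#define TAG_UNWRAPPED_BITS 3
#define TAG_LABEL_BITS     20

#define TAG_NCAPS_SHIFT     TAG_LENGTH_BITS
#define TAG_UNWRAPPED_SHIFT (TAG_NCAPS_SHIFT + TAG_NCAPS_BITS)
#define TAG_LABEL_SHIFT     (TAG_UNWRAPPED_SHIFT + TAG_UNWRAPPED_BITS)

/* Same argument order as seL4_MessageInfo_new(label, unwrapped, ncaps, length). */
static uint32_t tag_new(uint32_t label, uint32_t unwrapped,
                        uint32_t ncaps, uint32_t length)
{
    assert(label < (1u << TAG_LABEL_BITS));
    assert(unwrapped < (1u << TAG_UNWRAPPED_BITS));
    assert(ncaps < (1u << TAG_NCAPS_BITS));
    assert(length < (1u << TAG_LENGTH_BITS));
    return (label << TAG_LABEL_SHIFT) | (unwrapped << TAG_UNWRAPPED_SHIFT)
         | (ncaps << TAG_NCAPS_SHIFT) | length;
}

static uint32_t tag_label(uint32_t t)     { return t >> TAG_LABEL_SHIFT; }
static uint32_t tag_unwrapped(uint32_t t) { return (t >> TAG_UNWRAPPED_SHIFT) & ((1u << TAG_UNWRAPPED_BITS) - 1); }
static uint32_t tag_ncaps(uint32_t t)     { return (t >> TAG_NCAPS_SHIFT) & ((1u << TAG_NCAPS_BITS) - 1); }
static uint32_t tag_length(uint32_t t)    { return t & ((1u << TAG_LENGTH_BITS) - 1); }
```

With this layout, a tag for a one-word message with label 0 and no caps is simply the value 1.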
Client-Server IPC Example 
Client
seL4_MessageInfo_t tag = seL4_MessageInfo_new(0, 0, 0, 1);
seL4_SetTag(tag);      /* load into tag register */
seL4_SetMR(0, 1);      /* set message register #0 */
seL4_Call(server_c, tag);
Server
/* Allocate EP and retype */
seL4_Word addr = ut_alloc(seL4_EndpointBits);
err = cspace_ut_retype_addr(addr, seL4_EndpointObject,
                            seL4_EndpointBits, cur_cspace, &ep_cap);
/* Insert EP into CSpace; cap is badged 0 */
seL4_CPtr cap = cspace_mint_cap(dest, cur_cspace, ep_cap, seL4_AllRights,
                                seL4_CapData_MakeBadge(0));
…
seL4_Word badge;
seL4_MessageInfo_t msg = seL4_Wait(ep, &badge);
…
seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 0);
seL4_Reply(reply);     /* implicit use of reply cap */
Server Saving Reply Cap 
Server
seL4_Word addr = ut_alloc(seL4_EndpointBits);
err = cspace_ut_retype_addr(addr, seL4_EndpointObject,
                            seL4_EndpointBits, cur_cspace, &ep_cap);
seL4_CPtr cap = cspace_mint_cap(dest, cur_cspace, ep_cap, seL4_AllRights,
                                seL4_CapData_MakeBadge(0));
…
seL4_Word badge;
seL4_MessageInfo_t msg = seL4_Wait(ep, &badge);
seL4_CPtr slot = cspace_save_reply_cap(cur_cspace);   /* save reply cap in CSpace */
…
seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 0);
seL4_Send(slot, reply);       /* explicit use of reply cap */
cspace_free_cslot(slot);      /* reply cap no longer valid */
IPC Operations Summary 
• Send (ep_cap, …), Wait (ep_cap, …), Wait (aep_cap, …) 
– blocking message passing 
– needs Write, Read permission, respectively 
• NBSend (ep_cap, …) 
– discard message if receiver isn’t ready 
• Call (ep_cap, …) 
– equivalent to Send (ep_cap,…) + reply-cap + Wait (ep_cap,…) 
• Reply (…) 
– equivalent to Send (rep_cap, …) 
• ReplyWait (ep_cap, …) 
– equivalent to Reply (…) + Wait (ep_cap, …) 
– purely for efficiency of server operation 
• Notify (aep_cap, …), Poll (aep_cap, …) 
– non-blocking send / check for message on AsyncEP 
No failure notification where this reveals info on other entities!
⇒ Need error handling protocol!
Derived Capabilities 
• Badging is an example of capability derivation 
• The Mint operation creates a new, less powerful cap 
– Can add a badge 
• Mint (cap, badge) ➞ badged cap
– Can strip access rights 
• eg WR➞R/O 
• Granting transfers caps over an Endpoint 
– Delivers copy of sender’s cap(s) to receiver 
• reply caps are a special case of this 
– Sender needs Endpoint cap with Grant permission 
– Receiver needs Endpoint cap with Write permission 
• else Write permission is stripped from new cap 
• Retyping 
– Fundamental operation of seL4 memory management 
– Details later… 
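The derivation rule for Mint (a new cap that is never more powerful than its source) can be sketched as follows. This is a toy model: `cap_t`, `cap_mint` and the `RIGHT_*` constants are hypothetical, and the rule that an already badged cap cannot be given a different badge is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { RIGHT_READ = 1, RIGHT_WRITE = 2, RIGHT_GRANT = 4 };

typedef struct {
    int      object;   /* which kernel object the cap names */
    unsigned rights;   /* subset of RIGHT_* */
    uint32_t badge;    /* 0 = unbadged */
} cap_t;

/* Derive a new cap from src. Rights can only be removed, never added.
 * Re-badging an already badged cap is refused (assumed rule).
 * Returns false on failure and leaves *dest untouched. */
static bool cap_mint(const cap_t *src, unsigned rights, uint32_t badge,
                     cap_t *dest)
{
    if (src->badge != 0 && badge != 0)
        return false;
    dest->object = src->object;
    dest->rights = src->rights & rights;   /* strip, never extend */
    dest->badge  = badge ? badge : src->badge;
    return true;
}
```

Masking with the source's rights is what makes the result "less powerful": asking for Write on a cap minted from a read-only cap silently yields a read-only cap.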
seL4 System Calls 
• Notionally, seL4 has 6 syscalls: 
– Yield(): invokes scheduler 
• only syscall which doesn’t require a cap! 
– Send(), Receive() and 3 variants/combinations thereof 
• Notify() is actually not a separate syscall but same as Send() 
– This is why I earlier said “approximately 3 syscalls” ☺ 
• All other kernel operations are invoked by “messaging” 
– Invoking Send()/Receive() on an object cap 
– Each object has a set of kernel protocols 
• operations encoded in message tag 
• parameters passed in message words 
– Mostly hidden behind “syscall” wrappers 
seL4 Memory Management Principles 
• Memory (and caps referring to it) is typed: 
– Untyped memory: 
• unused, free to Retype into something else 
– Frames: 
• (can be) mapped to address spaces, no kernel semantics 
– Rest: TCBs, address spaces, CNodes, EPs 
• used for specific kernel data structures 
• After startup, kernel never allocates memory! 
– All remaining memory made Untyped, handed to initial address space 
• Space for kernel objects must be explicitly provided to kernel 
– Ensures strong resource isolation 
• Extremely powerful tool for shooting oneself in the foot! 
– We hide much of this behind the cspace and ut allocation libraries 
Capability Derivation 
• Copy, Mint, Mutate, Revoke are invoked on CNodes 
Mint(cnode_cap, dest, src, rights, badge)
– CNode cap must provide appropriate rights 
• Copy takes a cap for destination 
– Allows copying of caps between CSpaces 
– Alternative to granting via IPC (if you have privilege to access CSpace!)
Cspace Operations 
extern cspace_t * cspace_create(int levels); /* either 1 or 2 level */ 
extern cspace_err_t cspace_destroy(cspace_t *c); 
extern seL4_CPtr cspace_copy_cap(cspace_t *dest, cspace_t *src, 
seL4_CPtr src_cap, seL4_CapRights rights); 
extern seL4_CPtr cspace_mint_cap(cspace_t *dest, cspace_t *src, 
seL4_CPtr src_cap, seL4_CapRights rights, 
seL4_CapData badge); 
extern seL4_CPtr cspace_move_cap(cspace_t *dest, cspace_t *src, 
seL4_CPtr src_cap); 
extern cspace_err_t cspace_delete_cap(cspace_t *c, seL4_CPtr cap); 
extern cspace_err_t cspace_revoke_cap(cspace_t *c, seL4_CPtr cap);
cspace and ut libraries 
[Figure: a user-level OS personality on top of seL4. The OS personality makes library calls into the ut library (ut_alloc(), ut_free(), …), which manages a slab of Untyped, and into the cspace library (cspace_create(), cspace_destroy(), …), which wraps messy CSpace tree & slot management. The libraries in turn make system calls to seL4. Extend for own needs!]
seL4 Memory Management Approach 
[Figure: RAM divided among kernel data, a Global Resource Manager (GRM) with its GRM data, and two Resource Managers (RM), each with its own RM data and the address spaces it manages. Resources fully delegated, allows autonomous operation. Strong isolation, no shared kernel resources.]
Memory Management Mechanics: Retype 
[Figure: tree of Retype operations starting from untyped object UT0 and producing untyped objects UT1–UT4 and frames F0–F3. Operations shown: Retype (Untyped, 2^1), Retype (TCB, 2^n), Retype (Frame, 2^2), Retype (CNode, 2^m, 2^n). The frame caps carry r,w rights; Mint (r) derives a read-only cap, and Revoke() is applied to derived caps.]
seL4 Address Spaces (VSpaces) 
• Very thin wrapper around hardware page tables 
– Architecture-dependent 
– ARM and (32-bit) x86 are very similar 
• Page directories (PDs) map page tables, page tables (PTs) map pages
• A VSpace is represented by a PD object:
– Creating a PD (by Retype) creates the VSpace
– Deleting the PD deletes the VSpace
[Figure: page-table hierarchy built with PageTable_Map(PD) and Page_Map(PT)]
Address Space Operations 
• Each mapping has:
– virtual_address, phys_address, address_space and frame_cap
– address_space struct identifies the level 1 page_directory cap
– you need to keep track of (frame_cap, PD_cap, v_adr, p_adr)!
Sample code we provide:
seL4_Word frame_addr = ut_alloc(seL4_PageBits);
err = cspace_ut_retype_addr(frame_addr, seL4_ARM_Page,
                            seL4_ARM_PageBits, cur_cspace, &frame_cap);
/* pd_cap: cap to level 1 page table */
map_page(frame_cap, pd_cap, 0xA0000000, seL4_AllRights,
         seL4_ARM_Default_VMAttributes);
bzero((void *)0xA0000000, PAGESIZE);
seL4_ARM_Page_Unmap(frame_cap);
cspace_delete_cap(cur_cspace, frame_cap);
ut_free(frame_addr, seL4_PageBits);
Mapping Same Frame Twice: Shared Memory 
• Each mapping requires its own frame cap even for the same frame
seL4_CPtr new_frame_cap = cspace_copy_cap(cur_cspace, cur_cspace,
                                          existing_frame_cap,
                                          seL4_AllRights);
map_page(new_frame_cap, pd_cap, 0xA0000000, seL4_AllRights,
         seL4_ARM_Default_VMAttributes);
bzero((void *)0xA0000000, PAGESIZE);
seL4_ARM_Page_Unmap(existing_frame_cap);
cspace_delete_cap(cur_cspace, existing_frame_cap);
seL4_ARM_Page_Unmap(new_frame_cap);
cspace_delete_cap(cur_cspace, new_frame_cap);
ut_free(frame_addr, seL4_PageBits);
Memory Management Caveats 
• The object manager handles allocation for you 
• However, it is very simplistic; you need to understand how it works
• Simple rule (it’s buddy-based):
– Freeing an object of size n: you can allocate new objects <= size n
– Freeing 2 objects of size n does not mean that you can allocate an object of size 2n.
Object          Size on ARM (bytes)
Frame           2^12
Page directory  2^14
Endpoint        2^4
Cslot           2^4
TCB             2^9
Page table      2^10
• All kernel objects must be size aligned! 
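Both the alignment requirement and the "two frees of size n do not give you 2n" rule come down to simple address arithmetic. The sketch below shows the textbook buddy condition; `is_size_aligned` and `are_buddies` are hypothetical helpers, and whether the course's allocator merges blocks at all is not stated here, so treat this as an illustration of why merging can fail rather than a description of that allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An object of 2^size_bits bytes must start on a multiple of its size. */
static bool is_size_aligned(uintptr_t addr, unsigned size_bits)
{
    assert(size_bits < 8 * sizeof(uintptr_t));
    return (addr & (((uintptr_t)1 << size_bits) - 1)) == 0;
}

/* Two free blocks of 2^size_bits bytes can form one block of twice the
 * size only if they are buddies: both size aligned, and their addresses
 * differ in exactly the bit that corresponds to their size. */
static bool are_buddies(uintptr_t a, uintptr_t b, unsigned size_bits)
{
    return is_size_aligned(a, size_bits) && is_size_aligned(b, size_bits)
        && (a ^ b) == ((uintptr_t)1 << size_bits);
}
```

For example, the 4 KiB frames at 0x10001000 and 0x10002000 are adjacent, yet they cannot be merged into an 8 KiB object because the merged block would start at 0x10001000, which is not 8 KiB aligned.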
Memory Management Caveats 
• Objects are allocated by Retype() of Untyped memory by seL4 kernel
– The kernel will not allow you to overlap objects
– But debugging nightmare if you try!!
• ut_alloc() and ut_free() manage user-level’s view of Untyped allocation.
– Major pain if kernel and user’s view diverge
– TIP: Keep object’s address and CPtr together.
[Figure: Untyped Memory of 2^15 B retyped into 8 frames]
• Be careful with allocations!
• Don’t try to allocate all of physical memory as frames, as you need more memory for TCBs, endpoints etc.
• Your frametable will eventually integrate with ut_alloc to manage the 4K untyped size.
Threads 
• Threads are represented by TCB objects
• They have a number of attributes (recorded in TCB): 
– VSpace: a virtual address space 
• page directory reference 
• multiple threads can belong to the same VSpace 
– CSpace: capability storage 
• CNode reference (CSpace root) plus a few other bits 
– Fault endpoint 
• Kernel sends message to this EP if the thread throws an exception 
– IPC buffer (backing storage for virtual registers) 
– stack pointer (SP), instruction pointer (IP), user-level registers 
– Scheduling priority 
– Time slice length (presently a system-wide constant) 
• Yes, this is broken! (Will be fixed soon…) 
• These must be explicitly managed 
– … we provide an example you can modify 
Threads 
Creating a thread 
• Obtain a TCB object 
• Set attributes: Configure() 
– associate with VSpace, CSpace, fault EP, prio, define IPC buffer 
• Set SP, IP (and optionally other registers): WriteRegisters() 
– this results in a completely initialised thread 
– will be able to run if resume_target is set in call, else still inactive 
• Activated (made schedulable): Resume() 
Creating a Thread in Own AS and cspace_t 
static char stack[100];
int thread_fct() {
    while(1);
    return 0;
}
/* Allocate and map new frame for IPC buffer as before */
seL4_Word tcb_addr = ut_alloc(seL4_TCBBits);
err = cspace_ut_retype_addr(tcb_addr, seL4_TCBObject, seL4_TCBBits,
                            cur_cspace, &tcb_cap);
err = seL4_TCB_Configure(tcb_cap, FAULT_EP_CAP, PRIORITY,
                         cur_cspace->root_cnode, seL4_NilData,
                         seL4_CapInitThreadPD, seL4_NilData,
                         PROCESS_IPC_BUFFER, ipc_buffer_cap);
/* The stack grows downwards on ARM, so SP starts at the top of the array */
seL4_UserContext context = { .pc = (seL4_Word)thread_fct,
                             .sp = (seL4_Word)(stack + sizeof(stack)) };
seL4_TCB_WriteRegisters(tcb_cap, 1, 0, 2, &context);
If you use threads, write a library to create and destroy them.
Threads and Stacks 
• Stacks are completely user-managed, kernel doesn’t care! 
– Kernel only preserves SP, IP on context switch 
• Stack location, allocation, size must be managed by userland 
• Beware of stack overflow! 
– Easy to grow stack into other data 
• Pain to debug! 
– Take special care with automatic arrays! 
[Figure: Stack 1 and Stack 2 side by side in memory]
f () {
    int buf[10000];
    . . .
}
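Since the kernel leaves stack placement entirely to userland, the initial stack pointer has to be computed by hand. A minimal sketch, assuming a descending stack and the 8-byte stack alignment that the ARM procedure call standard requires at public interfaces; `initial_sp` is a hypothetical helper, not part of any seL4 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Initial SP for a descending stack occupying [base, base + size):
 * the end of the region, rounded down to a multiple of 8 bytes. */
static uintptr_t initial_sp(const void *base, size_t size)
{
    assert(size >= 8);
    return ((uintptr_t)base + size) & ~(uintptr_t)7;
}
```

The result would be written into the `sp` field of the thread's register context. Nothing in this computation protects against overflow: if the thread uses more than `size` bytes, it silently runs into whatever lies below `base`.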
Creating a Thread in New AS and cspace_t 
/* Allocate, retype and map new frame for IPC buffer as before
 * Allocate and map stack???
 * Allocate and retype a TCB as before
 * Allocate and retype a seL4_ARM_PageDirectoryObject of size seL4_PageDirBits
 * Mint a new badged cap to the syscall endpoint
 */
cspace_t *new_cspace = cspace_create(1);   /* either 1 or 2 level */
char *elf_base = cpio_get_file(_cpio_archive, "test")->p_base;
err = elf_load(new_pagedirectory_cap, elf_base);
unsigned int entry = elf_getEntryPoint(elf_base);
err = seL4_TCB_Configure(tcb_cap, FAULT_EP_CAP, PRIORITY,
                         new_cspace->root_cnode, seL4_NilData,
                         new_pagedirectory_cap, seL4_NilData,
                         PROCESS_IPC_BUFFER, ipc_buffer_cap);
/* SP must refer to the stack mapped in the new address space */
seL4_UserContext context = {.pc = entry, .sp = &stack};
seL4_TCB_WriteRegisters(tcb_cap, 1, 0, 2, &context);
seL4 Scheduling 
• Presently, seL4 uses 256 hard priorities (0–255) 
– Priorities are strictly observed 
– The scheduler will always pick the highest-prio runnable thread 
– Round-robin scheduling within prio level 
• Aim is real-time performance, not fairness 
– Kernel itself will never change the prio of a thread 
– Achieving fairness (if desired) is the job of user-level servers 
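The policy above (strict priorities, round robin within a level) can be sketched with one FIFO per priority. This is a toy model, not the kernel's scheduler: `run_queue_t`, `make_runnable` and `schedule` are hypothetical, the queue capacity is arbitrary, and the sketch assumes that 255 is the highest priority.

```c
#include <assert.h>

#define NUM_PRIOS    256
#define MAX_PER_PRIO 8

typedef struct {
    int tids[MAX_PER_PRIO];   /* circular FIFO of runnable thread ids */
    int head, count;
} run_queue_t;

static run_queue_t ready[NUM_PRIOS];

static void make_runnable(int tid, int prio)
{
    assert(prio >= 0 && prio < NUM_PRIOS);
    run_queue_t *q = &ready[prio];
    assert(q->count < MAX_PER_PRIO);
    q->tids[(q->head + q->count) % MAX_PER_PRIO] = tid;
    q->count++;
}

/* Pick the next thread: the head of the highest non-empty priority level.
 * The chosen thread is rotated to the tail of its level (round robin).
 * Returns -1 if nothing is runnable. */
static int schedule(void)
{
    for (int prio = NUM_PRIOS - 1; prio >= 0; prio--) {
        run_queue_t *q = &ready[prio];
        if (q->count == 0)
            continue;
        int tid = q->tids[q->head];
        q->head = (q->head + 1) % MAX_PER_PRIO;
        q->tids[(q->head + q->count - 1) % MAX_PER_PRIO] = tid;
        return tid;
    }
    return -1;
}
```

Note that a lower-priority thread is never chosen while any higher-priority thread is runnable; this is the "real-time performance, not fairness" trade-off, and starvation has to be handled at user level if it matters.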
[Figure: priority scale running from 0 to 255]
Exception Handling 
• A thread can trigger different kinds of exceptions: 
– invalid syscall 
• may require instruction emulation or result from virtualization 
– capability fault 
• cap lookup failed or operation is invalid on cap 
– page fault 
• attempt to access unmapped memory 
• may have to grow stack, grow heap, load dynamic library, … 
– architecture-defined exception 
• divide by zero, unaligned access, … 
• Results in kernel sending message to fault endpoint 
– exception protocol defines state info that is sent in message 
• Replying to this message restarts the thread 
– endless loop if you don’t remove the cause for the fault first! 
Exception Handling 
[Figure: exception handling between a thread (TCB), the kernel and the exception handler]
• Exception triggered. Kernel fakes message from thread to handler
• Handler performs appropriate action (e.g. map page).
• Handler replies to restart thread
• Kernel intercepts message and restarts thread
Interrupt Management 
• seL4 models IRQs as messages sent to an AEP 
– Interrupt handler has Receive cap on that AEP 
• 2 special objects used for managing and acknowledging interrupts: 
– Single IRQControl object 
• single IRQControl cap provided by kernel to initial VSpace 
• only purpose is to create IRQHandler caps 
– Per-IRQ-source IRQHandler object 
• interrupt association and dissociation 
• interrupt acknowledgment 
[Figure: invoking Get(usb) on IRQControl yields an IRQHandler]
Interrupt Handling 
• IRQHandler cap allows driver to bind AEP to interrupt 
• Afterwards: 
– AEP is used to receive interrupt 
– IRQHandler is used to acknowledge interrupt 
[Figure: SetEndpoint(aep) is invoked on the IRQHandler; the driver then uses Wait(aep) and Ack(handler)]
seL4_IRQHandler interrupt = cspace_irq_control_get_cap(cur_cspace,
                                                       seL4_CapIRQControl, irq_number);
seL4_IRQHandler_SetEndpoint(interrupt, async_ep_cap);
seL4_IRQHandler_Ack(interrupt);   /* ack first to unmask IRQ */
Device Drivers 
• Drivers do three things: 
– Handle interrupts (already explained) 
– Communicate with rest of OS (IPC + shared memory) 
– Access device registers 
• Device register access 
– Devices are memory-mapped on ARM 
– Have to find frame cap from bootinfo structure 
– Map the appropriate page in the driver’s VSpace 
device_vaddr = map_device(0xA0000000, (1 << seL4_PageBits));
…
/* Magic device register access: volatile, so the compiler performs every access */
*((volatile uint32_t *) device_vaddr) = …;
Where To Find More 
• UNSW Advanced Operating Systems Course
http://www.cse.unsw.edu.au/~cs9242
• NICTA Trustworthy Systems research
http://trustworthy.systems
• seL4 open-source portal
http://sel4.systems
• L4 Microkernel Headquarters
http://l4hq.org
• Gernot’s blog:
http://microkerneldude.wordpress.com/
• Gernot’s research home page:
http://ssrg.nicta.com.au/people/?cn=Gernot+Heiser

PPTX
L1 - Introduction to python Backend.pptx
PDF
System and Network Administraation Chapter 3
PDF
How Creative Agencies Leverage Project Management Software.pdf
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
Nekopoi APK 2025 free lastest update
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Essential Infomation Tech presentation.pptx
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
history of c programming in notes for students .pptx
Wondershare Filmora 15 Crack With Activation Key [2025
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Operating system designcfffgfgggggggvggggggggg
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Upgrade and Innovation Strategies for SAP ERP Customers
Adobe Premiere Pro 2025 (v24.5.0.057) Crack free
EN-Survey-Report-SAP-LeanIX-EA-Insights-2025.pdf
How to Choose the Right IT Partner for Your Business in Malaysia
L1 - Introduction to python Backend.pptx
System and Network Administraation Chapter 3
How Creative Agencies Leverage Project Management Software.pdf
2025 Textile ERP Trends: SAP, Odoo & Oracle
Nekopoi APK 2025 free lastest update
Design an Analysis of Algorithms I-SECS-1021-03
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus

seL4 intro

  • 1. COMP9242 Advanced Operating Systems S2/2014 Week 1: Introduction to seL4 @GernotHeiser, NICTA and UNSW COMP9242 S2/2014 W01
  • 2. Copyright Notice These slides are distributed under the Creative Commons Attribution 3.0 License • You are free: – to share—to copy, distribute and transmit the work – to remix—to adapt the work • under the following conditions: – Attribution: You must attribute the work (but not in any way that suggests that the author endorses you or your use of the work) as follows: • “Courtesy of Gernot Heiser, [Institution]”, where [Institution] is one of “UNSW” or “NICTA” The complete license text can be found at http://creativecommons.org/licenses/by/3.0/legalcode ©2011 Gernot Heiser UNSW/NICTA. Distributed under Creative Commons 2 Attribution License COMP9242 S2/2014 W01
  • 3. Monolithic Kernels vs Microkernels
      • Idea of microkernel:
          – Flexible, minimal platform
          – Mechanisms, not policies
          – Goes back to Nucleus [Brinch Hansen, CACM’70]
      • [Figure: monolithic kernel vs microkernel. Monolithic: the application runs in user mode and enters the kernel by syscall; VFS; IPC, file system; scheduler, virtual memory; device drivers, dispatcher all run in kernel mode above the hardware. Microkernel: application, Unix server, file server and device driver run in user mode and communicate by IPC; only IPC and virtual memory run in kernel mode above the hardware.]
  • 4. Microkernel Evolution
      • First generation, eg Mach [’87]
          – In kernel: memory objects; low-level FS, swapping; devices; kernel memory; scheduling; IPC, MMU abstr.
          – 180 syscalls
          – 100 kLOC
          – 100 μs IPC
      • Second generation, eg L4 [’95]
          – In kernel: kernel memory; scheduling; IPC, MMU abstr.
          – ~7 syscalls
          – ~10 kLOC
          – ~1 μs IPC
      • Third generation, seL4 [’09]
          – In kernel: scheduling; IPC, MMU abstr.
          – Memory-management library outside the kernel
          – ~3 syscalls
          – 9 kLOC
          – 0.2–1 μs IPC
  • 5. 2nd-Generation Microkernels
      • 1st-generation kernels (Mach, Chorus) were a failure
          – Complex, inflexible, slow
      • L4 was first 2G microkernel [Liedtke, SOSP’93, SOSP’95]
          – Radical simplification & manual micro-optimisation
          – “A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e. permitting competing implementations, would prevent the implementation of the system’s required functionality.”
          – High IPC performance
      • Family of L4 kernels:
          – Original Liedtke (GMD) assembler kernel (’95)
          – Family of kernels developed by Dresden, UNSW/NICTA, Karlsruhe
          – Commercial clones (PikeOS, P4, CodeZero, …)
          – Influenced commercial QNX (’82), Green Hills Integrity (’90s)
          – Generated NICTA startup Open Kernel Labs (OK Labs)
              • large-scale commercial deployment (multiple billions shipped)
  • 6. L4: A Family of Microkernels
      • [Figure: L4 family tree on a 1993–2013 timeline, showing API inheritance and code inheritance among L3 → L4, “X”, Hazelnut, Pistachio, L4/MIPS, L4/Alpha, L4-embed., OKL4 μKernel, OKL4 Microvisor, seL4, Fiasco, Fiasco.OC, NOVA, P4 → PikeOS and Codezero. Legend: GMD/IBM/Karlsruhe, UNSW/NICTA, Dresden, OK Labs, Commercial Clone.]
  • 7. Issues of 2G Microkernels
      • L4 solved performance issue [Härtig et al, SOSP’97]
      • Left a number of security issues unsolved
      • Problem: ad-hoc approach to protection and resource management
          – Global thread name space ⇒ covert channels [Shapiro’03]
          – Threads as IPC targets ⇒ insufficient encapsulation
          – Single kernel memory pool ⇒ DoS attacks
          – Insufficient delegation of authority ⇒ limited flexibility, performance
      • Addressed by seL4
          – Designed to support safety- and security-critical systems
  • 8. seL4 Principles
      • Single protection mechanism: capabilities
          – Except for time!
      • All resource-management policy at user level
          – Painful to use
          – Need to provide standard memory-management library
              • Results in L4-like programming model
      • Suitable for formal verification (proof of implementation correctness)
          – Attempted since ’70s
          – Finally achieved by L4.verified project at NICTA [Klein et al, SOSP’09]
  • 9. seL4 Concepts
      • Capabilities (Caps)
          – mediate access
      • Kernel objects:
          – Threads (thread-control blocks, TCBs)
          – Address spaces (page table objects, PDs, PTs)
          – IPC endpoints (EPs, AsyncEPs)
          – Capability spaces (CNodes)
          – Frames
          – Interrupt objects
          – Untyped memory
      • System calls
          – Send, Wait (and variants)
          – Yield
  • 10. Capabilities (Caps)
      • Token representing privileges [Dennis & Van Horn, ’66]
          – Cap = “prima facie evidence of right to perform operation(s)”
      • Object-specific ⇒ fine-grained access control
          – Cap identifies object ⇒ is an (opaque) object name
          – Leads to object-oriented API: err = method( cap, args );
          – Privilege check at invocation time
      • Caps were used in microkernels before
          – KeyKOS (’85), Mach (’87)
          – EROS (’99): first well-performing cap system
          – OKL4 V2.1 (’08): first cap-based L4 kernel
  • 11. seL4 Capabilities
      • Stored in cap space (CSpace)
          – Kernel object made up of CNodes, each an array of cap “slots”
      • Inaccessible to userland
          – But referred to by pointers into CSpace (slot addresses)
          – These CSpace addresses are called CPTRs
      • Caps convey specific privilege (access rights)
          – Read, Write, Grant (cap transfer) [Yes, there should be Execute!]
      • Main operations on caps:
          – Invoke: perform operation on object referred to by cap
              • Possible operations depend on object type
          – Copy/Mint/Grant: create copy of cap with same/lesser privilege
          – Move/Mutate: transfer to different address with same/lesser privilege
          – Delete: invalidate slot
              • Only affects object if last cap is deleted
          – Revoke: delete any derived (eg. copied or minted) caps
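The "same/lesser privilege" rule for derived caps can be illustrated with a small user-level model. This is a sketch only: the types, bit values and function names below are hypothetical and are not the seL4 API or its cap encoding. It assumes rights are a bitmask and that a derived cap gets the intersection of the source rights and the requested rights.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rights bits for this sketch (not the seL4 encoding). */
enum { RIGHT_READ = 1 << 0, RIGHT_WRITE = 1 << 1, RIGHT_GRANT = 1 << 2 };
#define RIGHTS_ALL (RIGHT_READ | RIGHT_WRITE | RIGHT_GRANT)

/* Hypothetical model of a capability: object name, rights and badge. */
typedef struct {
    uint32_t object_id; /* which object the cap names */
    uint32_t rights;    /* subset of RIGHTS_ALL */
    uint32_t badge;     /* 0 means unbadged in this model */
} model_cap_t;

/* Derive a cap with the same or lesser privilege: the new rights are the
 * intersection of the source rights and the requested mask, so a derived
 * cap can never gain a right its source lacks. */
model_cap_t model_mint(model_cap_t src, uint32_t rights_mask, uint32_t badge)
{
    assert((rights_mask & ~(uint32_t)RIGHTS_ALL) == 0);
    model_cap_t derived = src;
    derived.rights = src.rights & rights_mask;
    derived.badge = badge;
    return derived;
}

/* Privilege check at invocation time: all needed rights must be present. */
int model_cap_allows(model_cap_t cap, uint32_t needed)
{
    return (cap.rights & needed) == needed;
}
```

Minting a read-only cap from a full-rights cap and then asking for all rights again still yields a read-only cap, which is the property the slide describes.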
  • 12. Inter-Process Communication (IPC)
      • Fundamental microkernel operation
          – Kernel provides no services, only mechanisms
          – OS services provided by (protected) user-level server processes
          – invoked by IPC
      • seL4 IPC uses a handshake through endpoints:
          – Transfer points without storage capacity
          – Message must be transferred instantly
              • One partner may have to block
              • Single copy user ➞ user by kernel
      • Two endpoint types:
          – Synchronous (Endpoint) and asynchronous (AsyncEP)
      • [Figure: client sends and server receives via seL4 IPC]
  • 13. Synchronous Endpoint
      • Threads must rendezvous for message transfer
          – One side blocks until the other is ready
          – Implicit synchronisation
      • Message copied from sender’s to receiver’s message registers
          – Message is combination of caps and data words
              • presently max 121 words (484 B, incl message “tag”)
      • [Figure: timeline of Thread1 and Thread2 alternating between Running and Blocked as they execute Send (ep1_cap, …) / Wait (ep1_cap, …) and Send (ep2_cap, …) / Wait (ep2_cap, …)]
  • 14. Asynchronous Endpoint
      • Avoids blocking
          – send OR-s cap badge to AEP’s data word
          – no caps can be sent
      • Receiver can poll or wait
          – waiting returns and clears data word
          – polling just returns data word
      • Similar to interrupt (with small payload, like interrupt mask)
      • [Figure: timeline of Thread1 and Thread2 showing Send (ep_cap, …) on one side and w = Poll (ep_cap, …), w = Wait (ep_cap, …) on the other, with Running and Blocked states]
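The data-word behaviour of an asynchronous endpoint can be modelled in a few lines. This is a sketch with hypothetical names, not kernel code: it follows the slide's description (send ORs the badge in, wait returns and clears, poll just returns) and leaves out blocking entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an AsyncEP: a single data word. */
typedef struct { uint32_t word; } model_aep_t;

/* Non-blocking send: OR the sender's badge into the data word.
 * Several sends before a receive accumulate like bits in an interrupt mask. */
void model_aep_notify(model_aep_t *aep, uint32_t badge)
{
    assert(aep != 0);
    aep->word |= badge;
}

/* Poll: return the data word without clearing it, as the slide describes. */
uint32_t model_aep_poll(const model_aep_t *aep)
{
    assert(aep != 0);
    return aep->word;
}

/* Wait: return the data word and clear it. A real receiver would block
 * while the word is zero; the model does not simulate blocking. */
uint32_t model_aep_wait(model_aep_t *aep)
{
    assert(aep != 0);
    uint32_t w = aep->word;
    aep->word = 0;
    return w;
}
```

Giving each sender a badge with a distinct bit lets the receiver tell from one returned word which senders have signalled.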
  • 15. Receiving from Sync and Async Endpoints
      • Server with synchronous and asynchronous interface
          – Example: file system
              • synchronous (RPC-style) client protocol
              • asynchronous notifications from driver
      • Could have separate threads waiting on endpoints
          – forces multi-threaded server, concurrency control
      • Alternative: allow single thread to wait on both EP types
          – Mechanism:
              • AsyncEP is bound to thread with BindAEP() syscall
              • thread waits on synchronous endpoint
              • async message is delivered as if the thread had been waiting on the AsyncEP
      • [Figure: server receiving from a client and from a driver]
  • 16. Sync Endpoints are Message Queues
      • EP has no sense of direction
      • May queue senders or receivers
          – never both at the same time!
      • Communication needs 2 EPs!
      • [Figure: Client1 and Client2 invoke the server's EP in the kernel; their TCB1 and TCB2 are queued on the EP. First invocation queues caller; further callers of same direction queue behind.]
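The "senders or receivers, never both" rule can be sketched as a small state machine. The names below are hypothetical and the model only counts queued threads; it is not how the kernel stores its TCB queue.

```c
#include <assert.h>

/* Hypothetical endpoint states: the queue holds threads of one direction only. */
typedef enum { EP_IDLE, EP_SEND_QUEUED, EP_RECV_QUEUED } model_ep_state_t;

typedef struct {
    model_ep_state_t state; /* direction of the queued threads, if any */
    int queued;             /* number of threads blocked on the endpoint */
} model_ep_t;

/* A thread arrives at the endpoint. If a partner of the opposite direction
 * is already queued, the two rendezvous and the message is transferred
 * (returns 1). Otherwise the caller queues behind threads of its own
 * direction and blocks (returns 0). */
static int model_ep_arrive(model_ep_t *ep, model_ep_state_t own, model_ep_state_t partner)
{
    assert(ep != 0 && ep->queued >= 0);
    if (ep->state == partner) {
        if (--ep->queued == 0)
            ep->state = EP_IDLE;
        return 1;
    }
    ep->state = own;
    ep->queued++;
    return 0;
}

int model_ep_send(model_ep_t *ep) { return model_ep_arrive(ep, EP_SEND_QUEUED, EP_RECV_QUEUED); }
int model_ep_recv(model_ep_t *ep) { return model_ep_arrive(ep, EP_RECV_QUEUED, EP_SEND_QUEUED); }
```

Because an arriving thread either pairs with a waiting partner or joins a queue of its own kind, the model can never hold senders and receivers at once.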
  • 17. Client-Server Communication
      • Asymmetric relationship:
          – Server widely accessible, clients not
          – How can server reply back to client (distinguish between them)?
      • Client can pass (session) reply cap in first request
          – server needs to maintain session state
          – forces stateful server design
      • seL4 solution: Kernel provides single-use reply cap
          – only for Call operation (Send+Wait)
          – allows server to reply to client
          – cannot be copied/minted/re-used but can be moved
          – one-shot (automatically destroyed after first use)
      • [Figure: Client1 and Client2 communicating with one server]
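The one-shot and move-only properties of the reply cap can be pictured with a tiny model. All names here are hypothetical; in seL4 the kernel enforces these rules, whereas the sketch merely tracks a validity flag.

```c
#include <assert.h>

/* Hypothetical model of a reply cap: valid until used once. */
typedef struct {
    int valid;     /* 1 while the cap can still be used */
    int client_id; /* the caller this cap replies to */
} model_reply_cap_t;

/* Call: the kernel creates a fresh reply cap naming the caller. */
model_reply_cap_t model_call(int client_id)
{
    model_reply_cap_t rep = { 1, client_id };
    return rep;
}

/* Move: the cap is transferred to a new slot and the old slot becomes
 * empty, so there is never more than one usable copy. */
void model_reply_cap_move(model_reply_cap_t *src, model_reply_cap_t *dst)
{
    assert(src != 0 && dst != 0 && src != dst);
    *dst = *src;
    src->valid = 0;
}

/* Reply: returns the client that receives the reply, or -1 if the cap
 * was already consumed. The cap is destroyed by its first use. */
int model_reply(model_reply_cap_t *rep)
{
    assert(rep != 0);
    if (!rep->valid)
        return -1;
    rep->valid = 0;
    return rep->client_id;
}
```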
  • 18. Call RPC Semantics
      • [Figure: message sequence between client, kernel and server]
          – Server: Wait(ep,&rep)
          – Client: Call(ep,…)
          – Kernel: mint rep, deliver to server
          – Server: process, then Send(rep,…)
          – Kernel: deliver to client, destroy rep
          – Client: process
  • 19. Identifying Clients
      • Stateful server serving multiple clients
      • Must respond to correct client
          – Ensured by reply cap
      • Must associate request with correct state
      • Could use separate EP per client
          – endpoints are lightweight (16 B)
          – but requires mechanism to wait on a set of EPs (like select)
      • Instead, seL4 allows caps to the same EP to be individually marked (“badged”)
          – server provides individually badged caps to clients
          – server tags client state with badge
          – kernel delivers badge to receiver on invocation of badged caps
      • [Figure: Client1 and Client2 invoke one server, which keeps separate Client1 state and Client2 state]
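On the server side, badges reduce client identification to a table lookup. The sketch below uses hypothetical names and a fixed-size table; it assumes the server chooses the badge when it hands out a cap and that badge 0 is reserved for unbadged caps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_CLIENTS 8

/* Hypothetical per-client session state kept by the server. */
typedef struct {
    int in_use;
    uint32_t requests_served;
} model_client_state_t;

static model_client_state_t model_clients[MODEL_MAX_CLIENTS];

/* Register a client: pick a free slot and return the badge the server would
 * put on the cap it gives to that client. Badge = slot index + 1, so that
 * 0 can stand for "unbadged". Returns 0 if the table is full. */
uint32_t model_register_client(void)
{
    for (uint32_t i = 0; i < MODEL_MAX_CLIENTS; i++) {
        if (!model_clients[i].in_use) {
            model_clients[i].in_use = 1;
            model_clients[i].requests_served = 0;
            return i + 1;
        }
    }
    return 0;
}

/* Map the badge delivered with a received message to the client's state.
 * Returns NULL for an unbadged or unknown sender. */
model_client_state_t *model_lookup_client(uint32_t badge)
{
    if (badge == 0 || badge > MODEL_MAX_CLIENTS)
        return NULL;
    model_client_state_t *c = &model_clients[badge - 1];
    return c->in_use ? c : NULL;
}
```

The server never needs to trust anything in the message body to identify the sender, because the client cannot alter the badge on its cap.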
  • 20. IPC Mechanics: Virtual Registers
      • Like physical registers, virtual registers are thread state
          – context-switched by kernel
          – implemented as physical registers or thread-local memory location
      • Message registers
          – contain message transferred in IPC
          – architecture-dependent subset mapped to physical registers
              • 5 on ARM, 3 on x86
          – library interface hides details
          – 1st message register is special, contains message tag
      • Reply cap
          – overwritten by next receive!
          – can move to CSpace with cspace_save_reply_cap()
  • 21. IPC Message Format
      • Tag, with fields: Label, Caps unwrapped, # Caps, Msg Length
          – Label: meaning defined by IPC protocol (kernel or user)
          – Caps unwrapped: bitmap indicating caps which had badges extracted
          – # Caps: caps sent or received
      • Message: raw data
      • Caps (on Send) / Badges (on Receive)
      • CSpace reference for receiving caps (Receive only)
      • Note: Don’t need to deal with this explicitly for project
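The tag fields can be pictured as a packed 32-bit word. The sketch below is a model with hypothetical names; the field widths (20-bit label, 3-bit caps-unwrapped bitmap, 2-bit cap count, 7-bit length) are an assumption for a 32-bit platform and are not taken from the slides. The argument order mirrors the seL4_MessageInfo_new(label, caps unwrapped, # caps, length) calls used elsewhere in the deck.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the four tag fields into one word (assumed widths: 20/3/2/7 bits). */
uint32_t model_tag_new(uint32_t label, uint32_t caps_unwrapped,
                       uint32_t extra_caps, uint32_t length)
{
    assert(label < (1u << 20));
    assert(caps_unwrapped < (1u << 3));
    assert(extra_caps < (1u << 2));
    assert(length < (1u << 7));
    return (label << 12) | (caps_unwrapped << 9) | (extra_caps << 7) | length;
}

/* Field accessors: shift the field down and mask off its width. */
uint32_t model_tag_label(uint32_t tag)          { return tag >> 12; }
uint32_t model_tag_caps_unwrapped(uint32_t tag) { return (tag >> 9) & 0x7u; }
uint32_t model_tag_extra_caps(uint32_t tag)     { return (tag >> 7) & 0x3u; }
uint32_t model_tag_length(uint32_t tag)         { return tag & 0x7fu; }
```

A 7-bit length field is enough for the 121-word maximum message mentioned on the synchronous endpoint slide.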
  • 22. Client-Server IPC Example
      • Client:
            seL4_MessageInfo_t tag = seL4_MessageInfo_new(0, 0, 0, 1);
            seL4_SetTag(tag);                /* load into tag register */
            seL4_SetMR(0, 1);                /* set message register #0 */
            seL4_Call(server_c, tag);
      • Server:
            /* allocate EP and retype */
            seL4_Word addr = ut_alloc(seL4_EndpointBits);
            err = cspace_ut_retype_addr(addr, seL4_EndpointObject, seL4_EndpointBits,
                                        cur_cspace, &ep_cap);
            /* insert EP into CSpace; cap is badged 0 */
            seL4_CPtr cap = cspace_mint_cap(dest, cur_cspace, ep_cap, seL4_AllRights,
                                            seL4_CapData_MakeBadge(0));
            …
            seL4_Word badge;
            seL4_MessageInfo_t msg = seL4_Wait(ep, &badge);
            …
            seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 0);
            seL4_Reply(reply);               /* implicit use of reply cap */
  • 23. Server Saving Reply Cap
      • Server:
            seL4_Word addr = ut_alloc(seL4_EndpointBits);
            err = cspace_ut_retype_addr(addr, seL4_EndpointObject, seL4_EndpointBits,
                                        cur_cspace, &ep_cap);
            seL4_CPtr cap = cspace_mint_cap(dest, cur_cspace, ep_cap, seL4_AllRights,
                                            seL4_CapData_MakeBadge(0));
            …
            seL4_Word badge;
            seL4_MessageInfo_t msg = seL4_Wait(ep, &badge);
            seL4_CPtr slot = cspace_save_reply_cap(cur_cspace);   /* save reply cap in CSpace */
            …
            seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 0);
            seL4_Send(slot, reply);                               /* explicit use of reply cap */
            cspace_free_cslot(slot);                              /* reply cap no longer valid */
  • 24. IPC Operations Summary
      • Send (ep_cap, …), Wait (ep_cap, …), Wait (aep_cap, …)
          – blocking message passing
          – needs Write, Read permission, respectively
      • NBSend (ep_cap, …)
          – discard message if receiver isn’t ready
      • Call (ep_cap, …)
          – equivalent to Send (ep_cap,…) + reply-cap + Wait (ep_cap,…)
      • Reply (…)
          – equivalent to Send (rep_cap, …)
      • ReplyWait (ep_cap, …)
          – equivalent to Reply (…) + Wait (ep_cap, …)
          – purely for efficiency of server operation
      • Notify (aep_cap, …), Poll (aep_cap, …)
          – non-blocking send / check for message on AsyncEP
      • No failure notification where this reveals info on other entities!
          – Need error handling protocol!
  • 25. Derived Capabilities
      • Badging is an example of capability derivation
      • The Mint operation creates a new, less powerful cap
          – Can add a badge
              • Mint (cap, badge) ➞ badged cap
          – Can strip access rights
              • eg WR ➞ R/O
      • Granting transfers caps over an Endpoint
          – Delivers copy of sender’s cap(s) to receiver
              • reply caps are a special case of this
          – Sender needs Endpoint cap with Grant permission
          – Receiver needs Endpoint cap with Write permission
              • else Write permission is stripped from new cap
      • Retyping
          – Fundamental operation of seL4 memory management
          – Details later…
  • 26. seL4 System Calls
      • Notionally, seL4 has 6 syscalls:
          – Yield(): invokes scheduler
              • only syscall which doesn’t require a cap!
          – Send(), Receive() and 3 variants/combinations thereof
              • Notify() is actually not a separate syscall but same as Send()
          – This is why I earlier said “approximately 3 syscalls” ☺
      • All other kernel operations are invoked by “messaging”
          – Invoking Send()/Receive() on an object cap
          – Each object has a set of kernel protocols
              • operations encoded in message tag
              • parameters passed in message words
          – Mostly hidden behind “syscall” wrappers
  • 27. seL4 Memory Management Principles
      • Memory (and caps referring to it) is typed:
          – Untyped memory:
              • unused, free to Retype into something else
          – Frames:
              • (can be) mapped to address spaces, no kernel semantics
          – Rest: TCBs, address spaces, CNodes, EPs
              • used for specific kernel data structures
      • After startup, kernel never allocates memory!
          – All remaining memory made Untyped, handed to initial address space
      • Space for kernel objects must be explicitly provided to kernel
          – Ensures strong resource isolation
      • Extremely powerful tool for shooting oneself in the foot!
          – We hide much of this behind the cspace and ut allocation libraries
  • 28. Capability Derivation
      • Copy, Mint, Mutate, Revoke are invoked on CNodes
          – Mint (CNode cap, dest, src, rights, badge)
          – CNode cap must provide appropriate rights
      • Copy takes a cap for destination
          – Allows copying of caps between CSpaces
          – Alternative to granting via IPC (if you have privilege to access CSpace!)
  • 29. CSpace Operations
            extern cspace_t *cspace_create(int levels); /* either 1 or 2 level */
            extern cspace_err_t cspace_destroy(cspace_t *c);
            extern seL4_CPtr cspace_copy_cap(cspace_t *dest, cspace_t *src,
                                             seL4_CPtr src_cap, seL4_CapRights rights);
            extern seL4_CPtr cspace_mint_cap(cspace_t *dest, cspace_t *src,
                                             seL4_CPtr src_cap, seL4_CapRights rights,
                                             seL4_CapData badge);
            extern seL4_CPtr cspace_move_cap(cspace_t *dest, cspace_t *src,
                                             seL4_CPtr src_cap);
            extern cspace_err_t cspace_delete_cap(cspace_t *c, seL4_CPtr cap);
            extern cspace_err_t cspace_revoke_cap(cspace_t *c, seL4_CPtr cap);
  • 30. cspace and ut libraries
      • [Figure: the user-level OS personality makes library calls into the cspace and ut libraries, which in turn make seL4 system calls]
          – cspace_create(), cspace_destroy(), …: wraps messy CSpace tree & slot management
          – ut_alloc(), ut_free(), …: manages slab of Untyped
          – Extend for own needs!
  • 31. seL4 Memory Management Approach
      • [Figure: RAM is divided into kernel data, Global Resource Manager (GRM) data, and regions delegated to resource managers (RMs), each with its own RM data and address spaces]
          – Resources fully delegated, allows autonomous operation
          – Strong isolation, no shared kernel resources
  • 32. Memory Management Mechanics: Retype
      • [Figure: derivation tree rooted at an Untyped object UT0, with Untyped objects UT1–UT4 and frames F0–F3. Operations shown: Retype (Untyped, 2^1), Retype (Frame, 2^2), Retype (TCB, 2^n), Retype (CNode, 2^m, 2^n). Frame caps carry r,w rights; Mint (r) derives a read-only cap; Revoke() removes derived caps.]
  • 33. seL4 Address Spaces (VSpaces)
      • Very thin wrapper around hardware page tables
          – Architecture-dependent
          – ARM and (32-bit) x86 are very similar
      • Page directories (PDs) map page tables, page tables (PTs) map pages
          – PageTable_Map(PD), Page_Map(PT)
      • A VSpace is represented by a PD object:
          – Creating a PD (by Retype) creates the VSpace
          – Deleting the PD deletes the VSpace
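The two-level structure determines which page directory and page table entries a virtual address uses. The sketch below is an illustration with hypothetical names; it assumes the 32-bit ARM layout with 4 KiB pages, where the top 12 bits index the page directory and the next 8 bits index a page table, which matches a 2^14-byte page directory and 2^10-byte page table at 4 bytes per entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decomposition of a 32-bit ARM virtual address (4 KiB pages). */
typedef struct {
    uint32_t pd_index; /* entry in the page directory (0..4095) */
    uint32_t pt_index; /* entry in the page table (0..255) */
    uint32_t offset;   /* byte offset within the 4 KiB page */
} model_vaddr_t;

model_vaddr_t model_split_vaddr(uint32_t vaddr)
{
    model_vaddr_t v;
    v.pd_index = vaddr >> 20;           /* top 12 bits */
    v.pt_index = (vaddr >> 12) & 0xffu; /* next 8 bits */
    v.offset   = vaddr & 0xfffu;        /* low 12 bits */
    assert(v.pd_index < 4096u);
    return v;
}
```

Two addresses with the same pd_index share a page table, so mapping a page only needs a new page table when the first page in a 1 MiB region is mapped.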
  • 34. Address Space Operations
      • Sample code we provide:
            seL4_Word frame_addr = ut_alloc(seL4_PageBits);
            err = cspace_ut_retype_addr(frame_addr, seL4_ARM_Page, seL4_ARM_PageBits,
                                        cur_cspace, &frame_cap);
            /* pd_cap: cap to level 1 page table */
            map_page(frame_cap, pd_cap, 0xA0000000, seL4_AllRights,
                     seL4_ARM_Default_VMAttributes);
            bzero((void *)0xA0000000, PAGESIZE);
            seL4_ARM_Page_Unmap(frame_cap);
            cspace_delete_cap(cur_cspace, frame_cap);
            ut_free(frame_addr, seL4_PageBits);
      • Each mapping has:
          – virtual_address, phys_address, address_space and frame_cap
          – address_space struct identifies the level 1 page_directory cap
          – you need to keep track of (frame_cap, PD_cap, v_adr, p_adr)!
  • 35. Mapping Same Frame Twice: Shared Memory
      • Each mapping requires its own frame cap even for the same frame
            seL4_CPtr new_frame_cap = cspace_copy_cap(cur_cspace, cur_cspace,
                                                      existing_frame_cap, seL4_AllRights);
            map_page(new_frame_cap, pd_cap, 0xA0000000, seL4_AllRights,
                     seL4_ARM_Default_VMAttributes);
            bzero((void *)0xA0000000, PAGESIZE);
            seL4_ARM_Page_Unmap(existing_frame_cap);
            cspace_delete_cap(cur_cspace, existing_frame_cap);
            seL4_ARM_Page_Unmap(new_frame_cap);
            cspace_delete_cap(cur_cspace, new_frame_cap);
            ut_free(frame_addr, seL4_PageBits);
  • 36. Memory Management Caveats
      • The object manager handles allocation for you
      • However, it is very simplistic, you need to understand how it works
      • Simple rule (it’s buddy-based):
          – Freeing an object of size n: you can allocate new objects <= size n
          – Freeing 2 objects of size n does not mean that you can allocate an object of size 2n.
      • Object sizes on ARM (bytes):
            Object            Size
            Frame             2^12
            Page directory    2^14
            Endpoint          2^4
            Cslot             2^4
            TCB               2^9
            Page table        2^10
      • All kernel objects must be size aligned!
  • 37. Memory Management Caveats
      • Objects are allocated by Retype() of Untyped memory by seL4 kernel
          – The kernel will not allow you to overlap objects
          – But debugging nightmare if you try!!
      • ut_alloc() and ut_free() manage user-level’s view of Untyped allocation.
          – Major pain if kernel and user’s view diverge
          – TIP: Keep an object’s address and CPtr together.
      • [Figure: Untyped memory of 2^15 B holds 8 frames]
      • Be careful with allocations!
      • Don’t try to allocate all of physical memory as frames, as you need more memory for TCBs, endpoints etc.
      • Your frametable will eventually integrate with ut_alloc to manage the 4K untyped size.
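The size and alignment rules from the two caveat slides can be checked with simple arithmetic. The helpers below are hypothetical and only model the rules (power-of-two sizes, size alignment, and how many objects fit in one free block); they do not call ut_alloc or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* An object of 2^size_bits bytes must start at a multiple of its own size. */
int model_is_size_aligned(uint32_t addr, unsigned size_bits)
{
    assert(size_bits < 32);
    return (addr & ((1u << size_bits) - 1u)) == 0;
}

/* Number of objects of 2^obj_bits bytes that fit in one free block of
 * 2^block_bits bytes. A block can only be split into smaller objects;
 * an object larger than the block does not fit at all. */
uint32_t model_objects_per_block(unsigned block_bits, unsigned obj_bits)
{
    assert(block_bits < 32 && obj_bits < 32);
    if (obj_bits > block_bits)
        return 0;
    return 1u << (block_bits - obj_bits);
}
```

With the ARM sizes from the table, a 2^15-byte Untyped holds 8 frames of 2^12 bytes, while a freed frame cannot be reused for a 2^14-byte page directory. Two freed frames do not help either unless they happen to form one aligned larger block, which this simple per-block view does not track.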
  • 38. Threads
      • Threads are represented by TCB objects
      • They have a number of attributes (recorded in TCB):
          – VSpace: a virtual address space
              • page directory reference
              • multiple threads can belong to the same VSpace
          – CSpace: capability storage
              • CNode reference (CSpace root) plus a few other bits
          – Fault endpoint
              • Kernel sends message to this EP if the thread throws an exception
          – IPC buffer (backing storage for virtual registers)
          – stack pointer (SP), instruction pointer (IP), user-level registers
          – Scheduling priority
          – Time slice length (presently a system-wide constant)
              • Yes, this is broken! (Will be fixed soon…)
      • These must be explicitly managed
          – … we provide an example you can modify
  • 39. Threads: Creating a thread
      • Obtain a TCB object
      • Set attributes: Configure()
          – associate with VSpace, CSpace, fault EP, prio, define IPC buffer
      • Set SP, IP (and optionally other registers): WriteRegisters()
          – this results in a completely initialised thread
          – will be able to run if resume_target is set in call, else still inactive
      • Activated (made schedulable): Resume()
  • 40. Creating a Thread in Own AS and cspace_t
            static char stack[100];
            int thread_fct() {
                while (1);
                return 0;
            }
            /* Allocate and map new frame for IPC buffer as before */
            seL4_Word tcb_addr = ut_alloc(seL4_TCBBits);
            err = cspace_ut_retype_addr(tcb_addr, seL4_TCBObject, seL4_TCBBits,
                                        cur_cspace, &tcb_cap);
            err = seL4_TCB_Configure(tcb_cap, FAULT_EP_CAP, PRIORITY,
                                     cur_cspace->root_cnode, seL4_NilData,
                                     seL4_CapInitThreadPD, seL4_NilData,
                                     PROCESS_IPC_BUFFER, ipc_buffer_cap);
            seL4_UserContext context = { .pc = &thread_fct, .sp = &stack };
            seL4_TCB_WriteRegisters(tcb_cap, 1, 0, 2, &context);
      • If you use threads, write a library to create and destroy them.
  • 41. Threads and Stacks
      • Stacks are completely user-managed, kernel doesn’t care!
          – Kernel only preserves SP, IP on context switch
      • Stack location, allocation, size must be managed by userland
      • Beware of stack overflow!
          – Easy to grow stack into other data
              • Pain to debug!
          – Take special care with automatic arrays!
                f () {
                    int buf[10000];
                    . . .
                }
      • [Figure: Stack 1 and Stack 2 placed next to each other in memory]
  • 42. Creating a Thread in New AS and cspace_t
            /* Allocate, retype and map new frame for IPC buffer as before
             * Allocate and map stack???
             * Allocate and retype a TCB as before
             * Allocate and retype a seL4_ARM_PageDirectoryObject of size seL4_PageDirBits
             * Mint a new badged cap to the syscall endpoint
             */
            cspace_t *new_cspace = ut_alloc(seL4_TCBBits);
            char *elf_base = cpio_get_file(_cpio_archive, "test")->p_base;
            err = elf_load(new_pagedirectory_cap, elf_base);
            unsigned int entry = elf_getEntryPoint(elf_base);
            err = seL4_TCB_Configure(tcb_cap, FAULT_EP_CAP, PRIORITY,
                                     new_cspace->root_cnode, seL4_NilData,
                                     new_pagedirectory_cap, seL4_NilData,
                                     PROCESS_IPC_BUFFER, ipc_buffer_cap);
            seL4_UserContext context = { .pc = entry, .sp = &stack };
            seL4_TCB_WriteRegisters(tcb_cap, 1, 0, 2, &context);
  • 43. seL4 Scheduling
      • Presently, seL4 uses 256 hard priorities (0–255)
          – Priorities are strictly observed
          – The scheduler will always pick the highest-prio runnable thread
          – Round-robin scheduling within prio level
      • Aim is real-time performance, not fairness
          – Kernel itself will never change the prio of a thread
          – Achieving fairness (if desired) is the job of user-level servers
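The scheduling policy above (strict priorities, round robin within a level) can be sketched as a selection function. This is a model with hypothetical names that scans an array; it makes no claim about how the kernel implements its ready queues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread descriptor for the model. */
typedef struct {
    int prio;     /* 0 (lowest) to 255 (highest) */
    int runnable; /* non-zero if the thread can run */
} model_thread_t;

/* Pick the next thread to run. `last` is the index of the thread that ran
 * previously, or -1 if none. The highest priority among runnable threads
 * always wins; among threads of that priority, scanning starts just after
 * `last`, which yields round-robin order. Returns -1 if nothing is runnable. */
int model_schedule(const model_thread_t *threads, int n, int last)
{
    assert(threads != NULL && n >= 0 && last >= -1 && last < n);
    int best_prio = -1;
    for (int i = 0; i < n; i++) {
        if (threads[i].runnable && threads[i].prio > best_prio)
            best_prio = threads[i].prio;
    }
    if (best_prio < 0)
        return -1;
    for (int step = 1; step <= n; step++) {
        int i = (last + step) % n;
        if (threads[i].runnable && threads[i].prio == best_prio)
            return i;
    }
    return -1;
}
```

A low-priority thread is chosen only when no higher-priority thread is runnable, which is why fairness, if wanted, has to be arranged at user level.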
  • 44. Exception Handling
      • A thread can trigger different kinds of exceptions:
          – invalid syscall
              • may require instruction emulation or result from virtualization
          – capability fault
              • cap lookup failed or operation is invalid on cap
          – page fault
              • attempt to access unmapped memory
              • may have to grow stack, grow heap, load dynamic library, …
          – architecture-defined exception
              • divide by zero, unaligned access, …
      • Results in kernel sending message to fault endpoint
          – exception protocol defines state info that is sent in message
      • Replying to this message restarts the thread
          – endless loop if you don’t remove the cause for the fault first!
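Why the handler must remove the cause before replying can be seen in a toy model of a page fault. Everything below is hypothetical: a page table is reduced to an array of flags, and a restart bound stands in for what would otherwise be an endless loop.

```c
#include <assert.h>

#define MODEL_PAGES 16

/* Hypothetical page table: mapped[p] is non-zero if page p is mapped. */
static int model_mapped[MODEL_PAGES];

/* The faulting access: succeeds (0) if the page is mapped, faults (-1) if not. */
int model_access(unsigned page)
{
    assert(page < MODEL_PAGES);
    return model_mapped[page] ? 0 : -1;
}

/* Run the access with restart-on-reply semantics. On each fault the handler
 * optionally fixes the cause (maps the page) and then replies, which restarts
 * the thread at the faulting access. Returns the number of faults taken, or
 * -1 if the thread was still faulting after max_restarts replies. */
int model_run(unsigned page, int fix_cause, int max_restarts)
{
    int faults = 0;
    while (model_access(page) != 0) {
        if (faults == max_restarts)
            return -1; /* a real system would loop here forever */
        faults++;
        if (fix_cause)
            model_mapped[page] = 1; /* handler maps the page */
        /* handler replies: the access is retried by the loop */
    }
    return faults;
}
```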
  • 45. Exception Handling
      • [Figure: interaction between a thread (TCB), the kernel and the exception handler]
          – Exception triggered
          – Kernel fakes message from thread to handler
          – Handler performs appropriate action (e.g. map page)
          – Handler replies to restart thread
          – Kernel intercepts message and restarts thread
  • 46. Interrupt Management
      • seL4 models IRQs as messages sent to an AEP
          – Interrupt handler has Receive cap on that AEP
      • 2 special objects used for managing and acknowledging interrupts:
          – Single IRQControl object
              • single IRQControl cap provided by kernel to initial VSpace
              • only purpose is to create IRQHandler caps
          – Per-IRQ-source IRQHandler object
              • interrupt association and dissociation
              • interrupt acknowledgment
      • [Figure: Get(usb) invoked on IRQControl yields an IRQHandler]
  • 47. Interrupt Handling
      • IRQHandler cap allows driver to bind AEP to interrupt
      • Afterwards:
          – AEP is used to receive interrupt
          – IRQHandler is used to acknowledge interrupt
      • [Figure: SetEndpoint(aep) on the IRQHandler, then Wait(aep) and Ack(handler)]
            seL4_IRQHandler interrupt = cspace_irq_control_get_cap(cur_cspace,
                                            seL4_CapIRQControl, irq_number);
            seL4_IRQHandler_SetEndpoint(interrupt, async_ep_cap);
            seL4_IRQHandler_Ack(interrupt);   /* ack first to unmask IRQ */
  • 48. Device Drivers
      • Drivers do three things:
          – Handle interrupts (already explained)
          – Communicate with rest of OS (IPC + shared memory)
          – Access device registers
      • Device register access
          – Devices are memory-mapped on ARM
          – Have to find frame cap from bootinfo structure
          – Map the appropriate page in the driver’s VSpace
                /* magic device register access */
                device_vaddr = map_device(0xA0000000, (1 << seL4_PageBits));
                …
                *((volatile seL4_Word *) device_vaddr) = …;
  • 49. Where To Find More
      • UNSW Advanced Operating Systems Course: http://www.cse.unsw.edu.au/~cs9242
      • NICTA Trustworthy Systems research: http://trustworthy.systems
      • seL4 open-source portal: http://sel4.systems
      • L4 Microkernel Headquarters: http://l4hq.org
      • Gernot’s blog: http://microkerneldude.wordpress.com/
      • Gernot’s research home page: http://ssrg.nicta.com.au/people/?cn=Gernot+Heiser