Linux Kernel Library: Reusing
Monolithic Kernel
Hajime Tazaki
IIJ Innovation Institute
2016/07
AIST seminar vol.2
LKL in a nutshell
Linux kernel library
a library of Linux
Octavian Purdila (Intel)'s work (since 2007?)
Proposed on LKML (Nov. 2015)
2809 LoC (as of Apr. 2016)
https://lwn.net/Articles/662953/
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
LKL (cont'd)
hardware-independent architecture (arch/lkl)
provides an interface to the underlying environment
outsources dependencies
clock, memory allocation, scheduler
runs on Windows, Linux, FreeBSD
simplifies device I/O
virtio host implementation
can reuse the virtio drivers already in Linux
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
Benefit
less ossification of new features
operating system personality
a userspace library costs less to deploy
mature code base
(e.g.) Linux kernel running in userspace
a small kernel plus a bunch of libraries
but in a different shape
Any problem in computer science can be solved with
another level of indirection.
(Wheeler and/or Lampson)
img src: https://www.flickr.com/photos/thomasclaveirole/305073153
What does reusing a monolithic kernel mean?
Anykernel: originally from the NetBSD rump kernel
"We define an anykernel to be an organization of kernel code which allows the kernel's unmodified drivers to be run in various configurations such as application libraries and microkernel style servers, and also as part of a monolithic kernel." -- Kantee 2012.
Using the (unmodified) high-quality code base of a monolithic kernel
in a different environment, in a different shape,
by adding glue code
(a bit of) History
rump: 2007 (NetBSD)
LKL: 2007 (Linux)
DCE/LibOS: 2008 (Linux/FreeBSD)
LibOS/LKL revival: 2015
LibOS merged into LKL
http://news.mynavi.jp/news/2015/03/25/285/
https://news.ycombinator.com/item?id=9259292
http://www.phoronix.com/scan.php?page=news_item&px=Linux-Library-LibOS
http://lwn.net/Articles/639333/
LKL vs. LibOS
(figure panels: LKL, LibOS)
LKL vs. LibOS (cont'd)
LoC:
arch/lkl (LKL) < arch/lib (LibOS)
diff: the amount of stub code
in common
no modification to the original Linux code
kernel contexts represented by POSIX threads
outsourced resources (clock, memory, scheduler)
CPU-independent architecture
differences
LibOS: implements the higher-level APIs (timer, irq, kthread) with pthreads
LKL: implements IRQ, kthread, and timer with pthreads at a lower layer
Implementation
Internals
1. Host backend (host_ops)
2. CPU independent arch. (arch/lkl)
3. Application interface
1. Host backend
environment-dependent part
unifies the interface across different platforms (rump-hypercall like)
device interface with virtio
block device <=> disk image
networking <=> TAP, raw socket, DPDK, VDE
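As a rough illustration of the host backend idea, the sketch below models a host-operations table as a set of callbacks that each platform fills in once, while the platform-independent side only ever calls through the table. All names here (`HostOps`, `time_ns`, `mem_alloc`, `thread_create`, `boot`) are invented for this example and are not the actual host_ops fields of arch/lkl.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HostOps:
    """Hypothetical table of services the library outsources to its host."""
    time_ns: Callable[[], int]                # clock
    mem_alloc: Callable[[int], bytearray]     # memory allocation
    # Runs fn on a host thread and returns a callable that waits for it.
    thread_create: Callable[[Callable[[], None]], Callable[[], None]]


def threaded_host() -> HostOps:
    """A backend built on the host's real clock and threads."""
    def spawn(fn):
        t = threading.Thread(target=fn)
        t.start()
        return t.join
    return HostOps(time.monotonic_ns, bytearray, spawn)


def fake_host(now_ns: int) -> HostOps:
    """A backend with a frozen clock and inline 'threads', e.g. for tests."""
    def run_inline(fn):
        fn()
        return lambda: None
    return HostOps(lambda: now_ns, bytearray, run_inline)


def boot(ops: HostOps) -> dict:
    """Platform-independent side: touches the host only through ops."""
    started = ops.time_ns()
    heap = ops.mem_alloc(4096)
    log = []
    wait = ops.thread_create(lambda: log.append("idle thread ran"))
    wait()
    return {"started_ns": started, "heap_bytes": len(heap), "log": log}


print(boot(fake_host(123)))
print(boot(threaded_host())["log"])
```

Supporting a new platform then means writing one more function like `threaded_host()`; `boot()` stays unchanged.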
2. CPU-independent architecture
architecture (arch/lkl)
binds transparently (as a CPU arch)
requires no modification to the rest of the kernel
2800 LoC
thread information (struct thread_info)
irq, timer, syscall handler
accesses the underlying layer through host_ops
3. Application interface
1. use exposed API (LKL syscall)
2. use host libc (LD_PRELOAD)
3. extend (alternative) libc
API 1: use exposed API (LKL syscall)
call entry points of the LKL kernel
lkl_sys_open(), lkl_sys_socket()
almost the same as ordinary syscalls
return values and errno notification differ
LKL syscalls and host syscalls can be used simultaneously
read an ext4 file with lkl_sys_read() => write it to the host (Windows) with write()
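The difference in error reporting can be sketched as follows, assuming (as with raw Linux system calls) that an LKL entry point returns a negative errno value on failure instead of returning -1 and setting the thread-local `errno`. `check_lkl_ret` is a hypothetical helper written for this example, not part of LKL.

```python
import errno
import os


def check_lkl_ret(ret: int) -> int:
    """Turn a kernel-style return value into a result or an OSError.

    Assumption: failures are encoded as -errno in the return value itself.
    The code is a Linux errno number, so on a non-Linux host the message
    from os.strerror() may not match; translate the number first if needed.
    """
    if ret < 0:
        code = -ret
        raise OSError(code, os.strerror(code))
    return ret


print(check_lkl_ret(3))          # a successful call, e.g. a descriptor
try:
    check_lkl_ret(-errno.ENOENT)
except OSError as e:
    print("failed with errno", e.errno)
```

Wrapping every call this way keeps the two conventions apart when LKL syscalls and host syscalls are mixed in one program.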
API 2: hijack host standard library
dynamically replace the symbols of host syscalls (in libc)
LD_PRELOAD
socket() => lkl_sys_socket()
host binaries (executables) can be used as-is
limited by which symbols can be replaced
needs syscall translation on non-Linux hosts
API 3: extend (alternative) libc
only LKL syscalls are called, through our own libc
also introduced as a virtual CPU architecture
a program can link against this instead of the host libc
cannot access (underlying) host resources directly through these LKL syscalls
provided as a patch for musl libc
Use cases (applications)
Use Case 1: instant kernel bypass
Use Case 2: programs reusing kernel code in userspace
Use Case 3: unikernel
Use Case 1: instant kernel bypass
syscall redirection by LD_PRELOAD
can use both LKL and host syscalls
new features without touching the host kernel
LD_PRELOAD=liblkl-super-tcp++.so firefox
Use Case 2: programs reusing kernel code in userspace
use kernel code without porting
mount a filesystem without root privilege
can use both LKL and host syscalls
e.g., access a disk image in ext4 format on Windows
1. open the disk image (CreateFile())
2. mount it (lkl_sys_mount())
3. read a file in the disk image (lkl_sys_read())
4. write the file to the Windows side (WriteFile())
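Steps 3 and 4 reduce to a read/write loop in which the source side talks to the library kernel and the destination side talks to the host. The sketch below uses two in-memory stand-ins so that it runs anywhere; `read_image_file` and `write_host_file` are hypothetical placeholders for the points where lkl_sys_read() and WriteFile() would be called, and the function name `copy_out` is likewise invented.

```python
import io
from typing import Callable


def copy_out(read_image_file: Callable[[int], bytes],
             write_host_file: Callable[[bytes], int],
             chunk_size: int = 4096) -> int:
    """Copy until the reader reports end-of-file; return the bytes copied."""
    total = 0
    while True:
        chunk = read_image_file(chunk_size)
        if not chunk:                 # empty read means end-of-file
            return total
        data = chunk
        while data:                   # tolerate short writes on the host side
            written = write_host_file(data)
            data = data[written:]
        total += len(chunk)


# Stand-ins: one buffer plays the file inside the disk image,
# the other plays the file on the host.
inside_image = io.BytesIO(b"x" * 10000)
on_host = io.BytesIO()
print(copy_out(inside_image.read, on_host.write))
```

The loop itself never needs to know which kernel serves which side, which is the point of being able to use both sets of syscalls in one process.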
Use Case 3: Unikernel
a single application bundled with LKL
python + LKL, nginx + LKL
only LKL syscalls available
musl libc extension
rump hypercall (frankenlibc)
runs in a non-OS environment
(on Xen Mini-OS via rumprun)
Work in progress
- http://www.linux.com/news/enterprise/cloud-computing/751156-are-cloud-operating-systems-the-next-big-thing-
Demos with the Linux kernel library
Unikernel on Linux (ping6 command with the kernel library embedded)
Unikernel on qemu-arm (hello world)
Kernel bypass/userspace networking
Network Stack
Why in kernel space?
the per-packet cost was high in that era ('70s)
now much cheaper
Getting fat (matured) after decades
code path is longer (and slower)
hard to add new features
has faced unknown issues
img src: http://www.makelinux.net/kernel_map/
Alternate network stacks
lwip (2002~)
Arrakis [OSDI '14]
IX [OSDI '14]
MegaPipe [OSDI '12]
mTCP [NSDI '14]
SandStorm [SIGCOMM '14]
uTCP [CCR '14]
rumpkernel [ATC '09]
FastSocket [ASPLOS '16]
SolarFlare (2007~?)
StackMap [ATC '16]
libuinet (2013~)
SeaStar (2014~)
Snabb Switch (2012~)
Motivations
Socket API sucks
StackMap, MegaPipe, uTCP, SandStorm, IX
New API: no benefit for existing applications
Network stack in kernel space sucks
FastSocket, mTCP, lwip (SolarFlare?)
Compatibility is (also) important
rumpkernel, libuinet, Arrakis, IX, SolarFlare
Existing programming model sucks
SeaStar
Techniques
batching (syscall/NIC access)
Arrakis, IX, MegaPipe, mTCP, SandStorm, uTCP
Utilize feature-rich kernel stack
rumpkernel, fastsocket, StackMap
Porting to userspace stack
libuinet, SandStorm
Kernel bypass (userspace network stack)
mTCP, SandStorm, uTCP, rumpkernel, libuinet, lwip, SeaStar
bypass technique itself
netmap, PF_RING, raw socket, Intel DPDK
Connection locality (multi-core scalability)
SeaStar, MegaPipe, mTCP, fastsocket, .....
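Why batching helps can be shown with simple arithmetic: a fixed per-call cost (a syscall or a NIC access) is paid once per batch rather than once per packet. The cost figures in this sketch are made-up round numbers for illustration, not measurements from any of the systems listed above, and `per_packet_cost_ns` is a name invented for the example.

```python
def per_packet_cost_ns(fixed_ns: float, per_item_ns: float, batch: int) -> float:
    """Amortized cost of one packet when `batch` packets share one call."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return fixed_ns / batch + per_item_ns


# Assumed costs: 1000 ns per call, 100 ns of real work per packet.
for batch in (1, 8, 64):
    print(batch, per_packet_cost_ns(1000.0, 100.0, batch))
```

With these assumed costs the per-packet figure drops from 1100 ns unbatched to 225 ns at a batch of 8, and further batching only approaches the 100 ns floor set by the per-packet work.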
Implementation
From scratch
lwip (Arrakis, IX, SolarFlare?), mTCP, uTCP, SeaStar
Porting based
libuinet, SandStorm
New API
MegaPipe, StackMap
Anykernel
rumpkernel, (LKL)
What's still missing?
some solve problems by specialization
avoiding the generality tax
performance with specialization vs. more features with generalization
e.g., fewer TCP stack features; a new API breaks support for existing applications
specialized vs. generalized
generalization often involves indirection
indirection usually introduces complexity (Wheeler/Lampson)
performant and generalized?
Performance study
Conditions
ThinkStation P310 x2
CPU: Intel Core i7-6700 CPU @ 3.40GHz (8 logical cores)
Memory: 32GB
NIC: X540-T2
Linux 4.4.6-301 (x86_64) on Fedora 23
Linux bridge (X540 + tap/raw socket)
no DPDK: it cannot be used with hijack, etc.
netperf (git ~v2.7.0)
netserver (native)
netperf (varied)
Conditions (cont'd)
combinations
netperf (sendmmsg) + host stack (native)
+ hijack library, native thread (hijack)
+ frankenlibc/lkl, green thread (lkl-musl)
netperf (sendmmsg) + lkl extension + frankenlibc (lkl-musl (skb prealloc))
pinned to a processor
using the taskset command
all offload features disabled (tso/gso/gro, rx/tx cksum)
TCP_RR (netperf)
UDP_STREAM (netperf)
UDP_STREAM (pps, netperf)
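When reading a packets-per-second plot it helps to know the ceiling the link itself imposes. The sketch below computes the theoretical maximum UDP packet rate over Ethernet for a given payload size, assuming IPv4 without options, no VLAN tag, and a 10 Gbit/s link (the nominal rate of an X540 port). These are assumptions for illustration; none of the numbers are results from this benchmark.

```python
PREAMBLE = 8         # preamble + start-of-frame delimiter, bytes
ETH_HDR = 14         # destination MAC, source MAC, EtherType
FCS = 4              # frame check sequence
IFG = 12             # minimum inter-frame gap
IPV4_HDR = 20        # IPv4 header without options (assumption)
UDP_HDR = 8
MIN_L2_PAYLOAD = 46  # shorter Ethernet payloads are padded up to this


def max_udp_pps(payload_bytes: int, link_bps: float = 10e9) -> float:
    """Upper bound on UDP packets per second that fit on the wire."""
    l2_payload = max(payload_bytes + IPV4_HDR + UDP_HDR, MIN_L2_PAYLOAD)
    wire_bytes = PREAMBLE + ETH_HDR + l2_payload + FCS + IFG
    return link_bps / (wire_bytes * 8)


for size in (18, 1024, 1472):
    print(size, round(max_udp_pps(size)))
```

Under these assumptions a 1024-byte payload occupies 1090 bytes on the wire, so the link tops out at roughly 1.15 million packets per second; minimum-size frames give the familiar figure of about 14.88 million.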
TCP_STREAM (netperf)
(ref.) LibOS results (as of Feb. 2015)
1024-byte UDP, custom-made tool
throughput: <10% of Linux native
Observations (of benchmark)
Native thread vs Green thread
better TCP_RR w/ native thread (pthread)
better TCP_STREAM/UDP_STREAM w/ green thread
???
avoiding dynamic allocation contributes a lot
host stack penalized for payloads larger than the MTU (?)
Summary
Morphing a monolithic kernel into an anykernel
Various use cases
Userspace network stack (kernel bypass)
Unikernel
Performance study in progress
https://github.com/lkl/linux
References
Linux Kernel Library
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
Rumpkernel (dissertation)
Kantee, Flexible Operating System Internals: The Design and Implementation of the Anykernel and Rump Kernels, Ph.D. Thesis, 2012.
Linux LibOS in general
Tazaki et al., Direct Code Execution: Revisiting Library OS Architecture for Reproducible Network Experiments, CoNEXT 2013.
https://github.com/lkl/linux
http://libos-nuse.github.io/
https://lwn.net/Articles/637658/
Backups
Recent Updates
Updates (diff to lkl)
(musl) libc integration
rump hypercall interface
via frankenlibc tools (for POSIX environments)
via rumprun framework (for bare-metal/Xen/KVM environments)
more applications
netperf (signal handling, etc)
nginx
ghc (Haskell runtime)
performance study
libc integration
standard library for LKL
all syscalls go directly to LKL
applications can use LKL transparently
no special modifications or hijack needed
based on musl libc
introduces a new (sub)architecture, lkl
rump hypercall interface
replacement of LKL host_ops
or yet-another new host environment (rump)
has two thread primitives
pthread-based (as LKL does)
ucontext-based (more efficient on non-MP)
can reduce
the effort of host_ops maintenance
the complexity of a tall stack of abstraction turtles
rump hypercall (cont'd)
integration of
libc (musl for LKL, NetBSD libc for rumpkernel)
rump hypercall (on Linux, FreeBSD, NetBSD, qemu-arm, spike)
host (platform) support code
frankenlibc
has two namespaced libc(s)
hypercall implementation can use libc
provides
a libc.a
cross-build toolchains (rumprun-cc, etc)
Usage
build
% ./configure CC=rumprun-cc ; make
execution (with rexec launcher)
% rexec ./nginx disk-nginx.img tap:tap0 -- -c nginx.conf
rexec executable [disk image file] [NIC] -- [executable-specific options]
Codes
https://github.com/libos-nuse/lkl-linux
https://github.com/libos-nuse/musl
https://github.com/libos-nuse/frankenlibc
https://github.com/libos-nuse/rumprun
https://github.com/libos-nuse/nginx
https://github.com/libos-nuse/ghc
