Akihiro Suda (containerd / NTT)
Rootless Containers 2020
Ask me questions at #2-kubecon-maintainer ( https://slack.cncf.io )
What are Rootless Containers?
• Running container runtimes (and, of course, the containers themselves) as a non-root user on the host
  • OCI runtimes (e.g. runc)
  • CRI runtimes (e.g. containerd)
  • CNI plugins (e.g. Flannel)
  • kubelet, dockerd, …
• Protects the host from potential vulnerabilities and misconfigurations
What are Rootless Containers?
Don’t be confused… The following are unrelated, because the runtime (or a SETUID helper) still runs as root:
• .spec.securityContext.runAsUser (≈ docker run --user)
• UserNS KEP (≈ dockerd --userns-remap)
• usermod -aG docker foo
• Singularity with SETUID
Why do we need Rootless?
Most runtimes are designed to be secure by default, but they are still
likely to have vulnerabilities
Identifier              Component   Description
CVE-2017-1002102        kubelet     Files on the host could be removed
containerd#2001 (2018)  containerd  /tmp on the host could be removed
CVE-2018-11235          kubelet     Arbitrary command could be executed on the host
runc#1962 (2019)        runc        Bare procfs was exposed with non-pivot rootfs mode
CVE-2019-5736           runc        runc binary could be replaced with a malicious file
CVE-2019-11245          kubelet     An image could be executed with an unexpected UID
CVE-2019-14271          dockerd     A malicious NSS library could be loaded
…                       …           …
And more!
Why do we need Rootless?
• People often make misconfigurations
  • Setting up insufficient PodSecurityPolicy / Gatekeeper policies
  • Exposing system components’ TCP ports without mTLS (e.g. etcd, kube-apiserver, kubelet, dockerd…)
  • Exposing private keys as IaaS metadata (169.254.169.254)
  • Using the same kubelet certs for all nodes
  • …
Why do we need Rootless?
• Rootless Containers can mitigate the impact of such vulnerabilities and misconfigurations
• Even if the host gets compromised, the attacker won’t be able to:
  • access files owned by other users
  • modify the firmware or kernel (→ undetectable malware)
  • perform ARP spoofing (→ DNS spoofing)
Not a panacea, of course…
Not effective against:
• Vulnerabilities of kernel and hardware
• DDoS attacks
• Cryptomining …
Not a panacea, of course…
Some caveats apply
• Network throughput is reduced
  (But we are seeing HUGE improvements in 2020)
• No support for NFS or block storage
  (But it doesn’t matter if you use managed DBs and object storage)
History
It began in c. 2012… But wasn’t popular until 2018-2019
Year    Low layers                              High layers
2012    Kernel [officially in 2013]
2013    Semi-privileged networking with SETUID  LXC
2014
2015
2016    runc [officially in 2017]
2017
2018    Unprivileged networking (slirp4netns)   BuildKit, based on containerd tech
        Unprivileged FUSE-OverlayFS             Docker [officially in 2019] & containerd
                                                Podman & CRI-O
                                                Kubernetes [unofficial, still]
2019    Unprivileged cgroup v2 via systemd      k3s
        Faster port forwarding (RootlessKit)
2020    Faster networking with seccomp addfd
2021+                                           Kubernetes, officially?
Example: Docker
• https://get.docker.com/rootless
• Rootless mode was experimental in v19.03 and will be GA in v20.10
• Other notable updates in v20.10 w.r.t. Rootless:
  • Resource limitation with cgroup v2
  • FUSE-OverlayFS
  • Improved installer
Example: Docker
Easy to install
$ curl -fsSL https://get.docker.com/rootless | sh
$ export DOCKER_HOST=unix:///run/user/1000/docker.sock
$ docker run -d --name caddy -p 8080:80 caddy
$ curl http://localhost:8080
...
<title>Caddy works!</title>
...
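The socket path in the session above hard-codes UID 1000. A minimal sketch of deriving it for whichever user is logged in follows; the helper name `rootless_docker_host` is hypothetical, and the fallback to `/run/user/<uid>` when `XDG_RUNTIME_DIR` is unset is an assumption that matches the path shown on the slide.

```shell
# Build the rootless Docker socket URL for the current user.
# Assumption: the socket lives in $XDG_RUNTIME_DIR, which is
# /run/user/<uid> when the variable is unset.
rootless_docker_host() {
  echo "unix://${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock"
}

DOCKER_HOST=$(rootless_docker_host)
export DOCKER_HOST
echo "$DOCKER_HOST"
```

For a user with UID 1000 and no custom runtime directory, this resolves to the same value as the slide.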
Example: Docker
All processes are running as a non-root user
$ pstree user
sshd───bash───pstree
systemd─┬─(sd-pam)
        ├─containerd-shim─┬─caddy───7*[{caddy}]
        │                 └─12*[{containerd-shim}]
        └─rootlesskit─┬─exe─┬─dockerd─┬─containerd───10*[{containerd}]
                      │     │         ├─rootlesskit-doc─┬─docker-proxy───6*[{docker-proxy}]
                      │     │         │                 └─6*[{rootlesskit-doc}]
                      │     │         └─11*[{dockerd}]
                      │     └─11*[{exe}]
                      ├─vpnkit───4*[{vpnkit}]
                      └─8*[{rootlesskit}]
Example: Usernetes
• https://github.com/rootless-containers/usernetes
• Rootless Kubernetes distribution
• Multi-node demo is provided as a Docker Compose stack
• CNI: Flannel (VXLAN)
$ docker-compose up -d
$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
node-containerd   Ready    <none>   3m46s   v1.19.0-usernetes
node-crio         Ready    <none>   3m46s   v1.19.0-usernetes
Example: Usernetes
$ docker exec usernetes_node-containerd_1 pstree user
journalctl---(sd-pam)
systemd-+-(sd-pam)
        |-containerd-fuse---containerd-fuse---4*[{containerd-fuse}]
        |-containerd.sh---containerd---10*[{containerd}]
        |-flanneld.sh---flanneld---9*[{flanneld}]
        |-nsenter.sh---kubelet---13*[{kubelet}]
        |-nsenter.sh---kube-proxy---7*[{kube-proxy}]
        `-rootlesskit.sh---rootlesskit-+-exe-+-rootlesskit.sh---sleep
                                       |     `-9*[{exe}]
                                       |-slirp4netns
                                       `-8*[{rootlesskit}]
Example: k3s
$ k3s server --rootless
$ k3s kubectl apply -f manifest.yaml
• https://k3s.io/
• CNCF Sandbox Project
• Focuses on edge computing
• Incorporates Usernetes patches for supporting rootless, ahead of the Kubernetes upstream
• Uses containerd as the CRI runtime
Example: BuildKit
• https://github.com/moby/buildkit
• A container image builder, built on containerd technology
• Can be executed in several ways
  • As a built-in feature of dockerd
  • As a standalone daemon
  • As a Kubernetes Pod
  • As a Kubernetes Job, without a daemon Pod
  • As a Tekton Task
Example: BuildKit
No need to set securityContext.privileged
But Seccomp and AppArmor constraints need to be relaxed
spec:
  containers:
  - securityContext:
      runAsUser: 1000
      seccompProfile:
        type: Unconfined
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
How it works
• UserNS
• MountNS
• NetNS
• Cgroup
• New frontier: Seccomp User Notification
How it works: UserNS
• Maps a non-root user (e.g. UID 1000) to a fake root user (UID 0)
  • Not the real root, but enough to run containers
• Subordinate UIDs are mapped as well (typically 65,536 UIDs, defined in /etc/subuid)
UID mapping (the host UID space spans 0 to 2^32):
UserNS UID   Host UID
0            1000
1-65536      100000-165535
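The mapping in the diagram can be worked through with a small translation helper. This is only a sketch: the table mirrors the example values on the slide (UID 1000 plus a subordinate range starting at 100000), written in the three-column form that /proc/<pid>/uid_map uses, and the function name `map_uid` is hypothetical.

```shell
# uid_map-style table: "<first UID in UserNS> <first UID on host> <count>".
# Values are the slide's example, not read from a live system.
UID_MAP='0 1000 1
1 100000 65536'

# Translate a UID seen inside the UserNS into the UID the host sees.
# Prints nothing and returns 1 when the UID is not mapped.
map_uid() {
  while read -r inside host count; do
    if [ "$1" -ge "$inside" ] && [ "$1" -lt $((inside + count)) ]; then
      echo $((host + $1 - inside))
      return 0
    fi
  done <<EOF
$UID_MAP
EOF
  return 1
}

map_uid 0       # → 1000
map_uid 1       # → 100000
map_uid 65536   # → 165535
map_uid 65537 || echo "65537 is not mapped"
```

Root inside the namespace is just UID 1000 on the host, which is why a container breakout does not yield real root.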
How it works: MountNS
• A non-root user can create MountNS along with UserNS
• But cannot mount most filesystems, except bind-mount, tmpfs, procfs, and sysfs...
  • No OverlayFS (on vanilla kernel)
  • No NFS
  • No block storage
• FUSE is supported since kernel 4.18
  • FUSE-OverlayFS can substitute for real OverlayFS
How it works: NetNS
• A non-root user can also create NetNS with UserNS
• But cannot create veth pairs that reach the host network, i.e. no internet connectivity
• Slirp is used instead of veth for unprivileged internet connectivity
• Slow (51.5Gbps → 9.21Gbps), but we are seeing huge improvements
[Figure: the TAP device in the NetNS sends Ethernet packets to slirp4netns, which issues socket syscalls to the kernel]
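To put the two throughput figures in proportion, here is a quick calculation. It assumes the first number is the baseline without slirp and the second is the throughput through slirp, which is how the arrow on the slide reads; the function name is hypothetical.

```shell
# Throughput figures quoted on the slide, in Gbps.
NATIVE_GBPS=51.5
SLIRP_GBPS=9.21

# Print the slirp throughput as a share of the baseline, plus the slowdown factor.
slirp_overhead() {
  awk -v n="$NATIVE_GBPS" -v s="$SLIRP_GBPS" 'BEGIN {
    printf "%.1f%% of baseline, %.1fx slower\n", s / n * 100, n / s
  }'
}

slirp_overhead   # → 17.9% of baseline, 5.6x slower
```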
How it works: Cgroup
• No support for cgroup v1
  • i.e. no memory limit, no CPU limit, no fork-bomb guard...
• Cgroup v2 is almost fully supported
  • Fedora has already switched the default to v2
  • Other distros will follow in 2021-2022?
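Since resource limits depend on cgroup v2, it helps to check which hierarchy a host uses. A minimal sketch, assuming the usual mount point /sys/fs/cgroup: on a unified (v2) hierarchy the root directory contains a cgroup.controllers file. The function name and the optional path argument are illustrative.

```shell
# Report which cgroup hierarchy is mounted at the given root
# (default: /sys/fs/cgroup). Only the v2 root has cgroup.controllers.
cgroup_version() {
  root=${1:-/sys/fs/cgroup}
  if [ -f "$root/cgroup.controllers" ]; then
    echo "v2"
  else
    echo "v1-or-hybrid"
  fi
}

cgroup_version
```

If this prints v1-or-hybrid, the memory, CPU and PID limits mentioned above are not available in rootless mode.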
A new frontier in 2020: Seccomp User Notification
• Kernel 5.0 merged support for Seccomp User Notification: a new way to hook syscalls in userspace
  • Similar to ptrace, but with fewer context switches
• Allows emulating subordinate UIDs without /etc/subuid
  • POC: https://github.com/rootless-containers/subuidless
A new frontier in 2020: Seccomp User Notification
• Kernel 5.9 merged support for SECCOMP_IOCTL_NOTIF_ADDFD
• Allows injecting file descriptors from a host process into container processes
  • e.g. replace sockfd on connect(2)
  • No more slirp overhead
• POC: https://github.com/rootless-containers/bypass4netns
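The kernel versions quoted in this deck (4.18 for FUSE in UserNS, 5.0 for Seccomp User Notification, 5.9 for SECCOMP_IOCTL_NOTIF_ADDFD) can be compared against the running kernel. This is a rough sketch with hypothetical helper names; distributions sometimes backport features, so a version comparison is only a heuristic.

```shell
# Succeed when kernel release $1 (e.g. "5.4.0-42-generic") is at least $2.$3.
kernel_at_least() {
  major=${1%%.*}            # text before the first dot
  rest=${1#*.}              # text after the first dot
  minor=${rest%%[!0-9]*}    # leading digits of the remainder
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Print whether the running kernel meets the minimum version for a feature.
report() {
  if kernel_at_least "$(uname -r)" "$1" "$2"; then
    echo "$3: kernel is new enough"
  else
    echo "$3: needs kernel $1.$2 or later"
  fi
}

report 4 18 "FUSE in UserNS"
report 5 0 "Seccomp User Notification"
report 5 9 "SECCOMP_IOCTL_NOTIF_ADDFD"
```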
Recap
• Rootless Containers can protect the host from potential vulnerabilities and misconfigurations
• Already adopted by lots of projects: BuildKit, Docker, containerd, Podman, CRI-O, k3s ...
• Also being proposed to upstream Kubernetes
• There are some drawbacks, but they are being significantly reduced using Seccomp User Notification
Resources
• Rootless Containers overview: https://rootlesscontaine.rs/
• Rootless containerd: https://github.com/containerd/containerd/blob/master/docs/rootless.md
• Rootless Docker: https://get.docker.com/rootless
• Usernetes: https://github.com/rootless-containers/usernetes
• Rootless KEP: https://github.com/kubernetes/enhancements/pull/1371
Questions?
• Ask me questions at #2-kubecon-maintainer ( https://slack.cncf.io )
[KubeCon NA 2020] containerd: Rootless Containers 2020
