Rootless Containers
Akihiro Suda (NTT)
akihiro.suda.cz@hco.ntt.co.jp
HPC Containers Advisory Council Meeting (Feb 1, 2024)
Rootless containers
• Puts container runtimes (as well as containers) in a user namespace
– UserNS: Linux kernel’s feature that maps a non-root user to a fake root
(the root privilege is limited inside the namespace)
• Can mitigate potential vulnerabilities of the runtimes
(e.g., the runc breakout CVE-2024-21626, disclosed 2024-01-31)
– No access to read/write other users’ files
– No access to modify the kernel
– No access to modify the firmware
– No ARP spoofing
– No DNS spoofing
• Also useful for shared hosts (High-performance Computing, etc.)
– Works with GPUs too
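Whether a process (and therefore a runtime) is running inside a user namespace can be checked from /proc. A minimal sketch, assuming the Linux /proc/self/uid_map format (lines of `inside outside length`) and the common heuristic that the initial user namespace shows the identity mapping `0 0 4294967295`; any other mapping suggests a user namespace. This is a hint, not a guarantee, and the function names are illustrative.

```python
"""Heuristic check: is this process inside a (non-initial) user namespace?

Assumption: the initial user namespace maps the whole 32-bit ID range onto
itself, so its /proc/self/uid_map is exactly "0 0 4294967295".
"""
from typing import List, Tuple

FULL_RANGE = 4294967295  # length shown by the initial namespace


def parse_id_map(text: str) -> List[Tuple[int, int, int]]:
    """Parse uid_map/gid_map text into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        inside, outside, length = (int(f) for f in fields)
        entries.append((inside, outside, length))
    return entries


def in_user_namespace(uid_map_text: str) -> bool:
    """True unless the map is the identity mapping of the full range."""
    return parse_id_map(uid_map_text) != [(0, 0, FULL_RANGE)]


def running_in_userns(path: str = "/proc/self/uid_map") -> bool:
    """Apply the heuristic to the real file (Linux only)."""
    with open(path) as f:
        return in_user_namespace(f.read())
```

For a rootless runtime owned by UID 1000 with the subordinate range 100000:65536, the map would read `0 1000 1` followed by `1 100000 65536`, and in_user_namespace() returns True.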
History
• 2014: LXC v1.0 introduced support for Rootless containers
(called “unprivileged containers” at that time)
– Networking depends on a SETUID binary, which is hard to configure and is also insecure
• 2016: Singularity v2.2 gained initial support for Rootless
• 2017: runc v1.0-rc4 gained initial support for Rootless
• 2018: Work began on supporting Rootless in containerd, BuildKit,
Docker, Podman, etc.
– slirp4netns (usermode TCP/IP) eliminated the need for a SETUID binary to bring up
container-to-container networks
• 2019: Docker v19.03 was released with experimental Rootless support
• 2020: Docker v20.10 was released with general availability of Rootless
User namespaces
• Linux kernel’s feature to remap UIDs and GIDs
– UID=1000 gains fake root privileges (UID=0) that are enough to create containers
– The privileges are limited inside the namespace
• Typically at least 65,536 subuids have to be allocated for containers
– Static configuration (/etc/subuid):
most common, but can become a mess for shared computing
– Dynamic configuration (nsswitch):
preferable for shared computing
• e.g., via FreeIPA https://freeipa.readthedocs.io/en/latest/designs/subordinate-ids.html
# /etc/subuid
1000:100000:65536
[Figure: UID mapping. Container UID 0 maps to host UID 1000; container UIDs 1 to 65536 map to host UIDs 100000 to 165535]
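The /etc/subuid entry can be turned into the ID mapping that a rootless runtime asks newuidmap(1) to install. A sketch, assuming the common layout in which container UID 0 maps to the caller's own UID and container UIDs 1..count map to the subordinate range; the helper names and the user name `alice` are hypothetical. GIDs would be handled the same way via /etc/subgid and newgidmap(1).

```python
"""Sketch: derive a rootless UID mapping from /etc/subuid-style text."""
from typing import List, Optional, Tuple

Mapping = Tuple[int, int, int]  # (container ID, host ID, length)


def lookup_subuid(text: str, user: str, uid: int) -> Optional[Tuple[int, int]]:
    """Return (start, count) of the first entry for `user` or its numeric UID."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and caption/comment lines
        owner, start, count = line.split(":")
        if owner in (user, str(uid)):
            return int(start), int(count)
    return None


def rootless_mapping(uid: int, start: int, count: int) -> List[Mapping]:
    """Container root -> caller's UID; container 1..count -> subordinate range."""
    return [(0, uid, 1), (1, start, count)]


def to_host(mapping: List[Mapping], container_id: int) -> int:
    """Translate a container UID into the host UID it is stored as."""
    for inside, outside, length in mapping:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    raise ValueError(f"UID {container_id} is not mapped")


def newuidmap_args(pid: int, mapping: List[Mapping]) -> List[str]:
    """Build argv for newuidmap(1): pid, then uid/loweruid/count triples."""
    args = ["newuidmap", str(pid)]
    for inside, outside, length in mapping:
        args += [str(inside), str(outside), str(length)]
    return args
```

With `1000:100000:65536` for a hypothetical user `alice` (UID 1000), to_host(m, 0) is 1000 and to_host(m, 65536) is 165535, the last ID of the allocated range; UIDs outside the mapping raise ValueError.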
User namespaces
• A proof of concept (POC) of subuid-less rootless containers is also available, but it is not
ready for use yet
https://github.com/rootless-containers/subuidless
– Emulates UID-related syscalls such as chown(2) using
seccomp_unotify(2) and xattr(7)
– More syscalls still need to be emulated
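The subuid-less approach can be pictured as bookkeeping on top of a single real owner. The toy model below is not subuidless's code: a Python dict stands in for xattr(7) storage, plain method calls stand in for syscalls intercepted via seccomp_unotify(2), and the attribute name is hypothetical. It only illustrates the idea: chown(2) records the requested owner instead of changing the real one, and stat(2) reports the recorded owner back.

```python
"""Toy model of chown(2)/stat(2) emulation for subuid-less containers.

Assumptions (hypothetical, not subuidless's actual code):
- a dict stands in for per-file extended attributes (xattr(7));
- direct method calls stand in for syscalls intercepted via seccomp_unotify(2).
"""
from dataclasses import dataclass, field
from typing import Dict, Tuple

OWNER_XATTR = "user.example.owner"  # hypothetical attribute name


@dataclass
class FakeFile:
    real_uid: int  # owner as the kernel sees it (always the unprivileged user)
    real_gid: int
    xattrs: Dict[str, bytes] = field(default_factory=dict)


class ChownEmulator:
    def __init__(self, host_uid: int, host_gid: int) -> None:
        self.host_uid = host_uid
        self.host_gid = host_gid
        self.files: Dict[str, FakeFile] = {}

    def create(self, path: str) -> None:
        # Without subuids, every real file belongs to the unprivileged user,
        # which the container sees as root (container UID 0 -> host UID).
        self.files[path] = FakeFile(self.host_uid, self.host_gid)

    def chown(self, path: str, uid: int, gid: int) -> int:
        # A real chown to an unmapped ID would typically fail (EINVAL);
        # instead, leave the real owner alone and record the request.
        f = self.files[path]
        f.xattrs[OWNER_XATTR] = f"{uid}:{gid}".encode()
        return 0  # report success to the container

    def stat_owner(self, path: str) -> Tuple[int, int]:
        # stat(2) results are rewritten from the attribute when present.
        raw = self.files[path].xattrs.get(OWNER_XATTR)
        if raw is None:
            return 0, 0  # the host user shows up as container root
        uid, gid = raw.decode().split(":")
        return int(uid), int(gid)
```

The real owner never changes, which is why no subordinate IDs are needed; the cost is that every UID-related syscall has to be intercepted and rewritten consistently.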
Networking stack
[Figure: Networking stack. Containers (vEth eth0: 172.17.0.2 and 172.17.0.3), each in its own network namespace, connect over vEth pairs to the docker0 bridge (172.17.0.1). The bridge and tap0 (TAP, 10.0.2.100) live in a network namespace + user namespace. slirp4netns (virtual IP 10.0.2.2) exchanges Ethernet packets with tap0 and turns them into unprivileged socket syscalls on the host's physical eth0 (192.168.0.42).]
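slirp4netns sits between the TAP device and the host: it receives raw Ethernet packets and issues ordinary unprivileged socket syscalls instead. A heavily simplified sketch of that dispatch step, assuming IPv4 only and recognising just a TCP SYN (-> connect(2)) and UDP datagrams (-> sendto(2)); a real usermode TCP/IP stack (slirp4netns builds on libslirp) also tracks connection state, ARP, DHCP, DNS and more. The function name and the returned tuples are hypothetical.

```python
"""Simplified dispatch step of a usermode TCP/IP stack (slirp-style)."""
import ipaddress
import struct
from typing import Optional, Tuple

ETH_P_IP = 0x0800
IPPROTO_TCP = 6
IPPROTO_UDP = 17
TCP_SYN, TCP_ACK = 0x02, 0x10


def translate(frame: bytes) -> Optional[Tuple[str, str, int]]:
    """Map an Ethernet frame read from the TAP fd to a host socket action.

    Returns ("connect", dst_ip, dst_port) for a TCP SYN,
    ("sendto", dst_ip, dst_port) for UDP, and None for anything else.
    """
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return None  # ARP, IPv6, ... are out of scope for this sketch
    ip = frame[14:]
    if len(ip) < 20:
        return None
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    proto = ip[9]
    dst = str(ipaddress.IPv4Address(ip[16:20]))
    l4 = ip[ihl:]
    if len(l4) < 4:
        return None
    _src_port, dst_port = struct.unpack_from("!HH", l4, 0)
    if proto == IPPROTO_TCP and len(l4) >= 14:
        flags = l4[13]
        if flags & TCP_SYN and not flags & TCP_ACK:
            return ("connect", dst, dst_port)  # new outbound connection
    elif proto == IPPROTO_UDP:
        return ("sendto", dst, dst_port)
    return None
```

For example, a SYN from tap0 (10.0.2.100) to 10.0.2.2:443 yields ("connect", "10.0.2.2", 443); the stack would then call connect(2) on a host socket and relay the payload in both directions.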
Faster networking (for runtimes)
• The Rootless Docker daemon itself is also executed inside slirp4netns’s NetNS,
for ease of implementation, which causes:
– Slow pull/push
– No direct access to localhost registries
– No support for --net=host
• Docker v26 (or later) may execute the daemon outside slirp4netns’s NetNS
to eliminate these restrictions 🎉
https://github.com/moby/moby/pull/47103 (WIP)
• The same technique is also used by Podman and by nerdctl
(contaiNERD CTL) v2
Faster networking (for containers)
• Bypass4netns allows bypassing slirp4netns
https://github.com/rootless-containers/bypass4netns
• Captures socket syscalls inside the NetNS, reconstructs the FDs
outside the NetNS, and replaces the FDs inside the NetNS
• Integrated into nerdctl (opt-in)
• Can be used with Docker and Podman too
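The key trick in bypass4netns is replacing a file descriptor in the container process with a socket created outside the NetNS. The sketch below shows only that switching step, with a substitution: it passes the FD with SCM_RIGHTS over a Unix socketpair (socket.send_fds/recv_fds, Python 3.9+ on Unix) and swaps it in with os.dup2, all inside one process, whereas the real tool works across namespaces and injects FDs via seccomp. The helper names are hypothetical.

```python
"""Socket-switching sketch: hand a socket created "outside" to the "inside"
and make the inside FD number point at it.

Assumptions: SCM_RIGHTS over a Unix socketpair stands in for the seccomp
FD injection used by the real tool, and both roles run in one process, so
no namespaces are involved. Function names are hypothetical.
"""
import os
import socket
from typing import Tuple


def create_outside_socket() -> socket.socket:
    # Stand-in for the helper living in the host network namespace.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))  # ephemeral port on the host loopback
    return s


def switch_socket(inside: socket.socket) -> Tuple[str, int]:
    """Replace the FD behind `inside`; return the new socket's address."""
    outside = create_outside_socket()
    addr = outside.getsockname()
    helper_end, app_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # "Reconstruct outside": ship the host-side FD across the boundary.
        socket.send_fds(helper_end, [b"fd"], [outside.fileno()])
        _msg, fds, _flags, _addr = socket.recv_fds(app_end, 16, 1)
        # "Replace inside": the original FD number now refers to the new socket.
        os.dup2(fds[0], inside.fileno())
        os.close(fds[0])
    finally:
        helper_end.close()
        app_end.close()
        outside.close()  # `inside` keeps the socket alive
    return addr
```

The container-side code keeps using the same FD number and never notices the switch, which is what lets traffic skip slirp4netns. In this sketch the replacement must have the same family and type as the original, because the Python socket object caches them.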
Faster networking (for containers)
Accelerating TCP/IP Communications in Rootless Containers by Socket Switching (Naoki Matsumoto and Akihiro Suda, SWoPP 2022)
https://speakerdeck.com/mt2naoki/ip-communications-in-rootless-containers-by-socket-switching?slide=4
Even faster than rootful
Criticisms against Rootless containers (and solutions)
• It is controversial whether non-root users should be allowed to
create user namespaces
• Yes, for container users, because rootless containers are much safer
than running everything as root
• No, for others, because user namespaces can instead become an attack surface
– e.g., CVE-2023-32233: Privilege escalation in the Linux kernel due to a Netfilter
nf_tables vulnerability
• Several mechanisms are being worked on to conditionally enable
unprivileged user namespaces
Criticisms against Rootless containers (and solutions)
• Linux v6.1 (2022) introduced a new LSM hook: userns_create
– Hookable from KRSI (eBPF LSM)
– Userspace tools still need to be improved to provide a human-friendly UX for this
• Ubuntu 23.10 introduced a new sysctl:
kernel.apparmor_restrict_unprivileged_userns
– An AppArmor profile such as /etc/apparmor.d/usr.bin.<FOO> is needed for a program to create a UserNS
– Older releases of Ubuntu used kernel.unprivileged_userns_clone
(a single system-wide boolean value)
LSM: Linux Security Module, KRSI: Kernel Runtime Security Instrumentation
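These sysctls can be inspected before trying to create a user namespace. A sketch, assuming the usual semantics: user.max_user_namespaces=0 (an upstream knob not mentioned on the slide) disables user namespaces, kernel.unprivileged_userns_clone=0 disables them for unprivileged users, and kernel.apparmor_restrict_unprivileged_userns=1 requires an AppArmor profile. A missing file just means the kernel lacks that sysctl. eBPF LSM policies attached to userns_create are invisible here, so "likely allowed" is only a hint; the function names are illustrative.

```python
"""Sketch: assess whether unprivileged user namespaces look enabled."""
from pathlib import Path
from typing import Optional

SYSCTLS = {
    # Upstream knob: 0 disables creation of new user namespaces.
    "max_userns": "user/max_user_namespaces",
    # Older Ubuntu (and Debian) patch: one system-wide boolean.
    "userns_clone": "kernel/unprivileged_userns_clone",
    # Ubuntu 23.10+: 1 means an AppArmor profile must allow the program.
    "apparmor_restrict": "kernel/apparmor_restrict_unprivileged_userns",
}


def read_sysctl(root: Path, rel: str) -> Optional[int]:
    """Return the integer value, or None if this kernel lacks the sysctl."""
    try:
        return int((root / rel).read_text().strip())
    except FileNotFoundError:
        return None


def userns_status(root: Path = Path("/proc/sys")) -> str:
    vals = {key: read_sysctl(root, rel) for key, rel in SYSCTLS.items()}
    if vals["max_userns"] == 0:
        return "disabled (user.max_user_namespaces=0)"
    if vals["userns_clone"] == 0:
        return "disabled (kernel.unprivileged_userns_clone=0)"
    if vals["apparmor_restrict"] == 1:
        return "restricted (an AppArmor profile is required)"
    # Not checked: eBPF LSM programs attached to the userns_create hook.
    return "likely allowed (eBPF LSM policies on userns_create are not checked)"
```

Taking the /proc/sys root as a parameter keeps the logic testable against a fake directory tree.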
Rootless Kubernetes
• Usernetes: Rootless Kubernetes
https://github.com/rootless-containers/usernetes
• The current version is implemented by running Kubernetes inside
Rootless Docker/Podman/nerdctl
• Multi-node networking is possible with VXLAN (Flannel)
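Flannel's VXLAN backend carries each pod's Ethernet frame inside a UDP datagram addressed to the peer node. A sketch of the 8-byte VXLAN header defined in RFC 7348 (an I flag plus a 24-bit VNI); the VNI 1 and UDP port 8472 constants reflect commonly seen Flannel/Linux defaults and are assumptions, and the helper names are hypothetical.

```python
"""Sketch: VXLAN encapsulation header (RFC 7348) as used between nodes."""
import struct

VXLAN_FLAG_I = 0x08    # "VNI present" flag
FLANNEL_VNI = 1        # assumption: Flannel's default VNI
VXLAN_UDP_PORT = 8472  # assumption: Linux default port used by Flannel


def vxlan_header(vni: int) -> bytes:
    """Return the 8-byte header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int = FLANNEL_VNI) -> bytes:
    """UDP payload sent to the peer node: VXLAN header + inner Ethernet frame."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> tuple:
    """Inverse of encapsulate(): return (vni, inner_frame)."""
    word0, word1 = struct.unpack_from("!II", payload, 0)
    if not (word0 >> 24) & VXLAN_FLAG_I:
        raise ValueError("VNI flag not set")
    return word1 >> 8, payload[8:]
```

Because the whole inner frame rides in an ordinary UDP datagram, node-to-node traffic only needs a reachable UDP port rather than privileged routing changes on the host.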
History
• Work on Rootless Kubernetes began in 2018
– As old as Rootless Docker (pre-release at that time) and Rootless Podman
• The changes to Kubernetes were merged in Kubernetes v1.22
(Aug 2021)
– Feature gate: KubeletInUserNamespace (Alpha)
• The feature gate is also used by:
– kind (with Rootless Docker or Rootless Podman)
– minikube (with Rootless Docker or Rootless Podman)
– k3s
Usernetes Gen 1 vs Gen 2
• Gen 1 (2018-2023): “The hard way”
– Host dependency: RootlessKit
– Supports kubeadm: No
– Supports multi-node: Yes, but practically No, due to complexity
– Supports hostPath volumes: Yes
• Gen 2 (2023-): Similar to `kind` and minikube, but supports real multi-node
– Host dependency: Rootless Docker, Rootless Podman, or Rootless nerdctl (contaiNERD CTL)
– Supports kubeadm: Yes
– Supports multi-node: Yes
– Supports hostPath volumes: Yes, for most paths, but needs an extra config
Usage
# Bootstrap the first node
make up
make kubeadm-init
make install-flannel
# Enable kubectl
make kubeconfig
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get pods -A
# Multi-node
make join-command
scp join-command another-host:~/usernetes
ssh another-host make -C ~/usernetes up kubeadm-join