A ScyllaDB Community
Securely Serving Millions of Boot Artifacts a Day
João Pedro Lima
Systems Engineer, Linux Team
Matt Fleming
Senior Systems Engineer
João Pedro Lima (he/him)
■ Previously Product Security @ VMware, Infrastructure Security @ Cloudflare
■ Interested in OS Security and Cryptography
Matt Fleming (he/him)
■ Former Linux Kernel maintainer
■ Focused on performance of OS, DB, and dist sys
■ Co-authored papers on Change Point Detection and testing distributed systems
Presentation Agenda
■ Cloudflare’s Fleet Architecture
■ Boot Process
■ Design Evolution
■ Future Work
Fleet Architecture
Cloudflare has data centers in over 335 cities
■ Edge vs. control plane
■ Control plane data centers can have thousands of machines
■ Edge data centers can be big or small
■ Edge compute servers are called “metals” for historical reasons
■ OS is stateless and executes from ramdisk
■ Optimised, latest LTS Linux Kernel
■ Debian 12 (bookworm)
■ Secure Boot
Fleet Architecture
■ Each datacenter has a set of datacenter manager (DM) nodes
■ DM renders boot configuration for all nodes in datacenter from configuration management
■ Metal requests boot artifacts from DM
■ DM runs nginx
■ DM cryptographically signs all artifacts on render
■ Metal verifies artifacts before executing
Boot Process
[Diagram: boot stages labeled Power-on/Reset, Chipset/CPU, BIOS/UEFI firmware, PXE/iPXE, OS Kernel, Userspace; additional labels: Baseboard management controller (BMC), iPXE scripts]
■ Secure boot keys configured when metal is provisioned into datacenter
■ iPXE used to pull kernel image and ramdisk via HTTP
■ Secure boot used to verify kernel images and modules
#!ipxe
:diag
kernel ${boot_prefix}/vmlinuz initrd=diag-image.img console=tty0
imgverify vmlinuz ${boot_prefix}/vmlinuz.sig
initrd ${boot_prefix}/diag-image.img
imgverify diag-image.img ${boot_prefix}/diag-image.img.sig
boot
:updates
imgfetch --name {{ hw_model }}/update.ipxe {{ hw_model }}/update.ipxe
imgverify {{ hw_model }}/update.ipxe {{ hw_model }}/update.ipxe.sig
imgexec {{ hw_model }}/update.ipxe
:baseimg
kernel vmlinuz
imgverify vmlinuz ${boot_prefix}/vmlinuz.sig
initrd ${boot_prefix}/baseimg.img
imgverify baseimg.img ${boot_prefix}/baseimg.img.sig
initrd ${boot_prefix}/{{ net_img }}.img
imgverify {{ net_img }}.img ${boot_prefix}/{{ net_img }}.img.sig
boot
Secure boot
[Diagram: boot stages labeled Power-on/Reset, Chipset/CPU, BIOS/UEFI firmware, PXE/iPXE, OS Kernel, Userspace; additional labels: Baseboard management controller (BMC), iPXE scripts; trust layers marked: Platform Secure Boot/HW root of trust, UEFI Secure Boot]
https://blog.cloudflare.com/anchoring-trust-a-hardware-secure-boot-story/
Secure boot + iPXE signing
[Diagram: boot stages labeled Power-on/Reset, Chipset/CPU, BIOS/UEFI firmware, PXE/iPXE, OS Kernel, Userspace; additional labels: Baseboard management controller (BMC), iPXE scripts; trust layers marked: Platform Secure Boot/HW root of trust, UEFI Secure Boot]
Boot strategies
■ Network boot (netboot) via DM
  ■ Default boot strategy for metals
  ■ Boot artifacts retrieved just-in-time
■ Local disk boot (localboot) from disk EFI partition
  ■ Needed to boot first DM in datacenter
  ■ Fallback strategy for metals if no DM is available
  ■ Boot artifacts are synced every time configuration management runs (~ 3 hours)
Architecture
[Diagram: components labeled Control plane colo, Config mgmt master, Vault, Boot information sources, “Primary” DM, DMs, Metals, Edge colo; legend: Boot render flow, Boot flow]
Design Evolution
Challenges
■ Too many nodes trusted to render and sign boot artifacts
■ Ideally go from all DMs to just a single identity
■ DMs have failover but it’s not elastic
■ DM configuration management update time is dominated by boot artifact handling
■ Localboot pull model is costly and inefficient
Requirements
■ Must be able to generate artifacts for all nodes
■ Highly available service
■ Tolerant to the loss of part or all of the control plane
■ Some degradation is acceptable in extreme circumstances
Architecture
[Diagram: Internal K8S cluster (Boot service, Vault, Boot information sources); Internal backup K8S cluster (Boot service, Vault, Boot information sources); Internal S3 cluster; Backup cloud S3 cluster; Edge colo (DMs, Metals)]
Boot service
■ Reduced trust domain
  ■ Only boot service is able to render and sign dynamic artifacts
  ■ Static artifacts are signed once and served from S3 afterwards
■ DM configuration management gains
■ High Availability and Elasticity
  ■ K8s service written in Go
  ■ Load balanced across multiple instances
  ■ Fallback to public cloud S3 for default artifacts if all K8S clusters are out
■ Localboot adopts push model
  ■ Only rendered and updated if node configuration changed
Future Work
■ Cryptographically verify all executable code
■ Eliminate DMs completely
■ TPMs!
Thank you!
João Pedro Lima
jlima@cloudflare.com
jopelima
in/joaopedropaulinolima
Matt Fleming
mfleming@cloudflare.com
fleming_matt
mfleming

