PCI pass-through with de-privileged QEMU
Xin Li, XenServer, Citrix
Privileged mode
• The QEMU process runs as root.
• The QEMU process’s CWD is “/”.
• There is no restriction on the hypercalls that QEMU can call.
• … (QEMU can do almost anything)
PID UID GID CMD
8265 0 0 qemu-dm-16 -machine ...
# pwdx 8265
8265: /
De-privileged mode
• The QEMU process runs under a generated user ID and group ID.
• The QEMU process is chrooted.
• xen-domid-restrict mode is enabled.
• …
PID UID GID CMD
11544 65552 1010 qemu-dm- ...
# pwdx 11544
11544: /var/xen/qemu/root-17
xen-domid-restrict mode:
1. Disables arbitrary hypercalls; only a subset of hypercalls (such as MMAP, DM-Ops, …) is allowed.
2. Allows operations on the specified domain only, for all existing handlers.
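The two rules of xen-domid-restrict mode can be pictured with a toy model. This is only a sketch of the idea, not how Xen or QEMU implement it: the class `RestrictedHandle`, the operation names in `ALLOWED_OPS`, and the returned strings are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical operation names, for illustration only.
ALLOWED_OPS = frozenset({"mmap", "dm_op"})


@dataclass(frozen=True)
class RestrictedHandle:
    """Toy stand-in for a hypercall handle restricted to one domain."""

    domid: int

    def call(self, op: str, target_domid: int) -> str:
        # Rule 1: only a fixed subset of operations is allowed.
        if op not in ALLOWED_OPS:
            raise PermissionError(f"operation {op!r} is not allowed")
        # Rule 2: allowed operations may only target the restricted domain.
        if target_domid != self.domid:
            raise PermissionError(
                f"domain {target_domid} is not the restricted domain {self.domid}"
            )
        return f"{op} on domain {target_domid}"
```

In this model a handle restricted to domain 17 accepts `call("dm_op", 17)` but rejects both a non-whitelisted operation and any other target domain.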
Fallback to privileged mode for PCI pass-through
Why? What are the problems?
• In a chrooted environment, neither sysfs nor the device node paths can be accessed.
• Without CAP_SYS_ADMIN, only the first 64 bytes of the PCI config file can be read.
• A non-root user can’t read all the sysfs files needed for the PCI device.
• QEMU accesses /dev/mem directly to map the physical MSI-X table.
• A set of restricted hypercalls is called only for PCI pass-through.
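To see why the 64-byte limit matters, here is a minimal sketch that walks the standard PCI capability list in a config-space buffer. The register offsets and the MSI-X capability ID (0x11) come from the PCI specification; the function name `find_capability` is hypothetical. Capabilities live at offset 0x40 or above, so a read truncated to the 64-byte header cannot reach any of them.

```python
import struct

PCI_STATUS = 0x06            # status register offset
PCI_STATUS_CAP_LIST = 0x10   # bit 4: a capability list is present
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_MSIX = 0x11       # MSI-X capability ID


def find_capability(config: bytes, cap_id: int):
    """Return the offset of capability `cap_id`, or None if it is absent
    or lies beyond the bytes that could be read."""
    if len(config) < 64:
        return None
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while pos and pos not in seen:
        if pos + 2 > len(config):
            return None  # the read was truncated before this capability
        seen.add(pos)
        if config[pos] == cap_id:
            return pos
        pos = config[pos + 1] & 0xFC  # follow the "next" pointer
    return None
```

Given a full 256-byte buffer with an MSI-X capability at 0xa0, `find_capability(buf, PCI_CAP_ID_MSIX)` returns `0xa0`; given only `buf[:64]` it returns `None`.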
Enable privileged mode (1)
In the tool stack (XAPI):
Before the PCI device is attached:
1. Mount sysfs in the chroot directory.
2. Change ownership of the related sysfs files: resource*, vendor, device, irq, class.
After the PCI device is detached:
Reset the file owner to root.
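The ownership step can be sketched as follows. This is an illustration under assumptions, not XAPI's code: the helper names `passthrough_sysfs_files` and `set_owner` are hypothetical, and `dev_dir` stands for the device's sysfs directory as seen inside the chroot. Only the file names come from the slide.

```python
import glob
import os

# File names taken from the slide; "resource*" is a glob pattern.
SYSFS_PATTERNS = ("resource*", "vendor", "device", "irq", "class")


def passthrough_sysfs_files(dev_dir: str) -> list:
    """List the sysfs files under `dev_dir` that QEMU needs to read."""
    files = []
    for pattern in SYSFS_PATTERNS:
        files.extend(glob.glob(os.path.join(dev_dir, pattern)))
    return sorted(set(files))


def set_owner(files, uid: int, gid: int) -> None:
    """Give every file in `files` to uid:gid (pass 0, 0 to reset to root)."""
    for path in files:
        os.chown(path, uid, gid)
```

Before attach, the tool stack would call `set_owner(files, qemu_uid, qemu_gid)` with the generated IDs; after detach, `set_owner(files, 0, 0)` restores root ownership.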
Enable privileged mode (2)
In the tool stack (xl):
In pci_add, before sending “device_add”:
1. Open the PCI config file as root.
2. Send this fd to QEMU over QMP.
In QEMU:
1. Add a new property, config_fdset, for XenPCIPassthroughState.
2. Prefer reading from the config fd in config_fdset.
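Passing an open file descriptor over a Unix socket relies on SCM_RIGHTS ancillary data, which is what QMP uses to receive fds. The sketch below shows only that mechanism with Python 3.9+ `socket.send_fds` / `socket.recv_fds`. The helper names are hypothetical, and using QMP's `add-fd` command with an `fdset-id` argument is an assumption about how the fd might be delivered, not a statement of what the xl patch does.

```python
import json
import os
import socket


def send_fd_with_command(sock: socket.socket, fd: int, fdset_id: int) -> None:
    """Send one QMP-style JSON command with `fd` attached as SCM_RIGHTS data."""
    command = {"execute": "add-fd", "arguments": {"fdset-id": fdset_id}}
    socket.send_fds(sock, [json.dumps(command).encode()], [fd])


def recv_fd_with_command(sock: socket.socket):
    """Receive the command and the attached fd (a new fd number in the receiver)."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 4096, 1)
    return json.loads(data), fds[0]
```

The receiver gets a descriptor that refers to the same open file as the sender's, so a de-privileged process can read a file it could not have opened itself.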
Enable privileged mode (3)
Index New DM-OP
1 physdev_map_pirq
2 physdev_map_pirq_msi
3 physdev_unmap_pirq
4 domain_bind_pt_pci_irq
5 domain_unbind_pt_pci_irq
6 domain_update_msi_irq
7 domain_unbind_msi_irq
8 domain_iomem_permission
9 domain_memory_mapping
10 domain_ioport_mapping
For xen-domid-restrict mode, add new DM-OPs and let QEMU call them instead of the restricted hypercalls.
Enable privileged mode (4)
QEMU accesses /dev/mem to:
1. Read the vector control (per-vector mask) from each MSI-X table entry.
2. Read the PBA (Pending Bit Array) after the MSI-X table.
Remove direct access to /dev/mem:
• /sys/…/resource is already parsed to get the MMIO region info (addr, size, BAR index, offset).
• Instead of /dev/mem, we can mmap /sys/…/resourceX.
How to verify? Capabilities: [a0] MSI-X: Enable+ Count=17 Masked+
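A minimal sketch of the mmap approach, assuming the BAR file, its size, and the MSI-X table and PBA offsets within it are already known. The function name `read_msix_state` and all of its parameters are hypothetical. The 16-byte entry layout, with the vector control word at byte offset 12 and the mask in bit 0, follows the MSI-X specification. The byte-wise PBA reads are a simplification; real MMIO code would use appropriately sized accesses.

```python
import mmap
import struct

MSIX_ENTRY_SIZE = 16           # addr low, addr high, data, vector control
MSIX_VECTOR_CTRL_OFFSET = 12   # vector control is the last dword of an entry
MSIX_VECTOR_CTRL_MASKBIT = 0x1


def read_msix_state(resource_path, bar_size, table_off, pba_off, count):
    """Return (masked, pending): one bool per vector in each list."""
    with open(resource_path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        bar = mmap.mmap(f.fileno(), bar_size, mmap.MAP_SHARED, mmap.PROT_READ)
    try:
        masked, pending = [], []
        for i in range(count):
            ctrl_off = table_off + i * MSIX_ENTRY_SIZE + MSIX_VECTOR_CTRL_OFFSET
            (ctrl,) = struct.unpack_from("<I", bar, ctrl_off)
            masked.append(bool(ctrl & MSIX_VECTOR_CTRL_MASKBIT))
            # The PBA holds one pending bit per vector, packed little-endian.
            pending.append(bool((bar[pba_off + i // 8] >> (i % 8)) & 1))
        return masked, pending
    finally:
        bar.close()
```

Because the sketch only needs read access to a resourceX file that the tool stack has already handed to the QEMU user, it works without /dev/mem.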
Thank you!

XPDDS18 Design Session: PCI pass-through with de-privileged QEMU - Xin Li, Citrix
