Function Level Analysis of Linux NVMe Driver
Comparative analysis of the legacy and MQ-based NVMe drivers
Ingu Kang
Embedded Systems Lab.
Kookmin University
2016-05-09
General Architecture of Linux I/O Stack

Linux NVMe Function Call Flow
● The NVMe driver of Linux up to 3.18 bypasses the block layer's request-queueing routines and takes bio structure instances directly.
● On Linux 3.19+, bios go through blk-mq and are converted into "request" structures.
● blk-mq dispatches requests to the NVMe driver by calling nvme_queue_rq().
● On Linux 4.4+, the driver was further refactored, including optimizations and bug fixes.

Linux ~3.18
1. nvme_make_request() takes a bio from the upper layer and converts it into an iod (I/O descriptor). An iod is allocated separately for each bio.
2. nvme_submit_iod() takes the iod and obtains a struct nvme_cmd_info, together with its index, from the space pre-allocated for it in the current CPU's nvmeq (struct nvme_queue).
3. nvme_submit_iod() builds a command in the submission queue (SQ) from the iod. It also fills in the pre-allocated nvme_cmd_info with a reference to the iod and a completion callback; both are used when the completion queue (CQ) entry is processed. Finally, it rings the SQ doorbell.
Note: the pointers to the iod and the callback are saved in cmd_info because SQ/CQ entries can carry a command_id but not the iod and callback pointers.

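The cmd_info bookkeeping of the legacy path can be sketched as follows, assuming a single queue, no locking, and no SQ-full or timeout handling. All toy_* names are hypothetical, and the structures are far smaller than the real nvme_cmd_info and NVMe SQ/CQ entries. The point is only that the command_id is an index into a host-side table, which is how the completion handler gets back to the iod and the callback.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_Q_DEPTH 8

/* Hypothetical, simplified stand-ins for struct nvme_iod and struct nvme_cmd_info. */
struct toy_iod {
    int status;
};

typedef void (*toy_completion_fn)(struct toy_iod *iod, uint16_t status);

struct toy_cmd_info {
    toy_completion_fn fn; /* callback to run when the CQ entry arrives */
    struct toy_iod *ctx;  /* the iod the command was built from */
    int in_use;
};

/* SQ and CQ entries: the only link back to host state is command_id. */
struct toy_sqe {
    uint8_t opcode;
    uint16_t command_id;
};

struct toy_cqe {
    uint16_t command_id;
    uint16_t status;
};

struct toy_nvmeq {
    struct toy_cmd_info cmdinfo[TOY_Q_DEPTH]; /* pre-allocated with the queue */
    struct toy_sqe sq[TOY_Q_DEPTH];
    uint16_t sq_tail;
    uint32_t sq_doorbell; /* a plain field here; an MMIO register in hardware */
};

/* Find a free cmd_info slot and record (callback, iod); the index is the command_id. */
static int toy_alloc_cmdid(struct toy_nvmeq *q, struct toy_iod *iod, toy_completion_fn fn)
{
    for (int i = 0; i < TOY_Q_DEPTH; i++) {
        if (!q->cmdinfo[i].in_use) {
            q->cmdinfo[i] = (struct toy_cmd_info){ .fn = fn, .ctx = iod, .in_use = 1 };
            return i;
        }
    }
    return -1; /* every command_id is in flight */
}

/* Build the SQ entry for an iod, then "ring" the doorbell with the new tail. */
static int toy_submit_iod(struct toy_nvmeq *q, struct toy_iod *iod,
                          toy_completion_fn fn, uint8_t opcode)
{
    int cmdid = toy_alloc_cmdid(q, iod, fn);
    if (cmdid < 0)
        return -1;
    q->sq[q->sq_tail] = (struct toy_sqe){ .opcode = opcode, .command_id = (uint16_t)cmdid };
    q->sq_tail = (uint16_t)((q->sq_tail + 1) % TOY_Q_DEPTH);
    q->sq_doorbell = q->sq_tail; /* no SQ head / queue-full tracking in this model */
    return cmdid;
}

/* Completion: command_id -> cmd_info -> callback(iod); the slot is then freed. */
static void toy_process_cqe(struct toy_nvmeq *q, const struct toy_cqe *cqe)
{
    assert(cqe->command_id < TOY_Q_DEPTH);
    struct toy_cmd_info *info = &q->cmdinfo[cqe->command_id];
    assert(info->in_use);
    info->in_use = 0;
    info->fn(info->ctx, cqe->status);
}

/* Example callback: store the completion status in the iod. */
static void toy_io_done(struct toy_iod *iod, uint16_t status)
{
    iod->status = status;
}
```

In the real driver the doorbell is an MMIO register write; the model replaces it with a plain field so the flow can be followed in ordinary memory.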
Linux 3.19+ (1)
1. blk_mq_make_request() takes a bio and builds a request instance from it. The instance is taken from the request pool of a hctx (hardware context), which was pre-allocated at device initialization. request->tag holds the index of the request instance within the pool array; this tag is later used as the command_id in nvme_submit_iod(). The request is then added to plug->mq_list.
2. blk_mq_flush_plug_list() and blk_mq_insert_requests() move requests from plug->mq_list to ctx->rq_list (ctx->rq_list is a per-CPU software queue).
3. flush_busy_ctxs() and __blk_mq_insert_request() move requests from ctx->rq_list to a locally defined rq_list that acts as a 1-to-1 hardware dispatch queue, and the requests on it are processed by nvme_queue_rq().

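Steps 1 to 3 can be modelled as a pre-allocated pool plus three FIFO lists. This is a hypothetical single-threaded sketch: toy_hctx stands in for the hctx request pool, and the plug, ctx and dispatch lists stand in for plug->mq_list, ctx->rq_list and the local rq_list. Real blk-mq uses a dedicated tag allocator, struct list_head and locking; none of that is modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_TAGS 4

/* Hypothetical, simplified request: tag is its own index in the pool. */
struct toy_request {
    int tag;
    int in_use;
    struct toy_request *next; /* singly linked stand-in for struct list_head */
};

/* Hypothetical per-hctx pool, filled once at "device initialization". */
struct toy_hctx {
    struct toy_request rqs[TOY_NR_TAGS];
};

/* FIFO list used for the plug list, the per-CPU ctx list and the dispatch list. */
struct toy_list {
    struct toy_request *head, *tail;
};

static void toy_hctx_init(struct toy_hctx *h)
{
    for (int i = 0; i < TOY_NR_TAGS; i++)
        h->rqs[i] = (struct toy_request){ .tag = i };
}

/* Step 1: take a free pre-allocated request; rq->tag is its pool index. */
static struct toy_request *toy_get_request(struct toy_hctx *h)
{
    for (int i = 0; i < TOY_NR_TAGS; i++) {
        if (!h->rqs[i].in_use) {
            h->rqs[i].in_use = 1;
            return &h->rqs[i];
        }
    }
    return NULL; /* pool exhausted */
}

static void toy_list_add_tail(struct toy_list *l, struct toy_request *rq)
{
    assert(rq != NULL);
    rq->next = NULL;
    if (l->tail)
        l->tail->next = rq;
    else
        l->head = rq;
    l->tail = rq;
}

/* Steps 2 and 3: move all of src onto the tail of dst and empty src. */
static void toy_list_splice(struct toy_list *src, struct toy_list *dst)
{
    if (!src->head)
        return;
    if (dst->tail)
        dst->tail->next = src->head;
    else
        dst->head = src->head;
    dst->tail = src->tail;
    src->head = src->tail = NULL;
}

/* Records the tags it is handed; stands in for nvme_queue_rq(). */
static int toy_seen_tags[TOY_NR_TAGS];
static int toy_nr_seen;

static int toy_queue_rq(struct toy_request *rq)
{
    if (toy_nr_seen < TOY_NR_TAGS)
        toy_seen_tags[toy_nr_seen++] = rq->tag; /* the tag doubles as the command_id */
    return 0;
}

/* Drain the dispatch list in order, counting requests the driver accepted. */
static int toy_dispatch(struct toy_list *dispatch, int (*queue_rq)(struct toy_request *))
{
    int accepted = 0;
    while (dispatch->head) {
        struct toy_request *rq = dispatch->head;
        dispatch->head = rq->next;
        if (!dispatch->head)
            dispatch->tail = NULL;
        if (queue_rq(rq) == 0)
            accepted++;
    }
    return accepted;
}
```

Splicing whole lists, rather than moving requests one at a time, is what keeps each hand-off between stages cheap in this model.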
Linux 3.19+ (2)
4. nvme_queue_rq() allocates an iod and converts the request into it.
5. nvme_submit_iod() converts the iod into a command, reusing request->tag as the command_id. The callback and the iod are recorded in nvme_cmd_info for later use by the completion routine. It then rings the SQ doorbell.
Note 1: The address of the pre-allocated nvme_cmd_info is obtained by calling blk_mq_rq_to_pdu(), which computes the address of the driver's PDU data area by adding sizeof(struct request) to the address of the request instance.
Note 2: In the most recent version of the NVMe driver, nvme_cmd_info and nvme_iod are merged. All iods are pre-allocated at device initialization, but the DMA segment information array iod->sg is NOT.

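Note 1 and the merged layout of Note 2 can be illustrated with a pre-allocated pool in which each request is immediately followed by its driver PDU. This is a sketch under assumptions: toy_request and toy_iod are hypothetical and much smaller than the kernel's structures, and the per-request PDU size plays the role of the cmd_size field a blk-mq driver sets in its tag set. toy_rq_to_pdu() performs the arithmetic Note 1 describes, rq + 1, which adds sizeof(struct toy_request) to the request address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request header; the real struct request is much larger. */
struct toy_request {
    void *bio;
    int tag;
};

/* Hypothetical driver PDU modelled on the merged nvme_iod (Note 2). */
struct toy_iod {
    void (*done)(struct toy_request *rq); /* completion callback, formerly in cmd_info */
    int nents;                            /* number of DMA segments */
    void *sg;                             /* segment array: NULL at init, allocated per I/O */
};

/* rq + 1 is only a valid PDU address if alignment is preserved across the pool. */
_Static_assert(_Alignof(struct toy_request) >= _Alignof(struct toy_iod),
               "PDU placed right after the request must stay aligned");
_Static_assert(sizeof(struct toy_iod) % _Alignof(struct toy_request) == 0,
               "pool stride must keep every request aligned");

/* Pool of requests, each immediately followed by its PDU, allocated once. */
struct toy_pool {
    unsigned char *mem;
    size_t stride; /* sizeof(request) + PDU size ("cmd_size") */
    int nr;
};

static int toy_pool_init(struct toy_pool *p, int nr)
{
    p->stride = sizeof(struct toy_request) + sizeof(struct toy_iod);
    p->nr = nr;
    p->mem = calloc((size_t)nr, p->stride);
    if (!p->mem)
        return -1;
    for (int i = 0; i < nr; i++)
        ((struct toy_request *)(p->mem + (size_t)i * p->stride))->tag = i;
    return 0;
}

/* Tag -> request: plain indexing into the pre-allocated pool. */
static struct toy_request *toy_pool_rq(struct toy_pool *p, int tag)
{
    assert(tag >= 0 && tag < p->nr);
    return (struct toy_request *)(p->mem + (size_t)tag * p->stride);
}

/* Request -> PDU: the arithmetic Note 1 attributes to blk_mq_rq_to_pdu(). */
static void *toy_rq_to_pdu(struct toy_request *rq)
{
    return rq + 1;
}

/* PDU -> request: the inverse step, useful in a completion handler. */
static struct toy_request *toy_rq_from_pdu(void *pdu)
{
    return (struct toy_request *)pdu - 1;
}
```

In this model, a CQ entry's command_id (the tag) leads to the request by pool indexing and then to the iod by pointer arithmetic, so no separate cmd_info table is needed.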
Thanks!