Understanding PCI MSI (Message Signaled Interrupts) in Linux
Message Signaled Interrupts (MSI) replace dedicated interrupt pins with ordinary memory writes issued by the device, which removes most IRQ sharing and lets a single device raise many independent interrupts. This article explains how MSI works, how the Linux kernel implements it, and how device drivers use it, with concrete examples.
Traditional Interrupts vs. MSI
Figure: Traditional pin-based interrupts

┌─────────────────────┐    ┌───────────────────────┐
│     PCI Device      │    │ Interrupt Controller  │
│                     │    │   (PIC/APIC/IOAPIC)   │
│                     │    │                       │
│   ┌───────────┐     │    │  ┌───────────┐        │
│   │ INT# Pin  ├─────┼────┼─►│ IRQ Line  │        │
│   └───────────┘     │    │  └─────┬─────┘        │
│                     │    │        │              │
└─────────────────────┘    └────────┼──────────────┘
                                    │
                                    ▼
                           ┌─────────────────┐
                           │       CPU       │
                           └─────────────────┘
Figure: Message Signaled Interrupts

┌─────────────────────┐                 ┌───────────────────────┐
│     PCI Device      │                 │     Memory-Mapped     │
│                     │  Memory Write   │    Message Address    │
│   ┌───────────┐     ├────────────────►│                       │
│   │ PCI Config│     │                 └───────────┬───────────┘
│   │   Space   │     │                             │
│   └───────────┘     │                             ▼
└─────────────────────┘                    ┌─────────────────┐
                                           │       CPU       │
                                           └─────────────────┘
Traditional Interrupt Mechanism
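In traditional pin-based interrupts:
- The device asserts one of its dedicated interrupt pins (INTA# through INTD#).
- The pin is wired to an input of the interrupt controller (PIC or I/O APIC), which in turn signals the CPU.
- Interrupt lines are a scarce resource, so several devices often share one IRQ, and every handler on a shared line must check whether its own device raised the interrupt.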
MSI Mechanism
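With Message Signaled Interrupts:
- The device signals an interrupt by writing a small data value to a special address, using an ordinary memory write on the bus.
- No dedicated interrupt pin or routing is needed.
- Each device gets its own interrupt vectors, so IRQ sharing is largely avoided.
- The interrupt message travels in-band behind the device's earlier DMA writes, so the data it announces is visible to the CPU by the time the handler runs.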
MSI in the PCI Specification
Figure: PCI MSI capability structure (in PCI configuration space)

┌───────────────────────┐
│ MSI Capability Header │
│ (ID: 0x05)            │
├───────────────────────┤
│ Message Control       │
│  - MSI Enable         │
│  - Multiple Message   │
│  - 64-bit Address     │
├───────────────────────┤
│ Message Address       │
│ (Lower 32 bits)       │
├───────────────────────┤
│ Message Address       │
│ (Upper 32 bits)       │  [Optional]
├───────────────────────┤
│ Message Data          │
├───────────────────────┤
│ Mask Bits             │  [Optional]
├───────────────────────┤
│ Pending Bits          │  [Optional]
└───────────────────────┘
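The PCI specification defines two types of Message Signaled Interrupts:
- MSI (capability ID 0x05): up to 32 vectors per function, allocated in powers of two. All vectors share one message address, and the device varies the low bits of the message data to tell them apart.
- MSI-X (capability ID 0x11): up to 2048 vectors per function. Each vector has its own address, data, and mask bit, stored in a table that lives in device memory (a BAR) rather than in configuration space.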
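The Message Control register in the MSI capability packs several of these properties into 16 bits. The following user-space sketch decodes such a value; the struct and function names are made up for illustration and are not kernel APIs, while the bit positions follow the MSI capability layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the 16-bit MSI Message Control register */
#define MSI_CTRL_ENABLE    0x0001u /* bit 0: MSI Enable */
#define MSI_CTRL_MMC_MASK  0x000eu /* bits 3:1: Multiple Message Capable */
#define MSI_CTRL_MME_MASK  0x0070u /* bits 6:4: Multiple Message Enable */
#define MSI_CTRL_64BIT     0x0080u /* bit 7: 64-bit address capable */
#define MSI_CTRL_MASKBIT   0x0100u /* bit 8: per-vector masking capable */

/* Hypothetical container for the decoded fields */
struct msi_ctrl_info {
    bool enabled;
    bool addr64;
    bool per_vector_mask;
    unsigned int vectors_supported; /* 1, 2, 4, 8, 16 or 32 */
    unsigned int vectors_enabled;   /* 1, 2, 4, 8, 16 or 32 */
};

/* Decode a raw Message Control value into readable fields. */
static struct msi_ctrl_info msi_decode_ctrl(uint16_t ctrl)
{
    unsigned int mmc = (ctrl & MSI_CTRL_MMC_MASK) >> 1;
    unsigned int mme = (ctrl & MSI_CTRL_MME_MASK) >> 4;
    struct msi_ctrl_info info;

    /* Both fields hold log2 of a vector count; encodings 6 and 7 are reserved. */
    assert(mmc <= 5 && mme <= 5);

    info.enabled = (ctrl & MSI_CTRL_ENABLE) != 0;
    info.addr64 = (ctrl & MSI_CTRL_64BIT) != 0;
    info.per_vector_mask = (ctrl & MSI_CTRL_MASKBIT) != 0;
    info.vectors_supported = 1u << mmc;
    info.vectors_enabled = 1u << mme;
    return info;
}
```

For instance, `msi_decode_ctrl(0x00a7)` describes a 64-bit capable function with MSI enabled that supports 8 vectors and currently has 4 of them enabled.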
How MSI Works
1. Software Configuration:
   - The OS/driver configures the MSI Capability Structure in the PCIe device’s configuration space. This includes:
     - MSI Address: The memory address (physical or IOVA) of the interrupt controller’s register.
     - MSI Data: The interrupt vector or value to write (determines which CPU interrupt is triggered).
     - Message Control Register: Enables MSI, sets the number of supported vectors, etc.
2. Hardware Automation:
   - Once configured and enabled, the PCIe device hardware autonomously generates a Memory Write TLP whenever it needs to signal an interrupt.
   - The TLP includes:
     - The pre-programmed MSI Address (destination in memory).
     - The pre-programmed MSI Data (interrupt vector).
   - No further software involvement is required for individual interrupts.
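To make the address and data values concrete, here is a minimal sketch of how such a message could be composed for the x86 local APIC, where MSI addresses fall in the 0xFEE00000 range and the low byte of the data is the interrupt vector. It covers only the simplest case (physical destination, fixed delivery, edge-triggered); the struct and function names are assumptions for this example, and other architectures use entirely different encodings.

```c
#include <assert.h>
#include <stdint.h>

#define X86_MSI_ADDR_BASE        0xfee00000u /* bits 31:20 select the local APIC range */
#define X86_MSI_ADDR_DEST_SHIFT  12          /* bits 19:12: destination APIC ID */

/* Hypothetical container for the values programmed into the device */
struct msi_message {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Compose the message a device would write to interrupt one CPU. */
static struct msi_message x86_msi_compose(uint8_t dest_apic_id, uint8_t vector)
{
    struct msi_message msg;

    /* The local APIC treats vectors 0-15 as illegal. */
    assert(vector >= 16);

    msg.address_hi = 0;
    msg.address_lo = X86_MSI_ADDR_BASE |
                     ((uint32_t)dest_apic_id << X86_MSI_ADDR_DEST_SHIFT);
    /* Bits 7:0 carry the vector; delivery mode bits 10:8 stay 000b (fixed). */
    msg.data = vector;
    return msg;
}
```

Calling `x86_msi_compose(3, 0x41)` yields address 0xFEE03000 and data 0x41: a write of 0x41 to that address raises vector 0x41 on the CPU whose APIC ID is 3.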
MSI in the Linux Kernel Architecture
Figure: Linux kernel MSI framework

┌───────────────────────┐    ┌───────────────────────┐
│     Device Driver     │    │     PCI Subsystem     │
│                       │    │                       │
│ pci_alloc_irq_vectors │───►│ pci_msi_vec_count     │
│ pci_irq_vector        │    │ pci_enable_msi        │
│ request_irq           │    │ pci_enable_msix       │
└───────────┬───────────┘    └───────────┬───────────┘
            │                            │
            └─────────────┬──────────────┘
                          │
                          ▼
              ┌───────────────────────┐
              │    MSI/MSI-X Core     │
              │                       │
              │ msi_domain_alloc_irqs │
              │ msi_domain_free_irqs  │
              └───────────┬───────────┘
                          │
                          ▼
              ┌───────────────────────┐
              │      IRQ Domain       │
              │                       │
              │ irq_domain_alloc_irqs │
              │ irq_domain_free_irqs  │
              └───────────┬───────────┘
                          │
                          ▼
              ┌───────────────────────┐
              │ Architecture-specific │
              │  MSI Implementation   │
              │   (x86, ARM, etc.)    │
              └───────────────────────┘
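The Linux kernel implements a layered approach to MSI handling:
- Device driver: asks for vectors with pci_alloc_irq_vectors(), looks up the Linux IRQ number of each vector with pci_irq_vector(), and installs handlers with request_irq().
- PCI subsystem: locates the MSI or MSI-X capability, decides how many vectors the device can provide, and programs the capability registers.
- MSI/MSI-X core: creates a descriptor for each vector and allocates the matching interrupts.
- IRQ domain: maps each hardware interrupt to a Linux IRQ number through the interrupt controller hierarchy.
- Architecture-specific code: composes the actual message address and data for the platform's interrupt controller (for example, the local APIC on x86).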
MSI Initialization Flow
Figure: MSI initialization flow

┌───────────────────────┐
│    Driver probe()     │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│ pci_alloc_irq_vectors │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│    Check MSI/MSI-X    │
│      capability       │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│ Allocate IRQ vectors  │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│      Program MSI      │
│       registers       │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│   request_irq() for   │
│      each vector      │
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│     Enable device     │
│      interrupts       │
└───────────────────────┘
MSI Implementation in Linux Kernel
Core MSI Functions
The Linux kernel provides several key functions for MSI handling:
/* Allocate IRQ vectors for a PCI device */
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
                          unsigned int max_vecs, unsigned int flags);

/* Get the Linux IRQ number for a specific MSI vector */
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);

/* Request an interrupt handler for a specific IRQ */
int request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
                const char *name, void *dev);

/* Free previously allocated IRQ vectors */
void pci_free_irq_vectors(struct pci_dev *dev);
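The flags parameter in pci_alloc_irq_vectors() can include:
- PCI_IRQ_MSIX: allow MSI-X vectors.
- PCI_IRQ_MSI: allow MSI vectors.
- PCI_IRQ_LEGACY: allow a legacy pin-based (INTx) interrupt. Newer kernels name this flag PCI_IRQ_INTX.
- PCI_IRQ_ALL_TYPES: shorthand for all three types above.
- PCI_IRQ_AFFINITY: let the kernel spread the allocated vectors across the available CPUs.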
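The order in which pci_alloc_irq_vectors() tries the permitted interrupt types (MSI-X, then MSI, then the legacy line) can be modeled in plain user-space C. This is a simplified model under stated assumptions, not the kernel implementation: the `model_*` names and `MODEL_IRQ_*` flags are hypothetical, and real allocation can fail for reasons that are not modeled here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical type flags for this model */
#define MODEL_IRQ_INTX  (1u << 0)
#define MODEL_IRQ_MSI   (1u << 1)
#define MODEL_IRQ_MSIX  (1u << 2)

/* What the modeled device offers */
struct model_dev {
    unsigned int msix_vectors; /* MSI-X table size, 0 if the capability is absent */
    unsigned int msi_vectors;  /* MSI vector count, 0 if the capability is absent */
    int has_intx;              /* non-zero if an INTx line is routed */
};

/* Grant between min_vecs and max_vecs vectors out of 'avail', or fail. */
static int model_try_range(unsigned int avail, unsigned int min_vecs,
                           unsigned int max_vecs)
{
    if (avail == 0 || avail < min_vecs)
        return -ENOSPC;
    return (int)(avail < max_vecs ? avail : max_vecs);
}

/* Returns the number of vectors granted, or a negative errno value. */
static int model_alloc_irq_vectors(const struct model_dev *dev,
                                   unsigned int min_vecs, unsigned int max_vecs,
                                   unsigned int flags, unsigned int *type_used)
{
    int n;

    assert(dev && type_used && min_vecs >= 1 && min_vecs <= max_vecs);

    if (flags & MODEL_IRQ_MSIX) {
        n = model_try_range(dev->msix_vectors, min_vecs, max_vecs);
        if (n > 0) {
            *type_used = MODEL_IRQ_MSIX;
            return n;
        }
    }
    if (flags & MODEL_IRQ_MSI) {
        n = model_try_range(dev->msi_vectors, min_vecs, max_vecs);
        if (n > 0) {
            *type_used = MODEL_IRQ_MSI;
            return n;
        }
    }
    /* A legacy line provides exactly one interrupt. */
    if ((flags & MODEL_IRQ_INTX) && dev->has_intx && min_vecs == 1) {
        *type_used = MODEL_IRQ_INTX;
        return 1;
    }
    return -ENOSPC;
}
```

The practical consequence for drivers is that a request with min_vecs greater than 1 can never be satisfied by the legacy fallback, so a driver that wants to keep working on systems without MSI needs to accept a single vector.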
Concrete Implementation Example
Here's how a typical device driver would implement MSI support:
static int my_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    struct my_device *dev;
    unsigned long irq_flags;
    int ret, nvecs, i;

    /* Allocate device structure */
    dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
    if (!dev)
        return -ENOMEM;

    /* Enable the PCI device */
    ret = pci_enable_device(pdev);
    if (ret)
        return ret;

    /* MSI messages are memory writes issued by the device, so it must be a bus master */
    pci_set_master(pdev);

    /* Set up DMA: prefer 64-bit addressing, fall back to 32-bit */
    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (ret) {
        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (ret) {
            dev_err(&pdev->dev, "No usable DMA configuration\n");
            goto err_disable_device;
        }
    }

    /* Request memory regions */
    ret = pci_request_regions(pdev, "my_driver");
    if (ret)
        goto err_disable_device;

    /* Map device memory */
    dev->regs = pci_iomap(pdev, 0, 0);
    if (!dev->regs) {
        ret = -ENOMEM;
        goto err_release_regions;
    }

    /* Try MSI-X first, then MSI, then fall back to a legacy interrupt */
    nvecs = pci_alloc_irq_vectors(pdev, 1, MY_MAX_VECTORS, PCI_IRQ_ALL_TYPES);
    if (nvecs < 0) {
        dev_err(&pdev->dev, "Failed to allocate IRQ vectors\n");
        ret = nvecs;
        goto err_iounmap;
    }
    dev->num_vectors = nvecs;
    dev_info(&pdev->dev, "Allocated %d %s vectors\n", nvecs,
             pdev->msix_enabled ? "MSI-X" :
             pdev->msi_enabled ? "MSI" : "legacy");

    /* A legacy line may be shared with other devices; MSI/MSI-X vectors are not */
    irq_flags = (pdev->msix_enabled || pdev->msi_enabled) ? 0 : IRQF_SHARED;

    /* Request interrupt handlers for each vector */
    for (i = 0; i < nvecs; i++) {
        int irq = pci_irq_vector(pdev, i);

        ret = request_irq(irq, my_interrupt_handler, irq_flags, "my_driver", dev);
        if (ret) {
            dev_err(&pdev->dev, "Failed to request IRQ %d\n", irq);
            goto err_free_irqs;
        }
        dev->irq[i] = irq;
    }

    /* Initialize device */
    ret = my_device_init(dev);
    if (ret)
        goto err_free_irqs;

    /* Store device data */
    pci_set_drvdata(pdev, dev);

    /* Enable interrupts in the device */
    my_enable_interrupts(dev);
    return 0;

err_free_irqs:
    /* i is the number of handlers installed so far */
    while (--i >= 0)
        free_irq(dev->irq[i], dev);
    pci_free_irq_vectors(pdev);
err_iounmap:
    pci_iounmap(pdev, dev->regs);
err_release_regions:
    pci_release_regions(pdev);
err_disable_device:
    pci_disable_device(pdev);
    return ret;
}

static void my_driver_remove(struct pci_dev *pdev)
{
    struct my_device *dev = pci_get_drvdata(pdev);
    int i;

    /* Disable interrupts in the device */
    my_disable_interrupts(dev);

    /* Free interrupt handlers */
    for (i = 0; i < dev->num_vectors; i++)
        free_irq(dev->irq[i], dev);

    /* Free IRQ vectors */
    pci_free_irq_vectors(pdev);

    /* Clean up device */
    my_device_cleanup(dev);

    /* Unmap and release resources */
    pci_iounmap(pdev, dev->regs);
    pci_release_regions(pdev);
    pci_disable_device(pdev);
}
Interrupt Handler Implementation
static irqreturn_t my_interrupt_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status;

    /* Read interrupt status register */
    status = readl(dev->regs + MY_INTR_STATUS_REG);

    /* If this interrupt is not for us (possible on a shared legacy line) */
    if (!status)
        return IRQ_NONE;

    /* Clear the interrupt */
    writel(status, dev->regs + MY_INTR_STATUS_REG);

    /* Process the interrupt */
    if (status & MY_RX_INTR_MASK)
        my_process_rx(dev);
    if (status & MY_TX_INTR_MASK)
        my_process_tx(dev);
    if (status & MY_ERR_INTR_MASK)
        my_process_errors(dev);

    return IRQ_HANDLED;
}
Real-World Example: RTL8169 Driver
The Realtek RTL8169 Ethernet driver in the Linux kernel is a good example of single-vector MSI use. The following is a condensed sketch of its setup steps, not verbatim driver code:
static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    // ... existing code ...

    /* Allocate a single vector of whichever type is available */
    ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
    if (ret < 0)
        return ret;

    /* Register interrupt handler on vector 0, whatever its type */
    ret = request_irq(pci_irq_vector(pdev, 0), rtl8169_interrupt, IRQF_SHARED,
                      dev->name, dev);

    /* Enable interrupts in the hardware */
    RTL_W16(tp, IntrMask, rtl8169_intr_mask);

    // ... existing code ...
}
The RTL8169 driver uses the IntrMask register (at offset 0x3c) to enable specific interrupts in the hardware. The driver sets up a single interrupt handler that can handle both MSI and legacy interrupts, making the code more maintainable.
MSI-X and Per-CPU Interrupts
MSI-X provides more advanced capabilities, particularly for high-performance devices that need to distribute interrupts across multiple CPUs:
Figure: MSI-X CPU affinity

┌─────────────────────┐    ┌───────────────────────┐
│ Multi-Queue Device  │    │      MSI-X Table      │
│     (e.g., NIC)     │    │                       │
│                     │    │ ┌─────┬─────┬───────┐ │
│  TX Queue 0 ────────┼────┼►│Vec 0│Addr0│Data 0 │ │
│                     │    │ ├─────┼─────┼───────┤ │
│  TX Queue 1 ────────┼────┼►│Vec 1│Addr1│Data 1 │ │
│                     │    │ ├─────┼─────┼───────┤ │
│  RX Queue 0 ────────┼────┼►│Vec 2│Addr2│Data 2 │ │
│                     │    │ ├─────┼─────┼───────┤ │
│  RX Queue 1 ────────┼────┼►│Vec 3│Addr3│Data 3 │ │
│                     │    │ └─────┴─────┴───────┘ │
└─────────────────────┘    └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │ IRQ Affinity Setting  │
                           │                       │
                           │ CPU0 ◄── Vector 0, 2  │
                           │ CPU1 ◄── Vector 1, 3  │
                           └───────────────────────┘
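Each row of the MSI-X table is a 16-byte entry holding a 64-bit message address, 32 bits of message data, and a vector control word whose lowest bit masks the entry. The sketch below models one entry as an ordinary struct to show the layout and a safe update order; the names are illustrative, and a real driver would never do this itself, since the table is device memory accessed through MMIO helpers and managed by the PCI core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_CTRL_TABLE_SIZE  0x07ffu /* Message Control bits 10:0 hold (entries - 1) */
#define MSIX_ENTRY_MASKED     0x1u    /* Vector Control bit 0: entry is masked */

/* One MSI-X table entry, modeled as plain memory */
struct msix_table_entry {
    uint32_t addr_lo;
    uint32_t addr_hi;
    uint32_t data;
    uint32_t vector_ctrl;
};

_Static_assert(sizeof(struct msix_table_entry) == 16,
               "MSI-X table entries are 16 bytes");

/* Number of table entries advertised by the MSI-X Message Control register. */
static unsigned int msix_table_entries(uint16_t msg_ctrl)
{
    return (msg_ctrl & MSIX_CTRL_TABLE_SIZE) + 1u;
}

/* Rewrite one entry while it is masked, then unmask it. */
static void msix_entry_program(struct msix_table_entry *e, uint64_t addr,
                               uint32_t data)
{
    /* Message addresses are DWORD aligned. */
    assert(e != NULL && (addr & 0x3u) == 0);

    e->vector_ctrl |= MSIX_ENTRY_MASKED;   /* no message while half-updated */
    e->addr_lo = (uint32_t)addr;
    e->addr_hi = (uint32_t)(addr >> 32);
    e->data = data;
    e->vector_ctrl &= ~MSIX_ENTRY_MASKED;  /* the vector may fire from here on */
}
```

Because every entry carries its own address, each vector can target a different CPU, which is what makes the per-queue affinity in the figure possible.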
MSI-X Implementation Example
Here's how a driver might implement MSI-X with CPU affinity:
static int setup_msix_interrupts(struct my_device *dev)
{
    /* A local named num_online_cpus would clash with the kernel helper */
    unsigned int ncpus = num_online_cpus();
    int i, ret, cpu, vector = 0;

    /* Allocate exactly one MSI-X vector per TX queue and per RX queue */
    ret = pci_alloc_irq_vectors(dev->pdev, dev->num_queues * 2,
                                dev->num_queues * 2, PCI_IRQ_MSIX);
    if (ret < 0)
        return ret;

    /* Set up TX queue interrupts */
    for (i = 0; i < dev->num_queues; i++) {
        int irq = pci_irq_vector(dev->pdev, vector);

        ret = request_irq(irq, my_tx_interrupt_handler, 0,
                          "my_driver-tx", &dev->tx_queue[i]);
        if (ret)
            goto err_free_tx_irqs;

        /*
         * Set CPU affinity hint for this interrupt. This assumes online
         * CPUs are numbered 0..ncpus-1. Newer kernels deprecate this call
         * in favor of irq_update_affinity_hint().
         */
        cpu = i % ncpus;
        irq_set_affinity_hint(irq, cpumask_of(cpu));

        dev->tx_queue[i].irq = irq;
        dev->tx_queue[i].vector = vector++;
    }

    /* Set up RX queue interrupts */
    for (i = 0; i < dev->num_queues; i++) {
        int irq = pci_irq_vector(dev->pdev, vector);

        ret = request_irq(irq, my_rx_interrupt_handler, 0,
                          "my_driver-rx", &dev->rx_queue[i]);
        if (ret)
            goto err_free_rx_irqs;

        /* Same round-robin placement as the TX queues */
        cpu = i % ncpus;
        irq_set_affinity_hint(irq, cpumask_of(cpu));

        dev->rx_queue[i].irq = irq;
        dev->rx_queue[i].vector = vector++;
    }

    return 0;

err_free_rx_irqs:
    /* Release the RX handlers installed so far */
    while (--i >= 0) {
        irq_set_affinity_hint(dev->rx_queue[i].irq, NULL);
        free_irq(dev->rx_queue[i].irq, &dev->rx_queue[i]);
    }
    /* All TX handlers were installed before the RX loop started */
    i = dev->num_queues;
err_free_tx_irqs:
    while (--i >= 0) {
        irq_set_affinity_hint(dev->tx_queue[i].irq, NULL);
        free_irq(dev->tx_queue[i].irq, &dev->tx_queue[i]);
    }
    pci_free_irq_vectors(dev->pdev);
    return ret;
}
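The placement this function produces is easy to check outside the kernel. The following sketch reproduces only its arithmetic (TX queues take the first vectors, RX queues the next ones, and queue i is hinted to CPU i modulo the CPU count); the helper names are made up for this example.

```c
#include <assert.h>

enum queue_dir { QUEUE_TX, QUEUE_RX };

/* TX queues use vectors 0..n-1, RX queues use vectors n..2n-1. */
static unsigned int queue_to_vector(enum queue_dir dir, unsigned int queue,
                                    unsigned int num_queues)
{
    assert(queue < num_queues);
    return dir == QUEUE_TX ? queue : num_queues + queue;
}

/* Round-robin: queue i of either direction is hinted to CPU i % ncpus. */
static unsigned int queue_to_cpu(unsigned int queue, unsigned int ncpus)
{
    assert(ncpus > 0);
    return queue % ncpus;
}
```

With two queues per direction and two CPUs, vectors 0 and 2 (TX queue 0 and RX queue 0) land on CPU 0, and vectors 1 and 3 land on CPU 1, so both directions of a queue pair are handled on the same CPU.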
Interrupt Coalescing with MSI
Many high-performance devices implement interrupt coalescing to reduce interrupt overhead:
Figure: Interrupt coalescing

Without coalescing:

 Packet 1    Packet 2    Packet 3    Packet 4
    │           │           │           │
    ▼           ▼           ▼           ▼
Interrupt   Interrupt   Interrupt   Interrupt

With coalescing:

 Packet 1    Packet 2    Packet 3    Packet 4
    │           │           │           │
    └───────────┴─────┬─────┴───────────┘
                      │
                      ▼
                  Interrupt
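The RTL8169 driver implements interrupt coalescing through the IntrMitigate register, which packs four 4-bit limits into 16 bits. The following is a simplified sketch of that packing; the actual driver also converts microseconds and frame counts into hardware units before writing them:
/* Field layout of IntrMitigate, as defined in the RTL8169 driver */
#define RTL_COALESCE_TX_USECS   GENMASK(15, 12)
#define RTL_COALESCE_TX_FRAMES  GENMASK(11, 8)
#define RTL_COALESCE_RX_USECS   GENMASK(7, 4)
#define RTL_COALESCE_RX_FRAMES  GENMASK(3, 0)

/* Illustrative limit for this sketch: the largest value a 4-bit field holds */
#define RTL_COALESCE_FIELD_MAX  0xfU

static void rtl_set_coalesce(struct net_device *dev,
                             struct ethtool_coalesce *ec)
{
    struct rtl8169_private *tp = netdev_priv(dev);
    u16 intr_mitigation = 0;

    /* Set TX time limit */
    intr_mitigation |= FIELD_PREP(RTL_COALESCE_TX_USECS,
                                  min_t(u32, ec->tx_coalesce_usecs,
                                        RTL_COALESCE_FIELD_MAX));

    /* Set TX packet count limit */
    intr_mitigation |= FIELD_PREP(RTL_COALESCE_TX_FRAMES,
                                  min_t(u32, ec->tx_max_coalesced_frames,
                                        RTL_COALESCE_FIELD_MAX));

    /* Set RX time limit */
    intr_mitigation |= FIELD_PREP(RTL_COALESCE_RX_USECS,
                                  min_t(u32, ec->rx_coalesce_usecs,
                                        RTL_COALESCE_FIELD_MAX));

    /* Set RX packet count limit */
    intr_mitigation |= FIELD_PREP(RTL_COALESCE_RX_FRAMES,
                                  min_t(u32, ec->rx_max_coalesced_frames,
                                        RTL_COALESCE_FIELD_MAX));

    /* Write to hardware register */
    RTL_W16(tp, IntrMitigate, intr_mitigation);
}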
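The register packing can be tried in isolation. This stand-alone sketch uses the field positions given by the masks above (TX time in bits 15:12, TX frames in 11:8, RX time in 7:4, RX frames in 3:0) and clamps each value to what four bits can hold; the function names are hypothetical, and the values are raw register units, not microseconds or frame counts.

```c
#include <assert.h>
#include <stdint.h>

#define COALESCE_FIELD_MAX 0xfu /* largest value a 4-bit field can hold */

/* Bit positions of the four fields inside the 16-bit register */
enum coalesce_shift {
    RX_FRAMES_SHIFT = 0,
    RX_USECS_SHIFT  = 4,
    TX_FRAMES_SHIFT = 8,
    TX_USECS_SHIFT  = 12,
};

/* Clamp one value to 4 bits and move it to its field position. */
static uint16_t coalesce_field(unsigned int value, unsigned int shift)
{
    assert(shift <= 12 && shift % 4 == 0);
    if (value > COALESCE_FIELD_MAX)
        value = COALESCE_FIELD_MAX;
    return (uint16_t)(value << shift);
}

/* Combine the four limits into one register value. */
static uint16_t coalesce_pack(unsigned int tx_time, unsigned int tx_frames,
                              unsigned int rx_time, unsigned int rx_frames)
{
    return (uint16_t)(coalesce_field(tx_time, TX_USECS_SHIFT) |
                      coalesce_field(tx_frames, TX_FRAMES_SHIFT) |
                      coalesce_field(rx_time, RX_USECS_SHIFT) |
                      coalesce_field(rx_frames, RX_FRAMES_SHIFT));
}

/* Read one field back out of a register value. */
static unsigned int coalesce_unpack(uint16_t reg, unsigned int shift)
{
    return (reg >> shift) & COALESCE_FIELD_MAX;
}
```

Packing the values 2, 1, 20 and 3 gives 0x21F3: the RX time limit of 20 does not fit in four bits and is clamped to 15, which is why a driver has to rescale user-supplied limits instead of writing them directly.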
Debugging MSI Issues
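When debugging MSI-related issues, several tools and techniques are useful:
- /proc/interrupts lists every registered interrupt with its per-CPU counts and interrupt chip name; MSI and MSI-X interrupts show up with a chip name containing "MSI". A counter that never increases points to a delivery problem.
- lspci -vv prints the MSI and MSI-X capabilities of each device, including whether they are currently enabled and how many vectors they offer.
- /sys/bus/pci/devices/<address>/msi_irqs/ lists the IRQ numbers allocated to a device's MSI or MSI-X vectors.
- The kernel log (dmesg) shows allocation failures reported by the driver or the PCI core.
- Booting with pci=nomsi disables MSI system-wide, which helps decide whether a problem is tied to MSI or to the device itself.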
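A quick way to see which interrupts on a system are message-signaled is to scan /proc/interrupts for lines whose interrupt chip name contains "MSI". The helper below is a small sketch of that idea; the function name is an assumption, and the substring match is a heuristic based on how chip names are commonly reported.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the lines of 'in' that mention "MSI", optionally echoing them to
 * 'out' (pass NULL to only count). Lines longer than the buffer are read
 * in pieces, which is acceptable for a rough count.
 */
static int count_msi_lines(FILE *in, FILE *out)
{
    char line[4096];
    int count = 0;

    assert(in != NULL);

    while (fgets(line, sizeof(line), in)) {
        if (strstr(line, "MSI")) {
            count++;
            if (out)
                fputs(line, out);
        }
    }
    return count;
}
```

On a Linux system this could be used by opening the file with `fopen("/proc/interrupts", "r")` and passing `stdout` as the second argument, which prints every MSI or MSI-X interrupt together with its per-CPU counts.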
Conclusion
Message Signaled Interrupts represent a significant improvement over traditional pin-based interrupts, offering better scalability, reduced IRQ sharing, and improved performance. The Linux kernel provides a comprehensive framework for MSI support, making it easy for device drivers to take advantage of these benefits.
Modern device drivers should always attempt to use MSI or MSI-X when available, falling back to legacy interrupts only when necessary. With the increasing number of cores in modern processors and the growing bandwidth of I/O devices, the ability to efficiently distribute interrupts across CPUs has become essential for high-performance systems.
By understanding how MSI works and how to implement it in device drivers, developers can create more efficient and scalable systems that make the best use of modern hardware capabilities.