Understanding PCI MSI (Message Signaled Interrupts) in Linux

Message Signaled Interrupts (MSI) represent a significant advancement in how modern computer systems handle device interrupts. This article explores how MSI works, its implementation in the Linux kernel, and provides concrete examples of MSI usage in device drivers.

Traditional Interrupts vs. MSI

┌───────────────────────────────────────────────────────────┐
│                                                           │
│            Traditional Pin-Based Interrupts               │
│                                                           │
│  ┌─────────────────────┐    ┌───────────────────────┐     │
│  │ PCI Device          │    │ Interrupt Controller  │     │
│  │                     │    │ (PIC/APIC/IOAPIC)     │     │
│  │                     │    │                       │     │
│  │  ┌───────────┐      │    │  ┌───────────┐       │     │
│  │  │ INT# Pin  ├──────┼────┼─►│ IRQ Line  │       │     │
│  │  └───────────┘      │    │  └─────┬─────┘       │     │
│  │                     │    │        │             │     │
│  └─────────────────────┘    └────────┼─────────────┘     │
│                                      │                    │
│                                      ▼                    │
│                             ┌─────────────────┐           │
│                             │ CPU             │           │
│                             │                 │           │
│                             └─────────────────┘           │
│                                                           │
└───────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────┐
│                                                           │
│              Message Signaled Interrupts                  │
│                                                           │
│  ┌─────────────────────┐                                  │
│  │ PCI Device          │                                  │
│  │                     │                                  │
│  │  ┌───────────┐      │    ┌───────────────────────┐     │
│  │  │ PCI       │      │    │ Memory-Mapped         │     │
│  │  │ Config    │      │    │ Message Address       │     │
│  │  │ Space     │      │    │                       │     │
│  │  └───────────┘      │    └───────────┬───────────┘     │
│  │        │            │                │                 │
│  │        │ Memory Write                │                 │
│  │        └────────────┼────────────────┘                 │
│  │                     │                                  │
│  └─────────────────────┘                                  │
│                                      │                    │
│                                      ▼                    │
│                             ┌─────────────────┐           │
│                             │ CPU             │           │
│                             │                 │           │
│                             └─────────────────┘           │
│                                                           │
└───────────────────────────────────────────────────────────┘
        

Traditional Interrupt Mechanism

In traditional pin-based interrupts:

  1. Each PCI device uses a physical pin (INTA#, INTB#, INTC#, or INTD#) to signal interrupts
  2. These pins connect to interrupt controller lines (IRQs)
  3. Multiple devices often share the same IRQ line, leading to IRQ sharing issues
  4. The kernel must call the handler of every driver registered on a shared line, and each handler must query its device to find out whether that device raised the interrupt
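
The cost of a shared line can be illustrated with a small user-space model. This is only a sketch: the sketch_* names are hypothetical, and the pending flag stands in for a device's interrupt status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: 'pending' stands in for an interrupt status register. */
struct sketch_device {
    bool pending;
    int handled;   /* how many interrupts this device's handler serviced */
};

/* Mirrors the IRQ_NONE / IRQ_HANDLED convention used by Linux handlers. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

enum sketch_irqreturn sketch_handler(struct sketch_device *dev)
{
    if (!dev->pending)
        return SKETCH_IRQ_NONE;   /* some other device on the line fired */
    dev->pending = false;         /* acknowledge the device */
    dev->handled++;
    return SKETCH_IRQ_HANDLED;
}

/* Call every handler on the shared line; return how many claimed the interrupt. */
int sketch_dispatch_shared_line(struct sketch_device *devs, size_t n)
{
    int claimed = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (sketch_handler(&devs[i]) == SKETCH_IRQ_HANDLED)
            claimed++;
    return claimed;
}
```

Every handler on the line runs, even when only one device has work pending. A dedicated MSI vector avoids those extra calls.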

MSI Mechanism

With Message Signaled Interrupts:

  1. No physical interrupt pins are used
  2. Devices generate interrupts by writing a specific data value to a pre-defined memory address
  3. Each device can have multiple MSI vectors (up to 32 for MSI, 2048 for MSI-X)
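  4. With MSI-X, each vector has its own message address and data value; with multi-vector MSI, all vectors share one address and the device varies the low-order bits of the data value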
  5. No IRQ sharing is needed, improving performance and simplifying debugging
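
For plain MSI with several vectors enabled, the device tells the vectors apart by changing the low-order bits of the programmed data value. The helper below is a minimal sketch of that rule; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Message data the device writes for vector 'vec' when 'nvec' vectors are
 * enabled. With plain MSI, nvec is a power of two (1..32) and the device
 * replaces only the low log2(nvec) bits of the programmed data value.
 */
uint16_t msi_data_for_vector(uint16_t programmed_data, unsigned nvec, unsigned vec)
{
    assert(nvec >= 1 && nvec <= 32 && (nvec & (nvec - 1)) == 0);
    assert(vec < nvec);

    uint16_t low_mask = (uint16_t)(nvec - 1);
    return (uint16_t)((programmed_data & (uint16_t)~low_mask) | vec);
}
```

This is why software must program a data value whose low bits are free for the device to modify when more than one MSI vector is enabled.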

MSI in the PCI Specification

┌───────────────────────────────────────────────────────────┐
│                                                           │
│                  PCI MSI Capability                       │
│                                                           │
│  ┌─────────────────────────────────────────────────┐      │
│  │ PCI Configuration Space                         │      │
│  │                                                 │      │
│  │  ┌───────────────────────┐                      │      │
│  │  │ MSI Capability Header │                      │      │
│  │  │ (ID: 0x05)            │                      │      │
│  │  └───────────────────────┘                      │      │
│  │  ┌───────────────────────┐                      │      │
│  │  │ Message Control       │                      │      │
│  │  │ - MSI Enable          │                      │      │
│  │  │ - Multiple Message    │                      │      │
│  │  │ - 64-bit Address      │                      │      │
│  │  └───────────────────────┘                      │      │
│  │  ┌───────────────────────┐                      │      │
│  │  │ Message Address       │                      │      │
│  │  │ (Lower 32-bits)       │                      │      │
│  │  └───────────────────────┘                      │      │
│  │  ┌───────────────────────┐                      │      │
│  │  │ Message Address       │                      │      │
│  │  │ (Upper 32-bits)       │ [Optional]           │      │
│  │  └───────────────────────┘                      │      │
│  │  ┌───────────────────────┐                      │      │
│  │  │ Message Data          │                      │      │
│  │  └───────────────────────┘                      │      │
│  │  ┌───────────────────────┐                      │      │
│  │  │ Mask Bits             │ [Optional]           │      │
│  │  └───────────────────────┘                      │      │
│  │  ┌───────────────────────┐                      │      │
│  │  │ Pending Bits          │ [Optional]           │      │
│  │  └───────────────────────┘                      │      │
│  │                                                 │      │
│  └─────────────────────────────────────────────────┘      │
│                                                           │
└───────────────────────────────────────────────────────────┘
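
The Message Control fields in the diagram can be decoded with a few masks. The sketch below assumes the standard bit layout (enable in bit 0, the two vector-count fields encoded as log2, the 64-bit and per-vector-masking capability bits in bits 7 and 8); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct msi_control {
    bool enabled;              /* bit 0: MSI Enable */
    unsigned vectors_capable;  /* bits 3:1, stored as log2 of the count */
    unsigned vectors_enabled;  /* bits 6:4, stored as log2 of the count */
    bool addr64;               /* bit 7: 64-bit address capable */
    bool per_vector_mask;      /* bit 8: Mask Bits / Pending Bits present */
};

struct msi_control msi_decode_control(uint16_t ctrl)
{
    struct msi_control c;
    unsigned cap_log2 = (ctrl >> 1) & 0x7;
    unsigned en_log2 = (ctrl >> 4) & 0x7;

    /* Encodings above 5 (more than 32 vectors) are reserved. */
    assert(cap_log2 <= 5 && en_log2 <= 5);

    c.enabled = (ctrl & 0x0001) != 0;
    c.vectors_capable = 1u << cap_log2;
    c.vectors_enabled = 1u << en_log2;
    c.addr64 = (ctrl & 0x0080) != 0;
    c.per_vector_mask = (ctrl & 0x0100) != 0;
    return c;
}
```

The optional blocks in the diagram follow from these bits: the upper address dword exists only when addr64 is set, and the Mask Bits and Pending Bits registers exist only when per_vector_mask is set.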
        

The PCI specification defines two types of Message Signaled Interrupts:

  1. MSI (Message Signaled Interrupts)
  2. MSI-X (Extended Message Signaled Interrupts)
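
Software discovers which of the two a device supports by walking the capability list in configuration space. The following is a user-space sketch over a 256-byte snapshot of config space; a real driver would call the kernel's pci_find_capability() instead. The macro names are local to this example.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_STATUS          0x06  /* Status register, low byte */
#define CFG_STATUS_CAP_LIST 0x10  /* bit 4: capability list present */
#define CFG_CAP_PTR         0x34  /* offset of the first capability */
#define CAP_ID_MSI          0x05
#define CAP_ID_MSIX         0x11

/* Return the config-space offset of capability 'id', or 0 if absent. */
uint8_t find_capability(const uint8_t cfg[256], uint8_t id)
{
    assert(cfg != 0);
    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
        return 0;

    uint8_t pos = cfg[CFG_CAP_PTR] & 0xFC;    /* low two bits are reserved */

    /* The iteration limit guards against a malformed, looping list. */
    for (int ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
        if (cfg[pos] == id)
            return pos;
        pos = cfg[pos + 1] & 0xFC;            /* follow the next pointer */
    }
    return 0;
}
```

Calling find_capability(cfg, CAP_ID_MSI) and find_capability(cfg, CAP_ID_MSIX) on the same snapshot shows whether a device offers one mechanism or both.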

How MSI Works

  1. Software Configuration: The OS/driver configures the MSI Capability Structure in the PCIe device’s configuration space. This includes:

     • MSI Address: The memory address (physical or IOVA) of the interrupt controller’s register.
     • MSI Data: The interrupt vector or value to write (determines which CPU interrupt is triggered).
     • Message Control Register: Enables MSI, sets the number of enabled vectors, etc.

  2. Hardware Automation: Once configured and enabled, the PCIe device hardware autonomously generates a Memory Write TLP (Transaction Layer Packet) whenever it needs to signal an interrupt. The TLP includes:

     • The pre-programmed MSI Address (destination in memory).
     • The pre-programmed MSI Data (interrupt vector).

     No further software involvement is required to generate individual interrupts.
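
As a concrete illustration of the two steps, the sketch below composes an address/data pair the way an x86 system without interrupt remapping would (fixed delivery, edge triggered, one destination APIC ID) and then models the write the device issues. The x86 layout is an assumption of this example; other architectures use different addresses and data formats, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct msi_msg_sketch {
    uint32_t address_lo;
    uint32_t address_hi;
    uint16_t data;
};

/* Step 1 (software): build the message the OS programs into the capability. */
struct msi_msg_sketch x86_compose_msi(uint8_t dest_apic_id, uint8_t vector)
{
    struct msi_msg_sketch m;

    assert(vector >= 0x10);   /* the local APIC rejects vectors 0x00-0x0F */

    m.address_hi = 0;
    m.address_lo = 0xFEE00000u | ((uint32_t)dest_apic_id << 12);
    m.data = vector;          /* delivery mode 000 (fixed), trigger mode 0 (edge) */
    return m;
}

/* Step 2 (hardware): what the device emits on its own for each interrupt. */
struct mem_write {
    uint64_t address;
    uint32_t data;
};

struct mem_write device_raise_msi(const struct msi_msg_sketch *m)
{
    struct mem_write w;

    w.address = ((uint64_t)m->address_hi << 32) | m->address_lo;
    w.data = m->data;
    return w;
}
```

In this model, composing a message for APIC ID 1 and vector 0x41 yields a write of 0x41 to address 0xFEE01000.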

MSI in the Linux Kernel Architecture

┌───────────────────────────────────────────────────────────┐
│                                                           │
│                Linux Kernel MSI Framework                 │
│                                                           │
│  ┌─────────────────────┐    ┌───────────────────────┐     │
│  │ Device Driver       │    │ PCI Subsystem         │     │
│  │                     │    │                       │     │
│  │pci_alloc_irq_vectors│───►│ pci_msi_vec_count     │     │
│  │ pci_irq_vector      │    │ pci_enable_msi        │     │
│  │ request_irq         │    │ pci_enable_msix_range │     │
│  └─────────┬───────────┘    └───────────┬───────────┘     │
│            │                            │                 │
│            └────────────┬───────────────┘                 │
│                         │                                 │
│                         ▼                                 │
│             ┌───────────────────────┐                     │
│             │ MSI/MSI-X Core        │                     │
│             │                       │                     │
│             │ msi_domain_alloc_irqs │                     │
│             │ msi_domain_free_irqs  │                     │
│             └───────────┬───────────┘                     │
│                         │                                 │
│                         ▼                                 │
│             ┌───────────────────────┐                     │
│             │ IRQ Domain            │                     │
│             │                       │                     │
│             │ irq_domain_alloc_irqs │                     │
│             │ irq_domain_free_irqs  │                     │
│             └───────────┬───────────┘                     │
│                         │                                 │
│                         ▼                                 │
│             ┌───────────────────────┐                     │
│             │ Architecture-specific │                     │
│             │ MSI Implementation    │                     │
│             │ (x86, ARM, etc.)      │                     │
│             └───────────────────────┘                     │
│                                                           │
└───────────────────────────────────────────────────────────┘
        

The Linux kernel implements a layered approach to MSI handling:

  1. Device Driver Layer: Requests MSI/MSI-X interrupts through the PCI subsystem
  2. PCI Subsystem: Manages PCI device configuration and MSI capability detection
  3. MSI Core: Provides generic MSI allocation and management functions
  4. IRQ Domain: Maps MSI vectors to Linux IRQ numbers
  5. Architecture-specific Layer: Implements hardware-specific MSI programming
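
The role of the IRQ domain layer can be pictured with a toy model: a table that hands out a Linux IRQ number the first time a hardware vector index is seen and returns the same number afterwards. This is not the kernel's irq_domain implementation; every name below is hypothetical and the starting number is arbitrary.

```c
#include <assert.h>

#define SKETCH_DOMAIN_SIZE 8
#define SKETCH_FIRST_VIRQ  32   /* arbitrary starting number for this toy model */

struct sketch_irq_domain {
    int virq_of_hwirq[SKETCH_DOMAIN_SIZE];   /* 0 means "not mapped yet" */
    int next_virq;
};

void sketch_domain_init(struct sketch_irq_domain *d)
{
    for (int i = 0; i < SKETCH_DOMAIN_SIZE; i++)
        d->virq_of_hwirq[i] = 0;
    d->next_virq = SKETCH_FIRST_VIRQ;
}

/* Map a hardware vector index to a Linux-style IRQ number, allocating on first use. */
int sketch_domain_map(struct sketch_irq_domain *d, int hwirq)
{
    assert(hwirq >= 0 && hwirq < SKETCH_DOMAIN_SIZE);
    if (d->virq_of_hwirq[hwirq] == 0)
        d->virq_of_hwirq[hwirq] = d->next_virq++;
    return d->virq_of_hwirq[hwirq];
}
```

A driver never sees the hardware index directly; it asks pci_irq_vector() for the Linux IRQ number and passes that to request_irq().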

MSI Initialization Flow

┌───────────────────────────────────────────────────────────┐
│                                                           │
│                  MSI Initialization Flow                  │
│                                                           │
│  ┌─────────────────────┐                                  │
│  │ Driver probe()      │                                  │
│  └─────────┬───────────┘                                  │
│            │                                              │
│            ▼                                              │
│  ┌─────────────────────┐                                  │
│  │pci_alloc_irq_vectors│                                  │
│  └─────────┬───────────┘                                  │
│            │                                              │
│            ▼                                              │
│  ┌─────────────────────┐                                  │
│  │ Check MSI/MSI-X     │                                  │
│  │ capability          │                                  │
│  └─────────┬───────────┘                                  │
│            │                                              │
│            ▼                                              │
│  ┌─────────────────────┐                                  │
│  │ Allocate IRQ vectors│                                  │
│  └─────────┬───────────┘                                  │
│            │                                              │
│            ▼                                              │
│  ┌─────────────────────┐                                  │
│  │ Program MSI         │                                  │
│  │ registers           │                                  │
│  └─────────┬───────────┘                                  │
│            │                                              │
│            ▼                                              │
│  ┌─────────────────────┐                                  │
│  │ request_irq() for   │                                  │
│  │ each vector         │                                  │
│  └─────────┬───────────┘                                  │
│            │                                              │
│            ▼                                              │
│  ┌─────────────────────┐                                  │
│  │ Enable device       │                                  │
│  │ interrupts          │                                  │
│  └─────────────────────┘                                  │
│                                                           │
└───────────────────────────────────────────────────────────┘
        

MSI Implementation in Linux Kernel

Core MSI Functions

The Linux kernel provides several key functions for MSI handling:

/* Allocate IRQ vectors for a PCI device */
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
                          unsigned int max_vecs, unsigned int flags);

/* Get the Linux IRQ number for a specific MSI vector */
int pci_irq_vector(struct pci_dev *dev, unsigned int nr);

/* Request an interrupt handler for a specific IRQ */
int request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
                const char *name, void *dev);

/* Free previously allocated IRQ vectors */
void pci_free_irq_vectors(struct pci_dev *dev);
        

The flags parameter in pci_alloc_irq_vectors() can include:

  • PCI_IRQ_MSI: Allow using MSI
  • PCI_IRQ_MSIX: Allow using MSI-X
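  • PCI_IRQ_LEGACY: Allow using legacy INTx interrupts (renamed PCI_IRQ_INTX in recent kernels)
  • PCI_IRQ_AFFINITY: Let the PCI core spread the vectors across CPUs automatically
  • PCI_IRQ_ALL_TYPES: Shorthand for allowing legacy, MSI, and MSI-X together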
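
When several type flags are passed, pci_alloc_irq_vectors() tries MSI-X first, then MSI, then the legacy interrupt. The user-space model below sketches that ordering under simplifying assumptions: it ignores affinity, and it ignores the power-of-two rounding that applies to MSI. The SK_* flags and sk_* names are stand-ins invented for this example, not kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PCI_IRQ_* flags; values are local to this sketch. */
#define SK_IRQ_INTX (1u << 0)
#define SK_IRQ_MSI  (1u << 1)
#define SK_IRQ_MSIX (1u << 2)

enum sk_mode { SK_MODE_NONE, SK_MODE_INTX, SK_MODE_MSI, SK_MODE_MSIX };

struct sk_device {
    unsigned msix_vectors;   /* 0 if the device has no MSI-X capability */
    unsigned msi_vectors;    /* 0 if the device has no MSI capability */
};

/* Returns the number of vectors granted (or -1) and reports the chosen mode. */
int sk_alloc_irq_vectors(const struct sk_device *dev, unsigned min_vecs,
                         unsigned max_vecs, unsigned flags, enum sk_mode *mode)
{
    unsigned n;

    assert(min_vecs >= 1 && min_vecs <= max_vecs);

    if (flags & SK_IRQ_MSIX) {
        n = dev->msix_vectors < max_vecs ? dev->msix_vectors : max_vecs;
        if (n >= min_vecs) {
            *mode = SK_MODE_MSIX;
            return (int)n;
        }
    }
    if (flags & SK_IRQ_MSI) {
        n = dev->msi_vectors < max_vecs ? dev->msi_vectors : max_vecs;
        if (n >= min_vecs) {
            *mode = SK_MODE_MSI;
            return (int)n;
        }
    }
    /* A legacy interrupt can satisfy the request only if one vector is enough. */
    if ((flags & SK_IRQ_INTX) && min_vecs == 1) {
        *mode = SK_MODE_INTX;
        return 1;
    }
    *mode = SK_MODE_NONE;
    return -1;
}
```

The practical consequence is that a driver asking for at least two vectors can never fall back to the legacy interrupt, so min_vecs should be 1 whenever that fallback is wanted.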

Concrete Implementation Example

Here's how a typical device driver would implement MSI support:

static int my_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    struct my_device *dev;
    int ret, nvecs, i;
    
    /* Allocate device structure */
    dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
    if (!dev)
        return -ENOMEM;
    
    /* Enable the PCI device */
    ret = pci_enable_device(pdev);
    if (ret)
        return ret;
    
    /* Set up DMA */
    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (ret) {
        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (ret) {
            dev_err(&pdev->dev, "No usable DMA configuration\n");
            goto err_disable_device;
        }
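    }
    
    /* Enable bus mastering: an MSI/MSI-X interrupt is a memory write issued by the device */
    pci_set_master(pdev);
    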
    /* Request memory regions */
    ret = pci_request_regions(pdev, "my_driver");
    if (ret)
        goto err_disable_device;
    
    /* Map device memory */
    dev->regs = pci_iomap(pdev, 0, 0);
    if (!dev->regs) {
        ret = -ENOMEM;
        goto err_release_regions;
    }
    
    /* Try to enable MSI-X first, then MSI, then fall back to legacy interrupts */
    nvecs = pci_alloc_irq_vectors(pdev, 1, MY_MAX_VECTORS,
                                 PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
    if (nvecs < 0) {
        dev_err(&pdev->dev, "Failed to allocate IRQ vectors\n");
        ret = nvecs;
        goto err_iounmap;
    }
    
    dev->num_vectors = nvecs;
    dev_info(&pdev->dev, "Allocated %d %s vectors\n", nvecs,
             pdev->msix_enabled ? "MSI-X" :
             pdev->msi_enabled ? "MSI" : "legacy");
    
    /* Request interrupt handlers for each vector */
    for (i = 0; i < nvecs; i++) {
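        int irq = pci_irq_vector(pdev, i);
        /* Only a legacy INTx line can be shared with other devices */
        unsigned long irqflags = (pdev->msi_enabled || pdev->msix_enabled) ?
                                 0 : IRQF_SHARED;
        
        ret = request_irq(irq, my_interrupt_handler, irqflags, "my_driver", dev);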
        if (ret) {
            dev_err(&pdev->dev, "Failed to request IRQ %d\n", irq);
            goto err_free_irqs;
        }
        
        dev->irq[i] = irq;
    }
    
    /* Initialize device */
    ret = my_device_init(dev);
    if (ret)
        goto err_free_irqs;
    
    /* Store device data */
    pci_set_drvdata(pdev, dev);
    
    /* Enable interrupts in the device */
    my_enable_interrupts(dev);
    
    return 0;

err_free_irqs:
    while (--i >= 0)
        free_irq(dev->irq[i], dev);
    pci_free_irq_vectors(pdev);
err_iounmap:
    pci_iounmap(pdev, dev->regs);
err_release_regions:
    pci_release_regions(pdev);
err_disable_device:
    pci_disable_device(pdev);
    return ret;
}

static void my_driver_remove(struct pci_dev *pdev)
{
    struct my_device *dev = pci_get_drvdata(pdev);
    int i;
    
    /* Disable interrupts in the device */
    my_disable_interrupts(dev);
    
    /* Free interrupt handlers */
    for (i = 0; i < dev->num_vectors; i++)
        free_irq(dev->irq[i], dev);
    
    /* Free IRQ vectors */
    pci_free_irq_vectors(pdev);
    
    /* Clean up device */
    my_device_cleanup(dev);
    
    /* Unmap and release resources */
    pci_iounmap(pdev, dev->regs);
    pci_release_regions(pdev);
    pci_disable_device(pdev);
}
        

Interrupt Handler Implementation

static irqreturn_t my_interrupt_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status;
    
    /* Read interrupt status register */
    status = readl(dev->regs + MY_INTR_STATUS_REG);
    
    /* If this interrupt is not for us */
    if (!status)
        return IRQ_NONE;
    
    /* Clear the interrupt */
    writel(status, dev->regs + MY_INTR_STATUS_REG);
    
    /* Process the interrupt */
    if (status & MY_RX_INTR_MASK)
        my_process_rx(dev);
    
    if (status & MY_TX_INTR_MASK)
        my_process_tx(dev);
    
    if (status & MY_ERR_INTR_MASK)
        my_process_errors(dev);
    
    return IRQ_HANDLED;
}
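
The handler above clears the interrupt by writing the value it just read back to the status register. That pattern relies on the register being write-1-to-clear (W1C), which is an assumption about the hypothetical device. The user-space model below sketches why the pattern does not lose events that arrive between the read and the write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status register with write-1-to-clear (W1C) semantics. */
struct w1c_reg {
    uint32_t bits;
};

uint32_t w1c_read(const struct w1c_reg *r)
{
    return r->bits;
}

/* Writing a 1 clears that bit; writing a 0 leaves the bit unchanged. */
void w1c_write(struct w1c_reg *r, uint32_t value)
{
    r->bits &= ~value;
}

/* Acknowledge exactly the causes that were observed, as the handler does. */
uint32_t ack_pending(struct w1c_reg *r)
{
    assert(r != 0);

    uint32_t status = w1c_read(r);
    w1c_write(r, status);
    return status;
}
```

If the device sets a new cause bit after the read, that bit is not part of the value written back, so it stays set and is seen on the next interrupt.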
        

Real-World Example: RTL8169 Driver

The Realtek RTL8169 Ethernet driver (r8169) in the Linux kernel provides a good example of MSI implementation. The following simplified excerpt illustrates the pattern it follows; the in-tree code is organized differently:

static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    /* ... */
    
    /* Allocate a single vector of whichever type the device supports */
    ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
    if (ret < 0)
        return ret;
    
    /* Register interrupt handler; pci_irq_vector() works for MSI-X, MSI, and legacy */
    ret = request_irq(pci_irq_vector(pdev, 0), rtl8169_interrupt, IRQF_SHARED,
                     dev->name, dev);
    if (ret) {
        pci_free_irq_vectors(pdev);
        return ret;
    }
    
    /* Enable interrupts in the hardware */
    RTL_W16(tp, IntrMask, rtl8169_intr_mask);
    
    /* ... */
}
        

The RTL8169 driver uses the IntrMask register (at offset 0x3c) to enable specific interrupts in the hardware. The driver sets up a single interrupt handler that can handle both MSI and legacy interrupts, making the code more maintainable.

MSI-X and Per-CPU Interrupts

MSI-X provides more advanced capabilities, particularly for high-performance devices that need to distribute interrupts across multiple CPUs:

┌───────────────────────────────────────────────────────────┐
│                                                           │
│                  MSI-X CPU Affinity                       │
│                                                           │
│  ┌─────────────────────┐    ┌───────────────────────┐     │
│  │ Multi-Queue Device  │    │ MSI-X Table           │     │
│  │ (e.g., NIC)         │    │                       │     │
│  │                     │    │ ┌─────┬─────┬───────┐ │     │
│  │ ┌─────────────┐     │    │ │Vec 0│Addr0│Data 0 │ │     │
│  │ │ TX Queue 0  ├─────┼────┼─►     │     │       │ │     │
│  │ └─────────────┘     │    │ ├─────┼─────┼───────┤ │     │
│  │ ┌─────────────┐     │    │ │Vec 1│Addr1│Data 1 │ │     │
│  │ │ TX Queue 1  ├─────┼────┼─►     │     │       │ │     │
│  │ └─────────────┘     │    │ ├─────┼─────┼───────┤ │     │
│  │ ┌─────────────┐     │    │ │Vec 2│Addr2│Data 2 │ │     │
│  │ │ RX Queue 0  ├─────┼────┼─►     │     │       │ │     │
│  │ └─────────────┘     │    │ ├─────┼─────┼───────┤ │     │
│  │ ┌─────────────┐     │    │ │Vec 3│Addr3│Data 3 │ │     │
│  │ │ RX Queue 1  ├─────┼────┼─►     │     │       │ │     │
│  │ └─────────────┘     │    │ └─────┴─────┴───────┘ │     │
│  └─────────────────────┘    └───────────────────────┘     │
│                                         │                 │
│                                         ▼                 │
│                             ┌───────────────────────┐     │
│                             │ IRQ Affinity Setting  │     │
│                             │                       │     │
│                             │ CPU0 ◄── Vector 0, 2  │     │
│                             │ CPU1 ◄── Vector 1, 3  │     │
│                             └───────────────────────┘     │
│                                                           │
└───────────────────────────────────────────────────────────┘
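
Unlike MSI, the MSI-X address/data pairs live in a table inside one of the device's BARs, 16 bytes per vector. The sketch below decodes the two capability registers that locate that table, assuming the standard layout (table size stored as N - 1 in the low 11 bits of Message Control, BAR index in the low 3 bits of the Table Offset/BIR register). The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u   /* address low, address high, data, vector control */

struct msix_info {
    unsigned table_size;      /* number of entries (the field stores N - 1) */
    int enabled;              /* bit 15 of Message Control */
    int function_mask;        /* bit 14 of Message Control */
    unsigned bir;             /* which BAR holds the table */
    uint32_t table_offset;    /* byte offset of the table within that BAR */
};

struct msix_info msix_decode(uint16_t msg_ctrl, uint32_t table_reg)
{
    struct msix_info info;

    info.table_size = (msg_ctrl & 0x07FFu) + 1u;
    info.enabled = (msg_ctrl & 0x8000u) != 0;
    info.function_mask = (msg_ctrl & 0x4000u) != 0;
    info.bir = table_reg & 0x7u;
    info.table_offset = table_reg & ~0x7u;
    return info;
}

/* Byte offset, within the BAR, of the table entry for a given vector. */
uint32_t msix_entry_offset(const struct msix_info *info, unsigned vec)
{
    assert(vec < info->table_size);
    return info->table_offset + vec * MSIX_ENTRY_SIZE;
}
```

Because each entry carries its own address and data, each vector can target a different CPU, which is what makes the per-queue affinity shown in the diagram possible.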
        

MSI-X Implementation Example

Here's how a driver might implement MSI-X with CPU affinity:

static int setup_msix_interrupts(struct my_device *dev)
{
    int i, ret, cpu, vector = 0;
    /* Named ncpus so the local does not shadow the num_online_cpus() helper */
    unsigned int ncpus = num_online_cpus();
    
    /* Allocate exactly one MSI-X vector per TX queue and one per RX queue */
    ret = pci_alloc_irq_vectors(dev->pdev, dev->num_queues * 2, 
                               dev->num_queues * 2, PCI_IRQ_MSIX);
    if (ret < 0)
        return ret;
    
    /* Set up TX queue interrupts */
    for (i = 0; i < dev->num_queues; i++) {
        int irq = pci_irq_vector(dev->pdev, vector);
        
        ret = request_irq(irq, my_tx_interrupt_handler, 0, 
                         "my_driver-tx", &dev->tx_queue[i]);
        if (ret)
            goto err_free_tx_irqs;
        
        /* Hint a CPU for this interrupt (assumes online CPUs are numbered 0..ncpus-1) */
        cpu = i % ncpus;
        irq_set_affinity_hint(irq, cpumask_of(cpu));
        
        dev->tx_queue[i].irq = irq;
        dev->tx_queue[i].vector = vector++;
    }
    
    /* Set up RX queue interrupts */
    for (i = 0; i < dev->num_queues; i++) {
        int irq = pci_irq_vector(dev->pdev, vector);
        
        ret = request_irq(irq, my_rx_interrupt_handler, 0, 
                         "my_driver-rx", &dev->rx_queue[i]);
        if (ret)
            goto err_free_rx_irqs;
        
        /* Hint a CPU for this interrupt (assumes online CPUs are numbered 0..ncpus-1) */
        cpu = i % ncpus;
        irq_set_affinity_hint(irq, cpumask_of(cpu));
        
        dev->rx_queue[i].irq = irq;
        dev->rx_queue[i].vector = vector++;
    }
    
    return 0;

err_free_rx_irqs:
    /* Free the RX IRQs requested so far, then fall through to free every TX IRQ */
    while (--i >= 0) {
        irq_set_affinity_hint(dev->rx_queue[i].irq, NULL);
        free_irq(dev->rx_queue[i].irq, &dev->rx_queue[i]);
    }
    i = dev->num_queues;
    
err_free_tx_irqs:
    /* Free the TX IRQs requested so far (i counts the successful requests) */
    while (--i >= 0) {
        irq_set_affinity_hint(dev->tx_queue[i].irq, NULL);
        free_irq(dev->tx_queue[i].irq, &dev->tx_queue[i]);
    }
    pci_free_irq_vectors(dev->pdev);
    return ret;
}
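
The vector-to-CPU assignment used above can be checked in isolation. The sketch below reproduces only the arithmetic (TX vectors first, then RX vectors, queue i of either kind on CPU i modulo the CPU count) in user space; the function and macro names are hypothetical.

```c
#include <assert.h>

#define SK_MAX_VECTORS 64

/*
 * Fill cpu_of_vector[] for num_queues TX vectors followed by num_queues RX
 * vectors, assigning queue i of either kind to CPU (i % num_cpus).
 * Returns the number of vectors assigned.
 */
int sk_assign_vectors(int num_queues, int num_cpus,
                      int cpu_of_vector[SK_MAX_VECTORS])
{
    int vector = 0;

    assert(num_cpus > 0);
    assert(num_queues > 0 && 2 * num_queues <= SK_MAX_VECTORS);

    for (int i = 0; i < num_queues; i++)   /* TX vectors first */
        cpu_of_vector[vector++] = i % num_cpus;
    for (int i = 0; i < num_queues; i++)   /* then RX vectors */
        cpu_of_vector[vector++] = i % num_cpus;
    return vector;
}
```

With two queues and two CPUs this yields vectors 0 and 2 on CPU 0 and vectors 1 and 3 on CPU 1, matching the affinity box in the diagram. Keeping the TX and RX vectors of one queue on the same CPU is a design choice that tends to help cache locality.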
        

Interrupt Coalescing with MSI

Many high-performance devices implement interrupt coalescing to reduce interrupt overhead:

┌───────────────────────────────────────────────────────────┐
│                                                           │
│                  Interrupt Coalescing                     │
│                                                           │
│  ┌─────────────────────────────────────────────────┐      │
│  │ Without Coalescing                              │      │
│  │                                                 │      │
│  │  Packet 1   Packet 2   Packet 3   Packet 4      │      │
│  │     │          │          │          │          │      │
│  │     ▼          ▼          ▼          ▼          │      │
│  │  Interrupt  Interrupt  Interrupt  Interrupt     │      │
│  │                                                 │      │
│  └─────────────────────────────────────────────────┘      │
│                                                           │
│  ┌─────────────────────────────────────────────────┐      │
│  │ With Coalescing                                 │      │
│  │                                                 │      │
│  │  Packet 1   Packet 2   Packet 3   Packet 4      │      │
│  │     │          │          │          │          │      │
│  │     └──────────┴──────────┴──────────┘          │      │
│  │                    │                            │      │
│  │                    ▼                            │      │
│  │                 Interrupt                       │      │
│  │                                                 │      │
│  └─────────────────────────────────────────────────┘      │
│                                                           │
└───────────────────────────────────────────────────────────┘
        

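The RTL8169 driver implements interrupt coalescing through the IntrMitigate register. The function below is a simplified illustration of how the four fields are packed; the in-tree driver additionally scales the time values and validates its inputs:

/* Field layout from the RTL8169 driver; rtl_set_coalesce() below is simplified */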
#define RTL_COALESCE_TX_USECS	GENMASK(15, 12)
#define RTL_COALESCE_TX_FRAMES	GENMASK(11, 8)
#define RTL_COALESCE_RX_USECS	GENMASK(7, 4)
#define RTL_COALESCE_RX_FRAMES	GENMASK(3, 0)

static void rtl_set_coalesce(struct net_device *dev, 
                            struct ethtool_coalesce *ec)
{
    struct rtl8169_private *tp = netdev_priv(dev);
    u16 intr_mitigation = 0;
    
    /* Set TX time limit */
    intr_mitigation |= min(ec->tx_coalesce_usecs, RTL_COALESCE_T_MAX) << 12;
    
    /* Set TX packet count limit */
    intr_mitigation |= min(ec->tx_max_coalesced_frames, 
                          RTL_COALESCE_FRAME_MAX) << 8;
    
    /* Set RX time limit */
    intr_mitigation |= min(ec->rx_coalesce_usecs, RTL_COALESCE_T_MAX) << 4;
    
    /* Set RX packet count limit */
    intr_mitigation |= min(ec->rx_max_coalesced_frames, 
                          RTL_COALESCE_FRAME_MAX);
    
    /* Write to hardware register */
    RTL_W16(tp, IntrMitigate, intr_mitigation);
}
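
The packing of the four 4-bit fields can be tried out in user space. This sketch assumes every field is clamped to 0xF and uses hypothetical sk_* names; it models only the bit layout given by the GENMASK definitions above, not the driver's unit handling.

```c
#include <assert.h>
#include <stdint.h>

#define SK_FIELD_MAX 0xFu   /* each field is 4 bits wide */

static unsigned sk_clamp(unsigned v)
{
    return v > SK_FIELD_MAX ? SK_FIELD_MAX : v;
}

/* Pack TX time, TX frames, RX time, RX frames into bits 15:12, 11:8, 7:4, 3:0. */
uint16_t sk_pack_mitigation(unsigned tx_time, unsigned tx_frames,
                            unsigned rx_time, unsigned rx_frames)
{
    return (uint16_t)((sk_clamp(tx_time) << 12) | (sk_clamp(tx_frames) << 8) |
                      (sk_clamp(rx_time) << 4) | sk_clamp(rx_frames));
}

/* Extract one field; shift must be the low bit position of that field. */
unsigned sk_unpack_field(uint16_t reg, unsigned shift)
{
    assert(shift == 0 || shift == 4 || shift == 8 || shift == 12);
    return (reg >> shift) & SK_FIELD_MAX;
}
```

Clamping before shifting matters: an out-of-range value would otherwise spill into the neighbouring field.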
        

Debugging MSI Issues

When debugging MSI-related issues, several tools and techniques are useful:

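  1. Check if MSI is enabled: run cat /proc/interrupts. MSI interrupts show up with chip names such as PCI-MSI-edge or IR-PCI-MSI-edge (the exact naming varies by kernel version).
  2. Check PCI device capabilities: run lspci -vv and look for "MSI" or "MSI-X" in the capabilities list. "Enable+" means the capability is currently in use; "Enable-" means it is present but disabled.
  3. Force legacy interrupts (for testing): some drivers expose a module parameter for this, for example modprobe <driver_name> disable_msi=1. The parameter name is driver-specific, so check modinfo <driver_name>.
  4. Kernel boot parameters: pci=nomsi disables MSI for all devices.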
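
The first check can also be scripted. The sketch below counts the lines of /proc/interrupts that mention an MSI chip. It assumes the chip name contains the substring "PCI-MSI", which holds for the names listed above but may not cover every kernel version or architecture; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* True if a /proc/interrupts line names an MSI or MSI-X interrupt chip. */
int line_is_msi(const char *line)
{
    assert(line != NULL);
    return strstr(line, "PCI-MSI") != NULL;
}

/* Count MSI/MSI-X lines in an already opened stream; returns -1 for a NULL stream. */
int count_msi_lines(FILE *f)
{
    char line[4096];
    int count = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (line_is_msi(line))
            count++;
    return count;
}
```

A caller would pass the result of fopen("/proc/interrupts", "r") and close the stream afterwards. A count of zero on a system with PCIe devices suggests MSI is disabled globally or the naming assumption does not hold.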

Conclusion

Message Signaled Interrupts represent a significant improvement over traditional pin-based interrupts, offering better scalability, reduced IRQ sharing, and improved performance. The Linux kernel provides a comprehensive framework for MSI support, making it easy for device drivers to take advantage of these benefits.

Modern device drivers should always attempt to use MSI or MSI-X when available, falling back to legacy interrupts only when necessary. With the increasing number of cores in modern processors and the growing bandwidth of I/O devices, the ability to efficiently distribute interrupts across CPUs has become essential for high-performance systems.

By understanding how MSI works and how to implement it in device drivers, developers can create more efficient and scalable systems that make the best use of modern hardware capabilities.
