Understanding the Linux Interrupt Subsystem
Interrupts are a fundamental mechanism in operating systems that allow hardware devices to signal the CPU when they need attention. The Linux kernel's interrupt subsystem provides a sophisticated framework for handling these asynchronous events efficiently. This article explores the architecture and key mechanisms of the Linux interrupt subsystem.
What Are Interrupts?
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                            CPU                            │
│                                                           │
│          ┌───────────────┐    ┌───────────────┐           │
│          │ Current Task  │    │ Interrupt     │           │
│          │ Execution     │───▶│ Handler       │           │
│          └───────────────┘    └───────┬───────┘           │
│                                       │                   │
└───────────────────────────────────────▲───────────────────┘
                                        │
                                        │ Interrupt Signal
                                        │
┌───────────────────────────────────────┴───────────────────┐
│                                                           │
│                     Hardware Devices                      │
│                                                           │
│      ┌───────────┐    ┌───────────┐    ┌───────────┐      │
│      │ Network   │    │ Disk      │    │ Keyboard  │      │
│      │ Card      │    │ Controller│    │           │      │
│      └───────────┘    └───────────┘    └───────────┘      │
│                                                           │
└───────────────────────────────────────────────────────────┘
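Interrupts allow hardware devices to signal the CPU when they need attention, such as when a network card receives a packet, a disk controller completes a transfer, or a key is pressed on the keyboard.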
When an interrupt occurs, the CPU temporarily suspends its current execution, saves its state, and jumps to a specific interrupt handler routine to service the interrupt.
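The save, dispatch, and restore sequence can be modeled in a few lines of ordinary C. This is a userspace sketch for illustration, not kernel code: vector_table, struct cpu_state, and deliver_interrupt() are hypothetical names, and the table only loosely resembles the x86 Interrupt Descriptor Table, which maps each of 256 vectors to an entry point.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS 256  /* x86 defines 256 interrupt vectors */

/* Hypothetical stand-in for the state the CPU saves on interrupt entry. */
struct cpu_state {
    unsigned long pc;     /* where the interrupted task was executing */
    unsigned long flags;  /* e.g. the interrupt-enable flag */
};

typedef void (*vector_handler_t)(int vector);

static vector_handler_t vector_table[NR_VECTORS];
static struct cpu_state current_cpu;
static unsigned long handled_count[NR_VECTORS];

/* Example handler: counts calls and clobbers the CPU state,
 * as real handler code does while it runs. */
static void counting_handler(int vector)
{
    handled_count[vector]++;
    current_cpu.pc = 0;
}

/* Install a handler; returns 0 on success, -1 for an invalid vector. */
static int set_vector(int vector, vector_handler_t fn)
{
    if (vector < 0 || vector >= NR_VECTORS)
        return -1;
    vector_table[vector] = fn;
    return 0;
}

/* Model of interrupt entry: save state, jump to the handler, restore state.
 * Returns 1 if a handler ran, 0 if none was installed for this vector. */
static int deliver_interrupt(int vector)
{
    struct cpu_state saved;
    int ran = 0;

    assert(vector >= 0 && vector < NR_VECTORS);
    saved = current_cpu;               /* save the interrupted context */
    if (vector_table[vector] != NULL) {
        vector_table[vector](vector);  /* run the handler */
        ran = 1;
    }
    current_cpu = saved;               /* resume the interrupted task */
    return ran;
}
```

Restoring the saved state is what lets the interrupted task continue as if nothing had happened; on real hardware this is split between the CPU and the kernel's entry code.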
Linux Interrupt Subsystem Architecture
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                     User Applications                     │
│                                                           │
└─────────────────────────────┬─────────────────────────────┘
                              │
                              │ System Calls
                              │
┌─────────────────────────────▼─────────────────────────────┐
│                                                           │
│                       Linux Kernel                        │
│                                                           │
│   ┌─────────────────┐      ┌─────────────────────────┐    │
│   │                 │      │                         │    │
│   │ Device Drivers  │◄────►│   Interrupt Subsystem   │    │
│   │                 │      │                         │    │
│   └────────┬────────┘      └────────────┬────────────┘    │
│            │                            │                 │
│            │                            │                 │
│   ┌────────▼────────┐      ┌────────────▼────────────┐    │
│   │                 │      │                         │    │
│   │   Kernel Core   │◄────►│     Hardware Layer      │    │
│   │                 │      │                         │    │
│   └─────────────────┘      └─────────────────────────┘    │
│                                                           │
└───────────────────────────────────────────────────────────┘
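The Linux interrupt subsystem consists of several key components. As the diagram above shows, it works alongside the device drivers that register handlers with it and the hardware layer that delivers interrupt signals, while the kernel core ties these parts together.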
Interrupt Flow in Linux
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                      Interrupt Flow                       │
│                                                           │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│   │ Hardware    │    │ Interrupt   │    │ Top Half    │   │
│   │ Interrupt   │───►│ Controller  │───►│ (Handler)   │   │
│   └─────────────┘    └─────────────┘    └──────┬──────┘   │
│                                                │          │
│                                                ▼          │
│                                         ┌─────────────┐   │
│                                         │ Bottom Half │   │
│                                         │ Processing  │   │
│                                         └─────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
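When a hardware interrupt occurs:
1. The device raises an interrupt signal.
2. The interrupt controller receives the signal and forwards it to a CPU.
3. The CPU runs the top half, the registered interrupt handler, which does only the minimum work needed.
4. The remaining work is deferred to bottom-half processing, which runs later with interrupts enabled.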
Top Half vs. Bottom Half Processing
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                 Top Half vs. Bottom Half                  │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Top Half             │   │ Bottom Half            │   │
│   │ (Interrupt Context)  │   │ (Deferred Work)        │   │
│   │                      │   │                        │   │
│   │ - Fast execution     │   │ - Longer processing    │   │
│   │ - Interrupts off     │   │ - Interrupts on        │   │
│   │ - Cannot sleep       │   │ - May sleep only in    │   │
│   │                      │   │   work queues          │   │
│   │ - Minimal work       │   │ - Complex processing   │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
Linux divides interrupt handling into two parts:
1. Top Half (Interrupt Handler):
- Runs with interrupts disabled on the local CPU
- Must execute quickly
- Cannot sleep or block
- Acknowledges the interrupt and saves essential data
- Schedules the bottom half for later execution
2. Bottom Half (Deferred Work):
- Runs with interrupts enabled
- Can take more time to execute
- Can sleep only when implemented as a work queue (softirqs and tasklets cannot sleep)
- Processes the data collected by the top half
- Implemented as softirqs, tasklets, or work queues
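The division of labor between the two halves can be sketched as a single-threaded userspace model in C. The rx_ring structure, top_half(), and bottom_half() are hypothetical; in a real driver the top half would be the interrupt handler and the bottom half a softirq, tasklet, or work item, and the two would need locking or careful memory ordering because they can run concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical capacity; a power of two */

/* Data handed from the (modeled) top half to the bottom half. */
struct rx_ring {
    unsigned int data[RING_SIZE];
    size_t head;            /* next slot the top half writes */
    size_t tail;            /* next slot the bottom half reads */
    bool bh_pending;        /* "bottom half scheduled" flag */
    unsigned long dropped;  /* values lost because the ring was full */
};

/* Top half: do the minimum. Capture the value, queue it, schedule the
 * bottom half. It must not wait, so it drops data when the ring is full. */
static void top_half(struct rx_ring *r, unsigned int device_value)
{
    if (r->head - r->tail == RING_SIZE) {
        r->dropped++;
    } else {
        r->data[r->head % RING_SIZE] = device_value;
        r->head++;
    }
    r->bh_pending = true;   /* like raising a softirq or scheduling a tasklet */
}

/* Bottom half: runs later and does the slower work (here, summing values).
 * Returns the number of items processed. */
static size_t bottom_half(struct rx_ring *r, unsigned long *sum)
{
    size_t n = 0;

    assert(r->head - r->tail <= RING_SIZE);
    if (!r->bh_pending)
        return 0;
    r->bh_pending = false;
    while (r->tail != r->head) {
        *sum += r->data[r->tail % RING_SIZE];
        r->tail++;
        n++;
    }
    return n;
}
```

Keeping the top half this small is what keeps the time spent with interrupts disabled short; everything that can wait is moved into bottom_half().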
Interrupt Types in Linux
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                      Interrupt Types                      │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Hardware Interrupts  │   │ Software Interrupts    │   │
│   │                      │   │                        │   │
│   │ - Device-generated   │   │ - Kernel-generated     │   │
│   │ - Asynchronous       │   │ - Synchronous          │   │
│   │ - External events    │   │ - Internal events      │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Maskable Interrupts  │   │ Non-maskable (NMI)     │   │
│   │                      │   │                        │   │
│   │ - Can be disabled    │   │ - Cannot be disabled   │   │
│   │ - Normal priority    │   │ - Highest priority     │   │
│   │ - Most devices       │   │ - Critical hardware    │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
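Linux handles several types of interrupts, which can be classified along two axes: by source (hardware interrupts generated by devices versus software interrupts generated inside the kernel) and by whether they can be disabled. Most device interrupts are maskable, while non-maskable interrupts (NMIs) are reserved for critical hardware events.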
Interrupt Request (IRQ) Numbers
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                      IRQ Allocation                       │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Static IRQs (0-15)   │   │ Dynamic IRQs (16+)     │   │
│   │                      │   │                        │   │
│   │ - Legacy devices     │   │ - Modern devices       │   │
│   │ - Fixed assignments  │   │ - Allocated on demand  │   │
│   │ - Historical         │   │ - PCI, USB, etc.       │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
│   ┌───────────────────────────────────────────────────┐   │
│   │ Example Static IRQ Assignments                    │   │
│   │                                                   │   │
│   │ IRQ 0:  System Timer                              │   │
│   │ IRQ 1:  Keyboard                                  │   │
│   │ IRQ 2:  Cascade for IRQs 8-15                     │   │
│   │ IRQ 3:  COM2/COM4                                 │   │
│   │ IRQ 4:  COM1/COM3                                 │   │
│   │ IRQ 8:  Real-time Clock                           │   │
│   │ IRQ 14: Primary IDE                               │   │
│   └───────────────────────────────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
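Each interrupt source is assigned an IRQ (Interrupt Request) number. On PC-compatible systems, IRQs 0-15 keep their historical ISA assignments, while higher numbers are allocated dynamically to devices such as PCI and MSI interrupt sources.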
Interrupt Controllers
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                   Interrupt Controllers                   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Legacy PIC           │   │ APIC (x86)             │   │
│   │ (8259A)              │   │                        │   │
│   │                      │   │ - Multiprocessor       │   │
│   │ - 8+8 IRQs           │   │ - 256 vectors          │   │
│   │ - Single CPU         │   │ - Per-CPU local APIC   │   │
│   │ - Limited features   │   │ - I/O APIC             │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ GIC                  │   │ Platform-specific      │   │
│   │ (ARM)                │   │ Controllers            │   │
│   │                      │   │                        │   │
│   │ - ARM architecture   │   │ - Custom hardware      │   │
│   │ - Multiprocessor     │   │ - Specialized          │   │
│   │ - SMP support        │   │ - Embedded systems     │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
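Linux supports various interrupt controllers, including the legacy 8259A PIC, the x86 APIC (a local APIC in each CPU plus one or more I/O APICs), ARM's Generic Interrupt Controller (GIC), and platform-specific controllers found in embedded systems.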
Registering Interrupt Handlers
Device drivers register interrupt handlers using the request_irq() function:
int request_irq(
    unsigned int irq,
    irq_handler_t handler,
    unsigned long flags,
    const char *name,
    void *dev
);
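Where:
- irq is the interrupt number to request
- handler is the function called when the interrupt fires; it returns IRQ_HANDLED if it serviced the interrupt, or IRQ_NONE if its device did not raise it
- flags holds IRQF_* options, such as IRQF_SHARED to share the line with other devices
- name identifies the owner and appears in /proc/interrupts
- dev is a cookie passed back to the handler; on a shared line it must be unique, because free_irq() uses it to find the handler to remove
The function returns 0 on success or a negative error code on failure.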
A simplified example from a network driver:
static int rtl8169_open(struct net_device *dev)
{
    struct rtl8169_private *tp = netdev_priv(dev);
    int retval;

    /* ... */

    /* IRQF_SHARED: other devices may use this line, so pass a unique dev_id */
    retval = request_irq(tp->irq, rtl8169_interrupt, IRQF_SHARED,
                         dev->name, dev);
    if (retval < 0)
        goto err_out;

    /* ... */
    return 0;

err_out:
    /* ... undo earlier setup ... */
    return retval;
}
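Because IRQF_SHARED lets several devices use one line, the kernel calls every handler registered on that line, and each handler must check whether its own device raised the interrupt. The userspace model below illustrates that contract. IRQ_NONE and IRQ_HANDLED mirror the kernel's handler return values; struct irq_line, struct fake_dev, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the kernel's IRQ_NONE / IRQ_HANDLED values (model only). */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } model_irqreturn_t;
typedef model_irqreturn_t (*model_handler_t)(int irq, void *dev_id);

#define MAX_SHARED 4  /* hypothetical limit for this model */

/* One IRQ line carrying a chain of handlers, as with IRQF_SHARED. */
struct irq_line {
    model_handler_t handler[MAX_SHARED];
    void *dev_id[MAX_SHARED];
    size_t count;
    unsigned long unhandled;  /* interrupts that no handler claimed */
};

/* Hypothetical device whose status bit says whether it raised the IRQ. */
struct fake_dev {
    bool irq_pending;
    unsigned long serviced;
};

static model_irqreturn_t fake_dev_handler(int irq, void *dev_id)
{
    struct fake_dev *d = dev_id;  /* dev_id tells the handler which device */

    (void)irq;
    if (!d->irq_pending)
        return IRQ_NONE;          /* not ours */
    d->irq_pending = false;       /* "acknowledge" the device */
    d->serviced++;
    return IRQ_HANDLED;
}

/* Add a handler to a shared line; dev_id must be non-NULL and unique. */
static int line_add_handler(struct irq_line *l, model_handler_t h, void *dev_id)
{
    if (l->count == MAX_SHARED || dev_id == NULL)
        return -1;
    for (size_t i = 0; i < l->count; i++)
        if (l->dev_id[i] == dev_id)
            return -1;
    l->handler[l->count] = h;
    l->dev_id[l->count] = dev_id;
    l->count++;
    return 0;
}

/* Deliver one interrupt: every handler on the line gets a chance to claim it. */
static bool line_fire(struct irq_line *l, int irq)
{
    bool handled = false;

    for (size_t i = 0; i < l->count; i++)
        if (l->handler[i](irq, l->dev_id[i]) == IRQ_HANDLED)
            handled = true;
    if (!handled)
        l->unhandled++;
    return handled;
}
```

Counting unclaimed interrupts in unhandled is loosely analogous to the kernel's spurious-interrupt accounting, which helps it detect a line that keeps firing with no handler taking responsibility.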
Deferred Interrupt Processing
┌───────────────────────────────────────────────────────────┐
│                                                           │
│              Deferred Processing Mechanisms               │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Softirqs             │   │ Tasklets               │   │
│   │                      │   │                        │   │
│   │ - Static allocation  │   │ - Dynamic allocation   │   │
│   │ - Parallel execution │   │ - Serial execution     │   │
│   │ - System-defined     │   │ - Built on softirqs    │   │
│   │ - Low-level          │   │ - Simpler interface    │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Work Queues          │   │ Threaded IRQs          │   │
│   │                      │   │                        │   │
│   │ - Kernel threads     │   │ - Kernel thread        │   │
│   │ - Can sleep          │   │ - Can sleep            │   │
│   │ - Flexible           │   │ - Replaces bottom half │   │
│   │ - General purpose    │   │ - Modern approach      │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
Linux provides several mechanisms for deferred interrupt processing:
1. Softirqs:
- Limited number of statically defined handlers
- Can run in parallel on multiple CPUs
- Used for high-frequency, performance-critical tasks
- Examples: network RX/TX, timers, scheduling
2. Tasklets:
- Built on top of softirqs
- Dynamically allocated
- Run serially (the same tasklet cannot run on multiple CPUs simultaneously)
- Simpler interface than softirqs
3. Work Queues:
- Run in the context of kernel worker threads
- Can sleep and block
- Used for tasks that may need to wait for resources
4. Threaded IRQs:
- Run the handler in its own kernel thread
- Can sleep and block
- Modern approach for complex device drivers
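The serialization rule for tasklets rests on two state bits: one marks the tasklet as scheduled, the other as running. The C11 sketch below models that idea in userspace. struct model_tasklet and its functions are hypothetical, loosely following the kernel's TASKLET_STATE_SCHED and TASKLET_STATE_RUN bits, and the model omits the per-CPU queues a real implementation uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_tasklet {
    atomic_bool scheduled;   /* like TASKLET_STATE_SCHED */
    atomic_bool running;     /* like TASKLET_STATE_RUN */
    void (*func)(void *data);
    void *data;
};

/* Returns true if this call queued the tasklet, false if it was already
 * queued: scheduling it again before it runs does not add a second run. */
static bool model_tasklet_schedule(struct model_tasklet *t)
{
    return !atomic_exchange(&t->scheduled, true);
}

/* Called from the (modeled) softirq loop on some CPU. Returns true if the
 * function ran. If another CPU is already running this tasklet, it stays
 * scheduled and is retried later, so it never runs on two CPUs at once. */
static bool model_tasklet_run(struct model_tasklet *t)
{
    assert(t->func != 0);
    if (!atomic_load(&t->scheduled))
        return false;
    if (atomic_exchange(&t->running, true))
        return false;                    /* busy elsewhere: retry later */
    atomic_store(&t->scheduled, false);  /* may be re-scheduled while running */
    t->func(t->data);
    atomic_store(&t->running, false);
    return true;
}

/* Example callback used to observe how often the tasklet ran. */
static void count_calls(void *data)
{
    ++*(int *)data;
}
```

Clearing the scheduled bit before calling the function means a new schedule request that arrives during execution is not lost; it simply causes one more run afterwards.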
Softirq Implementation
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                       Softirq Types                       │
│                                                           │
│   ┌───────────────────────────────────────────────────┐   │
│   │ HI_SOFTIRQ       - High priority tasklets         │   │
│   │ TIMER_SOFTIRQ    - Timer processing               │   │
│   │ NET_TX_SOFTIRQ   - Network transmit               │   │
│   │ NET_RX_SOFTIRQ   - Network receive                │   │
│   │ BLOCK_SOFTIRQ    - Block I/O completion           │   │
│   │ IRQ_POLL_SOFTIRQ - IRQ polling                    │   │
│   │ TASKLET_SOFTIRQ  - Regular tasklets               │   │
│   │ SCHED_SOFTIRQ    - Scheduler load balancing       │   │
│   │ HRTIMER_SOFTIRQ  - High-resolution timers         │   │
│   │ RCU_SOFTIRQ      - RCU processing                 │   │
│   └───────────────────────────────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
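Softirqs are processed at specific points in the kernel:
- On return from a hardware interrupt handler (irq_exit())
- When code re-enables bottom halves, for example with local_bh_enable()
- In the per-CPU ksoftirqd kernel threads, when pending softirqs keep being re-raised and cannot all be handled immediately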
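The processing loop can be modeled in userspace C. In this hedged sketch, pending, softirq_vec, do_softirq_model(), and MODEL_MAX_RESTART are hypothetical; the enum reuses the softirq names from the table above in the same order. The kernel's real loop (in kernel/softirq.c) also bounds its work, using both a restart count and a time limit, before handing leftovers to ksoftirqd.

```c
#include <assert.h>
#include <stddef.h>

/* Same order as the softirq table above: lower numbers run first. */
enum softirq_nr {
    HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

#define MODEL_MAX_RESTART 10   /* hypothetical cap on re-scans */

static unsigned int pending;               /* one bit per softirq number */
static void (*softirq_vec[NR_SOFTIRQS])(void);
static int deferred_to_ksoftirqd;          /* set when work is left over */

static void raise_softirq(enum softirq_nr nr)
{
    assert(nr < NR_SOFTIRQS);
    pending |= 1u << nr;
}

/* Snapshot and clear the pending mask, run each raised handler in bit
 * order, and re-scan if handlers raised more work. Whatever is still
 * pending after the cap is left for ksoftirqd. */
static void do_softirq_model(void)
{
    int restarts = MODEL_MAX_RESTART;

    while (pending) {
        unsigned int mask = pending;

        pending = 0;
        for (int nr = 0; nr < NR_SOFTIRQS; nr++)
            if ((mask & (1u << nr)) && softirq_vec[nr] != NULL)
                softirq_vec[nr]();
        if (--restarts == 0)
            break;
    }
    if (pending)
        deferred_to_ksoftirqd = 1;
}

/* Example handlers that record the order in which they run. */
static int ran[32];
static int nran;

static void timer_handler(void)  { ran[nran++] = TIMER_SOFTIRQ; }
static void net_rx_handler(void) { ran[nran++] = NET_RX_SOFTIRQ; }

/* Keeps re-raising itself, like a continuous flood of received packets. */
static void busy_handler(void)
{
    if (nran < 32)
        ran[nran++] = NET_RX_SOFTIRQ;
    raise_softirq(NET_RX_SOFTIRQ);
}
```

The cap is the design choice that matters: without it, a softirq that keeps re-raising itself could starve user tasks, so the model, like the kernel, eventually hands the remaining work to a schedulable thread.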
Interrupt Context vs. Process Context
┌───────────────────────────────────────────────────────────┐
│                                                           │
│           Interrupt Context vs. Process Context           │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Interrupt Context    │   │ Process Context        │   │
│   │                      │   │                        │   │
│   │ - Cannot sleep       │   │ - Can sleep            │   │
│   │ - Cannot access      │   │ - Can access           │   │
│   │   user space         │   │   user space           │   │
│   │ - Limited stack      │   │ - Full kernel stack    │   │
│   │ - Preemption off     │   │ - Preemptible          │   │
│   │ - Time-critical      │   │ - Not time-critical    │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
Understanding the difference between interrupt context and process context is crucial:
1. Interrupt Context:
- Top half handlers and softirqs run in interrupt context
- Cannot sleep or block
- Cannot access user space memory
- Limited stack space
- Preemption is disabled
2. Process Context:
- Threaded IRQs and work queues run in process context
- Can sleep and block
- Can access user space memory (with proper checks)
- Normal kernel stack
- Can be preempted
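The kernel determines which context it is in from a counter called preempt_count, whose bit fields record nested hard-IRQ, softirq, and NMI entries as well as preempt_disable() calls. The sketch below decodes such a counter in userspace. The mask values follow the layout documented in include/linux/preempt.h, but the helper names are hypothetical; real kernel code would call in_interrupt() or in_task() instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit layout as documented in include/linux/preempt.h. */
#define PREEMPT_MASK   0x000000ffu  /* preempt_disable() nesting */
#define SOFTIRQ_MASK   0x0000ff00u  /* serving softirqs / BHs disabled */
#define HARDIRQ_MASK   0x000f0000u  /* hard interrupt nesting */
#define NMI_MASK       0x00f00000u  /* NMI nesting */

#define SOFTIRQ_OFFSET 0x00000100u
#define HARDIRQ_OFFSET 0x00010000u

/* Interrupt context: any hard-IRQ, softirq, or NMI bits are set. */
static bool model_in_interrupt(unsigned int count)
{
    return (count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}

/* Sleeping is only safe in preemptible process context. This ignores
 * other conditions the kernel also checks, such as disabled interrupts. */
static bool model_may_sleep(unsigned int count)
{
    return (count & (PREEMPT_MASK | SOFTIRQ_MASK |
                     HARDIRQ_MASK | NMI_MASK)) == 0;
}
```

Note that a nonzero preempt-disable count alone is not interrupt context, yet it still forbids sleeping; that distinction is why "cannot sleep" is a wider category than "in interrupt".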
Interrupt Statistics
Linux provides interrupt statistics through the /proc/interrupts file:
            CPU0       CPU1
   0:         84          0  IO-APIC  2-edge          timer
   1:          9          0  IO-APIC  1-edge          i8042
   8:          0          1  IO-APIC  8-edge          rtc0
   9:          0          0  IO-APIC  9-fasteoi       acpi
  12:         15          0  IO-APIC  12-edge         i8042
  16:         31          0  IO-APIC  16-fasteoi      ehci_hcd:usb1
  23:        158          0  IO-APIC  23-fasteoi      ehci_hcd:usb2
  40:          0          0  PCI-MSI  458752-edge     PCIe PME
  41:     123747          0  PCI-MSI  512000-edge     eth0
  42:     115032          0  PCI-MSI  524288-edge     snd_hda_intel:card0
  43:          0          0  PCI-MSI  32768-edge      mei_me
  44:        536          0  PCI-MSI  360448-edge     nvme0q0
  45:        944          0  PCI-MSI  360449-edge     nvme0q1
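This file shows:
- The IRQ number (first column)
- The number of interrupts each CPU has handled since boot (one column per CPU)
- The interrupt controller that delivers the interrupt, such as IO-APIC or PCI-MSI
- The hardware interrupt number on that controller and its trigger type, such as 2-edge or 9-fasteoi
- The name of the device or driver that registered the handler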
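Because /proc/interrupts is plain text, its counters are easy to collect from a userspace tool. The sketch below parses one line of the file; parse_interrupts_line() is a hypothetical helper, and it assumes the caller already knows the number of CPU columns (for example, by counting the CPUn entries in the header line). Summary rows such as ERR, which carry a single count, are rejected when more than one CPU column is expected.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one data line of /proc/interrupts, for example
 *   " 41:     123747          0  PCI-MSI  512000-edge     eth0"
 * Copies the IRQ label ("41", or a name such as "LOC") into label and
 * stores the sum of the first ncpu counters in *total.
 * Returns 0 on success, -1 if the line is not a per-CPU data line. */
static int parse_interrupts_line(const char *line, int ncpu,
                                 char *label, size_t label_len,
                                 unsigned long long *total)
{
    const char *colon = strchr(line, ':');
    const char *p = line;
    size_t n;

    if (colon == NULL || label_len == 0)
        return -1;                /* header line or garbage */
    while (isspace((unsigned char)*p))
        p++;
    n = (size_t)(colon - p);
    if (n == 0 || n >= label_len)
        return -1;
    memcpy(label, p, n);
    label[n] = '\0';

    *total = 0;
    p = colon + 1;
    for (int cpu = 0; cpu < ncpu; cpu++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);

        if (end == p)
            return -1;            /* fewer counters than CPUs */
        *total += v;
        p = end;
    }
    return 0;
}
```

Sampling the totals twice and subtracting gives a per-IRQ interrupt rate, which is often the quickest way to spot a device that is interrupting far more than expected.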
Interrupt Handling in SMP Systems
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                  SMP Interrupt Handling                   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ IRQ Balancing        │   │ Per-CPU Interrupts     │   │
│   │                      │   │                        │   │
│   │ - Distribute load    │   │ - Dedicated to one CPU │   │
│   │ - Dynamic routing    │   │ - No contention        │   │
│   │ - irqbalance daemon  │   │ - Cache-friendly       │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ CPU Affinity         │   │ NUMA Considerations    │   │
│   │                      │   │                        │   │
│   │ - Manual assignment  │   │ - Local interrupts     │   │
│   │ - Performance tuning │   │ - Memory locality      │   │
│   │ - /proc/irq/N/smp_   │   │ - Node awareness       │   │
│   │   affinity           │   │                        │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
In multi-processor systems, interrupt handling becomes more complex:
1. IRQ Balancing:
- Distributes interrupts across CPUs
- Implemented by the irqbalance daemon
- Aims to balance interrupt load while maintaining cache locality
2. Per-CPU Interrupts:
- Some interrupts can be dedicated to specific CPUs
- Reduces cache thrashing and lock contention
- Improves performance for high-frequency interrupts
3. CPU Affinity:
- Manually assign interrupts to specific CPUs
- Controlled via /proc/irq/N/smp_affinity
- Useful for performance tuning
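The smp_affinity file holds a hexadecimal CPU bitmask in which bit n stands for CPU n. This small C sketch converts between a list of CPUs and that mask for systems with up to 64 CPUs; the function names are hypothetical. On large systems the kernel separates 32-bit groups of the mask with commas when the file is read, so the parser skips them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Build a CPU bitmask (bit n = CPU n) for up to 64 CPUs. */
static uint64_t cpus_to_mask(const int *cpus, int n)
{
    uint64_t mask = 0;

    for (int i = 0; i < n; i++) {
        assert(cpus[i] >= 0 && cpus[i] < 64);
        mask |= UINT64_C(1) << cpus[i];
    }
    return mask;
}

/* Format the mask as the hex string one would write to smp_affinity. */
static void mask_to_hex(uint64_t mask, char *buf, size_t len)
{
    snprintf(buf, len, "%llx", (unsigned long long)mask);
}

/* Parse a mask as read back from smp_affinity, skipping the commas
 * between 32-bit groups and stopping at a trailing newline.
 * Returns 0 on success, -1 on an invalid character. */
static int hex_to_mask(const char *s, uint64_t *mask)
{
    uint64_t m = 0;

    for (; *s != '\0' && *s != '\n'; s++) {
        int c = tolower((unsigned char)*s);

        if (c == ',')
            continue;
        if (isdigit(c))
            m = (m << 4) | (uint64_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            m = (m << 4) | (uint64_t)(c - 'a' + 10);
        else
            return -1;
    }
    *mask = m;
    return 0;
}
```

For example, the mask for CPUs 0 and 2 is 5, so writing 5 to /proc/irq/41/smp_affinity (as root) asks the kernel to deliver IRQ 41 only to those CPUs. Whether the request is honored depends on the interrupt controller, and a running irqbalance daemon may later change the setting.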
Real-Time Considerations
┌───────────────────────────────────────────────────────────┐
│                                                           │
│               Real-Time Interrupt Handling                │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Threaded IRQs        │   │ IRQ Time Limits        │   │
│   │                      │   │                        │   │
│   │ - Preemptible        │   │ - Detect long handlers │   │
│   │ - Prioritized        │   │ - Debug facilities     │   │
│   │ - Reduced latency    │   │ - Latency tracking     │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ RT Priority          │   │ Interrupt Off Time     │   │
│   │                      │   │                        │   │
│   │ - SCHED_FIFO         │   │ - Minimize time with   │   │
│   │ - Configurable       │   │   interrupts disabled  │   │
│   │ - Deterministic      │   │ - Critical for RT      │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
For real-time systems, interrupt handling requires special considerations:
1. Threaded IRQs:
- Move most interrupt processing to preemptible kernel threads
- Allow higher-priority tasks to preempt interrupt handling
- Reduce worst-case latency for high-priority tasks, which no longer wait behind long-running interrupt processing
2. Priority-Based Handling:
- Assign priorities to interrupt threads
- Process critical interrupts before less important ones
- Provide deterministic response times
3. Minimizing Interrupt-Off Time:
- Reduce the time spent with interrupts disabled
- Keep top half handlers as short as possible
- Move processing to bottom halves
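With threaded IRQs, deciding which handler runs next becomes ordinary fixed-priority scheduling. The sketch below models the SCHED_FIFO rule in userspace: the highest-priority runnable thread wins, and threads of equal priority run in the order they woke. struct irq_thread and pick_next() are hypothetical. On a real system each IRQ thread appears as a kernel thread named irq/<N>-<name>, and its priority can be changed with chrt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for a runnable threaded-IRQ handler. */
struct irq_thread {
    int irq;
    int rt_prio;          /* SCHED_FIFO priority: 1 (low) to 99 (high) */
    unsigned long woken;  /* wake-up order: smaller means earlier */
};

/* Pick the thread a SCHED_FIFO-style scheduler would run next:
 * highest priority first, first-come first-served within a priority.
 * Returns the index into t[], or -1 if no thread is runnable. */
static int pick_next(const struct irq_thread *t, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            t[i].rt_prio > t[best].rt_prio ||
            (t[i].rt_prio == t[best].rt_prio && t[i].woken < t[best].woken))
            best = (int)i;
    }
    return best;
}
```

Because the choice depends only on priority and arrival order, the response time for the most important interrupt no longer depends on how much work less important devices have queued.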
Interrupt-Driven I/O Example
Let's look at a simplified example of interrupt-driven I/O in a network driver:
┌───────────────────────────────────────────────────────────┐
│                                                           │
│               Network Driver Interrupt Flow               │
│                                                           │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│   │ Packet      │    │ Hardware    │    │ Top Half    │   │
│   │ Arrives     │───►│ Interrupt   │───►│ Handler     │   │
│   └─────────────┘    └─────────────┘    └──────┬──────┘   │
│                                                │          │
│                                                ▼          │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│   │ Protocol    │    │ NAPI        │    │ NET_RX      │   │
│   │ Stack       │◄───│ Poll        │◄───│ Softirq     │   │
│   └─────────────┘    └─────────────┘    └─────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
1. Packet Arrival:
- Network card receives a packet
- Card generates an interrupt
2. Top Half Handler:
- Acknowledges the interrupt
- Masks further receive interrupts from the device
- Schedules NAPI polling with napi_schedule(), which raises the NET_RX_SOFTIRQ softirq
- Returns quickly
3. Bottom Half Processing:
- NET_RX_SOFTIRQ runs the NAPI polling function
- Driver processes received packets in batches
- Packets are passed to the network stack
- Re-enables interrupts when queue is empty
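This approach balances responsiveness with efficiency by keeping the top half short and, under heavy load, processing packets in batches instead of taking one interrupt per packet.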
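The budget rule at the heart of NAPI can be modeled in a few lines. In this userspace sketch, struct model_nic, model_rx_irq(), and model_napi_poll() are hypothetical; a real driver would call napi_schedule() from its interrupt handler and napi_complete_done() from its poll function. The budget of 64 in the example matches the common default NAPI weight, but drivers may use other values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a NIC with a receive queue and an RX IRQ enable. */
struct model_nic {
    int rx_queued;        /* packets waiting in the device ring */
    bool irq_enabled;     /* device RX interrupt unmasked? */
    bool napi_scheduled;  /* poll pending in NET_RX_SOFTIRQ */
    long delivered;       /* packets passed up to the protocol stack */
};

/* Top half: mask device RX interrupts and schedule polling. */
static void model_rx_irq(struct model_nic *nic)
{
    nic->irq_enabled = false;
    nic->napi_scheduled = true;
}

/* Poll function run from NET_RX_SOFTIRQ: handle at most budget packets.
 * If the queue drains within the budget, leave polling mode and unmask the
 * interrupt; otherwise stay scheduled so the softirq polls again.
 * Returns the number of packets processed. */
static int model_napi_poll(struct model_nic *nic, int budget)
{
    int done = 0;

    while (done < budget && nic->rx_queued > 0) {
        nic->rx_queued--;
        nic->delivered++;
        done++;
    }
    if (done < budget) {
        nic->napi_scheduled = false;  /* like napi_complete_done() */
        nic->irq_enabled = true;
    }
    return done;
}
```

With 150 packets queued, one interrupt followed by three polls (64, 64, then 22 packets) delivers everything, and interrupts are re-enabled only after the final, partial batch.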
Interrupt Mitigation Techniques
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                   Interrupt Mitigation                    │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ NAPI                 │   │ Interrupt Coalescing   │   │
│   │ (New API)            │   │                        │   │
│   │                      │   │ - Hardware bundles     │   │
│   │ - Polling + IRQs     │   │   multiple events      │   │
│   │ - Adaptive           │   │ - Single interrupt for │   │
│   │ - High throughput    │   │   multiple packets     │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
│   ┌──────────────────────┐   ┌────────────────────────┐   │
│   │ Throttling           │   │ Dynamic Interrupt      │   │
│   │                      │   │ Moderation             │   │
│   │ - Limit IRQ rate     │   │                        │   │
│   │ - Configurable       │   │ - Adaptive algorithms  │   │
│   │ - Per driver/NIC     │   │ - Load-based           │   │
│   │                      │   │ - Self-tuning          │   │
│   └──────────────────────┘   └────────────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
To prevent interrupt overload, Linux employs several mitigation techniques:
1. NAPI (New API):
- Hybrid approach combining interrupts and polling
- Uses interrupts during low traffic
- Switches to polling during high traffic
- Reduces interrupt overhead while maintaining responsiveness
2. Interrupt Coalescing:
- Hardware bundles multiple events into a single interrupt
- Configurable via ethtool for network devices
- Balances latency and throughput
3. Interrupt Throttling:
- Limits the rate of interrupts
- Prevents a single device from monopolizing CPU time
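- Typically implemented in the NIC and configured through its driver, for example with ethtool or a module parameter such as the e1000 driver's InterruptThrottleRate (Linux has no generic /proc/irq/N/throttle file)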
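Interrupt coalescing trades a little latency for fewer interrupts. The sketch below counts how many interrupts a device would raise under a simple "enough frames or enough time" policy. struct coalesce_cfg and count_interrupts() are hypothetical; the two fields are only analogous to the rx-frames and rx-usecs settings that ethtool -C exposes, since the exact semantics vary from NIC to NIC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coalescing settings, analogous to rx-frames / rx-usecs. */
struct coalesce_cfg {
    unsigned int max_frames;  /* fire once this many frames are pending... */
    unsigned int max_usecs;   /* ...or this long after the first pending one */
};

/* Given packet arrival times in microseconds (ascending), count the
 * interrupts raised. Each interrupt services every frame pending then. */
static unsigned int count_interrupts(const unsigned int *arrival_us, size_t n,
                                     struct coalesce_cfg cfg)
{
    unsigned int irqs = 0, pending = 0, first_us = 0;

    for (size_t i = 0; i < n; i++) {
        /* The timer expired before this arrival: flush pending frames. */
        if (pending && arrival_us[i] - first_us >= cfg.max_usecs) {
            irqs++;
            pending = 0;
        }
        if (pending == 0)
            first_us = arrival_us[i];  /* timer starts at the first frame */
        pending++;
        if (pending >= cfg.max_frames) {
            irqs++;
            pending = 0;
        }
    }
    if (pending)
        irqs++;  /* the timer eventually fires for the leftovers */
    return irqs;
}
```

For eight packets arriving 10 microseconds apart, max_frames = 1 (no coalescing) yields eight interrupts, while max_frames = 4 with a generous timeout yields two; a short timeout caps how long a packet can wait, at the cost of more interrupts.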
Debugging Interrupt Issues
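Linux provides several tools for debugging interrupt-related issues, including the per-CPU counters in /proc/interrupts and /proc/softirqs, the per-IRQ settings under /proc/irq/N/, and the kernel's irq tracepoints (such as irq:irq_handler_entry and irq:softirq_entry), which ftrace and perf can record.
Common interrupt-related issues include interrupt storms that overload a CPU, handlers that run too long and add latency, interrupt load concentrated on a single CPU, and spurious interrupts that no handler claims.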
Conclusion
The Linux interrupt subsystem provides a sophisticated framework for handling asynchronous hardware events efficiently. By dividing interrupt handling into top and bottom halves, employing various deferred processing mechanisms, and implementing mitigation techniques, Linux achieves a balance between responsiveness and throughput.
Understanding how interrupts work in Linux is essential for kernel developers, device driver authors, and system administrators who need to diagnose and optimize system performance. The interrupt subsystem continues to evolve, with ongoing improvements for real-time performance, scalability on many-core systems, and support for new hardware architectures.