Multi-Core Systems and Linux Kernel Concepts: How It All Fits Together
Modern Linux systems leverage multiple CPU cores to achieve parallelism, improving performance for both applications and the kernel. However, multi-core architectures introduce complexities in scheduling, synchronization, and resource management. This article explains how Linux extends core concepts like processes, threads, interrupts, and sleep/wake mechanisms to multi-core systems, with concrete examples.
1. Multi-Core Basics: SMP and Scheduling
Symmetric Multi-Processing (SMP)
In Linux, all CPU cores are treated equally (SMP). Each core can execute kernel code, handle interrupts, and run user-space threads. The kernel’s scheduler distributes tasks across cores dynamically.
Example:
A 4-core CPU runs a web server with 8 threads. The scheduler might assign 2 threads to each core, or dynamically balance them based on load.
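As a quick check of how many cores the scheduler has to work with, here is a minimal sketch using POSIX sysconf() (the constants and output format are illustrative, not part of the web-server example above):
// Minimal sketch: ask how many CPUs are configured and currently online.
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long online = sysconf(_SC_NPROCESSORS_ONLN);  // cores currently online
    long conf   = sysconf(_SC_NPROCESSORS_CONF);  // cores configured
    printf("%ld of %ld CPUs online\n", online, conf);
    return 0;
}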
Key Multi-Core Concepts:
Per-CPU runqueues: each core keeps its own queue of runnable tasks.
Load balancing: the scheduler periodically migrates tasks between cores to even out load.
CPU affinity: a task can be restricted to a subset of cores.
Cache coherence: hardware keeps per-core caches consistent, but not for free (see Section 5).
2. Processes, Threads, and Multi-Core
Thread Execution Across Cores
Threads of the same process share an address space but are scheduled independently, so they can run simultaneously on different cores, as the example below shows.
Example:
// A multi-threaded program whose threads may run on different cores
#define _GNU_SOURCE          // needed for sched_getcpu()
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

void* work(void* arg) {
    // Report which CPU this thread is currently executing on
    printf("Thread %lu running on CPU %d\n",
           (unsigned long)pthread_self(), sched_getcpu());
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);
    pthread_create(&t2, NULL, work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
Output (may vary):
Thread 140123456 running on CPU 2
Thread 140123789 running on CPU 3
Kernel Internals:
Each core has its own runqueue, and the scheduler picks the next task per core.
A load balancer periodically migrates tasks between runqueues to keep all cores busy.
Threads of one process share the same address space (mm_struct) but are scheduled as independent tasks.
3. Sleep and Wake-Up in Multi-Core Systems
Sleeping Threads
When a thread sleeps (e.g., waiting for I/O), it is removed from its core’s runqueue. Other threads on the same core or other cores continue running.
Example: A thread on CPU 1 calls read() on an empty socket. The kernel marks it TASK_INTERRUPTIBLE, places it on the socket's wait queue, and removes it from CPU 1's runqueue; CPU 1 then runs some other task.
Waking Threads
IMPORTANT: When the event occurs (e.g., data arrives), the kernel wakes the thread. The scheduler may run it on any available core, not necessarily the original one.
Example: Data arrives on the socket. The network interrupt handler (perhaps running on CPU 0) wakes the sleeping thread, and the scheduler places it on whichever core is least loaded, which may be CPU 2 rather than the original CPU 1. A small program that makes this visible is sketched below.
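A minimal sketch of that behavior (it uses a pipe in place of a real socket, the one-second delay just ensures the reader has blocked, and names like reader and fds are illustrative):
// Minimal sketch: a thread may wake up on a different core than the one
// it went to sleep on. The delayed write plays the role of arriving data.
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static int fds[2];

static void* reader(void* arg) {
    char buf[1];
    printf("before sleep: CPU %d\n", sched_getcpu());
    read(fds[0], buf, 1);              // blocks: thread leaves the runqueue
    printf("after wake:  CPU %d\n", sched_getcpu());
    return NULL;
}

int main(void) {
    pthread_t t;
    pipe(fds);
    pthread_create(&t, NULL, reader, NULL);
    sleep(1);                          // let the reader block
    write(fds[1], "x", 1);             // "data arrives": wakes the reader
    pthread_join(t, NULL);
    return 0;
}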
4. Interrupts and Multi-Core
Interrupt Distribution
Each hardware interrupt is delivered to a particular core. Its routing can be controlled through the IRQ's affinity mask, and the irqbalance daemon typically spreads interrupts across cores.
Example: Assign a network card’s interrupts to CPU 0:
echo 1 > /proc/irq/<IRQ_NUMBER>/smp_affinity # Bitmask for CPU 0
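To observe how interrupts are actually being distributed, /proc/interrupts lists per-CPU counts for each IRQ; a minimal sketch that prints the first few lines of that file (equivalent to viewing it with cat or less):
// Minimal sketch: dump the start of /proc/interrupts, which shows one
// column of interrupt counts per CPU for each IRQ line.
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/interrupts", "r");
    if (!f) { perror("fopen"); return 1; }
    char line[512];
    for (int n = 0; n < 10 && fgets(line, sizeof(line), f); n++)
        fputs(line, stdout);
    fclose(f);
    return 0;
}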
SoftIRQs and Tasklets
The hardware interrupt handler (top half) does only minimal work; heavier deferred work runs later as a softirq or tasklet, normally on the same core that took the interrupt (or in that core's ksoftirqd thread under heavy load).
5. Synchronization Across Cores
Shared Resources
Cores accessing shared data (e.g., kernel data structures) require synchronization:
Spinlocks: busy-wait locks for short critical sections (see the module below).
Mutexes and semaphores: sleeping locks for longer critical sections.
Atomic operations: lock-free updates of single variables.
RCU (read-copy-update): lock-free readers for read-mostly data.
Example (Kernel Module Using Spinlocks):
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void core_a_function(void) {
    spin_lock(&my_lock);      // spins until the lock is free
    // Access shared data
    spin_unlock(&my_lock);
}

void core_b_function(void) {
    spin_lock(&my_lock);      // only one core holds my_lock at a time
    // Access shared data
    spin_unlock(&my_lock);
}
Cache Coherence
Hardware cache-coherence protocols (e.g., MESI) keep each core's cached view of memory consistent. Coherence is automatic but not free: when two cores repeatedly write to the same cache line, the line ping-pongs between their caches. This is the root of the false-sharing problem shown below.
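A minimal sketch of the usual fix, aligning each core's hot variable to its own cache line (the 64-byte size and the padded_counter name are assumptions; the real line size is architecture dependent):
// Minimal sketch: give each thread's counter its own cache line so that
// increments on different cores do not invalidate each other's caches.
#include <stdalign.h>

struct padded_counter {
    alignas(64) unsigned long value;      // 64 bytes assumed per cache line
};

static struct padded_counter counters[8]; // one slot per worker thread

void count_event(int thread_id) {
    counters[thread_id].value++;          // touches only this thread's line
}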
6. Concrete Multi-Core Scenarios
Scenario 1: Multi-Threaded Web Server
Setup: A 4-core machine runs a web server with 8 worker threads (as in Section 1), each handling client connections.
Behavior: The scheduler spreads the workers across all four cores. Workers blocked on network I/O sleep and are woken on whichever core is available, and distributing the NIC's interrupts (Section 4) keeps any single core from becoming the packet-processing bottleneck.
Scenario 2: Real-Time Application with CPU Pinning
A latency-sensitive thread can be pinned to a dedicated core so the scheduler never migrates it:
#define _GNU_SOURCE             // needed for pthread_setaffinity_np()
#include <pthread.h>
#include <sched.h>

cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(3, &cpuset);            // allow only CPU 3
pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
7. Tools for Debugging Multi-Core Behavior
top / htop: Press 1 in top to show per-core CPU usage and identify overloaded or idle cores.
taskset: Restrict a program to specific cores:
taskset -c 0,1 ./my_program # Run on CPUs 0 and 1
perf: Count hardware events such as cache misses:
perf stat -e cache-misses ./my_program
8. Challenges in Multi-Core Systems
Lock contention: Too many cores fighting for the same lock reduces scalability.
Fix: Use per-CPU data or lock-free algorithms (see the sketch after this list).
False sharing: Two cores modifying different variables that happen to share a cache line.
Fix: Align data structures to cache line boundaries (Section 5).
NUMA effects: On NUMA systems, accessing memory attached to a remote node is slower than local memory.
Fix: Use numactl to bind a process's memory (and CPUs) to local nodes.
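As a sketch of the per-CPU data fix mentioned above, kernel code can give each core its own copy of a counter so no lock is needed on the fast path (my_hits, count_hit, and total_hits are hypothetical names, not from any real driver):
#include <linux/percpu.h>
#include <linux/cpumask.h>

// Each possible CPU gets its own copy of my_hits.
static DEFINE_PER_CPU(unsigned long, my_hits);

void count_hit(void)
{
    this_cpu_inc(my_hits);          // increments this core's copy, no lock
}

unsigned long total_hits(void)
{
    unsigned long sum = 0;
    int cpu;

    for_each_possible_cpu(cpu)      // reading requires visiting every copy
        sum += per_cpu(my_hits, cpu);
    return sum;
}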
Conclusion
Multi-core systems amplify the power of Linux’s process, thread, and interrupt mechanisms but require careful handling of parallelism. Key takeaways:
Threads of one process run in parallel on different cores, each core picking tasks from its own runqueue.
A sleeping thread may be woken on a different core than the one it slept on.
Interrupt delivery can be steered per core via IRQ affinity.
Shared data needs explicit synchronization (spinlocks, per-CPU data, atomics).
Lock contention, false sharing, and NUMA effects are the main scalability traps.
By understanding these concepts, developers can write efficient, scalable code that fully leverages modern multi-core architectures.