Multi-Core Systems and Linux Kernel Concepts: How It All Fits Together

Modern Linux systems leverage multiple CPU cores to achieve parallelism, improving performance for both applications and the kernel. However, multi-core architectures introduce complexities in scheduling, synchronization, and resource management. This article explains how Linux extends core concepts like processes, threads, interrupts, and sleep/wake mechanisms to multi-core systems, with concrete examples.


1. Multi-Core Basics: SMP and Scheduling

Symmetric Multi-Processing (SMP)

In Linux, all CPU cores are treated equally (SMP). Each core can execute kernel code, handle interrupts, and run user-space threads. The kernel’s scheduler distributes tasks across cores dynamically.

Example:

A 4-core CPU runs a web server with 8 threads. The scheduler might assign 2 threads to each core, or dynamically balance them based on load.

Key Multi-Core Concepts:

  1. Per-CPU Runqueues: Each core has its own runqueue (list of runnable threads). This reduces lock contention.
  2. CPU Affinity: Threads can be pinned to specific cores (e.g., for cache locality).
  3. Load Balancing: The scheduler migrates tasks between cores to avoid idle CPUs.


2. Processes, Threads, and Multi-Core

Thread Execution Across Cores

  • IMPORTANT: Threads from the same process can run on different cores simultaneously.
  • Shared memory (e.g., global variables) must be synchronized with locks (e.g., pthread_mutex).

Example:

// A multi-threaded program whose threads may run on different cores
#define _GNU_SOURCE            // needed for sched_getcpu()
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

void* work(void* arg) {
    // pthread_t is opaque; the cast is for illustrative printing only
    printf("Thread %lu running on CPU %d\n",
           (unsigned long)pthread_self(), sched_getcpu());
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);
    pthread_create(&t2, NULL, work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Output (may vary):

Thread 140123456 running on CPU 2
Thread 140123789 running on CPU 3

Kernel Internals:

  • The scheduler selects the "next" thread for each core independently.
  • Thread migration between cores is transparent to applications.


3. Sleep and Wake-Up in Multi-Core Systems

Sleeping Threads

When a thread sleeps (e.g., waiting for I/O), it is removed from its core’s runqueue. Other threads on the same core or other cores continue running.

Example: A thread on CPU 1 calls read() on an empty socket:

  1. The kernel adds it to the socket’s wait queue.
  2. The thread’s state is set to TASK_INTERRUPTIBLE.
  3. CPU 1’s scheduler switches to another thread.

Waking Threads

IMPORTANT: When the event occurs (e.g., data arrives), the kernel wakes the thread. The scheduler may run it on any available core, not necessarily the original one.

Example:

  • A network packet arrives, triggering an interrupt on CPU 2.
  • The interrupt handler calls wake_up(), setting the thread’s state to TASK_RUNNING.
  • The scheduler assigns the thread to CPU 3’s runqueue (due to load balancing).
  • Note: the thread’s state tells the scheduler how to treat it: only threads in TASK_RUNNING are eligible to be picked.


4. Interrupts and Multi-Core

Interrupt Distribution

  • Hardware interrupts can be routed to specific cores (via IRQ affinity).
  • Each core handles its own interrupts in interrupt context.

Example: Assign a network card’s interrupts to CPU 0:

echo 1 > /proc/irq/<IRQ_NUMBER>/smp_affinity  # Bitmask: bit 0 = CPU 0

SoftIRQs and Tasklets

  • Deferred interrupt work (e.g., packet processing) runs as SoftIRQs or tasklets.
  • SoftIRQs of the same type can run concurrently on multiple cores, but a given tasklet never runs on more than one core at a time.
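On a running Linux system you can observe per-core SoftIRQ activity directly; each column in /proc/softirqs is one CPU, and each row is a SoftIRQ type (NET_RX, for instance, is deferred packet processing):

```shell
# Each column is a CPU; rows are SoftIRQ types (TIMER, NET_RX, NET_TX, ...)
cat /proc/softirqs
```

A core that handles the NIC's interrupts (per the smp_affinity setting above) will typically show a much higher NET_RX count than its siblings.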


5. Synchronization Across Cores

Shared Resources

Cores accessing shared data (e.g., kernel data structures) require synchronization:

  • Spinlocks: Used in interrupt context or for short critical sections.
  • Mutexes: Used in process context (the caller sleeps if the lock is contended).

Example (Kernel Module Using Spinlocks):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void core_a_function(void) {
    spin_lock(&my_lock);
    // Access shared data
    spin_unlock(&my_lock);
}

void core_b_function(void) {
    spin_lock(&my_lock);
    // Access shared data
    spin_unlock(&my_lock);
}

Cache Coherence

  • Hardware ensures cache coherence (e.g., via MESI protocol).
  • Unrelated variables packed into the same cache line can cause false sharing (cores repeatedly invalidating each other’s copy of the line even though they touch different variables).


6. Concrete Multi-Core Scenarios

Scenario 1: Multi-Threaded Web Server

Setup:

  • 4 cores, 8 worker threads.

Behavior:

  • The scheduler distributes threads across cores.
  • Threads blocking on I/O are removed from runqueues.
  • Completed I/O operations (e.g., disk reads) wake threads, which may resume on any core.

Scenario 2: Real-Time Application with CPU Pinning

  • Goal: Minimize latency by pinning a thread to CPU 3.
  • Code:

#define _GNU_SOURCE      // needed for pthread_setaffinity_np()
#include <pthread.h>
#include <sched.h>

cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(3, &cpuset);     // allow only CPU 3
pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);

7. Tools for Debugging Multi-Core Behavior

  • top/htop:

Press 1 to show per-core CPU usage.

Identify cores with high load or idle cores.

  • taskset:

taskset -c 0,1 ./my_program  # Run on CPUs 0 and 1        

  • perf: Profile CPU cache misses or synchronization overhead:

perf stat -e cache-misses ./my_program        

  • /proc/interrupts: View interrupt distribution across cores.


8. Challenges in Multi-Core Systems

  • Lock Contention:

Too many cores fighting for the same lock reduces scalability.

Fix: Use per-CPU data or lock-free algorithms.

  • False Sharing:

Two cores modifying different variables in the same cache line.

Fix: Align data structures to cache line boundaries.

  • NUMA Effects:

On NUMA systems, accessing memory from a remote node is slower.

Fix: Use numactl to bind memory to local nodes.
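For example, numactl can bind both the CPUs and the memory of a process to a single NUMA node (node 0 here; ./my_program is a placeholder, as in the taskset example above):

```shell
numactl --cpunodebind=0 --membind=0 ./my_program  # run and allocate on node 0
```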


Conclusion

Multi-core systems amplify the power of Linux’s process, thread, and interrupt mechanisms but require careful handling of parallelism. Key takeaways:

  • The scheduler dynamically balances threads across cores.
  • Synchronization (spinlocks, mutexes) is critical to avoid race conditions.
  • CPU affinity and NUMA awareness optimize performance.
  • Tools like perf, taskset, and /proc help diagnose multi-core issues.

By understanding these concepts, developers can write efficient, scalable code that fully leverages modern multi-core architectures.

Vishwas Srivastava

Principal Engineer @ Broadcom (ex-VMware, Juniper, Cavium, Cisco, LG)


Thanks for this wonderful write up David. I think it is worth covering the boot time complexity of the multi-core system, the very first point from where kernel starts breathing.
