Multi-Core Systems and Linux Kernel Concepts: How It All Fits Together

Modern Linux systems leverage multiple CPU cores to achieve parallelism, improving performance for both applications and the kernel. However, multi-core architectures introduce complexities in scheduling, synchronization, and resource management. This article explains how Linux extends core concepts like processes, threads, interrupts, and sleep/wake mechanisms to multi-core systems, with concrete examples.


1. Multi-Core Basics: SMP and Scheduling

Symmetric Multi-Processing (SMP)

In Linux, all CPU cores are treated equally (SMP). Each core can execute kernel code, handle interrupts, and run user-space threads. The kernel’s scheduler distributes tasks across cores dynamically.

Example:

A 4-core CPU runs a web server with 8 threads. The scheduler might assign 2 threads to each core, or dynamically balance them based on load.

Key Multi-Core Concepts:

  1. Per-CPU Runqueues: Each core has its own runqueue (list of runnable threads). This reduces lock contention.
  2. CPU Affinity: Threads can be pinned to specific cores (e.g., for cache locality).
  3. Load Balancing: The scheduler migrates tasks between cores to avoid idle CPUs.


2. Processes, Threads, and Multi-Core

Thread Execution Across Cores

  • IMPORTANT: Threads from the same process can run on different cores simultaneously.
  • Shared memory (e.g., global variables) must be synchronized with locks (e.g., pthread_mutex).

Example:

// A multi-threaded program whose threads may run on different cores
#define _GNU_SOURCE            // needed for sched_getcpu()
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

void* work(void* arg) {
    // pthread_t is opaque; the cast is for illustrative printing only
    printf("Thread %lu running on CPU %d\n",
           (unsigned long)pthread_self(), sched_getcpu());
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);
    pthread_create(&t2, NULL, work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Output (may vary):

Thread 140123456 running on CPU 2
Thread 140123789 running on CPU 3

Kernel Internals:

  • The scheduler selects the "next" thread for each core independently.
  • Thread migration between cores is transparent to applications.


3. Sleep and Wake-Up in Multi-Core Systems

Sleeping Threads

When a thread sleeps (e.g., waiting for I/O), it is removed from its core’s runqueue. Other threads on the same core or other cores continue running.

Example: A thread on CPU 1 calls read() on an empty socket:

  1. The kernel adds it to the socket’s wait queue.
  2. The thread’s state is set to TASK_INTERRUPTIBLE.
  3. CPU 1’s scheduler switches to another thread.

Waking Threads

IMPORTANT: When the event occurs (e.g., data arrives), the kernel wakes the thread. The scheduler may run it on any available core, not necessarily the original one.

Example:

  • A network packet arrives, triggering an interrupt on CPU 2.
  • The interrupt handler calls wake_up(), setting the thread’s state to TASK_RUNNING.
  • The scheduler assigns the thread to CPU 3’s runqueue (due to load balancing).
  • Note: the thread’s state tells the scheduler how to treat it: only threads in TASK_RUNNING are eligible to be picked.


4. Interrupts and Multi-Core

Interrupt Distribution

  • Hardware interrupts can be routed to specific cores (via IRQ affinity).
  • Each core handles its own interrupts in interrupt context.

Example: Assign a network card’s interrupts to CPU 0:

echo 1 > /proc/irq/<IRQ_NUMBER>/smp_affinity  # Bitmask: bit 0 = CPU 0

SoftIRQs and Tasklets

  • Deferred interrupt work (e.g., packet processing) runs as SoftIRQs or tasklets.
  • SoftIRQs of the same type can run concurrently on multiple cores, but a given tasklet never runs on more than one core at a time.
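On a running Linux system you can observe per-core SoftIRQ activity directly; each column in /proc/softirqs is one CPU, and each row is a SoftIRQ type (NET_RX, for instance, is deferred packet processing):

```shell
# Each column is a CPU; rows are SoftIRQ types (TIMER, NET_RX, NET_TX, ...)
cat /proc/softirqs
```

A core that handles the NIC's interrupts (per the smp_affinity setting above) will typically show a much higher NET_RX count than its siblings.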


5. Synchronization Across Cores

Shared Resources

Cores accessing shared data (e.g., kernel data structures) require synchronization:

  • Spinlocks: Used in interrupt context or for short critical sections.
  • Mutexes: Used in process context (the caller sleeps if the lock is contended).

Example (Kernel Module Using Spinlocks):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void core_a_function(void) {
    spin_lock(&my_lock);
    // Access shared data
    spin_unlock(&my_lock);
}

void core_b_function(void) {
    spin_lock(&my_lock);
    // Access shared data
    spin_unlock(&my_lock);
}

Cache Coherence

  • Hardware ensures cache coherence (e.g., via MESI protocol).
  • Unrelated variables packed into the same cache line can cause false sharing (cores repeatedly invalidating each other’s copy of the line even though they touch different variables).


6. Concrete Multi-Core Scenarios

Scenario 1: Multi-Threaded Web Server

Setup:

  • 4 cores, 8 worker threads.

Behavior:

  • The scheduler distributes threads across cores.
  • Threads blocking on I/O are removed from runqueues.
  • Completed I/O operations (e.g., disk reads) wake threads, which may resume on any core.

Scenario 2: Real-Time Application with CPU Pinning

  • Goal: Minimize latency by pinning a thread to CPU 3.
  • Code:

#define _GNU_SOURCE      // needed for pthread_setaffinity_np()
#include <pthread.h>
#include <sched.h>

cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(3, &cpuset);     // allow only CPU 3
pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);

7. Tools for Debugging Multi-Core Behavior

  • top/htop:

Press 1 to show per-core CPU usage.

Identify cores with high load or idle cores.

  • taskset:

taskset -c 0,1 ./my_program  # Run on CPUs 0 and 1        

  • perf: Profile CPU cache misses or synchronization overhead:

perf stat -e cache-misses ./my_program        

  • /proc/interrupts: View interrupt distribution across cores.


8. Challenges in Multi-Core Systems

  • Lock Contention:

Too many cores fighting for the same lock reduces scalability.

Fix: Use per-CPU data or lock-free algorithms.

  • False Sharing:

Two cores modifying different variables in the same cache line.

Fix: Align data structures to cache line boundaries.

  • NUMA Effects:

On NUMA systems, accessing memory from a remote node is slower.

Fix: Use numactl to bind memory to local nodes.
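For example, numactl can bind both the CPUs and the memory of a process to a single NUMA node (node 0 here; ./my_program is a placeholder, as in the taskset example above):

```shell
numactl --cpunodebind=0 --membind=0 ./my_program  # run and allocate on node 0
```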


Conclusion

Multi-core systems amplify the power of Linux’s process, thread, and interrupt mechanisms but require careful handling of parallelism. Key takeaways:

  • The scheduler dynamically balances threads across cores.
  • Synchronization (spinlocks, mutexes) is critical to avoid race conditions.
  • CPU affinity and NUMA awareness optimize performance.
  • Tools like perf, taskset, and /proc help diagnose multi-core issues.

By understanding these concepts, developers can write efficient, scalable code that fully leverages modern multi-core architectures.

Vishwas Srivastava

Principal Engineer @ Broadcom (ex-VMware, Juniper, Cavium, Cisco, LG)


Thanks for this wonderful write up David. I think it is worth covering the boot time complexity of the multi-core system, the very first point from where kernel starts breathing.
