Linux Scheduling

The Linux operating system is renowned for its robust and flexible scheduling mechanism, which ensures efficient CPU time allocation across processes and threads.

The Linux kernel uses a preemptive scheduler to manage CPU time across tasks (processes and threads). Its goals include:

  • Fairness: Ensuring all tasks get a fair share of CPU time.

  • Responsiveness: Prioritizing interactive tasks (e.g., GUI).

  • Efficiency: Minimizing overhead during context switches.

  • Real-Time Support: Guaranteeing deterministic execution for time-critical tasks.

Linux treats threads as the fundamental schedulable units, not processes. Each thread is represented in the kernel as a task_struct and competes globally for CPU time.

Linux schedules threads across multiple processes when they have same/different scheduling policies and priorities. The Linux kernel treats threads (even in different processes) as independent schedulable entities, so their scheduling behavior depends on their individual policies and priorities. Here’s how it works:


Key Scheduling Policies

SCHED_OTHER (CFS – Completely Fair Scheduler)

  • Default Policy: Used for non-real-time tasks.

  • Algorithm: Allocates CPU time proportionally based on niceness values (-20 to 19). Lower niceness = higher priority.

  • Fairness: Tasks with equal niceness receive equal CPU time.

  • Preemption: Tasks can be preempted by real-time threads.

Tasks under this policy have dynamic priorities adjusted based on "niceness" values (ranging from -20 to 19.

SCHED_FIFO (First-In, First-Out)

Real-Time Policy: For time-critical tasks (e.g., robotics, audio processing).

Behavior:

  • Runs until it blocks, exits, or is preempted by a higher-priority thread.

  • No time slicing – can starve lower-priority threads.

Priority: Static (1–99), with 99 being the highest.

SCHED_RR (Round-Robin)

Real-Time Policy: Similar to SCHED_FIFO but with time slicing.

Behavior:

  • Runs for a fixed quantum (time slice), then yields to other threads of the same priority.

  • Preempted by higher-priority threads.

Priority: Static (1–99).


Process vs. Thread Scheduling

  • Processes: Act as containers for resources (memory, files).

  • Threads: Lightweight units of execution within a process.

Key Differences:

Example: A multi-threaded web server process might have:

  • 1 SCHED_FIFO thread handling real-time requests.

  • 4 SCHED_OTHER threads managing background tasks.


General Rules of Thread scheduling

Linux schedules threads across multiple processes when they have same/different scheduling policies and priorities. The Linux kernel treats threads (even in different processes) as independent schedulable entities, so their scheduling behavior depends on their individual policies and priorities. Here’s how it works:

  1. Real-Time Policies (SCHED_FIFO, SCHED_RR) Always Preempt SCHED_OTHER.

  2. Higher Priority Wins: Within the same policy, threads with higher priority run first.

  3. Process Boundaries Are Irrelevant: The scheduler does not prioritize threads based on their parent process. Only individual thread policies and priorities matter.

  4. Same Policy + Same Priority:

  • For SCHED_FIFO: Threads run in FIFO order (first to start runs until it blocks/yields).

  • For SCHED_RR: Threads run in round-robin order (time-sliced).

  • For SCHED_OTHER: CPU time is allocated proportionally based on niceness (fair sharing).

Why This Works like this

1. Kernel’s Global View: The Linux scheduler maintains a global priority queue for all threads across all processes.

  • Real-time threads (regardless of process) are placed in a separate queue ordered by priority.

  • SCHED_OTHER threads are managed by CFS, which distributes CPU time fairly.

2. Preemption: Higher-priority threads can preempt lower-priority ones at any time, even mid-process.

Possible Implications

  1. Starvation Risk: A high-priority real-time thread (e.g., SCHED_FIFO) in an infinite loop can starve all lower-priority threads (even in other processes).

  2. Process Isolation: CPU allocation is not per-process; it’s per-thread. A process with multiple high-priority threads can dominate the CPU.

  3. Heterogeneous Workloads: Mixing policies allows critical tasks (e.g., audio processing in SCHED_FIFO) to coexist with non-critical tasks (e.g., GUI in SCHED_OTHER).


Scenarios Explained

1. Threads with the Same Policy and Same Priority

Case 1: All SCHED_OTHER (CFS)

  • The CFS allocates CPU time fairly across all threads based on their "niceness" (weight). Threads in different processes share CPU proportionally.

  • Example: Two threads (in different processes) with SCHED_OTHER and niceness 0 split CPU time equally.

Case 2: All SCHED_FIFO (Same Priority)

  • Threads run in FIFO order. The first thread to acquire the CPU runs until it blocks, exits, or is preempted by a higher-priority thread.

  • Example: Thread A (Process 1) and Thread B (Process 2) with SCHED_FIFO and priority 50. If Thread A starts first, it monopolizes the CPU until it finishes or blocks, then Thread B runs.

Case 3: All SCHED_RR (Same Priority)

  • Threads run in round-robin order, with each receiving a fixed time slice (quantum).

  • Example: Thread A (Process 1) and Thread B (Process 2) with SCHED_RR and priority 50. The kernel alternates execution between them after each quantum expires.


2. Threads with the Same Policy but Different Priorities

Case 1: SCHED_FIFO/SCHED_RR with Different Priorities

  • Higher-priority threads always preempt lower-priority ones, regardless of the process they belong to.

  • Example: Thread A (Process 1, SCHED_FIFO, priority 90) and Thread B (Process 2, SCHED_FIFO, priority 80). Thread A runs first and indefinitely until it blocks, even if Thread B is in another process.

Case 2: SCHED_OTHER with Different Niceness

  • Threads with lower niceness (higher priority) receive a larger share of CPU time.

  • Example: Thread A (niceness -10) gets ~75% of CPU time, while Thread B (niceness 10) gets ~25% (approximate ratio based on CFS weight calculations).


3. Threads with Different Policies

  • Rule: Real-time (SCHED_FIFO, SCHED_RR) threads always preempt SCHED_OTHER threads. Between real-time policies, priority determines order.

Example:

  • Thread A (Process 1, SCHED_FIFO, priority 50)

  • Thread B (Process 2, SCHED_RR, priority 60)

  • Thread C (Process 3, SCHED_OTHER, niceness 0)

  • Order: Thread B (highest real-time priority) → Thread A → Thread C (only runs when no real-time threads are active).


Changing the Scheduling Policy

We can modify the scheduling policy for a process or thread using below :

Programmaticalychanging the scheduling policy is by directly invoke system calls in code.

Using chrt (Command Line)

Get a real-time policy of a process:

Set a real-time policy for a process:

Using System Calls (C Code)

For Processes:

Complete code :

Note: Only root can set real-time policies (SCHED_FIFO, SCHED_RR).

For Threads:

Complete code

Warning : Permissions Required

  • Real-time policies require CAP_SYS_NICE capability or root privileges.


Scope of Scheduling Policy Changes

Process vs. Threads: In Linux, threads are treated as lightweight processes (LWPs) with shared memory. Each thread has its own scheduling policy and priority.

  • Changing a process's policy (e.g., via sched_setscheduler()) affects only the main thread.

  • To change other threads, use pthread_setschedparam() on individual threads.

  • New threads inherit the creating thread’s policy unless explicitly set.

When creating a thread, you can set the inherit scheduler attribute:

Impact of pthread_setschedparam

  • Direct Thread Control: pthread_setschedparam() modifies the policy/priority of a single thread. The kernel schedules threads independently, regardless of their parent process.

  • Why It Works: The kernel is the "superior" entity managing all scheduling. Threads are treated as distinct schedulable entities, so per-thread policies allow fine-grained control.


Priority Management

Real-Time Priorities (1–99)

  • Higher values = higher priority.

  • SCHED_FIFO and SCHED_RR use this range.

Niceness Values (-20 to 19)

  • Lower values = higher priority for SCHED_OTHER.

  • Adjusted using nice or setpriority().


User Space vs. Kernel Space Switching

In Linux, processes operate in two modes:

  • User Space: Normal application execution.

  • Kernel Space: When a process requests system services (e.g., file operations, memory management).

Processes switch between user and kernel mode via:

1. System Calls (e.g., read(), write()).

The syscall () triggers a software interrupt ( or instruction on x86-64) to enter kernel mode.

2. Interrupts (e.g., timer interrupts for scheduling).

  • Hardware events (e.g., timer interrupts, I/O completion) cause the CPU to enter kernel mode.

  • Example: A keyboard press triggers an interrupt handler in the kernel.

3. Exceptions (e.g., page faults).

  • If a program tries to access invalid memory (e.g., segmentation fault), it triggers a kernel-mode exception handler.

How the Kernel Returns to User Space

  • After handling the syscall, interrupt, or exception, the kernel restores the user-space state and resumes execution where it left off.

Overall Mechanism:

  • The CPU saves user-space registers and switches to kernel mode.

  • The kernel executes the required handler (e.g., syscall).

  • Returns to user mode, restoring the saved context.


Mixed Scheduling Policies in Practice

If threads in a process have different policies (e.g., SCHED_OTHER, SCHED_FIFO, SCHED_RR):

  1. Priority Hierarchy: Real-time policies (SCHED_FIFO, SCHED_RR) always preempt SCHED_OTHER.

  2. Real-Time Priorities: SCHED_FIFO and SCHED_RR use static priorities (1–99, where 99 is highest). Higher-priority threads run first.

  3. Scheduling Behavior:

  • SCHED_FIFO: Runs until it blocks, yields, or is preempted by a higher-priority thread.

  • SCHED_RR: Runs for a fixed time slice, then is placed at the back of its priority queue.

  • SCHED_OTHER: Allocates CPU time proportionally based on niceness.

Example Scenario

A process has three threads:

  • Thread 1: SCHED_OTHER (niceness 0).

  • Thread 2: SCHED_FIFO (priority 80).

  • Thread 3: SCHED_RR (priority 90).

Scheduling Behavior:

  1. Thread 3 (SCHED_RR, priority 90) runs first. -> Uses round-robin time slices.

  2. Thread 2 (SCHED_FIFO, priority 80) runs next. -> Runs indefinitely until it blocks.

  3. Thread 1 (SCHED_OTHER) runs only when no real-time threads are active.

Why Mix Policies?

  • Real-Time Guarantees: : Critical tasks (e.g., audio processing) can use SCHED_FIFO for low latency, while non-critical threads use SCHED_OTHER.

  • Fairness: Non-critical tasks (e.g., logging) use SCHED_OTHER.

  • Priority Isolation: Prevents a misbehaving thread from starving others by capping its priority.

  • Flexibility: Kernel handles heterogeneity, ensuring system-wide fairness and responsiveness.


Common Pitfalls and Best Practices

Pitfalls

  • Starvation: A SCHED_FIFO thread with priority 99 can monopolize the CPU.

  • Priority Inversion: Low-priority threads holding resources needed by high-priority threads.

  • Misconfiguration: Incorrectly assigning real-time policies destabilizes the system.

Best Practices

  • Use real-time policies sparingly.

  • Avoid infinite loops in SCHED_FIFO threads.

  • Use priority inheritance mechanisms (e.g., pthread_mutexattr_setprotocol).


Tools for Monitoring and Debugging

  • top -H: View thread-level CPU usage.

  • chrt: Check/modify scheduling policies.

  • perf sched: Analyze scheduler behavior.

  • strace: Trace system calls and signals.


Referance :

Amlal E.

Founder @ SNU Systems Corp | Lead Systems Architect @ NeKernel.org

4mo

I love your article! Especially the CFS part. You also show pretty neatly how Linux combines different scheduling policies.

Like
Reply

To view or add a comment, sign in

Others also viewed

Explore topics