© 2017 VMware Inc. All rights reserved.
Understanding SCHED_DEADLINE
Controlling CPU Bandwidth
Steven Rostedt
26/9/2017
What is SCHED_DEADLINE?
● A new scheduling class (well, since v3.14)
  – the others are: SCHED_OTHER/NORMAL, SCHED_FIFO, SCHED_RR
  – SCHED_IDLE, SCHED_BATCH (out of scope for today)
● Constant Bandwidth Server
● Earliest Deadline First
Other Schedulers
● SCHED_OTHER / SCHED_NORMAL
  – Completely Fair Scheduler (CFS)
  – Uses “nice” priority
  – Each task gets a fair share of the CPU bandwidth
● SCHED_FIFO
  – First in, first out
  – Each task runs until it gives up the CPU or a higher-priority task preempts it
● SCHED_RR
  – Like SCHED_FIFO, but tasks of the same priority get time slices of the CPU
Priorities
● You have two programs running on the same CPU
  – One runs a nuclear power plant
    ● Requires 1/2 second out of every second of the CPU (50% of the CPU)
  – The other runs a washing machine
    ● Requires 50 milliseconds out of every 200 milliseconds (25% of the CPU)
  – Which one gets the higher priority?
Priorities
[Timeline figure: the nuke task's ½ sec budget per 1 sec period vs the washing machine's 50 ms per 200 ms]

Priorities Nuke > Washing Machine
[Timeline figure: the same workloads with the nuke task at the higher priority]

Priorities Nuke < Washing Machine
[Timeline figure: the same workloads with the washing machine at the higher priority]
Rate Monotonic Scheduling (RMS)
● Computational time vs Period
● Can be implemented by SCHED_FIFO
● Smallest period gets highest priority
● Compute computation time (C)
● Compute period time (T)

U = \sum_{i=1}^{n} \frac{C_i}{T_i}
Rate Monotonic Scheduling (RMS)
● Add a Dishwasher to the mix...
● Nuclear Power Plant: C = 500ms, T = 1000ms (50% of the CPU)
● Dishwasher: C = 300ms, T = 900ms (33.3333% of the CPU)
● Washing Machine: C = 100ms, T = 800ms (12.5% of the CPU)

U = \frac{500}{1000} + \frac{300}{900} + \frac{100}{800} = 0.958333
Rate Monotonic Scheduling (RMS)
[Timeline figures, 0–13, stepping through the RMS schedule of the three tasks until a deadline is missed]
FAILED!
Rate Monotonic Scheduling (RMS)
● Computational time vs Period
● Can be implemented by SCHED_FIFO
● Smallest period gets highest priority
● Compute computation time (C)
● Compute period time (T)

U = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n(\sqrt[n]{2} - 1)
Rate Monotonic Scheduling (RMS)
● Add a Dishwasher to the mix...
● Nuclear Power Plant: C = 500ms, T = 1000ms (50% of the CPU)
● Dishwasher: C = 300ms, T = 900ms (33.3333% of the CPU)
● Washing Machine: C = 100ms, T = 800ms (12.5% of the CPU)

U = \frac{500}{1000} + \frac{300}{900} + \frac{100}{800} = 0.958333

U \le n(\sqrt[n]{2} - 1) = 3(\sqrt[3]{2} - 1) = 0.77976

0.958333 > 0.77976, so this task set fails the RMS admission test.
Rate Monotonic Scheduling (RMS)

U = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n(\sqrt[n]{2} - 1)

\lim_{n \to \infty} n(\sqrt[n]{2} - 1) = \ln 2 \approx 0.693147
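To make the bound concrete, here is a minimal C sketch (not from the deck) that runs this Liu and Layland admission test on the three appliances above; compile with -lm:

#include <math.h>
#include <stdio.h>

/* Returns 1 when total utilization fits under n * (2^(1/n) - 1). */
static int rms_schedulable(const double *C, const double *T, int n)
{
        double U = 0.0;

        for (int i = 0; i < n; i++)
                U += C[i] / T[i];
        return U <= n * (pow(2.0, 1.0 / n) - 1.0);
}

int main(void)
{
        /* Nuke, dishwasher, washing machine from the slides above */
        double C[] = { 500, 300, 100 };
        double T[] = { 1000, 900, 800 };

        /* Prints 0: U = 0.958333 exceeds the bound of 0.77976 */
        printf("schedulable: %d\n", rms_schedulable(C, T, 3));
        return 0;
}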
SCHED_DEADLINE
● Utilizes Earliest Deadline First (EDF)
● Dynamic priority
● The task with the next deadline has the highest priority

U = \sum_{i=1}^{n} \frac{C_i}{T_i} = 1
Earliest Deadline First (EDF)
[Timeline figures, 0–13, stepping through the EDF schedule of the same three tasks; every deadline is met]
:) HAPPY :)
Setting a RT priority

sched_setscheduler(pid_t pid, int policy, struct sched_param *param)

struct sched_param {
        u32 sched_priority;
};
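As a quick illustration (the values are mine, not the deck's), making the calling task SCHED_FIFO at priority 50 looks like this; it needs root or CAP_SYS_NICE:

#include <sched.h>
#include <stdio.h>

int main(void)
{
        struct sched_param param = { .sched_priority = 50 };

        /* pid 0 means "the calling task" */
        if (sched_setscheduler(0, SCHED_FIFO, &param) < 0) {
                perror("sched_setscheduler");
                return 1;
        }
        return 0;
}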
Implementing SCHED_DEADLINE in Linux

Two new syscalls:

sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags)
        (similar to sched_getparam(pid_t pid, struct sched_param *param))

sched_setattr(pid_t pid, struct sched_attr *attr, unsigned int flags)
        (similar to sched_setparam(pid_t pid, struct sched_param *param))
Implementing SCHED_DEADLINE

struct sched_attr {
        u32 size;            /* Size of this structure */
        u32 sched_policy;    /* Policy (SCHED_*) */
        u64 sched_flags;     /* Flags */
        s32 sched_nice;      /* Nice value (SCHED_OTHER, SCHED_BATCH) */
        u32 sched_priority;  /* Static priority (SCHED_FIFO, SCHED_RR) */

        /* Remaining fields are for SCHED_DEADLINE */
        u64 sched_runtime;
        u64 sched_deadline;
        u64 sched_period;
};
Implementing SCHED_DEADLINE

struct sched_attr attr;

ret = sched_getattr(0, &attr, sizeof(attr), 0);
if (ret < 0)
        error();

attr.sched_policy = SCHED_DEADLINE;
attr.sched_runtime = runtime_ns;
attr.sched_deadline = deadline_ns;
/* sched_period left at 0: the kernel then uses sched_deadline as the period */

ret = sched_setattr(0, &attr, 0);
if (ret < 0)
        error();
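glibc has no wrappers for these syscalls, so a self-contained program typically declares the struct itself and calls syscall(2) directly. A runnable sketch (the 10 ms / 100 ms parameters are illustrative):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;   /* all three in nanoseconds */
        uint64_t sched_deadline;
        uint64_t sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags)
{
        return syscall(SYS_sched_setattr, pid, attr, flags);
}

int main(void)
{
        struct sched_attr attr = {
                .size           = sizeof(attr),
                .sched_policy   = SCHED_DEADLINE,
                .sched_runtime  =  10 * 1000 * 1000,  /*  10 ms */
                .sched_deadline = 100 * 1000 * 1000,  /* 100 ms */
                .sched_period   = 100 * 1000 * 1000,  /* 100 ms */
        };

        if (sched_setattr(0, &attr, 0) < 0) {
                perror("sched_setattr");
                exit(1);
        }
        /* ... the periodic work loop goes here ... */
        return 0;
}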
sched_yield()
● Most use cases are buggy
  – Most tasks will not give up the CPU
● SCHED_OTHER
  – Gives up the current CPU time slice
● SCHED_FIFO / SCHED_RR
  – Gives up the CPU to a task of the SAME PRIORITY
  – Voluntary scheduling among same-priority tasks
sched_yield()
● Buggy code!

again:
        pthread_mutex_lock(&mutex_A);
        B = A->B;
        if (pthread_mutex_trylock(&B->mutex_B)) {
                pthread_mutex_unlock(&mutex_A);
                sched_yield();
                goto again;
        }
sched_yield()
● What you want for SCHED_DEADLINE!
● Tells the kernel the task is done with the current period
● Used to relinquish the rest of the runtime budget
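A periodic SCHED_DEADLINE job therefore tends to look like this sketch (do_work() is a placeholder for one instance of the job):

#include <sched.h>

extern void do_work(void);  /* placeholder: one instance of the periodic job */

static void deadline_loop(void)
{
        for (;;) {
                do_work();      /* must fit within sched_runtime */
                sched_yield();  /* budget relinquished; the task sleeps
                                   until its next period begins */
        }
}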
Constant Bandwidth Server

When a task wakes up, it is handed a new scheduling deadline and a full runtime replenishment:

        scheduling deadline = current time + deadline
        remaining runtime = runtime

but only if keeping the old values would exceed the reserved bandwidth:

\frac{\text{remaining runtime}}{\text{scheduling deadline} - \text{current time}} > \frac{\text{runtime}}{\text{period}}
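In code form, the wakeup test reads roughly like this sketch (the struct and field names are illustrative, not the kernel's; the comparison is cross-multiplied to avoid division):

#include <stdint.h>

struct dl_task {
        int64_t runtime, deadline, period;  /* the task's parameters (ns) */
        int64_t remaining_runtime;          /* budget left over           */
        int64_t scheduling_deadline;        /* absolute time (ns)         */
};

static void cbs_wakeup(struct dl_task *t, int64_t now)
{
        /* A deadline already in the past trivially fails the bandwidth
         * test, so replenish in that case as well. */
        if (t->scheduling_deadline <= now ||
            t->remaining_runtime * t->period >
            (t->scheduling_deadline - now) * t->runtime) {
                t->scheduling_deadline = now + t->deadline;
                t->remaining_runtime   = t->runtime;
        }
}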
Self sleeping tasks
Courtesy of Daniel Bristot de Oliveira
[Timeline figures, 0–18, stepping through a self-sleeping task with runtime 3 and period 9 (U = 3/9):]
● Remainder = 2/3 > 3/9
● Deadline = current time + new deadline
● Remaining Runtime = Runtime (3 units)
● Another Deadline task?
● Only ran 2 units in the original 9 (original deadline marked in the figure)
Donut Hole Puncher!
Deadline vs Period
● Can't have offset holes in our donuts
● Have a specific deadline to make within a period

        runtime <= deadline <= period

● But is this too constrained?

U = \sum_{i=1}^{n} \frac{C_i}{D_i} = 1
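For the constrained task used in the next example (runtime 2, deadline 4, period 10), this admission test charges the density C/D rather than the utilization C/T:

C/D = 2/4 = 0.5 \quad \text{vs} \quad C/T = 2/10 = 0.2

so the task is billed for far more CPU than it actually consumes.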
Self sleeping constrained tasks
Courtesy of Daniel Bristot de Oliveira
[Timeline figures, 0–18, stepping through a self-sleeping task with runtime 2, deadline 4, period 10 (U = 2/4/10):]
● 1/1 > 2/4
● Move deadline from 4 to 7 (period from 10 to 13)
● Runs for 1 and sleeps again
● Wakes up again with 1 to go (moves deadline to 10, period to 16)
● 4 out of 10! Instead of 2 out of 4 in 10
Multi processors!
It's all fun and games until someone throws another processor into your eye
● M CPUs
● M+1 tasks
● One task with runtime 999ms out of 1000ms
● M tasks with runtime 10ms out of 999ms
● All start at the same time
● The M tasks have a shorter deadline
● The M tasks run, one per CPU, for 10ms
● That one task now has only 990ms left in which to run for 999ms
Multi processors! (Dhall's Effect)

U = \frac{999}{1000} + M\left(\frac{10}{999}\right) = 0.999 + 0.01001 M < M

M = 2: \quad \frac{999}{1000} + 2\left(\frac{10}{999}\right) = 0.999 + 2 \times 0.01001 = 1.01902 < 2
2 tasks: 2/9; 1 task: 9/10; U = 2/9 + 2/9 + 9/10 = 1.34444 < 2
[Timeline figures, 0–18, two CPUs: the two 2/9 tasks have the earlier deadlines and take both CPUs first, leaving the 9/10 task only 8 units before its deadline of 10 when it needs 9]
Multi processors!
● EDF can not give you better than U = 1
  – No matter how many processors you have
  – Full utilization would be U = N CPUs
● Two methods
  – Partitioned (bind each task to a CPU)
  – Global (let all tasks migrate wherever)
  – Neither gives better than U = 1 guarantees
Multi processors!
● Partitioned EDF can not always be used:
  – U_t1 = .6
  – U_t2 = .6
  – U_t3 = .5
  – The total (1.7) fits on two CPUs, but no two of these tasks fit together on one
  – The above would need special scheduling to work anyway
● Figuring out the best utilization is the bin packing problem
  – Sorry folks, it's NP-complete
  – Don't even bother trying
Multi processors!
● Global Earliest Deadline First (gEDF)
● Can not guarantee deadlines for U > 1 in all cases
● But special cases can be satisfied for U > 1

With D_i = P_i and U_max = max{C_i / P_i}:

U = \sum_{i=1}^{n} \frac{C_i}{P_i} \le M - (M - 1) U_{max}
Multi processors!
● M = 8
● U_max = 0.5

U = \sum_{i=1}^{n} \frac{C_i}{P_i} \le M - (M - 1) U_{max} = 8 - 7 \times 0.5 = 4.5
Multi processors!
● M = 2
● U_max = 999/1000

U = \sum_{i=1}^{n} \frac{C_i}{P_i} \le M - (M - 1) U_{max} = 2 - 1 \times 0.999 = 1.001
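A tiny helper for this admission bound (an assumed name, not a kernel API). For the Dhall-style set above with U_max = 9/10 on M = 2 CPUs, the bound is 2 − 1 × 0.9 = 1.1, so U = 1.34444 is rejected:

/* gEDF admission check: admit when U_total <= M - (M - 1) * U_max */
static int gedf_admissible(double U_total, double U_max, int M)
{
        return U_total <= M - (M - 1) * U_max;
}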
The limits of SCHED_DEADLINE
● Runs on all CPUs (well, sorta)
  – No limited sched affinity allowed
  – Global EDF is the default
  – Must account for sched migration overheads
● Can not have children (no forking)
  – Your SCHED_DEADLINE tasks have been fixed
● Calculating Worst Case Execution Time (WCET)
  – If you get it wrong, SCHED_DEADLINE may throttle your task before it finishes
Giving SCHED_DEADLINE Affinity
Setting task affinity on SCHED_DEADLINE is not allowed
But you can limit tasks by creating new sched domains with CPU sets
Implementing Partitioned EDF
Giving SCHED_DEADLINE Affinity

cd /sys/fs/cgroup/cpuset
mkdir my_set
mkdir other_set
echo 0-2 > other_set/cpuset.cpus
echo 0 > other_set/cpuset.mems
echo 1 > other_set/cpuset.sched_load_balance
echo 1 > other_set/cpuset.cpu_exclusive
echo 3 > my_set/cpuset.cpus
echo 0 > my_set/cpuset.mems
echo 1 > my_set/cpuset.sched_load_balance
echo 1 > my_set/cpuset.cpu_exclusive
echo 0 > cpuset.sched_load_balance

That’s a lot!
Giving SCHED_DEADLINE Affinity

cat tasks | while read task; do
        echo $task > other_set/tasks
done
echo $sched_deadline_task > my_set/tasks
Calculating WCET
● Today's hardware is extremely unpredictable
● Worst Case Execution Time is impossible to know
● Over-allocate bandwidth instead
● Need something between RMS and CBS
GRUB (not the boot loader)
● Greedy Reclaim of Unused Bandwidth
● Allows SCHED_DEADLINE tasks to use up the unused utilization of the CPU (or part of it)
● Allows tasks to handle a WCET a bit larger than calculated
● Just went into mainline (v4.13)
New Work
● Revisiting affinity
● Semi-partitioning (dynamic deadlines to boot)

2 tasks: 2/9; 1 task: 9/10; U = 2/9 + 2/9 + 9/10 = 1.34444 < 2
[Timeline figures, 0–18, two CPUs: with semi-partitioning, the 9/10 task is split into a 2/9/10 piece and a 7/8/10 piece across the two CPUs, and all deadlines are met]
Links
Documentation/scheduler/sched_deadline.txt
http://disi.unitn.it/~abeni/reclaiming/rtlws14-grub.pdf
http://www.evidence.eu.com/sched_deadline.html
http://www.atc.uniovi.es/rsa/starts/documents/Lopez_2004_rts.pdf
https://cs.unc.edu/~anderson/papers/rtj06a.pdf
Thank You
Steven Rostedt
  • 81. 2 tasks: 2/9; 1 task 9/10; U = 2/9 + 2/9 + 9/10 = 1.34444 < 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 2/9 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 2/9 2/9/10 7/8/10