OS Scheduling and the Anatomy of a Context Switch
Daniel Ben-Zvi
Multi-tasking
Scheduling in the modern world
● Pause a running process in the middle of its execution
● Resume a previously paused process that is ready to execute
● Do this thousands (or tens of thousands) of times per second
But it wasn’t always like this
No Scheduler
Only one process can operate at a time.
A single process can hog the whole system forever.
Digger sleep function, 1982
sleep(time)
int time;
{
    int a, b;
    for (a = 0; a < time; a++) {
        for (b = 0; b < 100; b++);
    }
}
The delay is a counted busy loop, so how long it really takes depends on the CPU clock speed.
So this is why pressing Turbo would speed it up! :)
(source: http://www.digger.org/digsrc_orig.zip)
And then they said... let's cooperate!
Cooperative scheduler
Each process must “play nice” and yield control to other processes.
… but a single misbehaving process can still hog the whole system forever.
(the sad reality of a kibbutz)
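A minimal sketch of the cooperative idea, not taken from any real operating system: each task is a plain C function that does one small step and then returns, and returning plays the role of yielding. The names `task_fn`, `struct task`, `run_cooperative`, and `count_step` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* A task does one small unit of work and returns; returning is the "yield". */
typedef void (*task_fn)(void *state);

struct task {
    task_fn step;   /* hypothetical: performs one cooperative step */
    void   *state;  /* private data handed back to the task on every step */
};

/* Call every task in turn, `rounds` times. Nothing here can force a task
 * to return: a task that loops forever inside step() stalls all the others. */
void run_cooperative(struct task *tasks, size_t ntasks, int rounds)
{
    for (int r = 0; r < rounds; r++)
        for (size_t i = 0; i < ntasks; i++)
            tasks[i].step(tasks[i].state);
}

/* Example task: count how many times it was given the CPU. */
void count_step(void *state)
{
    int *counter = state;
    (*counter)++;
}
```

The scheduler loop has no way to take the CPU back, which is exactly the weakness the slide points out.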
Sleep method designed for MacOS 8 (in C++)
void wxThread::Sleep(unsigned long milliseconds)
{
    UnsignedWide start, now;
    Microseconds(&start);
    double mssleep = milliseconds * 1000;   // despite the "ms" names, these values are in microseconds
    double msstart, msnow;
    msstart = (start.hi * 4294967296.0 + start.lo);
    do  // Busy loop until the time has passed
    {
        YieldToAnyThread();  // Pass control to other threads (in the active process context)
        Microseconds(&now);
        msnow = (now.hi * 4294967296.0 + now.lo);
    } while (msnow - msstart < mssleep);
}
(source: https://github.com/LuaDist/wxwidgets/blob/master/src/mac/classic/thread.cpp#L539)
Non preemptive
A process will eventually misbehave...
Round Robin (Time sharing)
(source: http://www.qnx.com/developers/docs/qnx_4.25_docs/qnx4/sysarch/microkernel.html)
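Round robin can be sketched as a small simulation. This is an illustration under simple assumptions (one CPU, all tasks ready at time zero, no I/O), not how any particular kernel implements it; `round_robin` and its parameters are hypothetical names.

```c
#include <stddef.h>

/* Simulate round-robin time sharing on one CPU.
 * remaining[i] is the CPU time task i still needs (modified in place);
 * finish[i] receives the simulated time at which task i completes.
 * Returns how many times the CPU was handed to a different task,
 * or -1 if the quantum is not positive. */
int round_robin(int *remaining, int *finish, size_t n, int quantum)
{
    int now = 0, switches = 0;
    size_t left = 0, last = n;          /* last == n means "nothing ran yet" */

    if (quantum <= 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        finish[i] = 0;
        if (remaining[i] > 0)
            left++;
    }

    while (left > 0) {
        for (size_t i = 0; i < n; i++) {
            if (remaining[i] <= 0)
                continue;               /* finished tasks are skipped */
            if (last != n && last != i)
                switches++;             /* CPU handed to a different task */
            last = i;

            /* Run for one quantum, or less if the task finishes sooner. */
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            now += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                finish[i] = now;
                left--;
            }
        }
    }
    return switches;
}
```

With tasks needing 5, 3, and 1 time units and a quantum of 2, the tasks run in the order 0, 1, 2, 0, 1, 0: a smaller quantum gives better responsiveness but more switches.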
But tasks differ in nature!
● Some tasks need CPU time to complete
● Others mostly wait on I/O (user input, network data, disk)
● Some just sleep most of the time
Multilevel feedback queue
And many more..
So, how do you interrupt a running
process in the middle of its execution,
and then resume the work later on?
Use a timer interrupt!
● Ask the processor to interrupt the active process and wake up the kernel at a fixed interval (in the good old Linux days, the tick rate was defined by CONFIG_HZ):
void do_timer(struct pt_regs *regs) // Linux kernel (some version) interrupt handler
{
    jiffies_64++;
    update_process_times(user_mode(regs));
    update_times();
}
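To make the role of the tick concrete, here is a toy model of what a tick handler does for scheduling. It is a sketch, not kernel code: `TICK_HZ`, `TIMESLICE_TICKS`, `struct sim_task`, `sim_jiffies`, and `sim_timer_tick` are all assumed names and values chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICK_HZ         250   /* assumed tick rate: 250 ticks/s, i.e. 4 ms per tick */
#define TIMESLICE_TICKS 5     /* assumed time slice: 5 ticks, i.e. 20 ms at 250 Hz */

struct sim_task {
    int  ticks_left;    /* ticks remaining in the current time slice */
    bool need_resched;  /* set when the scheduler should pick another task */
};

uint64_t sim_jiffies;   /* ticks since start, in the spirit of jiffies_64 */

/* What a timer interrupt handler does, in miniature: count the tick,
 * charge it to the running task, and request a reschedule once the
 * task's time slice is used up. */
void sim_timer_tick(struct sim_task *current)
{
    sim_jiffies++;
    if (current->ticks_left > 0 && --current->ticks_left == 0)
        current->need_resched = true;
}
```

The handler itself does not switch tasks; it only marks that a switch is due, and the actual switch happens on the way out of the interrupt.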
The “context” in this program
        bits 16                 ; 16-bit real mode
start:
        cli                     ; disable interrupts
        mov si, msg             ; SI points to RAM address of msg
        mov ah, 0x0e            ; print char service, with int 0x10
.loop:  lodsb                   ; load RAM: AL <- [DS:SI] && SI++
        or al, al               ; end of string?
        jz halt
        int 0x10                ; call BIOS print char
        jmp .loop               ; next char
halt:   hlt                     ; halt
msg:    db "Hello, World!", 0
Here the context is essentially the CPU state the program depends on: SI, AX, the flags, and the instruction pointer.
# void swtch(struct context **old, struct context *new);
#
# Save current register context in old
# and then load register context from new.
.globl swtch
swtch:
    movl 4(%esp), %eax      # eax = old (where to store the old stack pointer)
    movl 8(%esp), %edx      # edx = new (stack pointer of the next context)

    # Save old callee-save registers
    pushl %ebp              # Save old process ebp (base pointer) register
    pushl %ebx              # Save old process ebx (general) register
    pushl %esi              # Save old process esi (source index) register
    pushl %edi              # Save old process edi (destination index) register

    # Switch stacks
    movl %esp, (%eax)       # *old = current stack pointer
    movl %edx, %esp         # continue on the new stack

    # Load new callee-save registers
    popl %edi               # Load new process edi
    popl %esi               # Load new process esi
    popl %ebx               # Load new process ebx
    popl %ebp               # Load new process ebp
    ret                     # return to the address saved on the new stack
(source: http://samwho.co.uk/blog/2013/06/01/context-switching-on-x86/)
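The same save/switch/restore idea can be tried from user space with the POSIX `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`), which glibc provides on Linux. This is an analogy to `swtch`, not the kernel mechanism; `run_switch_demo`, `worker`, and `note` are hypothetical names, and the 64 KiB stack size is an arbitrary assumption.

```c
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];  /* assumed to be ample for this demo */
static int order;                     /* records the sequence of steps as digits */

static void note(int step) { order = order * 10 + step; }

static void worker(void)
{
    note(2);                              /* first run on the worker's own stack */
    swapcontext(&worker_ctx, &main_ctx);  /* save worker state, resume main */
    note(4);                              /* resumed exactly where it paused */
}   /* returning continues at uc_link, i.e. main_ctx */

/* Returns 1234 if the steps interleave as expected, -1 on error. */
int run_switch_demo(void)
{
    order = 0;
    if (getcontext(&worker_ctx) == -1)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    note(1);
    if (swapcontext(&main_ctx, &worker_ctx) == -1)  /* save main, run worker */
        return -1;
    note(3);
    if (swapcontext(&main_ctx, &worker_ctx) == -1)  /* resume the paused worker */
        return -1;
    return order;
}
```

Each `swapcontext` call stores the current registers and stack pointer in its first argument and loads them from the second, so the worker continues from the middle of its function, just as a paused process does.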
Finding out the context switch rate
[us-east-1c] i-91xxxxxx [root:~]$ vmstat -w 10
procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b     swpd     free     buff    cache   si   so    bi    bo    in    cs us sy id wa st
 0  0        0  3275668    48308   370264    0    0     0     0 12504  8884  1  5 94  0  0
 0  0        0  3275492    48308   370264    0    0     0     0 12933  8834  2  5 93  0  0
 1  0        0  3275588    48308   370264    0    0     0     0 12490  8825  2  4 94  0  0
(cs = context switches per second, in = interrupts per second)
Using /usr/bin/time
[us-east-1c] i-xxxxxxxx [root:~]$ /usr/bin/time -v find / > /dev/null
    Command being timed: "find /"
    User time (seconds): 0.12
    System time (seconds): 0.30
    Percent of CPU this job got: 6%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.79
    ...
    Voluntary context switches: 7362
    Involuntary context switches: 60
    ...
    File system inputs: 58888
Voluntary vs Involuntary
● Involuntary context switches occur when the scheduler interrupts the process (e.g. its timeslice expired)
● Voluntary context switches occur when the process gives up the CPU itself (e.g. waiting for I/O, sleeping)
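A process can also read its own counters. The sketch below uses the standard `getrusage` call, whose `ru_nvcsw` and `ru_nivcsw` fields hold the voluntary and involuntary counts on Linux; `struct cs_counts`, `read_cs_counts`, and `sleep_ms` are hypothetical helper names.

```c
#include <sys/resource.h>
#include <time.h>

struct cs_counts {
    long voluntary;    /* process gave up the CPU (blocked, slept) */
    long involuntary;  /* scheduler took the CPU away */
};

/* Read this process's context switch counters; returns 0 on success. */
int read_cs_counts(struct cs_counts *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->voluntary = ru.ru_nvcsw;
    out->involuntary = ru.ru_nivcsw;
    return 0;
}

/* Block for roughly `ms` milliseconds. Sleeping gives up the CPU,
 * which normally shows up as a voluntary switch. */
void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}
```

Reading the counters before and after a call to `sleep_ms` typically shows the voluntary count rising, while a tight CPU loop on a busy machine tends to raise the involuntary count instead.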
Using pidstat
[us-east-1c] i-xxxxxxxx [root:~]$ pidstat -w -p 1504
Linux 3.13.0-55-generic (ip-10-xx-xx-x8) 06/21/2015 _x86_64_ (2 CPU)
09:24:23 PM   UID   PID   cswch/s  nvcswch/s  Command
09:24:23 PM     0  1504    531.40       3.45  haproxy
(cswch/s = voluntary switches per second, nvcswch/s = involuntary switches per second)
Some benchmarks
● No CPU affinity
○ Intel 5150: ~4300ns/context switch
○ Intel E5440: ~3600ns/context switch
○ Intel E5520: ~4500ns/context switch
○ Intel X5550: ~3000ns/context switch
○ Intel L5630: ~3000ns/context switch
○ Intel E5-2620: ~3000ns/context switch
● With CPU affinity
○ Intel 5150: ~1900ns/process context switch, ~1700ns/thread context switch
○ Intel E5440: ~1300ns/process context switch, ~1100ns/thread context switch
○ Intel E5520: ~1400ns/process context switch, ~1300ns/thread context switch
○ Intel X5550: ~1300ns/process context switch, ~1100ns/thread context switch
○ Intel L5630: ~1600ns/process context switch, ~1400ns/thread context switch
○ Intel E5-2620: ~1600ns/process context switch, ~1300ns/thread context switch
Performance boost: 5150: 66%, E5440: 65-70%, E5520: 50-54%, X5550: 55%, L5630: 45%, E5-2620: 45%.
(source: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)
The hidden cost
● TLB flush (when the switch changes address space, i.e. between processes)
● Potentially pollutes the L1, L2, and L3 CPU caches
● The kernel tries very hard to schedule threads on the same core/CPU/NUMA node, but sometimes it can't.
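When the kernel's placement is not good enough, a thread can be pinned explicitly. A minimal Linux-only sketch using glibc's `sched_getaffinity` and `sched_setaffinity`; the helper name `pin_to_first_allowed_cpu` is hypothetical, and picking the first allowed CPU is an arbitrary choice made for the example.

```c
#define _GNU_SOURCE   /* glibc exposes the CPU_* macros and sched_setaffinity under this */
#include <sched.h>

/* Pin the calling thread to the first CPU it is currently allowed to run on.
 * Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        /* pid 0 means "the calling thread" */
        return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
    }
    return -1;
}
```

For an existing program the same effect is available without code changes through the `taskset` command. Pinning trades scheduling flexibility for cache warmth, so it is worth measuring before and after.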
[Figures: context switch cost without CPU affinity, with CPU affinity, and the two side by side]
(source: http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)
Full context switch
● Save the current process context
● TLB flush
● Potentially pollute the L1, L2, and L3 CPU caches
● Load the next process context
● Can be done in hardware, but is usually done in software.
So how much is too much?
10k context switches per second @ 5 microseconds each = 50 ms of CPU time
per second (5% of one core) spent on context switching.
If this were 100k...
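The arithmetic above generalizes to a one-line function. This sketch assumes the switch count is a per-second rate on a single core and counts only the direct cost of each switch (cache and TLB effects come on top); `switch_overhead` is a hypothetical name.

```c
/* Fraction of one CPU-second spent purely on context switching.
 * switches_per_sec: context switch rate, e.g. the "cs" column of vmstat
 * cost_ns:          assumed direct cost of one switch, in nanoseconds */
double switch_overhead(double switches_per_sec, double cost_ns)
{
    return switches_per_sec * cost_ns / 1e9;
}
```

At 10,000 switches per second and 5,000 ns each the result is 0.05, i.e. 5% of a core; at 100,000 switches per second it is 0.5, half a core doing nothing but switching.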
Some formulas
● I/O-bound tasks: threads = number of cores * (1 + wait time / service time)
● Balanced: N = U * Ncores * (1 + I/O wait time / CPU time), where U is the target CPU utilization (0 to 1); with U = 1 this is the I/O-bound formula above
● CPU-bound tasks: threads = number of CPUs + 1 (the formula above with wait time W = 0, plus one spare thread)
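The sizing formula can be written as a small helper. This is a sketch of the balanced formula only; `pool_size` and its parameter names are hypothetical, and rounding to the nearest whole thread (with a minimum of one) is an assumption the slides do not specify.

```c
/* N = U * Ncores * (1 + W / C)
 * cores:        number of CPU cores available
 * utilization:  target CPU utilization U, between 0 and 1
 * wait_time:    time a task spends waiting on I/O (W)
 * compute_time: time a task spends on the CPU (C), same unit as W, must be > 0 */
int pool_size(int cores, double utilization, double wait_time, double compute_time)
{
    double n = cores * utilization * (1.0 + wait_time / compute_time);
    int rounded = (int)(n + 0.5);       /* round to the nearest whole thread */

    return rounded < 1 ? 1 : rounded;   /* always keep at least one thread */
}
```

For example, 8 cores at full utilization with tasks that wait 90 ms for every 10 ms of CPU work give 8 * (1 + 9) = 80 threads, while the same cores running pure CPU work (W = 0) give 8.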
Conclusions
● Scheduling is complicated
● Context switches can be very expensive
● Find the right balance when deciding how many workers to use
● Use CPU affinity to reduce cost where applicable
● Most important lesson: don't overload the scheduler with runnable threads/processes
Thank you
Daniel Ben-Zvi | daniel.benzvi@gmail.com
Further reading
● The Timer Interrupt Handler
● Context Switching on X86
● CPU Scheduling
● How long does it take to make a context switch?
● Calculate the Optimum Number of Threads
