Kernel Synchronization in Linux
(Chap. 5 in Understanding the
Linux Kernel)
J. H. Wang
Sep. 29, 2011
Outline
• Kernel Control Paths
• When Synchronization is not Necessary
• Synchronization Primitives
• Synchronizing Accesses to Kernel Data
Structures
• Examples of Race Condition Prevention
Kernel Control Paths
• Linux kernel: like a server that answers
requests
– Parts of the kernel run in an interleaved way

• A kernel control path: a sequence of
instructions executed in kernel mode on
behalf of the current process
– Triggered by interrupts or exceptions
(including system calls)
– Lighter than a process (less context)
Example Kernel Control Paths
• Three CPU states are considered
– Running a process in User Mode (User)
– Running an exception or a system call handler
(Excp)
– Running an interrupt handler (Intr)
Kernel Preemption
• Preemptive kernel: a process running in kernel
mode can be replaced by another process while
in the middle of a kernel function
• The main motivation for making a kernel
preemptive is to reduce the dispatch latency of the
user mode processes
– Delay between the time they become runnable and
the time they actually begin running

• The kernel can be preempted only when it is
executing an exception handler (in particular a
system call) and the kernel preemption has not
been explicitly disabled
When Synchronization is
Necessary
• A race condition can occur when the outcome of a
computation depends on how two or more interleaved
kernel control paths are nested
• Critical regions must be identified and protected in exception
handlers, interrupt handlers, deferrable functions, and
kernel threads
– On a single CPU, a critical region can be implemented by disabling
interrupts while accessing shared data
– If the same data is shared only by the service routines of system
calls, a critical region can be implemented by disabling kernel
preemption while accessing shared data

• Things are more complicated on multiprocessor systems
– Different synchronization techniques are necessary
When Synchronization is not
Necessary
• The same interrupt cannot occur until the
handler terminates
• Interrupt handlers and softirqs are nonpreemptable, non-blocking
• A kernel control path performing interrupt
handling cannot be interrupted by a kernel
control path executing a deferrable function or a
system call service routine
• Softirqs and tasklets cannot be interleaved on a given CPU
Synchronization Primitives
Technique                  Description                                       Scope
Per-CPU variables          Duplicate a data structure among CPUs             All CPUs
Atomic operation           Atomic read-modify-write instruction              All CPUs
Memory barrier             Avoid instruction re-ordering                     Local CPU
Spin lock                  Lock with busy wait                               All CPUs
Semaphore                  Lock with blocking wait (sleep)                   All CPUs
Seqlocks                   Lock based on access counter                      All CPUs
Local interrupt disabling  Forbid interrupt on a single CPU                  Local CPU
Local softirq disabling    Forbid deferrable function on a single CPU        Local CPU
Read-copy-update (RCU)     Lock-free access to shared data through pointers  All CPUs
Per-CPU Variables
• The simplest and most efficient synchronization
technique consists of declaring kernel variables as
per-CPU variables
– an array of data structures, one element per CPU in the
system
– A CPU should not access the elements of the array
corresponding to the other CPUs

• While per-CPU variables provide protection against
concurrent accesses from several CPUs, they do not
provide protection against accesses from asynchronous
functions (interrupt handlers and deferrable functions)
• Per-CPU variables are prone to race conditions caused
by kernel preemption, both in uniprocessor and
multiprocessor systems
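
The per-CPU idea can be illustrated outside the kernel. The following is a minimal user-space sketch, not kernel code: NR_CPUS, hits, count_hit() and total_hits() are hypothetical names, and the caller is assumed to pass the index of the CPU it runs on (the kernel macros take care of that, together with disabling preemption).

```c
#include <stddef.h>

#define NR_CPUS 4               /* assumed CPU count for this sketch */

/* One slot per CPU; each CPU only ever touches its own slot. */
static long hits[NR_CPUS];

/* Called by code running on CPU `cpu`. No lock is needed because
 * no other CPU writes to this slot. */
void count_hit(int cpu)
{
    hits[cpu]++;
}

/* A reader that wants the global value sums all the slots. */
long total_hits(void)
{
    long sum = 0;
    for (size_t i = 0; i < NR_CPUS; i++)
        sum += hits[i];
    return sum;
}
```

Calling count_hit(0) twice and count_hit(3) once leaves total_hits() at 3. The sketch offers no protection against an interrupt handler or a preempting task that updates the same slot, which is the limitation stated in the bullets above.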
Functions and Macros for the Per-CPU Variables

Macro/function name         Description
DEFINE_PER_CPU(type, name)  Statically allocates a per-CPU array
per_cpu(name, cpu)          Selects the element for CPU of the per-CPU array
__get_cpu_var(name)         Selects the local CPU's element of the per-CPU array
get_cpu_var(name)           Disables kernel preemption, then selects the local CPU's element of the per-CPU array
put_cpu_var(name)           Enables kernel preemption
alloc_percpu(type)          Dynamically allocates a per-CPU array
free_percpu(pointer)        Releases a dynamically allocated per-CPU array
per_cpu_ptr(pointer, cpu)   Returns the address of the element for CPU of the per-CPU array
Atomic Operations
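• Atomic 80x86 instructions
– Instructions that make zero or one aligned
memory access
– Read-modify-write instructions (such as inc or dec),
provided no other processor takes the memory bus
between the read and the write (always true on a
uniprocessor system)
– Read-modify-write instructions whose opcode
is prefixed by the lock byte (0xf0), even on a
multiprocessor system
• Assembly instructions whose opcode is
prefixed by a rep byte (0xf2, 0xf3) are not
atomic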
• atomic_t type: an atomic counter (portable code could historically rely on only 24 bits)
• Atomic operations in Linux:
Function                  Description
atomic_read(v)            return *v
atomic_set(v,i)           set *v to i
atomic_add(i,v)           add i to *v
atomic_sub(i,v)           subtract i from *v
atomic_sub_and_test(i,v)  subtract i from *v and return 1 if result is 0
atomic_inc(v)             add 1 to *v
atomic_dec(v)             subtract 1 from *v
atomic_dec_and_test(v)    subtract 1 from *v and return 1 if result is 0
atomic_inc_and_test(v)    add 1 to *v and return 1 if result is 0
atomic_add_negative(i,v)  add i to *v and return 1 if result is negative
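
The semantics of a few of these operations can be sketched in user space with C11 <stdatomic.h>. This is an illustration under assumptions, not the kernel implementation: my_atomic_t and the my_atomic_* functions are hypothetical stand-ins for atomic_t and its helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } my_atomic_t;

void my_atomic_set(my_atomic_t *v, int i) { atomic_store(&v->counter, i); }
int  my_atomic_read(my_atomic_t *v)       { return atomic_load(&v->counter); }
void my_atomic_inc(my_atomic_t *v)        { atomic_fetch_add(&v->counter, 1); }

/* Subtract 1 and report whether the result is 0.
 * atomic_fetch_sub returns the value held before the subtraction. */
bool my_atomic_dec_and_test(my_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, 1) == 1;
}

/* Add i and report whether the result is negative. */
bool my_atomic_add_negative(int i, my_atomic_t *v)
{
    return atomic_fetch_add(&v->counter, i) + i < 0;
}
```

A typical use is a reference counter: every user calls my_atomic_inc() when taking a reference, and the one whose my_atomic_dec_and_test() returns true releases the resource.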
Atomic Bit Handling Functions
Function                       Description
test_bit(nr, addr)             return the nr-th bit of *addr
set_bit(nr, addr)              set the nr-th bit of *addr
clear_bit(nr, addr)            clear the nr-th bit of *addr
change_bit(nr, addr)           invert the nr-th bit of *addr
test_and_set_bit(nr, addr)     set the nr-th bit of *addr and return old value
test_and_clear_bit(nr, addr)   clear the nr-th bit of *addr and return old value
test_and_change_bit(nr, addr)  invert the nr-th bit of *addr and return old value
atomic_clear_mask(mask, addr)  clear all bits of addr specified by mask
atomic_set_mask(mask, addr)    set all bits of addr specified by mask
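
As a rough user-space analogue, the test-and-modify variants map onto C11 fetch-or and fetch-and. The my_* names and the bitmap layout (an array of atomic_ulong words) are assumptions made for this sketch only.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Locate the word that holds bit nr, and the mask for that bit. */
static atomic_ulong *word_of(unsigned nr, atomic_ulong *addr)
{
    return addr + nr / BITS_PER_WORD;
}

static unsigned long mask_of(unsigned nr)
{
    return 1UL << (nr % BITS_PER_WORD);
}

/* Return the current value of bit nr. */
bool my_test_bit(unsigned nr, atomic_ulong *addr)
{
    return (atomic_load(word_of(nr, addr)) & mask_of(nr)) != 0;
}

/* Atomically set bit nr and return its old value. */
bool my_test_and_set_bit(unsigned nr, atomic_ulong *addr)
{
    return (atomic_fetch_or(word_of(nr, addr), mask_of(nr)) & mask_of(nr)) != 0;
}

/* Atomically clear bit nr and return its old value. */
bool my_test_and_clear_bit(unsigned nr, atomic_ulong *addr)
{
    return (atomic_fetch_and(word_of(nr, addr), ~mask_of(nr)) & mask_of(nr)) != 0;
}
```

Because the old value comes back from the same atomic instruction that changes the bit, two callers racing on my_test_and_set_bit() can never both see the bit as previously clear.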
Memory Barriers
• When dealing with synchronization, instruction
reordering must be avoided
• A memory barrier primitive ensures that the
operations before the primitive are finished
before starting the operations after the primitive
• On 80x86, the following instructions act as memory
barriers:
– All instructions that operate on I/O ports
– All instructions prefixed by lock byte
– All instructions that write into control registers,
system registers, or debug registers
– A few special instructions, e.g. iret
– lfence, sfence, and mfence instructions for Pentium 4
Memory Barriers in Linux
Macro      Description
mb()       Memory barrier for MP and UP
rmb()      Read memory barrier for MP, UP
wmb()      Write memory barrier for MP, UP
smp_mb()   Memory barrier for MP only
smp_rmb()  Read memory barrier for MP only
smp_wmb()  Write memory barrier for MP only
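
Why ordering matters can be seen in the classic "publish data, then set a flag" pattern. The sketch below uses C11 fences in user space; the release and acquire fences are only loosely comparable to smp_wmb() and smp_rmb(), and payload, ready, publish() and consume() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;         /* ordinary data */
static atomic_bool ready;   /* flag that announces the data */

/* Writer: the payload must become visible before the flag.
 * The release fence keeps the payload store ahead of the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: check the flag first, then read the payload.
 * The acquire fence keeps the payload load behind the flag load. */
bool consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the two fences, the compiler or the CPU would be free to reorder the accesses, and a reader on another CPU could see the flag set while still reading a stale payload.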
Spin Locks
• Spin locks are a special kind of lock
designed to work in a multiprocessor
environment
– Busy waiting
– Very convenient
– Represented by spinlock_t structure
• slock: 1 – unlocked, <=0 - locked
• break_lock: flag
Protecting Critical Regions with
Several Locks
Spin Lock Macros
Macro               Description
spin_lock_init()    set the spin lock to 1 (unlocked)
spin_lock()         cycle until spin lock becomes 1, then set to 0
spin_unlock()       set the spin lock to 1
spin_unlock_wait()  wait until the spin lock becomes 1
spin_is_locked()    return 0 if the spin lock is set to 1
spin_trylock()      set the spin lock to 0 (locked), and return 1 if the lock is obtained
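
The busy-wait behaviour can be sketched in user space with a C11 atomic_flag. This is not the kernel's spinlock_t: the flag encoding differs from the slock values described above, and my_spinlock_t and the my_spin_* functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock: flag set means locked, flag clear means free. */
typedef struct { atomic_flag locked; } my_spinlock_t;

#define MY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once. test_and_set returns the previous state, so a previous
 * state of "clear" means the caller now owns the lock. */
bool my_spin_trylock(my_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy wait until the lock is obtained. */
void my_spin_lock(my_spinlock_t *l)
{
    while (!my_spin_trylock(l))
        ;   /* spin */
}

/* Release the lock so that one spinning CPU can take it. */
void my_spin_unlock(my_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The holder never sleeps while others spin, so the critical region between my_spin_lock() and my_spin_unlock() should be kept short.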
Read/Write Spin Locks
• To increase the amount of concurrency in the kernel
– Multiple reads, one write

• rwlock_t structure
– lock field: 32-bit
• 24-bit counter: (bit 0-23) # of kernel control paths currently reading
the protected data (in two’s complement)
• An unlock flag: (bit 24)

• Macros
– read_lock()
– read_unlock()
– write_lock()
– write_unlock()
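
One way to picture the lock field is as a counter that starts at 0x01000000 (bit 24 set, no readers): each reader subtracts 1, and a writer subtracts the whole 0x01000000. The sketch below models only the try-lock paths in user space; RW_LOCK_BIAS, my_rwlock_t and the my_* functions are assumptions for illustration and not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RW_LOCK_BIAS 0x01000000   /* bit 24 set: lock is idle */

/* Hypothetical lock; must be initialized to RW_LOCK_BIAS. */
typedef struct { atomic_int lock; } my_rwlock_t;

/* A reader takes one unit. The result is negative only when a
 * writer holds (or is trying for) the lock, so the unit is returned. */
bool my_read_trylock(my_rwlock_t *rw)
{
    if (atomic_fetch_sub(&rw->lock, 1) - 1 >= 0)
        return true;
    atomic_fetch_add(&rw->lock, 1);
    return false;
}

void my_read_unlock(my_rwlock_t *rw)
{
    atomic_fetch_add(&rw->lock, 1);
}

/* A writer takes the whole bias. It succeeds only when the old value
 * was exactly RW_LOCK_BIAS, that is, no readers and no writer. */
bool my_write_trylock(my_rwlock_t *rw)
{
    if (atomic_fetch_sub(&rw->lock, RW_LOCK_BIAS) == RW_LOCK_BIAS)
        return true;
    atomic_fetch_add(&rw->lock, RW_LOCK_BIAS);
    return false;
}

void my_write_unlock(my_rwlock_t *rw)
{
    atomic_fetch_add(&rw->lock, RW_LOCK_BIAS);
}
```

With two readers inside, the field holds 0x00fffffe and a writer's attempt fails; once both readers leave, the field is back to 0x01000000 and the writer can bring it to 0.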
Seqlock
• Seqlocks introduced in Linux 2.6 are
similar to read/write spin locks
– except that they give a much higher priority
to writers
– a writer is allowed to proceed even when
readers are active
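
The writer-first behaviour comes from a sequence counter: the writer makes it odd while updating and even again afterwards, and a reader retries whenever the counter was odd or changed during its read. A minimal user-space sketch follows, assuming writers are already serialized by some other lock; my_seqlock_t and the my_* functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint sequence;   /* even: stable, odd: write in progress */
    atomic_int  x, y;       /* protected data (a hypothetical pair) */
} my_seqlock_t;

/* Writer: never waits for readers. Assumes a single writer at a time. */
void my_write_pair(my_seqlock_t *sl, int x, int y)
{
    unsigned seq = atomic_load_explicit(&sl->sequence, memory_order_relaxed);
    atomic_store_explicit(&sl->sequence, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&sl->x, x, memory_order_relaxed);
    atomic_store_explicit(&sl->y, y, memory_order_relaxed);
    atomic_store_explicit(&sl->sequence, seq + 2, memory_order_release);
}

/* Reader: remember the counter before reading the data. */
unsigned my_read_seqbegin(my_seqlock_t *sl)
{
    return atomic_load_explicit(&sl->sequence, memory_order_acquire);
}

/* Reader: retry if a write was in progress or has happened since. */
bool my_read_seqretry(my_seqlock_t *sl, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return (start & 1) ||
           atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

/* Read a consistent (x, y) pair, looping until no writer interfered. */
void my_read_pair(my_seqlock_t *sl, int *x, int *y)
{
    unsigned start;
    do {
        start = my_read_seqbegin(sl);
        *x = atomic_load_explicit(&sl->x, memory_order_relaxed);
        *y = atomic_load_explicit(&sl->y, memory_order_relaxed);
    } while (my_read_seqretry(sl, start));
}
```

Readers pay for the writer's priority: a reader may have to repeat its critical region several times before it obtains a valid copy.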
Read-Copy Update
• Read-copy update (RCU): another synchronization
technique designed to protect data structures
that are mostly accessed for reading by several
CPUs
– RCU allows many readers and many writers to
proceed concurrently
– RCU is lock-free

• Key ideas

– Only data structures that are dynamically allocated
and referenced via pointers can be protected by RCU
– No kernel control path can sleep inside a critical
section protected by RCU
• Macros
– rcu_read_lock()
– rcu_read_unlock()
– call_rcu()

• RCU
– New in Linux 2.6
– Used in networking layer and VFS
Semaphores
• Two kinds of semaphores
– Kernel semaphores: by kernel control paths
– System V IPC semaphores: by user processes

• Kernel semaphores
– struct semaphore
• count
• wait
• sleepers

– down(): to acquire a kernel semaphore (similar to wait)
– up(): to release a kernel semaphore (similar to signal)
Read/Write Semaphores
• Similar to read/write spin locks
– except that waiting processes are suspended instead of spinning

• struct rw_semaphore
– count
– wait_list
– wait_lock

• init_rwsem()
• down_read(), down_write(): acquire a read/write
semaphore
• up_read(), up_write(): release a read/write semaphore
Completions
• To solve a subtle race condition in
multiprocessor systems
– Similar to semaphores

• struct completion
– done
– wait

• complete(): corresponding to up()
• wait_for_completion(): corresponding to
down()
Local Interrupt Disabling
• Interrupts can be disabled on a CPU with
cli instruction
– local_irq_disable() macro

• Interrupts can be enabled by sti
instruction
– local_irq_enable() macro
Disabling/Enabling Deferrable
Functions
• “softirq”
• The kernel sometimes needs to disable
deferrable functions without disabling
interrupts
– local_bh_disable() macro
– local_bh_enable() macro
Synchronizing Accesses to Kernel
Data Structures
• Rule of thumb for kernel developers:
– Always keep the concurrency level as high as
possible in the system
– Two factors:
• The number of I/O devices that operate
concurrently
• The number of CPUs that do productive work
• A shared data structure consisting of a
single integer value can be updated by
declaring it as an atomic_t type and by
using atomic operations
• Inserting an element into a shared linked
list is never atomic since it consists of at
least two pointer assignments
Choosing among Spin Locks,
Semaphores, and Interrupt Disabling
Kernel control paths                  UP protection             MP further protection
Exceptions                            Semaphore                 None
Interrupts                            Local interrupt disabling Spin lock
Deferrable functions                  None                      None or spin lock
Exceptions+interrupts                 Local interrupt disabling Spin lock
Exceptions+deferrable                 Local softirq disabling   Spin lock
Interrupts+deferrable                 Local interrupt disabling Spin lock
Exceptions+interrupts+deferrable      Local interrupt disabling Spin lock
Interrupt-aware Spin Lock Macros
• spin_lock_irq(l), spin_unlock_irq(l)
• spin_lock_bh(l), spin_unlock_bh(l)
• spin_lock_irqsave(l,f), spin_unlock_irqrestore(l,f)
• read_lock_irq(l), read_unlock_irq(l)
• read_lock_bh(l), read_unlock_bh(l)
• write_lock_irq(l), write_unlock_irq(l)
• write_lock_bh(l), write_unlock_bh(l)
• read_lock_irqsave(l,f), read_unlock_irqrestore(l,f)
• write_lock_irqsave(l,f), write_unlock_irqrestore(l,f)
• read_seqbegin_irqsave(l,f), read_seqretry_irqrestore(l,f)
• write_seqlock_irqsave(l,f), write_sequnlock_irqrestore(l,f)
• write_seqlock_irq(l), write_sequnlock_irq(l)
• write_seqlock_bh(l), write_sequnlock_bh(l)
Examples of Race Condition
Prevention
• Reference counters: an atomic_t counter associated with
a specific resource
• The global kernel lock (a.k.a. big kernel lock, or BKL)
– lock_kernel(), unlock_kernel()
– Mostly used in early versions; used in Linux 2.6 to protect old
code (related to VFS, and several file systems)

• Memory descriptor read/write semaphore
– mmap_sem field in mm_struct

• Slab cache list semaphore
– cache_chain_sem semaphore

• Inode semaphore
– i_sem field
• When a program uses two or more semaphores,
the potential for deadlock is present because two
different paths could wait for each other
– Linux has few problems with deadlocks on
semaphore requests since each path usually acquires
just one semaphore
– In cases such as the rmdir() and rename() system calls,
two semaphores must be acquired
– To avoid such deadlocks, semaphore requests are
performed in a predefined address order
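
The address-order rule can be sketched in user space with two simple busy-wait locks standing in for the semaphores. All names here (my_lock_t, first_of(), lock_pair(), unlock_pair()) are hypothetical, and this is only an illustration of the ordering idea, not the kernel code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock used in place of a kernel semaphore. */
typedef struct { atomic_flag held; } my_lock_t;

static void my_lock(my_lock_t *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;   /* wait until the holder releases it */
}

static void my_unlock(my_lock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* The lock at the lower address must always be taken first. */
my_lock_t *first_of(my_lock_t *a, my_lock_t *b)
{
    return (uintptr_t)a <= (uintptr_t)b ? a : b;
}

/* Acquire both locks in address order, whatever order the caller
 * names them in. Passing the same lock twice takes it only once. */
void lock_pair(my_lock_t *a, my_lock_t *b)
{
    my_lock_t *first = first_of(a, b);
    my_lock_t *second = (first == a) ? b : a;

    my_lock(first);
    if (second != first)
        my_lock(second);
}

void unlock_pair(my_lock_t *a, my_lock_t *b)
{
    my_unlock(a);
    if (b != a)
        my_unlock(b);
}
```

Because every path that calls lock_pair() takes the lower-addressed lock first, two paths naming the same locks in opposite order cannot end up each holding one lock while waiting for the other.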
Thanks for Your Attention!
