R(ead) C(opy) U(pdate) [email_address]
Agenda
  • What is RCU?
  • Why?
  • RCU Primitives
  • RCU List Operations
  • Sleepable RCU
  • User-Level RCU
  • Q&A
What is RCU?
  • Read-copy-update
  • An alternative to rwlock
  • Allows low-overhead, wait-free reads
  • Updates can be expensive: old copies must be kept for as long as readers are still using them
Why RCU?
  • Without a lock, the code below is broken: compiler optimization and CPU out-of-order execution may make the store to gp visible before the stores that initialize the fields
    struct foo {
        int a;
        int b;
        int c;
    };
    struct foo *gp = NULL;

    /* . . . */

    p = kmalloc(sizeof(*p), GFP_KERNEL);
    p->a = 1;
    p->b = 2;
    p->c = 3;
    gp = p;    /* may become visible before the three stores above */
Why RCU?
  • Mutex: no concurrent readers
  • spin_lock: ditto
  • rwlock: allows concurrent readers. The right choice?
Why RCU?
  • rwlock is expensive
  • Even read_lock has more overhead than spin_lock
  • If write_lock is not really rare, rwlock contention is much worse than spin_lock contention
RCU Basics
  • Split an update into removal and reclamation phases
  • Removal is performed immediately, while reclamation is deferred until all readers active during the removal phase have completed
  • Takes advantage of the fact that writes to single aligned pointers are atomic on modern CPUs
RCU Terminology
  • Read-side critical section: code delimited by rcu_read_lock() and rcu_read_unlock(); it MUST NOT sleep
  • Quiescent state: any code not within an RCU read-side critical section
  • Grace period: any time period during which each thread resides in at least one quiescent state
RCU Terminology
  • More on grace periods: after a full grace period, all pre-existing RCU read-side critical sections have completed
RCU Update Sequence
  • Remove pointers to a data structure, so that subsequent readers cannot gain a reference to it
  • Wait for all previous readers to complete their RCU read-side critical sections (that is, wait for a grace period to pass)
  • At this point no reader can hold a reference to the data structure, so it may now safely be reclaimed (e.g., in another thread)
When Does a Grace Period Pass?
  • RCU readers are not permitted to block, switch to user-mode execution, or enter the idle loop
  • As soon as a CPU is seen passing through any of these three states, we know that the CPU has exited any previous RCU read-side critical sections
  • If we remove an item from a linked list and then wait until all CPUs have switched context, executed in user mode, or executed in the idle loop, we can safely free that item
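
The quiescent-state rule on the slide above can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions: the names (toy_note_quiescent, toy_gp_start, toy_gp_elapsed, TOY_NR_CPUS) are hypothetical and not kernel APIs, and the sketch ignores the memory ordering and concurrency a real implementation must handle.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4   /* assumed CPU count for the simulation */

/* Per-CPU count of quiescent states seen so far
 * (context switch, user-mode execution, or idle loop). */
static unsigned long toy_qs_count[TOY_NR_CPUS];

/* Snapshot of every CPU's counter taken when a grace period starts. */
struct toy_gp {
    unsigned long snap[TOY_NR_CPUS];
};

/* Record that a CPU passed through a quiescent state. */
void toy_note_quiescent(int cpu)
{
    toy_qs_count[cpu]++;
}

/* Begin a grace period: remember where each CPU's counter stands. */
void toy_gp_start(struct toy_gp *gp)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        gp->snap[cpu] = toy_qs_count[cpu];
}

/* The grace period has elapsed once every CPU has passed through
 * at least one quiescent state since the snapshot was taken. */
bool toy_gp_elapsed(const struct toy_gp *gp)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (toy_qs_count[cpu] == gp->snap[cpu])
            return false;
    return true;
}
```

An updater in this model would call toy_gp_start() right after unlinking an item and free the item only once toy_gp_elapsed() returns true.
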
Core RCU APIs
  • rcu_read_lock()
  • rcu_read_unlock()
  • synchronize_rcu() / call_rcu()
  • rcu_assign_pointer()
  • rcu_dereference()
Wait for Readers
  • synchronize_rcu(): blocks until all ongoing RCU read-side critical sections have completed; it does not wait for ones that start later
  • call_rcu(): registers a function and argument that are invoked after all ongoing RCU read-side critical sections have completed
Assign & Retrieve
  • rcu_assign_pointer(): assigns a new value to an RCU-protected pointer
  • rcu_dereference(): fetches an RCU-protected pointer; the result is safe to dereference until the matching rcu_read_unlock()
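
A rough userspace analogue of the assign/retrieve pair can be written with C11 atomics. This is a sketch, not the kernel implementation: publish_foo() and read_foo_sum() are hypothetical names, malloc() stands in for kmalloc(), and memory_order_acquire is used on the read side as a conservative stand-in for the dependency ordering that rcu_dereference() relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
    int a;
    int b;
    int c;
};

/* Shared pointer, playing the role of the RCU-protected gp. */
static _Atomic(struct foo *) gp = NULL;

/* Writer: initialize the object fully, then publish it. The release
 * store keeps the field stores from being reordered after the
 * pointer store (the role of rcu_assign_pointer()). */
int publish_foo(int a, int b, int c)
{
    struct foo *p = malloc(sizeof(*p));

    if (!p)
        return -1;
    p->a = a;
    p->b = b;
    p->c = c;
    atomic_store_explicit(&gp, p, memory_order_release);
    return 0;
}

/* Reader: fetch the pointer once, then use only that local copy
 * (the role of rcu_dereference()). Returns a + b + c, or -1 if
 * nothing has been published yet. */
int read_foo_sum(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);

    if (!p)
        return -1;
    return p->a + p->b + p->c;
}
```

The sketch covers only publication. It never frees an old object, which is exactly the part that needs a grace period.
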
RCU List Insert
  • list_add_rcu()
  • list_add_tail_rcu()
  • list_replace_rcu()
  • Updaters must be serialized by a lock
Sample Code
    struct foo {
        struct list_head list;    /* embedded list linkage */
        int a;
        int b;
        int c;
    };
    LIST_HEAD(head);
    static DEFINE_SPINLOCK(list_lock);    /* serializes updaters */

    /* . . . */

    p = kmalloc(sizeof(*p), GFP_KERNEL);
    p->a = 1;
    p->b = 2;
    p->c = 3;
    spin_lock(&list_lock);
    list_add_rcu(&p->list, &head);    /* publishes p to concurrent readers */
    spin_unlock(&list_lock);
RCU List Traversal
  • list_for_each_entry_rcu()
  • rcu_read_lock() and rcu_read_unlock() must be called around the traversal, but they never spin or block
  • Allows list_add_rcu() to execute concurrently
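
To show why a traversal can run alongside an insertion, here is a minimal userspace sketch of a singly linked list with C11 atomics. It is not the kernel's list_head implementation: toy_list_add() and toy_list_contains() are hypothetical names, writers are assumed to be serialized by a lock that is not shown, and nodes are never removed or freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_node {
    int key;
    _Atomic(struct toy_node *) next;
};

static _Atomic(struct toy_node *) toy_head = NULL;

/* Writer side: the caller is assumed to hold the update-side lock.
 * The node is fully initialized before the release store makes it
 * reachable, so a reader sees either the old list or the new node
 * with valid contents. */
int toy_list_add(int key)
{
    struct toy_node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->key = key;
    atomic_init(&n->next,
                atomic_load_explicit(&toy_head, memory_order_relaxed));
    atomic_store_explicit(&toy_head, n, memory_order_release);
    return 0;
}

/* Reader side: takes no lock and never spins or blocks.
 * Returns 1 if key is present, 0 otherwise. */
int toy_list_contains(int key)
{
    struct toy_node *n =
        atomic_load_explicit(&toy_head, memory_order_acquire);

    while (n) {
        if (n->key == key)
            return 1;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }
    return 0;
}
```
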
RCU List Removal
  • list_del_rcu() removes an element from the list; it must be protected by some lock
  • But when is it safe to free the element?
  • synchronize_rcu() blocks until all read-side critical sections that began before synchronize_rcu() was called have completed
  • call_rcu() registers a callback that runs after all read-side critical sections that began before call_rcu() was called have completed
Sample Code
    spin_lock(&mylock);
    p = search(head, key);
    if (p == NULL) {
        spin_unlock(&mylock);
    } else {
        list_del_rcu(&p->list);
        spin_unlock(&mylock);
        synchronize_rcu();    /* wait for pre-existing readers */
        kfree(p);
    }
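
The sample above blocks in synchronize_rcu(). The asynchronous alternative, call_rcu(), queues a callback instead. The single-threaded sketch below models that idea only: struct toy_rcu_head, toy_call_rcu() and toy_grace_period_elapsed() are hypothetical stand-ins rather than the kernel's definitions, and the grace period is signalled by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

static struct toy_rcu_head *toy_pending;  /* callbacks awaiting a grace period */
static int toy_freed;                     /* reclaimed objects, for demonstration */

struct foo {
    int key;
    struct toy_rcu_head rcu;              /* embedded callback linkage */
};

/* Queue func(head) to run after a grace period; returns immediately. */
void toy_call_rcu(struct toy_rcu_head *head,
                  void (*func)(struct toy_rcu_head *head))
{
    head->func = func;
    head->next = toy_pending;
    toy_pending = head;
}

/* Called once a grace period has elapsed: run every queued callback. */
void toy_grace_period_elapsed(void)
{
    struct toy_rcu_head *h = toy_pending;

    toy_pending = NULL;
    while (h) {
        struct toy_rcu_head *next = h->next;  /* read before func frees h */

        h->func(h);
        h = next;
    }
}

/* Callback: recover the enclosing struct foo and free it. */
void foo_reclaim(struct toy_rcu_head *head)
{
    struct foo *p = (struct foo *)((char *)head - offsetof(struct foo, rcu));

    free(p);
    toy_freed++;
}
```

In this model the updater would unlink the element, call toy_call_rcu(&p->rcu, foo_reclaim), and continue without waiting.
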
Sleepable RCU
  • Why? Realtime kernels that require spinlock critical sections to be preemptible also require RCU read-side critical sections to be preemptible
SRCU Implementation Strategy
  • Goal: prevent any given task sleeping in an RCU read-side critical section from blocking an unbounded number of RCU callbacks
  • Refuse to provide asynchronous grace-period interfaces, such as Classic RCU's call_rcu() API
  • Isolate grace-period detection within each subsystem that uses SRCU
SRCU Grace Period?
  • Grace periods are detected by summing per-CPU counters
  • Readers manipulate CPU-local counters
  • Two sets of per-CPU counters are used, so that new readers and pre-existing readers are counted separately
SRCU Data Structure
    struct srcu_struct {
        int completed;
        struct srcu_struct_array __percpu *per_cpu_ref;
        struct mutex mutex;
    };

    struct srcu_struct_array {
        int c[2];
    };
Wait for Grace Period
  • synchronize_srcu()
  • Flip the completed counter, so that new readers use the other set of per-CPU counters
  • Wait for the old set of counters to drain to zero
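
The flip-and-drain scheme can be sketched as a single-threaded model of the two counter sets. This is an illustration, not the kernel code: struct toy_srcu and the toy_srcu_* functions are hypothetical, the CPU is passed in explicitly, and the memory barriers and sleeping that a real synchronize_srcu() needs are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2   /* assumed CPU count for the simulation */

struct toy_srcu {
    int completed;              /* low bit selects the active counter set */
    int c[TOY_NR_CPUS][2];      /* per-CPU reader counts, one per set */
};

/* Enter a read-side critical section on the given CPU. The returned
 * index must be handed back to toy_srcu_read_unlock(). */
int toy_srcu_read_lock(struct toy_srcu *sp, int cpu)
{
    int idx = sp->completed & 1;

    sp->c[cpu][idx]++;
    return idx;
}

/* Leave a read-side critical section. The reader may have migrated,
 * so cpu can differ from the one used at lock time. */
void toy_srcu_read_unlock(struct toy_srcu *sp, int cpu, int idx)
{
    sp->c[cpu][idx]--;
}

/* First half of a grace period: flip, so that new readers use the
 * other counter set. Returns the index of the old set. */
int toy_srcu_flip(struct toy_srcu *sp)
{
    int old = sp->completed & 1;

    sp->completed++;
    return old;
}

/* Second half: the old set has drained when its counters sum to zero
 * across all CPUs. Individual counters may be negative after a reader
 * migrates, which is why the sum is what matters. */
bool toy_srcu_drained(const struct toy_srcu *sp, int old)
{
    int sum = 0;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += sp->c[cpu][old];
    return sum == 0;
}
```
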
SRCU APIs
    int init_srcu_struct(struct srcu_struct *sp);
    void cleanup_srcu_struct(struct srcu_struct *sp);
    int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
    void srcu_read_unlock(struct srcu_struct *sp, int idx);
    void synchronize_srcu(struct srcu_struct *sp);
    void synchronize_srcu_expedited(struct srcu_struct *sp);
    long srcu_batches_completed(struct srcu_struct *sp);
Userspace RCU
  • Available at http://lttng.org/urcu
  • git clone git://git.lttng.org/userspace-rcu.git
  • Debian: aptitude install liburcu-dev
  • Examples
Q & A
