Glusterd_thread_synchronization_using_urcu_lca2016
2
Glusterd Thread Synchronization
using user space RCU
Atin Mukherjee
SSE-Red Hat
Gluster Maintainer
IRC : atinm on freenode
Twitter: @mukherjee_atin
3
Agenda
● Introduction to GlusterD
● Big lock in thread synchronization in GlusterD
● Issues with Big Lock approach
● Different locking primitives
● What is RCU
● Advantage of RCU over read-write lock
● RCU mechanisms – Insertion, Deletion, Reader
● URCU flavors
● URCU APIs
● URCU use cases
● Q&A
4
What is GlusterD
● Manages the cluster configuration for Gluster
● Responsible for
– Peer membership management
– Elastic volume management
– Configuration consistency
– Distributed command execution (orchestration)
– Service management (manages GlusterFS
daemons)
5
Thread synchronization in GlusterD
● GlusterD was initially designed as a single-threaded daemon
● It became multi-threaded to satisfy use cases like snapshot
● Big lock
– A coarse-grained lock
– Only one transaction can run inside the big lock at a time
– Protects all the shared data structures
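The big-lock idea can be sketched as follows. This is a minimal illustration assuming a single pthread mutex; the names peer_count, volume_count, big_lock and the two workers are hypothetical and are not taken from the GlusterD source.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state standing in for GlusterD's peer and volume data. */
static int peer_count;
static int volume_count;

/* One coarse-grained lock guards every shared structure. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker that only touches peer data, yet must take the big lock. */
void *add_peers(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&big_lock);
        peer_count++;
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Worker that only touches volume data; it still contends with add_peers(). */
void *add_volumes(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&big_lock);
        volume_count++;
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Runs both workers concurrently and returns the combined count. */
int run_big_lock_demo(int n)
{
    pthread_t peers, volumes;
    int rc;

    peer_count = 0;
    volume_count = 0;
    rc = pthread_create(&peers, NULL, add_peers, &n);
    assert(rc == 0);
    rc = pthread_create(&volumes, NULL, add_volumes, &n);
    assert(rc == 0);
    pthread_join(peers, NULL);
    pthread_join(volumes, NULL);
    return peer_count + volume_count;
}
```

The two workers never touch the same variable, but they still serialize on big_lock, which is the contention on unrelated data that a coarse-grained lock causes.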
6
Issues with Big Lock
● Threads contend even for unrelated data
● Can end up in a deadlock
– An RPC request's callback also needs the big lock
● Shall we release the big lock in the middle of a transaction to
get rid of the above deadlock? Yes, we do, but…
● Here comes the problem: a small window of time
when the shared data structures are open to updates,
leading to inconsistencies
7
Different locking primitives
● Fine grained locks
– Mutex
– Read-write lock
– Spin lock
– Seq lock
– Read-Copy-Update (RCU)
8
What is RCU
● Synchronization mechanism
● Not new, added to Linux Kernel in 2002
● Allows reads to occur concurrently with updates
● Maintains multiple versions of objects for read
coherency
● Almost zero overhead in the read-side critical
section
9
Advantages of RCU over read-write
lock
● Concurrent readers & writers – writer writes, readers read
● Wait-free reads
– RCU readers have no wait overhead. They can never be blocked by writers
● Existence guarantee
– RCU guarantees that RCU-protected data in a reader's critical section will remain
in existence until the end of the critical section
● Deadlock immunity
– RCU readers always run in a deterministic time as they never block. This means
that they can never become a part of a deadlock.
● No writer starvation
– As RCU readers don't block, writers can never starve.
10
RCU mechanism
● RCU is made up of three fundamental mechanisms
– Publish-Subscribe Mechanism (for insertion)
– Wait For Pre-Existing RCU Readers to Complete (for
deletion)
– Maintain Multiple Versions of Recently Updated Objects
(for readers)
11
Publish-Subscribe model
● rcu_assign_pointer () for publication
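– Unsafe publication with a plain assignment (the compiler or CPU may reorder the stores, so a reader could see gp before the fields are initialized):
struct foo {
    int a;
    int b;
    int c;
};
struct foo *gp = NULL;

/* . . . */

p = malloc (...);
p->a = 1;
p->b = 2;
p->c = 3;
gp = p;    /* nothing orders this store after the three above */
– Safe publication, same struct and global pointer:
p = malloc (...);
p->a = 1;
p->b = 2;
p->c = 3;
rcu_assign_pointer(gp, p);    /* publishes p only after its fields are written */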
● rcu_dereference () for subscription
– Unsafe subscription with a plain load:
p = gp;
if (p != NULL) {
    do_something_with(p->a, p->b, p->c);
}
– Safe subscription inside a read-side critical section:
rcu_read_lock();
p = rcu_dereference(gp);
if (p != NULL) {
    do_something_with(p->a, p->b, p->c);
}
rcu_read_unlock();
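The ordering that publication and subscription rely on can be approximated in portable C11, without any RCU library. The sketch below is an assumption-laden stand-in: a release store plays the role of rcu_assign_pointer() and an acquire load plays the role of rcu_dereference() (which is closer to a consume load; acquire is used here because compilers treat consume as acquire). publish() and subscribe_sum() are hypothetical helpers, and reclaiming a previously published object is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
    int a;
    int b;
    int c;
};

/* Global pointer that readers subscribe to; NULL until something is published. */
static _Atomic(struct foo *) gp;

/* Publish: the release store keeps the field writes ahead of the pointer store. */
void publish(int a, int b, int c)
{
    struct foo *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->a = a;
    p->b = b;
    p->c = c;
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Subscribe: returns a + b + c of the published object, or -1 if there is none. */
int subscribe_sum(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);
    if (p == NULL)
        return -1;
    return p->a + p->b + p->c;
}
```

A reader that observes a non-NULL pointer is guaranteed to observe the initialized fields as well, which is the property the plain-assignment version lacks.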
12
Publish-Subscribe Model (ii)
● rcu_assign_pointer () & rcu_dereference ()
embedded in special RCU variants of Linux's
list-manipulation API
● rcu_assign_pointer () → list_add_rcu ()
● rcu_dereference () → list_for_each_entry_rcu ()
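How a list API can embed the two primitives is sketched below with a hand-rolled singly linked list in C11. This is not the Linux or liburcu list implementation: struct node, toy_list_add_rcu() and toy_list_sum() are hypothetical, a single updater is assumed, and node removal is not covered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int value;
    _Atomic(struct node *) next;
};

static _Atomic(struct node *) head;

/* Insert at the head: initialize the node fully, then publish it. */
void toy_list_add_rcu(int value)
{
    struct node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->value = value;
    /* The node is still private, so a plain initialization is enough here. */
    atomic_init(&n->next, atomic_load_explicit(&head, memory_order_relaxed));
    /* Release store: the publication step (the rcu_assign_pointer role). */
    atomic_store_explicit(&head, n, memory_order_release);
}

/* Traverse: each pointer fetch is an acquire load (the rcu_dereference role). */
int toy_list_sum(void)
{
    int sum = 0;
    struct node *n = atomic_load_explicit(&head, memory_order_acquire);
    while (n != NULL) {
        sum += n->value;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }
    return sum;
}
```

Readers can walk the list while an insertion is in progress: they see either the old head or the fully initialized new node, never a half-built one.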
13
Wait For Pre-Existing RCU Readers to
Complete
● Approach used for deletion
● Synchronous – synchronize_rcu ()
● Asynchronous – call_rcu ()
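– Synchronous update: block until pre-existing readers are done, then free
q = malloc(...);
*q = *p;
q->b = 2;
q->c = 3;
list_replace_rcu(&p->list, &q->list);
synchronize_rcu();
free(p);
– Asynchronous update: register a callback and return immediately
q = malloc(...);
*q = *p;
q->b = 2;
q->c = 3;
list_replace_rcu(&p->list, &q->list);
call_rcu(&p->rcu, cbk); /* cbk frees p after a grace period; assumes *p embeds a struct rcu_head named rcu */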
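The copy, publish, wait, free sequence can be tried out with a toy grace-period mechanism. The sketch below assumes a single updater and uses a shared reader counter as a deliberately simple stand-in for synchronize_rcu(). Real RCU avoids such a counter because it makes readers write to shared memory, and waiting for the count to reach zero also waits for readers that started after the update. All toy_ names, update_b_c() and read_sum() are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
    int a;
    int b;
    int c;
};

static _Atomic(struct foo *) gp;
static atomic_int active_readers;   /* toy stand-in for RCU's reader tracking */

void toy_read_lock(void)   { atomic_fetch_add(&active_readers, 1); }
void toy_read_unlock(void) { atomic_fetch_sub(&active_readers, 1); }

/* Conservative wait: returns once no reader is inside a read-side section. */
void toy_synchronize(void)
{
    while (atomic_load(&active_readers) != 0)
        sched_yield();
}

/* Copy the current version, modify the copy, publish it, reclaim the old one. */
void update_b_c(int b, int c)
{
    struct foo *old = atomic_load(&gp);
    struct foo *fresh = malloc(sizeof(*fresh));
    assert(fresh != NULL);
    if (old != NULL)
        *fresh = *old;
    else
        fresh->a = 0;
    fresh->b = b;
    fresh->c = c;
    atomic_store(&gp, fresh);   /* new readers now see the copy */
    toy_synchronize();          /* wait out readers that may still hold old */
    free(old);                  /* safe: nobody can reference it any more */
}

/* Reader: returns a + b + c of the current version, or -1 if none exists. */
int read_sum(void)
{
    int sum = -1;
    struct foo *p;

    toy_read_lock();
    p = atomic_load(&gp);
    if (p != NULL)
        sum = p->a + p->b + p->c;
    toy_read_unlock();
    return sum;
}
```

The asynchronous variant would replace the blocking toy_synchronize() call with queuing the old pointer and a callback, to be run later once the same condition holds.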
14
Maintain multiple versions of objects
● Used for the existence guarantee
p = search(head, key);
if (p != NULL) {
    list_del_rcu(&p->list);   /* unlink: new readers no longer find p */
    synchronize_rcu();        /* readers that already hold p finish */
    free(p);
}
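The existence guarantee can be observed in a small single-threaded simulation: after an item is unlinked, a reader that already holds it keeps seeing valid data, while a reader that arrives later sees the new state. The sketch assumes a one-element "list" and a toy reader counter; slot, retired and all function names are hypothetical, and insert_item() expects the slot to be empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct item {
    int key;
};

static _Atomic(struct item *) slot;   /* the published version, or NULL */
static struct item *retired;          /* unlinked but not yet freed */
static atomic_int active_readers;     /* toy stand-in for reader tracking */

/* Enter a read-side section and fetch whatever version is published. */
struct item *reader_enter(void)
{
    atomic_fetch_add(&active_readers, 1);
    return atomic_load(&slot);
}

void reader_exit(void)
{
    atomic_fetch_sub(&active_readers, 1);
}

void insert_item(int key)
{
    struct item *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->key = key;
    atomic_store(&slot, p);
}

/* Unlink only: the old version stays allocated for readers that hold it. */
void unlink_item(void)
{
    assert(retired == NULL);
    retired = atomic_exchange(&slot, (struct item *)NULL);
}

/* Frees the retired version once no reader is active; returns 1 on success. */
int try_reclaim(void)
{
    if (retired == NULL || atomic_load(&active_readers) != 0)
        return 0;
    free(retired);
    retired = NULL;
    return 1;
}
```

For example, call insert_item(42), let one reader call reader_enter(), then call unlink_item(). Two versions of the world now coexist: the early reader can still read key 42, a later reader gets NULL, and try_reclaim() succeeds only after the early reader calls reader_exit().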
15
URCU flavors
● QSBR (quiescent-state-based RCU)
– each thread must periodically invoke rcu_quiescent_state()
– Thread (un)registration required
● Memory-barrier-based RCU
– Preemptible RCU implementation
– Introduces memory barriers in the read-side critical section, hence high read-side
overhead
● “Bullet-proof” RCU (RCU-BP)
– Similar to memory-barrier-based RCU, but thread (un)registration is taken
care of automatically
– Higher primitive overheads, but can be used by applications without worrying about
thread creation/destruction
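The quiescent-state idea behind the QSBR flavor can be sketched with a toy implementation in C11 and pthreads. This is not liburcu's code and does not use its API: the toy_ functions, the fixed reader table and the counter scheme are assumptions made for illustration, with sequentially consistent atomics throughout and an updater that is not itself a registered reader.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_READERS 4

struct pair { int a; int b; };                /* invariant: b == 2 * a */

static _Atomic(struct pair *) shared;
static atomic_ulong global_gp = 1;            /* grace-period counter */
static atomic_ulong reader_gp[MAX_READERS];   /* 0 means "not registered" */
static atomic_int stop;

void toy_register(int id)        { atomic_store(&reader_gp[id], atomic_load(&global_gp)); }
void toy_unregister(int id)      { atomic_store(&reader_gp[id], 0); }
/* Announce that this thread holds no references to shared data right now. */
void toy_quiescent_state(int id) { atomic_store(&reader_gp[id], atomic_load(&global_gp)); }

/* Start a new grace period and wait until every registered reader has seen it. */
void toy_synchronize(void)
{
    unsigned long target = atomic_fetch_add(&global_gp, 1) + 1;
    for (int i = 0; i < MAX_READERS; i++) {
        unsigned long seen;
        while ((seen = atomic_load(&reader_gp[i])) != 0 && seen < target)
            sched_yield();
    }
}

/* Reader: no lock and no barrier of its own, only a periodic announcement. */
void *reader_thread(void *arg)
{
    long errors = 0;
    (void)arg;
    toy_register(0);
    while (!atomic_load(&stop)) {
        struct pair *p = atomic_load(&shared);
        if (p != NULL && p->b != 2 * p->a)
            errors++;
        toy_quiescent_state(0);               /* p must not be used past here */
    }
    toy_unregister(0);
    return (void *)(intptr_t)errors;
}

/* Updater: publishes fresh versions and frees each old one after a grace period. */
long run_qsbr_demo(int updates)
{
    pthread_t tid;
    void *result;
    struct pair *last;
    int rc;

    atomic_store(&stop, 0);
    rc = pthread_create(&tid, NULL, reader_thread, NULL);
    assert(rc == 0);
    for (int i = 1; i <= updates; i++) {
        struct pair *fresh = malloc(sizeof(*fresh));
        assert(fresh != NULL);
        fresh->a = i;
        fresh->b = 2 * i;
        struct pair *old = atomic_exchange(&shared, fresh);
        toy_synchronize();
        free(old);
    }
    atomic_store(&stop, 1);
    pthread_join(tid, &result);
    last = atomic_exchange(&shared, (struct pair *)NULL);
    free(last);
    return (long)(intptr_t)result;            /* number of broken invariants seen */
}
```

The sketch shows why QSBR needs both thread registration and periodic quiescent-state calls: the updater can only wait for threads it knows about, and a registered reader that stops announcing quiescent states would stall toy_synchronize() indefinitely.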
16
URCU flavors (ii)
● Signal-based RCU
– Removes the memory barriers from the read side
– Can be used by library functions
– Requires that the user application give up a POSIX signal, which
synchronize_rcu() uses in place of the read-side memory
barriers
– Requires explicit thread registration
● Signal-based RCU using an out-of-tree sys_membarrier() system call
– sys_membarrier() system call instead of POSIX signal
17
URCU APIs
● Atomic-operation and utility APIs
– caa_: Concurrent Architecture Abstraction.
– cmm_: Concurrent Memory Model.
– uatomic_: URCU Atomic Operation.
– https://lwn.net/Articles/573435/
● The URCU APIs
– https://lwn.net/Articles/573439/
● RCU-Protected Lists
– https://lwn.net/Articles/573441/
18
When is URCU useful
19
References
● https://lwn.net/Articles/262464/
● https://lwn.net/Articles/263130/
● https://lwn.net/Articles/573424/
● http://www.efficios.com/pub/lpc2011/Presentation-lpc2011-desnoyers-urcu.pdf
● http://www.rdrop.com/~paulmck/RCU/RCU.IISc-Bangalore.2013.06.03a.pdf
● http://urcu.so/
20
Q&A
