Understanding Real-World
Concurrency Bugs in Go
@kakashi
Hello!
I am kakashi
- Infra lead @UmboCV
- Co-organizer @ Golang Taipei Gathering
@kakashiliu
@kkcliu
Learning Camera Smart Cloud Neural A.I.
Agenda
● Introduction
● Concurrency in Go
● Go Concurrency Bugs
○ Blocking
○ Non-Blocking
● Conclusion
Introduction
● A systematic study of six popular Go projects
Concurrency in Go
1. Making threads (goroutines) lightweight and easy to create
2. Using explicit messaging (via channels) to communicate across
threads
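As a minimal sketch of these two building blocks (the function name sumOfSquares is made up for illustration), each input below gets its own goroutine, and the results come back over a channel:

```go
package main

import "fmt"

// sumOfSquares starts one goroutine per input and collects the
// results over a channel. The name is hypothetical.
func sumOfSquares(nums []int) int {
	results := make(chan int) // unbuffered: each send waits for a receive
	for _, n := range nums {
		go func(n int) {
			results <- n * n
		}(n)
	}
	total := 0
	for range nums {
		total += <-results // one receive per started goroutine
	}
	return total
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3})) // prints 14
}
```

The receive loop runs exactly as many times as goroutines were started, so every sender finds a receiver and nothing is left blocked.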
Beliefs about Go:
● Makes concurrent programming easier and less
error-prone
● Makes heavy use of message passing via channels,
which is less error-prone than shared memory
● Has fewer concurrency bugs
● Built-in deadlock and data race detectors can catch any bug
Go Concurrency Usage Patterns
A surprising finding is that shared-memory synchronization operations are still
used more often than message passing.
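To make the two styles concrete, here is a small sketch that counts n events once with shared memory (a sync.Mutex) and once with message passing (a channel). Both function names are illustrative and not taken from the studied projects:

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex uses shared memory: every goroutine updates the same
// variable, protected by a mutex.
func countWithMutex(n int) int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		count int
	)
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			count++
		}()
	}
	wg.Wait()
	return count
}

// countWithChannel uses message passing: goroutines send a value and
// only the receiving goroutine touches the counter.
func countWithChannel(n int) int {
	ticks := make(chan int)
	for i := 0; i < n; i++ {
		go func() { ticks <- 1 }()
	}
	count := 0
	for i := 0; i < n; i++ {
		count += <-ticks
	}
	return count
}

func main() {
	fmt.Println(countWithMutex(100), countWithChannel(100)) // prints 100 100
}
```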
Thread-safe code, well-paid programmers
Go Concurrency Bugs
1. Blocking - one or more goroutines are unintentionally stuck in their execution
and cannot move forward.
2. Non-Blocking - all goroutines can finish their tasks, but their
behavior is not what was intended.
Blocking Bugs Causes
Message-passing operations are even more likely to cause blocking bugs than shared-memory operations
Blocking Bug caused by Mutex (buggy)
    faultMutex.Lock()
    if faultDomain == nil {
        var err error
        faultDomain, err = fetchFaultDomain()
        if err != nil {
            // bug: returns while faultMutex is still locked,
            // so the next Lock() blocks forever
            return cloudprovider.Zone{}, err
        }
    }
    zone := cloudprovider.Zone{}
    faultMutex.Unlock()
    return zone, nil
Blocking Bug caused by Mutex (fixed)
    faultMutex.Lock()
    defer faultMutex.Unlock() // fix: releases the lock on every return path
    if faultDomain == nil {
        var err error
        faultDomain, err = fetchFaultDomain()
        if err != nil {
            return cloudprovider.Zone{}, err
        }
    }
    zone := cloudprovider.Zone{}
    // the explicit Unlock() is removed; the deferred call replaces it
    return zone, nil
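The defer-based fix can be tried out with a self-contained sketch. The zone type is simplified to a string here, and failingFetch and workingFetch are hypothetical stand-ins for fetchFaultDomain, which the slides do not show:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	faultMutex  sync.Mutex
	faultDomain *string
)

// failingFetch and workingFetch are hypothetical stand-ins for the
// real fetchFaultDomain.
func failingFetch() (*string, error) {
	return nil, errors.New("metadata service unavailable")
}

func workingFetch() (*string, error) {
	s := "zone-1"
	return &s, nil
}

var fetchFaultDomain = failingFetch

// getZone releases faultMutex on every return path thanks to defer.
func getZone() (string, error) {
	faultMutex.Lock()
	defer faultMutex.Unlock()
	if faultDomain == nil {
		fd, err := fetchFaultDomain()
		if err != nil {
			return "", err // the lock is still released here
		}
		faultDomain = fd
	}
	return *faultDomain, nil
}

func main() {
	_, err := getZone()
	fmt.Println("first call:", err)

	// With the buggy version this second call would block forever.
	fetchFaultDomain = workingFetch
	zone, err := getZone()
	fmt.Println("second call:", zone, err)
}
```

The second call only succeeds because the error path of the first call released the lock.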
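Blocking Bug caused by WaitGroup (buggy)
    var group sync.WaitGroup
    group.Add(len(pm.plugins))
    for _, p := range pm.plugins {
        go func(p *plugin) {
            defer group.Done()
        }(p)
        // bug: waits inside the loop, before the remaining goroutines
        // are started; blocks forever with more than one plugin
        group.Wait()
    }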
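Blocking Bug caused by WaitGroup (fixed)
    var group sync.WaitGroup
    group.Add(len(pm.plugins))
    for _, p := range pm.plugins {
        go func(p *plugin) {
            defer group.Done()
        }(p)
        // the group.Wait() that used to be here blocked on the first iteration
    }
    group.Wait() // fixed: wait once, after every goroutine has been started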
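A runnable sketch of the corrected pattern follows. The plugin type and the restoreAll name are placeholders, since the slide elides the real per-plugin work:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// plugin is a placeholder type; the slides do not show its fields.
type plugin struct{ name string }

// restoreAll starts one goroutine per plugin and waits for all of
// them once, after the loop.
func restoreAll(plugins []*plugin) int64 {
	var (
		group    sync.WaitGroup
		restored int64
	)
	group.Add(len(plugins))
	for _, p := range plugins {
		go func(p *plugin) {
			defer group.Done()
			_ = p.name // stand-in for the real per-plugin work
			atomic.AddInt64(&restored, 1)
		}(p)
	}
	group.Wait() // outside the loop: all goroutines are already running
	return atomic.LoadInt64(&restored)
}

func main() {
	plugins := []*plugin{{"a"}, {"b"}, {"c"}}
	fmt.Println(restoreAll(plugins)) // prints 3
}
```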
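Blocking Bug caused by Channel (buggy)
    func finishReq(timeout time.Duration) (r ob) {
        ch := make(chan ob) // unbuffered
        go func() {
            result := fn()
            // bug: blocks forever if the timeout case below has already
            // returned, because nobody receives from ch any more
            ch <- result
        }()
        select {
        case result := <-ch:
            return result
        case <-time.After(timeout):
            return nil
        }
    }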
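Blocking Bug caused by Channel (fixed)
    func finishReq(timeout time.Duration) (r ob) {
        ch := make(chan ob, 1) // fix: buffer of one
        go func() {
            result := fn()
            // with the unbuffered channel this send blocked forever after a
            // timeout; the buffer lets it complete without a receiver
            ch <- result
        }()
        select {
        case result := <-ch:
            return result
        case <-time.After(timeout):
            return nil
        }
    }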
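The buffered-channel fix can be exercised with the sketch below. The signature is adapted for illustration: fn is passed in as a parameter, the result type is a string, and fastJob and slowJob are hypothetical workloads:

```go
package main

import (
	"fmt"
	"time"
)

// fastJob and slowJob are hypothetical workloads.
func fastJob() string { return "done" }

func slowJob() string {
	time.Sleep(200 * time.Millisecond)
	return "late"
}

// finishReq runs fn in a goroutine and gives up after timeout. The
// buffer of one lets the goroutine finish its send and exit even
// when nobody is receiving any more.
func finishReq(fn func() string, timeout time.Duration) (string, bool) {
	ch := make(chan string, 1)
	go func() {
		ch <- fn()
	}()
	select {
	case result := <-ch:
		return result, true
	case <-time.After(timeout):
		return "", false
	}
}

func main() {
	fmt.Println(finishReq(fastJob, time.Second))
	fmt.Println(finishReq(slowJob, 10*time.Millisecond))
}
```

In the timeout case the worker goroutine still completes its send into the buffer and exits, instead of leaking.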
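Blocking Bug: Mistakenly using channel and mutex together
    func goroutine1() {
        m.Lock()
        ch <- request // blocks while holding m until goroutine2 receives
        m.Unlock()
    }
    func goroutine2() {
        for {
            m.Lock() // blocks: goroutine1 holds m, so the receive below is never reached
            m.Unlock()
            request = <-ch
        }
    }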
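One possible way out of this deadlock, sketched under the assumption that the sender may skip or retry a request, is to make the send non-blocking with a select that has a default branch. The lock is then never held across a blocking channel operation. The trySend name and the int request type are illustrative:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

var (
	m  sync.Mutex
	ch = make(chan int) // unbuffered, as on the slide
)

// trySend holds the lock but never blocks on the channel: if no
// receiver is ready, it gives up and reports false.
func trySend(request int) bool {
	m.Lock()
	defer m.Unlock()
	select {
	case ch <- request:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(trySend(1)) // no receiver yet: prints false

	done := make(chan int)
	go func() {
		m.Lock() // succeeds, because trySend never blocks while holding m
		m.Unlock()
		done <- <-ch
	}()
	for !trySend(42) {
		runtime.Gosched() // retry until the receiver is ready
	}
	fmt.Println(<-done) // prints 42
}
```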
Non-Blocking Bugs Causes
There are far fewer non-blocking bugs caused by message passing than by
shared-memory accesses.
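Non-Blocking Bug caused by select and channel (buggy)
    ticker := time.NewTicker(period) // period: the tick interval, omitted on the slide
    for {
        f()
        select {
        case <-stopCh:
            return
        case <-ticker.C:
            // bug: when stopCh and ticker.C are both ready, select picks one
            // at random, so f() may run once more after stop was requested
        }
    }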
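Non-Blocking Bug caused by select and channel (fixed)
    ticker := time.NewTicker(period) // period: the tick interval, omitted on the slide
    for {
        // fix: check stopCh on its own before every call to f()
        select {
        case <-stopCh:
            return
        default:
        }
        f()
        select {
        case <-stopCh:
            return
        case <-ticker.C:
        }
    }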
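The extra stop check can be wrapped in a function and run as a sketch. The runUntilStopped name and its parameters are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// runUntilStopped calls f once per tick until stopCh is closed. The
// first select gives stopCh priority, so f never runs after a stop
// that was already visible at the top of the loop.
func runUntilStopped(stopCh <-chan struct{}, period time.Duration, f func()) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		default:
		}
		f()
		select {
		case <-stopCh:
			return
		case <-ticker.C:
		}
	}
}

func main() {
	stopCh := make(chan struct{})
	close(stopCh) // stop is requested before the loop starts

	calls := 0
	runUntilStopped(stopCh, time.Millisecond, func() { calls++ })
	fmt.Println("calls after stop:", calls) // prints calls after stop: 0
}
```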
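Non-Blocking Bug caused by Timer (buggy)
    timer := time.NewTimer(0) // bug: a zero-duration timer fires immediately
    if dur > 0 {
        timer = time.NewTimer(dur)
    }
    select {
    case <-timer.C: // when dur <= 0 this is ready at once, so the wait ends prematurely
    case <-ctx.Done():
        return nil
    }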
Non-Blocking Bug caused by Timer (fixed)
    var timeout <-chan time.Time // nil channel: receiving from it blocks forever
    if dur > 0 {
        timeout = time.NewTimer(dur).C // a timer is only created when dur > 0
    }
    select {
    case <-timeout:
    case <-ctx.Done():
        return nil
    }
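The nil-channel trick can be checked with a small sketch. The waitFor and demo names and the string return values are made up for illustration:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitFor returns "timeout" once dur elapses, or "cancelled" when ctx
// is done. With dur <= 0 the timeout channel stays nil, so only ctx
// can end the wait.
func waitFor(ctx context.Context, dur time.Duration) string {
	var timeout <-chan time.Time // nil: receiving blocks forever
	if dur > 0 {
		timer := time.NewTimer(dur)
		defer timer.Stop()
		timeout = timer.C
	}
	select {
	case <-timeout:
		return "timeout"
	case <-ctx.Done():
		return "cancelled"
	}
}

// demo runs both cases: no timeout with a cancelled context, and a
// short timeout with a context that is never cancelled.
func demo() (string, string) {
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	return waitFor(ctx, 0), waitFor(context.Background(), time.Millisecond)
}

func main() {
	fmt.Println(demo()) // prints cancelled timeout
}
```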
A data race caused by anonymous function (buggy)
    for i := 17; i <= 21; i++ { // write
        go func() {
            // read: races with the i++ in the parent goroutine
            apiVersion := fmt.Sprintf("v1.%d", i)
        }()
    }
A data race caused by anonymous function (fixed)
    for i := 17; i <= 21; i++ { // write
        go func(i int) { // fix: each goroutine gets its own copy of i
            apiVersion := fmt.Sprintf("v1.%d", i)
        }(i)
    }
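A runnable sketch of the pass-by-argument fix follows. The apiVersions function, and the mutex and WaitGroup used to collect results, are additions for illustration. Note that since Go 1.22 each loop iteration gets its own copy of i, so the buggy form only races in modules built with older language versions:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// apiVersions formats one version string per goroutine. Passing i as
// an argument gives every goroutine a private copy.
func apiVersions() []string {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		versions []string
	)
	for i := 17; i <= 21; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			v := fmt.Sprintf("v1.%d", i)
			mu.Lock()
			versions = append(versions, v)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	sort.Strings(versions) // goroutines finish in arbitrary order
	return versions
}

func main() {
	fmt.Println(apiVersions()) // prints [v1.17 v1.18 v1.19 v1.20 v1.21]
}
```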
A data race caused by passing a reference through a channel
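The slide for this case carries no code, so the following is only a hypothetical illustration of the general idea: if a pointer is sent through a channel, sender and receiver share the same memory, and a later write by the sender can race with the receiver's read. Sending a copy of the value avoids the sharing. The config type and the function names are invented for this sketch:

```go
package main

import "fmt"

// config is an invented type for this sketch.
type config struct{ retries int }

// sendCopy sends the value rather than the pointer, so the receiver
// owns an independent copy.
func sendCopy(ch chan<- config, c *config) {
	ch <- *c
}

// demo shows that a later write by the sender does not reach the
// receiver's copy.
func demo() int {
	ch := make(chan config, 1)
	c := &config{retries: 3}
	sendCopy(ch, c)
	c.retries = 10 // safe: the value already in the channel is a copy
	got := <-ch
	return got.retries
}

func main() {
	fmt.Println(demo()) // prints 3
}
```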
Conclusion
1. Contrary to the common belief that message passing is less
error-prone, more blocking bugs in the studied Go applications
are caused by incorrect message passing than by incorrect shared-memory
protection.
2. Message passing causes fewer non-blocking bugs than shared-memory
synchronization.
3. Misusing Go libraries can cause both blocking and
non-blocking bugs.
Q&A
