Effective Fault-Localization Techniques for Concurrent Software

Sangmin Park
08/06/2014

Committee: Rich Vuduc, Mayur Naik, Alex Orso, Milos Prvulovic, Mark Grechanik
(Mary Jean Harrold)
Impact of Concurrency Bugs
Northeast Blackout; Facebook IPO Glitch
Debugging Concurrency Bugs

Concurrency bugs are rated as the most difficult type of bug.

Survey at Microsoft [Godefroid08]:
• 72% rated concurrency bugs ‘very hard’ or ‘hard’ to debug
• 83% rated concurrency bugs ‘most severe’ or ‘severe’

StackOverflow, “What is the hardest bug?” (http://bit.ly/sohardest):
#1: Concurrency bugs (40%, 101/255)
Debugging Concurrency Bugs

Concurrency bugs are difficult to locate, understand, and fix.

Difficult to locate: “Intermittently I get the following error. I would be grateful if anyone could shed any light on this issue.” (BugID 27315)

Difficult to locate and understand: “I’ve noticed and reproduced crashes with the following stack trace. … I have no clues on why this crash occurs.” (MySQL BugID 3596)

Difficult to fix: a survey found that 40% of initial patches to concurrency bugs are themselves buggy, the highest ratio among all software bugs [Yin, FSE11].
Challenges

Debugging concurrent programs [McDowell 89]: non-determinism and complex state changes.
Debugging Process

Software + Testcase → Fault Localization → Fault Understanding → Fault Correction
Debugging Process

Software + Testcase → Fault Localization → Fault Understanding → Fault Correction

Before the proposal:
1. Localize single-variable faults [ICSE 2010]
2. Localize multi-variable faults [ICST 2012]
3. Provide fault explanation [ISSTA 2013]
After the proposal:
4. User study
Thesis Statement

Dynamic fault-localization techniques can assist developers in locating and understanding non-deadlock concurrency bugs by identifying suspicious memory-access patterns and providing calling contexts and methods.
Concurrency Bugs

Bug Type             Ratio   Bug Cause
Deadlock             30%     Mutual exclusion, hold/wait, no preemption, circular wait
Order violation      22%     Memory access orders
Atomicity violation  47%     Memory access orders
Others               1%

* Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics [Lu08].
Order Violation
* https://bugzilla.mozilla.org/show_bug.cgi?id=61369

Thread 1:
  void init(…) {
    mThread = CreateThread();    // W
  }

Thread 2:
  void foo(…) {
    mState = mThread->State;     // R
  }

If the read (R) in foo() runs before the write (W) in init(), mThread is still uninitialized.
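For illustration, here is a minimal Java sketch of the same W-R order violation; the class and member names are made up for this example, not taken from the Mozilla code.

// Hypothetical Java analogue of the order violation above:
// thread 2 may read `worker` before thread 1 has written it.
public class OrderViolationDemo {
    private static Thread worker; // shared, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            worker = new Thread(() -> {});         // W: plays the role of init()
        });
        Thread t2 = new Thread(() -> {
            System.out.println(worker.getState()); // R: plays the role of foo();
        });                                        // NullPointerException if R runs first
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}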
Atomicity Violation
* https://bugzilla.mozilla.org/show_bug.cgi?id=73291

char* str;   // shared vars
int length;  // locked by L

Thread 1:
  …
  lock(L); lptr = str; unlock(L);      // R(str)
  …
  lock(L); llen = length; unlock(L);   // R(length)

Thread 2:
  …
  lock(L); str = newStr; unlock(L);          // W(str)
  …
  lock(L); length = newLength; unlock(L);    // W(length)

Every access is locked, but Thread 2's two writes (W, W) can interleave between Thread 1's two reads (R, R), so lptr and llen can reflect different versions of the (str, length) pair.
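A Java sketch of the same shape may make the violation easier to see; the field names mirror the C code above, but the class itself is illustrative.

// Illustrative Java version of the atomicity violation above: every access
// to str/length is locked, but the *pair* of accesses is not atomic.
public class AtomicityViolationDemo {
    private final Object lock = new Object();
    private char[] str = "hello".toCharArray();
    private int length = str.length;

    int read() {                                  // Thread 1's role
        char[] lptr;
        int llen;
        synchronized (lock) { lptr = str; }       // R(str)
        // Thread 2 may run both of its writes right here.
        synchronized (lock) { llen = length; }    // R(length)
        return llen - lptr.length;                // non-zero => saw a torn pair
    }

    void write(char[] newStr) {                   // Thread 2's role
        synchronized (lock) { str = newStr; }           // W(str)
        synchronized (lock) { length = newStr.length; } // W(length)
    }
}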
Patterns for Concurrency Bugs

Order violation:     R1(x) W2(x)
Atomicity violation: R1(x) W2(x) W2(y) R1(y)

(Notation: R = read, W = write; the subscript is the thread; the accessed shared variable is in parentheses.)
Patterns for Concurrency Bugs

Type                                       Memory Access Patterns
Order violation                            R1(x) W2(x);  W1(x) R2(x);  W1(x) W2(x)
Atomicity violation (one variable)         R1(x) W2(x) R1(x);  W1(x) W2(x) R1(x);  R1(x) W2(x) W1(x);  W1(x) R2(x) W1(x);  W1(x) W2(x) W1(x)
Atomicity violation (multiple variables)   Nine four-access patterns over two variables x and y, e.g., R1(x) W2(x) W2(y) R1(y); see the Unicorn paper [ICST 2012] for the full list

* The patterns were identified by previous work [Lu06, Vaziri06, Hammer08].

We developed fault-localization techniques for these patterns, as sketched below.
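As a concrete illustration of what detecting such a pattern means, the sketch below scans a recorded access trace for the R1(x) W2(x) R1(x) pattern. The Access type and the trace format are inventions for this example, not the actual tools' data structures; a real detector would also match non-adjacent accesses within a window rather than only adjacent triples.

import java.util.List;

// Hypothetical trace scanner for one single-variable pattern.
// kind is 'R' or 'W'; thread is a thread id; var names the shared variable.
record Access(int thread, char kind, String var) {}

class PatternScanner {
    // Flags R1(x) W2(x) R1(x): two reads of x by one thread with an
    // intervening write of x by a different thread.
    static boolean hasUnserializableRWR(List<Access> trace) {
        for (int i = 0; i + 2 < trace.size(); i++) {
            Access a = trace.get(i), b = trace.get(i + 1), c = trace.get(i + 2);
            if (a.kind() == 'R' && b.kind() == 'W' && c.kind() == 'R'
                    && a.var().equals(b.var()) && b.var().equals(c.var())
                    && a.thread() == c.thread() && a.thread() != b.thread()) {
                return true;
            }
        }
        return false;
    }
}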
Prior Work

1. Localize single-variable faults [ICSE 2010]
2. Localize multi-variable faults [ICST 2012]
3. Provide fault explanation [ISSTA 2013]
4. User study
Fault Localization for Single-Variable Faults: FALCON

Pipeline: Software + Testcase → Falcon [dynamic pattern detection → single-variable patterns → statistical fault localization] → ranked list for single-variable bugs:
1. R-W-R
2. R-W-W
3. R-W-W
4. W-W-W
5. R-W-W
…

• Pros: effective in ranking patterns
• Cons: misses multi-variable faults (30% of non-deadlock concurrency bugs [Lu 08])
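The statistical step can be pictured as follows: each detected pattern is scored by how much more often it appears in failing runs than in passing runs. The metric below (share of failing occurrences) is a common simple choice and only stands in for Falcon's actual formula; the pattern names and counts are invented.

import java.util.*;

// Sketch of pattern ranking from pass/fail occurrence counts.
// counts maps a pattern id to {count in failing runs, count in passing runs}.
class SuspiciousnessRanker {
    static List<Map.Entry<String, Double>> rank(Map<String, int[]> counts) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (var e : counts.entrySet()) {
            double fail = e.getValue()[0], pass = e.getValue()[1];
            ranked.add(Map.entry(e.getKey(), fail / (fail + pass)));
        }
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, int[]> counts = Map.of(
                "R-W-R on balance", new int[]{9, 1},   // mostly in failing runs
                "W-W-W on log",     new int[]{2, 8});  // mostly in passing runs
        rank(counts).forEach(e ->
                System.out.println(e.getValue() + "  " + e.getKey()));
    }
}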
Fault Localization for Multiple-Variable Faults: UNICORN

Pipeline: Software + Testcase → Unicorn [dynamic pair detection → pairs → pattern combination → patterns → statistical fault localization] → ranked list for single-/multi-variable bugs:
1. R-W-R
2. R-W-W-W
3. R-W-W-R
4. W-W-W-W
5. R-W
…

• Pros: effective in ranking patterns
• Cons: misses contextual information
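The pair-to-pattern combination step, in a deliberately simplified form: two conflicting access pairs whose thread roles mirror each other can be stitched into a four-access multi-variable candidate. The string encoding and the matching rule here are assumptions for illustration, not Unicorn's actual algorithm.

import java.util.Optional;

// Hypothetical combination of two access pairs such as
// "R1(x) W2(x)" and "W2(y) R1(y)" into "R1(x) W2(x) W2(y) R1(y)".
class PatternCombiner {
    static Optional<String> combine(String pair1, String pair2) {
        String[] a = pair1.split(" ");
        String[] b = pair2.split(" ");
        char t1 = a[0].charAt(1);   // thread of pair1's first access
        char t2 = a[1].charAt(1);   // thread of pair1's second access
        // Combine only when the second pair hands control back: 2 ... 1
        if (t1 != t2 && b[0].charAt(1) == t2 && b[1].charAt(1) == t1) {
            return Optional.of(pair1 + " " + pair2);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(combine("R1(x) W2(x)", "W2(y) R1(y)")); // combined
        System.out.println(combine("R1(x) W2(x)", "R1(y) W2(y)")); // empty
    }
}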
Fault Explanation: GRIFFIN

Pipeline: Software + Testcase → Unicorn fault localization → patterns per execution → Griffin [pattern clustering → clustered patterns → context reconstruction] → bug graphs (memory accesses + calling stacks + suspicious methods).

Example bug graph:

Thread 1                      Thread 2
150 Foo()
270 int getS()
271   return s;        // R
                              851 b.s += c.s;   // W
                              852 b.a += c.a;   // W
152 Foo()
680 void Bar()
681   a.s = b.s;       // R
682   a.a = b.a;       // R

Griffin is effective in clustering memory accesses and locating the bug at method level.
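One way to picture the pattern-clustering step (a sketch of the idea, not Griffin's actual algorithm): patterns whose memory-access sets overlap are merged transitively into one cluster, so each cluster approximates one bug. The map keys and the "line:var" access encoding are assumptions for this example.

import java.util.*;

// Hypothetical clustering: group patterns transitively by shared accesses.
// accessesOf maps a pattern id to the set of accesses (e.g., "681:b.s") it contains.
class PatternClusterer {
    static List<Set<String>> cluster(Map<String, Set<String>> accessesOf) {
        List<Set<String>> patternClusters = new ArrayList<>();
        List<Set<String>> accessClusters  = new ArrayList<>();
        for (var e : accessesOf.entrySet()) {
            Set<String> patterns = new HashSet<>(Set.of(e.getKey()));
            Set<String> accesses = new HashSet<>(e.getValue());
            // Absorb every existing cluster that shares an access.
            for (int i = accessClusters.size() - 1; i >= 0; i--) {
                if (!Collections.disjoint(accessClusters.get(i), accesses)) {
                    accesses.addAll(accessClusters.remove(i));
                    patterns.addAll(patternClusters.remove(i));
                }
            }
            accessClusters.add(accesses);
            patternClusters.add(patterns);
        }
        return patternClusters;
    }
}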
Usefulness
FALCON, UNICORN, GRIFFIN (recap of the three pipelines)
Goal

To determine whether these fault-localization techniques (FALCON, UNICORN, GRIFFIN) help developers in understanding and fixing concurrency bugs.

All 3 techniques were implemented as Eclipse tools.
Debugging Tools

Tool               Output                                                  Comments
Tracer (baseline)  Dump of shared-memory accesses from a failing execution • Based on ConcurrencyExplorer (at Microsoft)
                                                                           • A tool used for debugging*
Unicorn            Ranked list of memory-access patterns                   • Based on Unicorn
                                                                           • Unicorn subsumes Falcon
Griffin            List of memory accesses with calling context            • Based on Griffin

* Other tools (e.g., TIE, Jive, Jove) focus on visualizing thread interactions.
Tool: Tracer

Output: dump of memory accesses, with a thread selector and, per access, the thread, source location, variable, …

Compared to ConcurrencyExplorer:
• Same output (dump of memory accesses)
• Same outlook (tool + editor)
Tool: Unicorn

Output: ranked list of memory-access patterns (e.g., an R-W-W pattern).

Compared to Tracer:
+ memory patterns
- thread identifier
Tool: Griffin

Output: list of memory accesses with calling context (interleaving view).

Compared to Unicorn:
+ clustered memory accesses
+ suspicious methods
+ calling context
Hypotheses

• H1 (understanding): Unicorn > Tracer
  ✦ Unicorn provides a summary of bugs
• H2 (understanding): Griffin > Unicorn, Tracer
  ✦ Griffin provides more context information
• H3 (fix): Unicorn, Griffin > Tracer
  ✦ Understand better => fix better
Study Setup

• 3 subject programs
• 32 developers
• Protocol
Study Setup

3 Java programs:
- Bank Account (100 LoC)
- Shop (300 LoC)
- List (25 KLoC)
Subject 1: Bank Account

Two users, each with a balance of $100, concurrently perform Deposit $300, Withdraw $100, and Transfer $100. The expected balance sequence is $100 → $400 → $300 → $300 for each user, but under a racy interleaving the final balances come out as $200 and $400 instead.

• Size: 100 LoC
• Difficulty: Easy
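The underlying defect is the classic lost update. A minimal sketch, simplified to two concurrent deposits and with assumed names:

// Sketch of the Bank Account race: balance += amount is a read-modify-write,
// so two concurrent updates can both read the old balance and one is lost.
public class AccountDemo {
    private int balance = 100;

    void deposit(int amount) {
        int old = balance;        // R(balance)
        balance = old + amount;   // W(balance): may overwrite a concurrent update
    }

    public static void main(String[] args) throws InterruptedException {
        AccountDemo shared = new AccountDemo();
        Thread u1 = new Thread(() -> shared.deposit(300));
        Thread u2 = new Thread(() -> shared.deposit(300));
        u1.start(); u2.start();
        u1.join();  u2.join();
        System.out.println(shared.balance); // 700 expected; 400 if one update is lost
    }
}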
Subject 2: Shop

A Supplier puts items into a shared Shop (PutItem); multiple Customers take items (GetItem).
Bug: the program crashes with an exception at Shop.

• Size: 300 LoC
• Difficulty: Medium
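Such a crash is typically produced by a check-then-act race. The sketch below shows one plausible shape of such a Shop; it is illustrative, not the study's exact code.

import java.util.*;

// Hypothetical Shop with a check-then-act atomicity violation: the emptiness
// check and the removal are each atomic, but not atomic together.
class ShopDemo {
    private final List<String> items =
            Collections.synchronizedList(new ArrayList<>());

    void putItem(String item) {            // Supplier
        items.add(item);
    }

    String getItem() {                     // Customer
        if (!items.isEmpty()) {            // check
            return items.remove(0);        // act: another customer may have
        }                                  // taken the last item in between,
        return null;                       // causing IndexOutOfBoundsException
    }
}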
Subject 3: List

Initially, three synchronized lists A, B, and C are created. Threads then concurrently perform B.add(item) (three times), C.add(item) (three times), B.clear(), and A.addAll(B). Under a racy interleaving, A ends up containing a null element among the items.

• Size: 25 KLoC
• Difficulty: Hard
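The mechanism behind the depicted null, made explicit in a sketch: the study's bug sits inside library code, but this hypothetical copy loop exposes the same non-atomic size-then-elements read.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical non-atomic bulk copy: the size of B is read first, then the
// elements, so a concurrent B.clear() leaves null padding in A.
class ListDemo {
    static void addAllNonAtomic(List<Object> a, List<Object> b) {
        int n = b.size();                            // R: size of B
        for (int i = 0; i < n; i++) {                // B may shrink meanwhile
            a.add(i < b.size() ? b.get(i) : null);   // R: element of B (or gone)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Object> a = Collections.synchronizedList(new ArrayList<>());
        List<Object> b = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 100_000; i++) b.add("Item");

        Thread t1 = new Thread(() -> addAllNonAtomic(a, b));
        Thread t2 = new Thread(b::clear);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(a.contains(null));        // may print true
    }
}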
Study Setup

32 developers:
- Graduate students
- Development experience: 2 to 30 years (median 11)
- Concurrency experience: 7 beginners, 10 experts
Study Design

Factorial design over subjects S1-S3 and tools T1-T3; each participant sees every subject and every tool once, in one of six assignments:
1) S1-T1, S2-T2, S3-T3
2) S1-T1, S2-T3, S3-T2
3) S1-T2, S2-T1, S3-T3
4) S1-T2, S2-T3, S3-T1
5) S1-T3, S2-T1, S3-T2
6) S1-T3, S2-T2, S3-T1
Study Setup

Protocol:
- 1 hr 30 min = 20 min tutorial + 20 min per task + 10 min buffer
- Task = debug + survey
- 5 surveys
Surveys

Background:
• Programming experience
• Concurrency experience

For each task:
• Usefulness
• Understanding
• Fix

Final:
• Rank of the tools
• General feedback

Evaluation

Scores (1 to 5 scale):
• Usefulness
• Understanding: graded
• Fix: ranking-based

Hypothesis testing:
• For each task, we performed an unpaired t-test between users of different tools
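For concreteness, a per-task comparison could be computed as below. The scores are invented, and the Apache Commons Math call is just one standard way to run an unpaired (Welch) two-sample t-test; the study's real data is at the URLs listed later.

import org.apache.commons.math3.stat.inference.TTest;

// Sketch of the hypothesis test: compare scores of two tool groups with an
// unpaired t-test. Sample values here are made up.
public class TTestDemo {
    public static void main(String[] args) {
        double[] unicornScores = {4, 3, 5, 4, 3, 4, 2, 4, 5, 3};
        double[] tracerScores  = {3, 2, 4, 3, 3, 2, 3, 4, 2, 3};

        double p = new TTest().tTest(unicornScores, tracerScores); // two-sided p-value
        System.out.printf("p = %.3f (%ssignificant at 0.05)%n",
                p, p < 0.05 ? "" : "not ");
    }
}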
Overall Result

Score Type     Hypothesis           Task 1: Bank Account   Task 2: Shop   Task 3: List
Usefulness     Griffin > Tracer      0.67                   2.53           2.17
Usefulness     Griffin > Unicorn    -0.14                   0.31           1.44
Usefulness     Unicorn > Tracer      0.81                   2.22           0.72
Understanding  Griffin > Tracer      0.96                   0.18           0.98
Understanding  Griffin > Unicorn     0.62                   0.07           1.11
Understanding  Unicorn > Tracer     -0.07                   0.11          -0.13
Fix            Griffin > Tracer      0.24                   0.64           0.42
Fix            Griffin > Unicorn     0.51                   0.09           1.11
Fix            Unicorn > Tracer     -0.29                   0.56          -0.69

* Numbers = mean difference (range -4 to 4); bold in the slides = statistically significant (p < 0.05).
Hypothesis Testing

• H1 (understanding): Unicorn > Tracer
• H2 (understanding): Griffin > Unicorn, Tracer
• H3 (fix): Unicorn, Griffin > Tracer
Analysis by Tool Preference

• How many participants rate Griffin as the best tool?
• Did these participants actually understand bugs better?
Results by Tool Preference

Task                  Score Type     Group-T (2)   Group-U (7)   Group-G (21)
Task 1: Bank Account  Understanding  3.0           3.75          3.78
                      Fix            2.0           2.37          3.05
Task 2: Shop          Understanding  3.33          4.12          4.26
                      Fix            2.33          3.75          4.0
Task 3: List          Understanding  2.66          2.75          3.05
                      Fix            1.33          2.87          2.68

* Numbers in headers = number of participants; numbers in other cells = average scores.

• How many participants rate Griffin as the best tool? 21 (70%)
• Did these participants actually understand bugs better? Yes
Discussion: Tool Usage

Observed usage: Griffin to track the bug, Tracer to confirm it.

“There are three dimensions to think about: Time vs. Thread vs. Context. Griffin showed these quite effectively. However, the other two tools lacked in these aspects.”
• “Tracer might be useful for simple code. However, overall it won’t scale in real-life scenarios because most programs are complex.”
• “Tracer wasn’t very useful on this task because there were too many threads and instructions to keep track of.”
Discussion: Improvements

Suggested improvements: fix advice, interactive debugging, visual improvement.
Future Work

Software + Testcase → Fault Localization → Fault Understanding → Fault Correction

• Fault localization: increase bug coverage, reduce overhead, use multiple inputs
• Fault understanding: improve visualization, support interactive debugging
• Fault correction: provide fix advice
Publicly Available Data

Data              Location
Unicorn           http://www.cc.gatech.edu/~sangminp/unicorn
Griffin           http://www.cc.gatech.edu/~sangminp/griffin
Subject Programs  http://www.cc.gatech.edu/~sangminp/bugs
User Study        http://www.cc.gatech.edu/~sangminp/concurrency-study
Contributions
(Recap of the hypotheses, the publicly available data, and the FALCON, UNICORN, and GRIFFIN pipelines.)
Backup Slides
Why did you implement Tracer as an Eclipse plugin?

• To minimize the effect of UI:
  • Same IDE: Eclipse
  • Similar UI for all debuggers: similar colors, list view
  • Language difference: C# vs. Java
Concurrency Explorer
(Screenshot: shared-memory dump and editor windows.)
ConcurrencyExplorer vs. Tracer

             ConcurrencyExplorer                            Tracer
Output       Memory dump (source line, thread, object ID)   Memory dump (source line, thread)
UI elements  Window for dump; editor for source             Window for dump; editor for source
IDE          Visual Studio                                  Eclipse

* ConcurrencyExplorer doesn’t show values of variables.
Tools for Concurrent S/W

Jive and Jove
• Link: http://cs.brown.edu/~spr/research/visjove.html
• Show thread interactions
• Not focused on showing bugs
Tools for Concurrent S/W

TIE
• Link: https://www.youtube.com/watch?v=kbNXlLAkPgU
• Shows thread interactions
• Not focused on showing bugs
Tools for Concurrent S/W

Concurrency Visualizer (for performance): the snapshot shows inter-thread dependencies.
Tutorial

• Tutorial on Java concurrency
• Bugs: order/atomicity violations
• Fix strategies
• Example program
• Demo on debugging tools
Survey Links

• Background: https://docs.google.com/forms/d/1xthnR5Ibw8q1qrqn-WrBti5zjFVD1b-nYZZ1S4RurMM/viewform
• Task: https://docs.google.com/forms/d/1SNlg4anVAZmR99EZjErvwG0rnW2nLrLYpwqZjNfQ-yc/viewform
• Final: https://docs.google.com/forms/d/1L3_Intjm6oSwoZp3wWfHIv8Z2jcpNcbbn1nPeuiv1b8/viewform
Fix Strategies
Study Design

Assignment                Beginners  Experts
1) S1-T1, S2-T2, S3-T3        1         3
2) S1-T1, S2-T3, S3-T2        1         1
3) S1-T2, S2-T1, S3-T3        1         1
4) S1-T2, S2-T3, S3-T1        1         2
5) S1-T3, S2-T1, S3-T2        1         2
6) S1-T3, S2-T2, S3-T1        2         1

• Setup: random distribution of participants
• Results: no significant score differences between groups
Factorial Design (https://explorable.com/factorial-design)

• “Factorial designs are extremely useful to psychologists and field scientists as a preliminary study, allowing them to judge whether there is a link between variables, whilst reducing the possibility of experimental error and confounding variables.”
• “The main disadvantage is the difficulty of experimenting with more than two factors, or many levels.”
Eclipse Navigation Data

               Task 1: Bank Account   Task 2: Shop   Task 3: List
Tracer users   54.5                   46.11          75.1
Unicorn users  63.4                   60.6           62.33
Griffin users  59.11                  69             39.11

• Numbers = average navigation actions (clicks + keyboard)
• For Task 3, Griffin users navigated less, but the difference is not statistically significant.
Why is Fixing more difficult?

• Many strategies:
  • Adding a lock
  • Adding a condition (if, while)
  • Switch statements, …
• Many decisions in one strategy:
  • Where should we add a lock?
  • Should we use an existing lock or add a new one?
• Fixes can become bugs (e.g., adding a new lock -> deadlock), as sketched below
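The last point is worth a sketch: adding a new lock can interact with an existing one and turn a data-race fix into a deadlock. The class and lock names are illustrative.

// Hypothetical fix gone wrong: pathA and pathB acquire the old and the newly
// added lock in opposite orders, so two threads can deadlock.
public class FixIntroducesDeadlock {
    private final Object oldLock = new Object();
    private final Object newLock = new Object(); // lock added by the "fix"

    void pathA() {
        synchronized (oldLock) {
            synchronized (newLock) { /* patched critical section */ }
        }
    }

    void pathB() {
        synchronized (newLock) {     // opposite acquisition order
            synchronized (oldLock) { /* another critical section */ }
        }
    }
}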
Limitations

• Participants: size, quality
• Factorial design
• Debugging: no editing
Related Work

• Empirical studies for sequential bugs/debuggers
  • Weiser; Whyline; Parnin & Orso
• Empirical studies for concurrency
  • For writing faster code
  • For education
• Empirical studies for concurrency bugs/debuggers
  • Sadowski and Yi’s study
