Practitioners’ Expectations on
Automated Fault Localization
Pavneet Singh Kochhar*, Xin Xia+, David Lo*, Shanping Li+
*Singapore Management University
+Zhejiang University
The International Symposium on Software Testing and Analysis (ISSTA)
Too many bugs!
• Many projects receive large numbers of bug reports.
• Large number of bug reports can overwhelm developers.
- Mozilla developer - “Everyday, almost 300 bugs appear that need
triaging. This is far too much for only the Mozilla programmers to
handle” *
What have researchers proposed to overcome this issue?
*J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug
repository,” in ETX, pp. 35–39, 2005
2/31
Fault Localization
GOAL: From thousands of source code files, find the buggy files, methods, statements, or blocks.
3/31
How Fault Localization Works
4/31
Inputs: Bug Reports, Test Cases
---> Fault Localization Techniques (Information Retrieval-Based, Slicing, Spectrum-Based, etc.)
---> Outputs: Statements, Methods, Classes
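Among the technique families named above, spectrum-based fault localization is perhaps the easiest to illustrate in code. The sketch below is a minimal, illustrative example (not the specific technique of any paper discussed here) that ranks statements by the Ochiai suspiciousness score computed from per-test coverage; the `coverage` and `failed` inputs and the toy data are assumptions for demonstration only.

```python
# Minimal spectrum-based fault localization sketch using the Ochiai score.
# Assumed inputs: `coverage` maps each test id to the set of statement ids it
# executes; `failed` is the set of test ids that failed.
import math

def ochiai_ranking(coverage, failed):
    total_failed = len(failed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        # ef: failing tests that execute stmt; ep: passing tests that execute it.
        ef = sum(1 for t, stmts in coverage.items() if stmt in stmts and t in failed)
        ep = sum(1 for t, stmts in coverage.items() if stmt in stmts and t not in failed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom > 0 else 0.0
    # Most suspicious statements first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy program: only test t2 fails; s2 is never executed by a failing test.
coverage = {"t1": {"s1", "s2"}, "t2": {"s1", "s3"}, "t3": {"s2", "s3"}}
failed = {"t2"}
print(ochiai_ranking(coverage, failed))  # s1 and s3 outrank s2
```

Information retrieval-based techniques work analogously but rank program elements by textual similarity to the bug report rather than by test coverage.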
Fault Localization
What are the expectations of practitioners
on fault localization?
What factors affect adoption of fault
localization tools?
What are the thresholds for adoption?
5/31
Our Study
Practitioners' Expectations
6/31
Practitioner Survey
Literature Review
7/31
Practitioner Survey
• Multi-pronged strategy:
• Our contacts in IT industry
• Email 3300 practitioners on
• We receive 403 responses.
8/31
Survey Demographics
• 386 responses
• 33 countries
• Job profile
• Software Dev – 80.83%
• Software Testing – 30.05%
• Project Management – 17.10%
• Professional – 78.13%, Open-source – 44.24%
9/31
RQ1: Importance of Fault Localization
10/31
[Stacked bar chart: percentage of ratings (Essential, Worthwhile, Unimportant, Unwise) for each demographic group: All, Dev, Test, PM, ExpLow, ExpMed, ExpHigh, OS, Prof.]
RQ1: Importance of Fault Localization
11/31
[Same chart as the previous slide: ratings (Essential, Worthwhile, Unimportant, Unwise) by demographic group.]
Fisher's Exact Test: p-values < 0.05
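As a hedged illustration of the statistical comparison this note refers to (presumably contrasting rating distributions across the demographic groups in the chart), the sketch below applies SciPy's 2x2 Fisher's exact test to hypothetical counts in which ratings are collapsed into positive (Essential/Worthwhile) versus negative (Unimportant/Unwise) for two groups. The counts are placeholders, not the survey's data, and a full multi-group, four-rating comparison would need a generalized RxC test rather than this 2x2 collapse.

```python
# Hypothetical 2x2 contingency table: two demographic groups (rows) by
# collapsed rating (columns: Essential/Worthwhile vs. Unimportant/Unwise).
# Counts are illustrative placeholders, not the study's actual numbers.
from scipy.stats import fisher_exact

table = [[300, 12],   # group A: positive, negative
         [110,  6]]   # group B: positive, negative
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
# p < 0.05 would indicate the two groups rate fault localization differently
# more often than chance alone would explain.
```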
RQ1: Importance of Fault Localization
Why “Unimportant” or “Unwise”
• Can’t deal with difficult bugs
- “I’m well aware of what static analysis can do and very few
hard bugs would be solved with it.”
• No rationale
- “I doubt any automated software can explain the reason for
things such as broken backwards compatibility, unclear
documentation etc.”
• Status quo
- “I don’t think personally I would pay for it, because for my cases
usual stack trace is over than enough”
12/31
RQ2: Availability of Debugging Data
13/31
[Stacked bar chart: how often each type of debugging data is available (All the time, Sometimes, Rarely, Never) for Math-Spec, Text-Spec, One-Test, Multi-Tests, Suc-Tests, Text-Desc.]
RQ2: Availability of Debugging Data
14/31
[Same chart as the previous slide: availability of each type of debugging data.]
>70% of respondents mention availability of test cases
RQ2: Availability of Debugging Data
15/31
[Same chart as the previous slide: availability of each type of debugging data.]
>80% of respondents mention availability of bug reports
RQ3: Preferred Granularity Level
16/31
Percentage of respondents preferring each granularity level:
• Component: 20.21%
• Class: 26.42%
• Method: 51.81%
• Block: 44.30%
• Statement: 50.00%
RQ4: Minimum Success Criterion
17/31
Position of the buggy element in the returned list.
Minimum success criterion chosen (percentage of respondents):
• Top 1: 9.43%
• Top 5: 73.58%
• Top 10: 15.09%
• Top 20: 1.35%
• Top 50: 0.54%
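The Top-N criterion is straightforward to evaluate automatically. The sketch below is an assumed, minimal implementation (the function names and toy data are hypothetical) that counts a bug as successfully localized when any of its truly buggy elements appears within the first N positions of the ranked list a tool returns.

```python
# Minimal sketch of the Top-N success criterion used in fault localization
# evaluations. Function names and example data are illustrative only.

def top_n_success(ranked_elements, buggy_elements, n=5):
    """True if any truly buggy element is ranked within the top n."""
    return any(elem in buggy_elements for elem in ranked_elements[:n])

def top_n_rate(results, n=5):
    """Fraction of bugs localized in the top n, given (ranking, buggy set) pairs."""
    hits = sum(top_n_success(ranking, buggy, n) for ranking, buggy in results)
    return hits / len(results) if results else 0.0

# Hypothetical usage: the first bug is hit at rank 2, the second is missed.
results = [(["m7", "m3", "m9"], {"m3"}),
           (["m1", "m2", "m4"], {"m8"})]
print(top_n_rate(results, n=5))  # 0.5
```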
RQ5: Trustworthiness
18/31
Proportion of times a technique works.
[Bar chart: satisfaction rate (0-100%) for each minimum success rate: 5%, 20%, 50%, 75%, 90%, 100%.]
RQ6: Scalability
19/31
Program sizes a technique can work on.
[Bar chart: satisfaction rate (0-100%) for each minimum program size: 1-100, 1-1,000, 1-10,000, 1-100,000, 1-1,000,000 LOC.]
RQ7: Efficiency
20/31
Time taken to produce the results.
[Bar chart: satisfaction rate (0-100%) for each maximum runtime: <1 second, <1 minute, <30 minutes, <1 hour, <1 day.]
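One plausible way such satisfaction curves could be derived (an assumption about the plotting, not necessarily the authors' exact procedure) is to treat each answer as the weakest setting a respondent would still accept and then, for every candidate threshold, count the fraction of respondents whose requirement is met. The sketch below does this for runtime; the respondent data are hypothetical placeholders.

```python
# Hedged sketch: cumulative satisfaction curve for maximum runtime.
# Assumes each respondent stated the longest runtime (in seconds) they would
# accept; a tool finishing within t seconds satisfies everyone whose limit >= t.

def satisfaction_curve(accepted_limits, candidate_runtimes):
    """Fraction of respondents satisfied at each candidate runtime."""
    n = len(accepted_limits)
    return {t: sum(limit >= t for limit in accepted_limits) / n
            for t in candidate_runtimes}

# Hypothetical responses, in seconds (1 s, 1 min, 30 min, 1 h, 1 day buckets).
accepted_limits = [1, 60, 60, 60, 1800, 1800, 3600, 86400]
candidates = [1, 60, 1800, 3600, 86400]
for t, rate in satisfaction_curve(accepted_limits, candidates).items():
    print(f"tool runtime <= {t:>6} s satisfies {rate:.0%} of respondents")
```

The same construction would apply to the trustworthiness and scalability curves on the two preceding slides, with success rate or program size in place of runtime.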
RQ8: Willingness to Adopt
21/31
• > 98% willing to adopt a trustworthy, scalable and efficient
fault localization technique.
• Unwilling
- Resistance to Change
“Since I already have one and to use another would require
training time and time to get used to it”
- More information needed
“Would it be open source? Would it work with my main
programming language? Would it work with distributed
environments?”
- Disbelief of possibility of success
“I don’t think you can do it.”
RQ9: Other Factors (Hypotheses)
22/31
• Rationale
- An automated debugging tool must provide a rationale
why some program locations are marked as suspicious.
- I will *still adopt* an efficient, scalable, and trustworthy
automated debugging tool, even if it cannot provide
rationales.
• IDE Integration
- An automated debugging tool must be integrated well to
my favourite IDE.
- I will *still adopt* an efficient, scalable, and trustworthy
automated debugging tool, even if it is not integrated well
to my favourite IDE.
RQ9: Other Factors
23/31
[Stacked bar chart: level of agreement (Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree) with each statement: Rationale, Adoption w/o Rationale, IDE, Adoption w/o IDE.]
RQ9: Other Factors
24/31
• Rationale
- False Positives
“False positives are worst than false negatives in my opinion”
- Rationale for buggy code
“Because to make a decisions about bug fixing I want to
*exactly* know why the automated tool “thinks” that the code
have a bug.”
• IDE Integration
- Extra steps needed
“No integration means extra steps which means testing will be
more cumbersome and hence less used.”
- Strong Reliance on IDE
“IDE is our environment. If I can’t add something into my
environment, it’s useless.”
Literature Review
25/31
Literature Review
26/31
• Papers published in the last 5 years (2011-2015)
- ICSE (417) ---> 2
- FSE/ESEC-FSE (255) ---> 5
- ISSTA (169) ---> 3
- TSE (350) ---> 2
- TOSEM (137) ---> 4
• Included papers
- Spectrum-based fault localization, information retrieval-based fault localization, etc.
• Excluded papers
- Automatic repair, empirical studies of debugging, bug prediction, bug detection, etc.
16 papers included in total
Literature Review
Debugging data and granularity used in each paper:
• Debugging Data
- Specification: none
- Test Cases: [4], [5], [24], [29], [35], [40], [44], [55], [57], [59]
- Bug Reports: [16], [19], [24], [52], [56], [60]
• Granularity
- Method: [24], [52]
- Statement: [4], [5], [29], [35], [44], [55], [57], [59]
- Basic Block: [16]
- Other: [19], [40], [56], [60]
27/31
Literature Review
Satisfaction rate achievable by each paper's technique (criterion met shown in parentheses):
• Success Rate
- 90% satisfaction (≥90% success rate): none
- 75% satisfaction (≥75% success rate): none
- 50% satisfaction (≥50% success rate): [16], [19], [35], [40], [52], [56], [59], [60]
- ?: [4], [5], [29], [55], [57]
• Scalability
- 90% satisfaction (≥1M LOC): [29], [52]
- 75% satisfaction (≥100,000 LOC): [16], [24], [56], [59], [60]
- 50% satisfaction (≥10,000 LOC): [4], [5], [35], [40], [44], [55], [57]
- ?: [19]
• Efficiency
- 90% satisfaction (<1 minute): [4], [24], [40], [44], [56]
- ?: [16], [19], [29], [52], [57], [60]
28/31
Literature Review
• Rationale: supported by [29], [44]
• IDE Integration: supported by none
29/31
Key Takeaways
Large demand for fault localization
- >97% rate it "Essential" or "Worthwhile"
High adoption barrier
- To satisfy 75% of practitioners, a technique must return successful results in the Top 5, work 75% of the time, handle programs of ≥100,000 LOC, and take <1 minute.
Current techniques cannot satisfy 75% of respondents.
Techniques that satisfy 50% of respondents work only at a coarse granularity (class or file).
Rationale and IDE integration are important.
30/31
Future Work
Develop fault localization techniques that bring the current state of research closer to practitioners' expectations.
Systematic Literature Review (SLR)
31/31
Thank You!
Pavneet Singh Kochhar
kochharps.wix.com/pavneet
Email: kochharps.2012@smu.edu.sg
Conclusion
386 practitioners surveyed from 33 countries.
Test cases and bug reports are often available.
Preferred granularity - Method & Statement
Preferred Success Criterion – Top 5.
Different satisfaction rates for trustworthiness,
scalability and efficiency.
Rationale and IDE Integration are important.
33/30
Editor's Notes
  • #3: 22 and above
  • #5: Based
  • #13: 22
  • #17: Going down to statement does not matter much.
  • #18: Minimum success – before they find it acceptable. Our respondents not practitioners.
  • #19: Proportion
  • #20: Overlap caption
  • #21: Explain well=> what did you ask developers, how do you plot this graph from their responses
  • #27: Major conferences and journals not top. Why not bug prediction, bug detection
  • #30: Some form of rationale. Read about these rationales. Unclear whether the rationales help or not.
  • #34: Add future work