Is Text Search an Effective Approach for Fault Localization: A Practitioners Perspective
Vibha Singhal Sinha, Senthil Mani and Debdoot Mukherjee
IBM Research - India
23rd October 2012, SPLASH-Wavefront, Tucson, AZ, USA
Can Text Search help in Debugging?
1. Search within past bug reports
   • Find similar bug reports and identify patches linked to them
2. Search within source code
   • Search comments, method names, variable names, etc. to identify code regions with high text overlap
 No dependence on program sizes, programming languages, types of faults or the presence of passing & failing test inputs, unlike existing program-analysis based approaches:
  Program slicing
  Statistical debugging / spectra-based techniques
  Delta debugging / mutation-based approaches
 Can be readily applied to jumpstart debugging

Possible Tactic: Identify a small set of files with text search and feed that as input to a program-analysis based technique to localize to a set of lines
 IR systems have been proposed in different areas of software maintenance to recommend relevant artifacts in the context of developer tasks
  Hipikat, Lassie, DebugAdvisor
 Efficacy of different language models has been evaluated for fault localization (Rao et al., Marcus et al., Cleary et al.)
  Vector Space Model, Latent Semantic Indexing, Latent Dirichlet Allocation, Cluster-Based Decision Making
 Rao and Kak suggest that IR-based bug localization is at least as effective as static and dynamic analysis techniques
 Enslen proposed Identifier Splitting to increase vocabulary overlap between bug reports and the code base
  E.g., the code word TextFieldTool is split into three words: text, field, tool.
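A minimal sketch of such identifier splitting on underscore and camel-case boundaries (illustrative only; the splitter used in the study is not described here):

import re

def split_identifier(identifier):
    # Split on underscores, then on camel-case boundaries,
    # e.g. "TextFieldTool" -> ["text", "field", "tool"].
    parts = []
    for chunk in identifier.split('_'):
        parts.extend(re.findall(r'[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+', chunk))
    return [p.lower() for p in parts if p]

print(split_identifier("TextFieldTool"))  # ['text', 'field', 'tool']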
[Architecture diagram: past resolved bugs and the linked code repository feed an Index Creator; an incoming bug feeds a Query Creator; the Search Module fires the query on the created indices; a Results Collator returns a ranked list of files.]

Indexing Strategies
 Bug Index (BI) – built from the repository of past resolved bugs
 Code Index (CI) – built from the code repository
 Meta Index (MI) – built from the code repository processed through identifier splitting

Querying Strategies
 Collate Title & Description (A)
 Boost weight of Title Words
 Boost weight of Code Words
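A rough sketch of this search pipeline using off-the-shelf TF-IDF retrieval (scikit-learn is an assumption; the slides do not name the engine or retrieval model used). Each index – BI, CI or MI – is treated as a plain bag-of-words corpus and queried with the collated bug title and description:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_index(documents):
    # documents: list of (doc_id, text) pairs making up one index (BI, CI or MI)
    ids, texts = zip(*documents)
    vectorizer = TfidfVectorizer(stop_words='english')
    matrix = vectorizer.fit_transform(texts)
    return ids, vectorizer, matrix

def search(index, query, top_k=5):
    # Rank the indexed documents by cosine similarity to the query text.
    ids, vectorizer, matrix = index
    scores = cosine_similarity(vectorizer.transform([query]), matrix).ravel()
    return sorted(zip(ids, scores), key=lambda p: p[1], reverse=True)[:top_k]

# Querying strategy A: collate the bug's title and description into one query string.
# code_index = build_index([(path, text) for path, text in repo_files])
# print(search(code_index, bug_title + " " + bug_description))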
 RQ1: How do the following search approaches compare in terms of efficacy? Are they any better than chance?
  Search on past bug reports – Bug Index (BI)
  Search on code repository – Code Index (CI)
  Search on processed code repository – Meta Index (MI)
 RQ2: Can we combine them to increase efficacy?
 RQ3: How do different features of the source code and the bugs available in a project impact the effectiveness of search?
 4 open source subjects
  BIRT, Datatools (Eclipse)
  Derby, Hadoop (Apache)
 Linking bug reports to change-sets
  Mined from references to bug-ids in commit comments (see the sketch after this list)
  Tracing JIRA links
 Test set has bug reports with at least one source file associated with them
  1177 bugs in test set
  35% of total bugs in chosen releases
  3-4% of the bug repositories
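A minimal, hypothetical sketch of the bug-id mining step (the actual id patterns and JIRA link tracing used in the study are not shown on the slides):

import re

# Assumed pattern for issue keys such as "DERBY-1234" or "HADOOP-567" in commit comments.
ISSUE_KEY = re.compile(r'\b([A-Z][A-Z0-9]+-\d+)\b')

def link_commits_to_bugs(commits):
    # commits: iterable of (commit_id, comment, changed_files) tuples
    links = {}
    for commit_id, comment, changed_files in commits:
        for bug_id in ISSUE_KEY.findall(comment):
            links.setdefault(bug_id, set()).update(changed_files)
    return links  # bug_id -> set of files changed to resolve it (the ground truth)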
 Average Precision, Recall and F1-Score
  For each bug in the test set taken as a query, we calculate precision, recall and F1-score and then average across the test set.
 Bug Coverage
  Percentage of bugs in the test set for which at least one file in the recommendation set matches the ground truth.
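For concreteness, a small sketch of these metrics, assuming the recommendation set and the ground-truth fixed files for each bug are given as sets of file paths:

def precision_recall_f1(recommended, fixed):
    # recommended, fixed: sets of file paths for one bug
    hits = len(recommended & fixed)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(fixed) if fixed else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def bug_coverage(results):
    # results: list of (recommended_set, fixed_set) pairs, one per bug in the test set
    covered = sum(1 for rec, fix in results if rec & fix)
    return covered / len(results)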
RQ1: How do the search approaches (BI, CI, MI) compare in terms of efficacy? Are they any better than chance?
[Charts: average precision, recall and F1-score of CI:A, MI:A and BI:A for increasing result-set sizes]
 Recall increases much more slowly than precision drops, so the F1-score dips beyond a result-set size of 3
 Suggests that search techniques may NOT help in identifying ALL the files that need to be fixed

[Charts: bug coverage of the three techniques vs. result-set size for BIRT, Datatools, Derby and Hadoop]
 Bug coverage increases with increase in result-set size
 None of the techniques emerges as the clear winner
 MI isn't any better than CI; sometimes it performs worse
 Hadoop gives much better results than the other 3 subjects
 Compare with the efficacy of a user who randomly selects source files from the code repository as the files to be fixed to resolve a bug
 Think of the code repository as a bin of black and white balls, where the files that need a fix to resolve the bug are white balls and the rest are black balls
 The hyper-geometric distribution gives the probability of choosing white balls without replacement
 Probability p of getting at least x files that require a fix by choosing k files at random from the repository (see the formula after this list)
 If p < 0.05, reject the null hypothesis that the search technique is no better than chance; apply an FDR correction for multiple hypothesis testing
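The tail probability in question is the standard hypergeometric one; writing N for the total number of files in the repository and w for the number of files that need a fix (the white balls), it is

p = P(X >= x) = \sum_{i=x}^{\min(k, w)} \binom{w}{i} \binom{N-w}{k-i} \Big/ \binom{N}{k}

(in SciPy, scipy.stats.hypergeom.sf(x - 1, N, w, k) computes this tail).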
 Even if only one correct result is returned for a bug, the result is usually significant
 Datatools has many queries failing the FDR test
  Certain queries have a large number of fixed files (e.g., 491 in 2 bugs)
 Record the average number of files in the repository at which the techniques break even with chance: p >= 0.05
  Ranges from 66 in Derby (MI:A) to 158 in Datatools (CI:A)
RQ2: Can we combine the search approaches to increase efficacy?
 Fleiss' Kappa analysis to measure the degree of agreement amongst the three techniques (a computation sketch follows)
  Each technique rates a bug: Yes if the technique covers the bug, else No
 Code-based techniques (CI, MI) agree with each other but are quite different from the bug-based technique (BI)
 Can we combine bug-based and code-based search to get better results?
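A small sketch of this agreement analysis using statsmodels (an assumption; the slides do not name the tool used), where each row is a bug and each column is one technique's Yes/No rating:

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per bug, one column per technique (BI, CI, MI); 1 = covers the bug, 0 = does not.
ratings = np.array([
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
    # ... one row per bug in the test set
])
counts, _ = aggregate_raters(ratings)   # per-bug counts of each rating category
print(fleiss_kappa(counts))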
 Fire the same query on the 3 different indices and choose the top X search results using the following ranking schemes (a sketch follows this list):
  RankScore: Rank using the absolute search similarity scores returned by the search engine
  NormScore: Rank using a normalized similarity score – the fraction of the maximum score returned for the query
  AggregateScore: Rank on the basis of the sum of scores from the different techniques
  Sample: Pick the top 2*(X/5) search results from the results of BI:A and CI:A, and the remaining X/5 results from MI:A
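A rough sketch of three of these schemes, assuming each technique returns a list of (file, score) pairs for the query, e.g. results_by_technique = {'BI:A': [...], 'CI:A': [...], 'MI:A': [...]}; names and shapes are illustrative, not the study's implementation:

def rank_score(results_by_technique, top_x):
    # Merge all (file, score) pairs and rank by the raw similarity score.
    merged = [pair for results in results_by_technique.values() for pair in results]
    return dedupe(sorted(merged, key=lambda p: p[1], reverse=True))[:top_x]

def norm_score(results_by_technique, top_x):
    # Normalize each technique's scores by its maximum score before merging.
    merged = []
    for results in results_by_technique.values():
        if results:
            top = max(score for _, score in results)
            merged.extend((f, score / top) for f, score in results)
    return dedupe(sorted(merged, key=lambda p: p[1], reverse=True))[:top_x]

def aggregate_score(results_by_technique, top_x):
    # Sum each file's scores across techniques and rank by the total.
    totals = {}
    for results in results_by_technique.values():
        for f, score in results:
            totals[f] = totals.get(f, 0.0) + score
    return sorted(totals.items(), key=lambda p: p[1], reverse=True)[:top_x]

def dedupe(pairs):
    # Keep only the first (highest-scoring) occurrence of each file.
    seen, out = set(), []
    for f, score in pairs:
        if f not in seen:
            seen.add(f)
            out.append((f, score))
    return out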
 RankScore works better than the best of the
individual techniques across all subjects
 Improvement in bug coverage ranges from 1% to 46%
RQ3: How do different features of the source code and the bugs available in a project impact the effectiveness of search?
 Since queries can become very large, important words – TitleWords and CodeWords – may need to be artificially boosted (a boosting sketch follows the charts)
 TitleBoost helps improve bug coverage
  Except in Hadoop, where the fraction of titleWords that come up significant is already high even without the boost
[Charts: bug coverage of BI, CI and MI per subject, with and without boosting]
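One way such boosting can be expressed, assuming a Lucene-style query syntax in which term^2.0 doubles a term's weight (the engine actually used is not named on the slides):

def boosted_query(title_words, code_words, other_words, title_boost=2.0, code_boost=2.0):
    # Build a query string in which title words and code words carry extra weight.
    terms = ["%s^%.1f" % (w, title_boost) for w in title_words]
    terms += ["%s^%.1f" % (w, code_boost) for w in code_words]
    terms += list(other_words)
    return " ".join(terms)

# boosted_query(["parser", "crash"], ["TextFieldTool"], ["when", "opening", "report"])
# -> "parser^2.0 crash^2.0 TextFieldTool^2.0 when opening report"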
 Compared the efficacy of techniques that directly search the code repository with those that search over past bug reports
  No clear winner is observed
  Bug coverage ranges from 20% to 60% across the 4 subjects
  Techniques are better than chance
  Identifier splitting does not yield much benefit
 The techniques are complementary
  Bug coverage improves by 1% - 46% by combining them
  Favoring title-words helps in most cases