Detecting Bug Duplicate Reports Through Locality of Reference Tomi Prifti, Sean Banerjee, Bojan Cukic Lane Department of CSEE West Virginia University Morgantown, WV, USA September 2011
Presentation Outline Introduction Goals Related Work Understanding the Firefox Repository Experimental Setup Results Summary
Introduction Bug tracking systems are essential for software maintenance and testing. Developers and ordinary users can report failure occurrences. Advantages: users are involved in error reporting, which has a direct impact on software quality. Disadvantages: a large number of reports arrive on a daily basis, triage takes significant effort, and users may submit many duplicate reports.
A typical bug report
Goals Comprehensive empirical analysis of a large bug report dataset. Creation of a search tool: encourage users to search the repository and avoid duplicate report submission whenever possible. Assisting with report triage: build a list of reports possibly describing the same problem and let a triager examine the suggested list.
Related Work Providing Triagers with a Suggested List: provide triagers with a suggested list of similar bugs for examination. Wang et al. exploit NLP techniques and execution information, reaching a duplicate detection rate as high as 67%-93%. Semi-automated Filtering: determine the type of the report (duplicate or primary); if the new report is classified as a duplicate, filter it out. Jalbert et al. use text semantics and a graph clustering technique to predict duplicate status; they filtered out only 8% of duplicate reports.
Related Work Semi-automated Assignment: apply text categorization techniques to predict the developer who should work on the bug. Cubranic et al. apply supervised Bayesian learning and correctly classify 30% of the reports. Anvik et al. use a supervised machine learning algorithm, with precision rates of 57% and 64% for Firefox and Eclipse, respectively. Improving Report Quality: duplicate reports are not considered harmful. Bettenburg et al. developed a tool called CUEZILLA that measures the quality of bug reports in real time; “Steps to reproduce” and “Stack traces” are the most useful information in bug reports.
Related Work Bugzilla Search Tool: Bugzilla 4.0, released around February 2011, provides duplicate detection. The tool performs a Boolean full-text search on the title over the entire repository and generates a dozen or so reports that match at least one of the search terms. In some instances, searching with the exact title of an existing report did not return that report itself. The accuracy of the reported matches is unknown.
Firefox Repository Firefox releases: 1.0.5, 1.5, 2.0, 3.0, 3.5 and the current version 3.6 (as of June 2010). 65% of reports reside in groups of one. 90% of duplicates are distributed in small groups of 2-16 reports
Time Interval Between Reports Many bugs receive the first duplicate within the first few months of the original report.
Experimental Setup Tokenization: “bag-of-words”. Stemming: reducing words to their root. Stop-word removal. The Lucene API is used for pre-processing. Term Frequency/Inverse Document Frequency (TF/IDF) is used for weighting words, and cosine similarity for measuring similarity. Example of tokenizing, stemming and stop-word removal: “Sending email is not functional.” → “send email function”.
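The pre-processing steps above can be sketched in a few lines of Python. This is a minimal, hypothetical stand-in for the Lucene pipeline the authors used: the stop-word set is a small assumed subset, and `stem` is a toy suffix stripper (not the Porter stemmer) with just enough rules to reproduce the slide's example.

```python
import re

# Hypothetical stop-word subset; Lucene's English analyzer ships its own, larger list.
STOP_WORDS = {"a", "an", "and", "are", "is", "not", "of", "the", "to", "was", "with"}

# Toy suffix-stripping rules standing in for a real stemmer (e.g. Porter).
SUFFIX_RULES = [("tional", "tion"), ("ing", ""), ("s", "")]


def stem(word: str) -> str:
    """Apply the first matching suffix rule, keeping a stem of at least 3 letters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def preprocess(summary: str) -> list[str]:
    """Tokenize a report title into a bag of words, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", summary.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("Sending email is not functional."))  # ['send', 'email', 'function']
```

Stop words are removed before stemming so that short function words such as "is" never reach the suffix rules.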
Experimental Procedure Start with the initial 50% of reports as historical information. The group containing the most recent primary or duplicate is at the top of the initial list. Build the suggested list using IR techniques. As the experiment progresses, the historical repository grows. Continue until all reports are classified as duplicate or primary. If a bug is primary, it is forwarded to the repository; this may not be realistic, as triagers may misjudge reports.
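A sketch of the chronological replay described above, assuming each report carries its ground-truth group id and duplicate label. The names `Report`, `replay` and `suggest` are hypothetical. Because the replay files every report using the ground truth, it embodies the same idealization the slide flags: triagers never misjudge a report.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Report:
    report_id: int
    group_id: int       # ground-truth group (the primary report's id)
    is_duplicate: bool  # ground-truth label


def replay(reports: list[Report],
           suggest: Callable[[Report, list[Report]], list[int]],
           history_fraction: float = 0.5) -> list[bool]:
    """Replay reports in filing order.

    The first `history_fraction` of reports seed the repository. Each later
    duplicate is checked against the group ids returned by `suggest`; then every
    report is filed under its ground-truth group, so the repository keeps growing.
    """
    split = int(len(reports) * history_fraction)
    repository = list(reports[:split])
    hits = []
    for report in reports[split:]:
        if report.is_duplicate:
            hits.append(report.group_id in suggest(report, repository))
        repository.append(report)
    return hits


# Usage: a naive recency-only suggester returning the groups of the 2 latest reports.
reports = [Report(1, 1, False), Report(2, 2, False), Report(3, 1, True), Report(4, 2, True)]
print(replay(reports, lambda report, repo: [r.group_id for r in repo[-2:]]))  # [True, True]
```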
Measuring Performance The performance of the bug search tool is measured by the recall rate: Recall rate = N_recalled / N_total, where N_recalled is the number of duplicate reports correctly classified and N_total is the total number of duplicate reports.
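The recall rate is a simple ratio. The sketch below assumes that a duplicate counts as recalled when its group appears in the suggested list it was shown. That hit criterion and the helper names are assumptions, not taken from the slides.

```python
def recall_rate(n_recalled: int, n_total: int) -> float:
    """Recall rate = N_recalled / N_total, computed over duplicate reports only."""
    if n_total <= 0:
        raise ValueError("N_total must be positive")
    return n_recalled / n_total


def recall_from_outcomes(outcomes: list[bool]) -> float:
    """outcomes[i] is True if the i-th duplicate's group appeared in its suggested list
    (this hit criterion is an assumption)."""
    return recall_rate(sum(outcomes), len(outcomes))


print(recall_from_outcomes([True, False, True, True]))  # 0.75
```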
Approach methodology Reporters query the repository, using the “title” (summary) to compare reports. Four experiments: (1) TF/IDF; (2) “Sliding Window” - TF/IDF; (3) “Sliding Window” - Group Centroids; (4) “Sliding Window” - Group Centroids - TF/IDF. The centroid is composed of all unique terms from all reports in the group, each paired with the sum of its frequencies across those reports; the total frequency of each term is then divided by the number of reports in the group.
Sliding Window We define a “sliding-window” approach. Keep a window of fixed size n: sort all groups by the time elapsed between the last report in each group and the new incoming report, and select the top n groups (n = 2,000 is optimal; analysis shows that in 95% of cases the duplicate's group falls within this window). Apply IR techniques only to the top n groups, and build a short list of the top m most similar reports to present to the triager/reporter.
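The window selection and short-list steps can be sketched as below. The function names, the input shape (a group id mapped to the timestamp of its latest report), the scoring callback and the default m = 20 are assumptions for illustration. Only n = 2,000 comes from the slide.

```python
import heapq
from datetime import datetime
from typing import Callable


def select_window(last_report_time: dict[int, datetime],
                  new_report_time: datetime, n: int = 2000) -> list[int]:
    """Ids of the n groups whose latest report is closest in time to the new report."""
    def elapsed(group_id: int):
        return new_report_time - last_report_time[group_id]
    return sorted(last_report_time, key=elapsed)[:n]


def short_list(candidates: list[int], score: Callable[[int], float], m: int = 20) -> list[int]:
    """Top m candidates (reports or groups) by an IR similarity score of your choice."""
    return heapq.nlargest(m, candidates, key=score)


times = {1: datetime(2010, 1, 1), 2: datetime(2010, 5, 1), 3: datetime(2010, 6, 1)}
window = select_window(times, datetime(2010, 6, 10), n=2)  # [3, 2]
print(short_list(window, score={3: 0.1, 2: 0.9}.get, m=1))  # [2]
```

Restricting scoring to the window is what keeps the per-report cost bounded as the repository grows.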
Experimental Results Our results demonstrate that Time-Window/Group Centroid and report summaries predict duplicate problem reports with a recall ratio of up to 53%.
Performance and Runtime The recall rate shows large variance initially. The time window approach stabilizes, while TF/IDF degrades. Classification run time is faster for the time window approach, whereas each additional report increases computation time in TF/IDF.
Result Comparisons
Group | Approach | Results
Hiew et al. | Text analysis | Recall rate ~50%
Cubranic et al. | Bayesian learning, text categorization | Correctly classified ~30% of reports
Jalbert et al. | Text similarity, clustering | Recall rate ~51% (list size 20)
Wang et al. | NLP, execution information | 67-93% detection rate (43-72% with NLP alone)
Wang et al. | Enhanced version of prior algorithm | 17-31% improvement over the state of the art
Our approach | Time window/centroids | ~53% recall rate
Threats to Validity We assume that the ground truth on duplicates is correct. The life cycle of a bug is ever changing: some reports change state multiple times.
Summary and Future Work SUMMARY A comprehensive study of long-term duplicate trends in a large, open source project. Improved search features in duplicate detection by providing a suggested list. The time interval between reports can be used to narrow the search space. FUTURE WORK Compare with other projects (e.g., Eclipse) to generalize the approach. Study the effects on duplicate propagation of a user incorrectly selecting a report from the suggested list.
TF/IDF Compare the vector representing a new report to every vector currently in the database. Vectors in the database are weighted using TF/IDF to emphasize rare words. The reports are ranked by their cosine-similarity scores, and the ranking is used to build the suggested list presented to the user. Run time grows as the repository grows.
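A self-contained sketch of the exhaustive TF/IDF ranking. It uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one common variant and an assumption here; Lucene's own scoring formula differs. The function names are hypothetical.

```python
import math
from collections import Counter


def smoothed_idf(docs: list[list[str]]) -> dict[str, float]:
    """IDF = log((1 + N) / (1 + df)) + 1, one common smoothed variant (an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(terms: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term frequency times IDF; terms missing from the repository are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(terms).items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_reports(query_terms: list[str], repository: list[list[str]], m: int = 20) -> list[int]:
    """Score the new report against every stored report; cost grows with the repository."""
    idf = smoothed_idf(repository)
    query = tfidf(query_terms, idf)
    scores = [cosine(query, tfidf(doc, idf)) for doc in repository]
    # Highest similarity first; ties broken by repository index.
    order = sorted(range(len(repository)), key=lambda i: (-scores[i], i))
    return order[:m]


repository = [["send", "email", "function"], ["browser", "crash", "startup"], ["email", "attach", "fail"]]
print(rank_reports(["send", "email"], repository))  # [0, 2, 1]
```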
Sliding Window - TF/IDF Apply the time window to limit the groups considered in the search: only the reports within the top 2,000 groups are considered. Reports are weighted using TF/IDF. Scoring and building of the suggested list are the same as in TF/IDF.
Sliding Window – Centroid Same time window. Reports from the 2,000 groups are not searched individually or weighted using TF/IDF; instead, a centroid vector representing each group is used. Example: Summary 1: unable send email; Summary 2: send email function; Summary 3: send email after enter recipient. The resulting centroid of the group is: 1.0 send, 0.33 unable, 1.0 email, 0.33 function, 0.33 after, 0.33 enter, 0.33 recipient.
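The centroid rule from the approach slide (sum each term's frequency over the group's reports, then divide by the number of reports) can be checked against this example. The summaries are already tokenized, stemmed and stop-word filtered; `centroid` is a hypothetical helper name.

```python
from collections import Counter


def centroid(summaries: list[list[str]]) -> dict[str, float]:
    """Average term frequency over the group: summed frequencies / number of reports."""
    total: Counter = Counter()
    for terms in summaries:
        total.update(terms)
    return {term: count / len(summaries) for term, count in total.items()}


group = [
    ["unable", "send", "email"],
    ["send", "email", "function"],
    ["send", "email", "after", "enter", "recipient"],
]
print({t: round(w, 2) for t, w in centroid(group).items()})
# {'unable': 0.33, 'send': 1.0, 'email': 1.0, 'function': 0.33, 'after': 0.33, 'enter': 0.33, 'recipient': 0.33}
```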
Sliding Window – Centroid – TF/IDF Uses the centroid technique described before, and weights each term in the centroids using the TF/IDF weighting scheme.
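One way to add TF/IDF weighting on top of the centroids is sketched below. It rests on an assumption the slide does not settle: IDF is computed by treating each group centroid in the window as one document, using a common smoothed formula. The function names and the default list size are hypothetical.

```python
import math


def idf_over_centroids(centroids: dict[int, dict[str, float]]) -> dict[str, float]:
    """Smoothed IDF, treating each group centroid as one document (an assumption)."""
    n = len(centroids)
    df: dict[str, int] = {}
    for vector in centroids.values():
        for term in vector:
            df[term] = df.get(term, 0) + 1
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_groups(query_terms: list[str], centroids: dict[int, dict[str, float]], m: int = 20) -> list[int]:
    """Rank groups by cosine similarity between the query and TF/IDF-weighted centroids."""
    idf = idf_over_centroids(centroids)
    # Scale each centroid weight (average term frequency in the group) by the term's IDF.
    weighted = {gid: {t: w * idf[t] for t, w in vec.items()} for gid, vec in centroids.items()}
    # Query vector: raw term frequency times IDF; terms unseen in the window are ignored.
    query: dict[str, float] = {}
    for term in query_terms:
        if term in idf:
            query[term] = query.get(term, 0.0) + idf[term]
    order = sorted(weighted, key=lambda gid: (-cosine(query, weighted[gid]), gid))
    return order[:m]


centroids = {1: {"send": 1.0, "email": 1.0, "unable": 0.33}, 2: {"browser": 1.0, "crash": 0.5}}
print(rank_groups(["send", "email"], centroids))  # [1, 2]
```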
PROMISE 2011: "Detecting Bug Duplicate Reports through Locality of Reference"