PROMISE 2011: "Detecting Bug Duplicate Reports through Locality of Reference"

Detecting Bug Duplicate Reports Through Locality of Reference. Tomi Prifti, Sean Banerjee, Bojan Cukic. Lane Department of CSEE, West Virginia University, Morgantown, WV, USA. September 2011.

Presentation Outline: Introduction; Goals; Related Work; Understanding the Firefox Repository; Experimental Setup; Results; Summary.

Introduction: Bug tracking systems are essential for software maintenance and testing. Both developers and ordinary users can report failure occurrences. Advantages: users are involved in error reporting, which has a direct impact on software quality. Disadvantages: a large number of reports arrive daily, triage takes significant effort, and users may submit many duplicate reports.

Goals: Comprehensive empirical analysis of a large bug report dataset. Creation of a search tool: encourage users to search the repository and avoid duplicate report submission whenever possible. Assisting with report triage: build a list of reports possibly describing the same problem and let a triager examine the suggested list.

Related Work: Providing triagers with a suggested list. Provide a list of similar bugs for triagers to examine. Wang et al. exploit NLP techniques and execution information, reaching duplicate detection rates of 67%-93%. Semi-automated filtering. Determine the type of the report (duplicate or primary); if the new report is classified as a duplicate, filter it out. Jalbert et al. use text semantics and a graph clustering technique to predict duplicate status; they filtered out only 8% of duplicate reports.

Related Work: Semi-automated assignment. Apply text categorization techniques to predict the developer who should work on the bug. Cubranic et al. apply supervised Bayesian learning and correctly classify 30% of the reports. Anvik et al. use a supervised machine learning algorithm, with precision rates of 57% and 64% for Firefox and Eclipse. Improving report quality. Duplicate reports are not considered harmful. Bettenburg et al. developed a tool, called CUEZILLA, that measures the quality of bug reports in real time. “Steps to reproduce” and “Stack traces” are the most useful information in bug reports.

Related Work: Bugzilla search tool. Bugzilla 4.0, released around February 2011, provides duplicate detection. The tool performs a Boolean full-text search on the title over the entire repository. It generates a dozen or so reports that may match at least one of the search terms. In some instances, testing with the exact title of an existing report did not return the report itself. The accuracy of the reported matches is unknown.

Firefox Repository: Firefox releases: 1.0.5, 1.5, 2.0, 3.0, 3.5, and the current version 3.6 (as of June 2010). 65% of reports reside in groups of one. 90% of duplicates are distributed in small groups of 2-16 reports.

Time Interval Between Reports: Many bugs receive their first duplicate within the first few months of the original report.

Experimental Setup: Tokenization (“Bag-of-Words”). Stemming: reducing words to their root. Stop-word removal. The Lucene API is used for pre-processing. Term Frequency/Inverse Document Frequency (TF/IDF) is used for weighting words. Cosine similarity is used as the similarity measure. Example of tokenizing, stemming, and stop-word removal: “Sending email is not functional.” becomes “send email function”.
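The pre-processing steps above can be sketched in a few lines of Python. This is only an illustration: the slides use the Lucene API, whereas the stop-word list and the suffix-stripping `stem` function below are toy stand-ins invented for this example, not Lucene's analyzers or a real Porter stemmer.

```python
import re

# Illustrative stop-word list (assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "not", "to", "of", "in", "on", "and"}

# Toy suffix rules standing in for a real stemmer (assumption).
SUFFIXES = ("ing", "al", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary):
    """Lowercase and tokenize a summary, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", summary.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("Sending email is not functional."))
# ['send', 'email', 'function']
```

The resulting token list is the bag-of-words that the weighting and similarity steps operate on.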

Experimental Procedure: Start with the initial 50% of reports as historical information. The group containing the most recent primary or duplicate report is at the top of the initial list. Build the suggested list using IR techniques. As the experiment progresses, the historical repository grows. Continue until all reports are classified as duplicate or primary. If a bug is primary, it is forwarded to the repository. This may not be realistic, as triagers may misjudge reports.
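A minimal sketch of this replay procedure, under stated assumptions: reports arrive as chronologically ordered `(report_id, group_id)` pairs, the ground-truth group labels are trusted, and `suggest` is a hypothetical callback that returns the group ids on the suggested list. The default `list_size` is a placeholder, not a value taken from the slides.

```python
def run_experiment(reports, suggest, list_size=20):
    """Replay the second half of `reports` against a growing history.

    reports: chronologically ordered list of (report_id, group_id) pairs.
    suggest(history, report_id, list_size) -> list of suggested group ids.
    Returns (n_recalled, n_total) counted over duplicate reports only.
    """
    split = len(reports) // 2
    history = list(reports[:split])          # initial 50% as historical data
    known_groups = {group_id for _, group_id in history}
    n_recalled = n_total = 0

    for report_id, group_id in reports[split:]:
        if group_id in known_groups:         # ground truth says duplicate
            n_total += 1
            if group_id in suggest(history, report_id, list_size):
                n_recalled += 1
        # Every classified report joins the history, so the repository grows.
        history.append((report_id, group_id))
        known_groups.add(group_id)

    return n_recalled, n_total


def most_recent_groups(history, report_id, list_size):
    """Toy suggester: the groups of the most recent reports in the history."""
    return [group_id for _, group_id in history[-list_size:]]


print(run_experiment([(1, "A"), (2, "B"), (3, "A"), (4, "C")], most_recent_groups))
# (1, 1)
```

Any of the four retrieval variants could be plugged in as `suggest` without changing the loop.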

Measuring Performance: Performance of the bug search tool is measured by the recall rate: recall rate = N_recalled / N_total. N_recalled is the number of duplicate reports correctly classified. N_total is the total number of duplicate reports.
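Assuming the recall rate is the ratio of N recalled to N total, as the two definitions suggest, the metric reduces to a one-line computation. The function name and the zero-duplicate guard are choices made for this sketch.

```python
def recall_rate(n_recalled, n_total):
    """Fraction of duplicate reports that were correctly classified."""
    if n_total == 0:
        return 0.0  # no duplicates to recall; guard added for this sketch
    return n_recalled / n_total


print(recall_rate(53, 100))
# 0.53
```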

Approach Methodology: Reporters query the repository. Use the “title” (summary) to compare reports. Four experiments: (1) TF/IDF; (2) “Sliding Window” - TF/IDF; (3) “Sliding Window” - Group Centroids - TF/IDF; (4) “Sliding Window” - Group Centroids. The centroid is composed of all unique terms from all reports in the group and the sum of their frequencies in each report. The total frequency of each term is divided by the number of reports in the group.

Sliding Window Defined: The “Sliding Window” approach keeps a window of fixed size n. Sort all groups by the time elapsed between their last report and the new incoming report. Select the top n groups (n = 2,000 is optimal; analysis shows that in 95% of cases the duplicate's group falls within this window). Apply IR techniques only to the top n groups. Build a short list of the top m most similar reports to present to the triager or reporter.
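The window selection step can be sketched as a sort on elapsed time followed by a cut at n. The `Group` record and its field names are hypothetical; the slides do not describe the data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Group:
    group_id: int                 # hypothetical identifier
    last_report_time: datetime    # time of the newest report in the group


def select_window(groups, new_report_time, n=2000):
    """Return the n groups with the least time elapsed since their last report."""
    ranked = sorted(groups, key=lambda g: new_report_time - g.last_report_time)
    return ranked[:n]


groups = [
    Group(1, datetime(2010, 1, 1)),
    Group(2, datetime(2010, 5, 1)),
    Group(3, datetime(2010, 3, 1)),
]
window = select_window(groups, datetime(2010, 6, 1), n=2)
print([g.group_id for g in window])
# [2, 3]
```

Only the groups in `window` are then passed to the similarity computation, which is what bounds the search cost as the repository grows.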

Experimental Results: Our results demonstrate that the Time-Window/Group Centroid approach, applied to report summaries, predicts duplicate problem reports with a recall ratio of up to 53%.

Performance and Runtime: There is a large variance in recall rate initially. The time window approach stabilizes, while TF/IDF degrades. Classification run time is faster for the time window approach. Each additional report increases computation time in TF/IDF.

Result Comparisons:
Group | Approach | Results
Hiew et al. | Text analysis | Recall rate ~50%
Cubranic et al. | Bayesian learning, text categorization | Correctly predicted ~30% duplicates
Jalbert et al. | Text similarity, clustering | Recall rate ~51%, list size 20
Wang et al. | NLP, execution information | 67-93% detection rate (43-72% with NLP)
Wang et al. | Enhanced version of prior algorithm | 17-31% improvement over state of the art
Our approach | Time Window/Centroids | ~53% recall rate

Threats to Validity: We assume that the ground truth on duplicates is correct. The life cycle of a bug is ever changing; some reports change state multiple times.

Summary and Future Work: SUMMARY: A comprehensive study analyzing long-term duplicate trends in a large, open source project. Improve search features in duplicate detection by providing a search list. The time interval between reports can be used to reduce the search space. FUTURE WORK: Compare with other projects (e.g., Eclipse) to be able to generalize the approach. Study the effects on duplicate propagation caused by a user incorrectly selecting a report from the suggested list.

TF/IDF: Compare the vector representing a new report to every vector currently in the database. Vectors in the database are weighted using TF/IDF to emphasize rare words. The reports are ranked by their cosine-similarity scores. The report ranking is used to build the suggested list presented to the user. Run time increases as the repository grows.
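A minimal sketch of this ranking step, assuming reports are already tokenized. The slides do not state the exact TF/IDF variant, so the smoothed form `log(N / df) + 1` used here is an assumption, as are the function names and the `top_m` default.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns ({term: weight} per doc, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}  # assumed variant
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs, top_m=20):
    """Return indices of the top_m docs most similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Terms unseen in the repository get zero weight in the query vector.
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(query_tokens).items()}
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:top_m]]


docs = [
    ["send", "email", "function"],
    ["crash", "startup"],
    ["unable", "send", "email"],
]
print(rank(["email", "function", "broken"], docs))
# [0, 2, 1]
```

Because every stored vector is compared against the query, the cost of `rank` grows linearly with the number of reports, which matches the run-time concern noted above.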

Sliding Window - TF/IDF: Apply the time window to limit the groups under consideration for search. Only the reports within the 2,000 groups are considered. Reports are weighted using TF/IDF. Scoring and building of the suggested list are the same as in TF/IDF.

Sliding Window - Centroid: Same time window. Reports from the 2,000 groups are not individually searched and weighted using TF/IDF. Instead, a centroid vector representing each group is used. Example: Summary 1: unable send email. Summary 2: send email function. Summary 3: send email after enter recipient. The resulting centroid of the group is: 1.0 send, 0.33 unable, 1.0 email, 0.33 function, 0.33 after, 0.33 enter, 0.33 recipient.
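The centroid computation can be sketched directly from its definition: sum each term's frequency across the group's reports and divide by the number of reports. The function name is illustrative; the input is the three-summary group from the example.

```python
from collections import Counter


def group_centroid(reports):
    """reports: list of token lists for one group. Returns {term: mean frequency}."""
    totals = Counter()
    for tokens in reports:
        totals.update(tokens)
    return {term: count / len(reports) for term, count in totals.items()}


group = [
    ["unable", "send", "email"],
    ["send", "email", "function"],
    ["send", "email", "after", "enter", "recipient"],
]
centroid = group_centroid(group)
print({term: round(weight, 2) for term, weight in centroid.items()})
```

Terms that appear in every report of the group ("send", "email") get weight 1.0, while terms that appear in one of the three reports get about 0.33.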

Sliding Window - Centroid - TF/IDF: Uses the centroid technique described before. Each term in the centroids is weighted using the TF/IDF weighting scheme.
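One way to weight centroid terms is sketched below. The slides do not say whether document frequency is counted over individual reports or over groups; this sketch assumes it is counted over group centroids and uses the plain `log(N / df)` form, so treat both choices as assumptions rather than the authors' method.

```python
import math


def weight_centroids(centroids):
    """centroids: {group_id: {term: mean_frequency}}.

    Returns the same structure with each term multiplied by an IDF factor
    computed over groups (an assumption made for this sketch).
    """
    n = len(centroids)
    df = {}
    for centroid in centroids.values():
        for term in centroid:
            df[term] = df.get(term, 0) + 1
    return {
        group_id: {t: f * math.log(n / df[t]) for t, f in centroid.items()}
        for group_id, centroid in centroids.items()
    }


centroids = {
    "A": {"send": 1.0, "unable": 1 / 3},
    "B": {"send": 0.5, "crash": 1.0},
}
weighted = weight_centroids(centroids)
print(weighted["A"]["send"], weighted["B"]["crash"])
```

With this variant, a term that occurs in every group (here "send") is weighted down to zero, while a term unique to one group keeps a positive weight.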
