A Bug Report Analysis and Search Tool
M.Sc. Presentation
Yguaratã Cerqueira Cavalcanti
yguarata@gmail.com
Advisor: Silvio Romero de Lemos Meira
Co-Advisor: Eduardo Santana de Almeida
Center for Informatics – Federal University of Pernambuco (UFPE)
http://www.cin.ufpe.br
Reuse in Software Engineering (RiSE)
http://www.rise.com.br
07/03/2009, Recife – Brazil
Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 1 / 57
Summary
1 Introduction
M.Sc. Context, Motivation, Proposed solution
2 The Bug Report Duplication Problem: A Characterization Study
Definition, Planning and Operation, Results
3 BAST
Requirements, Architecture, Overview
4 Case Study
Definition, Planning, Analysis and interpretation
5 Experiment
Definition, Planning, Analysis and interpretation
6 Related Work
7 Conclusion
8 References
M.Sc. Context
Change management handles requests for:
new features
correction of errors
improvements
It drives software maintenance and evolution
Motivation
Software maintenance and evolution are characterised by their high
cost and slow speed of implementation
According to Sommerville, they account for almost 90% of software costs
Year   Share of total costs   Reference
2000   >90%                   Erlikh (2000)
1993   75%                    Eastwood (1993)
1990   >90%                   Moad (1990)
1990   60–70%                 Huff (1990)
1988   60–70%                 Port (1988)
1984   65–75%                 McKee (1984)
1981   >50%                   Lientz and Swanson (1981)
1979   67%                    Zelkowitz et al. (1979)
Table: Studies of software maintenance costs (Koskinen, 2004).
Bug tracking activity
Bug reports management
Verify bug report validity
Analyze the impact of a bug report
Assign a developer
Help with the development process in general
Bug reports: Software artifacts that describe a defect or enhancement.
Bug report submitters are generally developers, users, or testers
Bug trackers: Tools used to manage, store, and handle change
requests (also known as bug reports)
Bug trackers advantages
Traceability (developers, releases)
Fast identification of problems
Metrics (errors per developer, identification of critical components, etc.)
Comments
Project history
Examples: Mantis, Bugzilla, Trac, Jira
A bug report example
Issues coming from bug trackers
Dynamic assignment of bug reports (Anvik et al., 2006);
Change impact analysis and effort estimation of new bug reports
(Song et al., 2006);
Quality of bug report descriptions (Ko et al., 2006);
Software evolution traceability (Sandusky et al., 2004); and
Duplicate bug report detection, which consists of avoiding the submission of
bug reports that describe an already submitted issue (Hiew, 2006).
The bug report duplication problem
Characterized by the submission of two or more bug reports that describe
the same software issue
Overhead of rework to search and analyze bug reports
People take roughly 5-15 minutes to perform search and analysis (Anvik
et al., 2005; Cavalcanti et al., 2008)
10% to 30% of a bug report repository is composed of duplicate bug
reports (Anvik et al., 2005; Runeson et al., 2007; Cavalcanti et al., 2008)
So, duplicates add costs for:
opening bug reports (5-15 minutes)
CCB (Change Control Board) analysis (5-15 minutes)
developer analysis (5-15 minutes)
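As a rough illustration of how these figures combine, the sketch below estimates the time consumed by duplicates. It assumes that every duplicate passes through all three activities listed above and that each activity takes the same time. The function name and the repository size of 10,000 reports are hypothetical and are not taken from the study.

```python
def duplicate_overhead_minutes(total_reports, duplicate_rate, minutes_per_step, steps=3):
    """Estimate the minutes spent handling duplicate bug reports.

    Assumption: every duplicate goes through `steps` activities
    (opening, CCB analysis, developer analysis), each costing
    `minutes_per_step` minutes.
    """
    duplicates = total_reports * duplicate_rate
    return duplicates * steps * minutes_per_step


# Hypothetical repository of 10,000 reports, using the bounds quoted above.
low = duplicate_overhead_minutes(10_000, 0.10, 5)    # 10% duplicates, 5 min per step
high = duplicate_overhead_minutes(10_000, 0.30, 15)  # 30% duplicates, 15 min per step
print(f"between {low / 60:.0f} and {high / 60:.0f} hours")
```

The spread between the two bounds is wide because both the duplicate rate and the per-step time vary by a factor of three.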
Proposed solution
The proposed solution consists of a Web-based application that enables
people involved in bug report search and analysis to perform these
tasks more effectively.
Definition
The goal of this study was to analyze bug repositories and the activities for
searching and analyzing bug reports
with the purpose of understanding them with respect to the possible factors
that could impact the duplication problem and their
consequences on software development
from the point of view of the researchers
in the context of software development projects
Questions
Q1: Do the projects have a considerable amount of duplicate bug reports?
Q2: Is productivity being affected by the bug report duplication problem?
Q3: Is there a common vocabulary for bug report descriptions?
Q4: How are the relationships between master bug reports and duplicate bug
reports characterized?
Q5: Does the type of bug report influence the amount of duplicates?
Planning and operation
Projects and data selection
All bug reports until June 2008
Project           LOC    Staff size   Bugs     Life-time
Bugzilla          55K    340          12829    14
Eclipse           6.5M   352          130095   7
Epiphany          100K   19           10683    6
Evolution         1M     156          72646    11
Firefox           80K    514          60233    9
GCC               4.2M   285          35797    9
Thunderbird       310K   192          19204    8
Tomcat            200K   57           8293     8
Private Project   2M     21           7955     2
Performed at C.E.S.A.R. between June 2008 and August 2008
Results
Question 1: Do the analyzed projects have a considerable amount of
duplicate bug reports?
Metric            Bugz.   Eclip.  Epiph.  Evol.   Firef.  GCC     Thund.  Tomc.   Private Proj.  Mean   SD
M1 %              23.32   19.44   31.52   43.24   38.39   17.68   49.10   8.24    21.59          28.1   13.4
Question 2: Is the submitters' productivity being affected by the bug report
duplication problem?
Metric            Bugz.   Eclip.  Epiph.  Evol.   Firef.  GCC     Thund.  Tomc.   Private Proj.  Mean   SD
M2 (min)          05-15   –       05-15   05-15   05-10   05-15   05-15   –       20-30          12.5   1.88
M4 bugs per day   71      722     59      403     334     198     106     46      145            231.5  222.1
Question 3: Is there a common vocabulary for bug report descriptions?
Metric            Bugz.   Eclip.  Epiph.  Evol.   Firef.  GCC     Thund.  Tomc.   Private Proj.  Mean   SD
M5 %              –       25      –       –       22      –       –       –       35             31.2   9.5
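The Mean and SD columns for M1 can be recomputed from the per-project values. The sketch below uses only the M1 percentages from the table and assumes that SD is the sample standard deviation (n - 1 denominator), which appears to reproduce the reported 28.1 and 13.4. The variable names are illustrative.

```python
from statistics import mean, stdev

# M1: percentage of duplicate bug reports per project (values from the M1 row).
m1 = {
    "Bugzilla": 23.32, "Eclipse": 19.44, "Epiphany": 31.52,
    "Evolution": 43.24, "Firefox": 38.39, "GCC": 17.68,
    "Thunderbird": 49.10, "Tomcat": 8.24, "Private Project": 21.59,
}

values = list(m1.values())
m1_mean = mean(values)
m1_sd = stdev(values)  # sample standard deviation (n - 1 denominator)

print(f"mean = {m1_mean:.1f}%, SD = {m1_sd:.1f}")
```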
Results [2]
Question 4: How are the relationships between master bug reports and
duplicate bug reports characterized?
One-to-one relation
bug123: bug3453
One-to-many relation
bug345: bug45345, bug465, bug654
Figure: Bug reports grouping.
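The grouping above can be represented as a mapping from each master bug report to its duplicates. The sketch below is a minimal illustration using the IDs from the figure. The names `groups` and `relation_type` are hypothetical, and each group is assumed to contain at least one duplicate.

```python
from collections import Counter

# Master bug report -> duplicates that point to it (IDs from the figure above).
groups = {
    "bug123": ["bug3453"],
    "bug345": ["bug45345", "bug465", "bug654"],
}


def relation_type(duplicates):
    """Classify a group by how many duplicates its master has."""
    return "one-to-one" if len(duplicates) == 1 else "one-to-many"


counts = Counter(relation_type(dups) for dups in groups.values())
share_one_to_one = counts["one-to-one"] / len(groups)
print(counts, share_one_to_one)
```

Applied to a full repository, `share_one_to_one` would give the proportion of groups that have a single duplicate.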
Results [3]
Question 5: Does the type of bug report influence the amount of duplicates?
Study summary
All the projects are affected by the bug report duplication problem;
Productivity is affected by the bug report duplication problem;
A common vocabulary is not used to describe bug reports;
More than 80% of the groups are of the one-to-one type;
Bug report duplication occurs independently of the type of bug report;
The number of LOC is not a factor for the duplication problem;
The size of the repository is not a factor for duplication;
Project life-time is not a factor for duplication;
The staff size (developers) is not a factor for the duplication problem; and
The profile of the submitter is a determining factor for the submission of
duplicates: sporadic ≥ average ≥ frequent
Requirements
Functional requirements
FR1 - Keyword-based search
FR2 - Rank search results based on bug report similarity rate
FR3 - Index bug reports from XML files
FR4 - Index bug reports from original database
FR5 - Extract useful information from bug reports
Non-Functional requirements
NFR1 - Simple and intuitive filter interface
NFR2 - Reports about bug repository status
NFR3 - Integration with most popular bug report tracking systems
NFR4 - Log search queries and user actions
NFR5 - Reasonable similarity rate
NFR6 - Web-based interface with AJAX
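To make FR1 and FR2 concrete, the sketch below ranks bug reports against a keyword query. The slides do not state which similarity measure BAST uses, so TF-IDF weighting with cosine similarity is an assumption here, chosen only because it is a common baseline. The function names and the sample report texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_reports(query, reports):
    """Return (report_id, score) pairs sorted by cosine similarity to the query.

    `reports` maps a report ID to its text. Scores use TF-IDF weights;
    this is an assumed measure, not necessarily the one used by BAST.
    """
    docs = {rid: Counter(tokenize(text)) for rid, text in reports.items()}
    n_docs = len(docs)
    # Document frequency: in how many reports each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)

    def weights(counts):
        # Smoothed IDF keeps weights positive even for terms present in every report.
        return {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                for t, tf in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = weights(Counter(tokenize(query)))
    scored = [(rid, cosine(q, weights(counts))) for rid, counts in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical report texts, for illustration only.
reports = {
    "bug1": "Browser crashes when opening a new tab",
    "bug2": "Bookmarks toolbar does not show icons",
    "bug3": "Crash on opening new tab with many bookmarks",
}
ranking = rank_reports("crash when opening new tab", reports)
for report_id, score in ranking:
    print(report_id, round(score, 3))
```

No stemming is applied, so "crash" and "crashes" count as different terms. A real index would likely normalise such variants.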
Architecture
Overview
Definition
Context. Performed in a real test cycle at a C.E.S.A.R. partner
between July and August 2008
Systematic process to test and open bug reports
Objectives. 1 To determine which tool prevents more duplicate bug reports
2 To assess whether our tool decreases the time spent on
analysis of bug reports
Baseline tool. Internal tool where testers can search for bug reports using
SQL filters.
Null hypotheses
H0: $\mu_{\text{time with BAST}} > \mu_{\text{time with baseline}}$
$\mu_{\text{duplicates avoided with BAST}} < \mu_{\text{duplicates avoided with baseline}}$
Alternative hypotheses
H1: $\mu_{\text{time with BAST}} < \mu_{\text{time with baseline}}$
$\mu_{\text{duplicates avoided with BAST}} > \mu_{\text{duplicates avoided with baseline}}$
Planning
The tool was tested by the Bug Report Master
Responsible for the test cycle
Most experienced tester
Doubts should be resolved with him
Case study design: Search and analysis performed in two steps:
Step 1. Internal tool ⇒ BAST
Step 2. BAST ⇒ Internal tool
Metrics (manual annotations):
Type of bug reports analyzed
Number of duplicate bug reports avoided
Time spent to analyze similar bug reports
Quantitative analysis: Descriptive statistics
In total, 144 bug reports were analyzed
Analysis and interpretation
Repository status
Duplicates found
Time spent
Case study summary
Bug tracker status. More than 50% of the bug reports are duplicates
Duplicates found. Our tool can prevent more duplicates than the
baseline tool
Time spent. The bug report master saved time using our tool
Drawbacks
Case study design. Accommodation of the subject: he may prefer
to use one tool over the other.
Amount of bug reports in treatments. The numbers of bug reports
analyzed in each treatment were very different.
Lack of subjects. The number of subjects was not sufficient to
generalize the case study results.
Definition
The goal of this experiment was to analyze a tool to improve search and
analysis of bug reports
with the purpose of evaluating it with respect to its effectiveness and efficiency
on detection of duplicate bug reports and time saving
from the point of view of the researchers
in the context of software development projects
Questions
Q1 Is there a reduction in the number of duplicate bug reports
with the adoption of the new tool?
Q2 Is there a reduction in the time that submitters spend performing
the search and analysis of bug reports with the adoption of the tool?
Q3 Did the submitters have difficulty using the tool?
Definition [2]
Objects of study: BAST and Bugzilla.
Quality focus: Effectiveness and efficiency of the tool developed.
Context: The adoption of a tool developed to aid the bug report tracking
process, focusing on search and analysis of bug reports to avoid
duplicates.
Experiment type: Off-line experiment (Wohlin et al., 2000)
Subjects: 18 Ph.D. and M.Sc. students from the Computer Science
department at the Federal University of Pernambuco, Brazil
Performed in a distributed manner (no location restrictions)
Bug reports from the Firefox open-source project
Planning
Subjects selection. Selected by convenience sampling (Wohlin et al.,
2000; Kitchenham and Pfleeger, 2002)
Instrumentation: 32 error descriptions concerning the Firefox project
50% with defects that already have bug reports describing them in the
repository
50% with unique/not-reported defects
Guidelines for the experiment execution (FAQ)
Time sheets to record the time spent on search and analysis
Quantitative analysis: Descriptive statistics and hypothesis testing
[t-test (Wohlin et al., 2000)]
Qualitative analysis: Questionnaire
Planning [2]
Null hypothesis
H0: $\mu_{\text{time with BAST}} > \mu_{\text{time with baseline}}$
$\mu_{\text{duplicates avoided with BAST}} < \mu_{\text{duplicates avoided with baseline}}$
Alternative hypothesis
H1: $\mu_{\text{time with BAST}} < \mu_{\text{time with baseline}}$
$\mu_{\text{duplicates avoided with BAST}} > \mu_{\text{duplicates avoided with baseline}}$
Independent variables. The tool used (BAST or Bugzilla)
Dependent variables. (a) number of duplicate bug reports and (b) the
time spent on search and analysis
Planning [3]
Experiment design
Analysis and interpretation
Descriptive statistics
           Time spent on analysis   Bug reports avoided
           BAST      Bugzilla       BAST      Bugzilla
Mean       4.54      4.32           7.56      8.33
Maximum    6.84      9.56           13        12
Minimum    1.78      2.47           0         0
SD         1.49      1.91           3.5       3.2
Table: Descriptive statistics.
Figure: Box plot for time spent.
Figure: Box plot for duplicates avoided.
Hypothesis test
                                   Time spent on analysis   Duplicates avoided
t0                                 0.6292                   -1.2466
Degrees of freedom                 17                       17
p-value                            0.5376                   0.2294
Critical value T (t distribution)  2.11                     2.11
Result (t0 > T)                    H0: not rejected         H0: not rejected
Analysis of dependency
                      BAST time   Bugzilla time   BAST duplicates   Bugzilla duplicates
Years of experience   -0.13       -0.02           -0.19             0.18
Number of projects    -0.11       0.37            -0.28             -0.025
Bug trackers used     -0.16       0.35            -0.26             0.05
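With 18 subjects, 17 degrees of freedom is consistent with a paired t-test in which each subject used both tools. Assuming that design, the sketch below computes t0 from per-subject differences and applies the decision rule stated in the table (t0 > T). The per-subject data are not given in the slides, so the sample values are hypothetical, as are the function names.

```python
import math
from statistics import mean, stdev


def paired_t(sample_a, sample_b):
    """Return (t0, degrees_of_freedom) for a paired t-test on equal-length samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    # t0 = mean difference divided by its standard error.
    t0 = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t0, n - 1


def reject_h0(t0, critical):
    """Decision rule from the table: reject H0 only when t0 exceeds the critical value."""
    return t0 > critical


# Reported statistics, with the critical value 2.11 for 17 degrees of freedom.
print(reject_h0(0.6292, 2.11))   # time spent on analysis
print(reject_h0(-1.2466, 2.11))  # duplicates avoided

# Hypothetical per-subject times, for illustration only.
t0, dof = paired_t([4.0, 5.0, 6.0, 7.0], [3.0, 5.0, 4.0, 8.0])
print(round(t0, 4), dof)
```

Neither reported statistic exceeds 2.11, which agrees with the "not rejected" result in both columns.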
Qualitative analysis
BAST features. Seven (7) subjects used the filter features provided by the tool.
BAST usability. Only one subject mentioned some difficulty using the filters, and only
one subject had problems with the ordering features.
BAST usefulness. Fifteen (15) subjects believe that the way bug report
details are presented in BAST is more useful for the analysis than in Bugzilla.
Testimonials
One subject wrote “in fact, the way details are presented saves time to check them, since it is not
necessary to open extra tabs or windows to see the details”, and another wrote “it
became easier to identify the duplicate bug reports and navigate among the
details of them”.
Validity Threats
Boredom
Lack of Historical Data
Environment
Subjects' knowledge of bug reports
Error re-descriptions and fictitious errors
Halo Effect
Internet Connection Constraints
Related work
Automated Support for Classifying Software Failure Reports
(Podgurski et al., 2003)
Bug reports: Software failures automatically submitted
Technique: Supervised and unsupervised pattern classification and
multivariate visualization
Testing: Batch runs
Dataset: GCC, Jikes, and JavaC
Assisted Detection of Duplicate Bug Reports (Hiew, 2006)
Bug reports: Natural language bug reports
Technique: Organize similar bug reports into centroids using TF-IDF
Testing: Batch runs
Dataset: Firefox, Eclipse, Apache, and Fedora Core
Results: Precision of 29% and recall of 50%
Related work [2]
Detection of Duplicate Defect Reports Using Natural Language
Processing (Runeson et al., 2007)
Bug reports: Natural language bug reports
Technique: Natural Language Processing (NLP)
Testing: Batch runs and a tool
Dataset: Sony Ericsson Mobile Communications
Results: Recall of 40%
An Approach to Detecting Duplicate Bug Reports Using Natural
Language and Execution Information (Wang et al., 2008)
Bug reports: Natural language bug reports
Technique: NLP and execution information
Testing: Batch runs
Dataset: Firefox and Eclipse
Results: Recall of 67%-93% at its best
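The results above are reported as precision and recall. As a reminder of the standard information-retrieval definitions, the sketch below computes both for a set of suggested duplicates. The cited works may use variants (for example, recall measured over a top-N suggestion list), so this is only an assumed baseline definition, and the bug IDs are hypothetical.

```python
def precision_recall(suggested, actual_duplicates):
    """Precision and recall of suggested duplicates against the true duplicates."""
    suggested, actual = set(suggested), set(actual_duplicates)
    hits = len(suggested & actual)
    # Precision: fraction of suggestions that are true duplicates.
    precision = hits / len(suggested) if suggested else 0.0
    # Recall: fraction of true duplicates that were suggested.
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: 4 suggestions, 2 of which are among 3 true duplicates.
p, r = precision_recall(["bug1", "bug2", "bug3", "bug4"], ["bug2", "bug4", "bug9"])
print(f"precision = {p:.2f}, recall = {r:.2f}")
```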
Research contribution
A taxonomy for the bug repositories mining area
The state-of-the-art on mining bug repositories
A characterization of the bug report duplication problem
A tool to reduce the time spent on search and analysis of bug
reports
A case study to evaluate the proposed tool
An experiment with 18 subjects to evaluate the tool
Papers
Cavalcanti, Y. C., Martins, A. C., de Almeida, E. S., and de Lemos Meira,
S. R. (2008a). Avoiding Duplicate CR reports in Open Source Software
Projects. In The 9th International Free Software Forum (IFSF’08), Porto
Alegre, Brazil.
Cavalcanti, Y. C., de Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and
Meira, S. R. L. (2008b). The Bug Report Duplication Problem: A
Characterization Study. Technical report, C.E.S.A.R and Federal
University of Pernambuco.
Papers for the Case Study and for the Experiment
Two more journal papers are being written (characterization and thesis)
Future Work
Evolve from prototype
Information visualization
Alternative integration methods
Provide integration with other tools
Search and ranking techniques
Comments of a bug report
Number of informal references
Experiment replications
References I
Anvik, J., Hiew, L., and Murphy, G. C. (2005). Coping with an open bug
repository. In Proceedings of the 2005 OOPSLA workshop on Eclipse
technology eXchange, pages 35–39, New York, NY, USA. ACM Press.
Anvik, J., Hiew, L., and Murphy, G. C. (2006). Who should fix this bug? In
Proceedings of the 28th International Conference on Software Engineering
(ICSE’06), pages 361–370, New York, NY, USA. ACM Press.
Cavalcanti, Y. C., Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and Meira,
S. R. L. (2008). The bug-report duplication problem: a characterization
study. Technical report, C.E.S.A.R and Federal University of Pernambuco.
Eastwood, A. (1993). Firm fires shots at legacy systems. Computing Canada,
19(2), 17.
Erlikh, L. (2000). Leveraging legacy system dollars for e-business. IT
Professional, 2(3), 17–23.
Hiew, L. (2006). Assisted Detection of Duplicate Bug Reports. Master’s thesis,
The University of British Columbia.
References II
Huff, F. (1990). Information systems maintenance. The Business Quarterly,
(55), 30–32.
Kitchenham, B. and Pfleeger, S. L. (2002). Principles of survey research: part
5: populations and samples. SIGSOFT Software Engineering Notes, 27(5),
17–20.
Ko, A. J., Myers, B. A., and Chau, D. H. (2006). A linguistic analysis of how
people describe software problems. In Proceedings of the Visual
Languages and Human-Centric Computing (VLHCC’06), pages 127–134,
Washington, DC, USA. IEEE Computer Society.
Koskinen, J. (2004). Software maintenance costs.
http://www.cs.jyu.fi/~koskinen/smcosts.htm.
Lientz, B. P. and Swanson, E. B. (1981). Problems in application software
maintenance. Communications of the ACM, 24(11), 763–769.
McKee, J. R. (1984). Maintenance as a function of design. In AFIPS National
Conference Proceedings, volume 53, pages 187–193.
References III
Moad, J. (1990). Maintaining the competitive edge. Datamation, 4(36), 61–62.
Podgurski, A., Leon, D., Francis, P., Masri, W., Minch, M., Sun, J., and Wang,
B. (2003). Automated support for classifying software failure reports. In
Proceedings of the 25th International Conference on Software Engineering
(ICSE’03), pages 465–475, Washington, DC, USA. IEEE Computer Society.
Port, O. (1988). The software trap – automate or else. Business Week,
9(3051), 142–154.
Runeson, P., Alexandersson, M., and Nyholm, O. (2007). Detection of
duplicate defect reports using natural language processing. In Proceedings
of the 29th International Conference on Software Engineering (ICSE’07),
pages 499–510. IEEE Computer Society.
Sandusky, R. J., Gasser, L., and Ripoche, G. (2004). Bug report networks:
Varieties, strategies, and impacts in a f/oss development community. In
Proceedings of the 1st International Workshop on Mining Software
Repositories (MSR’04), pages 80–84, University of Waterloo, Waterloo.
References IV
Sommerville, I. (2007). Software Engineering. Addison Wesley, 8th edition.
Song, Q., Shepperd, M. J., Cartwright, M., and Mair, C. (2006). Software
defect association mining and defect correction effort prediction. IEEE
Transactions on Software Engineering, 32(2), 69–82.
Wang, X., Zhang, L., Xie, T., Anvik, J., and Sun, J. (2008). An approach to
detecting duplicate bug reports using natural language and execution
information. In Proceedings of the 30th International Conference on
Software Engineering (ICSE’08), pages 461–470. ACM Press.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., and Wesslén, A.
(2000). Experimentation in Software Engineering: An Introduction. The
Kluwer International Series in Software Engineering. Kluwer Academic
Publishers, Norwell, Massachusetts, USA.
Zelkowitz, M. V., Shaw, A. C., and Gannon, J. D. (1979). Principles of Software
Engineering and Design. Prentice Hall Professional Technical Reference.
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)

  • 1. A Bug Report Analysis and Search Tool M.Sc. Presentation Yguaratã Cerqueira Cavalcanti yguarata@gmail.com Advisor: Silvio Romero de Lemos Meira Co-Advisor: Eduardo Santana de Almeida Center for Informatics – Federal University of Pernambuco (UFPE) http://www.cin.ufpe.br Reuse in Software Engineering (RiSE) http://www.rise.com.br 07/03/2009, Recife – Brazil Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 1 / 57
  • 2. Summary 1 Introduction M.Sc. Context, Motivation, Proposed solution 2 The Bug Report Duplication Problem: A Characterization Study Definition, Planning and Operation, Results 3 BAST Requirements, Architecture, Overview 4 Case Study Definition, Planning, Analysis and interpretation 5 Experiment Definition, Planning, Analysis and interpretation 6 Related Work 7 Conclusion 8 References Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 2 / 57
  • 3. Outline 1 Introduction M.Sc. Context, Motivation, Proposed solution 2 The Bug Report Duplication Problem: A Characterization Study Definition, Planning and Operation, Results 3 BAST Requirements, Architecture, Overview 4 Case Study Definition, Planning, Analysis and interpretation 5 Experiment Definition, Planning, Analysis and interpretation 6 Related Work 7 Conclusion 8 References Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 3 / 57
  • 4. M.Sc. Context Change management handles requests for: new features correction of errors improvements It drives the software maintenance and evolution Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 4 / 57
  • 5. M.Sc. Context Change management handles requests for: new features correction of errors improvements It drives the software maintenance and evolution Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 4 / 57
  • 6. Motivation Software maintenance and evolution are characterised by their huge cost and slow speed of implementation Sommerville says that it takes almost 90% of costs Year Total costs Reference 2000 >90% Erlikh (2000) 1993 75% Eastwood (1993) 1990 >90% Moad (1990) 1990 60–70% Huff (1990) 1988 60–70% Port (1988) 1984 65–75% McKee (1984) 1981 >50% Lientz and Swanson (1981) 1979 67% Zelkowitz et al. (1979) Table: Conducted studies about software maintenance costs (Koskinen, 2004). Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 5 / 57
  • 7. Bug tracking activity Bug reports management Verify bug report validity Analyze the impact of a bug report Assign a developer Help with development process in general Bug reports Software artifact that describes some defect or enhancement; Generally, bug report submitters are developers, users, or testers Bug trackers Bug trackers are used to manage, store and handle change requests (also known as bug reports) Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 6 / 57
  • 8. Bug tracking activity Bug reports management Verify bug report validity Analyze the impact of a bug report Assign a developer Help with development process in general Bug reports Software artifact that describes some defect or enhancement; Generally, bug report submitters are developers, users, or testers Bug trackers Bug trackers are used to manage, store and handle change requests (also known as bug reports) Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 6 / 57
  • 9. Bug trackers advantages Traceability (developers, releases) Fast identification of problems Metrics (errors per developers, to identify critical components, etc) Comments Project history Examples: Mantis, Bugzilla, Trac, Jyra Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 7 / 57
  • 10. A bug report example Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 8 / 57
  • 11. A bug report example [2] Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 9 / 57
  • 12. A bug report example [3] Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 10 / 57
  • 13. A bug report example [4] Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 11 / 57
  • 14. Issues coming from bug trackers Dynamic assignment of bug reports (Anvik et al., 2006); Change impact analysis and effort estimation of new bug reports (Song et al., 2006); Quality of bug report descriptions (Ko et al., 2006); Software evolution traceability (Sandusky et al., 2004); and Duplicate bug reports detection consists in avoiding the submission of bug reports that describe the submitted issue (Hiew, 2006). Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 12 / 57
  • 15. The bug report duplication problem Characterized by the submission of two or more bug reports that describe the same software issue Overhead of rework to search and analyze bug reports People take almost 5-15 minutes to perform search and analysis (Anvik et al., 2005; Cavalcanti et al., 2008) 10% to 30% of a bug report repository are composed by duplicated bug reports (Anvik et al., 2005; Runeson et al., 2007; Cavalcanti et al., 2008) So, costs with opening bug reports (5-15 minutes) CCB analysis (5-15 minutes) developer analysis (5-15 minutes) Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 13 / 57
  • 16. Proposed solution The proposed solution consists in a Web based application that enables people involved with bug report search and analysis to perform such tasks more effectively. Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 14 / 57
  • 17. Outline 1 Introduction M.Sc. Context, Motivation, Proposed solution 2 The Bug Report Duplication Problem: A Characterization Study Definition, Planning and Operation, Results 3 BAST Requirements, Architecture, Overview 4 Case Study Definition, Planning, Analysis and interpretation 5 Experiment Definition, Planning, Analysis and interpretation 6 Related Work 7 Conclusion 8 References Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 15 / 57
  • 18. Definition The goal of this study was to analyze bug repositories and the activities for searching and analyzing bug reports with the purpose of understanding them with respect to the possible factors that could impact on the duplication problem and their consequences on software development from the point of view of the researchers in the context of software development projects Questions Q1: Do the projects have a considerable amount of duplicate bug reports? Q2: Is the productivity being affected by the bug report duplication problem? Q3: Is there a common vocabulary for bug report descriptions? Q4: How are the relationships between master bug reports and duplicate bug reports characterized? Q5: Does the type of bug report influence the amount of duplicates? Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 16 / 57
  • 19. Definition The goal of this study was to analyze bug repositories and the activities for searching and analyzing bug reports with the purpose of understanding them with respect to the possible factors that could impact on the duplication problem and their consequences on software development from the point of view of the researchers in the context of software development projects Questions Q1: Do the projects have a considerable amount of duplicate bug reports? Q2: Is the productivity being affected by the bug report duplication problem? Q3: Is there a common vocabulary for bug report descriptions? Q4: How are the relationships between master bug reports and duplicate bug reports characterized? Q5: Does the type of bug report influence the amount of duplicates? Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 16 / 57
  • 20. Planning and operation Projects and data selection All bug reports till June/2008 Project LOC Staff size Bugs Life-time Bugzilla 55K 340 12829 14 Eclipse 6.5M 352 130095 7 Epiphany 100K 19 10683 6 Evolution 1M 156 72646 11 Firefox 80K 514 60233 9 GCC 4.2M 285 35797 9 Thunderbird 310K 192 19204 8 Tomcat 200K 57 8293 8 Private Project 2M 21 7955 2 Performed at C.E.S.A.R. between June/2008 to August/2008 Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 17 / 57
  • 21. Results Question 1: Do the analyzed projects have a considerable amount of duplicate bug reports? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M1 % 23.32 19.44 31.52 43.24 38.39 17.68 49.10 8.24 21.59 28.1 13.4 Question 2: Is the submitters productivity being affected by the bug report duplication problem? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M2 (min) 05-15 – 05-15 05-15 05-10 05-15 05-15 – 20-30 12.5 1.88 M4 bugs per day 71 722 59 403 334 198 106 46 145 231.5 222.1 Question 3: Is there a common vocabulary for bug report descriptions? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M5 % – 25 – – 22 – – – 35 31.2 9.5 Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 18 / 57
  • 22. Results Question 1: Do the analyzed projects have a considerable amount of duplicate bug reports? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M1 % 23.32 19.44 31.52 43.24 38.39 17.68 49.10 8.24 21.59 28.1 13.4 Question 2: Is the submitters productivity being affected by the bug report duplication problem? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M2 (min) 05-15 – 05-15 05-15 05-10 05-15 05-15 – 20-30 12.5 1.88 M4 bugs per day 71 722 59 403 334 198 106 46 145 231.5 222.1 Question 3: Is there a common vocabulary for bug report descriptions? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M5 % – 25 – – 22 – – – 35 31.2 9.5 Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 18 / 57
  • 23. Results Question 1: Do the analyzed projects have a considerable amount of duplicate bug reports? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M1 % 23.32 19.44 31.52 43.24 38.39 17.68 49.10 8.24 21.59 28.1 13.4 Question 2: Is the submitters productivity being affected by the bug report duplication problem? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M2 (min) 05-15 – 05-15 05-15 05-10 05-15 05-15 – 20-30 12.5 1.88 M4 bugs per day 71 722 59 403 334 198 106 46 145 231.5 222.1 Question 3: Is there a common vocabulary for bug report descriptions? Metric Bugz. Eclip. Epiph. Evol. Firef. GCC Thund. Tomc. Private Proj. Mean SD M5 % – 25 – – 22 – – – 35 31.2 9.5 Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 18 / 57
  • 24. Results [2] Question 4: How are the relationships between master bug reports and duplicate bug reports characterized? One to one relation bug123: bug3453 One to many relation bug345: bug45345, bug465, bug654 Figure: Bug reports grouping. Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 19 / 57
  • 25. Results [3] Question 5: Does the type of bug report influence the amount of duplicates? Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 20 / 57
  • 26. Study summary All the projects are being affected by the bug report duplication problem; The productivity is being affected by the bug reports duplication problem; It is not used a common vocabulary to describe the bug reports; > 80% of the groups are composed by one-to-one grouping type; The bug report duplication occur independently of the type of bug reports; The number of LOC is not a factor for the duplication problem; The size of the repository is not a factor for duplication; Projects’ life-time is not a factor for duplication; The staff size (developers) is not a factor for the duplication problem; and The profile of the submitter is a determining factor for the submission of duplicates: sporadic ≥ average ≥ frequent Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 21 / 57
  • 27. Study summary All the projects are being affected by the bug report duplication problem; The productivity is being affected by the bug reports duplication problem; It is not used a common vocabulary to describe the bug reports; > 80% of the groups are composed by one-to-one grouping type; The bug report duplication occur independently of the type of bug reports; The number of LOC is not a factor for the duplication problem; The size of the repository is not a factor for duplication; Projects’ life-time is not a factor for duplication; The staff size (developers) is not a factor for the duplication problem; and The profile of the submitter is a determining factor for the submission of duplicates: sporadic ≥ average ≥ frequent Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 21 / 57
  • 28. Outline 1 Introduction M.Sc. Context, Motivation, Proposed solution 2 The Bug Report Duplication Problem: A Characterization Study Definition, Planning and Operation, Results 3 BAST Requirements, Architecture, Overview 4 Case Study Definition, Planning, Analysis and interpretation 5 Experiment Definition, Planning, Analysis and interpretation 6 Related Work 7 Conclusion 8 References Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 22 / 57
  • 29. Requirements Functional requirements FR1 - Keyword-based search FR2 - Rank search results based on bug reports similarity rate FR3 - Index bug reports from XML files FR4 - Index bug reports from original database FR5 - Extract useful information from bug reports Non-Functional requirements NFR1 - Simple and intuitive filters interface NFR2 - Reports about bug repository status NFR3 - Integration with most popular bug report tracking systems NFR4 - Log search queries and user actions NFR5 - Reasonable similarity rate NFR6 - Web-based interface with AJAX Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 23 / 57
  • 30. Requirements Functional requirements FR1 - Keyword-based search FR2 - Rank search results based on bug reports similarity rate FR3 - Index bug reports from XML files FR4 - Index bug reports from original database FR5 - Extract useful information from bug reports Non-Functional requirements NFR1 - Simple and intuitive filters interface NFR2 - Reports about bug repository status NFR3 - Integration with most popular bug report tracking systems NFR4 - Log search queries and user actions NFR5 - Reasonable similarity rate NFR6 - Web-based interface with AJAX Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 23 / 57
  • 31. Architecture Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 24 / 57
  • 32. Overview Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 25 / 57
  • 33. Outline 1 Introduction M.Sc. Context, Motivation, Proposed solution 2 The Bug Report Duplication Problem: A Characterization Study Definition, Planning and Operation, Results 3 BAST Requirements, Architecture, Overview 4 Case Study Definition, Planning, Analysis and interpretation 5 Experiment Definition, Planning, Analysis and interpretation 6 Related Work 7 Conclusion 8 References Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 26 / 57
  • 34. Definition Context. Performed in a real test cycle at a C.E.S.A.R. partner between July and August 2008 Systematic process to test and open bug reports Objectives. 1 Which can prevent more duplicate bug reports 2 To consider whether our tool decreases the time spent on analysis of bug reports Baseline tool. Internal tool where testers can search for bug reports using SQL filters. Null hypotheses H0: µtime with BAST > µtime with baseline µduplicates avoided with BAST < µduplicates avoided with baseline Alternative hypotheses H1: µtime with BAST < µtime with baseline µduplicates avoided with BAST > µduplicates avoided with baseline Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 27 / 57
  • 35. Definition Context. Performed in a real test cycle at a C.E.S.A.R. partner between July and August 2008 Systematic process to test and open bug reports Objectives. 1 Which can prevent more duplicate bug reports 2 To consider whether our tool decreases the time spent on analysis of bug reports Baseline tool. Internal tool where testers can search for bug reports using SQL filters. Null hypotheses H0: µtime with BAST > µtime with baseline µduplicates avoided with BAST < µduplicates avoided with baseline Alternative hypotheses H1: µtime with BAST < µtime with baseline µduplicates avoided with BAST > µduplicates avoided with baseline Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 27 / 57
  • 36. Planning The tool was tested by the Bug Report Master Responsible for the test cycle Most experienced tester Doubt should be saned with him Case study design: Search and analysis being performed in: 1 step. Internal tool =⇒ BAST 2 step. BAST =⇒ Internal tool Metrics (manual annotations): Type of bug reports analyzed Number of duplicate bug reports avoided Time spent to analyze similar bug reports Quantitative analysis: Descriptive statistics It were analyzed 144 bug reports Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 28 / 57
  • 37. Analysis and interpretation Repository status Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 29 / 57
  • 38. Analysis and interpretation [2] Duplicates found Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 30 / 57
  • 39. Analysis and interpretation [3] Time spent Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 31 / 57
  • 40. Case study summary Bug tracker status. More than 50% of duplicates Duplicates found. Our tool can prevent more duplicates than the baseline tool Time spent. The bug report master saved time using our tool Drawbacks Case study design. Accommodation of the subject, in which he prefers to use one tool instead of other. Amount of bug reports in treatments. The amounts of bug reports that were analyzed in each treatment were very different. Lack of subjects. The number of subjects was not sufficient to generalize the case study results. Yguaratã Cavalcanti (UFPE/CIn, RiSE) A Bug Report Analysis and Search Tool 07/03/2009, Recife – Brazil 32 / 57
  • 43. Definition
      The goal of this experiment was to analyze a tool to improve search and analysis of bug reports,
        with the purpose of evaluating it
        with respect to its effectiveness and efficiency in detecting duplicate bug reports and saving time,
        from the point of view of the researchers,
        in the context of software development projects.
      Questions
        Q1 Is there a reduction in the number of duplicate bug reports with the adoption of the new tool?
        Q2 Is there a reduction in the time that submitters spend on search and analysis of bug reports with the adoption of the tool?
        Q3 Did the submitters have difficulties using the tool?
  • 45. Definition [2]
      Objects of study: BAST and Bugzilla.
      Quality focus: Effectiveness and efficiency of the tool developed.
      Context: The adoption of a tool developed to aid the bug report tracking process, focusing on search and analysis of bug reports to avoid duplicates.
      Experiment type: Off-line experiment (Wohlin et al., 2000)
      Subjects: 18 Ph.D. and M.Sc. students from the Computer Science department at the Federal University of Pernambuco, Brazil
      Performed in a distributed manner (no location restrictions)
      Bug reports from the Firefox open-source project
  • 46. Planning
      Subject selection. Selected by convenience sampling (Wohlin et al., 2000; Kitchenham and Pfleeger, 2002)
      Instrumentation:
        32 error descriptions concerning the Firefox project
          50% with defects that already have bug reports describing them in the repository
          50% with unique/not-reported defects
        Guidelines for the experiment execution (FAQ)
        Time sheets to record the time spent on search and analysis
      Quantitative analysis: Descriptive statistics and hypothesis testing [t-test (Wohlin et al., 2000)]
      Qualitative analysis: Questionnaire
  • 47. Planning [2]
      Null hypothesis H0:
        $\mu_{\text{time with BAST}} > \mu_{\text{time with baseline}}$
        $\mu_{\text{duplicates avoided with BAST}} < \mu_{\text{duplicates avoided with baseline}}$
      Alternative hypothesis H1:
        $\mu_{\text{time with BAST}} < \mu_{\text{time with baseline}}$
        $\mu_{\text{duplicates avoided with BAST}} > \mu_{\text{duplicates avoided with baseline}}$
      Independent variables. The tool used (BAST or Bugzilla)
      Dependent variables. (a) the number of duplicate bug reports and (b) the time spent on search and analysis
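The hypotheses above compare two means. As a minimal sketch, assuming each subject worked with both tools so that the two samples are paired, the test statistic t0 could be computed as below. The function name `paired_t_statistic` and the sample values are hypothetical; they are not data from the experiment.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(with_tool, with_baseline):
    """Return (t0, degrees_of_freedom) for two paired samples.

    t0 = mean(d) / (sd(d) / sqrt(n)), where d are the per-subject differences.
    """
    if len(with_tool) != len(with_baseline):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(with_tool, with_baseline)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t0 is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical per-subject times (same subjects, both tools).
bast_time = [4.0, 5.5, 3.0, 6.0]
bugzilla_time = [5.0, 5.0, 4.5, 6.5]

t0, df = paired_t_statistic(bast_time, bugzilla_time)
print(f"t0 = {t0:.4f}, degrees of freedom = {df}")
```

Under H1 for time (BAST mean below baseline mean), a clearly negative t0 would count as evidence against H0; the threshold is the critical value of the t distribution for the chosen significance level and the returned degrees of freedom.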
  • 48. Planning [3]. Figure: Experiment design.
  • 49. Analysis and interpretation
      Descriptive statistics
                     Time spent on analysis      Duplicate bug reports avoided
                     BAST        Bugzilla        BAST        Bugzilla
        Mean         4.54        4.32            7.56        8.33
        Maximum      6.84        9.56            13          12
        Minimum      1.78        2.47            0           0
        SD           1.49        1.91            3.5         3.2
      Table: Descriptive statistics.
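The rows of the table can be derived from per-subject measurements with the Python standard library. The sketch below assumes the SD row is the sample standard deviation, which the slide does not state; the function name `summarize` and the input values are hypothetical.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the four statistics reported in the table for one column."""
    if len(values) < 2:
        raise ValueError("at least two observations are required for SD")
    return {
        "mean": mean(values),
        "maximum": max(values),
        "minimum": min(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
    }


# Hypothetical time measurements for one tool, one value per subject.
hypothetical_times = [2.0, 4.0, 6.0]
for name, value in summarize(hypothetical_times).items():
    print(f"{name:8s} {value:.2f}")
```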
  • 50. Analysis and interpretation [2]. Descriptive statistics [2]. Figure: Box plot for time spent.
  • 51. Analysis and interpretation [3]. Descriptive statistics [3]. Figure: Box plot for duplicates avoided.
  • 52. Analysis and interpretation [4]
      Hypothesis test
                                          Time spent on analysis    Duplicates avoided
        t0                                0.6292                    -1.2466
        Degrees of freedom                17                        17
        p-value                           0.5376                    0.2294
        T distribution (critical value)   2.11                      2.11
        Result (t0 > T)                   H0: not rejected          H0: not rejected
      Analysis of dependency
                               BAST time    Bugzilla time    BAST duplicates    Bugzilla duplicates
        Years of experience    -0.13        -0.02            -0.19              0.18
        Number of projects     -0.11        0.37             -0.28              -0.025
        Bug trackers used      -0.16        0.35             -0.26              0.05
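The analysis of dependency relates each subject's background to the measured outcomes. The slide does not name the coefficient used; the sketch below assumes Pearson's r, and both the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical background data and outcomes, one entry per subject.
years_of_experience = [1, 3, 5, 7]
time_with_tool = [5.0, 4.0, 4.5, 3.5]

print(f"r = {pearson_r(years_of_experience, time_with_tool):.2f}")
```

Coefficients close to zero, such as most of those in the table, would indicate little linear relationship between background and outcome.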
  • 54. Qualitative analysis
      BAST features. Seven (7) subjects used the filter features provided by the tool.
      BAST usability. Only one subject reported some difficulty using the filters, and only one had problems with the ordering features.
      BAST usefulness. Fifteen (15) subjects believe that the way bug report details are presented in BAST is more useful for the analysis than in Bugzilla.
      Testimonials. One subject wrote “in fact, the way details are presented saves time to check them, since it is not necessary to open extra tabs or windows to see the details”, and another wrote “it became easier to identify the duplicate bug reports and navigate among the details of them”.
  • 56. Validity Threats
      Boredom
      Lack of historical data
      Environment
      Subjects' knowledge of bug reports
      Error re-descriptions and fictitious errors
      Halo effect
      Internet connection constraints
  • 58. Related work
      Automated Support for Classifying Software Failure Reports (Podgurski et al., 2003)
        Bug reports: Software failures automatically submitted
        Technique: Supervised and unsupervised pattern classification and multivariate visualization
        Testing: Batch runs
        Dataset: GCC, Jikes, and JavaC
      Assisted Detection of Duplicate Bug Reports (Hiew, 2006)
        Bug reports: Natural language bug reports
        Technique: Organize similar bug reports into centroids using TF-IDF
        Testing: Batch runs
        Dataset: Firefox, Eclipse, Apache, and Fedora Core
        Results: Precision of 29% and recall of 50%
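To illustrate the TF-IDF weighting mentioned above, the following is a minimal sketch of comparing bug report texts by cosine similarity of their TF-IDF vectors. It is not the implementation from Hiew (2006), which additionally groups reports into centroids; the tokenizer, the smoothed IDF formula, the function names, and the sample reports are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps weights positive even for very common terms.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical bug report summaries.
reports = [
    "browser crashes when opening pdf file",
    "crash on opening a pdf in the browser",
    "bookmark toolbar icons missing after update",
]
vectors = tfidf_vectors(reports)
for index in (1, 2):
    print(index, round(cosine(vectors[0], vectors[index]), 3))
```

In this toy data the second report shares several terms with the first and therefore scores higher than the third, which shares none.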
  • 59. Related work [2]
      Detection of Duplicate Defect Reports Using Natural Language Processing (Runeson et al., 2007)
        Bug reports: Natural language bug reports
        Technique: Natural Language Processing (NLP)
        Testing: Batch runs and a tool
        Dataset: Sony Ericsson Mobile Communications
        Results: Recall of 40%
      An Approach to Detecting Duplicate Bug Reports Using Natural Language and Execution Information (Wang et al., 2008)
        Bug reports: Natural language bug reports
        Technique: NLP and execution information
        Testing: Batch runs
        Dataset: Firefox and Eclipse
        Results: Recall of 67%-93% at its best
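The related approaches are compared by recall and, in one case, precision. As a reminder of how these measures are defined for duplicate detection, here is a small sketch using the standard set-based definitions; the cited studies may compute their figures differently (for example over ranked suggestion lists), and the function name and report identifiers are hypothetical.

```python
def precision_recall(flagged, actual_duplicates):
    """Return (precision, recall) for a set of reports flagged as duplicates.

    precision = correctly flagged / all flagged
    recall    = correctly flagged / all actual duplicates
    """
    flagged = set(flagged)
    actual = set(actual_duplicates)
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical report identifiers.
flagged_by_tool = {101, 102, 103, 104}
known_duplicates = {102, 104, 105}

precision, recall = precision_recall(flagged_by_tool, known_duplicates)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```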
  • 61. Research contribution
      A taxonomy for the bug repositories mining area
      The state of the art on mining bug repositories
      A characterization of the bug report duplication problem
      A tool to reduce the time spent on search and analysis of bug reports
      A case study to evaluate the proposed tool
      An experiment with 18 subjects to evaluate the tool
  • 62. Papers
      Cavalcanti, Y. C., Martins, A. C., de Almeida, E. S., and de Lemos Meira, S. R. (2008a). Avoiding Duplicate CR reports in Open Source Software Projects. In The 9th International Free Software Forum (IFSF’08), Porto Alegre, Brazil.
      Cavalcanti, Y. C., de Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and Meira, S. R. L. (2008b). The Bug Report Duplication Problem: A Characterization Study. Technical report, C.E.S.A.R and Federal University of Pernambuco.
      Papers for the Case Study and for the Experiment
      Two more journal papers being written (characterization and thesis)
  • 63. Future Work
      Evolve from prototype
      Information visualization
      Alternative integration methods
      Provide integration with other tools
      Search and ranking techniques
      Comments of a bug report
      Number of informal references
      Experiment replications
  • 65. References I
      Anvik, J., Hiew, L., and Murphy, G. C. (2005). Coping with an open bug repository. In Proceedings of the 2005 OOPSLA workshop on Eclipse technology eXchange, pages 35–39, New York, NY, USA. ACM Press.
      Anvik, J., Hiew, L., and Murphy, G. C. (2006). Who should fix this bug? In Proceedings of the 28th International Conference on Software Engineering (ICSE’06), pages 361–370, New York, NY, USA. ACM Press.
      Cavalcanti, Y. C., Almeida, E. S., da Cunha, C. E. A., Pinto, E. R., and Meira, S. R. L. (2008). The bug-report duplication problem: a characterization study. Technical report, C.E.S.A.R and Federal University of Pernambuco.
      Eastwood, A. (1993). Firm fires shots at legacy systems. Computing Canada, 19(2), 17.
      Erlikh, L. (2000). Leveraging legacy system dollars for e-business. IT Professional, 2(3), 17–23.
      Hiew, L. (2006). Assisted Detection of Duplicate Bug Reports. Master’s thesis, The University of British Columbia.
  • 66. References II
      Huff, F. (1990). Information systems maintenance. The Business Quarterly, (55), 30–32.
      Kitchenham, B. and Pfleeger, S. L. (2002). Principles of survey research: part 5: populations and samples. SIGSOFT Software Engineering Notes, 27(5), 17–20.
      Ko, A. J., Myers, B. A., and Chau, D. H. (2006). A linguistic analysis of how people describe software problems. In Proceedings of the Visual Languages and Human-Centric Computing (VLHCC’06), pages 127–134, Washington, DC, USA. IEEE Computer Society.
      Koskinen, J. (2004). Software maintenance costs. http://www.cs.jyu.fi/~koskinen/smcosts.htm.
      Lientz, B. P. and Swanson, E. B. (1981). Problems in application software maintenance. Communications of the ACM, 24(11), 763–769.
      McKee, J. R. (1984). Maintenance as a function of design. In AFIPS National Conference Proceeding, volume 53, pages 187–1983.
  • 67. References III
      Moad, J. (1990). Maintaining the competitive edge. Datamation, 4(36), 61–62.
      Podgurski, A., Leon, D., Francis, P., Masri, W., Minch, M., Sun, J., and Wang, B. (2003). Automated support for classifying software failure reports. In Proceedings of the 25th International Conference on Software Engineering (ICSE’03), pages 465–475, Washington, DC, USA. IEEE Computer Society.
      Port, O. (1988). The software trap – automate or else. Business Week, 9(3051), 142–154.
      Runeson, P., Alexandersson, M., and Nyholm, O. (2007). Detection of duplicate defect reports using natural language processing. In Proceedings of the 29th International Conference on Software Engineering (ICSE’07), pages 499–510. IEEE Computer Society.
      Sandusky, R. J., Gasser, L., and Ripoche, G. (2004). Bug report networks: Varieties, strategies, and impacts in a f/oss development community. In Proceedings of the 1st International Workshop on Mining Software Repositories (MSR’04), pages 80–84, University of Waterloo, Waterloo.
  • 68. References IV
      Sommerville, I. (2007). Software Engineering. Addison Wesley, 8th edition.
      Song, Q., Shepperd, M. J., Cartwright, M., and Mair, C. (2006). Software defect association mining and defect correction effort prediction. IEEE Transactions on Software Engineering, 32(2), 69–82.
      Wang, X., Zhang, L., Xie, T., Anvik, J., and Sun, J. (2008). An approach to detecting duplicate bug reports using natural language and execution information. In Proceedings of the 30th International Conference on Software Engineering (ICSE’08), pages 461–470. ACM Press.
      Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., and Wesslén, A. (2000). Experimentation in Software Engineering: An Introduction. The Kluwer International Series in Software Engineering. Kluwer Academic Publishers, Norwell, Massachusetts, USA.
      Zelkowitz, M. V., Shaw, A. C., and Gannon, J. D. (1979). Principles of Software Engineering and Design. Prentice Hall Professional Technical Reference.