BushraDBR: An Automatic Approach
to Retrieving Duplicate Bug Reports
Ra'Fat Al-Msie'deen
Department of Software Engineering, Faculty of IT, Mutah University, Mutah 61710, Karak,
Jordan
E-mail address: rafatalmsiedeen@mutah.edu.jo
https://rafat66.github.io/Al-Msie-Deen/
➢ To cite this version:
R. Al-Msie’deen, “BushraDBR: An Automatic Approach to Retrieving
Duplicate Bug Reports,” International Journal of Computing and Digital
Systems, vol. 15, no. 1, pp. 221-238, 2024.
• URL: https://journal.uob.edu.bh/handle/123456789/5343
Keywords
Software engineering, Software maintenance, Duplicate bug reports,
Formal concept analysis, Latent semantic indexing, Bug tracking system,
Bug report.
Abstract
A Bug Tracking System (BTS), such as Bugzilla, is generally used to
track the Bug Reports (BRs) submitted for a particular software system.
Duplicate Bug Report (DBR) retrieval is the process of finding DBRs in
the BTS. To avoid wasting engineering resources, such as effort and
time, on previously submitted (or duplicate) BRs, it is essential to
find and retrieve DBRs as soon as software users submit them.
Abstract …
Thus, this paper proposes an automatic approach, called BushraDBR
(Bushra Duplicate Bug Reports retrieval process), that helps an
engineer (called a triager) retrieve DBRs and stop duplicates before
they start. When a new BR is sent to the Bug Repository (BRE), the
engineer uses BushraDBR to check whether it duplicates an existing BR
in the BRE. If it does, the engineer marks it as a DBR and excludes it
from any additional work; otherwise, the BR is added to the BRE.
BushraDBR relies on the Textual Similarity (TS) between the newly
submitted BR and the rest of the BRs in the BRE to retrieve DBRs.
Abstract …
BushraDBR exploits the unstructured data in BRs to apply Information
Retrieval (IR) methods efficiently. It uses two techniques to retrieve
DBRs: Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA).
The originality of BushraDBR lies in stopping DBRs before they occur by
comparing the newly reported BR with the rest of the BRs in the BTS,
thus saving time and effort during the Software Maintenance (SM)
process, and in combining LSI and FCA to retrieve DBRs. BushraDBR has
been validated and evaluated on several publicly available data sets
from Bugzilla. The experiments show that BushraDBR retrieves DBRs
efficiently and accurately.
BushraDBR Approach
List of Abbreviations
• Bug Tracking System (BTS)
• Bug Reports (BRs)
• Duplicate Bug Report (DBR)
• Bug Repository (BRE)
• Information Retrieval (IR)
• Latent Semantic Indexing (LSI)
• Software Maintenance (SM)
• Formal Concept Analysis (FCA)
• Master BRs set (MRs)
• Vector Space Model (VSM)
• Deep Learning (DL)
• Machine Learning (ML)
• Convolutional Neural Network (CNN)
• New BR (Nr)
• Natural Language Processing (NLP)
• Quality Assurance (QA)
• Drawing Shapes Application (DSA)
• Bushra Duplicate Bug Reports retrieval process (BushraDBR)
• Cosine Similarity Matrix (CSM)
• Cosine Similarity (CS)
• Term-Query Matrix (TQM)
• Term-Document Matrix (TDM)
• Textual Similarity (TS)
• List of DBRs (LDRs)
• Binary Formal Context (BFC)
• Singular Value Decomposition (SVD)
Introduction
✓ In this work, reports that are textually similar to each other are called DBRs.
✓ TS between BRs is good evidence that they describe the same (or similar) issue.
✓ Frequently, DBRs are reported by multiple software users.
✓ These users come from different backgrounds and use different vocabularies to describe
the same (or similar) software bug.
✓ An important role of the triager is to check reported BRs for possible duplicates
before sending them to the BRE.
✓ Manually checking submitted BRs is difficult because of the huge number of BRs
reported every day.
✓ On the other hand, retrieving a DBR after it has been stored in the BRE is a tedious and
expensive task for software developers.
✓ So, stopping DBRs before they start is important and very useful.
Introduction …
• The novelty of this paper is that it proposes a text-based
approach (i.e., an IR-based solution) to stopping DBRs before
they start.
• BushraDBR prevents DBRs by continuously checking each newly
submitted BR against the BRs stored in the BTS.
• If the submitted BR is textually similar to any of the BRs in
the BTS, the triager excludes it from any further work and does
not add it to the BTS.
Figure 1. An example of a BR from the drawing shapes application.
“Bug details”
Bug ID: 000001.
Product: Drawing shapes App.
Submitted on: Sept. 12, 2022.
Summary: The PDF viewer is very slow for line drawings.
Component: Layout.
Type: defect.
Severity: S4.
Priority: P5.
Status: Assigned.
“Bug description”
Actual results:
PDF is very slow to load. The text parts of the PDF appear fast, but
every line drawing is very slow.
Expected results:
PDFs with line drawings should show up a lot faster without any delay.
TABLE I. A pair of DBRs from Bugzilla (i.e., core product).
Figure 2. The workflow for retrieving DBRs via BushraDBR approach.
[Diagram: the new bug report (Nr, identified by Nr_Bug-ID) is used as a query against the BR documents in the bug repository (BR_001, BR_002, BR_003, …, BR_n). LSI gives a similarity value for each BR (BR_001: 0.02, BR_002: 0.58, BR_003: 0.98, …, BR_n: 0.45), and FCA tests for DBRs. The output marks Nr as a duplicate (here, of BR_003) or as a non-duplicate.]
Figure 3. The key elements of BushraDBR approach.
• New bug report (Nr); BR documents from BTS
• TECHNIQUES: LSI; FCA
• Threshold = 80
• OUTPUT: Duplicate BR; Non-duplicate BR
• DATA SETS: Drawing Shapes App; Symbolication & General (Eliot); WebPayments UI (Firefox); Audio/Video (Core); DOM: Editor (Core)
• EVALUATION METRICS: Precision; Recall; F-measure
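The three evaluation metrics named in Figure 3 have standard set-based definitions. The sketch below is illustrative only: the function name `precision_recall_f` and the sample bug IDs are assumptions, not taken from the paper's evaluation.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for a set of retrieved DBRs.

    retrieved: bug IDs the approach reported as duplicates (hypothetical input)
    relevant:  bug IDs that are truly duplicates (hypothetical ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: three reports retrieved, two of them true duplicates.
p, r, f = precision_recall_f({"000003", "000006", "000007"}, {"000006", "000007"})
```

With these made-up inputs, two of the three retrieved reports are correct and both true duplicates are found, so recall is 1 and precision is 2/3.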
TABLE II. Structured and unstructured information that is leveraged by the selected approaches (i.e., survey).
TABLE III. Data sets that are utilized by the selected approaches (i.e., survey).
TABLE IV. Review and classification of current studies relevant for DBR retrieval approaches.
Figure 4. DBRs describe the same software failure and use similar vocabularies.
[Diagram: software users who encounter the same software failure X (Bug X) submit Bug Report X and Bug Report Y, written with similar vocabularies, to the Bug Tracking System (BTS). For the triager or developer, BushraDBR approach classifies the new report (Nr) as a duplicate of the master (or original) bug report or as a non-duplicate.]
Figure 5. The DBR retrieval process - BushraDBR approach.
[Flowchart: the inputs are the new bug report (Nr), used as the query document, and the BR documents of the master bug reports set (MRs) in the repository. BushraDBR approach applies preprocessing, Latent Semantic Indexing, and Formal Concept Analysis, then tests "similarity value >= threshold (Thr)". Yes (duplicate): insert into the list of duplicate bug reports (LDRs). No (non-duplicate): insert into MRs.]
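The decision step of Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions: a plain bag-of-words cosine similarity stands in for the LSI and FCA stages, the names `triage`, `mrs`, and `ldrs` are hypothetical, and the default threshold of 0.80 assumes that "Threshold = 80" in Figure 3 means a similarity of 80%.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Bag-of-words cosine similarity (a stand-in for the LSI-based TS)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def triage(nr_id, nr_text, mrs, ldrs, thr=0.80):
    """Compare the new report (Nr) with every master report.

    mrs:  dict mapping bug ID -> text (master bug reports set)
    ldrs: list collecting (duplicate ID, master ID, similarity) triples
    """
    best_id, best_sim = None, 0.0
    for bug_id, text in mrs.items():
        sim = cosine(nr_text, text)
        if sim > best_sim:
            best_id, best_sim = bug_id, sim
    if best_id is not None and best_sim >= thr:
        ldrs.append((nr_id, best_id, best_sim))   # Yes branch: insert into LDRs
        return "duplicate"
    mrs[nr_id] = nr_text                          # No branch: insert into MRs
    return "non-duplicate"
```

A report whose best similarity reaches the threshold is recorded in `ldrs` and never enters `mrs`, which mirrors the idea of stopping duplicates before they are stored.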
Figure 6. An example of a bug document generated by BushraDBR approach.
Bug Document (BushraDBR)
Bug ID: 000007
Incapable of using the mouse wheel to scroll on an art site.
Scrolling the contents of the art site by clicking the scroll wheel of the mouse device does not always work well.
Algorithm 1: Preprocessing of BRs.
Figure 7. An example of a pre-processed bug document.
Preprocessing steps: 1. Extracting bug content; 2. Splitting words or tokenization; 3. Removing stop words; 4. Stemming.
Pre-processed document (Bug Document, BushraDBR, Bug ID: 000007):
incapable, use, mouse, wheel, scroll, art, site, scroll, content, art, site, click, scroll, wheel, mouse, device, always, work, well
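The four preprocessing steps listed for Figure 7 can be sketched as below. This is not the paper's Algorithm 1: the stop-word list is a tiny illustrative sample, and `stem` is a toy suffix stripper standing in for a real stemmer, so its output will not match Figure 7 word for word (for example, it leaves "using" unchanged). All function names are hypothetical.

```python
import re

# A tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "by", "does", "in", "is",
              "my", "not", "of", "on", "the", "to"}


def tokenize(text):
    """Split text into lowercase word tokens, also splitting camelCase
    identifiers such as myOval() into 'my' and 'oval'."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [p.lower() for p in parts]


def stem(word):
    """Toy suffix stripping; a stand-in for a proper stemming algorithm."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    """Extracted text -> tokens -> stop words removed -> stems."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


tokens = preprocess("The bounds of the myOval() and circle are displayed")
```

Keeping each step in its own function makes it easy to swap the toy stemmer or the stop-word list for a library implementation.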
TABLE V. Bug reports from the drawing shapes application data set.
TABLE VI. BushraDBR preprocessing steps, explanation, and a real-world example from the DSA data set.
Figure 8. Retrieving DBRs using LSI and FCA techniques - the BushraDBR approach.
[Diagram with three stages applied to Nr and the BRs:
1. Preprocessing: extracting ("..., of the myOval() and … displayed, …"), splitting ("…, of, the, my, oval, and, ..., displayed, .."), stop-word removal (".., oval, .., displayed, …"), and stemming ("…, oval, …, display, ..").
2. Latent Semantic Indexing: a similarity matrix (SM) between Nr and the BRs (BR-1: 0.01, BR-2: 0.93, BR-N: 0.07).
3. Formal Concept Analysis: the SM as a binary formal context (BR-1: 0, BR-2: 1, BR-N: 0) and the resulting AOC-poset, in which the formal concept Concept_0, labeled with its intent and extent, groups Nr with BR-2 as DBRs.]
Algorithm 2: Measuring TS between BRs via LSI.
Measuring TS between BRs via LSI
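The general LSI technique behind this step can be sketched as follows. It does not reproduce the paper's Algorithm 2. The assumptions are: raw term counts in the Term-Document Matrix (TDM), a rank of `k = 2`, and projection of both documents and query with the transposed rank-k left singular vectors. The function name is hypothetical and NumPy is assumed to be installed.

```python
import numpy as np


def lsi_similarities(docs, query, k=2):
    """Cosine similarity between a query and each document in a rank-k
    latent space obtained by Singular Value Decomposition (SVD).

    docs:  list of token lists (pre-processed BRs)
    query: token list (pre-processed new bug report, Nr)
    """
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-Document Matrix: rows are terms, columns are documents.
    tdm = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d:
            tdm[index[t], j] += 1.0

    # Term-query vector over the same vocabulary (unknown terms are ignored).
    q = np.zeros(len(vocab))
    for t in query:
        if t in index:
            q[index[t]] += 1.0

    u, s, _ = np.linalg.svd(tdm, full_matrices=False)
    k = max(1, min(k, int(np.sum(s > 1e-12))))
    u_k = u[:, :k]

    doc_vecs = (u_k.T @ tdm).T   # one latent vector per document
    q_vec = u_k.T @ q            # query in the same latent space

    sims = []
    for v in doc_vecs:
        denom = np.linalg.norm(v) * np.linalg.norm(q_vec)
        sims.append(float(v @ q_vec / denom) if denom > 0 else 0.0)
    return sims
```

The returned list plays the role of one row of the cosine similarity matrix: one TS value per stored BR for the query Nr.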
Figure 9. The TS values between Nr and BRs as a directed graph.
Algorithm 3: Retrieving DBRs using FCA.
Retrieving DBRs using FCA
Figure 10. The AOC-poset for the DSA data set.
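The FCA step can be illustrated with a small sketch that turns similarity values into a binary formal context and lists the concepts introduced by objects and attributes (the elements of an AOC-poset). It is not the paper's Algorithm 3. The function names are hypothetical, and the 0.80 threshold assumes that "Threshold = 80" in Figure 3 means 80% similarity.

```python
def to_binary_context(similarity, thr=0.80):
    """Map each object (row) to the set of attributes (columns) whose
    similarity value reaches the threshold."""
    return {obj: {attr for attr, value in row.items() if value >= thr}
            for obj, row in similarity.items()}


def extent(context, attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(o for o, owned in context.items() if set(attrs) <= owned)


def intent(context, objs, attributes):
    """All attributes shared by every object in objs."""
    common = set(attributes)
    for o in objs:
        common &= context[o]
    return frozenset(common)


def aoc_poset_concepts(context, attributes):
    """Concepts introduced by at least one object or one attribute."""
    concepts = set()
    for o in context:                      # object concepts
        ext = extent(context, context[o])
        concepts.add((ext, intent(context, ext, attributes)))
    for a in attributes:                   # attribute concepts
        ext = extent(context, {a})
        concepts.add((ext, intent(context, ext, attributes)))
    return concepts


def duplicate_groups(similarity, thr=0.80):
    """Concepts with a non-empty extent and intent pair reports with the
    stored BRs they duplicate."""
    attributes = {a for row in similarity.values() for a in row}
    context = to_binary_context(similarity, thr)
    found = aoc_poset_concepts(context, attributes)
    return sorted(((e, i) for e, i in found if e and i),
                  key=lambda c: sorted(c[0]))


# Values taken from the similarity matrix shown in Figure 8.
groups = duplicate_groups({"Nr": {"BR-1": 0.01, "BR-2": 0.93, "BR-N": 0.07}})
```

For the Figure 8 values, only BR-2 reaches the threshold, so the single remaining concept groups Nr with BR-2.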
[Diagram: formal concept analysis yields, for each data set, a formal concept labeled with its concept ID (Concept_0), its intent, and its extent:
• Drawing Shapes Application (DSA): Concept_0 = {000006, 000007}
• Symbolication & General - Eliot product: Concept_0 = {1819151, 1819152}
• WebPayments UI - Firefox product (partial): Concept_0 = {1490824, 1510066}
• Audio/Video - Core product (partial): Concept_0 = {1545235, 1545237}
• DOM editor - Core product (partial): Concept_0 = {176525, 1841744}]
[Diagram: overview of BushraDBR approach, combining the retrieval process of Figure 5 with the three stages of Figure 8 (1. preprocessing, 2. Latent Semantic Indexing, 3. Formal Concept Analysis).]
Figure 9. The TS values between Nr and BRs as a directed graph.
[Directed graph over the bug reports 000001 to 000007.]
[Diagram: BushraDBR approach applied to a new bug report (Nr) as the query document: 1. preprocessing (extracting, splitting, stop words, stemming), 2. Latent Semantic Indexing (similarity matrix, SM), 3. Formal Concept Analysis (SM as binary formal context). If the similarity value >= threshold (Thr), Nr is a duplicate and is inserted into the list of duplicate bug reports (LDRs).]
Experimentation
• Repository: master bug reports set (MRs), BR documents
• Data sets
• Cosine similarity matrices
Figure 10. The AOC-poset for each data set in the experiments: (A) DSA, (B) Eliot, (C) WebPayments UI [partial], (D) Audio / Video [partial], and (E) DOM editor [partial].
(A) DSA: Concept_0 = {000006, 000007}; Concept_1 = {000005, 000003, 000004, 000001, 000002}
(B) Eliot: Concept_0 = {1819151, 1819152}; Concept_1 = {1768863, 1814509, 1729698, 1741434, 1801212, 1812345, 1815981, 1707879, 1815982, 1811299, 1801169, 1811236, 1475334, 1649535, 1745533}
(C) WebPayments UI [partial]: Concept_0 = {1490824, 1510066}; Concept_1 = {1476344, 1498447, 1446577, 1498225, 1438784, 1494439, 1501447, 1499837, 1507623, 1464356, 1494559, 1470197, 1495151, …}
(D) Audio / Video [partial]: Concept_0 = {1545235, 1545237}; Concept_1 = {1582074, 1673285, 1532646, 1732199, 1816175, 1776641, 1680362, 1743870, 1676015, 1691996, 1545237, 1757124, 1799132, …}
(E) DOM editor [partial]: Concept_0 = {176525, 1841744}; Concept_1 = {1088194, 201410, 1036856, 1276391, 1327934, 1723853, 1840784, 1710784, 458524, 460903, 377297, 1462368, 1567160, …}
Conclusion
• This paper has introduced a novel approach, called BushraDBR, for
automatically retrieving DBRs using LSI and FCA. BushraDBR aims to
prevent developers from wasting resources, such as effort and time,
on previously submitted BRs. Its novelty is that it exploits the
textual data in BRs to apply LSI and FCA efficiently to retrieve DBRs.
Conclusion …
• BushraDBR prevents DBRs before they occur by comparing the newly
reported BR with the rest of the BRs in the repository. The approach
has been validated and evaluated on different data sets from Bugzilla.
The experiments show that BushraDBR retrieves DBRs efficiently and
accurately.
Future work
• As future work, the author plans to extend the current approach by
developing an ML-based solution to retrieve DBRs and prevent
duplicates before they start. He also plans to compare BushraDBR
(i.e., an IR-based approach) with current ML-based approaches.
Furthermore, additional empirical tests can be conducted to verify
BushraDBR using open-source and industrial data sets. There is also a
need to conduct a comprehensive survey that compares all current
approaches relevant to DBR retrieval.
References
[3] N. Jalbert and W. Weimer, “Automated duplicate detection for bug tracking systems,” in The 38th Annual IEEE / IFIP
International Conference on Dependable Systems and Networks, DSN 2008, June 24-27, 2008, Anchorage, Alaska, USA, Proceedings.
IEEE Computer Society, 2008, pp. 52–61. [Online]. Available: https://doi.org/10.1109/DSN.2008.4630070
[4] A. Hindle and C. Onuczko, “Preventing duplicate bug reports by continuously querying bug reports,” Empir. Softw. Eng., vol.
24, no. 2, pp. 902–936, 2019. [Online].Available: https://doi.org/10.1007/s10664-018-9643-4
[11]A. Kukkar, R. Mohana, Y. Kumar, A. Nayyar, M. Bilal, and K. Kwak, “Duplicate bug report detection and classification system
based on deep learning technique,” IEEE Access, vol. 8, pp. 200 749–200 763, 2020. [Online]. Available:
https://doi.org/10.1109/ACCESS.2020.3033045
[17] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, “An approach to detecting duplicate bug reports using natural language and
execution information,” in 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, May 10-18,
2008, W. Sch¨afer, M. B. Dwyer, and V. Gruhn, Eds. ACM, 2008, pp. 461–470. [Online]. Available:
https://doi.org/10.1145/1368088.1368151
[36] C. Sun, D. Lo, X. Wang, J. Jiang, and S. Khoo, “A discriminative model approach for accurate duplicate bug report
retrieval,” in Proceedings of the 32nd ACM / IEEE International Conference on Software Engineering - Volume 1, ICSE 2010, Cape
Town, South Africa, 1-8 May 2010, J. Kramer, J. Bishop, P. T. Devanbu, and S. Uchitel, Eds. ACM, 2010, pp. 45–54. [Online].
Available: https://doi.org/10.1145/1806799.1806811
[37] C. Sun, D. Lo, S. Khoo, and J. Jiang, “Towards more accurate retrieval of duplicate bug reports,” in 26th IEEE / ACM
International Conference on Automated Software Engineering (ASE 2011), Lawrence, KS, USA, November 6-10, 2011, P. Alexander,
C. S. Pasareanu, and J. G. Hosking, Eds. IEEE Computer Society, 2011, pp. 253–262. [Online]. Available:
https://doi.org/10.1109/ASE.2011.6100061
[38] F. Thung, P. S. Kochhar, and D. Lo, “Dupfinder: integrated tool support for duplicate bug report detection,” in ACM / IEEE
International Conference on Automated Software Engineering, ASE ’14, Vasteras, Sweden - September 15 - 19, 2014, I. Crnkovic, M.
Chechik, and P. Gr¨unbacher, Eds. ACM, 2014, pp. 871–874. [Online].Available: https://doi.org/10.1145/2642937.2648627
[39] J. He, L. Xu, M. Yan, X. Xia, and Y. Lei, “Duplicate bug report detection using dual-channel convolutional neural
networks,” in ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13-15, 2020.
ACM, 2020, pp. 117–127. [Online].Available: https://doi.org/10.1145 /3387904.3389263
[40] P. Runeson, M. Alexandersson, and O. Nyholm, “Detection of duplicate defect reports using natural language processing,” in
29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20-26, 2007. IEEE Computer
Society, 2007, pp. 499–510. [Online].Available: https://doi.org/10.1109/ICSE.2007.32
[44] M. S. Rakha, C. Bezemer, and A. E. Hassan, “Revisiting the performance evaluation of automated approaches for the
retrieval of duplicate issue reports,” IEEE Trans. Software Eng., vol. 44, no. 12, pp. 1245–1268, 2018. [Online]. Available:
https://doi.org/10.1109/TSE.2017.2755005
[47] B. S. Neysiani and S. Morteza Babamir, “Automatic duplicate bug report detection using information retrieval-based versus
machine learning-based approaches,” in 2020 6th International Conference on Web Research (ICWR), April 2020, pp. 288–293.
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug
Reports
BushraDBR: An Automatic Approach
to Retrieving Duplicate Bug Reports
Ra'Fat Al-Msie'deen
Department of Software Engineering, Faculty of IT, Mutah University, Mutah 61710, Karak,
Jordan
E-mail address: rafatalmsiedeen@mutah.edu.jo
https://rafat66.github.io/Al-Msie-Deen/

More Related Content

PDF
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports.pdf
PDF
PDF
A Survey on Bug Tracking System for Effective Bug Clearance
PDF
Poster: Improving Bug Localization with Report Quality Dynamics and Query Ref...
PPT
Chapter3 Search
PDF
CSMR10a.ppt
PDF
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
PDF
Smart City: Definitions, Architectures, Development Life Cycle, Technologies,...
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports.pdf
A Survey on Bug Tracking System for Effective Bug Clearance
Poster: Improving Bug Localization with Report Quality Dynamics and Query Ref...
Chapter3 Search
CSMR10a.ppt
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
Smart City: Definitions, Architectures, Development Life Cycle, Technologies,...

More from Ra'Fat Al-Msie'deen (20)

PDF
ScaMaha: A Tool for Parsing, Analyzing, and Visualizing Object-Oriented Softw...
PDF
ScaMaha: A Tool for Parsing, Analyzing, and Visualizing Object-Oriented Softw...
PDF
Software evolution understanding: Automatic extraction of software identifier...
PDF
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
PDF
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
PDF
Supporting software documentation with source code summarization
PDF
SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
PDF
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
PDF
Automatic Labeling of the Object-oriented Source Code: The Lotus Approach
PDF
Constructing a software requirements specification and design for electronic ...
PDF
Detecting commonality and variability in use-case diagram variants
PDF
Naming the Identified Feature Implementation Blocks from Software Source Code
PPTX
Application architectures - Software Architecture and Design
PPTX
Planning and writing your documents - Software documentation
PPTX
Requirements management planning & Requirements change management
PPTX
Requirements change - requirements engineering
PPTX
Requirements validation - requirements engineering
PPTX
Software Documentation - writing to support - references
PPTX
Algorithms - "heap sort"
PPTX
Algorithms - "quicksort"
ScaMaha: A Tool for Parsing, Analyzing, and Visualizing Object-Oriented Softw...
ScaMaha: A Tool for Parsing, Analyzing, and Visualizing Object-Oriented Softw...
Software evolution understanding: Automatic extraction of software identifier...
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
Supporting software documentation with source code summarization
SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
Automatic Labeling of the Object-oriented Source Code: The Lotus Approach
Constructing a software requirements specification and design for electronic ...
Detecting commonality and variability in use-case diagram variants
Naming the Identified Feature Implementation Blocks from Software Source Code
Application architectures - Software Architecture and Design
Planning and writing your documents - Software documentation
Requirements management planning & Requirements change management
Requirements change - requirements engineering
Requirements validation - requirements engineering
Software Documentation - writing to support - references
Algorithms - "heap sort"
Algorithms - "quicksort"
Ad

Recently uploaded (20)

PPTX
Cell Structure & Organelles in detailed.
PDF
Classroom Observation Tools for Teachers
PPTX
Lesson notes of climatology university.
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PDF
RMMM.pdf make it easy to upload and study
PDF
Insiders guide to clinical Medicine.pdf
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Institutional Correction lecture only . . .
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Basic Mud Logging Guide for educational purpose
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
Cell Structure & Organelles in detailed.
Classroom Observation Tools for Teachers
Lesson notes of climatology university.
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
O7-L3 Supply Chain Operations - ICLT Program
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
RMMM.pdf make it easy to upload and study
Insiders guide to clinical Medicine.pdf
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Supply Chain Operations Speaking Notes -ICLT Program
Microbial diseases, their pathogenesis and prophylaxis
Pharmacology of Heart Failure /Pharmacotherapy of CHF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Microbial disease of the cardiovascular and lymphatic systems
Institutional Correction lecture only . . .
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
O5-L3 Freight Transport Ops (International) V1.pdf
Basic Mud Logging Guide for educational purpose
Abdominal Access Techniques with Prof. Dr. R K Mishra
STATICS OF THE RIGID BODIES Hibbelers.pdf
Ad

BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports

  • 1. BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports Ra'Fat Al-Msie'deen Department of Software Engineering, Faculty of IT, Mutah University, Mutah 61710, Karak, Jordan E-mail address: rafatalmsiedeen@mutah.edu.jo https://rafat66.github.io/Al-Msie-Deen/
  • 2. ➢ To cite this version: R. Al-Msie’deen, “BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports,” International Journal of Computing and Digital Systems, vol. 15, no. 1, pp. 221-238, 2024. • URL: https://journal.uob.edu.bh/handle/123456789/5343
  • 4. Keywords Keywords: Software engineering, Software maintenance, Duplicate bug reports, Formal concept analysis, Latent semantic indexing, Bug tracking system, Bug report.
  • 5. Abstract A Bug Tracking System (BTS), such as Bugzilla, is generally utilized to track submitted Bug Reports (BRs) for a particular software system. Duplicate Bug Report (DBR) retrieval is the process of obtaining a DBR in the BTS. This process is important to avoid needless work from engineers on DBRs. To prevent wasting engineer resources, such as effort and time, on previously submitted (or duplicate) BRs, it is essential to find and retrieve DBRs as soon as they are submitted by software users.
  • 6. Abstract … Thus, this paper proposes an automatic approach (called BushraDBR) that aims to assist an engineer (called a triager) to retrieve DBRs and stop the duplicates before they start. Where BushraDBR stands for Bushra Duplicate Bug Reports retrieval process. Therefore, when a new BR is sent to the Bug Repository (BRE), an engineer checks whether it is a duplicate of an existing BR in BRE or not via BushraDBR approach. If it is, the engineer marks it as DBR, and the BR is excluded from consideration for any additional work; otherwise, the BR is added to the BRE. BushraDBR approach relies on Textual Similarity (TS) between the newly submitted BR and the rest of the BRs in BRE to retrieve DBRs.
  • 7. Abstract … BushraDBR exploits unstructured data from BRs to apply Information Retrieval (IR) methods in an efficient way. BushraDBR approach uses two techniques to retrieve DBRs: Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of BushraDBR is to stop DBRs before they occur by comparing the newly reported BR with the rest of the BRs in the BTS, thus saving time and effort during the Software Maintenance (SM) process. BushraDBR also uniquely retrieves DBR through the use of LSI and FCA techniques. BushraDBR approach had been validated and evaluated on several publicly available data sets from Bugzilla. Experiments show the ability of BushraDBR approach to retrieve DBRs in an efficient and accurate manner.
  • 9. List of Abbreviations • Bug Tracking System (BTS) • Bug Reports (BRs) • Duplicate Bug Report (DBR) • Bug Repository (BRE) • Information Retrieval (IR) • Latent Semantic Indexing (LSI) • Software Maintenance (SM) • Formal Concept Analysis (FCA) • Master BRs set (MRs) Vector Space Model (VSM) Deep Learning (DL) Machine learning (ML) Convolutional Neural Network (CNN) New BR (Nr) Natural Language Processing (NLP) Quality Assurance (QA) Drawing Shapes Application (DSA) Bushra Duplicate Bug Reports retrieval process (BushraDBR) Cosine Similarity Matrix (CSM) Cosine Similarity (CS) Term-Query Matrix (TQM) Term Document Matrix (TDM) Textual Similarity (TS) List of DBRs (LDRs) Binary Formal Context (BFC) Singular Value Decomposition (SVD)
  • 10. Introduction ✓ In this work, reports that are textually similar to each other are called DBRs. ✓ TS between BRs is good evidence that they describe the same (or similar) issue. ✓ Frequently, DBRs are reported by multiple software users. ✓ These users come from different backgrounds and use different vocabularies to describe the same (or similar) software bug. ✓ The important role of the triager is to check the reported BRs for any possible duplicates before sending them to the BRE. ✓ A manual check of submitted BRs is a difficult task due to the huge number of BRs reported every day. ✓ On the other hand, retrieving DBR after storing it in the BRE is a tedious and expensive task for software developers. ✓ So, the process of stopping DBRs before they start is important and very useful.
  • 11. Introduction … • The novelty of this paper is that it proposes a textual-based approach (i.e., an IR-based solution) to stopping DBRs before they start. • BushraDBR prevents DBRs by continuously checking the recently reported or submitted BR against the BRs stored in the BTS. • In the event that the submitted BR is textually similar to any of the BRs inside the BTS, the triager will exclude this BR from any further work and not include it in the BTS.
  • 12. Figure 1. An example of a BR from drawing shapes application. “Bug details” Bug ID: 000001. Product: Drawing shapes App. Submitted on: Sept. 12, 2022. Summary: The PDF viewer is very slow for line drawings. Component: Layout. Type: defect. Severity: S4. Priority: P5. Status: Assigned. “Bug description” Actual results: PDF is very slow to load. The text parts of the PDF appear fast, but every line drawing is very slow. Expected results: PDFs with line drawings should show up a lot faster without any delay.
  • 13. TABLE I. A pair of DBRs from Bugzilla (i.e., core product).
  • 14. Figure 2. The workflow for retrieving DBRs via BushraDBR approach. BR_001: … Bug repository BushraDBR approach Similarity BR documents BR_002: … BR_003: … … BR_n: … BR_001: 0.02 BR_002: 0.58 BR_003: 0.98 … BR_n: 0.45 Duplicate Nr_Bug-ID BR _003 Duplicate or non-duplicate New bug report (Nr) Nr_Bug-ID Query LSI Test duplicate bug reports (DBRs) ✓ Nr FCA
  • 15. Figure 3. The key elements of BurshraDBR approach. New bug report (Nr) BR documents from BTS Duplicate BR Non-duplicate BR Threshold = 80 LSI FCA OUTPUT DATA SETS EVALUATION METRICS Precision F-measure Recall WebPayments UI Firefox Symbolication & General Eliot DOM: Editor Audio/Video Core Drawing Shapes App TECHNIQUES BurshraDBR approach
  • 16. TABLE II. Structured and unstructured information that is leveraged by the selected approaches (i.e., survey).
  • 17. TABLE III. Data sets that are utilized by the selected approaches (i.e., survey).
  • 18. TABLE IV. Review and classification of current studies relevant for DBR retrieval approaches.
  • 19. Figure 4. DBRs describe the same software failure and use similar vocabularies. Bug Report X Bug Report Y Bug X Bug Tracking System (BTS) Triager or developer BushraDBR approach Non-duplicate Duplicate Master (or original) bug report Software user Submit Software failure X Nr Similar vocabularies
  • 20. Figure 5. The DBR retrieval process - BushraDBR approach. Master bug reports set (MRs) List of duplicate bug reports (LDRs) Repository BushraDBR approach Non-duplicate Duplicate No Yes BR documents Similarity value >= threshold (Thr) Insert into MRs Insert into LDRs New bug report (Nr) Query (Nr) document Inputs Latent Semantic Indexing Formal Concept Analysis Preprocessing
  • 21. Figure 6. An example of a bug document generated by BushraDBR approach. Bug Document BushraDBR Bug ID: 000007 Incapable of using the mouse wheel to scroll on an art site. Scrolling the contents of the art site by clicking the scroll wheel of the mouse device does not always work well.
  • 23. Figure 7. An example of a pre-processed bug document. Bug Document BushraDBR Bug ID: 000007 incapable use mouse wheel scroll art site scroll content art site click scroll wheel mouse device always work well 1. Extracting bug content 2. Splitting words or tokenization 3. Removing stop words 4. Stemming Pre-processed document
  • 24. TABLE V. Bug reports from the drawing shapes application data set.
  • 25. TABLE VI. BushraDBR preprocessing steps, explanation, and a real-world example from the DSA data set.
  • 26. Figure 8. Retrieving DBRs using LSI and FCA techniques — the BushraDBR approach. BushraDBR approach Latent Semantic Indexing 2 Formal Concept Analysis 3 Extracting Similarity matrix (SM) SM as binary formal context Nr BR-1 BR-2 BR-N 0.01 0.93 0.07 Concept_0 BR-2 Nr Nr BRs Nr BR-1 BR-2 BR-N 0 1 0 Nr BR-1 BR-2 BR-N x The AOC-poset Formal context DBRs Formal concept ..., of the myOval() and … displayed, … …, of, the, my, oval, and, ..., displayed, .. .., oval, .., displayed, … …, oval, …, display, .. Splitting Preprocessing 1 The intent of concept The extent of concept Stop words Stemming
  • 28. Measuring TS between BRs via LSI 1 2 3 4
  • 30. Figure 9. The TS values between Nr and BRs as a directed graph.
  • 31. Algorithm 3: Retrieving DBRs using FCA.
  • 32. Retrieving DBRs using FCA Figure 10. The AOC-poset for the DSA data set. 1 2 3 4
  • 33. Figure 10. The AOC- poset for DSA data set.
  • 35. Concept_0 1819151 1819152 Drawing Shapes Application (DSA) Symbolication & General - Eliot product Concept_0 1490824 1510066 WebPayments UI – Firefox product (partial) Concept_0 1545235 1545237 Audio/Video – Core product (partial) DOM editor – Core product (partial) Concept_0 176525 1841744 Concept_0 000006 000007 The extent of Concept_0 The intent of Concept_0 The concept ID Formal concept Concept_0 000006 000007 Formal concept analysis
  • 36. Master bug reports set (MRs) List of duplicate bug reports (LDRs) Repository BushraDBR approach Non-duplicate Duplicate No Yes Preprocessing Latent Semantic Indexing 1 2 Formal Concept Analysis 3 Extracting Splitting Stop words Stemming Similarity matrix (SM) SM as binary formal context BR documents Similarity value >= threshold (Thr) Nr BR-1 BR-2 BR-N 0.01 0.93 0.07 Nr BR-1 BR-2 BR-N 0 1 0 Insert into MRs Insert into LDRs Concept_0 BR-2 Nr New bug report (Nr) Query (Nr) document
  • 37. Figure 9. The TS values between Nr and BRs as a directed graph. Nodes: 000001, 000002, 000003, 000004, 000005, 000006, 000007.
  • 38. BushraDBR approach (diagram). The new bug report (Nr) is the query document. Steps: (1) preprocessing (extracting, splitting, stop words, stemming); (2) Latent Semantic Indexing, which produces the similarity matrix (SM); (3) Formal Concept Analysis on the SM as a binary formal context. Decision: if the similarity value >= threshold (Thr), Nr is a duplicate and is inserted into the list of duplicate bug reports (LDRs); otherwise, it is a non-duplicate.
  • 39. Experimentation. Diagram labels: master bug reports set (MRs), repository, BR documents.
  • 42. Figure 10. The AOC-poset for each data set in the experiments: (A) DSA, (B) Eliot, (C) WebPayments UI [partial], (D) Audio / Video [partial], and (E) DOM editor [partial]. For DSA, Concept_0 contains 000006 and 000007, and Concept_1 contains 000001, 000002, 000003, 000004, and 000005.
  • 43. Conclusion • This paper has introduced a novel approach called BushraDBR targeted at automatically retrieving DBRs using LSI and FCA. BushraDBR aimed to prevent developers from wasting their resources, such as effort and time, on previously submitted BRs. The novelty of BushraDBR is that it exploits textual data in BRs to apply LSI and FCA techniques in an efficient way to retrieve DBRs.
  • 44. Conclusion … • BushraDBR prevents DBRs before they occur by comparing the newly reported BR with the rest of the BRs in the repository. The suggested approach has been validated and evaluated on different data sets from Bugzilla. Experiments show the capacity of the BushraDBR approach to retrieve DBRs in an efficient and accurate manner.
  • 45. Future work • Regarding BushraDBR's future work, the author plans to extend the current approach by developing an ML-based solution to retrieve DBRs and prevent duplicates before they start. Also, he plans to compare BushraDBR (i.e., an IR-based approach) with current ML-based approaches. Furthermore, additional empirical tests can be conducted to verify the BushraDBR approach using open-source and industrial data sets. There is also a need to conduct a comprehensive survey and make comparisons between all current approaches relevant to DBR retrieval.