SlideShare a Scribd company logo
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
TOWARDS EFFECTIVE BUG TRIAGE WITH SOFTWARE
DATA REDUCTION TECHNIQUES
Abstract—Software companies spend over 45 percent of cost in dealing with software bugs. An
inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new
bug. To decrease the time cost in manual work, text classification techniques are applied to
conduct automatic bug triage. In this paper, we address the problem of data reduction for bug
triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance
selection with feature selection to simultaneously reduce data scale on the bug dimension and the
word dimension. To determine the order of applying instance selection and feature selection, we
extract attributes from historical bug data sets and build a predictive model for a new bug data
set. We empirically investigate the performance of data reduction on totally 600,000 bug reports
of two large open source projects, namely Eclipse and Mozilla. The results show that our data
reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work
provides an approach to leveraging techniques on data processing to form reduced and high-
quality bug data in software development and maintenance.
EXISTING SYSTEM:
we review existing work on modeling bug data, bug triage, and the quality of bug data
with defect prediction. 7.1 Modeling Bug Data To investigate the relationships in bug data,
Sandusky et al. form a bug report network to examine the dependency among bug reports.
Besides studying relationships among bug reports, Hong et al. build a developer social network
to examine the collaboration among developers based on the bug data in Mozilla project. This
developer social network is helpful to understand the developer community and the project
evolution. By mapping bug priorities to developers, Xuan et al. identify the developer
prioritization in open source bug repositories. The developer prioritization can distinguish
developers and assist tasks in software maintenance. Bug Triage Bug triage aims to assign an
appropriate developer to fix a new bug, i.e., to determine who should fix a bug. _Cubrani_c and
Murphy first propose the problem of automatic bug triage to reduce the cost of manual bug
triage. They apply text classification techniques to predict related developers. Anvik et al.
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
examine multiple techniques on bug triage, including data preparation and typical classifiers.
Anvik and Murphy extend above work to reduce the effort of bug triage by creating
development-oriented recommenders. Jeong et al. find out that over 37 percent of bug reports
have been reassigned in manual bug triage. They propose a tossing graph method to reduce
reassignment in bug triage. To avoid low-quality bug reports in bug triage, Xuan et al. train a
semi-supervised classifier by combining unlabeled bug reports with labeled ones. Park et al.
convert bug triage into an optimization problem and propose a collaborative filtering approach to
reducing the bugfixing time.
PROPOSED SYSTEM:
The primary contributions of this paper are as follows:
1) We present the problem of data reduction for bug triage. This problem aims to augment the
data set of bug triage in two aspects, namely a) to simultaneously reduce the scales of the bug
dimension and the word dimension and b) to improve the accuracy of bug triage.
2) We propose a combination approach to addressing the problem of data reduction. This can be
viewed as an application of instance selection and feature selection in bug repositories.
3) We build a binary classifier to predict the order of applying instance selection and feature
selection. To our knowledge, the order of applying instance selection and feature selection has
not been investigated in related domains. This paper is an extension of our previous work. In this
extension, we add new attributes extracted from bug data sets, prediction for reduction orders,
and experiments on four instance selection algorithms, four feature selection algorithms, and
their combinations In this paper, we address the problem of data reduction for bug triage, i.e.,
how to reduce the bug data to save the labor cost of developers and improve the quality to
facilitate the process of bug triage. Data reduction for bug triage aims to build a small-scale and
high-quality set of bug data by removing bug reports and words, which are redundant or non-
informative. In our work, we combine existing techniques of instance selection and feature
selection to simultaneously reduce the bug dimension and the word dimension. The reduced bug
data contain fewer bug reports and fewer words than the original bug data and provide similar
information over the original bug data. We evaluate the reduced bug data according to two
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
criteria: the scale of a data set and the accuracy of bug triage. To avoid the bias of a single
algorithm, we empirically examine the results of four instance selection algorithms and four
feature selection algorithms.
MODULE 1
DATA REDUCTION
Data reduction is the transformation of numerical or alphabetical digital information derived
empirically or experimentally into a corrected, ordered, and simplified form. The basic concept is
the reduction of multitudinous amounts of data down to the meaningful parts. When information
is derived from instrument readings there may also be a transformation from analog to digital
form. When the data are already in digital form the 'reduction' of the data typically involves some
editing, scaling, coding, sorting, collating, and producing tabular summaries. When the
observations are discrete but the underlying phenomenon is continuous then smoothing and
interpolation are often needed. Often the data reduction is undertaken in the presence of reading
or measurement errors. Some idea of the nature of these errors is needed before the most likely
value may be determined.
Module 2
Benefit of Data Reduction
In our work, to save the labor cost of developers, the data reduction for bug triage has two goals,
1) reducing the data scale and 2) improving the accuracy of bug triage. In contrast to modeling
the textual content of bug reports in existing work, we aim to augment the data set to build a
preprocessing approach, which can be applied before an existing bug triage approach. We
explain the two goals of data reduction as follows. Reducing the Data Scale - We reduce scales
of data sets to save the labor cost of developers. Bug dimension.The aim of bug triage is to
assign developers for bug fixing. Once a developer is assigned to a new bug report, the developer
can examine historically fixed bugs to form a solution to the current bug report. For example,
historical bugs are checked to detect whether the new bug is the duplicate of an existing one;
moreover, existing solutions to bugs can be searched and applied to the new bug . Thus, we
consider reducing duplicate and noisy bug reports to decrease the number of historical bugs. In
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
practice, the labor cost of developers (i.e., the cost of examining historical bugs) can be saved by
decreasing the number of bugs based on instance selection. Word dimension. We use feature
selection to remove noisy or duplicate words in a data set. Based on feature selection, the
reduced data set can be handled more easily by automatic techniques (e.g., bug triage
approaches) than the original data set. Besides bug triage, the reduced data set can be further
used for other software tasks after bug triage (e.g., severity identification, time prediction, and
reopened bug analysis).
Improving the Accuracy - Accuracy is an important evaluation criterion for bug triage. In our
work, data reduction explores and removes noisy or duplicate information in data sets. Bug
dimension. Instance selection can remove uninformative bug reports; meanwhile, we can observe
that the accuracy may be decreased by removing bug reports. Word dimension By removing
uninformative words, feature selection improves the accuracy of bug triage. This can recover the
accuracy loss by instance selection.
Module3
Data reduction for bug triage
We propose bug data reduction to reduce the scale and to improve the quality of data in bug
repositories. We combine existing techniques of instance selection and feature selection to
remove certain bug reports and words. A problem for reducing the bug data is to determine the
order of applying instance selection and feature selection, which is denoted as the prediction of
reduction orders. In this section, we first present how to apply instance selection and feature
selection to bug data, i.e., data reduction for bug triage. Then, we list the benefit of the data
reduction.
Module 4
Applying Instance Selection and Feature Selection
In bug triage, a bug data set is converted into a text matrix with two dimensions, namely the bug
dimension and the word dimension. In our work, we leverage the combination of instance
selection and feature selection to generate a reduced bug data set. We replace the original data
set with the reduced data set for bug triage. Instance selection and feature selection are widely
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
used techniques in data processing. For a given data set in a certain application, instance
selection is to obtain a subset of relevant instances (i.e., bug reports in bug data) while feature
selection aims to obtain a subset of relevant features (i.e., words in bug data). In our work, we
employ the combination of instance selection and feature selection. To distinguish the orders of
applying instance selection and feature selection, we give the following denotation. Given an
instance selection algorithm IS and a feature selection algorithm FS, we use FS!IS to denote the
bug data reduction, which first applies FS and then IS; on the other hand, IS!FS denotes first
applying IS and then FS. In Algorithm 1, we briefly present how to reduce the bug data based on
FS ! IS. Given a bug data set, the output of bug data reduction is a new and reduced data set.
Two algorithms FS and IS are applied sequentially. Note that in Step 2), some of bug reports
may be blank during feature selection, i.e., all the words in a bug report are removed. Such blank
bug reports are also removed in the feature selection.
Module 5
Reduction Orders
To apply the data reduction to each new bug data set, we need to check the accuracy of both two
orders (FS ! IS and IS!FS) and choose a better one. To avoid the time cost of manually checking
both reduction orders, we consider predicting the reduction order for a new bug data set based on
historical data sets. We convert the problem of prediction for reduction orders into a binary
classification problem. A bug data set is mapped to an instance and the associated reduction
order (either FS ! IS or IS ! FS) is mapped to the label of a class of instances. Note that a
classifier can be trained only once when facing many new bug data sets. That is, training such a
classifier once can predict the reduction orders for all the new data sets without checking both
reduction orders. To date, the problem of predicting reduction orders of applying feature
selection and instance selection has not been investigated in other application scenarios. From
the perspective of software engineering, predicting the reduction order for bug data sets can be
viewed as a kind of software metrics, which involves activities for measuring some property for
a piece of software. However, the features in our work are extracted from the bug data set while
the features in existing work on software metrics are for individual software artifacts,3 e.g., an
individual bug report or an individual piece of code. In this paper, to avoid ambiguous
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
denotations, an attribute refers to an extracted feature of a bug data set while a feature refers to a
word of a bug report.
CONCLUSIONS
Bug triage is an expensive step of software maintenance in both labor cost and time cost. In this
paper, we combine feature selection with instance selection to reduce the scale of bug data sets
as well as improve the data quality. To determine the order of applying instance selection and
feature selection for a new bug data set, we extract attributes of each bug data set and train a
predictive model based on historical data sets. We empirically investigate the data reduction for
bug triage in bug repositories of two large open source projects, namely Eclipse and Mozilla.
Our work provides an approach to leveraging techniques on data processing to form reduced and
high-quality bug data in software development and maintenance. In future work, we plan on
improving the results of data reduction in bug triage to explore how to prepare a highquality bug
data set and tackle a domain-specific software task. For predicting reduction orders, we plan to
pay efforts to find out the potential relationship between the attributes of bug data sets and the
reduction orders.
REFERENCES
[1] J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?” in Proc. 28th Int. Conf.
Softw. Eng., May 2006, pp. 361–370.
[2] S. Artzi, A. Kie_zun, J. Dolby, F. Tip, D. Dig, A. Paradkar, and M. D. Ernst, “Finding bugs
in web applications using dynamic test generation and explicit-state model checking,” IEEE
Softw., vol. 36, no. 4, pp. 474–494, Jul./Aug. 2010.
[3] J. Anvik and G. C. Murphy, “Reducing the effort of bug report triage: Recommenders for
development-oriented decisions,” ACM Trans. Soft. Eng. Methodol., vol. 20, no. 3, article 10,
Aug. 2011.
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
[4] C. C. Aggarwal and P. Zhao, “Towards graphical models for text processing,” Knowl.
Inform. Syst., vol. 36, no. 1, pp. 1–21, 2013.
[5] Bugzilla, (2014). [Online]. Avaialble: http://bugzilla.org/
[6] K. Balog, L. Azzopardi, and M. de Rijke, “Formal models for expert finding in enterprise
corpora,” in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, Aug.
2006, pp. 43–50.

More Related Content

PDF
Towards effective bug triage with software data reduction techniques
PDF
Towards effective bug triage with software data reduction techniques
DOCX
Towards effective bug triage with software
PPTX
Software Maintenance Bug Triaging
PDF
Towards Effective Bug Triage with Software Data Reduction Techniques
PDF
Bug Triage: An Automated Process
DOCX
Towards effective bug triage with software
PDF
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software
Software Maintenance Bug Triaging
Towards Effective Bug Triage with Software Data Reduction Techniques
Bug Triage: An Automated Process
Towards effective bug triage with software
Survey on Software Data Reduction Techniques Accomplishing Bug Triage

What's hot (19)

PDF
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
PDF
Knowledge and Data Engineering IEEE 2015 Projects
PDF
J034057065
PDF
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUES
PDF
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
PDF
Predicting Fault-Prone Files using Machine Learning
PDF
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
DOCX
High performance intrusion detection using modified k mean & naïve bayes
PDF
A Survey on Bug Tracking System for Effective Bug Clearance
PDF
Knowledge and Data Engineering IEEE 2015 Projects
PDF
Comparative Performance Analysis of Machine Learning Techniques for Software ...
PDF
Software Engineering Domain Knowledge to Identify Duplicate Bug Reports
PDF
An efficient tool for reusable software
PDF
Generation of Search Based Test Data on Acceptability Testing Principle
PDF
Benchmarking machine learning techniques
PDF
Implementation of reducing features to improve code change based bug predicti...
PDF
The International Journal of Engineering and Science (IJES)
DOC
Abstract.doc
IRJET- Data Reduction in Bug Triage using Supervised Machine Learning
Knowledge and Data Engineering IEEE 2015 Projects
J034057065
TOWARDS PREDICTING SOFTWARE DEFECTS WITH CLUSTERING TECHNIQUES
Software Defect Prediction Using Radial Basis and Probabilistic Neural Networks
Predicting Fault-Prone Files using Machine Learning
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS
High performance intrusion detection using modified k mean & naïve bayes
A Survey on Bug Tracking System for Effective Bug Clearance
Knowledge and Data Engineering IEEE 2015 Projects
Comparative Performance Analysis of Machine Learning Techniques for Software ...
Software Engineering Domain Knowledge to Identify Duplicate Bug Reports
An efficient tool for reusable software
Generation of Search Based Test Data on Acceptability Testing Principle
Benchmarking machine learning techniques
Implementation of reducing features to improve code change based bug predicti...
The International Journal of Engineering and Science (IJES)
Abstract.doc
Ad

Similar to TOWARDS EFFECTIVE BUG TRIAGE WITH SOFTWARE DATA REDUCTION TECHNIQUES (20)

PDF
IRJET-Automatic Bug Triage with Software
PDF
AUTOMATED BUG TRIAGE USING ADVANCED DATA REDUCTION TECHNIQUES
PDF
PDF
A tale of bug prediction in software development
PPTX
Biological modeling of software development dynamics
PPTX
sri indu 1213 it
DOCX
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
PDF
A tale of experiments on bug prediction
PDF
Effective Bug Identification-Best Practices and Techniques
PDF
Effective Bug Tracking Systems: Theories and Implementation
DOC
75.bug tracking for improving software quality & reliability
PDF
Evaluating the Usefulness of IR-Based Fault LocalizationTechniques
PPT
Memories of Bug Fixes
PDF
Comparative performance analysis
PPT
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
PPT
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
PPTX
A Bug Tracking System Is A Software Application
PPTX
Automated bug localization
PDF
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
IRJET-Automatic Bug Triage with Software
AUTOMATED BUG TRIAGE USING ADVANCED DATA REDUCTION TECHNIQUES
A tale of bug prediction in software development
Biological modeling of software development dynamics
sri indu 1213 it
Nature-Based Prediction Model of Bug Reports Based on Ensemble Machine Learni...
A tale of experiments on bug prediction
Effective Bug Identification-Best Practices and Techniques
Effective Bug Tracking Systems: Theories and Implementation
75.bug tracking for improving software quality & reliability
Evaluating the Usefulness of IR-Based Fault LocalizationTechniques
Memories of Bug Fixes
Comparative performance analysis
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
A Bug Tracking System Is A Software Application
Automated bug localization
A Bug Report Analysis and Search Tool (presentation for M.Sc. degree)
Ad

More from Shakas Technologies (20)

DOCX
A Review on Deep-Learning-Based Cyberbullying Detection
DOCX
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...
DOCX
A Novel Framework for Credit Card.
DOCX
A Comparative Analysis of Sampling Techniques for Click-Through Rate Predicti...
DOCX
NS2 Final Year Project Titles 2023- 2024
DOCX
MATLAB Final Year IEEE Project Titles 2023-2024
DOCX
Latest Python IEEE Project Titles 2023-2024
DOCX
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...
DOCX
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSE
DOCX
Detecting Mental Disorders in social Media through Emotional patterns-The cas...
DOCX
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION
DOCX
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCE
DOCX
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...
DOCX
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...
DOCX
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...
DOCX
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
DOCX
Fighting Money Laundering With Statistics and Machine Learning.docx
DOCX
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...
DOCX
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...
DOCX
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...
A Review on Deep-Learning-Based Cyberbullying Detection
A Personal Privacy Data Protection Scheme for Encryption and Revocation of Hi...
A Novel Framework for Credit Card.
A Comparative Analysis of Sampling Techniques for Click-Through Rate Predicti...
NS2 Final Year Project Titles 2023- 2024
MATLAB Final Year IEEE Project Titles 2023-2024
Latest Python IEEE Project Titles 2023-2024
EMOTION RECOGNITION BY TEXTUAL TWEETS CLASSIFICATION USING VOTING CLASSIFIER ...
CYBER THREAT INTELLIGENCE MINING FOR PROACTIVE CYBERSECURITY DEFENSE
Detecting Mental Disorders in social Media through Emotional patterns-The cas...
COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION
CO2 EMISSION RATING BY VEHICLES USING DATA SCIENCE
Toward Effective Evaluation of Cyber Defense Threat Based Adversary Emulation...
Optimizing Numerical Weather Prediction Model Performance Using Machine Learn...
Multi-Class Stress Detection Through Heart Rate Variability A Deep Neural Net...
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Fighting Money Laundering With Statistics and Machine Learning.docx
Explainable Artificial Intelligence for Patient Safety A Review of Applicatio...
Ensemble Deep Learning-Based Prediction of Fraudulent Cryptocurrency Transact...
Effective Software Effort Estimation Leveraging Machine Learning for Digital ...

Recently uploaded (20)

PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
01-Introduction-to-Information-Management.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
Institutional Correction lecture only . . .
PDF
Complications of Minimal Access Surgery at WLH
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
Lesson notes of climatology university.
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PDF
Basic Mud Logging Guide for educational purpose
PDF
Classroom Observation Tools for Teachers
PPTX
Cell Types and Its function , kingdom of life
PPTX
GDM (1) (1).pptx small presentation for students
PDF
RMMM.pdf make it easy to upload and study
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
O7-L3 Supply Chain Operations - ICLT Program
01-Introduction-to-Information-Management.pdf
human mycosis Human fungal infections are called human mycosis..pptx
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Anesthesia in Laparoscopic Surgery in India
Institutional Correction lecture only . . .
Complications of Minimal Access Surgery at WLH
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Lesson notes of climatology university.
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
TR - Agricultural Crops Production NC III.pdf
Renaissance Architecture: A Journey from Faith to Humanism
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
Basic Mud Logging Guide for educational purpose
Classroom Observation Tools for Teachers
Cell Types and Its function , kingdom of life
GDM (1) (1).pptx small presentation for students
RMMM.pdf make it easy to upload and study

TOWARDS EFFECTIVE BUG TRIAGE WITH SOFTWARE DATA REDUCTION TECHNIQUES

  • 1. #13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6. Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602, Project Titles: http://shakastech.weebly.com/2015-2016-titles Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com TOWARDS EFFECTIVE BUG TRIAGE WITH SOFTWARE DATA REDUCTION TECHNIQUES Abstract—Software companies spend over 45 percent of cost in dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost in manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on totally 600,000 bug reports of two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging techniques on data processing to form reduced and high- quality bug data in software development and maintenance. EXISTING SYSTEM: we review existing work on modeling bug data, bug triage, and the quality of bug data with defect prediction. 7.1 Modeling Bug Data To investigate the relationships in bug data, Sandusky et al. form a bug report network to examine the dependency among bug reports. Besides studying relationships among bug reports, Hong et al. build a developer social network to examine the collaboration among developers based on the bug data in Mozilla project. This developer social network is helpful to understand the developer community and the project evolution. By mapping bug priorities to developers, Xuan et al. identify the developer prioritization in open source bug repositories. The developer prioritization can distinguish developers and assist tasks in software maintenance. Bug Triage Bug triage aims to assign an appropriate developer to fix a new bug, i.e., to determine who should fix a bug. _Cubrani_c and Murphy first propose the problem of automatic bug triage to reduce the cost of manual bug triage. They apply text classification techniques to predict related developers. Anvik et al.
  • 2. #13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6. Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602, Project Titles: http://shakastech.weebly.com/2015-2016-titles Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com examine multiple techniques on bug triage, including data preparation and typical classifiers. Anvik and Murphy extend above work to reduce the effort of bug triage by creating development-oriented recommenders. Jeong et al. find out that over 37 percent of bug reports have been reassigned in manual bug triage. They propose a tossing graph method to reduce reassignment in bug triage. To avoid low-quality bug reports in bug triage, Xuan et al. train a semi-supervised classifier by combining unlabeled bug reports with labeled ones. Park et al. convert bug triage into an optimization problem and propose a collaborative filtering approach to reducing the bugfixing time. PROPOSED SYSTEM: The primary contributions of this paper are as follows: 1) We present the problem of data reduction for bug triage. This problem aims to augment the data set of bug triage in two aspects, namely a) to simultaneously reduce the scales of the bug dimension and the word dimension and b) to improve the accuracy of bug triage. 2) We propose a combination approach to addressing the problem of data reduction. This can be viewed as an application of instance selection and feature selection in bug repositories. 3) We build a binary classifier to predict the order of applying instance selection and feature selection. To our knowledge, the order of applying instance selection and feature selection has not been investigated in related domains. This paper is an extension of our previous work. In this extension, we add new attributes extracted from bug data sets, prediction for reduction orders, and experiments on four instance selection algorithms, four feature selection algorithms, and their combinations In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the bug data to save the labor cost of developers and improve the quality to facilitate the process of bug triage. Data reduction for bug triage aims to build a small-scale and high-quality set of bug data by removing bug reports and words, which are redundant or non- informative. In our work, we combine existing techniques of instance selection and feature selection to simultaneously reduce the bug dimension and the word dimension. The reduced bug data contain fewer bug reports and fewer words than the original bug data and provide similar information over the original bug data. We evaluate the reduced bug data according to two
  • 3. #13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6. Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602, Project Titles: http://shakastech.weebly.com/2015-2016-titles Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com criteria: the scale of a data set and the accuracy of bug triage. To avoid the bias of a single algorithm, we empirically examine the results of four instance selection algorithms and four feature selection algorithms. MODULE 1 DATA REDUCTION Data reduction is the transformation of numerical or alphabetical digital information derived empirically or experimentally into a corrected, ordered, and simplified form. The basic concept is the reduction of multitudinous amounts of data down to the meaningful parts. When information is derived from instrument readings there may also be a transformation from analog to digital form. When the data are already in digital form the 'reduction' of the data typically involves some editing, scaling, coding, sorting, collating, and producing tabular summaries. When the observations are discrete but the underlying phenomenon is continuous then smoothing and interpolation are often needed. Often the data reduction is undertaken in the presence of reading or measurement errors. Some idea of the nature of these errors is needed before the most likely value may be determined. Module 2 Benefit of Data Reduction In our work, to save the labor cost of developers, the data reduction for bug triage has two goals, 1) reducing the data scale and 2) improving the accuracy of bug triage. In contrast to modeling the textual content of bug reports in existing work, we aim to augment the data set to build a preprocessing approach, which can be applied before an existing bug triage approach. We explain the two goals of data reduction as follows. Reducing the Data Scale - We reduce scales of data sets to save the labor cost of developers. Bug dimension.The aim of bug triage is to assign developers for bug fixing. Once a developer is assigned to a new bug report, the developer can examine historically fixed bugs to form a solution to the current bug report. For example, historical bugs are checked to detect whether the new bug is the duplicate of an existing one; moreover, existing solutions to bugs can be searched and applied to the new bug . Thus, we consider reducing duplicate and noisy bug reports to decrease the number of historical bugs. In
  • 4. #13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6. Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602, Project Titles: http://shakastech.weebly.com/2015-2016-titles Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com practice, the labor cost of developers (i.e., the cost of examining historical bugs) can be saved by decreasing the number of bugs based on instance selection. Word dimension. We use feature selection to remove noisy or duplicate words in a data set. Based on feature selection, the reduced data set can be handled more easily by automatic techniques (e.g., bug triage approaches) than the original data set. Besides bug triage, the reduced data set can be further used for other software tasks after bug triage (e.g., severity identification, time prediction, and reopened bug analysis). Improving the Accuracy - Accuracy is an important evaluation criterion for bug triage. In our work, data reduction explores and removes noisy or duplicate information in data sets. Bug dimension. Instance selection can remove uninformative bug reports; meanwhile, we can observe that the accuracy may be decreased by removing bug reports. Word dimension By removing uninformative words, feature selection improves the accuracy of bug triage. This can recover the accuracy loss by instance selection. Module3 Data reduction for bug triage We propose bug data reduction to reduce the scale and to improve the quality of data in bug repositories. We combine existing techniques of instance selection and feature selection to remove certain bug reports and words. A problem for reducing the bug data is to determine the order of applying instance selection and feature selection, which is denoted as the prediction of reduction orders. In this section, we first present how to apply instance selection and feature selection to bug data, i.e., data reduction for bug triage. Then, we list the benefit of the data reduction. Module 4 Applying Instance Selection and Feature Selection In bug triage, a bug data set is converted into a text matrix with two dimensions, namely the bug dimension and the word dimension. In our work, we leverage the combination of instance selection and feature selection to generate a reduced bug data set. We replace the original data set with the reduced data set for bug triage. Instance selection and feature selection are widely
  • 5. #13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6. Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602, Project Titles: http://shakastech.weebly.com/2015-2016-titles Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com used techniques in data processing. For a given data set in a certain application, instance selection is to obtain a subset of relevant instances (i.e., bug reports in bug data) while feature selection aims to obtain a subset of relevant features (i.e., words in bug data). In our work, we employ the combination of instance selection and feature selection. To distinguish the orders of applying instance selection and feature selection, we give the following denotation. Given an instance selection algorithm IS and a feature selection algorithm FS, we use FS!IS to denote the bug data reduction, which first applies FS and then IS; on the other hand, IS!FS denotes first applying IS and then FS. In Algorithm 1, we briefly present how to reduce the bug data based on FS ! IS. Given a bug data set, the output of bug data reduction is a new and reduced data set. Two algorithms FS and IS are applied sequentially. Note that in Step 2), some of bug reports may be blank during feature selection, i.e., all the words in a bug report are removed. Such blank bug reports are also removed in the feature selection. Module 5 Reduction Orders To apply the data reduction to each new bug data set, we need to check the accuracy of both two orders (FS ! IS and IS!FS) and choose a better one. To avoid the time cost of manually checking both reduction orders, we consider predicting the reduction order for a new bug data set based on historical data sets. We convert the problem of prediction for reduction orders into a binary classification problem. A bug data set is mapped to an instance and the associated reduction order (either FS ! IS or IS ! FS) is mapped to the label of a class of instances. Note that a classifier can be trained only once when facing many new bug data sets. That is, training such a classifier once can predict the reduction orders for all the new data sets without checking both reduction orders. To date, the problem of predicting reduction orders of applying feature selection and instance selection has not been investigated in other application scenarios. From the perspective of software engineering, predicting the reduction order for bug data sets can be viewed as a kind of software metrics, which involves activities for measuring some property for a piece of software. However, the features in our work are extracted from the bug data set while the features in existing work on software metrics are for individual software artifacts,3 e.g., an individual bug report or an individual piece of code. In this paper, to avoid ambiguous
  • 6. #13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6. Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602, Project Titles: http://shakastech.weebly.com/2015-2016-titles Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com denotations, an attribute refers to an extracted feature of a bug data set while a feature refers to a word of a bug report. CONCLUSIONS Bug triage is an expensive step of software maintenance in both labor cost and time cost. In this paper, we combine feature selection with instance selection to reduce the scale of bug data sets as well as improve the data quality. To determine the order of applying instance selection and feature selection for a new bug data set, we extract attributes of each bug data set and train a predictive model based on historical data sets. We empirically investigate the data reduction for bug triage in bug repositories of two large open source projects, namely Eclipse and Mozilla. Our work provides an approach to leveraging techniques on data processing to form reduced and high-quality bug data in software development and maintenance. In future work, we plan on improving the results of data reduction in bug triage to explore how to prepare a highquality bug data set and tackle a domain-specific software task. For predicting reduction orders, we plan to pay efforts to find out the potential relationship between the attributes of bug data sets and the reduction orders. REFERENCES [1] J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?” in Proc. 28th Int. Conf. Softw. Eng., May 2006, pp. 361–370. [2] S. Artzi, A. Kie_zun, J. Dolby, F. Tip, D. Dig, A. Paradkar, and M. D. Ernst, “Finding bugs in web applications using dynamic test generation and explicit-state model checking,” IEEE Softw., vol. 36, no. 4, pp. 474–494, Jul./Aug. 2010. [3] J. Anvik and G. C. Murphy, “Reducing the effort of bug report triage: Recommenders for development-oriented decisions,” ACM Trans. Soft. Eng. Methodol., vol. 20, no. 3, article 10, Aug. 2011.
  • 7. #13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6. Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602, Project Titles: http://shakastech.weebly.com/2015-2016-titles Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com [4] C. C. Aggarwal and P. Zhao, “Towards graphical models for text processing,” Knowl. Inform. Syst., vol. 36, no. 1, pp. 1–21, 2013. [5] Bugzilla, (2014). [Online]. Avaialble: http://bugzilla.org/ [6] K. Balog, L. Azzopardi, and M. de Rijke, “Formal models for expert finding in enterprise corpora,” in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, Aug. 2006, pp. 43–50.