TOWARDS EFFECTIVE BUG TRIAGE WITH SOFTWARE DATA REDUCTION TECHNIQUES

#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, vellore – 6.
Off: 0416-2247353 / 6066663 Mo: +91 9500218218 /8870603602,
Project Titles: http://shakastech.weebly.com/2015-2016-titles
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
Abstract—Software companies spend over 45 percent of cost in dealing with software bugs. An
inevitable step in fixing bugs is bug triage, which aims to correctly assign a developer to a new
bug. To decrease the time cost of manual work, text classification techniques are applied to
conduct automatic bug triage. In this paper, we address the problem of data reduction for bug
triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance
selection with feature selection to simultaneously reduce the data scale on the bug dimension and
the word dimension. To determine the order of applying instance selection and feature selection, we
extract attributes from historical bug data sets and build a predictive model for a new bug data
set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports
from two large open source projects, namely Eclipse and Mozilla. The results show that our data
reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work
provides an approach to leveraging data processing techniques to form reduced, high-quality
bug data in software development and maintenance.
EXISTING SYSTEM:
We review existing work on modeling bug data, bug triage, and the quality of bug data
with defect prediction.
Modeling Bug Data. To investigate the relationships in bug data, Sandusky et al. form a bug
report network to examine the dependency among bug reports. Besides studying relationships
among bug reports, Hong et al. build a developer social network to examine the collaboration
among developers based on the bug data in the Mozilla project. This developer social network
helps in understanding the developer community and the project evolution. By mapping bug
priorities to developers, Xuan et al. identify the developer prioritization in open source bug
repositories. The developer prioritization can distinguish developers and assist tasks in
software maintenance.
Bug Triage. Bug triage aims to assign an appropriate developer to fix a new bug, i.e., to
determine who should fix a bug. Čubranić and Murphy first propose the problem of automatic bug
triage to reduce the cost of manual bug triage. They apply text classification techniques to
predict related developers. Anvik et al. examine multiple techniques for bug triage, including
data preparation and typical classifiers. Anvik and Murphy extend this work to reduce the effort
of bug triage by creating development-oriented recommenders. Jeong et al. find that over 37
percent of bug reports have been reassigned in manual bug triage. They propose a tossing graph
method to reduce reassignment in bug triage. To avoid low-quality bug reports in bug triage,
Xuan et al. train a semi-supervised classifier by combining unlabeled bug reports with labeled
ones. Park et al. convert bug triage into an optimization problem and propose a collaborative
filtering approach to reducing the bug-fixing time.
PROPOSED SYSTEM:
The primary contributions of this paper are as follows:
1) We present the problem of data reduction for bug triage. This problem aims to augment the
data set of bug triage in two aspects, namely a) to simultaneously reduce the scales of the bug
dimension and the word dimension and b) to improve the accuracy of bug triage.
2) We propose a combination approach to addressing the problem of data reduction. This can be
viewed as an application of instance selection and feature selection in bug repositories.
3) We build a binary classifier to predict the order of applying instance selection and feature
selection. To our knowledge, the order of applying instance selection and feature selection has
not been investigated in related domains.
This paper is an extension of our previous work. In this extension, we add new attributes
extracted from bug data sets, prediction for reduction orders, and experiments on four instance
selection algorithms, four feature selection algorithms, and their combinations.
In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the
bug data to save the labor cost of developers and improve the quality to facilitate the process
of bug triage. Data reduction for bug triage aims to build a small-scale and high-quality set of
bug data by removing bug reports and words that are redundant or non-informative. In our work,
we combine existing techniques of instance selection and feature selection to simultaneously
reduce the bug dimension and the word dimension. The reduced bug data contain fewer bug reports
and fewer words than the original bug data and provide similar information to the original bug
data. We evaluate the reduced bug data according to two criteria: the scale of a data set and the
accuracy of bug triage. To avoid the bias of a single algorithm, we empirically examine the
results of four instance selection algorithms and four feature selection algorithms.
Module 1
Data Reduction
Data reduction is the transformation of numerical or alphabetical digital information derived
empirically or experimentally into a corrected, ordered, and simplified form. The basic concept is
the reduction of multitudinous amounts of data down to the meaningful parts. When information
is derived from instrument readings there may also be a transformation from analog to digital
form. When the data are already in digital form the 'reduction' of the data typically involves some
editing, scaling, coding, sorting, collating, and producing tabular summaries. When the
observations are discrete but the underlying phenomenon is continuous then smoothing and
interpolation are often needed. Often the data reduction is undertaken in the presence of reading
or measurement errors. Some idea of the nature of these errors is needed before the most likely
value may be determined.
Module 2
Benefit of Data Reduction
In our work, to save the labor cost of developers, data reduction for bug triage has two goals:
1) reducing the data scale and 2) improving the accuracy of bug triage. In contrast to modeling
the textual content of bug reports in existing work, we aim to augment the data set to build a
preprocessing approach, which can be applied before an existing bug triage approach. We
explain the two goals of data reduction as follows.
Reducing the Data Scale - We reduce the scales of data sets to save the labor cost of developers.
Bug dimension. The aim of bug triage is to assign developers for bug fixing. Once a developer is
assigned to a new bug report, the developer can examine historically fixed bugs to form a
solution to the current bug report. For example, historical bugs are checked to detect whether
the new bug is a duplicate of an existing one; moreover, existing solutions to bugs can be
searched and applied to the new bug. Thus, we consider reducing duplicate and noisy bug reports
to decrease the number of historical bugs. In practice, the labor cost of developers (i.e., the
cost of examining historical bugs) can be saved by decreasing the number of bugs based on
instance selection.
Word dimension. We use feature selection to remove noisy or duplicate words in a data set. Based
on feature selection, the reduced data set can be handled more easily by automatic techniques
(e.g., bug triage approaches) than the original data set. Besides bug triage, the reduced data
set can be further used for other software tasks after bug triage (e.g., severity
identification, time prediction, and reopened bug analysis).
Improving the Accuracy - Accuracy is an important evaluation criterion for bug triage. In our
work, data reduction explores and removes noisy or duplicate information in data sets.
Bug dimension. Instance selection can remove uninformative bug reports; however, we observe
that the accuracy may decrease when bug reports are removed.
Word dimension. By removing uninformative words, feature selection improves the accuracy of bug
triage. This can recover the accuracy loss caused by instance selection.
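The two goals above can be made concrete with a small sketch. It assumes a bug data set stored as a list of rows (one row of word counts per bug report) and uses plain accuracy, i.e., the fraction of bug reports whose predicted developer matches the actual fixer. The function names and the toy values are illustrative assumptions, not taken from the paper.

```python
def scale_reduction(original, reduced):
    """Return the fractions of bug reports (rows) and words (columns) removed."""
    bugs_before, words_before = len(original), len(original[0])
    bugs_after, words_after = len(reduced), len(reduced[0])
    return 1 - bugs_after / bugs_before, 1 - words_after / words_before


def accuracy(predicted, actual):
    """Fraction of bug reports whose predicted developer is the actual fixer."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up toy data: 4 bug reports x 5 words before reduction, 3 x 2 after.
original = [[1, 0, 2, 0, 1], [0, 1, 0, 0, 1], [1, 0, 2, 0, 1], [0, 0, 0, 3, 0]]
reduced = [[1, 2], [0, 1], [3, 0]]

bug_cut, word_cut = scale_reduction(original, reduced)
triage_accuracy = accuracy(["alice", "bob", "alice"], ["alice", "bob", "carol"])
print(f"bugs removed: {bug_cut:.0%}, words removed: {word_cut:.0%}")
print(f"triage accuracy: {triage_accuracy:.2f}")
```

Comparing these numbers before and after reduction shows whether a given reduction shrinks the data without hurting the triage result.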
Module 3
Data Reduction for Bug Triage
We propose bug data reduction to reduce the scale and to improve the quality of data in bug
repositories. We combine existing techniques of instance selection and feature selection to
remove certain bug reports and words. A problem for reducing the bug data is to determine the
order of applying instance selection and feature selection, which is denoted as the prediction of
reduction orders. In this section, we first present how to apply instance selection and feature
selection to bug data, i.e., data reduction for bug triage. Then, we list the benefit of the data
reduction.
Module 4
Applying Instance Selection and Feature Selection
In bug triage, a bug data set is converted into a text matrix with two dimensions, namely the bug
dimension and the word dimension. In our work, we leverage the combination of instance
selection and feature selection to generate a reduced bug data set. We replace the original data
set with the reduced data set for bug triage. Instance selection and feature selection are widely
used techniques in data processing. For a given data set in a certain application, instance
selection obtains a subset of relevant instances (i.e., bug reports in bug data) while feature
selection obtains a subset of relevant features (i.e., words in bug data). In our work, we
employ the combination of instance selection and feature selection. To distinguish the orders of
applying instance selection and feature selection, we use the following notation. Given an
instance selection algorithm IS and a feature selection algorithm FS, we use FS → IS to denote
the bug data reduction that first applies FS and then IS; conversely, IS → FS denotes first
applying IS and then FS. In Algorithm 1, we briefly present how to reduce the bug data based on
FS → IS. Given a bug data set, the output of bug data reduction is a new and reduced data set.
The two algorithms FS and IS are applied sequentially. Note that in Step 2), some bug reports
may become blank during feature selection, i.e., all the words in a bug report are removed. Such
blank bug reports are also removed in the feature selection.
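The FS → IS order described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the feature selection here simply keeps the words that occur in the most bug reports, and the instance selection simply drops reports that duplicate an earlier report with the same developer. Both are simplified stand-ins for real FS and IS algorithms, and all function names and data are hypothetical.

```python
def feature_selection(matrix, keep):
    """Stand-in FS: keep the `keep` words that occur in the most bug reports."""
    n_words = len(matrix[0])
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_words)]
    ranked = sorted(range(n_words), key=lambda j: (-doc_freq[j], j))
    kept = sorted(ranked[:keep])
    return [[row[j] for j in kept] for row in matrix], kept


def drop_blank_reports(matrix, developers):
    """Remove bug reports whose words were all removed by feature selection."""
    pairs = [(row, dev) for row, dev in zip(matrix, developers) if any(row)]
    return [row for row, _ in pairs], [dev for _, dev in pairs]


def instance_selection(matrix, developers):
    """Stand-in IS: drop reports duplicating an earlier (words, developer) pair."""
    seen, rows, devs = set(), [], []
    for row, dev in zip(matrix, developers):
        key = (tuple(row), dev)
        if key not in seen:
            seen.add(key)
            rows.append(row)
            devs.append(dev)
    return rows, devs


def reduce_fs_then_is(matrix, developers, keep):
    """FS -> IS: feature selection, blank-report removal, then instance selection."""
    matrix, kept_words = feature_selection(matrix, keep)
    matrix, developers = drop_blank_reports(matrix, developers)
    matrix, developers = instance_selection(matrix, developers)
    return matrix, developers, kept_words


# Rows are bug reports, columns are word counts; developers are the labels.
bugs = [
    [2, 1, 0, 0],
    [2, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 3],
]
fixers = ["alice", "alice", "bob", "carol"]
reduced, reduced_fixers, kept = reduce_fs_then_is(bugs, fixers, keep=2)
print(reduced, reduced_fixers, kept)
```

In this toy run the last report becomes blank once its only word is dropped, so it is removed during feature selection, and the duplicated first report is removed during instance selection. Swapping the two calls in reduce_fs_then_is would give the IS → FS order.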
Module 5
Reduction Orders
To apply the data reduction to each new bug data set, we need to check the accuracy of both
orders (FS → IS and IS → FS) and choose the better one. To avoid the time cost of manually
checking both reduction orders, we consider predicting the reduction order for a new bug data
set based on historical data sets. We convert the problem of prediction for reduction orders into
a binary classification problem. A bug data set is mapped to an instance and the associated
reduction order (either FS → IS or IS → FS) is mapped to the label of a class of instances. Note
that the classifier needs to be trained only once, even when facing many new bug data sets. That
is, a classifier trained once can predict the reduction orders for all the new data sets without
checking both reduction orders. To date, the problem of predicting reduction orders of applying
feature selection and instance selection has not been investigated in other application
scenarios. From the perspective of software engineering, predicting the reduction order for bug
data sets can be viewed as a kind of software metric, which involves activities for measuring
some property of a piece of software. However, the features in our work are extracted from the
bug data set, while the features in existing work on software metrics are for individual software
artifacts, e.g., an individual bug report or an individual piece of code. In this paper, to avoid
ambiguous denotations, an attribute refers to an extracted feature of a bug data set while a
feature refers to a word of a bug report.
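The mapping from bug data sets to instances of a binary classification problem can be sketched as below. The three attributes (number of bug reports, number of words, number of developers), the 1-nearest-neighbour model, and the historical values are all assumptions chosen for illustration; the source does not state which attributes or classifier are used.

```python
import math


def extract_attributes(matrix, developers):
    """Hypothetical data-set-level attributes: #reports, #words, #developers."""
    return [len(matrix), len(matrix[0]), len(set(developers))]


class OrderPredictor:
    """1-nearest-neighbour classifier over data-set attributes (a stand-in model)."""

    def fit(self, attribute_rows, orders):
        # Scale each attribute by its maximum so no single attribute dominates.
        self.scale = [max(col) or 1 for col in zip(*attribute_rows)]
        self.rows = [self._normalise(row) for row in attribute_rows]
        self.orders = list(orders)
        return self

    def _normalise(self, row):
        return [value / scale for value, scale in zip(row, self.scale)]

    def predict(self, attributes):
        query = self._normalise(attributes)
        distances = [math.dist(query, row) for row in self.rows]
        return self.orders[distances.index(min(distances))]


# Each historical data set is one instance; its label is the order that gave
# the better triage accuracy when both orders were checked (made-up values).
history = [[10000, 8000, 120], [9000, 7500, 110], [2000, 30000, 15], [2500, 28000, 20]]
labels = ["FS->IS", "FS->IS", "IS->FS", "IS->FS"]

model = OrderPredictor().fit(history, labels)  # trained once
order_a = model.predict([9500, 7800, 100])     # reused for every new data set
order_b = model.predict([2200, 29000, 18])
print(order_a, order_b)
```

The point of the design is that checking both orders is only needed for the historical data sets used in training; each new data set costs one attribute extraction and one prediction.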
CONCLUSIONS
Bug triage is an expensive step of software maintenance in both labor cost and time cost. In this
paper, we combine feature selection with instance selection to reduce the scale of bug data sets
as well as improve the data quality. To determine the order of applying instance selection and
feature selection for a new bug data set, we extract attributes of each bug data set and train a
predictive model based on historical data sets. We empirically investigate the data reduction for
bug triage in bug repositories of two large open source projects, namely Eclipse and Mozilla.
Our work provides an approach to leveraging techniques on data processing to form reduced and
high-quality bug data in software development and maintenance. In future work, we plan to
improve the results of data reduction in bug triage, exploring how to prepare a high-quality bug
data set and tackle a domain-specific software task. For predicting reduction orders, we plan to
investigate the potential relationship between the attributes of bug data sets and the
reduction orders.
REFERENCES
[1] J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?” in Proc. 28th Int. Conf.
Softw. Eng., May 2006, pp. 361–370.
[2] S. Artzi, A. Kieżun, J. Dolby, F. Tip, D. Dig, A. Paradkar, and M. D. Ernst, “Finding bugs
in web applications using dynamic test generation and explicit-state model checking,” IEEE
Trans. Softw. Eng., vol. 36, no. 4, pp. 474–494, Jul./Aug. 2010.
[3] J. Anvik and G. C. Murphy, “Reducing the effort of bug report triage: Recommenders for
development-oriented decisions,” ACM Trans. Soft. Eng. Methodol., vol. 20, no. 3, article 10,
Aug. 2011.
[4] C. C. Aggarwal and P. Zhao, “Towards graphical models for text processing,” Knowl.
Inform. Syst., vol. 36, no. 1, pp. 1–21, 2013.
[5] Bugzilla, (2014). [Online]. Available: http://bugzilla.org/
[6] K. Balog, L. Azzopardi, and M. de Rijke, “Formal models for expert finding in enterprise
corpora,” in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, Aug.
2006, pp. 43–50.
