Ghotra icse

Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models
Baljinder Ghotra, Ahmed E. Hassan, Shane McIntosh

Quality assurance teams have
limited resources
Personnel, schedules
2

Executing all test suites
takes too long
3
Teams often release several times
in one day!

Defect models can help QA teams to
allocate limited resources effectively
4
Defect prediction
model

Defect models are trained using historical
data to predict the defect-prone modules
5
[Figure: historical data for modules a, b, and c, plus a new module, annotated with the changed modules, the reason for change, and the developer responsible]

Defect prediction
model
Defect models are trained using historical
data to predict the defect-prone modules
6
[Figure: the model classifies modules a and b as low risk and module c as high risk]

Defect models are trained using
various techniques
7
Simple techniques: Decision Trees, Logistic Regression
Advanced techniques: Logistic Model Trees (LMT), which combine decision trees with logistic regression

Do most classification techniques produce
models that achieve similar performance?
8
[Figure: Decision Trees and Logistic Model Trees (LMT)]
The performance of 17 of the 22
studied techniques is
indistinguishable
S. Lessmann, B. Baesens, C. Mues, S. Pietsch.
Benchmarking classification models for software defect prediction
[TSE 2008]

Limitations of the prior work
9
Overlapping
statistical ranks
Noisy
data
Limited
scope

Do most techniques produce models
with similar performance, when we use:
10
Non-overlapping
statistical ranks
Clean
data
Expanded
scope
Overlapping
statistical ranks
Noisy
data
Limited
scope

Our approach to study the impact of
classification techniques on defect models
13
Train and test models using different techniques
Repeat 100 times
Performance scores for each technique
Rank techniques using statistical clustering
Rank Tech.
1 z, …
2 a, b, …
3 …
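The workflow on this slide can be sketched as a loop that repeatedly splits the data, trains each technique, and records one performance score per technique per repetition. This is a minimal sketch, not the authors' tooling: `evaluate_techniques`, the two toy techniques, the random 2/3 to 1/3 split, and the use of plain accuracy (the deck reports AUC) are all assumptions chosen to keep the example dependency-free.

```python
import random
from statistics import mean

def majority_class(train):
    """Hypothetical baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

def size_threshold(train):
    """Hypothetical technique: flag modules larger than the mean training size."""
    cut = mean(x for x, _ in train)
    return lambda x: int(x > cut)

def evaluate_techniques(data, techniques, repetitions=100, seed=0):
    """Return {technique name: [accuracy per repetition]} over random splits."""
    rng = random.Random(seed)
    scores = {name: [] for name in techniques}
    for _ in range(repetitions):
        rows = data[:]
        rng.shuffle(rows)
        split = (2 * len(rows)) // 3
        train, test = rows[:split], rows[split:]
        for name, fit in techniques.items():
            predict = fit(train)  # train the technique on this repetition's sample
            hits = sum(predict(x) == y for x, y in test)
            scores[name].append(hits / len(test))
    return scores

# Toy data: (module size, defective?) pairs in which larger modules are defective.
data = [(size, int(size > 50)) for size in range(10, 100, 3)]
scores = evaluate_techniques(data, {"majority": majority_class, "size": size_threshold})
for name, values in scores.items():
    print(name, round(mean(values), 3))
```

The resulting per-technique score lists are the kind of input that a statistical clustering step would then rank.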

Unfortunately, some projects yield
poorer results than others
14
[Figure: AUC per project (CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4); AUC axis from 0.5 to 0.9]
Performance values rarely overlap!
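For reference, AUC can be computed as the probability that a randomly chosen defective module receives a higher predicted risk than a randomly chosen clean one, with ties counting as one half. The sketch below assumes this pairwise definition; the function name `auc` and the sample values are illustrative and are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (1 = defective, 0 = clean)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defective module ranked above clean module
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
print(auc(labels, scores))  # 5 of the 6 defective/clean pairs are ordered correctly
```

An AUC of 0.5 corresponds to random guessing, which is why the axis in the figure starts there.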

Non-overlapping ranks using a
double Scott-Knott test
15
Input for each project (Project 1, Project 2, ..., Project M): 10 mean AUC values for each of Technique 1, Technique 2, ..., Technique N
Scott-Knott test (1st run), Project 1
Rank Technique
1 T3, T7, T8
2 T2, T10
3 T1, T4, T6
4 T5, T9
Scott-Knott test (1st run), Project 2
Rank Technique
1 T2, T5, T7
2 T1, T10
3 T3, T4, T6
4 T8, T9

Non-overlapping ranks using a
double Scott-Knott test
16
Scott-Knott test (1st run), 10x
Rank Technique
1 T2, T5, T7
2 T1, T10
3 T3, T4, T6
4 T8, T9
Scott-Knott test (1st run), 10x
Rank Technique
1 T3, T7, T8
2 T2, T10
3 T1, T4, T6
4 T5, T9
Scott-Knott test (2nd run)
Rank Technique
1 T2, T5
2 T1, T7, T10
3 T3, T4, T6
4 T8, T9
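The two-stage ranking on these slides can be sketched as follows: a first clustering run ranks the techniques within each project, and a second run clusters the techniques again using those per-project ranks as input. This is a simplified illustration only. The real Scott-Knott procedure accepts a split when a likelihood-ratio test is significant; here that test is replaced by fixed thresholds (`min_gap`, `min_rank_gap`), and those thresholds, the function names, and the toy AUC values are all assumptions.

```python
from statistics import mean

def cluster_ranks(values, min_gap=0.02):
    """Group items by mean value, best first. values: {name: [observations]}.

    Simplification: a split is accepted when the two group means differ by
    at least `min_gap` (a hypothetical threshold), not by a significance test.
    """
    means = {name: mean(obs) for name, obs in values.items()}
    ordered = sorted(means, key=means.get, reverse=True)

    def split(names):
        if len(names) < 2:
            return [names]
        grand = mean(means[n] for n in names)
        best_cut, best_b0 = None, -1.0
        for cut in range(1, len(names)):
            left, right = names[:cut], names[cut:]
            m_left = mean(means[n] for n in left)
            m_right = mean(means[n] for n in right)
            # Between-group sum of squares for this partition.
            b0 = len(left) * (m_left - grand) ** 2 + len(right) * (m_right - grand) ** 2
            if b0 > best_b0:
                best_cut, best_b0 = cut, b0
        left, right = names[:best_cut], names[best_cut:]
        gap = mean(means[n] for n in left) - mean(means[n] for n in right)
        if gap < min_gap:
            return [names]  # groups are too close: keep them in one rank
        return split(left) + split(right)

    groups = split(ordered)
    return {name: rank for rank, group in enumerate(groups, start=1) for name in group}

def double_scott_knott(auc_by_project, min_gap=0.02, min_rank_gap=0.6):
    """First run: rank techniques per project. Second run: cluster those ranks."""
    ranks = {}
    for aucs in auc_by_project.values():
        for name, rank in cluster_ranks(aucs, min_gap).items():
            ranks.setdefault(name, []).append(rank)
    # Lower rank is better, so negate before reusing the "higher is better" routine.
    negated = {name: [-r for r in rs] for name, rs in ranks.items()}
    return cluster_ranks(negated, min_rank_gap)

# Toy AUC values for two hypothetical projects and four techniques.
auc_by_project = {
    "A": {"T1": [0.80, 0.82], "T2": [0.79, 0.81], "T3": [0.70, 0.72], "T4": [0.60, 0.62]},
    "B": {"T1": [0.75, 0.77], "T2": [0.85, 0.87], "T3": [0.74, 0.76], "T4": [0.55, 0.57]},
}
print(double_scott_knott(auc_by_project))
```

Feeding ranks rather than raw AUC values into the second run keeps projects with uniformly low AUC from dominating the final grouping, which is the problem the previous slide raises.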

17
Non-overlapping test:
Most techniques have similar performance
Rank Technique
1 Ad+NB, EM, RBFs, …
2 Rsub+SMO, J48, …
Similar to the prior work, techniques are grouped into 2 distinct ranks

18
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Expanded scope
Clean data

Clean NASA dataset:
Cleaning criteria of prior work
20
Data Quality: Some Comments on the
NASA Software Defect Datasets
M. Shepperd, Q. Song, Z. Sun, C. Mair
[TSE 2013]
Identical cases
Missing values
Constraint violations
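The three criteria listed above can be illustrated with a small filter over module records. This is a sketch under stated assumptions, not the cleaning procedure of the cited work: the function `clean_modules`, the metric names, and the single integrity rule are hypothetical.

```python
def clean_modules(rows, constraints):
    """Drop rows with missing values, constraint violations, or identical earlier rows."""
    seen = set()
    kept = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing values
        if not all(check(row) for check in constraints):
            continue  # constraint violations
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # identical cases
        seen.add(key)
        kept.append(row)
    return kept

# Hypothetical metric names and integrity rule, for illustration only.
constraints = [lambda r: r["loc_total"] >= r["loc_comments"]]
rows = [
    {"loc_total": 120, "loc_comments": 10, "defective": 1},
    {"loc_total": 120, "loc_comments": 10, "defective": 1},   # identical case
    {"loc_total": 40, "loc_comments": None, "defective": 0},  # missing value
    {"loc_total": 15, "loc_comments": 30, "defective": 0},    # violates the rule
    {"loc_total": 60, "loc_comments": 5, "defective": 0},
]
print(len(clean_modules(rows, constraints)))  # 2 rows survive
```

Missing values are checked first so that the constraint functions never compare against `None`.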

Clean NASA dataset:
Many distinct ranks of techniques
21
Rank Technique
1 LMT, SL, …
2 KNN, RBFs, …
3 J48, K-means, …
4 SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression

22
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Expanded scope
Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks

Another dataset:
The PROMISE corpus
24

Another dataset:
Four significant ranks of techniques
25
Rank Technique
1 LMT, SL, …
2 KNN, RBFs, …
3 J48, K-means, …
4 SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression

26
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Expanded scope: No, similar to the clean data study, techniques are grouped into 4 distinct ranks
Clean data

Classification technique
matters!
27
[Figure: Decision Trees and Logistic Model Trees (LMT)]

Low-cost suggestion:
Experiment with the available techniques
28
6,618 packages are available on CRAN
148 packages are available in package explorer
