Predicting Defective Lines Using a Model-Agnostic Technique

Journal-First Paper (Transactions on Software Engineering)
Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Hideaki Hata, Kenichi Matsumoto

An SQA team needs to carefully identify defects in
changed files that will be merged into the release branch
[Figure: source code files from the version control system → the SQA team localizes defects → defect-prone code]
B. Adams and S. McIntosh, “Modern release engineering in a nutshell–why researchers should care,” in SANER, 2016, pp. 78–90.

To spend effort optimally, the SQA team needs to
prioritize files that are likely to have defects in the future
[Figure: source code files from the version control system → the SQA team prioritizes files → ranked source code files → the SQA team localizes defective lines → defect-prone code]
B. Adams and S. McIntosh, “Modern release engineering in a nutshell–why researchers should care,” in SANER, 2016, pp. 78–90.

Defect prediction models are proposed to help
SQA teams prioritize their effort
[Figure: source code files from the version control system → prioritize files using defect models → ranked source code files]
[Kamei et al. ICSME 2010]
[Mende and Koschke CSMR 2010]
…

However, developers could still waste SQA effort on
manually identifying the riskiest lines
[Figure: source code files from the version control system → prioritize files using defect models → ranked source code files → the SQA team localizes defective lines → defect-prone code]
[Kamei et al. ICSME 2010]
[Mende and Koschke CSMR 2010]
…

As little as 1%-3% of the lines of code in a file are actually
defective after release
Studied system   %Defective files   %Defective lines in defective files (at the median)
Activemq         2-7%               2%
Camel            2-8%               2%
Derby            6-28%              2%
Groovy           2-4%               2%
Hbase            7-11%              1%
Hive             6-19%              2%
Jruby            2-13%              2%
Lucene           2-8%               3%
Wicket           2-16%              3%
**Defective lines are the source code lines that will be changed by bug-fixing commits to fix post-release defects

Developers may still waste 97%-99% of the effort
when inspecting defect-prone files

An overview of building file-level defect models
[Figure: defect dataset (File_A.java, File_B.java, File_C.java) → extracting features (Token_1, Token_2, … Token_N) → building file-level defect models]
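The feature-extraction step can be illustrated with a minimal sketch. It assumes a bag-of-tokens representation in which each file becomes a vector of token counts; the toy file contents, the identifier regex, and the helper names are hypothetical and not taken from the study.

```python
import re
from collections import Counter

# Hypothetical toy dataset: file name -> source text (not from the study).
files = {
    "File_A.java": "if (closure != null) { closure.call(); }",
    "File_B.java": "Object oldCurrent = current; current = oldCurrent;",
    "File_C.java": "setClosure(closure, node);",
}

def tokenize(source):
    """Split source text into identifier-like code tokens (assumed tokenizer)."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*", source)

# Vocabulary: every distinct token in the dataset, in sorted order.
vocabulary = sorted({tok for src in files.values() for tok in tokenize(src)})

def to_feature_vector(source):
    """Count how often each vocabulary token occurs in one file."""
    counts = Counter(tokenize(source))
    return [counts[tok] for tok in vocabulary]

# One count vector per file (Token_1 ... Token_N in the overview).
features = {name: to_feature_vector(src) for name, src in files.items()}
```

A file-level defect model would then be trained on these vectors together with a defective/clean label per file.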

Predicting defect-prone lines
[Figure: testing files → file-level defect models → defect-prone files → identifying defect-prone lines with LIME → identified defect-prone lines → ranking defect-prone lines, from most risky to least risky]
[Ribeiro et al. SIGKDD 2016]
**LIME is a model-agnostic technique that aims to mimic the behavior of the predictions of the defect model by explaining the individual predictions
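The idea of explaining one prediction with a local surrogate can be sketched as follows. This is a simplified illustration in the spirit of LIME, not the LIME library: it tries every on/off combination of a file's tokens and fits an unweighted linear surrogate, whereas LIME samples perturbations and weights them by proximity. The toy model and its weights are hypothetical.

```python
import math
from itertools import product

# Hypothetical stand-in for a trained file-level defect model: it maps the
# tokens present in a file to a defect probability. The weights are made up.
TOY_WEIGHTS = {"oldCurrent": 2.0, "current": 0.5, "node": -0.8, "closure": -1.5}

def toy_defect_model(present_tokens):
    z = sum(TOY_WEIGHTS[t] for t in present_tokens)
    return 1.0 / (1.0 + math.exp(-z))

def explain_locally(tokens, model):
    """Score each token by how much its presence moves the model's output.

    Every on/off combination of the tokens is fed to the model. Because all
    combinations are used, the unweighted least-squares slope for a token
    equals mean(output | token kept) - mean(output | token removed).
    """
    samples = []
    for mask in product([0, 1], repeat=len(tokens)):
        kept = [t for t, keep in zip(tokens, mask) if keep]
        samples.append((mask, model(kept)))
    scores = {}
    for i, tok in enumerate(tokens):
        kept_outputs = [y for mask, y in samples if mask[i] == 1]
        removed_outputs = [y for mask, y in samples if mask[i] == 0]
        scores[tok] = (sum(kept_outputs) / len(kept_outputs)
                       - sum(removed_outputs) / len(removed_outputs))
    return scores

scores = explain_locally(list(TOY_WEIGHTS), toy_defect_model)
```

A positive score means the token pushes the prediction towards "defective", a negative score towards "clean". Enumerating all combinations grows exponentially with the number of tokens, so it is only practical for a toy example like this one.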

An illustrative example of our approach
for identifying defect-prone lines
A file of interest (testing file) is given to the file-level defect prediction model, and LIME [Ribeiro et al. SIGKDD 2016] explains the prediction.

A defect-prone file:
    if(closure != null){
        Object oldCurrent = current;
        setClosure(closure, node);
        closure.call();
        current = oldCurrent;
    }

Ranking tokens based on LIME scores:
    Token        LIME score
    oldCurrent    0.8        Defective (LIME score > 0)
    current       0.1        Defective (LIME score > 0)
    node         -0.3        Clean (LIME score < 0)
    closure      -0.7        Clean (LIME score < 0)

Mapping tokens to lines gives the identified defect-prone lines.
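The token-to-line mapping can be sketched with the token scores and code lines from the illustrative example. The whole-identifier matching and the ranking rule (lines with more risky tokens first) are assumptions made for this sketch, not details confirmed by the slides.

```python
import re

# LIME scores and code lines taken from the illustrative example.
lime_scores = {"oldCurrent": 0.8, "current": 0.1, "node": -0.3, "closure": -0.7}
code_lines = [
    "if(closure != null){",
    "Object oldCurrent = current;",
    "setClosure(closure, node);",
    "closure.call();",
    "current = oldCurrent;",
    "}",
]

# Tokens with a positive LIME score push the prediction towards "defective".
risky_tokens = {tok for tok, score in lime_scores.items() if score > 0}

def risky_token_count(line):
    """Count distinct risky tokens that occur as whole identifiers in a line."""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", line))
    return len(identifiers & risky_tokens)

# A line is flagged as defect-prone when it contains at least one risky token.
defect_prone = [
    (number, line)
    for number, line in enumerate(code_lines, start=1)
    if risky_token_count(line) > 0
]

# Assumed ranking rule: more risky tokens first, ties kept in line order.
ranked = sorted(defect_prone, key=lambda item: -risky_token_count(item[1]))
```

With these inputs, the two lines that mention oldCurrent and current are flagged, while the lines that only mention closure or node are not.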

Code tokens that frequently appeared in defective files
in the past may also appear in the lines that
will be fixed after release

Research Questions
Predictive Accuracy
Ranking Performance
Computation Time

We compare our approach
against six line-level baseline approaches
Our approach: LIME
Baseline approaches:
  Random Guessing
  PMD [Copeland, PMD Applied 2005]
  ErrorProne [Aftandilian et al. SCAM 2012]
  NLP [Hellendoorn et al. ESEC/FSE 2017]
  TMI-LR
  TMI-RF
**TMI-LR is a traditional model interpretation based approach with logistic regression
**TMI-RF is a traditional model interpretation based approach with random forest

Our approach achieves an overall predictive accuracy
better than baseline approaches
Measure                                     Our Approach (LIME)   Line-level Baseline Approaches
Recall (the higher the better)              0.61 to 0.62          0.01 to 0.51
MCC (the higher the better)                 0.04 to 0.05          -0.01 to 0.03
False Alarm (the lower the better)          0.47 to 0.48          0.01 to 0.54
Distance to Heaven (the lower the better)   0.43 to 0.44          0.52 to 0.70
Distance to Heaven is the root mean square of the distances of the recall and false alarm values from their ideal values (1 and 0).
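The four measures can be computed from line-level confusion-matrix counts, as in the sketch below. The Distance to Heaven formula is read here as the root mean square of (1 - recall) and (0 - false alarm); this reading reproduces the reported 0.43 from a recall of 0.61 and a false alarm of 0.47. The counts in the example call are hypothetical.

```python
import math

def distance_to_heaven(recall, false_alarm):
    """Root mean square distance from the ideal point (recall=1, false alarm=0)."""
    return math.sqrt(((1 - recall) ** 2 + (0 - false_alarm) ** 2) / 2)

def line_level_metrics(tp, fp, tn, fn):
    """Compute recall, false alarm, MCC, and distance to heaven from counts."""
    recall = tp / (tp + fn)
    false_alarm = fp / (fp + tn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; use 0 then.
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return recall, false_alarm, mcc, distance_to_heaven(recall, false_alarm)

# Hypothetical counts chosen so that recall = 0.61 and false alarm = 0.47.
recall, false_alarm, mcc, d2h = line_level_metrics(tp=61, fp=470, tn=530, fn=39)
```

Because clean lines vastly outnumber defective ones, a moderate false alarm rate still yields many false positives, which is consistent with MCC values close to zero.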

Experimental Results
Predictive Accuracy: Our approach achieves an overall predictive accuracy better than baseline approaches
Ranking Performance
Computation Time

Given a fixed amount of effort, our approach identifies
actual defective lines better than baseline approaches
Measure                                      Our Approach (LIME)   Line-level Baseline Approaches
Recall@Top20%LOC (the higher the better)     0.26 to 0.27          0.17 to 0.22
Initial False Alarm (the lower the better)   9 to 16               10 to 403
Initial False Alarm is the number of clean lines on which developers spend SQA effort until the first defective line is found when lines are ranked.
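Both ranking measures can be computed from a list of lines ordered from most to least risky, as in this minimal sketch. It assumes each line costs the same inspection effort and that the 20% cutoff is rounded down; the example ranking is hypothetical.

```python
def recall_at_top_20_percent_loc(ranked_is_defective):
    """Share of all defective lines found in the top 20% of the ranked lines."""
    total_defective = sum(ranked_is_defective)
    if total_defective == 0:
        return 0.0
    cutoff = len(ranked_is_defective) * 20 // 100  # rounded down (assumption)
    return sum(ranked_is_defective[:cutoff]) / total_defective

def initial_false_alarm(ranked_is_defective):
    """Number of clean lines inspected before the first defective line."""
    for position, is_defective in enumerate(ranked_is_defective):
        if is_defective:
            return position
    return len(ranked_is_defective)

# Hypothetical ranking of 10 lines, most risky first (True = actually defective).
ranking = [False, True, False, False, True, False, False, False, False, False]
top20_recall = recall_at_top_20_percent_loc(ranking)
ifa = initial_false_alarm(ranking)
```

In this toy ranking, the top 20% covers two lines and contains one of the two defective lines, and one clean line is inspected before the first defective line is reached.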

Predictive Accuracy: Our approach achieves an overall predictive accuracy better than baseline approaches
Ranking Performance: Given a fixed amount of effort, our approach identifies actual defective lines better than baseline approaches
Computation Time

The computational time of our approach is manageable
when considering the predictive accuracy of defective lines
[Figure: rank of each approach by computation time (1st to 6th) in the within-release and cross-release settings, for our approach, PMD, ErrorProne, NLP, TMI-LR, and TMI-RF; values shown include 8.46, 26.85, and 11.15]

Predictive Accuracy: Our approach achieves an overall predictive accuracy better than baseline approaches
Ranking Performance: Given a fixed amount of effort, our approach identifies actual defective lines better than baseline approaches
Computation Time: The computational time of our approach is manageable when considering the predictive accuracy of defective lines

Our work builds an important step towards line-level defect prediction
by leveraging a model-agnostic technique.
Our framework will help developers effectively prioritize SQA effort.
Supatsara Wattanakriengkrai
wattanakri.supatsara.ws3@is.naist.jp
