Predicting Defective Lines Using a Model-Agnostic Technique
Journal-First Paper (Transactions on Software Engineering)
Supatsara Wattanakriengkrai
Patanamon Thongtanunam
Chakkrit Tantithamthavorn
Hideaki Hata
Kenichi Matsumoto

A software quality assurance (SQA) team needs to carefully identify defects in changed files that will be merged into the release branch
[Diagram: source code files in a version control system; the SQA team localizes defects to find defect-prone code]
B. Adams and S. McIntosh, “Modern release engineering in a nutshell–why researchers should care,” in SANER, 2016, pp. 78–90.

To spend the optimal effort, the SQA team needs to prioritize files that are likely to have defects in the future
[Diagram: source code files in a version control system; the SQA team prioritizes files into ranked source code files, then localizes defective lines to find defect-prone code]
[Adams and McIntosh SANER 2016]

Defect prediction models are proposed to help SQA teams prioritize their effort
[Diagram: source code files in a version control system; files are prioritized using defect models into ranked source code files]
[Kamei et al. ICSME 2010] [Mende and Koschke CSMR 2010]

However, developers could still waste SQA effort on manually identifying the riskiest lines
[Diagram: source code files in a version control system; files are prioritized using defect models into ranked source code files; the SQA team must then localize defective lines to find defect-prone code]
[Kamei et al. ICSME 2010] [Mende and Koschke CSMR 2010]

As little as 1%-3% of the lines of code in a file are actually defective after release

Studied system   %Defective files   %Defective lines in defective files (at the median)
ActiveMQ         2-7%               2%
Camel            2-8%               2%
Derby            6-28%              2%
Groovy           2-4%               2%
HBase            7-11%              1%
Hive             6-19%              2%
JRuby            2-13%              2%
Lucene           2-8%               3%
Wicket           2-16%              3%

Defective lines are the source code lines that will be changed by bug-fixing commits to fix post-release defects.

Developers may still waste 97%-99% of the effort when inspecting defect-prone files

An overview of building file-level defect models
[Diagram: defect dataset (File_A.java, File_B.java, File_C.java) → extracting features (Token_1, Token_2, ..., Token_N) → building file-level defect models]
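
The feature-extraction step in the overview can be sketched as a bag-of-tokens table, where each file becomes a row of token counts. This is a minimal sketch under stated assumptions: the file contents, the identifier-only tokenizer, and the function names `extract_token_counts` and `build_feature_table` are hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical toy dataset: file name -> source text (not from the study).
files = {
    "File_A.java": "if (closure != null) { closure.call(); }",
    "File_B.java": "Object oldCurrent = current; current = oldCurrent;",
}

# Assumption: a token is any identifier-like word; operators are ignored.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_token_counts(source):
    """Count how often each identifier-like token occurs in one file."""
    return Counter(TOKEN_PATTERN.findall(source))


def build_feature_table(files):
    """Return (vocabulary, rows); each row holds one file's token counts."""
    counts = {name: extract_token_counts(text) for name, text in files.items()}
    vocabulary = sorted({tok for c in counts.values() for tok in c})
    rows = {name: [c[tok] for tok in vocabulary] for name, c in counts.items()}
    return vocabulary, rows


vocabulary, rows = build_feature_table(files)
for name, row in rows.items():
    print(name, dict(zip(vocabulary, row)))
```

A table like this, paired with a defective/clean label per file, is the kind of input a file-level classifier such as logistic regression or random forest could be trained on.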

Predicting defect-prone lines
[Diagram: testing files → file-level defect models predict defect-prone files → LIME identifies defect-prone lines → identified defect-prone lines are ranked from most risky to least risky]
[Ribeiro et al. SIGKDD 2016]
LIME is a model-agnostic technique that aims to mimic the behavior of the predictions of the defect model by explaining the individual predictions.
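
To make the idea of explaining an individual prediction concrete, the sketch below scores tokens by treating the model as a black box. It is not LIME: LIME fits a local surrogate model on many perturbed samples, whereas this sketch uses a simpler leave-one-token-out occlusion. The weights in `TOY_WEIGHTS` and the stand-in model `predict_defect_probability` are hypothetical.

```python
import math

# Hypothetical token weights standing in for a trained file-level model.
TOY_WEIGHTS = {"oldCurrent": 1.5, "current": 0.2, "node": -0.4, "closure": -1.0}


def predict_defect_probability(tokens):
    """Black-box stand-in: logistic function over the distinct tokens present."""
    z = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in set(tokens))
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_scores(tokens, predict):
    """Score each distinct token by the prediction drop when it is removed."""
    baseline = predict(tokens)
    scores = {}
    for tok in set(tokens):
        without = [t for t in tokens if t != tok]
        scores[tok] = baseline - predict(without)
    return scores


file_tokens = ["closure", "oldCurrent", "current", "closure", "node"]
scores = occlusion_scores(file_tokens, predict_defect_probability)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The scoring function only calls `predict`, so it would work unchanged with any classifier, which is what makes this style of explanation model-agnostic.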

An illustrative example of our approach for identifying defect-prone lines
[Diagram: a file of interest (testing file) → file-level defect prediction model → a defect-prone file → LIME → ranking tokens based on LIME scores → mapping tokens to lines → identified defect-prone lines]
[Ribeiro et al. SIGKDD 2016]

Token        LIME score   Interpretation
oldCurrent   0.8          Defective (LIME score > 0)
current      0.1          Defective (LIME score > 0)
node         -0.3         Clean (LIME score < 0)
closure      -0.7         Clean (LIME score < 0)

Code from the file of interest:
    if(closure != null){
        Object oldCurrent = current;
        setClosure(closure, node);
        closure.call();
        current = oldCurrent;
    }
Code tokens that frequently appeared in defective files in the past may also appear in the lines that will be fixed after release
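
The token-to-line mapping in the illustrative example can be sketched as follows. The token scores and the code snippet come from the slide; the identifier-only tokenizer, the function name `flag_defect_prone_lines`, and the rule "flag a line if it contains at least one token with a positive score" are assumptions made for illustration.

```python
import re

# Token scores taken from the illustrative example on the slide.
lime_scores = {"oldCurrent": 0.8, "current": 0.1, "node": -0.3, "closure": -0.7}

code_lines = [
    "if(closure != null){",
    "    Object oldCurrent = current;",
    "    setClosure(closure, node);",
    "    closure.call();",
    "    current = oldCurrent;",
    "}",
]

# Assumption: a token is any identifier-like word, matched as a whole word.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def flag_defect_prone_lines(lines, scores):
    """Return (line_number, risky_tokens) for lines with a positive-score token."""
    risky = {tok for tok, score in scores.items() if score > 0}
    flagged = []
    for number, text in enumerate(lines, start=1):
        hits = sorted(set(TOKEN_PATTERN.findall(text)) & risky)
        if hits:
            flagged.append((number, hits))
    return flagged


flagged = flag_defect_prone_lines(code_lines, lime_scores)
for number, hits in flagged:
    print(number, code_lines[number - 1].strip(), hits)
```

Under these assumptions the sketch flags lines 2 and 5, the two lines that contain `oldCurrent` and `current`. Because tokens are matched as whole identifiers, `setClosure` does not count as an occurrence of `closure`.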

Research Questions
Our LIME-based approach is evaluated on predictive accuracy, ranking performance, and computation time.

We compare our approach against six line-level baseline approaches
Our approach: LIME
Baseline approaches:
Random guessing
PMD [Copeland PMDApplied 2005]
ErrorProne [Aftandilian et al. SCAM 2012]
NLP [Hellendoorn et al. ESEC/FSE 2017]
TMI-LR, a traditional model interpretation-based approach with logistic regression
TMI-RF, a traditional model interpretation-based approach with random forest

Our approach achieves an overall predictive accuracy better than baseline approaches

Metric                                 Our approach (LIME)   Line-level baseline approaches
Recall (higher is better)              0.61 to 0.62          0.01 to 0.51
MCC (higher is better)                 0.04 to 0.05          -0.01 to 0.03
False alarm (lower is better)          0.47 to 0.48          0.01 to 0.54
Distance to heaven (lower is better)   0.43 to 0.44          0.52 to 0.70

Distance to heaven is the root mean square of the recall and false alarm values.
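
The four metrics in the table can be computed from a confusion matrix as sketched below. The recall, false alarm, and MCC formulas are the standard definitions. For distance to heaven, the sketch assumes the root mean square is taken over the distance from the ideal point (recall = 1, false alarm = 0); with recall 0.61 and false alarm 0.47 this gives roughly 0.43, which is in line with the table. The confusion-matrix counts are hypothetical.

```python
import math


def recall(tp, fn):
    """Share of truly defective lines that were predicted as defective."""
    return tp / (tp + fn)


def false_alarm_rate(fp, tn):
    """Share of clean lines that were wrongly predicted as defective."""
    return fp / (fp + tn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def distance_to_heaven(recall_value, far_value):
    """Assumed form: RMS distance from the ideal point (recall=1, false alarm=0)."""
    return math.sqrt(((1 - recall_value) ** 2 + far_value ** 2) / 2)


# Hypothetical counts chosen so that recall = 0.61 and false alarm = 0.47.
tp, fn, fp, tn = 61, 39, 470, 530
r = recall(tp, fn)
far = false_alarm_rate(fp, tn)
d2h = distance_to_heaven(r, far)
print(round(r, 2), round(far, 2), round(mcc(tp, fp, tn, fn), 2), round(d2h, 2))
```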

Given a fixed amount of effort, our approach identifies actual defective lines better than baseline approaches

Metric                                  Our approach (LIME)   Line-level baseline approaches
Recall@Top20%LOC (higher is better)     0.26 to 0.27          0.17 to 0.22
Initial False Alarm (lower is better)   9 to 16               10 to 403

Initial False Alarm is the number of clean lines on which developers spend SQA effort until the first defective line is found when lines are ranked.
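
Both ranking measures can be computed from a list of lines sorted from most to least risky, as sketched below. The label list is hypothetical (1 marks a defective line, 0 a clean one), and rounding the 20% cutoff down to a whole number of lines is an assumption.

```python
def recall_at_top_loc(ranked_labels, percent=20):
    """Share of all defective lines found in the top `percent` of ranked lines."""
    total_defective = sum(ranked_labels)
    if total_defective == 0:
        return 0.0
    # Assumption: the cutoff is rounded down to a whole number of lines.
    cutoff = (len(ranked_labels) * percent) // 100
    return sum(ranked_labels[:cutoff]) / total_defective


def initial_false_alarm(ranked_labels):
    """Number of clean lines inspected before the first defective line."""
    for position, label in enumerate(ranked_labels):
        if label == 1:
            return position
    return len(ranked_labels)


# Hypothetical ranking of ten lines, most risky first.
ranked_labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
print(recall_at_top_loc(ranked_labels), initial_false_alarm(ranked_labels))
```

In this toy ranking the top 20% covers two lines and contains one of the three defective lines, and one clean line is inspected before the first defective one.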

The computational time of our approach is manageable when considering the predictive accuracy of defective lines
[Figure: computation time and rank (1st to 6th) per approach in the within-release and cross-release settings. Approaches shown: our approach, PMD, ErrorProne, NLP, TMI-LR, TMI-RF. Values shown: 8.46, 26.85, 11.15]

Experimental Results
Predictive accuracy: our approach achieves an overall predictive accuracy better than baseline approaches.
Ranking performance: given a fixed amount of effort, our approach identifies actual defective lines better than baseline approaches.
Computation time: the computational time of our approach is manageable when considering the predictive accuracy of defective lines.

Our work builds an important step towards line-level defect prediction by leveraging a model-agnostic technique.
Our framework will help developers effectively prioritize SQA effort.
Supatsara Wattanakriengkrai
wattanakri.supatsara.ws3@is.naist.jp

