The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Models
Chakkrit (Kla) Tantithamthavorn, Ahmed Hassan, Kenichi Matsumoto
Communicated by Sunghun Kim
chakkrit.tantithamthavorn@monash.edu @klainfo http://chakkrit.com
DEFECT MODELS IN A NUTSHELL
An analytical model trained on historical data to predict and explain future software defects
[Figure: a defect dataset with one row per file (A.java, B.java, C.java, D.java), its metrics, and a class label (BUG or CLEAN) is used to train analytical models]
Defect models are used to:
• Predict future software defects
• Explain which factors are associated with defect-proneness
Prior work: Lewis et al., ICSE’13; Mockus et al., BLTJ’00; Ostrand et al., TSE’05; Kim et al., FSE’15; Zimmermann et al., FSE’09; Nagappan et al., ICSE’06; Caglayan et al., ICSE’15; Tan et al., ICSE’15; Shimagaki et al., ICSE’16
DEFECT DATASETS ARE IMBALANCED!
Defective and clean modules are not equally represented
[Figure: the same defect dataset, in which most files are labelled CLEAN and few are labelled BUG, is used to train analytical models]
Traditional classification techniques often fail to accurately identify the minority class (i.e., defective modules)
HOW IMBALANCED ARE DEFECT DATASETS?
We assess 101 publicly-available defect datasets
• 76 from PROMISE
• 12 from NASA
• 5 from Kim et al.
• 5 from D’Ambros et al.
• 3 from Zimmermann et al.
[Figure: a histogram of the defective ratios of the 101 defect datasets; axis labels: Defective Ratio, Percentage]
• 64% of the defect datasets have a defective ratio below 30%
• As little as 8% of defect datasets have a defective ratio between 45%-55%
Class imbalance is prominent in defect datasets, likely affecting the performance and interpretation of defect models
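As a minimal sketch of the quantity plotted in the histogram, the snippet below computes the defective ratio of one dataset as the percentage of modules labelled defective. The label names ("BUG", "CLEAN") follow the slides; the toy dataset and the function name are illustrative assumptions, not data from the study.

```python
from collections import Counter


def defective_ratio(labels):
    """Return the percentage of modules whose label is "BUG"."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 100.0 * counts["BUG"] / len(labels)


# Hypothetical dataset: 2 defective modules out of 10.
toy_labels = ["BUG", "CLEAN", "CLEAN", "CLEAN", "BUG",
              "CLEAN", "CLEAN", "CLEAN", "CLEAN", "CLEAN"]
ratio = defective_ratio(toy_labels)
print(f"defective ratio: {ratio:.1f}%")  # prints "defective ratio: 20.0%"
```

A dataset like this one would fall in the "below 30%" group described above.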
TO MITIGATE THE RISK OF CLASS IMBALANCE
Class rebalancing techniques (i.e., techniques for rebalancing the proportion of defective and clean modules of the training corpus) are often applied
• Over-Sampling Technique
• Under-Sampling Technique
• SMOTE Technique
• ROSE Technique
[Figure: one panel per technique, each turning an original dataset with a majority class and a minority class into a re-sampled dataset; the panels also show a synthetic minority class]
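The two simplest techniques above can be sketched with the standard library alone. This is an illustrative sketch, not the implementation used in the study: the helper names, the toy rows, and the choice to balance the classes to exactly equal sizes are all assumptions.

```python
import random


def under_sample(majority, minority, rng):
    """Randomly keep only as many majority rows as there are minority rows."""
    return rng.sample(majority, len(minority)) + list(minority)


def over_sample(majority, minority, rng):
    """Randomly duplicate minority rows until both classes have equal size."""
    n_extra = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(majority) + list(minority) + extra


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [("clean", i) for i in range(8)]  # hypothetical majority class
buggy = [("bug", i) for i in range(2)]    # hypothetical minority class

under = under_sample(clean, buggy, rng)
over = over_sample(clean, buggy, rng)
print(len(under), len(over))  # prints "4 16"
```

Both helpers assume the majority class is at least as large as the minority class, and, as the slide notes, they would be applied to the training corpus only.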
SHOULD WE REBALANCE OR NOT?
Prior studies arrive at contradictory conclusions, which make it hard to derive practical guidelines
• Improve the F-measure by 7.8%-22.4% [Kamei et al.]: 4 classification techniques, 2 datasets, 3 measures
• Do not improve the percentage of correctly classified modules (i.e., Accuracy) [Riquelme et al.]: 2 classification techniques, 4 datasets, 2 measures
• Are not harmful when defective ratio > 20% [Mahmood et al.]: a meta-analysis of 42 primary defect prediction studies
Class rebalancing techniques may lead to bias in the learned concepts (i.e., concept drift)
[Figure: a diagram linking World, Data, Model, Knowledge, and Decision/Policy Making]
• Data is not representative of the world
• The learned model may be biased
• Different knowledge
• Incorrect action plans
B. Turhan, “On the dataset shift problem in software engineering prediction models,” EMSE’11.
TYPES OF ANALYSIS
• Performance
• Interpretation
CASE STUDY SETUP
Study             #Classification   #Datasets   Measures
Kamei et al.      4                 2           P, R, and F1
Riquelme et al.   2                 4           AUC
Wang et al.       2                 5           PD, PF, Balance, G-mean, AUC
Tan et al.        7                 7           P, R, and F1
Agrawal et al.    6                 9           P, R, PF, AUC
Bennin et al.     5                 40          P, R, AUC, Balance, G-mean
Our study         7                 101         10 performance measures
WHAT IS THE IMPACT OF CLASS REBALANCING TECHNIQUES?
PERFORMANCE
Class rebalancing techniques:
- Have little impact on AUC
- Improve Recall
- Decrease Precision
INTERPRETATION
Unfortunately, class rebalancing techniques have a large impact on the model interpretation
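The Recall and Precision movement reported above can be illustrated with the standard definitions of both measures. The confusion-matrix counts below are hypothetical numbers chosen only to show the direction of the trade-off; they are not results from the study.

```python
def precision(tp, fp):
    """Fraction of modules predicted defective that really are defective."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of truly defective modules that the model identifies."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical test set with 50 defective modules.
# Model trained on the original data: few modules flagged as defective.
before = {"tp": 20, "fp": 10, "fn": 30}
# Model trained on rebalanced data: more modules flagged, so more hits
# on defective modules but also more false alarms.
after = {"tp": 40, "fp": 40, "fn": 10}

for name, cm in (("original", before), ("rebalanced", after)):
    p = precision(cm["tp"], cm["fp"])
    r = recall(cm["tp"], cm["fn"])
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy setting Recall rises from 0.40 to 0.80 while Precision falls from 0.67 to 0.50, which is the pattern the findings describe.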
WHICH EXPERIMENTAL SETTINGS YIELD THE BEST BENEFITS?
Performance ~ Defective Ratio + Classification Techniques + Class Rebalancing Techniques + Metrics Family + The Risk of Overfitting (Events Per Variable, EPV)
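Events Per Variable is commonly computed as the number of occurrences of the least frequent class divided by the number of independent variables. The sketch below assumes that definition; the slides do not spell it out, and the function name and example counts are hypothetical.

```python
def events_per_variable(n_defective, n_clean, n_metrics):
    """EPV: size of the least frequent class divided by the number of metrics.

    Assumes the common definition of EPV; not taken from the slides.
    """
    if n_metrics <= 0:
        raise ValueError("n_metrics must be positive")
    return min(n_defective, n_clean) / n_metrics


# Hypothetical dataset: 100 defective modules, 900 clean modules, 20 metrics.
epv = events_per_variable(n_defective=100, n_clean=900, n_metrics=20)
print(epv)  # prints 5.0
```

A low EPV means few minority-class examples support each metric, which is why it serves as a proxy for the risk of overfitting.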
- Logistic regression models with under-sampling applied to defect datasets (an EPV ratio higher than 40)
- Neural network is the most sensitive technique, while Naive Bayes is the least sensitive technique to class rebalancing techniques
SMOTETUNED by [Agrawal and Menzies, ICSE'18]
- Similarly, the SMOTE parameter must be optimized to improve AUC. Works best with NNet, GBM, RF, and C5.0
- SMOTETUNED still has a large impact on the model interpretation
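To make the idea of a tunable SMOTE parameter concrete, the sketch below generates synthetic minority points by interpolating between a minority point and one of its k nearest minority neighbours. It is a simplified SMOTE-style illustration, not the SMOTETUNED implementation; treating k as the parameter to optimise is an assumption, and the function name and toy points are hypothetical.

```python
import math
import random


def smote_like(minority, n_synthetic, k, rng):
    """Interpolate between minority points and their k nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k minority points closest to the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


rng = random.Random(0)  # fixed seed so the sketch is repeatable
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy metrics
new_points = smote_like(minority, n_synthetic=6, k=2, rng=rng)
print(len(new_points))  # prints 6
```

Tuning would then mean trying several values of k (and possibly other settings) and keeping the one that gives the best AUC on validation data.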
TAKE AWAY
For predictions
- Use optimised SMOTE for AUC
- Use under-sampling for Recall
For interpretations
- Don’t apply any class rebalancing technique!
Dr. Chakkrit (Kla) Tantithamthavorn