2nd International Conference on Trends in Computational and Cognitive Engineering (TCCE)
Performance Analysis of Machine Learning Approaches in Software Complexity Prediction
Sayed Reza¹, Mahfujur Rahman², Hasnat Parvez³, Omar Badreddin¹, and Shamim Al Mamun³
¹ University of Texas, ² Daffodil International University, and ³ Jahangirnagar University
Paper ID: 410
Introduction
• Software complexity is an undesirable characteristic of software
• Increasing complexity reduces maintainability and sustainability
• Class-level complexity
• Method-level complexity
• Complexity can be affected by many factors related to code structures, object-oriented properties, and source code metrics
• Machine learning techniques can automate the detection of class complexity, replacing manual processes and hand-written code rules
Research Objectives
• Use machine learning techniques to build complexity
classifiers
• The reason for using machine learning is to eliminate the manual process or hand-written code rules used to detect class complexity
• Compare the performance of the ML classifiers
• Report the best technique based on performance
metrics
Motivation
• Early detection of software complexity enables better software maintenance
• Effective software maintenance leads to better quality over time
• High-quality software helps to:
• Enhance future software maintainability
• Ensure sustainable software over time
• Minimize software development effort over time
• Reduce software development costs
Research Questions & Study Design
• RQ1: How are source code metrics correlated with the quality attribute class complexity?
• This question reveals the relationships between complexity and source code metrics
• RQ2: How accurately can machine learning approaches predict class complexity from source code metrics?
• This question aims to determine the accuracy of machine learning approaches in class-level complexity detection
Figure: Study Design (Dataset Collection → Dataset Preparation → Correlation Analysis (RQ1) → Training → Performance Evaluation (RQ2) → Report Best Technique)
Dataset Collection
• A dataset for complexity prediction needs a diverse set of repositories
• We search code repositories using the ModelMine tool [1] with the following criteria:
• a repository with Java as its primary language
• a minimum of 5000 commits (proxy for maintenance)
• at least 100 active contributors
• a minimum of 3000 stars and 500 forks (proxy for popularity)
• 10 repositories with 38,778 classes in total are selected
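The selection criteria above can be sketched as a simple filter. This is an illustration only: the `Repo` record, its field names, and the sample repositories are hypothetical and do not reflect ModelMine's actual interface or the repositories used in the study.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    # Hypothetical record; field names are illustrative, not ModelMine's schema.
    name: str
    language: str
    commits: int
    contributors: int
    stars: int
    forks: int


def meets_criteria(repo: Repo) -> bool:
    """Apply the selection thresholds listed on the slide."""
    return (
        repo.language == "Java"
        and repo.commits >= 5000        # proxy for maintenance
        and repo.contributors >= 100
        and repo.stars >= 3000          # proxy for popularity
        and repo.forks >= 500
    )


# Made-up candidates, for illustration only.
candidates = [
    Repo("example/alpha", "Java", 12000, 250, 8000, 1500),
    Repo("example/beta", "Java", 4000, 300, 9000, 2000),    # too few commits
    Repo("example/gamma", "Kotlin", 9000, 150, 5000, 900),  # not Java
]
selected = [r.name for r in candidates if meets_criteria(r)]
print(selected)
```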
[1] Sayed Mohsin Reza, Omar Badreddin, and Khandoker Rahad. ModelMine: A tool to facilitate mining models from open-source repositories. In 2020 ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS). ACM, 2020.
Figure: Class distribution among repositories
Dataset Collection (Continued)
• Input Variables: 18 unique source code metrics, extracted with a static analyzer tool from each class in the code repositories
• Target Variable: the current complexity of each class in the code repositories, extracted with the CODEMR tool [2]
• The variables are then combined by class name to create the dataset for the complexity classifier
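Combining the two extractions by class name amounts to an inner join. A minimal sketch, assuming both tools' outputs have already been loaded into dictionaries keyed by fully qualified class name; the class names, metric values, and "low"/"high" labels below are invented for illustration.

```python
def merge_by_class(metrics, complexity):
    """Inner-join metric rows and complexity labels on the class name."""
    rows = []
    for class_name, features in metrics.items():
        if class_name in complexity:  # keep only classes seen by both tools
            rows.append(
                {"class": class_name, **features, "complexity": complexity[class_name]}
            )
    return rows


# Hypothetical static-analyzer output (only two of the 18 metrics shown).
metrics = {
    "org.example.Foo": {"WMC": 12, "CBO": 4},
    "org.example.Bar": {"WMC": 40, "CBO": 15},
    "org.example.Baz": {"WMC": 3, "CBO": 1},
}
# Hypothetical complexity labels; Baz is missing and is therefore dropped.
complexity = {"org.example.Foo": "low", "org.example.Bar": "high"}

dataset = merge_by_class(metrics, complexity)
print(dataset)
```

Joining on the class name alone assumes names are unique across repositories; in practice a repository prefix may be needed to avoid collisions.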
[2] Asma Shaheen, Usman Qamar, Aiman Nazir, Raheela Bibi, Munazza Ansar, and Iqra Zafar. Oocqm: Object oriented code quality meter. In International Conference on Computational Science/Intelligence & Applied Informatics, pages 149–163. Springer, 2019.
Table: Source Code Metrics
… … …
Dataset Preparation
• Remove duplicate observations
• Identify outliers and remove the biased data points
• Perform exploratory data analysis by visualizing the input and target variables
• Create training (80%) and testing (20%) datasets
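The preparation steps above can be sketched with the standard library. The slides do not say how outliers were identified, so the 1.5 × IQR rule used here is an assumption, as are the column name `WMC` and the sample values.

```python
import random
import statistics


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_filter(rows, column, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    values = [row[column] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row for row in rows if low <= row[column] <= high]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle reproducibly, then cut into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Invented data: one duplicate (WMC=5) and one extreme value (WMC=500).
raw = [{"WMC": w} for w in [5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 500]]
clean = iqr_filter(drop_duplicates(raw), "WMC")
train, test = train_test_split(clean)
print(len(raw), len(clean), len(train), len(test))
```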
Figure: Relationship of some input variables with target variable
Correlation Results
• RQ1: How are source code metrics correlated with the quality attribute class complexity?
• The results of the Pearson correlation reveal the impact of source code metrics on complexity
• The source code metrics DIT, SRFC, RFC, WMC, CMLOC and CBO *** have a moderately high impact on complexity
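For reference, the Pearson coefficient between one metric and the complexity target can be computed as below. This sketch assumes complexity has been encoded as ordinal numbers (the slides do not state the encoding), and the sample values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented sample: WMC values and an assumed ordinal complexity level per class.
wmc = [3, 10, 25, 40, 60]
complexity_level = [0, 0, 1, 2, 3]
r = pearson(wmc, complexity_level)
print(round(r, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship; it measures association and does not by itself establish that a metric causes complexity.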
Figure: Correlation between source code metrics and complexity
*** DIT = Depth of Inheritance Tree, RFC = Response for a Class, CMLOC = Class-Method Lines of Code, CBO = Coupling Between Objects
Training & Testing
• In training, we choose 5 different machine learning techniques to classify complexity:
1. Naive Bayes (NB)
2. Logistic Regression (LR)
3. Decision Tree (DT)
4. Random Forest (RF) and
5. AdaBoost (AB)
• These are well-known machine learning classifiers and have been used in several similar studies [3, 4]
• We perform 10-fold cross-validation to reduce the variability of the performance results
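The 10-fold cross-validation step can be sketched as an index splitter: every observation is used for testing exactly once, and the scores of the 10 runs are averaged. The function name and the fixed seed are illustrative choices, not taken from the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(25, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Each classifier would be fitted on `train_idx` and scored on `test_idx` in every round; the reported figure is then the mean over the 10 rounds.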
[3] Istehad Chowdhury and Mohammad Zulkernine. Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities. Journal of Systems Architecture, 57(3):294–313, 2011.
[4] Yun Zhang, David Lo, Xin Xia, Bowen Xu, Jianling Sun, and Shanping Li. Combining software metrics and text features for vulnerable file prediction. In 2015 20th International Conference on Engineering of Complex Computer Systems (ICECCS), pages 40–49. IEEE, 2015.
Performance Evaluation
• RQ2: How accurately can machine learning approaches predict class complexity from source code metrics?
• The Decision Tree and Random Forest classifiers have the highest accuracy and precision compared to the other classifiers
• Random Forest has the highest recall and F1 score
• Is that enough to declare the best technique?
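The four metrics compared on this slide can be derived from the confusion counts. A minimal sketch, assuming a binary high/low labelling with "high" as the positive class; the study's actual label set may differ, and the sample predictions are invented.

```python
def confusion_counts(y_true, y_pred, positive="high"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive="high"):
    """Return accuracy, precision, recall and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels: 3 truly high-complexity classes, 5 low.
y_true = ["high", "high", "high", "low", "low", "low", "low", "low"]
y_pred = ["high", "high", "low", "low", "low", "low", "high", "low"]
result = scores(y_true, y_pred)
print(result)
```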
Figure: Relative performance of ML classifiers
Performance Evaluation (Continued)
• We focus on the false negative (FN) rate to reduce the risk of missed detections, i.e. highly complex classes predicted as low complexity
• Higher FN rate -> many highly complex classes are detected as Low [Very Risky Model]
• Lower FN rate -> few highly complex classes are detected as Low [Less Risky Model]
• Random Forest (RF) still shows a lower FN rate than the other classifiers
• We attribute this to RF's use of bootstrap random resampling and its focus on the most significant features, which improves prediction
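The FN rate is the share of truly high-complexity classes that a model labels as low: FN / (FN + TP). A minimal sketch, assuming "high" is the positive class; the two sets of predictions are invented and do not correspond to any classifier in the study.

```python
def false_negative_rate(y_true, y_pred, positive="high"):
    """Fraction of truly positive items that the model failed to flag."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if fn + tp else 0.0


y_true = ["high", "high", "high", "high", "low", "low", "low", "low"]
# Two hypothetical models: A misses one complex class, B misses three.
model_a = ["high", "high", "high", "low", "low", "low", "low", "low"]
model_b = ["high", "low", "low", "low", "low", "low", "low", "low"]

fnr_a = false_negative_rate(y_true, model_a)
fnr_b = false_negative_rate(y_true, model_b)
print(fnr_a, fnr_b)
```

Both hypothetical models make no false positives, yet model B is the riskier one because it lets three of the four complex classes pass unnoticed.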
Figure: Relative FN rate of ML classifiers
Conclusion
• Problem in quality management: it is necessary to take proper action before classes become more complex
• Research Objective & Results
• We compare the performance of machine learning techniques in predicting class complexity
• Our results show that the Random Forest model performs better than the other models
• We also identify the source code metrics that have the most impact on class complexity
• Industrial Usage: automatic ML-based prediction of code quality will allow quality managers and practitioners to take preventive action against highly complex classes
• Long-term Outcome: ensure sustainable software, minimize software development effort, and reduce software development costs over time
If you have any questions, email me at sreza3@miners.utep.edu
