International Journal of Trend in Scientific Research and Development (IJTSRD)
Conference Issue | March 2019 Available Online: www.ijtsrd.com e-ISSN: 2456 - 6470
Fostering Innovation, Integration and Inclusion Through
Interdisciplinary Practices in Management
@ IJTSRD | Unique Paper ID - IJTSRD23065 | Conference Issue | FIIITIPM - 2019 | March 2019 Page: 62
Machine Learning Approach for Employee Attrition Analysis
Dr. R. S. Kamath1, Dr. S. S. Jamsandekar2, Dr. P. G. Naik3
1Associate Professor, 2Assistant Professor, 3Professor
1,2,3Department of Computer Studies,
1,2,3Chhatrapati Shahu Institute of Business Education and Research, Kolhapur, Maharashtra, India
Organised By:
Management Department, Chhatrapati
Shahu Institute of Business Education
and Research, Kolhapur, Maharashtra
How to cite this paper: Dr. R. S. Kamath | Dr. S. S. Jamsandekar | Dr. P. G. Naik,
"Machine Learning Approach for Employee Attrition Analysis", published in
International Journal of Trend in Scientific Research and Development (IJTSRD),
ISSN: 2456-6470, Special Issue | Fostering Innovation, Integration and Inclusion
Through Interdisciplinary Practices in Management, March 2019, pp. 62-67,
URL: https://www.ijtsrd.com/papers/ijtsrd23065.pdf
ABSTRACT
Talent management involves many managerial decisions aimed at placing the right
people, with the right skills, at the appropriate location and time. The authors
report a machine learning solution for Human Resource (HR) attrition analysis
and forecasting. The data for this investigation were retrieved from Kaggle, a Data
Science and Machine Learning platform [1]. The study estimates the performance
of several classification algorithms and compares their classification
accuracy. Model performance is evaluated in terms of the Error Matrix
and the Pseudo R-Square estimate of error rate. The accuracy results indicate that
the Random Forest model can be used effectively for classification. The analysis
concludes that employee attrition depends more on employees’ satisfaction level
than on the other attributes.
INTRODUCTION
Identifying the existing talent in an organization is among the top
talent management challenges. For every organization,
human resources play a vital role in all strategic decisions. Satisfied, highly
motivated and loyal employees form the basis of a company and in
turn affect the productivity of the organization.
The prime objective of the present study is to analyze why some of the best and
most experienced employees leave prematurely. The analysis also aims
to predict which valuable employees will leave next.
The rest of the paper is organized as follows. The Introduction is followed by the
materials and methods used in the present study. The third section summarizes the
results and discussion of the HR attrition analysis. The conclusion
justifies the suitability of the Random Forest model for this talent mining.
Materials and Methods
The dataset for the present analysis is taken from Kaggle, a Machine Learning platform [1]. It is a simulated dataset
comprising 15000 employee records classified into two categories (left or not left) based on satisfaction level, latest
evaluation, number of projects worked on, average monthly hours, time spent in the company, work accident, promotion within
the past 5 years, department and salary. Table 1 describes the employee dataset.
Table 1: Employee dataset description for talent mining
Attribute | Description | Data Type
satisfaction_level | Level of satisfaction (0-1) | Numeric
last_evaluation | Time since last performance evaluation (in years) | Numeric
number_project | Number of projects completed while at work | Numeric
average_montly_hours | Average monthly hours at workplace | Numeric
time_spend_company | Number of years spent in the company | Numeric
Work_accident | Whether the employee had a workplace accident | Numeric
Left | Whether the employee left the workplace or not (1 or 0) | Numeric
promotion_last_5years | Whether the employee was promoted in the last five years | Numeric
sales | Department in which the employee works | String
salary | Relative level of salary (high) | String
This section describes the experiment conducted for employee attrition analysis and forecasting. The present study is
carried out using R and the Rattle data mining platform [4]. Figure 1 shows a summary of the HR dataset. The dataset is partitioned
randomly into training, testing and validation sets in the proportions 70%, 15% and 15% respectively. The training dataset was used to
adjust the model parameters, whereas the validation set was used to control the learning process.
Figure 1: Dataset exploration – Summary
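The random 70/15/15 partition described above can be illustrated with a minimal Python sketch. The study itself performs this step in Rattle; the function name `partition`, the integer percentages and the seed value below are assumptions made only for illustration, and the rows are stand-in indices rather than the actual HR records.

```python
import random

def partition(records, percents=(70, 15, 15), seed=42):
    """Shuffle the records and cut them into three disjoint parts.

    percents and seed are illustrative assumptions, not values
    reported by the authors.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # reproducible random order
    n = len(shuffled)
    n_first = n * percents[0] // 100        # integer arithmetic avoids rounding surprises
    n_second = n * percents[1] // 100
    first = shuffled[:n_first]
    second = shuffled[n_first:n_first + n_second]
    third = shuffled[n_first + n_second:]   # remainder goes to the last part
    return first, second, third

# Stand-in row indices for the 15000 employee records.
train, valid, test = partition(range(15000))
print(len(train), len(valid), len(test))
```

Fixing the seed makes the partition repeatable, which helps when the same split has to be reused across several models.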
From the wide range of machine learning algorithms, the authors selected Decision Tree, Random Forest, Support Vector Machine
(SVM), and Linear Regression techniques to build the model. These algorithms are based on supervised learning and are well
known for building prediction models [8]. Supervised learning algorithms model the relationships and dependencies
between the target output and the input features (predictors), so that output values for new
data can be predicted from the relationships learned from previous data.
Figure 2 explains Decision Tree modeling of the HR data. The tree begins with the root node “satisfaction level”, which splits into different
branches leading to further nodes, each of which may split again or end as a leaf node. Each
non-leaf node carries a test or question that determines which branch to follow [7]. The leaf nodes indicate the attrition state,
that is, whether the employee “left” or “not left”. Figure 3 gives a pictorial representation of the Decision Tree thus derived.
Figure 2: Decision tree modeling
Figure 3: Decision tree for HR attrition status
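The root-to-leaf traversal described for the decision tree can be sketched in a few lines of Python. The tree below is hypothetical: only the root attribute (satisfaction level) and the two leaf labels come from the text, while the other attributes chosen for the inner nodes and all thresholds are assumptions for illustration and do not reproduce the fitted tree of Figure 3.

```python
# Hypothetical tree: thresholds and inner-node attributes are illustrative
# assumptions, not values from the paper's fitted model.
tree = {
    "feature": "satisfaction_level", "threshold": 0.46,
    "below": {
        "feature": "number_project", "threshold": 2.5,
        "below": {"label": "left"},
        "above": {"label": "not left"},
    },
    "above": {
        "feature": "time_spend_company", "threshold": 4.5,
        "below": {"label": "not left"},
        "above": {"label": "left"},
    },
}

def classify(node, employee):
    """Walk from the root to a leaf, answering one test per non-leaf node."""
    while "label" not in node:
        value = employee[node["feature"]]
        node = node["below"] if value < node["threshold"] else node["above"]
    return node["label"]

employee = {"satisfaction_level": 0.38, "number_project": 2, "time_spend_company": 3}
print(classify(tree, employee))
```

Each non-leaf dictionary plays the role of the "test or question" at a node, and the leaf dictionaries carry the attrition state.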
Figure 4 explains Random Forest modeling for HR attrition analysis. The randomForest package in the R environment is employed
here to analyze the model structure [5-6]. RF builds many decision trees using random subsets of the data and variables. Rattle provides
access to three parameters for tuning the models: the number of trees, the sample size and the number of variables.
Figure 4: Summary of the Random Forest Model
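The idea of building many trees from random subsets of the data and variables can be sketched as follows. This is a simplified illustration, not the randomForest package: each "tree" is a one-split stump, so drawing the variable subset once per tree coincides with drawing it per split. The names `fit_stump`, `fit_forest`, `n_trees`, `sample_size` and `n_vars` are assumptions that mirror the three tuning parameters mentioned above, and the toy rows are invented.

```python
import random
from collections import Counter

def fit_stump(rows, features):
    """Pick the (feature, threshold, label) split with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            for low_label in (0, 1):
                errors = sum(
                    (low_label if r[f] <= t else 1 - low_label) != r["left"]
                    for r in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, low_label)
    _, f, t, low_label = best
    return lambda r: low_label if r[f] <= t else 1 - low_label

def fit_forest(rows, features, n_trees=25, sample_size=None, n_vars=2, seed=1):
    rng = random.Random(seed)
    sample_size = sample_size or len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in range(sample_size)]  # rows drawn with replacement
        chosen = rng.sample(features, n_vars)                    # random subset of variables
        forest.append(fit_stump(sample, chosen))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

# Invented toy records (1 = left, 0 = not left).
rows = [
    {"satisfaction_level": 0.10, "number_project": 6, "time_spend_company": 4, "left": 1},
    {"satisfaction_level": 0.40, "number_project": 2, "time_spend_company": 3, "left": 1},
    {"satisfaction_level": 0.45, "number_project": 2, "time_spend_company": 3, "left": 1},
    {"satisfaction_level": 0.60, "number_project": 4, "time_spend_company": 2, "left": 0},
    {"satisfaction_level": 0.75, "number_project": 3, "time_spend_company": 3, "left": 0},
    {"satisfaction_level": 0.90, "number_project": 4, "time_spend_company": 2, "left": 0},
]
features = ["satisfaction_level", "number_project", "time_spend_company"]
forest = fit_forest(rows, features, n_trees=25, n_vars=2)
print(predict(forest, rows[0]))
```

Because every tree sees a different sample and a different set of variables, individual trees disagree, and the vote averages out their errors.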
Figure 5 explains the Support Vector Machine (SVM) designed for the attrition analysis of employee data. SVM searches for
support vectors, the data points that define the boundary separating the classes.
Figure 5: Summary of SVM Model
Figure 6 explains Linear Regression Model. It is the traditional method for fitting a statistical model to data. It is
appropriate since the target variable “attrition status” is numeric.
Figure 6: Summary of Logistic Regression model
Results and Discussion
The present investigation employed different prediction algorithms to analyze employee attrition status and the likelihood of
retention or attrition of employees. Model performance is evaluated in terms of the Error Matrix and the Pseudo R-Square
estimate of error rate. An error matrix, also known as a confusion matrix, shows the true outcomes against the predicted
outcomes. Table 2 presents the performance analysis of these classifiers in terms of the error matrix.
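Table 2: Performance Analysis of the Classifiers
Model | Error Matrix
Decision Tree | [matrix image not available]
Random Forest | [matrix image not available]
Support Vector Machine | [matrix image not available]
Linear Model | [matrix image not available]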
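As an illustration of how an error matrix tabulates true outcomes against predicted outcomes, the following Python sketch counts each (actual, predicted) pair. The label vectors are invented for the example and are not results from the study; the function name `error_matrix` is likewise an assumption.

```python
from collections import Counter

def error_matrix(actual, predicted, labels=(0, 1)):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Invented outcomes (1 = left, 0 = not left), for illustration only.
actual    = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

matrix = error_matrix(actual, predicted)
# Off-diagonal cells are the misclassifications.
error_rate = (matrix[0][1] + matrix[1][0]) / len(actual)
print(matrix)       # [[5, 1], [1, 3]]
print(error_rate)   # 0.2
```

The diagonal cells hold the correctly classified records, so the overall error rate is the share of records that fall off the diagonal.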
Figure 7, the “Predicted versus Observed” plot, shows the performance of all four models by displaying the
predicted values against the observed values. The Pseudo R-Square is the square of the correlation between the predicted and
observed values; the closer it is to 1, the better the fit. Table 3 gives the Pseudo R-Square values for the four models.
Figure 7: “Predicted versus Observed plot” for classifiers
Table 3: Performance accuracy of classifiers
Classifier | Pseudo R-Square
Decision Tree | 0.8473
Random Forest | 0.9773
Support Vector Machine | 0.8315
Linear Regression | 0.2299
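The Pseudo R-Square is described in the text as the square of the correlation between predicted and observed values. A minimal Python sketch of that definition, assuming the Pearson correlation is meant, is given below; the function name and the sample vectors are invented for illustration and do not reproduce the values in Table 3.

```python
def pseudo_r_squared(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Predictions that track the observed outcomes closely give a value near 1.
print(pseudo_r_squared([0, 1, 0, 1], [0.1, 0.9, 0.1, 0.9]))
# Predictions unrelated to the outcomes give a value near 0.
print(pseudo_r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

The measure is undefined when either vector is constant, since its variance is then zero; a practical implementation would need to guard against that case.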
The confusion matrices and the “Predicted versus Observed” plot indicate that Random Forest is the most appropriate model for analysis of
employee attrition among the algorithms considered in this study, given the underlying data. Figure 8 shows the
relative importance of the HR dataset attributes using the Gini importance and Permutation importance measures. Both
measures reveal that employees’ “satisfaction level” is the predominant predictor of employee attrition.
Figure 8: Dependency of employee attrition status on other attributes
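Permutation importance, one of the two measures named above, can be sketched as the drop in accuracy after the values of a single attribute are shuffled across records. The sketch below is a simplified illustration and not the computation performed by the randomForest package; the stand-in `model`, the 0.5 threshold and the toy rows are assumptions.

```python
import random

def accuracy(model, rows):
    return sum(model(r) == r["left"] for r in rows) / len(rows)

def permutation_importance(model, rows, feature, seed=0):
    """Accuracy lost when the values of one feature are shuffled across rows."""
    rng = random.Random(seed)
    shuffled_values = [r[feature] for r in rows]
    rng.shuffle(shuffled_values)
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled_values)]
    return accuracy(model, rows) - accuracy(model, permuted)

# Hypothetical stand-in classifier that looks only at satisfaction level.
def model(row):
    return 1 if row["satisfaction_level"] < 0.5 else 0

# Invented toy records (1 = left, 0 = not left).
rows = [
    {"satisfaction_level": 0.10, "number_project": 6, "left": 1},
    {"satisfaction_level": 0.40, "number_project": 2, "left": 1},
    {"satisfaction_level": 0.45, "number_project": 5, "left": 1},
    {"satisfaction_level": 0.60, "number_project": 4, "left": 0},
    {"satisfaction_level": 0.75, "number_project": 3, "left": 0},
    {"satisfaction_level": 0.90, "number_project": 4, "left": 0},
]

print(permutation_importance(model, rows, "satisfaction_level"))
print(permutation_importance(model, rows, "number_project"))
```

An attribute the model does not rely on loses nothing when shuffled, so its importance is zero, whereas shuffling an attribute the model depends on can only lower the accuracy in this toy setting.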
Conclusion
The authors have explored a machine learning solution for HR
attrition analysis and forecasting. The study estimates the
performance of several classification algorithms
and compares their classification accuracy. Model performance
is evaluated in terms of the Error Matrix and the
Pseudo R-Square estimate of error rate. The accuracy
results indicate that the Random Forest model can be
used effectively for classification. The results also show
that employee attrition depends more on employees’
satisfaction level than on the other attributes.
References:
[1] Retrieved on 30th Dec, 2017 from
https://www.kaggle.com/ludobenistant/hr-analytics-1/data
[2] Boudreau, J. W., Ramstad, P. M.: Beyond HR. Boston:
Harvard Business School Press, 2007. ISBN 978-1-4221-0415-6.
[3] https://www.infogix.com/blog/machine-learning-vs-statistical-modeling-the-real-difference,
accessed 28/12/2017
[4] Williams, G. Data Mining with Rattle and R: The Art of
Excavating Data for Knowledge Discovery, Springer, DOI
10.1007/978-1-4419-9890-3
[5] Breiman, L. (2001), Random Forests. Machine
Learning, 45, 5-32
[6] Liaw, A., & Wiener, M. (2002). Classification and
Regression by randomForest, R News, 2(3)
[7] R. S. Kamath, R. K. Kamat (2016), Modeling of Random
Textured Tandem Silicon Solar Cells Characteristics:
Decision Tree Approach, Journal of Nano and Electronic
Physics, Vol. 8 No 4(1), 04021(4pp)
[8] R. S. Kamath, R. K. Kamat (2016), Supervised Learning
Model for Kick starter Campaigns with R Mining,
International Journal of Information Technology,
Modeling and Computing (IJITMC), Vol. 4, No. 1,
February, 19-30
Copyright © 2019 by author(s) and
International Journal of Trend in
Scientific Research and Development
Journal. This is an Open Access article distributed under the
terms of the Creative Commons Attribution License (CC BY
4.0) (http://creativecommons.org/licenses/by/4.0)
