Employee Attrition Prediction Using Machine Learning
Rekha Puranik M
JSSATE STAFF
D M KUMAR
Abstract—Employee attrition remains a major problem in organizational management, as high turnover adversely affects productivity, morale, and costs. In this paper, supervised machine learning algorithms are used to predict employee attrition on the IBM HR Analytics dataset. Nine models are tested both with and without dimensionality reduction and sampling techniques; Random Forest achieves the highest accuracy, 99.2%, when oversampling and Principal Component Analysis (PCA) are applied. Feature importance is analyzed, and approaches for mitigating attrition are proposed.

Keywords—Employee attrition, machine learning, HR analytics, random forest, PCA, oversampling, talent retention.

I. INTRODUCTION

Logistic Regression and Multi-layer Perceptrons (MLP) have high levels of prediction accuracy [2-3].

However, issues regarding data imbalance (i.e., the majority of employees do not leave), a high number of attributes, and the need for interpretable models remain. Recent research has addressed these issues through sampling (i.e., oversampling and undersampling techniques), and through dimensionality reduction (i.e., Principal Component Analysis, PCA) combined with ensemble learning methods [8-11]. The aim of this study is to build on that research by implementing and comparing multiple ML classifiers, including Logistic Regression, Random Forest, Support Vector Machines (SVM), Neural Networks (MLP), Linear Discriminant Analysis (LDA), and other ensemble learning methods, to tackle data imbalance issues.
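As an illustration of the oversampling idea mentioned above, the following is a minimal sketch of random oversampling with replacement. The text does not state which oversampling variant was used, so this choice is an assumption; the function name random_oversample and the toy data are hypothetical. A common precaution, also assumed here rather than taken from the paper, is to oversample only the training split so that duplicated rows do not leak into the test set.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy training data (hypothetical): 8 employees who stayed (0), 2 who left (1).
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # both classes now have 8 samples
```

Undersampling is the mirror image: rows of the majority class are dropped at random until the classes match, at the cost of discarding data.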
[...] marked improvements after applying these techniques [2], [5]. A compelling example is presented by Ma et al. [21], who introduced a feature selection optimization framework integrating filter, wrapper, and embedded methods. This approach reduced dimensionality by 60% and, when paired with XGBoost and SMOTE, achieved an accuracy of 87.3%. Importantly, the study used SHAP values to interpret model predictions, identifying overtime, monthly income, and job involvement as key predictors, findings that align closely with HR domain knowledge. Similarly, Arqawi et al. [22] compared ML models with deep learning architectures for attrition prediction. Using the same dataset, their experiments showed that Multilayer Perceptrons (MLP) outperformed other classifiers, reaching an F1 score of 94.5%. They also underscored the importance of data preprocessing, notably balancing the dataset and encoding categorical variables. Their results confirm that well-tuned deep learning models, when combined with appropriate data handling, can offer high levels of predictive accuracy. Another comparative study by Nandal et al. [23] evaluated a wide spectrum of models including Logistic Regression, KNN, Naive Bayes, ensemble methods (e.g., AdaBoost, Random Forest), and deep learning architectures like FNN and CNN. Their best-performing model, FNN, achieved an accuracy of 97.5%. Their analysis emphasized job satisfaction, overtime, and job involvement as strong correlates of attrition.

[...] methods help make predictive models transparent, allowing organizations to understand why an employee might leave, not just who might leave [22], [24]. Feature selection remains a crucial step in aligning data science with HR strategy. Across many studies, including those by Ma et al. [21] and Nandal et al. [23], variables such as jobSatisfaction, overtime, stockOptionLevel, workLifeBalance, and careerProgression have emerged as consistently strong predictors. Selecting and emphasizing these features enhances not only prediction performance but also the practical utility of models in decision-making. Additionally, recent research is moving beyond prediction toward prescriptive analytics. Scholars like Latorre et al. [13] have proposed benefit-driven retention policies, while others have suggested targeted interventions for high-risk groups based on role, demographics, or department. These developments emphasize translating model insights into actionable HR strategies, reinforcing the value of predictive analytics as a management tool.

III. METHODOLOGY
[...] and is performed through Principal Component Analysis (PCA), which enables the system to retain only the most meaningful features while improving model performance.

[...] undersampled data with PCA but a decrease in the accuracy.

C. Decision Tree (DT)
The Decision Tree (DT) algorithm is a supervised learning model that learns decision rules from the data to classify items into classes. The DT works by splitting the data into smaller subsets, based on either Gini impurity or information gain, until one [...]
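The two split criteria named above can be made concrete with a short sketch. It assumes the standard definitions (Gini impurity as one minus the sum of squared class proportions, and information gain as the reduction in Shannon entropy); the helper names and the toy split on an overtime flag are hypothetical and are not taken from the paper's experiments.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical node with 10 employees, split on an overtime flag.
parent = ["stay"] * 6 + ["leave"] * 4
left = ["stay"] * 5 + ["leave"] * 1    # no overtime
right = ["stay"] * 1 + ["leave"] * 3   # overtime

gini_gain = split_gain(parent, left, right, gini)     # reduction in Gini impurity
info_gain = split_gain(parent, left, right, entropy)  # information gain
print(round(gini(parent), 3), round(gini_gain, 3), round(info_gain, 3))
```

A tree learner evaluates this gain for every candidate split and keeps the split with the largest value, then repeats the procedure on each child node.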
[...] as 17 and weights as distance, the KNN model performed well. It had the highest recall as well as good accuracy, precision, and F1 score.

I. Multilayer Perceptron (MLP)
With the hyperparameters logistic activation function, alpha of 0.05, and the lbfgs solver, the best performance, with the highest metrics, was obtained for oversampled data with PCA, followed by imbalanced and undersampled data.

IV. EXPERIMENTAL SETUP AND RESULTS

TABLE I. CLASSIFICATION SCORES FOR IMBALANCED DATA

Model      Accuracy   Precision   F1 score
LR         0.875      0.753       0.472
NB         0.787      0.394       0.471
DT         0.836      0.498       0.351
RF         0.862      0.841       0.296
AdaBoost   0.858      0.769       0.293
SVM        0.867      0.741       0.406
LDA        0.867      0.683       0.439
KNN        0.848      0.771       0.157
MLP        0.866      0.677       0.447
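The scores in Table I can be related to one another with a short sketch. It assumes the standard binary definitions of accuracy, precision, recall, and F1 (the harmonic mean of precision and recall); the confusion counts are hypothetical and chosen only to resemble an imbalanced test set. Because Table I lists precision and F1 but not recall, the second helper inverts the F1 formula to estimate recall under that same assumption.

```python
def classification_scores(tp, fp, fn, tn):
    """Standard binary-classification scores from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def recall_from_precision_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


# Hypothetical counts: 50 leavers and 250 stayers in the test set.
scores = classification_scores(tp=15, fp=5, fn=35, tn=245)
print(scores)

# Implied recall for the LR row of Table I (precision 0.753, F1 0.472),
# valid only if the table's F1 is the usual harmonic mean.
lr_recall = recall_from_precision_f1(0.753, 0.472)
print(round(lr_recall, 3))
```

In the hypothetical example, accuracy is about 0.87 while recall is only 0.30, which mirrors the pattern in Table I: on imbalanced data a model can score well on accuracy while missing most of the employees who actually leave.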
For the oversampled PCA data, tree-based models showed the best performance; among them, Random Forest achieved the highest accuracy and F1 score, both 99.2%, with a precision of 98.6%.

V. CONCLUSION AND FUTURE WORK

This research presented a comprehensive machine learning approach to predict employee attrition using the IBM HR Analytics dataset. Nine supervised models were evaluated, with Random Forest on oversampled data with PCA achieving the highest performance (accuracy: 99.2%, F1 score: 99.2%). Exploratory analysis revealed key attrition indicators such as age, overtime, and monthly income.

For future work, model deployment through web applications could enable real-time predictions for HR teams. Incorporating time-series data and external economic indicators could improve prediction accuracy. Techniques like SHAP and LIME can be used for model interpretability. Additionally, adapting deep learning architectures and evaluating cross-company generalizability would enhance robustness and real-world applicability.

REFERENCES

[1] D.V. Lokeswar Reddy, Shake Hussain Basha, Vineeta Kaur Saluja, Vandana Tiwari, Sreelekha T. K., and Aarti Sharma, "Optimizing Employee Selection in Human Resource Management Using Decision Tree Algorithms," Proc. 6th Int. Conf. on Mobile Computing and Sustainable Informatics (ICMCSI), IEEE, pp. 1857–1862, 2025. DOI: 10.1109/ICMCSI64620.2025.10883407.
[2] M. Garba, M. Usman, and M. Saidu, "Enhancing employee attrition prediction: The impact of data preprocessing on machine learning model performance," FUDMA J. Sci., vol. 9, no. 1, pp. 205–210, Jan. 2025. DOI: 10.33003/fjs-2025-0901-3030.
[3] Rajkumar Govindarajan, N. Komal Kumar, Sudhakar Reddy P., Sai Pravallika E., Dhatri B., and Pavan Kumar G., "Predicting Employee Attrition: A Comparative Analysis of Machine Learning Models Using the IBM Human Resource Analytics Dataset," Procedia Computer Science, vol. 258, pp. 4084–4093, 2025. DOI: 10.1016/j.procs.2025.04.659.
[4] Nishitha Reddy Nalla, "Machine Learning Models for Predicting Employee Retention and Performance," International Journal of Data Science and Machine Learning (IJDSML), vol. 05, no. 01, pp. 15–19, Feb. 2025. DOI: 10.55640/ijdsml-05-01-04.
[5] A. Benabou, F. Touhami, and M.A. Sabri, "Predicting Employee Turnover Using Machine Learning Techniques," Acta Informatica Pragensia, vol. 14, no. 1, pp. 112–127, 2025. DOI: 10.18267/j.aip.255.
[6] R. Govindarajan, N. K. Kumar, S. P. Reddy, S. E. Pravallika, B. Dhatri, and P. G. Kumar, "Predicting Employee Attrition: A Comparative Analysis of Machine Learning Models Using the IBM Human Resource Analytics Dataset," Procedia Computer Science, vol. 258, pp. 4084–4093, 2025. DOI: 10.1016/j.procs.2025.04.659.
[7] P. Nagpal, A. Pawar, and S. H. M., "Predicting employee attrition through HR analytics: A machine learning approach," in Proc. 4th Int. Conf. Innovative Practices Technol. Manag. (ICIPTM), 2024, pp. 1–6. DOI: 10.1109/ICIPTM59628.2024.10563285.
[8] Aitong Jin et al., "Predicting Employee Attrition Using Machine Learning Approaches," International Journal of Computer Applications, vol. 182, no. 45, pp. 12–19, 2024.
[9] M. S. Gazi, M. Nasiruddin, S. Dutta, R. Sikder, C. B. Huda, and M. Z. Islam, "Employee attrition prediction in the USA: A machine learning approach for HR analytics and talent retention strategies," J. Bus. Manag. Stud., vol. 6, no. 3, pp. 47–54, May 2024. DOI: 10.32996/jbms.2024.6.3.6.
[10] M. A. Akasheh, E. F. Malik, O. Hujran, and N. Zaki, "A decade of research on machine learning techniques for predicting employee turnover: A systematic literature review," Expert Syst. Appl., vol. 238, p. 121794, 2024. DOI: 10.1016/j.eswa.2023.121794.
[11] A. V. Pachghare, S. Deshmukh, and S. Salunkhe, "Employee churn prediction using machine learning techniques: A systematic review," in Proc. 2024 Int. Conf. Data Sci. Netw. Security (ICDSNS), 2024, pp. 1–6. DOI: 10.1109/ICDSNS62112.2024.10691112.
[12] H. Alqahtani, H. Almagrabi, and A. Alharbi, "Employee Attrition Prediction Using Machine Learning Models: A Review Paper," Int. J. of Artificial Intelligence and Applications (IJAIA), vol. 15, no. 2, pp. 24–43, Mar. 2024. DOI: 10.5121/ijaia.2024.15202.
[13] P. Latorre, H. López-Ospina, S. Maldonado, C. A. Guevara, and J. Pérez, "Designing Employee Benefits to Optimize Turnover: A Prescriptive Analytics Approach," Computers & Industrial Engineering, vol. 197, Art. 110582, 2024. DOI: 10.1016/j.cie.2024.110582.
[14] K. Degtyareva, D. A. Ageev, and V. V. Kukartsev, "Finding patterns in employee attrition rates using self-organizing Kohonen maps and decision trees," in Proc. 2023 Int. Conf. Innovative Comput., Intell. Commun. Smart Elect. Syst. (ICSESES), 2023, pp. 1–6. DOI: 10.1109/ICSESES60034.2023.10465548.
[15] Doohee Chung et al., "Predictive model of employee attrition based on stacking ensemble learning," Journal of Computer and Information Sciences, vol. 20, no. 5, pp. 100–107, 2023.
[16] Sonam Mittal, Ankita, "Employee Attrition Prediction Using Machine Learning Algorithms," Proc. 3rd Int. Conf. on Smart Generation Computing, Communication and Networking (SMART GENCON), IEEE, pp. 1–4, Dec. 29–31, 2023. DOI: 10.1109/SMARTGENCON60755.2023.10442776.
[17] F. Guerranti and G. M. Dimitri, "A Comparison of Machine Learning Approaches for Predicting Employee Attrition," Applied Sciences, vol. 13, no. 1, Art. 267, pp. 1–8, 2023. DOI: 10.3390/app13010267.
[18] F. K. Alsheref, I. E. Fattoh, and W. M. Ead, "Automated Prediction of Employee Attrition Using Ensemble Model Based on Machine Learning Algorithms," Computational Intelligence and Neuroscience, vol. 2022, Article ID 7728668, pp. 1–9, 2022. DOI: 10.1155/2022/7728668.
[19] D. Avrahami, D. Pessach, G. Singer, and H. Chalutz Ben-Gal, "A Human Resources Analytics and Machine-Learning Examination of Turnover: Implications for Theory and Practice," International Journal of Manpower, vol. 43, no. 6, pp. 1405–1424, 2022. DOI: 10.1108/IJM-12-2020-0548.
[20] L. Geiler, S. Affeldt, and M. Nadif, "An Effective Strategy for Churn Prediction and Customer Profiling," Data & Knowledge Engineering, vol. 142, Art. 102100, 2022. DOI: 10.1016/j.datak.2022.102100.
[21] D. Ma, M. Shu, and H. Zhang, "Feature Selection Optimization for Employee Retention Prediction: A Machine Learning Approach," Preprints, pp. 1–11, Apr. 2025. DOI: 10.20944/preprints202504.1549.v1.
[22] S. M. Arqawi, M. A. Abu Rumman, E. A. Zitawi, A. H. Rabaya, A. S. Sadaqa, B. S. Abunasser, and S. S. Abu-Naser, "Predicting Employee Attrition and Performance Using Deep Learning," Journal of Theoretical and Applied Information Technology, vol. 100, no. 21, pp. 6526–6533, Nov. 2022.
[23] M. Nandal, V. Grover, D. Sahu, and M. Dogra, "Employee Attrition: Analysis of Data Driven Models," EAI Endorsed Transactions on Internet of Things, vol. 10, no. 2, pp. 1–7, Jan. 2024. DOI: 10.4108/eetiot.4762.
[24] J. Park, Y. Feng, and S.-P. Jeong, "Developing an Advanced Prediction Model for New Employee Turnover Intention Utilizing Machine Learning Techniques," Scientific Reports, vol. 14, no. 1221, pp. 1–15, 2024. DOI: 10.1038/s41598-023-50593-4.