VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“JnanaSangama”, Belagavi-590014, Karnataka, India
IMPACT COLLEGE OF ENGINEERING AND APPLIED SCIENCES
Department of Computer Science and Engineering - Data Science
INTERNSHIP PRESENTATION ON
“Predicting Employee Attrition Using Machine Learning”
AT
Varcons Technologies
Submitted By: SANJAY GR [1IC22CD400]
Under the Guidance of: Mr. Krishna Mehar, Professor, Dept. of AI & ML
Internship Overview
Duration
Feb 1, 2025 – May 15, 2025
Mode
Offline experience
Focus
Practical Data Science with Python
Training
Preprocessing, ML, Model Evaluation
Company Overview:
Varcons Technologies
About Varcons
• Leading SaaS provider
• Innovative solutions
• Corporate seminars
• Industrial training
Our Goal
• Deliver smart tech
• Scalable services
• Clients of all sizes
Internship Objectives
Practical Data Science
Hands-on experience
Apply Classroom Learning
Real-world projects
Master Python & ML
New techniques
Understand Data Lifecycle
Collection to evaluation
Improve Skills
Problem-solving, critical thinking
Project Focus: Predicting Employee Attrition
Project Title
Predicting Employee Attrition
Domain
Human Resources Analytics
Primary Goal
Analyze and predict employee turnover likelihood
Methodology
Leveraging ML models
Project Abstract
Attrition Impact
Employee departures incur significant costs.
• Average cost: $4,129 per new hire.
• US attrition rate (2021): 57.3%.
Our Solution
Machine learning to predict turnover.
• Extra Trees Classifier (ETC) performed best.
• Achieved 93% accuracy.
• Key factors: Age, Monthly Income, Hourly Rate, Job Level.
Introduction to Attrition
Attrition Defined
Employees leaving the organization
Types of Attrition
• Voluntary
• Involuntary
• External
• Internal
Attrition Rate
Employees Left / Avg. Employees
ML Role
Data-driven HR decisions
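The attrition rate formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming that "Avg. Employees" means the mean of the headcount at the start and end of the period and that the rate is reported as a percentage. The function name and the figures are hypothetical.

```python
def attrition_rate(employees_left: int, headcount_start: int, headcount_end: int) -> float:
    """Return the attrition rate as a percentage: leavers / average headcount * 100."""
    # Assumption: average headcount is the mean of the start and end headcount.
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return employees_left / average_headcount * 100


# Hypothetical figures for illustration only.
rate = attrition_rate(employees_left=30, headcount_start=190, headcount_end=210)
print(f"{rate:.1f}%")  # prints 15.0%
```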
Related Work in Attrition Prediction
80%-88%
Typical Accuracy
Common ML models
93%
Our ETC Model
Higher accuracy with tuning and balancing
Previous studies used various ML techniques for attrition prediction. Common methods included dataset balancing (e.g.,
SMOTE) and models like Random Forest, Gradient Boosting, and Neural Networks. Most achieved accuracy rates between
80% and 88%. Our Extra Trees Classifier (ETC) model, through meticulous tuning and balancing, surpassed these
benchmarks, achieving 93% accuracy.
Methodology Overview
Step 1: Data Loading
IBM HR Employee Attrition Dataset
Step 2: Exploratory Data Analysis
Perform EEDA
Step 3: Feature Engineering
Remove low-correlation features; encode categorical data
Step 4: Data Balancing
Utilize SMOTE
Step 5: Split Data
85% Train, 15% Test
Step 6: Model Training
Apply ML algorithms
Step 7: Evaluate & Compare
Assess model performance
Dataset Information
Dataset Source: IBM HR Analytics
Records: 1470 Employees
Features: 35 (Categorical & Numerical)
Key Attributes: Age, Income, Department, Satisfaction, Experience, etc.
Predictive HR Analytics: A Data Science Approach to Employee Attrition
Uncovering insights and building models to predict employee
attrition, enhancing HR strategies.
Employee Exploratory Data Analysis (EEDA)
• Young employees (20-25) with low income: high attrition.
• Attrition drops after 4+ years of experience.
• Higher earners tend to stay longer.
Feature Engineering: Optimizing Model Inputs
Features Removed: DailyRate, EmployeeCount, StandardHours, and others.
Reason for Removal: Low correlation with the attrition target variable.
Technique Used: One-Hot Encoding for categorical data transformation.
Primary Goal: Enhance model input quality and overall accuracy.
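The two feature engineering steps (dropping the named columns and one-hot encoding categorical data) might look roughly like this with Pandas. The rows are made-up values that only mimic a few column names of the IBM HR dataset. Mapping the target to 0/1 is an assumption, not something the slides state.

```python
import pandas as pd

# Toy rows mimicking a few IBM HR dataset columns (values are made up).
df = pd.DataFrame({
    "Age": [23, 41, 35, 29],
    "MonthlyIncome": [2100, 9800, 5400, 3300],
    "DailyRate": [500, 1100, 800, 650],
    "EmployeeCount": [1, 1, 1, 1],
    "StandardHours": [80, 80, 80, 80],
    "Department": ["Sales", "Research & Development", "Sales", "Human Resources"],
    "Attrition": ["Yes", "No", "No", "Yes"],
})

# Drop the columns the slide names as removed.
df = df.drop(columns=["DailyRate", "EmployeeCount", "StandardHours"])

# Assumption: encode the target as 1 (left) / 0 (stayed).
y = df["Attrition"].map({"Yes": 1, "No": 0})

# One-hot encode the remaining categorical column; numeric columns pass through.
X = pd.get_dummies(df.drop(columns=["Attrition"]), columns=["Department"], dtype=int)

print(X.columns.tolist())
```

Each distinct Department value becomes its own 0/1 column, so the models receive purely numeric input.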
Machine Learning Models Utilized
Logistic Regression
Baseline for binary classification.
Decision Tree Classifier
Rule-based, interpretable model.
Support Vector Machine
Effective for high-dimensional data.
Extra Trees Classifier
Ensemble method with highest performance.
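A rough sketch of how these four models could be trained and compared with scikit-learn. It runs on synthetic data from make_classification, not the IBM HR dataset, so the printed accuracies will not match the figures reported in this presentation. The hyperparameters, the stratified split, and the scaling step for Logistic Regression and SVM are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed HR data (not the real dataset).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=42)

# 85% train / 15% test; stratify (an assumption) keeps the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

# Hyperparameters are illustrative defaults, not the tuned values from the project.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Extra Trees": ExtraTreesClassifier(n_estimators=200, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```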
Data Preprocessing: Preparing Data for ML
Missing Values
Handled using appropriate imputation strategies.
Dataset Balancing
SMOTE applied to address class imbalance.
Normalization/Scaling
Applied to standardize feature ranges.
Data Split
Divided into 85% training, 15% testing sets.
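The idea behind SMOTE can be illustrated with a simplified, standard-library sketch. It is not the imbalanced-learn implementation. It only shows the core step: a synthetic minority sample is placed at a random point on the line between a minority sample and one of its nearest minority neighbours. The function name, the value of k, and the toy data are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy 2-feature data: 8 majority rows vs 3 minority rows (made-up values).
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.5]]

new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints 8 8
```

A common practice is to fit the oversampler on the training split only, so that synthetic points derived from test rows do not leak into evaluation.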
Model Evaluation Metrics: A Comprehensive View
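The metric list from this slide did not survive extraction. As an assumption, the sketch below computes four metrics commonly used for binary attrition models (accuracy, precision, recall, F1) from the confusion-matrix counts. The labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = employee left, 0 = employee stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced attrition data, precision and recall for the "left" class are usually more informative than accuracy alone.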
Evaluation Results: Model Performance Comparison
ETC Outperformance
Extra Trees Classifier demonstrated superior results.
LR Performance
Logistic Regression performed well but scored slightly lower across metrics.
SVM Limitations
Support Vector Machine was less effective with high-dimensional data.
DTC Overfitting
Decision Tree Classifier showed signs of overfitting.
Tools & Technologies Used
Python
Primary programming language.
Pandas
Data manipulation and analysis.
scikit-learn
Machine learning algorithms and utilities.
Jupyter Notebook
Interactive development environment.
Skills Acquired During Internship
Data Sourcing
Effective data acquisition and integration.
Data Preprocessing
Cleaning, handling missing values, transformation.
Feature Engineering
Creating relevant features for models.
Model Training
Hyperparameter tuning for optimal performance.
Version Control
Proficient use of GitHub for collaboration.
Reporting
Clear, concise report writing and presentation.
Internship Deliverables & Conclusion
Key Deliverables
• Structured .CSV dataset.
• ML-ready preprocessed dataset.
• EEDA and missing value treatment report.
• Trained attrition prediction model.
Developed an end-to-end predictive system for employee attrition. Gained a comprehensive understanding of the
Data Science lifecycle. Applied ML to a real-world HR problem and built readiness for industry projects.