import pandas as pd
# Load the dataset from the uploaded file
file_path = '/content/employee_attrition_data.csv'
employee_attrition_data = pd.read_csv(file_path)
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(employee_attrition_data.head())
# Display summary information about the dataset
print("\nSummary Information of the dataset:")
employee_attrition_data.info()  # info() prints its report directly and returns None, so no print() wrapper
# Calculate basic statistics for numerical columns
print("\nBasic Statistics of the dataset:")
print(employee_attrition_data.describe())
First few rows of the dataset:
   Employee_ID  Age  Gender   Department Job_Title  Years_at_Company  Satisfaction_Level  Average_Monthly_Hours  Promotion_Last_5Years  Salary  Attrition
0            0   27    Male    Marketing   Manager                 9            0.586251                    151                      0   60132          0
1            1   53  Female        Sales  Engineer                10            0.261161                    221                      1   79947          0
2            2   59  Female    Marketing   Analyst                 8            0.304382                    184                      0   46958          1
3            3   42  Female  Engineering   Manager                 1            0.480779                    242                      0   40662          0
4            4   44  Female        Sales  Engineer                10            0.636244                    229                      1   74307          0
Summary Information of the dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Employee_ID            1000 non-null   int64
 1   Age                    1000 non-null   int64
 2   Gender                 1000 non-null   object
 3   Department             1000 non-null   object
 4   Job_Title              1000 non-null   object
 5   Years_at_Company       1000 non-null   int64
 6   Satisfaction_Level     1000 non-null   float64
 7   Average_Monthly_Hours  1000 non-null   int64
 8   Promotion_Last_5Years  1000 non-null   int64
 9   Salary                 1000 non-null   int64
 10  Attrition              1000 non-null   int64
dtypes: float64(1), int64(7), object(3)
memory usage: 86.1+ KB
Basic Statistics of the dataset:
       Employee_ID          Age  Years_at_Company  Satisfaction_Level  Average_Monthly_Hours  Promotion_Last_5Years        Salary    Attrition
count  1000.000000  1000.000000       1000.000000         1000.000000            1000.000000            1000.000000   1000.000000  1000.000000
mean    499.500000    42.205000          5.605000            0.505995             199.493000               0.486000  64624.980000     0.495000
std     288.819436    10.016452          2.822223            0.289797              29.631908               0.500054  20262.984333     0.500225
min       0.000000    25.000000          1.000000            0.001376             150.000000               0.000000  30099.000000     0.000000
25%     249.750000    33.000000          3.000000            0.258866             173.000000               0.000000  47613.500000     0.000000
50%     499.500000    43.000000          6.000000            0.505675             201.000000               0.000000  64525.000000     0.000000
75%     749.250000    51.000000          8.000000            0.761135             225.000000               1.000000  81921.000000     1.000000
max     999.000000    59.000000         10.000000            0.999979             249.000000               1.000000  99991.000000     1.000000
import pandas as pd
file_path = '/content/employee_attrition_data.csv'
employee_attrition_data = pd.read_csv(file_path)
# Check for missing values
missing_values = employee_attrition_data.isnull().sum()
print("Missing values in each column:")
print(missing_values)
# One-hot encode categorical variables
encoded_data = pd.get_dummies(employee_attrition_data,
columns=['Gender', 'Department', 'Job_Title'])
# Display the first few rows of the encoded dataset
print("First few rows of the encoded dataset:")
print(encoded_data.head())
Missing values in each column:
Employee_ID 0
Age 0
Gender 0
Department 0
Job_Title 0
Years_at_Company 0
Satisfaction_Level 0
Average_Monthly_Hours 0
Promotion_Last_5Years 0
Salary 0
Attrition 0
dtype: int64
First few rows of the encoded dataset:
   Employee_ID  Age  Years_at_Company  Satisfaction_Level  Average_Monthly_Hours  \
0            0   27                 9            0.586251                    151
1            1   53                10            0.261161                    221
2            2   59                 8            0.304382                    184
3            3   42                 1            0.480779                    242
4            4   44                10            0.636244                    229

   Promotion_Last_5Years  Salary  Attrition  Gender_Female  Gender_Male  \
0                      0   60132          0          False         True
1                      1   79947          0           True        False
2                      0   46958          1           True        False
3                      0   40662          0           True        False
4                      1   74307          0           True        False

   Department_Engineering  Department_Finance  Department_HR  Department_Marketing  \
0                   False               False          False                  True
1                   False               False          False                 False
2                   False               False          False                  True
3                    True               False          False                 False
4                   False               False          False                 False

   Department_Sales  Job_Title_Accountant  Job_Title_Analyst  Job_Title_Engineer  \
0             False                 False              False               False
1              True                 False              False                True
2             False                 False               True               False
3             False                 False              False               False
4              True                 False              False                True

   Job_Title_HR Specialist  Job_Title_Manager
0                    False               True
1                    False              False
2                    False              False
3                    False               True
4                    False              False
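# Aside: recent pandas versions emit boolean dummy columns, as the head()
# output above shows. A hedged alternative (not what the cell above ran) is to
# request integer dummies and drop one level per category, which avoids dtype
# surprises downstream and perfectly collinear columns.
encoded_data_alt = pd.get_dummies(employee_attrition_data,
                                  columns=['Gender', 'Department', 'Job_Title'],
                                  drop_first=True, dtype=int)
print(encoded_data_alt.dtypes)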
import matplotlib.pyplot as plt
import seaborn as sns
# Generate summary statistics for all variables
summary_statistics = encoded_data.describe()
print("Summary Statistics:")
print(summary_statistics)
# Histograms for numerical variables
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
sns.histplot(encoded_data['Age'], kde=True, ax=axes[0])
axes[0].set_title('Age Distribution')
sns.histplot(encoded_data['Satisfaction_Level'], kde=True, ax=axes[1])
axes[1].set_title('Satisfaction Level Distribution')
sns.histplot(encoded_data['Salary'], kde=True, ax=axes[2])
axes[2].set_title('Salary Distribution')
plt.show()
# Count plots for original categorical variables
fig, axes = plt.subplots(1, 2, figsize=(18, 5))
sns.countplot(data=employee_attrition_data, x='Department', ax=axes[0])
axes[0].set_title('Department Count')
sns.countplot(data=employee_attrition_data, x='Job_Title', ax=axes[1])
axes[1].set_title('Job Title Count')
plt.show()
# Generate a correlation matrix
correlation_matrix = encoded_data.corr()
# Plot the correlation matrix
plt.figure(figsize=(16, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm',
fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()
Summary Statistics:
       Employee_ID          Age  Years_at_Company  Satisfaction_Level  Average_Monthly_Hours  Promotion_Last_5Years        Salary    Attrition
count  1000.000000  1000.000000       1000.000000         1000.000000            1000.000000            1000.000000   1000.000000  1000.000000
mean    499.500000    42.205000          5.605000            0.505995             199.493000               0.486000  64624.980000     0.495000
std     288.819436    10.016452          2.822223            0.289797              29.631908               0.500054  20262.984333     0.500225
min       0.000000    25.000000          1.000000            0.001376             150.000000               0.000000  30099.000000     0.000000
25%     249.750000    33.000000          3.000000            0.258866             173.000000               0.000000  47613.500000     0.000000
50%     499.500000    43.000000          6.000000            0.505675             201.000000               0.000000  64525.000000     0.000000
75%     749.250000    51.000000          8.000000            0.761135             225.000000               1.000000  81921.000000     1.000000
max     999.000000    59.000000         10.000000            0.999979             249.000000               1.000000  99991.000000     1.000000
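# With roughly twenty encoded columns, the annotated heatmap above is dense;
# ranking features by absolute correlation with the target is often a more
# direct view. A small sketch reusing the correlation_matrix computed above:
attrition_corr = correlation_matrix['Attrition'].drop('Attrition')
print(attrition_corr.reindex(attrition_corr.abs().sort_values(ascending=False).index))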
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns
# Select features for clustering, excluding the target 'Attrition' and the identifier 'Employee_ID'
features = encoded_data.drop(columns=['Employee_ID', 'Attrition'])
# Apply K-means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # n_init set explicitly to avoid sklearn's FutureWarning
encoded_data['Cluster'] = kmeans.fit_predict(features)
# Visualize the clusters
plt.figure(figsize=(12, 6))
sns.scatterplot(data=encoded_data, x='Satisfaction_Level',
y='Average_Monthly_Hours', hue='Cluster', palette='viridis')
plt.title('K-means Clustering of Employees')
plt.show()
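# K-means is distance-based, so the unscaled Salary column (tens of thousands)
# will dominate Satisfaction_Level (0 to 1) and the boolean dummies. A hedged
# variant, not what the cell above ran: standardize the features first and
# sweep k with the inertia (elbow) heuristic.
from sklearn.preprocessing import StandardScaler
scaled_features = StandardScaler().fit_transform(features)
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled_features)
    print(f"k={k}: inertia={km.inertia_:.1f}")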
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
# Select features and target
X = encoded_data.drop(columns=['Employee_ID', 'Attrition', 'Cluster'])
y = encoded_data['Attrition']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)
# Apply logistic regression
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
# Predict on the test set
y_pred = logreg.predict(X_test)
# Evaluate the model
classification_report_logreg = classification_report(y_test, y_pred)
confusion_matrix_logreg = confusion_matrix(y_test, y_pred)
print("Classification Report:")
print(classification_report_logreg)
print("\nConfusion Matrix:")
print(confusion_matrix_logreg)
Classification Report:
              precision    recall  f1-score   support

           0       0.51      0.59      0.55       102
           1       0.49      0.41      0.44        98

    accuracy                           0.50       200
   macro avg       0.50      0.50      0.49       200
weighted avg       0.50      0.50      0.50       200

Confusion Matrix:
[[60 42]
 [58 40]]
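# The report above is essentially chance level (accuracy 0.50 on a roughly
# balanced target), which points to weak linear signal in these features rather
# than a coding error. A hedged follow-up sketch: scale inside a pipeline and
# score with 5-fold cross-validation for a less noisy estimate.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")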