Data Project
# DATASET:
• File: Employee.csv
• Extension: .csv
# IMPORT NECESSARY PACKAGES
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# READ CSV FILE
data = pd.read_csv(r"C:\Users\srish\employees 1.csv")
print(data)
Fig 4: Null checking
# EXPLANATION
data.isnull()
This pandas call returns a Boolean DataFrame with the same shape as the original DataFrame. Each cell
is True where the original value is missing (NaN or None) and False where a value is present.
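As a small illustration of the mask described above, the sketch below runs isnull() on a made-up three-row sample. The sample values are hypothetical and are not taken from the employee file; summing the mask is an extra step added here to show how the Boolean result can be turned into a count of missing cells per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real employee file has more rows and columns.
sample = pd.DataFrame({
    "First Name": ["Asha", None, "Ravi"],
    "Salary": [50000.0, 62000.0, np.nan],
})

null_mask = sample.isnull()        # True wherever a cell is missing
null_per_column = null_mask.sum()  # True counts as 1, so this counts missing cells per column

print(null_mask)
print(null_per_column)
```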
Fig 5: Null checking
# EXPLANATION
data.notnull()
This pandas call is the inverse of isnull(). It returns a Boolean DataFrame in which each cell is True
where the original value is present and False where it is missing.
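The sketch below, again on hypothetical sample data, shows that notnull() is the element-wise complement of isnull() and that summing it gives the number of present values per column. The comparison with count() is an addition for illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not taken from the employee file.
sample = pd.DataFrame({
    "Team": ["Finance", None, "Marketing"],
    "Bonus %": [5.0, np.nan, np.nan],
})

present_mask = sample.notnull()                      # True wherever a value is present
is_complement = present_mask.equals(~sample.isnull())
present_per_column = present_mask.sum()              # number of non-missing cells per column

print(present_mask)
print(is_complement)
print(present_per_column)
```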
# CALCULATING THE BONUS
bonus = data['Salary'] * (data['Bonus %'] / 100)
bonus_round = bonus.round(2)
bonus_string = bonus_round.astype(str)  # element-wise; str(bonus_round) would stringify the whole Series
# EXPLANATION
• Calculates a bonus for each employee as a percentage of their salary.
• Rounds the bonus to two decimal places.
• Stores the rounded bonus and its string form in separate variables.
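To make the arithmetic concrete, here is a minimal worked example on two hypothetical employees. The column names 'Salary' and 'Bonus %' come from the code above; the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical salaries and bonus percentages.
sample = pd.DataFrame({
    "Salary": [50000.0, 62000.5],
    "Bonus %": [10.0, 6.945],
})

bonus = sample["Salary"] * (sample["Bonus %"] / 100)  # bonus amount per employee
bonus_round = bonus.round(2)                          # two decimal places
bonus_string = bonus_round.astype(str)                # one string per employee

print(bonus_round.tolist())
print(bonus_string.tolist())
```

Using astype(str) keeps one string per employee, which is what is needed when the value is later concatenated into a per-employee message.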
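# TOTAL SALARY OF EMPLOYEES
print("Total Salary Of Employees")
total_salary = data['Salary'] + bonus
salary_string = total_salary.round(2).astype(str)
emp_name = data['First Name']  # assumption: employee names are in the 'First Name' column
print(emp_name + " got the total salary of " + salary_string)
data['Salary'].plot(kind='hist')  # Series.hist() does not accept kind='bar'
Fig 6: Employee salary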
# EXPLANATION
The histogram shows how salaries are distributed across the dataset, which makes it easy to identify
the most common salary ranges, the overall spread of the data, and potential outliers.
• X-Axis: Represents the salary ranges (bins).
• Y-Axis: Represents the number of employees whose salary falls within each range.
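The bin counts behind a histogram can also be inspected numerically. The sketch below uses numpy.histogram on a hypothetical list of salaries with three equal-width bins; the salary values and the choice of three bins are assumptions made only for this illustration.

```python
import numpy as np

# Hypothetical salaries; the last value is a deliberate outlier.
salaries = np.array([30000, 35000, 42000, 58000, 61000, 64000, 99000])

# counts[i] is the number of salaries between edges[i] and edges[i + 1]
counts, edges = np.histogram(salaries, bins=3)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low:.0f} to {high:.0f}: {n} employee(s)")
```

The sparsely populated top bin is how an outlier shows up: a bar far to the right with a low count.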
# MEAN SALARY OF DEPARTMENT
print("Mean Salary of department")
grouped_team = data.groupby('Team')
mean_salary = grouped_team['Salary'].mean()
print(mean_salary)
# EXPLANATION
The code uses pandas to perform the following steps:
1. Group Data by Department: Group the dataset by the "Team" column to aggregate data for
each department.
2. Calculate Mean Salary: Compute the mean salary for each department using the mean()
function.
3. Display Results: Output the mean salary for each department in a readable format.
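The three steps above can be traced on a tiny hypothetical dataset, where the expected means are easy to check by hand.

```python
import pandas as pd

# Hypothetical data: two Finance employees and one Marketing employee.
sample = pd.DataFrame({
    "Team": ["Finance", "Finance", "Marketing"],
    "Salary": [50000.0, 70000.0, 40000.0],
})

grouped_team = sample.groupby("Team")          # step 1: group by department
mean_salary = grouped_team["Salary"].mean()    # step 2: mean salary per group
print(mean_salary)                             # step 3: display the result
```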
# EXPLANATION
The purpose of this code is to visualize the average salary (including bonus) for each department
using a line graph.
• X-Axis: Represents the different departments.
• Y-Axis: Represents the average total salary (including bonus) for each department.
• Line Graph: Shows the trend of average salary across departments with data points marked.
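The plotting code for this graph is not included in this section. The following is a minimal sketch of how such a line graph could be produced, assuming the total salary is the salary plus the percentage bonus and using hypothetical sample data. The 'Total Salary' column name and the Agg backend are choices made for this sketch, not part of the original project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
sample = pd.DataFrame({
    "Team": ["Finance", "Finance", "Marketing", "Sales"],
    "Salary": [50000.0, 70000.0, 40000.0, 80000.0],
    "Bonus %": [10.0, 10.0, 5.0, 0.0],
})

# Total salary = salary + bonus amount ('Total Salary' is a name chosen for this sketch).
sample["Total Salary"] = sample["Salary"] + sample["Salary"] * sample["Bonus %"] / 100
mean_total = sample.groupby("Team")["Total Salary"].mean()

fig, ax = plt.subplots()
ax.plot(mean_total.index, mean_total.values, marker="o")  # line with marked data points
ax.set_title("Mean total salary by team")
ax.set_xlabel("Team")
ax.set_ylabel("Mean total salary")
ax.grid(True)
print(mean_total)
```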
# TOTAL TEAM COUNTS
total_team_counts = data['Team'].value_counts().sort_values()
print(total_team_counts)
plt.plot(total_team_counts, "-", label="Total members of team", color="red")
plt.plot(total_team_counts, ".", color="green", label="Total members of team")
plt.title("team Counts")
plt.xlabel("team")
plt.ylabel("counts of employee")
plt.xticks(rotation=60)
plt.grid(True)
• Print Missing Names: print(null_name['First Name']) prints the First Name column of the
null_name DataFrame. Every value in it is null, as shown in Fig 12.
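The code that builds null_name is not shown in this section. One plausible way to construct it, given here only as an assumption, is to filter the rows whose 'First Name' is missing. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two missing first names.
sample = pd.DataFrame({
    "First Name": ["Asha", None, "Ravi", np.nan],
    "Team": ["Finance", "Sales", None, "Marketing"],
})

# Assumed construction: keep only the rows where 'First Name' is missing.
null_name = sample[sample["First Name"].isnull()]
print(null_name["First Name"])  # every printed value is null
```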
# MISSING ROWS MULTIPLE VALUES
missing_value = data.isnull() | (data == "")
row_missing_value = missing_value.sum(axis=1)
row_missing_value_judge = data[row_missing_value > 2]
num_row_missing_value = row_missing_value_judge.shape[0]
print(f"Number of row missing values : {num_row_missing_value}")
print(num_row_missing_value)
# EXPLANATION
• Identifies missing values using data.isnull(), and also treats empty strings as missing.
• Counts the number of missing values per row (sum(axis=1)).
• Finds employees with missing names or team information.
• Counts the rows with more than two missing values, as shown in Fig 13.
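A short sketch on hypothetical data shows how the row-wise count and the "more than two" threshold behave. Only the second row below has more than two missing cells, so it is the only one counted.

```python
import pandas as pd

# Hypothetical sample: row 0 is complete, row 1 has three gaps, row 2 has two.
sample = pd.DataFrame({
    "First Name": ["Asha", None, ""],
    "Gender": ["Female", None, "Male"],
    "Team": ["Finance", None, None],
    "Salary": [50000.0, 62000.0, 41000.0],
})

missing_value = sample.isnull() | (sample == "")   # NaN/None or empty string
row_missing_value = missing_value.sum(axis=1)      # missing cells per row
num_row_missing_value = sample[row_missing_value > 2].shape[0]

print(row_missing_value.tolist())
print(num_row_missing_value)
```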
Fig 15: Mean, median, variance
# EXPLANATION
➢ Mean Salary for Senior Management: We use groupby() to group the data by 'Senior Management' and
calculate the mean salary using mean().
➢ Overall Average Salary: We calculate the overall average salary using mean().
➢ Median Salary: We calculate the median salary using median().
➢ Variance of Salary: We calculate the variance of salary using var().
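The code for these statistics is not shown in this section. The sketch below shows one way the four values could be computed, assuming 'Senior Management' is a True/False column and using hypothetical salaries. Note that pandas var() returns the sample variance (divisor n - 1) by default.

```python
import pandas as pd

# Hypothetical sample data.
sample = pd.DataFrame({
    "Senior Management": [True, True, False, False],
    "Salary": [100000.0, 80000.0, 50000.0, 30000.0],
})

mean_by_management = sample.groupby("Senior Management")["Salary"].mean()
overall_mean = sample["Salary"].mean()
median_salary = sample["Salary"].median()
variance_salary = sample["Salary"].var()  # sample variance, ddof=1 by default

print(mean_by_management)
print(overall_mean, median_salary, variance_salary)
```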
# GENDER COUNTS PIE CHART
gender_counts = data['Gender'].value_counts()
gender_counts.plot(kind='pie', autopct='%.2f%%', title='Gender Distribution')