Data Project

Uploaded by

chanchalbag112

# SOFTWARE USED:

• Anaconda3 2024.06-1 (Python 3.12.4 64-bit)


• Spyder IDE 5.5.1 (conda)

# DATASET:
• File: Employee.csv
• Extension: .csv
# IMPORT NECESSARY PACKAGES
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# READ CSV FILE
data = pd.read_csv(r"C:\Users\srish\employees 1.csv")
print(data)

Figure 1. Data of employee


# EXPLANATION
The imports load the packages required for data manipulation, analysis, numerical computation, and data
visualization.
• pandas (as pd): Used for data manipulation and analysis.
• numpy (as np): Available for numerical computations (not actively used here).
• matplotlib.pyplot (as plt): Used for data visualization.
pd.read_csv reads the CSV file and stores it in a DataFrame named data.
print(data) then displays the DataFrame for initial inspection, as shown in Figure 1.
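The loading step can be tried without the original file. The sketch below is only an illustration: the three rows are hypothetical stand-ins for the employee data, and the text is read from memory through io.StringIO instead of a path on disk.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the employee CSV file.
csv_text = """First Name,Gender,Salary,Bonus %,Team
Douglas,Male,97308,6.945,Marketing
Thomas,Male,61933,4.170,
Maria,Female,130590,11.858,Finance
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))  # first two rows only
print(sample.shape)    # (number of rows, number of columns)
```

The empty Team field in the second row is read as a missing value (NaN), which is what the null checks later in the report look for.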
# CHECKING
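print(data.describe())  # summary statistics of the numeric columns
data.info()             # info() prints its summary itself and returns None
print(data.isnull())    # True where a cell is missing
print(data.notnull())   # True where a cell is present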

Fig2: Describe Data


# EXPLANATION
data.describe() prints summary statistics for the numerical columns of the DataFrame, as shown in Fig 2:
➢ Count: The number of observations.
➢ Mean: The average value.
➢ Standard Deviation (std): The measure of variation or dispersion of the values.
➢ Minimum (min): The smallest value.
➢ 25th Percentile (25%): The value below which 25% of the data fall.
➢ 50th Percentile (Median) (50%): The middle value of the data.
➢ 75th Percentile (75%): The value below which 75% of the data fall.
➢ Maximum (max): The largest value.
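The statistics listed above can be checked by hand on a small example. This is a minimal sketch with five made-up salary values, not the project data; note that pandas reports the sample standard deviation (dividing by n - 1).

```python
import pandas as pd

# Five hypothetical salaries, chosen so the statistics are easy to verify.
salaries = pd.Series([40000, 50000, 60000, 70000, 80000], name="Salary")

summary = salaries.describe()
print(summary)

# Individual statistics can be read from the result by label.
print(summary["mean"], summary["50%"], summary["std"])
```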

Fig3: data info


# EXPLANATION
data.info()
Displays a concise summary of the DataFrame (Fig 3), including:
• Name and data type of each column (e.g., integer, float, object for strings).
• Number of non-null values in each column.

Fig 4: Null checking
# EXPLANATION
data.isnull()
Returns a Boolean DataFrame of the same shape as data, marking each cell True if its value is missing
and False otherwise (Fig 4).
Fig 5: Not-null checking
# EXPLANATION
data.notnull()
Returns the complementary Boolean DataFrame, marking each cell True if its value is present and False
if it is missing (Fig 5).
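Because the full Boolean tables are hard to read for a large dataset, it can help to reduce them to counts. The following sketch uses a small hypothetical DataFrame to show that the two masks are complements and how they can be summarized.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing name and one missing salary.
df = pd.DataFrame({
    "First Name": ["Ana", None, "Raj"],
    "Salary": [50000.0, 62000.0, np.nan],
})

null_mask = df.isnull()       # True where a value is missing
not_null_mask = df.notnull()  # True where a value is present

# Summing a Boolean mask counts the True cells in each column.
print(null_mask.sum())

# any(axis=1) flags the rows that contain at least one missing value.
print(null_mask.any(axis=1))
```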
# CALCULATING THE BONUS
bonus = data['Salary'] * (data['Bonus %'] / 100)  # bonus amount per employee
bonus_round = bonus.round(2)                      # round to two decimal places
bonus_string = bonus_round.astype(str)            # convert each value to a string

# EXPLANATION
• Calculates a bonus for each employee as a percentage of their salary.
• Rounds the bonus to two decimal places.
• Stores the rounded bonus and its string form in separate variables (bonus_round and bonus_string).
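As a worked example of the calculation, the sketch below applies the same formula to two hypothetical employees. The column names match the dataset; the values are invented for illustration.

```python
import pandas as pd

# Two hypothetical employees.
pay = pd.DataFrame({"Salary": [50000, 80000], "Bonus %": [10.0, 5.0]})

# Bonus amount = salary * (bonus percentage / 100), computed row by row.
bonus_example = pay["Salary"] * (pay["Bonus %"] / 100)
bonus_example_round = bonus_example.round(2)

# Total pay is the salary plus the bonus amount.
total_example = pay["Salary"] + bonus_example_round

print(bonus_example_round)
print(total_example)
```

With these inputs the bonuses come to about 5000 and 4000, so the totals are about 55000 and 84000.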
# TOTAL SALARY OF EMPLOYEES
print("Total Salary Of Employees")
emp_name = data['First Name']  # assumed: names come from the 'First Name' column
total_salary = data['Salary'] + bonus
salary_string = total_salary.round(2).astype(str)
print(emp_name + " got the total salary of " + salary_string)
data['Salary'].hist()  # histogram of salaries (Fig 6)

Fig6:Employee salary
# EXPLANATION
The histogram shows how salaries are distributed across the dataset, which makes it easy to identify
the most common salary ranges, the overall spread, and potential outliers.
• X-Axis: Represents the salary ranges (bins).
• Y-Axis: Represents the number of salaries that fall within each range.
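The binning behind a histogram can be inspected numerically. This sketch uses numpy.histogram on six hypothetical salaries with three bins for readability (the pandas hist method uses ten bins by default); it is an illustration, not the code used to produce Fig 6.

```python
import numpy as np

# Six hypothetical salaries, including one high value.
salaries = np.array([35000, 42000, 48000, 51000, 75000, 120000])

# Split the range from the minimum to the maximum into 3 equal-width bins.
counts, edges = np.histogram(salaries, bins=3)

# Each count is the bar height for the range between two neighbouring edges.
for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low:>10.0f} to {high:>10.0f}: {n}")
```

Four of the six values fall into the lowest bin, while the single value of 120000 sits alone in the highest bin, which is how an outlier shows up in such a plot.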
# MEAN SALARY OF DEPARTMENT
print("Mean Salary of department")
grouped_team = data.groupby('Team')
mean_salary = grouped_team['Salary'].mean()
print(mean_salary)

Fig 7: Average salary based on team

# EXPLANATION
The code uses pandas to perform the following steps:
1. Group Data by Department: Group the dataset by the "Team" column to aggregate data for
each department.
2. Calculate Mean Salary: Compute the mean salary for each department using the mean()
function.
3. Display Results: Output the mean salary for each department in a readable format.
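The three steps above can be sketched on a small hypothetical table. One detail worth noting is that rows whose grouping key is missing are left out of the result by default.

```python
import pandas as pd

# Hypothetical employees; the last one has no team assigned.
df = pd.DataFrame({
    "Team": ["Finance", "Finance", "Marketing", None],
    "Salary": [60000, 80000, 50000, 90000],
})

# Group by team, then average the salaries within each group.
mean_salary_example = df.groupby("Team")["Salary"].mean()
print(mean_salary_example)
```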

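# SALARY SHOWN ON BAR AND LINE GRAPH


# Assumed definition (not given in the original): mean of salary plus bonus per team
mean_salary_team = (data['Salary'] + bonus).groupby(data['Team']).mean()
plt.figure(figsize=(12, 7))
mean_salary_team.plot(kind='bar', title="Average Salary")
# Overlay a red line and blue point markers at the bar positions
plt.plot(mean_salary_team.values, "-", color="red", label="Average Salary")
plt.plot(mean_salary_team.values, ".", color="b", label="Average Salary")
plt.xlabel("Department")
plt.ylabel("Salary + Bonus")
plt.legend()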
Fig 8: Bar graph of Average Salary
# EXPLANATION
The purpose of this code is to visualize the average salary (including bonus) for each
department using a bar chart with a line plot overlay.
# BAR PLOT:
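# Assumed definition (not given in the original): mean of salary plus bonus per team
mean_salary_by_department = (data['Salary'] + bonus).groupby(data['Team']).mean()
plt.bar(mean_salary_by_department.index, mean_salary_by_department.values, color='blue', alpha=0.7)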

Fig 9: Line Graph

# EXPLANATION
The purpose of this code is to visualize the average salary (including bonus) for each department
using a line graph.
• X-Axis: Represents the different departments.
• Y-Axis: Represents the average total salary (including bonus) for each department.
• Line Graph: Shows the trend of average salary across departments with data points marked.
#TOTAL TEAM COUNTS
total_team_counts = data['Team'].value_counts().sort_values()
print(total_team_counts)
plt.plot(total_team_counts, "-", label="Total members of team", color="red")
plt.plot(total_team_counts, ".", color="green", label="Total members of team")
plt.title("team Counts")
plt.xlabel("team")
plt.ylabel("counts of employee")
plt.xticks(rotation=60)
plt.grid(True)

Fig 10:Scatter plot of Team counts


# EXPLANATION
The purpose of this code is to count and display the number of employees in each department.
1. Count Employees: value_counts() on the "Team" column counts the employees in each department;
rows with a missing team are excluded by default.
2. Sort Counts: sort_values() orders the departments from the smallest to the largest count.
3. Display Results: The counts are printed and then plotted as a red line with green point markers (Fig 10).
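A small example makes the counting and sorting behaviour concrete. The team names below are hypothetical; the dropna=False call is shown only to illustrate how missing teams could be included in the count.

```python
import pandas as pd

# Hypothetical team assignments, one of them missing.
teams = pd.Series(["Finance", "Sales", "Finance", "Finance", "Sales", None, "Legal"])

# Count each team, then order from the smallest to the largest team.
team_counts = teams.value_counts().sort_values()
print(team_counts)

# Passing dropna=False adds a separate entry for the missing values.
team_counts_with_missing = teams.value_counts(dropna=False)
print(team_counts_with_missing)
```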

# NULL VALUE COUNTS


null_values = data.isnull()
count_col_null_values = null_values.sum(axis=0)
print(count_col_null_values)
Fig11: Null Value Count
# EXPLANATION
• Identifies missing values using data.isnull().
• Counts the number of missing values per column with sum(axis=0), as shown in Fig 11.

# CHECK NULL NAME


print("Employees without a first name")
null_name = data[data['First Name'].isnull()]
count_value = null_name.shape[0]
print(null_name['First Name'])
print(f"Number of null Name is : {count_value}")

Fig 12: check Null Values

# EXPLANATION
• Filter for Missing Names: null_name = data[data['First Name'].isnull()]: Creates a new DataFrame
called null_name containing only the rows where the First Name column is null (missing).
• Count Missing Names: count_value = null_name.shape[0]: Calculates the number of rows in
the null_name DataFrame, which is the count of employees without a first name.
• Print Missing Names: print(null_name['First Name']): Prints the First Name column of the
null_name DataFrame, which contains only null values, as shown in Fig 12.
# MISSING ROWS MULTIPLE VALUES
missing_value = data.isnull() | (data == "")
row_missing_value = missing_value.sum(axis=1)
row_missing_value_judge = data[row_missing_value > 2]
num_row_missing_value = row_missing_value_judge.shape[0]
print(f"Number of row missing values : {num_row_missing_value}")
print(num_row_missing_value)

Fig 13 :Row Missing

# EXPLANATION
• Marks a cell as missing if it is null (data.isnull()) or an empty string (data == "").
• Counts the number of missing values per row using sum(axis=1).
• Selects the rows with more than two missing values.
• Prints the number of such rows, as shown in Fig 13.
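The row-wise count can be followed step by step on a tiny hypothetical table in which two of the three rows have three missing cells each (either null or an empty string).

```python
import numpy as np
import pandas as pd

# Hypothetical rows: row 0 is complete, rows 1 and 2 each miss three values.
df = pd.DataFrame({
    "First Name": ["Ana", None, ""],
    "Gender": ["Female", None, ""],
    "Team": ["Finance", None, "Sales"],
    "Salary": [50000.0, 61000.0, np.nan],
})

# A cell counts as missing if it is null or an empty string.
missing = df.isnull() | (df == "")

# axis=1 sums across the columns, giving one count per row.
missing_per_row = missing.sum(axis=1)
rows_over_two = df[missing_per_row > 2]

print(missing_per_row)
print(f"Rows with more than two missing values: {rows_over_two.shape[0]}")
```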

# GENDER SALARY MEAN


count_gender = data['Gender'].value_counts()
print(count_gender)
salary_mean_genderBased = data.groupby('Gender')['Salary'].mean()
print(salary_mean_genderBased)

Fig 14:average salary


# EXPLANATION
Counting Gender Distribution: value_counts() counts the occurrences of each unique value in the
'Gender' column.
Printing Gender Distribution: The printed result shows the number of employees of each gender.
Average Salary by Gender: groupby('Gender')['Salary'].mean() computes the mean salary for each gender (Fig 14).

# SENIOR MANAGEMENT MEAN


salary_mean_management = data.groupby('Senior Management')['Salary'].mean()
average_salary = data['Salary'].mean()
median_salary = data['Salary'].median()
varience_salary = data['Salary'].var()
#print(f"average_salary:{average_salary},median_salary:{median_salary},varience_salary:{varience_salary}")

Fig 15. Mean, Median, Variance
# EXPLANATION
➢ Mean Salary for Senior Management: We use groupby() to group the data by 'Senior Management' and
calculate the mean salary using mean().
➢ Overall Average Salary: We calculate the overall average salary using mean().
➢ Median Salary: We calculate the median salary using median().
➢ Variance of Salary: We calculate the variance of salary using var()
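These four statistics can be verified on a handful of hypothetical values. The sketch also shows that var() returns the sample variance (dividing by n - 1) unless ddof=0 is passed, and that a single high salary pulls the mean above the median.

```python
import pandas as pd

# Hypothetical salaries with one high value, plus a management flag.
df = pd.DataFrame({
    "Senior Management": [True, False, True, False, True],
    "Salary": [40000, 50000, 60000, 70000, 130000],
})

mean_by_level = df.groupby("Senior Management")["Salary"].mean()
average_example = df["Salary"].mean()
median_example = df["Salary"].median()
sample_variance = df["Salary"].var()            # divides by n - 1
population_variance = df["Salary"].var(ddof=0)  # divides by n

print(mean_by_level)
print(average_example, median_example, sample_variance, population_variance)
```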
# GENDER COUNTS PIE CHART
gender_counts = data['Gender'].value_counts()
gender_counts.plot(kind='pie', autopct='%.2f%%', title='Gender Distribution')

Fig 16: counting Gender


# EXPLANATION
• Counts the number of employees in each gender category.
• Creates a pie chart of the gender distribution, labelling each slice with its percentage share to
two decimal places (autopct='%.2f%%').
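The percentages drawn on the pie slices can also be computed directly, which is a convenient cross-check. This sketch uses hypothetical gender values and value_counts(normalize=True), which returns each category's share of the non-missing entries.

```python
import pandas as pd

# Hypothetical gender column with one missing entry.
gender = pd.Series(["Male", "Female", "Female", "Male", "Female", None])

# normalize=True returns fractions of the non-missing total instead of counts.
gender_share = gender.value_counts(normalize=True) * 100

# Format the shares the same way as autopct='%.2f%%'.
for label, pct in gender_share.items():
    print(f"{label}: {pct:.2f}%")
```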

# AVERAGE SALARY BASED ON GENDER


plt.figure(figsize=(12,8))
data['Salary+Bonus'] = bonus + data['Salary']
salary_gender = data.groupby('Gender')['Salary+Bonus'].mean()
salary_gender.plot(kind='bar', title="Average salary based on Gender", color=["orange", "skyblue"])
Fig 17. Average Salary Based On Gender
# EXPLANATION
▪ Create Figure: plt.figure(figsize=(12, 8)): Creates a figure with dimensions 12 inches by 8
inches.
▪ Calculate Total Compensation: data['Salary+Bonus'] = bonus + data['Salary']: Creates a new
column named Salary+Bonus in the DataFrame data by adding the bonus to the Salary column.
▪ Group by Gender: salary_gender = data.groupby('Gender')['Salary+Bonus'].mean(): Computes the
mean total compensation for each gender.
▪ Create Bar Plot: salary_gender.plot(kind='bar', title="Average salary based on Gender",
color=["orange", "skyblue"]): Creates a bar plot of the salary_gender Series, setting the title to
"Average salary based on Gender" and using orange and sky-blue colors for the bars.
# SALARY DISTRIBUTION BASED ON SALARY + BONUS
plt.figure(figsize=(12,8))
data['Salary+Bonus'] = bonus + data['Salary']
plt.hist(data['Salary+Bonus'],edgecolor='white')
plt.title("salary average ")
plt.xlabel("Salary")
plt.ylabel("frequency")
plt.xticks(rotation=60)

Fig 18: Salary average Graph


# EXPLANATION
This histogram shows how total compensation (salary plus bonus) is distributed within the organization.
It identifies the ranges with the highest and lowest frequencies and supports salary structure analysis
and planning.
• X-Axis: Represents the salary + bonus ranges.
• Y-Axis: Represents the frequency of values within each range.
• Histogram: Shows the distribution of salary + bonus across the dataset.
