Data Project

Uploaded by

chanchalbag112


# SOFTWARE USED:

• Anaconda3 2024.06-1 (Python 3.12.4 64-bit)

• Spyder IDE 5.5.1 (conda)

# DATASET:
• File: Employee.csv
• Extension: .csv
# IMPORT NECESSARY PACKAGES
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# READ CSV FILE
data = pd.read_csv(r"C:\Users\srish\employees 1.csv")
print(data)

Figure 1. Data of employee

# EXPLANATION
The imports bring in the packages needed for data manipulation, numerical computation, and visualization:
• pandas (as pd): Used for data manipulation and analysis.
• numpy (as np): Available for numerical computations, although this script does not call it directly.
• matplotlib.pyplot (as plt): Used for data visualization.
pd.read_csv reads the CSV file and stores it in a DataFrame named data.
print(data) displays the DataFrame for initial inspection, as shown in Figure 1 (pandas truncates long output).
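
A minimal sketch of the loading step, assuming a few made-up rows in place of the real file. The column names come from the code in this report; the names and values in csv_text are hypothetical, and io.StringIO only stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical sample rows; the real Employee.csv is not reproduced in this report
csv_text = """First Name,Gender,Salary,Bonus %,Senior Management,Team
Asha,Female,52000,8.5,True,Finance
Ravi,Male,61000,4.0,False,
Meera,Female,73000,11.25,True,Marketing
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())   # first rows only, instead of the whole table
print(sample.shape)    # (number of rows, number of columns)
```

The empty Team field in the second row is read as a missing value, which is what the null checks later in the report look for.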
# CHECKING
print(data.describe())
data.info()  # info() prints its own summary and returns None, so no print() is needed
print(data.isnull())
print(data.notnull())

Fig2: Describe Data

# EXPLANATION
data.describe() prints summary statistics for the numerical columns of the DataFrame, as shown in Fig 2:
➢ Count: The number of observations.
➢ Mean: The average value.
➢ Standard Deviation (std): The measure of variation or dispersion of the values.
➢ Minimum (min): The smallest value.
➢ 25th Percentile (25%): The value below which 25% of the data fall.
➢ 50th Percentile (Median) (50%): The middle value of the data.
➢ 75th Percentile (75%): The value below which 75% of the data fall.
➢ Maximum (max): The largest value.
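
The statistics listed above can be checked on a small example. This is only a sketch: the five salaries are made up so that the numbers are easy to verify by hand.

```python
import pandas as pd

# Hypothetical salaries chosen so the statistics are easy to check by hand
salaries = pd.DataFrame({"Salary": [40000, 50000, 60000, 70000, 80000]})

summary = salaries.describe()
print(summary)

# Individual statistics are read from the summary by row label
print(summary.loc["mean", "Salary"])   # average value
print(summary.loc["50%", "Salary"])    # median
```

Here the mean and the median are both 60000, and std is the sample standard deviation (about 15811.39).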

Fig3: data info

# EXPLANATION
data.info()
This call displays a summary of the DataFrame, as shown in Fig 3, including:
• Column names.
• Data type of each column (e.g., integer, float, object for strings).
• Number of non-null values in each column.
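
As a sketch of what info() reports, the example below captures the summary in a text buffer and then reads the same facts through dtypes and count(). The two-column frame is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical frame with one missing first name
df = pd.DataFrame({
    "First Name": ["Asha", None, "Meera"],
    "Salary": [50000, 62000, 58000],
})

buffer = io.StringIO()
df.info(buf=buffer)            # write the summary to a buffer instead of the console
report = buffer.getvalue()
print(report)

print(df.dtypes)               # data type of each column
print(df.count())              # non-null values per column
```

Using buf is optional; it is shown here only because it makes the summary available as a string.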

Fig 4: Null checking
# EXPLANATION
data.isnull()
This pandas method returns a Boolean DataFrame of the same shape as the original, with True in each cell where the value is missing and False otherwise.
Fig 5: Not-null checking
# EXPLANATION
data.notnull()
This pandas method returns the inverse Boolean DataFrame, with True in each cell where a value is present and False where it is missing.
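
A small sketch of the two checks on made-up data, showing that notnull() is the cell-by-cell opposite of isnull():

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row has no name and no salary
df = pd.DataFrame({
    "First Name": ["Asha", None],
    "Salary": [50000.0, np.nan],
})

missing = df.isnull()    # True where a value is missing
present = df.notnull()   # True where a value is present

print(missing)
print(present)
```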
# CALCULATING THE BONUS
bonus = data['Salary'] * (data['Bonus %'] / 100)
bonus_round = bonus.round(2)
# astype(str) converts each value; str() would turn the whole Series into one string
bonus_string = bonus_round.astype(str)

# EXPLANATION
• Calculates a bonus for each employee as a percentage of their salary.
• Rounds the bonus to two decimal places.
• Stores the rounded bonus and its string form in separate variables.
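
The bonus arithmetic can be sketched on two made-up employees. The example also contrasts two ways of producing strings: astype(str) converts every value, while the built-in str() returns a single string describing the whole Series.

```python
import pandas as pd

# Hypothetical salaries and bonus percentages
df = pd.DataFrame({"Salary": [50000, 80000], "Bonus %": [10.0, 7.5]})

bonus = df["Salary"] * (df["Bonus %"] / 100)   # 10% of 50000, 7.5% of 80000
bonus_round = bonus.round(2)

per_value = bonus_round.astype(str)   # one string per employee
whole_series = str(bonus_round)       # one string for the entire Series

print(per_value.tolist())
print(type(whole_series))
```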
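# TOTAL SALARY OF EMPLOYEES
print("Total Salary Of Employees")
total_salary = data['Salary'] + bonus
salary_string = total_salary.round(2).astype(str)
# emp_name was not defined in the original listing; assumed here to be the 'First Name' column
emp_name = data['First Name']
print(emp_name + " got the total salary of " + salary_string)
# Series.hist() takes no 'kind' argument; it draws a histogram directly
data['Salary'].hist()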

Fig 6: Employee salary
# EXPLANATION
This histogram shows how salaries are distributed in the dataset, which makes it easy to identify the most common salary ranges, the overall spread, and potential outliers.
• X-Axis: Represents the salary ranges.
• Y-Axis: Represents the frequency of salaries within each range.
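
To make the axes concrete, the counts behind a histogram can be computed without drawing anything. This sketch uses numpy.histogram on made-up salaries; the choice of 4 bins is an assumption for the example, not the setting used for Fig 6.

```python
import numpy as np

# Hypothetical salaries with two unusually high values
salaries = np.array([35000, 42000, 48000, 51000, 55000, 58000, 90000, 140000])

# counts: values per bin (y-axis); edges: bin boundaries (x-axis)
counts, edges = np.histogram(salaries, bins=4)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>9.0f} to {right:>9.0f}: {n}")
```

Six of the eight values fall in the first bin, and the two high salaries sit alone in the upper bins, which is how outliers show up in the plot.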
# MEAN SALARY OF DEPARTMENT
print("Mean Salary of department")
grouped_team = data.groupby('Team')
mean_salary = grouped_team['Salary'].mean()
print(mean_salary)

fig7: Average salary based on team

# EXPLANATION
The code uses pandas to perform the following steps:
1. Group Data by Department: Group the dataset by the "Team" column so that each department's rows are aggregated together.
2. Calculate Mean Salary: Compute the mean salary for each department using the mean() function.
3. Display Results: Print the resulting Series of mean salaries, indexed by department.
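
The three steps above can be sketched on a few made-up rows. One row has no team, to show that groupby() leaves out rows whose group key is missing.

```python
import pandas as pd

# Hypothetical rows; the last employee has no team
df = pd.DataFrame({
    "Team": ["Finance", "Finance", "Marketing", None],
    "Salary": [60000, 80000, 50000, 99000],
})

grouped_team = df.groupby("Team")               # step 1: group by department
mean_salary = grouped_team["Salary"].mean()     # step 2: mean salary per group
print(mean_salary)                              # step 3: display the result
```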

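# SALARY SHOWN ON BAR AND LINE GRAPH

plt.figure(figsize=(12, 7))
# mean_salary_team is not defined in the original listing; it is assumed here to be
# the per-team mean of salary plus bonus, matching the y-axis label
mean_salary_team = (data['Salary'] + bonus).groupby(data['Team']).mean()
mean_salary_team.plot(kind='bar', title="Average Salary")
# Overlay the same averages as a red line with blue point markers
plt.plot(mean_salary_team, "-", color="red", label="Average Salary")
plt.plot(mean_salary_team, ".", color="b", label="Average Salary")
plt.xlabel("Department")
plt.ylabel("Salary + Bonus")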
Fig 8: Bar graph of Average Salary
# EXPLANATION
This code visualizes the average salary (including bonus) for each department as a bar chart with a line plot overlaid on it.
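# BAR PLOT:
# mean_salary_by_department is not defined in the original listing; assumed here to be
# the per-team mean of salary plus bonus
mean_salary_by_department = (data['Salary'] + bonus).groupby(data['Team']).mean()
plt.bar(mean_salary_by_department.index, mean_salary_by_department.values, color='blue', alpha=0.7)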

fig 9: Line Graph

# EXPLANATION
This code visualizes the average salary (including bonus) for each department. plt.bar itself draws bars; a line graph like the one in Fig 9 comes from passing the same values to plt.plot.
• X-Axis: Represents the different departments.
• Y-Axis: Represents the average total salary (including bonus) for each department.
• Line Graph: Shows how the average salary varies across departments, with data points marked.
# TOTAL TEAM COUNTS
total_team_counts = data['Team'].value_counts().sort_values()
print(total_team_counts)
plt.plot(total_team_counts, "-", label="Total members of team", color="red")
plt.plot(total_team_counts, ".", color="green", label="Total members of team")
plt.title("team Counts")
plt.xlabel("team")
plt.ylabel("counts of employee")
plt.xticks(rotation=60)
plt.grid(True)

Fig 10:Scatter plot of Team counts

# EXPLANATION
This code counts and displays the number of employees in each department.
1. Count Employees: value_counts() on the "Team" column counts the rows for each department; no explicit groupby is needed, and missing team values are excluded.
2. Sort Counts: sort_values() orders the departments from fewest to most employees.
3. Display Results: The counts are printed and then plotted as a red line with green point markers.
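
A sketch of the counting step on made-up team labels. It also computes the same counts with groupby().size(), as an illustration that the two approaches agree.

```python
import pandas as pd

# Hypothetical team labels; one employee has no team
df = pd.DataFrame({
    "Team": ["Finance", "Marketing", "Finance", "Sales", "Finance", "Sales", None],
})

# Count rows per team, then sort from smallest to largest
total_team_counts = df["Team"].value_counts().sort_values()
print(total_team_counts)

# Equivalent counts via an explicit group-by
by_group = df.groupby("Team").size()
print(by_group)
```

The employee without a team is not counted by either approach.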

# NULL VALUE COUNTS

null_values = data.isnull()
count_col_null_values = null_values.sum(axis=0)
print(count_col_null_values)
Fig 11: Null Value Count
# EXPLANATION
• Identifies missing values using data.isnull().
• Counts the number of missing values per column with sum(axis=0) and prints the result.
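
The per-column count works because True is summed as 1 and False as 0. A minimal sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows with gaps in two columns
df = pd.DataFrame({
    "First Name": ["Asha", None, None],
    "Team": ["Finance", None, "Sales"],
    "Salary": [52000, 61000, 73000],
})

null_values = df.isnull()                          # Boolean frame
count_col_null_values = null_values.sum(axis=0)    # add up the True values down each column
print(count_col_null_values)
```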

#CHECK NULL NAME

print("Employees without a first name")
null_name = data[data['First Name'].isnull()]
count_value = null_name.shape[0]
print(null_name['First Name'])
print(f"Number of null Name is : {count_value}")

Fig 12: check Null Values

# EXPLANATION

• Filter for Missing Names: null_name = data[data['First Name'].isnull()] creates a new DataFrame called null_name containing only the rows where the First Name column is null (missing).
• Count Missing Names: count_value = null_name.shape[0] gives the number of rows in null_name, which is the count of employees without a first name.
• Print Missing Names: print(null_name['First Name']) prints the First Name column of null_name, which contains only null values, as shown in Fig 12.
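
Since the First Name column of the filtered rows is all null, printing other columns of the same rows is often more informative. The sketch below shows the filter on made-up data; printing Team and Salary is a suggestion, not something the original code does.

```python
import pandas as pd

# Hypothetical rows; two employees have no first name
df = pd.DataFrame({
    "First Name": ["Asha", None, None],
    "Team": ["Finance", "Sales", "Marketing"],
    "Salary": [52000, 61000, 73000],
})

mask = df["First Name"].isnull()    # True for rows without a name
null_name = df[mask]                # keep only those rows
count_value = null_name.shape[0]    # number of such rows

print(null_name[["Team", "Salary"]])
print(f"Number of null names: {count_value}")
```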
# MISSING ROWS MULTIPLE VALUES
missing_value = data.isnull() | (data == "")
row_missing_value = missing_value.sum(axis=1)
row_missing_value_judge = data[row_missing_value > 2]
num_row_missing_value = row_missing_value_judge.shape[0]
print(f"Number of row missing values : {num_row_missing_value}")
print(num_row_missing_value)

Fig 13 :Row Missing

# EXPLANATION
• Flags a cell as missing if it is null (data.isnull()) or an empty string.
• Counts the number of missing values in each row with sum(axis=1).
• Selects the rows with more than two missing values and counts them, as shown in Fig 13.
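
A sketch of the row-wise check on made-up rows, mixing null values and empty strings so that both kinds of gap are counted:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: row 1 has three nulls, row 2 has three empty strings and a null salary
df = pd.DataFrame({
    "First Name": ["Asha", None, "", "Ravi"],
    "Gender": ["Female", None, "", "Male"],
    "Team": ["Finance", None, "", None],
    "Salary": [50000.0, 61000.0, np.nan, 72000.0],
})

missing_value = df.isnull() | (df == "")        # null or empty string
row_missing_value = missing_value.sum(axis=1)   # gaps per row
rows_over_two = df[row_missing_value > 2]       # rows with more than two gaps

print(row_missing_value.tolist())
print(f"Number of rows with more than two missing values: {rows_over_two.shape[0]}")
```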

# GENDER SALARY MEAN

count_gender = data['Gender'].value_counts()
print(count_gender)
salary_mean_genderBased = data.groupby('Gender')['Salary'].mean()
print(salary_mean_genderBased)

Fig 14:average salary

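# EXPLANATION
Counting Gender Distribution: The value_counts() method counts how many times each value occurs in the 'Gender' column.
Printing Gender Distribution: The result is printed, showing the count of each gender in the dataset.
Mean Salary by Gender: groupby('Gender') followed by mean() on 'Salary' gives the average salary for each gender.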

# SENIOR MANAGEMENT MEAN

salary_mean_management = data.groupby('Senior Management')['Salary'].mean()
average_salary = data['Salary'].mean()
median_salary = data['Salary'].median()
varience_salary = data['Salary'].var()
#print(f"average_salary:{average_salary},median_salary:{median_salary},varience_salary:{varience_salary}")

Fig 15: Mean, median, variance
# EXPLANATION
➢ Mean Salary for Senior Management: groupby() groups the data by 'Senior Management', and mean() calculates the mean salary of each group.
➢ Overall Average Salary: mean() calculates the average over all salaries.
➢ Median Salary: median() calculates the middle salary.
➢ Variance of Salary: var() calculates the sample variance of salary (pandas divides by n - 1 by default).
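
These statistics can be verified by hand on four made-up salaries. The sketch also passes ddof=0 to var() to show the population variance next to the default sample variance; that comparison is an addition for illustration.

```python
import pandas as pd

# Hypothetical rows chosen so the results are easy to check
df = pd.DataFrame({
    "Senior Management": [True, False, True, False],
    "Salary": [40000, 50000, 60000, 90000],
})

by_management = df.groupby("Senior Management")["Salary"].mean()
average_salary = df["Salary"].mean()
median_salary = df["Salary"].median()
sample_variance = df["Salary"].var()              # divides by n - 1
population_variance = df["Salary"].var(ddof=0)    # divides by n

print(by_management)
print(average_salary, median_salary, sample_variance, population_variance)
```

With an even number of values, the median is the average of the two middle salaries (50000 and 60000).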
# GENDER COUNTS PIE CHART
gender_counts = data['Gender'].value_counts()
gender_counts.plot(kind='pie', autopct='%.2f%%', title='Gender Distribution')

Fig 16: counting Gender

# EXPLANATION
• Counts the number of employees in each gender category.
• Creates a pie chart of the gender distribution, labelling each slice with its percentage to two decimal places.

# AVERAGE SALARY BASED ON GENDER

plt.figure(figsize=(12,8))
data['Salary+Bonus'] = bonus + data['Salary']
salary_gender = data.groupby('Gender')['Salary+Bonus'].mean()
salary_gender.plot(kind='bar', title="Average salary based on Gender", color=["orange", "skyblue"])
Fig 17: Average Salary Based On Gender
# EXPLANATION
▪ Create Figure: plt.figure(figsize=(12, 8)) creates a figure 12 inches wide and 8 inches tall.
▪ Calculate Total Compensation: data['Salary+Bonus'] = bonus + data['Salary'] adds a new column named Salary+Bonus to the DataFrame data, holding the sum of the bonus Series and the Salary column.
▪ Average by Gender: salary_gender = data.groupby('Gender')['Salary+Bonus'].mean() computes the mean total compensation for each gender.
▪ Create Bar Plot: salary_gender.plot(kind='bar', title="Average salary based on Gender", color=["orange", "skyblue"]) draws a bar plot of the salary_gender Series with that title, using orange and sky-blue bars.
# SALARY DISTRIBUTION BASED ON SALARY + BONUS
plt.figure(figsize=(12,8))
data['Salary+Bonus'] = bonus + data['Salary']
plt.hist(data['Salary+Bonus'],edgecolor='white')
plt.title("salary average ")
plt.xlabel("Salary")
plt.ylabel("frequency")
plt.xticks(rotation=60)

Fig 18: Salary average graph

# EXPLANATION
This histogram shows the distribution of total compensation (salary plus bonus) within the organization. It identifies the ranges with the highest and lowest frequencies, which supports salary structure analysis and planning.
• X-Axis: Represents the salary ranges.
• Y-Axis: Represents the frequency of salaries within each range.
• Histogram: Shows the distribution of salaries across the dataset.
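
As a rough sketch of how the Salary+Bonus column feeds the last two charts, the example below builds the column on made-up rows and averages it by gender. The values are hypothetical; only the column names come from the code in this report.

```python
import pandas as pd

# Hypothetical rows
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Salary": [50000, 60000, 70000, 80000],
    "Bonus %": [10.0, 5.0, 10.0, 5.0],
})

bonus = df["Salary"] * (df["Bonus %"] / 100)
df["Salary+Bonus"] = bonus + df["Salary"]     # total compensation per employee

# groupby sorts the group labels, so Female comes before Male
salary_gender = df.groupby("Gender")["Salary+Bonus"].mean()
print(salary_gender)
```

Because the groups are sorted by label, a list of bar colors is applied in that same order, so the first color goes to the first label alphabetically.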
