Pandas NEW

The document provides an overview of using Pandas for data analysis, particularly with weather data from Jaipur and Olympic medalist data. It explains the basic data structures in Pandas, such as Series and DataFrame, and includes practical exercises for visualizing data using bar charts, pie charts, and line charts. The document also contains Python code snippets for performing various data manipulation and visualization tasks using Pandas and Matplotlib.

Uploaded by

Hi5 Ver

Data Science using Pandas

• The name Pandas is derived from "panel data" (PANel DAta).
• It has become a popular library for data analysis.
• It offers highly optimized performance: it is built on NumPy, and its performance-critical parts are written in C and Cython, with the rest in Python.
• It makes data analysis a simple and easy process.

Pandas offers two basic data structures:

1. Series
2. DataFrame
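
To make the two structures concrete, here is a minimal sketch. The temperature values, day labels, and column names are made-up sample data and are not taken from the Jaipur or medalist files.

```python
import pandas as pd

# A Series is a one-dimensional labelled array: values plus an index.
# (The numbers and labels here are invented for illustration.)
max_temps = pd.Series([31, 33, 29], index=["Mon", "Tue", "Wed"], name="max_temp")

# A DataFrame is a two-dimensional table; each column is a Series.
weather = pd.DataFrame(
    {"max_temp": [31, 33, 29], "min_temp": [18, 20, 17]},
    index=["Mon", "Tue", "Wed"],
)

print(max_temps["Tue"])     # look up one value by its label
print(weather["min_temp"])  # selecting a single column returns a Series
print(weather.shape)        # (number of rows, number of columns)
```

Selecting one column of a DataFrame gives back a Series, which is why the same methods (such as head() or plot()) work on both.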

We will be working with Jaipur weather data obtained from Kaggle, a platform where data enthusiasts gather and
share knowledge. The data has been cleaned and simplified so that we can focus on data visualization instead of
data cleaning. It is stored in the file JaipurFinalCleanData.csv, saved at the same location as the notebook.
(A CSV, or Comma-Separated Values, file stores a table of data with the values separated by commas.)
The loading, sorting, and exercise code below uses a second file, medallists.csv, which holds Olympic medalist data.

Today, we will learn how to use Python to open CSV files.

import pandas as pd

# Load the CSV file into a variable that we will call dataframe

dataframe = pd.read_csv("medallists.csv")

# dataframe.head() returns the first 5 rows of data

dataframe.head()

### Display the first 10 rows of data by modifying the function above

dataframe.head(10)

### If a DataFrame has many rows, printing it shows only
### the first 5 rows and the last 5 rows

print(dataframe)

### use to_string() to print the entire DataFrame.

print(dataframe.to_string())
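
Printing a whole table is rarely needed. As a rough sketch of lighter-weight ways to inspect a DataFrame, the example below builds a hypothetical 100-row table in memory (the column names row_id and value are invented) and looks at its size and its two ends.

```python
import pandas as pd

# Hypothetical sample table with 100 rows, built in memory for illustration.
sample = pd.DataFrame({"row_id": range(100), "value": [i * 2 for i in range(100)]})

print(sample.shape)           # (number of rows, number of columns)

first_three = sample.head(3)  # first 3 rows
last_three = sample.tail(3)   # last 3 rows

print(first_three)
print(last_three)
```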

### Sort values with the sort_values() function.

medallists = dataframe.sort_values(by='medal_code', ascending = False)

print(medallists.head(5))

### Sort the values in ascending order of country code and print the first 5 rows

medallists = dataframe.sort_values(by='country_code', ascending=True)

print(medallists.head(5))
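
sort_values() also accepts a list of columns. The sketch below assumes a tiny in-memory table that reuses the column names country_code and medal_code from the examples above; the rows themselves are made up.

```python
import pandas as pd

# Made-up rows; only the column names come from the worksheet.
medals = pd.DataFrame({
    "country_code": ["USA", "IND", "USA", "AUS"],
    "medal_code": [2, 3, 1, 1],
})

# Sort by country_code (A to Z), then by medal_code (largest first) within each country.
ordered = medals.sort_values(by=["country_code", "medal_code"], ascending=[True, False])

# reset_index(drop=True) renumbers the rows 0, 1, 2, ... after sorting.
ordered = ordered.reset_index(drop=True)

print(ordered)
```

Passing one True/False per column in ascending lets each column have its own sort direction.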

### Pandas provides an easy way to drop columns: the drop() method.
### The columns below belong to the Jaipur weather data, so load that file first.

weather = pd.read_csv("JaipurFinalCleanData.csv")

weather = weather.drop(["max_dew_pt_2"], axis=1) # axis=0 (or 'index') drops rows; axis=1 (or 'columns') drops columns

### Drop the following columns: (min_dew_pt_2, max_pressure_2, min_pressure_2)

weather = weather.drop(["min_dew_pt_2", "max_pressure_2", "min_pressure_2"], axis=1)
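
As a hedged sketch of two related options of drop(), the example below uses a small in-memory table. The column max_temp, the label no_such_column, and all the values are hypothetical; only the dew-point column names come from the worksheet.

```python
import pandas as pd

# Small made-up table standing in for the weather data.
weather_sample = pd.DataFrame({
    "max_temp": [31, 33],
    "max_dew_pt_2": [12, 14],
    "min_dew_pt_2": [8, 9],
})

# columns=[...] is equivalent to passing the labels together with axis=1.
trimmed = weather_sample.drop(columns=["max_dew_pt_2", "min_dew_pt_2"])

# errors="ignore" skips labels that are not present instead of raising KeyError.
safe = weather_sample.drop(columns=["max_dew_pt_2", "no_such_column"], errors="ignore")

print(list(trimmed.columns))
print(list(safe.columns))
print(list(weather_sample.columns))  # drop() returns a new DataFrame; the original is unchanged
```

Because drop() returns a new DataFrame, the result has to be assigned to a variable, as the worksheet code does.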

Exercise

Question 1:

Using Pandas, calculate the total number of medals won by each country. Create a bar chart to
visualize the top 10 countries with the most medals.

Hint:

• Group the data by the country column and count the number of medals.
• Use the plot.bar() function from Pandas or Matplotlib to create the bar chart.

Question 2:

Analyze the distribution of medal types (Gold, Silver, Bronze) across all countries. Create a pie chart
that shows the proportion of each medal type.

Hint:

• Use the medal_type column to count the occurrences of each medal type.
• Use the plot.pie() function to create the pie chart.

Question 3:

Using Pandas, find out how the number of medals awarded changes over time. Create a line chart that
plots the number of medals awarded per day.

Hint:

• Group the data by medal_date and count the number of medals awarded each day.
• Use the plot.line() function to create the line chart.

Question 4:

Compare the performance of male and female athletes by counting the total number of medals won by
each gender. Create a bar chart to visualize this comparison.

Hint:

• Group the data by gender and count the number of medals.
• Use the plot.bar() function to create the bar chart.

Question 5:

Analyze the performance in a specific event, such as the "Men's Individual Time Trial." Count the
number of each type of medal won in this event and create a bar chart to visualize the results.

Hint:

• Filter the data for the desired event using the event column.
• Group by medal_type to count the number of each type of medal.
• Use the plot.bar() function to create the bar chart.

Solution
Question 1

import matplotlib.pyplot as plt

import pandas as pd

df = pd.read_csv("medallists.csv")

# Group by country and count the number of medals

medals_by_country = df.groupby('country')['medal_type'].count().sort_values(ascending=False)

# Select the top 10 countries

top_10_countries = medals_by_country.head(10)

# Create a bar chart

top_10_countries.plot(kind='bar', color='skyblue')

plt.title('Top 10 Countries with Most Medals')

plt.xlabel('Country')

plt.ylabel('Number of Medals')

plt.xticks(rotation=45)

plt.show()

Question 2

import matplotlib.pyplot as plt

import pandas as pd

df = pd.read_csv("medallists.csv")

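# Count the occurrences of each medal type; sort_index() orders the labels alphabetically

medal_distribution = df['medal_type'].value_counts().sort_index()

# Create a pie chart; the colors assume the labels sort as Bronze, Gold, Silver

medal_distribution.plot(kind='pie', autopct='%1.1f%%', colors=['#cd7f32', 'gold', 'silver'])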

plt.title('Distribution of Medal Types')

plt.ylabel('') # Hide the y-label

plt.show()

Question 3

import matplotlib.pyplot as plt

import pandas as pd

df = pd.read_csv("medallists.csv")

# Convert medal_date to datetime format

df['medal_date'] = pd.to_datetime(df['medal_date'])

# Group by medal_date and count the number of medals awarded each day

medals_by_date = df.groupby('medal_date')['medal_type'].count()

# Create a line chart

medals_by_date.plot(kind='line', color='green', marker='o')

plt.title('Number of Medals Awarded Over Time')

plt.xlabel('Date')

plt.ylabel('Number of Medals')

plt.grid(True)

plt.show()

Question 4

import matplotlib.pyplot as plt

import pandas as pd

df = pd.read_csv("medallists.csv")

# Group by gender and count the number of medals

medals_by_gender = df.groupby('gender')['medal_type'].count()

# Create a bar chart

medals_by_gender.plot(kind='bar', color=['blue', 'pink'])

plt.title('Medals Won by Gender')

plt.xlabel('Gender')

plt.ylabel('Number of Medals')

plt.xticks(rotation=0)

plt.show()

Question 5

import matplotlib.pyplot as plt

import pandas as pd

df = pd.read_csv("medallists.csv")

# Filter the data for the "Men's Individual Time Trial" event

event_data = df[df['event'] == "Men's Individual Time Trial"]

# Group by medal_type and count the number of each type of medal

medals_in_event = event_data.groupby('medal_type')['medal_type'].count()

# Create a bar chart; groupby sorts the labels alphabetically, so the colors assume the order Bronze, Gold, Silver

medals_in_event.plot(kind='bar', color=['#cd7f32', 'gold', 'silver'])

plt.title("Medals in Men's Individual Time Trial")

plt.xlabel('Medal Type')

plt.ylabel('Number of Medals')

plt.xticks(rotation=0)

plt.show()

Bonus

Using the provided dataset of Olympic medalists, write a Python script to display a bar chart showing the
number of Gold, Silver, and Bronze medals won by India.

import matplotlib.pyplot as plt

import pandas as pd

df = pd.read_csv("medallists.csv")

# Filter the data for medals won by India

india_medals = df[df['country'] == 'India']

# Group by medal_type and count the number of medals

india_medals_count = india_medals.groupby('medal_type')['medal_type'].count()

# Create a bar chart; groupby sorts the labels alphabetically, so the colors assume the order Bronze, Gold, Silver

india_medals_count.plot(kind='bar', color=['#cd7f32', 'gold', 'silver'])

plt.title('Medals Won by India')

plt.xlabel('Medal Type')

plt.ylabel('Number of Medals')

plt.xticks(rotation=0)

plt.show()

HOT:

Display the same for the country entered by the user
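
One possible way to approach this, offered as a sketch rather than the expected answer: wrap the filter-and-count step in a function that takes the country name as a parameter. The function name medal_counts_for and the sample rows (including the medal labels) are hypothetical; the column names country and medal_type are the ones used in the Bonus code.

```python
import pandas as pd

def medal_counts_for(df, country):
    """Return the number of medals of each type won by the given country."""
    # Compare in lower case so that "india" and "India" both match.
    mask = df["country"].str.lower() == country.strip().lower()
    # sort_index() puts the medal labels in alphabetical order.
    return df.loc[mask, "medal_type"].value_counts().sort_index()

# Made-up rows standing in for medallists.csv.
sample = pd.DataFrame({
    "country": ["India", "India", "Japan", "India"],
    "medal_type": ["Bronze Medal", "Silver Medal", "Gold Medal", "Bronze Medal"],
})

counts = medal_counts_for(sample, "india")
print(counts)
```

In the worksheet setting, sample would be replaced by pd.read_csv("medallists.csv"), the country name would come from input(), and counts.plot(kind='bar') followed by plt.show() would draw the chart, as in the Bonus solution. An empty result means no rows matched the name that was entered.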
