Pandas

Uploaded by

db7646461
Copyright
© All Rights Reserved
📘 Class Notes + Colab Code: Pandas DataFrame Basics

1. Introduction to Pandas
Pandas is a Python library for data analysis.

It provides two main data structures:
- Series → one-dimensional (like a single column).
- DataFrame → two-dimensional (like an Excel spreadsheet).
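
A minimal sketch of the two structures, using made-up illustrative values rather than the real dataset (the names life_exp and table are hypothetical):

```python
import pandas as pd

# A Series is a single labeled column of values.
life_exp = pd.Series([28.8, 30.3, 32.0], name="lifeExp")

# A DataFrame is a table: several columns sharing one row index.
table = pd.DataFrame(
    {
        "year": [1952, 1957, 1962],
        "lifeExp": [28.8, 30.3, 32.0],
    }
)

print(life_exp.ndim)            # number of dimensions of the Series
print(table.ndim)               # number of dimensions of the DataFrame
print(type(table["lifeExp"]))   # each column of a DataFrame is itself a Series
```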

Why use Pandas instead of spreadsheets?
- Automation: repeat tasks easily.
- Reproducibility: every step is written in code.
- Flexibility: works across operating systems and integrates with many data sources.

2. Load Your First Dataset

# Import pandas
import pandas as pd

# Load the Gapminder dataset (tab-separated file)
df = pd.read_csv(
    "https://raw.githubusercontent.com/jennybc/gapminder/master/data/gapminder.tsv",
    sep="\t",
)

# Print the first few rows
print(df.head())

👉 Teaching Point:
- pd.read_csv() loads delimited text files; pass sep="\t" for tab-separated (TSV) data.
- Always check .head() to preview the data.
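
To try the sep argument without a network connection, here is a small sketch that parses an in-memory string. The three rows are hypothetical sample data, not taken from Gapminder, and the names tsv_text and sample are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical three-row sample in tab-separated form (not the real dataset).
tsv_text = "country\tyear\tlifeExp\nA\t1952\t40.0\nA\t1957\t42.5\nB\t1952\t60.1\n"

# sep="\t" tells read_csv to split fields on tabs instead of commas.
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(sample.head(2))   # preview only the first two rows
print(sample.shape)     # (rows, columns)
```

io.StringIO wraps the string in a file-like object, so read_csv treats it the same way as a file path or URL.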

3. Inspect DataFrame Structure

# Type of object
print(type(df))

# Shape: rows and columns
print("Shape:", df.shape)

# Column names
print("Columns:", df.columns)

# Data types
print(df.dtypes)

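# More detailed info (info() prints its report itself and returns None,
# so wrapping it in print() would add a stray "None" line)
df.info()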

👉 Teaching Point:
- .shape is an attribute, not a method → no parentheses.
- Column dtypes in this dataset include object (text), int64 (integers), and float64 (decimals).
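
A short sketch of the attribute-versus-method point, on a hypothetical two-row frame (sample and its values are made up). The exact dtype shown for text columns can depend on the pandas version, so only the numeric dtypes are commented on:

```python
import pandas as pd

sample = pd.DataFrame(
    {"country": ["A", "B"], "year": [1952, 1957], "lifeExp": [40.0, 60.1]}
)

rows, cols = sample.shape      # attribute: no parentheses, returns a tuple
print(rows, cols)

try:
    sample.shape()             # calling the tuple like a method raises TypeError
except TypeError as err:
    print("TypeError:", err)

# year holds integers (int64) and lifeExp holds decimals (float64).
print(sample.dtypes)
```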

4. Select Columns

# Single column → Series
country_series = df['country']
print(type(country_series))

# Single column → DataFrame
country_df = df[['country']]
print(type(country_df))

# Multiple columns
subset = df[['country', 'year', 'lifeExp']]
print(subset.head())

# Dot notation (shortcut; works only when the column name is a valid Python
# identifier and does not clash with an existing DataFrame attribute)
print(df.country.head())

👉 Teaching Point:
- df['col'] → Series
- df[['col']] → DataFrame
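
The sketch below compares the two bracket forms and shows one case where dot notation is unsafe. The frame is hypothetical; the column names "count" and "life exp" were chosen only to trigger the problem:

```python
import pandas as pd

sample = pd.DataFrame(
    {"country": ["A", "B"], "count": [3, 5], "life exp": [40.0, 60.1]}
)

one_col_series = sample["country"]     # single brackets → Series
one_col_frame = sample[["country"]]    # double brackets → one-column DataFrame

print(type(one_col_series).__name__, one_col_series.shape)
print(type(one_col_frame).__name__, one_col_frame.shape)

# Dot notation finds the DataFrame.count method here, not the "count" column.
print(callable(sample.count))

# Bracket notation always returns the column, even with a space in the name.
print(sample["count"].tolist(), sample["life exp"].tolist())
```

Bracket notation is the safer default. Dot notation is best kept for quick interactive exploration.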

5. Select Rows

# By label with .loc[]
print(df.loc[0])              # Row labeled 0 (the first row under the default index)
print(df.loc[[0, 99]])        # Multiple rows by label

# By position with .iloc[]
print(df.iloc[0])             # First row
print(df.iloc[-1])            # Last row
print(df.iloc[[0, 99, 999]])  # Multiple rows by position

👉 Teaching Point:
- .loc[] → uses labels (row index names).
- .iloc[] → uses positions (row numbers).
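
With the default index, labels and positions coincide, which hides the difference. This sketch uses a hypothetical frame whose index labels (10, 20, 30) are deliberately different from the positions (0, 1, 2):

```python
import pandas as pd

sample = pd.DataFrame(
    {"country": ["A", "B", "C"], "lifeExp": [40.0, 60.1, 72.3]},
    index=[10, 20, 30],   # labels deliberately differ from positions 0, 1, 2
)

print(sample.loc[10, "country"])   # label 10 → first row
print(sample.iloc[0, 0])           # position 0 → the same first row
print(sample.iloc[-1, 0])          # position -1 → last row

try:
    sample.loc[0]                  # no row carries the label 0
except KeyError:
    print("KeyError: label 0 does not exist")
```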

6. Subset Rows and Columns

# Select rows 0, 99, 999 and columns country, lifeExp, gdpPercap
print(df.loc[[0, 99, 999], ['country', 'lifeExp', 'gdpPercap']])

# Same with iloc (by position)
print(df.iloc[[0, 99, 999], [0, 3, 5]])
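
Hard-coded positions such as [0, 3, 5] break silently if the column order changes. One possible alternative is to look the positions up from the names with Index.get_indexer. The sketch assumes a hypothetical three-row frame with the same column order as the dataset above; the values are made up:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "continent": ["X", "X", "Y"],
        "year": [1952, 1952, 1952],
        "lifeExp": [40.0, 60.1, 72.3],
        "pop": [100, 200, 300],
        "gdpPercap": [500.0, 900.0, 1500.0],
    }
)

wanted = ["country", "lifeExp", "gdpPercap"]

# Translate column names into integer positions.
positions = sample.columns.get_indexer(wanted)
print(list(positions))

by_label = sample.loc[[0, 2], wanted]
by_position = sample.iloc[[0, 2], positions]

# Both selections contain the same rows and columns.
print(by_label.equals(by_position))
```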

7. Grouped and Aggregated Statistics

# Average life expectancy by year
print(df.groupby('year')['lifeExp'].mean())

# Average lifeExp and gdpPercap by year + continent
grouped = df.groupby(['year', 'continent'])[['lifeExp', 'gdpPercap']].mean()
print(grouped.head())

# Flatten the grouped result
print(grouped.reset_index().head())

# Number of countries per continent
print(df.groupby('continent')['country'].nunique())

👉 Teaching Point:
- .groupby() = split → apply → combine.
- Use .mean(), .sum(), .count(), etc.
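
To make split → apply → combine concrete, the sketch below performs the three steps by hand on a hypothetical four-row frame and compares the result with the built-in one-liner. The values are made up for illustration:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "year": [1952, 1952, 1957, 1957],
        "lifeExp": [40.0, 60.0, 42.0, 64.0],
    }
)

# Split: iterate over one sub-DataFrame per year.
# Apply: take the mean of lifeExp within each piece.
# Combine: collect the per-year results into a single Series.
manual = pd.Series(
    {year: group["lifeExp"].mean() for year, group in sample.groupby("year")}
)

# The built-in form does all three steps in one expression.
built_in = sample.groupby("year")["lifeExp"].mean()

print(manual.tolist())
print(built_in.tolist())
```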

8. Basic Plotting

import matplotlib.pyplot as plt

# Global yearly life expectancy trend
global_yearly_life = df.groupby('year')['lifeExp'].mean()

# Plot
global_yearly_life.plot(title="Average Life Expectancy Over Time")
plt.xlabel("Year")
plt.ylabel("Life Expectancy")
plt.show()

👉 Teaching Point:
- Pandas integrates with Matplotlib.
- .plot() quickly visualizes trends.
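
As one possible extension, .plot() on a DataFrame draws one line per column. The sketch below reshapes a grouped result with .unstack() so that each continent becomes a column. The data is a hypothetical four-row sample, and the "Agg" backend line is only there so the script runs without a display; in Colab it can be dropped and plt.show() called instead:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import pandas as pd

sample = pd.DataFrame(
    {
        "year": [1952, 1952, 1957, 1957],
        "continent": ["X", "Y", "X", "Y"],
        "lifeExp": [40.0, 60.0, 42.0, 64.0],
    }
)

# Rows: years. Columns: continents. Values: mean life expectancy.
by_continent = sample.groupby(["year", "continent"])["lifeExp"].mean().unstack()

# DataFrame.plot() returns a Matplotlib Axes with one line per column.
ax = by_continent.plot(title="Average Life Expectancy by Continent (sample data)")
ax.set_xlabel("Year")
ax.set_ylabel("Life Expectancy")
```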
