More Pandas
Viewing Data
CSV File: marks.csv
Name,Maths,Science,English
Rahul,88,90,85
Priya,92,85,89
Anil,75,80,78
Sneha,95,91,92
Kiran,89,87,84
Meera,90,93,88
Python Code
import pandas as pd
# Step 1: Read the CSV file
df = pd.read_csv("marks.csv")
# Step 2: View the first 5 rows
print(" First 5 rows:")
print(df.head())
# Step 3: View the last 5 rows
print("\n Last 5 rows:")
print(df.tail())
# Step 4: View the number of rows and columns
print("\n Shape of the DataFrame (rows, columns):")
print(df.shape)
# Step 5: View column names
print("\n Column Names:")
print(df.columns)
# Step 6: Quick summary (data types, non-null count, memory usage)
print("\n DataFrame Info:")
df.info() # info() prints its report itself and returns None, so no print() is needed
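The viewing calls above have a few close relatives worth knowing. The following is a minimal sketch, assuming the same data as marks.csv; it is held in a string (the name csv_text is only for illustration) so that it runs without the file on disk.

```python
import io
import pandas as pd

# Same contents as marks.csv, kept in a string so the sketch runs without the file
csv_text = """Name,Maths,Science,English
Rahul,88,90,85
Priya,92,85,89
Anil,75,80,78
Sneha,95,91,92
Kiran,89,87,84
Meera,90,93,88
"""
df = pd.read_csv(io.StringIO(csv_text))

# head() and tail() accept a row count; the default is 5
first_two = df.head(2)
last_two = df.tail(2)
print(first_two)
print(last_two)

# dtypes lists the data type of each column
print(df.dtypes)

# describe() summarises the numeric columns (count, mean, std, min, quartiles, max)
summary = df.describe()
print(summary)
```

Because this file has only 6 rows, head() and tail() with the default count overlap heavily; passing a smaller count makes the difference easier to see.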
Selecting Columns or Rows – Full Example
We'll use the same CSV file:
marks.csv
Name,Maths,Science,English
Rahul,88,90,85
Priya,92,85,89
Anil,75,80,78
Sneha,95,91,92
Kiran,89,87,84
Meera,90,93,88
Python Code – Column and Row Selection
import pandas as pd
df = pd.read_csv("marks.csv")
# Select a single column (Series)
print(" Only Maths column:")
print(df["Maths"])
# Select multiple columns
print("\n Maths and Science columns:")
print(df[["Maths", "Science"]])
# Select a row by label/index using loc[]
print("\n First row using loc[0]:")
print(df.loc[0]) # Rahul's data
# Select a row by position using iloc[]
print("\n Second row using iloc[1]:")
print(df.iloc[1]) # Priya's data
# Select specific student’s data
print("\n Data for student named 'Sneha':")
print(df[df["Name"] == "Sneha"])
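Rows and columns can also be selected in one step. This is a small sketch, assuming the same values as marks.csv built inline; the variable names (by_label, by_position and so on) are only for illustration.

```python
import pandas as pd

# Same values as marks.csv, built inline so the sketch is self-contained
df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Anil", "Sneha", "Kiran", "Meera"],
    "Maths": [88, 92, 75, 95, 89, 90],
    "Science": [90, 85, 80, 91, 87, 93],
    "English": [85, 89, 78, 92, 84, 88],
})

# loc[rows, columns]: labels 0 to 2 (the end label IS included), columns by name
by_label = df.loc[0:2, ["Name", "Maths"]]
print(by_label)

# iloc[rows, columns]: positions 0 to 2 (the end position is NOT included)
by_position = df.iloc[0:3, 0:2]
print(by_position)

# A boolean row filter combined with a list of columns
sneha_scores = df.loc[df["Name"] == "Sneha", ["Maths", "Science"]]
print(sneha_scores)

# With Name as the index, loc can look a student up directly by name
by_name = df.set_index("Name")
priya = by_name.loc["Priya"]
print(priya)
```

The main thing to notice is the slicing difference: loc includes the end label, while iloc follows normal Python slicing and excludes the end position.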
Filtering Data (like “if” conditions) – Full Example
CSV File: marks.csv (same as before)
Name,Maths,Science,English
Rahul,88,90,85
Priya,92,85,89
Anil,75,80,78
Sneha,95,91,92
Kiran,89,87,84
Meera,90,93,88
Python Code – Filtering Examples
import pandas as pd
df = pd.read_csv("marks.csv")
# 1. Students who scored more than 90 in Maths
print(" Maths > 90:")
print(df[df["Maths"] > 90])
# 2. Students who scored more than 85 in all subjects
print("\n Scored >85 in Maths, Science, and English:")
high_all = df[(df["Maths"] > 85) & (df["Science"] > 85) & (df["English"] > 85)]
print(high_all)
# 3. Students who scored less than 80 in English
print("\n English < 80:")
print(df[df["English"] < 80])
# 4. Students whose names start with ‘P’ or ‘R’
print("\n Name starts with P or R:")
filtered = df[df["Name"].str.startswith(('P', 'R'))]
print(filtered)
# 5. Students who scored between 85 and 90 in Science
print("\n Science between 85 and 90:")
print(df[(df["Science"] >= 85) & (df["Science"] <= 90)])
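The filters above combine conditions with & (and). A few other filtering tools follow the same pattern; this is a minimal sketch, assuming the same values as marks.csv built inline, with illustrative variable names.

```python
import pandas as pd

# Same values as marks.csv, built inline so the sketch is self-contained
df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Anil", "Sneha", "Kiran", "Meera"],
    "Maths": [88, 92, 75, 95, 89, 90],
    "Science": [90, 85, 80, 91, 87, 93],
    "English": [85, 89, 78, 92, 84, 88],
})

# between() is inclusive at both ends by default (same as >= 85 and <= 90)
science_mid = df[df["Science"].between(85, 90)]
print(science_mid)

# | means "or": Maths below 80 OR English above 90
either = df[(df["Maths"] < 80) | (df["English"] > 90)]
print(either)

# ~ means "not": everyone who did NOT score more than 90 in Maths
not_top_maths = df[~(df["Maths"] > 90)]
print(not_top_maths)

# isin() checks membership in a list of values
chosen = df[df["Name"].isin(["Rahul", "Meera"])]
print(chosen)

# query() expresses the same kind of condition as a string
strong = df.query("Maths > 90 and English > 85")
print(strong)
```

Each condition needs its own parentheses when combined with & or |, because those operators bind more tightly than comparisons such as > and <.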
Adding New Columns – Full Example
CSV File: marks.csv
(We'll keep using the same data)
Name,Maths,Science,English
Rahul,88,90,85
Priya,92,85,89
Anil,75,80,78
Sneha,95,91,92
Kiran,89,87,84
Meera,90,93,88
Python Code – Adding Columns
import pandas as pd
df = pd.read_csv("marks.csv")
# 1. Add Total column
df["Total"] = df["Maths"] + df["Science"] + df["English"]
print(" With Total Marks:\n", df)
# 2. Add Average Marks column
df["Average"] = df["Total"] / 3
print("\n With Average Marks:\n", df)
# 3. Add Result column: Pass if all marks >= 80
df["Result"] = df.apply(
    lambda row: "Pass"
    if (row["Maths"] >= 80 and row["Science"] >= 80 and row["English"] >= 80)
    else "Fail",
    axis=1,
)
print("\n With Pass/Fail Result:\n", df)
# 4. Add Grade column based on Average
def get_grade(avg):
    if avg >= 90:
        return "A+"
    elif avg >= 80:
        return "A"
    elif avg >= 70:
        return "B"
    else:
        return "C"
df["Grade"] = df["Average"].apply(get_grade)
print("\n With Grade:\n", df)
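The same four columns can be built without apply(), using whole-column operations. This is one possible sketch, not the only way to do it; it assumes the same values as marks.csv built inline and assumes marks lie between 0 and 100 (the upper bin edge of 101 exists only so that an average of exactly 100 still falls in the top grade).

```python
import pandas as pd

# Same values as marks.csv, built inline so the sketch is self-contained
df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Anil", "Sneha", "Kiran", "Meera"],
    "Maths": [88, 92, 75, 95, 89, 90],
    "Science": [90, 85, 80, 91, 87, 93],
    "English": [85, 89, 78, 92, 84, 88],
})
subjects = ["Maths", "Science", "English"]

# Row-wise sum and mean across the subject columns
df["Total"] = df[subjects].sum(axis=1)
df["Average"] = df[subjects].mean(axis=1)

# True where every subject is >= 80, then mapped to Pass/Fail text
passed = (df[subjects] >= 80).all(axis=1)
df["Result"] = passed.map({True: "Pass", False: "Fail"})

# pd.cut assigns each average to a bin; right=False makes bins [70, 80), [80, 90), ...
# so the boundaries match the >= checks in get_grade
df["Grade"] = pd.cut(
    df["Average"],
    bins=[0, 70, 80, 90, 101],
    labels=["C", "B", "A", "A+"],
    right=False,
).astype(str)

print(df)
```

On a table this small either approach is fine; the column-wise version mainly pays off on larger data, where calling a Python function once per row becomes slow.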
Sorting Data – Full Example
CSV File: marks.csv
(We're using the same file. Total and Average are not stored in it, so the code below recomputes them.)
Python Code – Sorting Examples
import pandas as pd
df = pd.read_csv("marks.csv")
# Recompute Total and Average (they are not stored in marks.csv)
df["Total"] = df["Maths"] + df["Science"] + df["English"]
df["Average"] = df["Total"] / 3
# 1. Sort by Maths marks (highest to lowest)
print(" Students sorted by Maths score (descending):")
sorted_maths = df.sort_values(by="Maths", ascending=False)
print(sorted_maths)
# 2. Sort by Total marks (highest first)
print("\n Students ranked by Total marks:")
sorted_total = df.sort_values(by="Total", ascending=False)
print(sorted_total)
# 3. Sort by Name (A–Z)
print("\n Sort by student names (alphabetical):")
sorted_name = df.sort_values(by="Name")
print(sorted_name)
# 4. Sort by Average (lowest to highest)
print("\n Sort by Average marks (ascending):")
sorted_avg = df.sort_values(by="Average")
print(sorted_avg)
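Sorting is often followed by ranking. The sketch below assumes the same values as marks.csv built inline; the Rank column and the variable names are illustrative additions, not part of the original file.

```python
import pandas as pd

# Same values as marks.csv, built inline so the sketch is self-contained
df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Anil", "Sneha", "Kiran", "Meera"],
    "Maths": [88, 92, 75, 95, 89, 90],
    "Science": [90, 85, 80, 91, 87, 93],
    "English": [85, 89, 78, 92, 84, 88],
})
df["Total"] = df["Maths"] + df["Science"] + df["English"]

# Sort by Total (highest first); Name (A-Z) is only used to break ties
ranked = df.sort_values(by=["Total", "Name"], ascending=[False, True])

# sort_values keeps the old row labels, so reset them to 0, 1, 2, ...
ranked = ranked.reset_index(drop=True)

# Rank 1 is the highest total
ranked["Rank"] = ranked.index + 1
print(ranked)

# nlargest() is a shortcut for "sort descending and take the top n"
top3 = df.nlargest(3, "Total")
print(top3)
```

Note that sort_values() returns a new DataFrame and leaves df unchanged, which is why the result is assigned to a new variable each time.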
9. Handling Missing Data (Null or Empty Values) – Full Example
CSV File: marks_missing.csv
Name,Maths,Science,English
Rahul,88,,85
Priya,92,85,89
Anil,,80,78
Sneha,95,91,
Kiran,89,87,84
Meera,,93,88
Python Code – Handling Missing Values
import pandas as pd
df = pd.read_csv("marks_missing.csv")
# 1. Display rows with missing values
print(" Rows with missing data:")
print(df[df.isnull().any(axis=1)])
# 2. Fill missing values with 0 (assume absent)
df_fill_zero = df.fillna(0)
print("\n Missing values filled with 0:")
print(df_fill_zero)
# 3. Drop rows with *any* missing values
df_dropped = df.dropna()
print("\n Dropped rows with missing values:")
print(df_dropped)
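Filling with 0 or dropping rows are not the only options. Another common choice is to fill each gap with the average of its column. This is a minimal sketch, assuming the same data as marks_missing.csv held in a string so that it runs without the file; variable names are illustrative.

```python
import io
import pandas as pd

# Same contents as marks_missing.csv, kept in a string so the sketch runs without the file
csv_text = """Name,Maths,Science,English
Rahul,88,,85
Priya,92,85,89
Anil,,80,78
Sneha,95,91,
Kiran,89,87,84
Meera,,93,88
"""
df = pd.read_csv(io.StringIO(csv_text))
subjects = ["Maths", "Science", "English"]

# Count the missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# mean() skips missing values, so each mean uses only the marks that exist
subject_means = df[subjects].mean()
print(subject_means)

# Passing a Series to fillna() fills each column with its own value
df_fill_mean = df.fillna(subject_means)
print(df_fill_mean)

# dropna(subset=...) drops a row only if the listed columns are missing
df_has_maths = df.dropna(subset=["Maths"])
print(df_has_maths)
```

Which option is right depends on what a blank means: 0 suits "absent, no marks awarded", while the column mean suits "mark not recorded yet" and avoids dragging averages down.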
10. GroupBy – Group and Summarize Data – Full Example
CSV File: classmarks.csv
Name,Class,Maths,Science,English
Rahul,10A,88,90,85
Priya,10A,92,85,89
Anil,10B,75,80,78
Sneha,10B,95,91,92
Kiran,10A,89,87,84
Meera,10B,90,93,88
Python Code – GroupBy Examples
import pandas as pd
df = pd.read_csv("classmarks.csv")
# 1. Average Maths marks by Class
print(" Average Maths by Class:")
print(df.groupby("Class")["Maths"].mean())
# 2. Average of all subjects by Class
print("\n Subject-wise average by Class:")
print(df.groupby("Class")[["Maths", "Science", "English"]].mean())
# 3. Count of students in each Class
print("\n Student count per Class:")
print(df.groupby("Class")["Name"].count())
# 4. Maximum marks in English per Class
print("\n Highest English score per Class:")
print(df.groupby("Class")["English"].max())
# 5. Add Total marks and get class-wise average total
df["Total"] = df["Maths"] + df["Science"] + df["English"]
print("\n Average Total Marks per Class:")
print(df.groupby("Class")["Total"].mean())
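Several of the summaries above can be collected into one table with agg(). This is a small sketch, assuming the same values as classmarks.csv built inline; the output column names (students, avg_maths, top_english, ClassAvgMaths) are made up for illustration.

```python
import pandas as pd

# Same values as classmarks.csv, built inline so the sketch is self-contained
df = pd.DataFrame({
    "Name": ["Rahul", "Priya", "Anil", "Sneha", "Kiran", "Meera"],
    "Class": ["10A", "10A", "10B", "10B", "10A", "10B"],
    "Maths": [88, 92, 75, 95, 89, 90],
    "Science": [90, 85, 80, 91, 87, 93],
    "English": [85, 89, 78, 92, 84, 88],
})

# Named aggregation: new_column=(source_column, function)
summary = df.groupby("Class").agg(
    students=("Name", "count"),
    avg_maths=("Maths", "mean"),
    top_english=("English", "max"),
)
print(summary)

# transform() returns one value per ORIGINAL row, so the class average
# can sit next to each student's own mark
df["ClassAvgMaths"] = df.groupby("Class")["Maths"].transform("mean")
df["AboveClassAvg"] = df["Maths"] > df["ClassAvgMaths"]
print(df[["Name", "Class", "Maths", "ClassAvgMaths", "AboveClassAvg"]])
```

The difference to keep in mind: agg() shrinks the data to one row per group, while transform() keeps every row and repeats the group value.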
11. Date and Time Handling – Full Example
CSV File: students.csv
Name,JoinDate
Rahul,2020-06-12
Priya,2019-04-20
Anil,2021-08-01
Sneha,2022-11-15
Kiran,2020-01-10
Meera,2021-03-05
Python Code – DateTime Parsing & Extraction
import pandas as pd
# 1. Load the data
df = pd.read_csv("students.csv")
# 2. Convert 'JoinDate' to datetime format
df["JoinDate"] = pd.to_datetime(df["JoinDate"])
# 3. Extract Year, Month, Day
df["Year"] = df["JoinDate"].dt.year
df["Month"] = df["JoinDate"].dt.month
df["Day"] = df["JoinDate"].dt.day
# 4. Day name (Monday, Tuesday...)
df["DayName"] = df["JoinDate"].dt.day_name()
# 5. Full output
print(" Detailed Join Date Info:")
print(df)
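Once JoinDate is a real datetime column, it can be filtered, sorted, and used in arithmetic. This is a minimal sketch, assuming the same data as students.csv held in a string; the reference date 2023-01-01 and the column names DaysSinceJoin and Joined are hypothetical choices made only for this example.

```python
import io
import pandas as pd

# Same contents as students.csv, kept in a string so the sketch runs without the file
csv_text = """Name,JoinDate
Rahul,2020-06-12
Priya,2019-04-20
Anil,2021-08-01
Sneha,2022-11-15
Kiran,2020-01-10
Meera,2021-03-05
"""

# parse_dates converts the column while reading, instead of calling to_datetime afterwards
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["JoinDate"])

# Filter: students who joined in 2020
joined_2020 = df[df["JoinDate"].dt.year == 2020]
print(joined_2020)

# Sort: earliest joiner first
by_date = df.sort_values("JoinDate")
print(by_date)

# Date arithmetic: days between each join date and a fixed (hypothetical) reference date
reference = pd.Timestamp("2023-01-01")
df["DaysSinceJoin"] = (reference - df["JoinDate"]).dt.days

# strftime() turns dates back into text in a chosen format (here day/month/year)
df["Joined"] = df["JoinDate"].dt.strftime("%d/%m/%Y")
print(df)
```

A fixed reference date is used here so the output stays the same on every run; pd.Timestamp.today() could be used instead when the current date is wanted.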