Pandas Notes
Introduction
• Pandas is a Python library for data manipulation and analysis.
• Built on top of NumPy.
• Provides two main data structures:
  • Series: 1D labeled array.
  • DataFrame: 2D labeled data table.
Series
• One-dimensional, like a column.
• Can store integers, floats, strings, etc.
import pandas as pd
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s)
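As a small sketch of what the labels are for, the same Series can be read by label or by position, and arithmetic is applied element-wise. The variable names below are illustrative only.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Access a value by its label or by its integer position
by_label = s["b"]        # value stored under label "b"
by_position = s.iloc[0]  # first value, regardless of its label

# Arithmetic applies to every element and keeps the index
doubled = s * 2

print(by_label, by_position)
print(doubled)
```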
DataFrame
• Two-dimensional table with rows and columns.
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie"],
"Age": [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)
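A minimal sketch of inspecting the table built above and deriving a new column from an existing one. The column name AgeNextYear is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # data type of each column

# Each column of a DataFrame is a Series
ages = df["Age"]

# Derived column computed element-wise (hypothetical name)
df["AgeNextYear"] = df["Age"] + 1
print(df)
```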
Key Operations
• Reading/Writing Data:
df = pd.read_csv("file.csv") # Load a CSV file into a DataFrame
df.to_excel("output.xlsx") # Write to Excel (needs an Excel engine such as openpyxl)
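The read/write calls can be tried without touching the disk. The sketch below assumes an in-memory buffer (io.StringIO) in place of a real file, and uses CSV for both directions so that no Excel engine is needed.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# With no path given, to_csv returns the CSV text;
# index=False leaves the row index out of the output
csv_text = df.to_csv(index=False)

# read_csv accepts a file path or any file-like object
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```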
• Exploring Data:
df.head() # First 5 rows
df.info() # Column types and non-null counts
df.describe() # Summary statistics for numeric columns
• Selection:
df["Name"] # Column (returns a Series)
df.loc[0] # Row by index label
df.iloc[0] # Row by integer position
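With the default index, loc[0] and iloc[0] return the same row, which hides the difference between them. The sketch below assumes a hypothetical index of 10, 20, 30 to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default labels
)

row_by_label = df.loc[10]     # row whose index label is 10
row_by_position = df.iloc[0]  # first row, whatever its label
# Here df.loc[0] would raise KeyError: no row is labeled 0

age_of_bob = df.loc[20, "Age"]  # single cell: row label, column label
first_two = df.iloc[:2]         # first two rows by position

print(row_by_label)
print(age_of_bob)
```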
• Filtering:
df[df["Age"] > 28]
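The filter above works in two steps: the comparison produces a boolean Series, and indexing with it keeps the rows marked True. A short sketch, with the variable names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

mask = df["Age"] > 28  # boolean Series, one value per row
older = df[mask]       # keeps only rows where mask is True

# Combine conditions with & (and) or | (or); wrap each one in parentheses
in_range = df[(df["Age"] > 28) & (df["Age"] < 35)]

print(older)
print(in_range)
```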
• GroupBy & Aggregation:
df.groupby("Age").mean(numeric_only=True) # Mean of numeric columns per group; text columns are skipped
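Grouping is easier to follow when the key repeats across rows. The sketch below assumes a hypothetical City column and an extra row, neither of which is in the data used earlier in these notes.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "City": ["Paris", "Paris", "Oslo", "Oslo"],  # hypothetical grouping column
    "Age": [25, 30, 35, 45],
})

# Pick the column to aggregate, then apply one aggregation
mean_age = df.groupby("City")["Age"].mean()

# Several aggregations at once, one result column per function
summary = df.groupby("City")["Age"].agg(["min", "max", "count"])

print(mean_age)
print(summary)
```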
Handling Missing Data
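df.fillna(0) # Replace missing values with 0 (returns a new DataFrame)
df.dropna() # Drop rows containing any missing value (returns a new DataFrame)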
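A minimal sketch of both calls on a small table with one missing age. The data is made up for illustration, and neither call modifies the original DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, np.nan, 35]})

# Count missing values in each column
missing_per_column = df.isna().sum()

# Fill only the Age column, leaving other columns untouched
filled = df.fillna({"Age": 0})

# Drop every row that has at least one missing value
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```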
Merging & Joining
pd.merge(df1, df2, on="id") # Inner join on the shared "id" column (default how="inner")
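The notes do not define df1 and df2, so the sketch below assumes two small tables that share an id column. It contrasts the default inner join with a left join.

```python
import pandas as pd

# Hypothetical tables sharing an "id" column
df1 = pd.DataFrame({"id": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "Age": [30, 35, 40]})

# Inner join (the default): only ids present in both tables
inner = pd.merge(df1, df2, on="id")

# Left join: every row of df1; Age is NaN where df2 has no match
left = pd.merge(df1, df2, on="id", how="left")

print(inner)
print(left)
```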
Visualization with Pandas
df["Age"].plot(kind="hist") # Histogram of the Age column (requires Matplotlib)
Conclusion
• Pandas simplifies data cleaning, transformation, and analysis.
• Interoperates with NumPy, Matplotlib, and Seaborn.