Data Handling with Pandas - IP Class 12 Notes
1. Introduction to Python Libraries: Pandas, Matplotlib
Pandas
- A powerful Python library for data manipulation and analysis.
- Two main data structures: Series (1D labeled array), DataFrame (2D labeled table).
Importing Pandas:
import pandas as pd
Matplotlib
- A Python library to create visualizations (graphs, charts).
Importing Matplotlib:
import matplotlib.pyplot as plt
2. Data Structures in Pandas
Series: 1D labeled array (example: list of numbers)
DataFrame: 2D labeled table (example: spreadsheet table)
3. Series in Pandas
Theory
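- A one-dimensional labeled array that can hold data of any type (int, float, str, and so on).
- Every element has an index label and a value; if no index is given, pandas assigns the default labels 0, 1, 2, ...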
Creating Series
- From ndarray:
import numpy as np
arr = np.array([10, 20, 30, 40])
s = pd.Series(arr)
- From Dictionary:
data = {'a': 100, 'b': 200, 'c': 300}
s = pd.Series(data)
- From Scalar Value:
s = pd.Series(5, index=['a', 'b', 'c'])
Mathematical Operations:
s = pd.Series([1, 2, 3])
print(s + 5)
Head and Tail Functions:
print(s.head(2))
print(s.tail(2))
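The operations above can be sketched on a slightly larger Series. The marks and the labels 'a' to 'e' are made-up sample data, used only for illustration:

```python
import pandas as pd

# Hypothetical marks of five students (sample data, not from the notes)
marks = pd.Series([45, 60, 72, 88, 91], index=['a', 'b', 'c', 'd', 'e'])

bonus = marks + 5        # the scalar 5 is added to every element
doubled = marks * 2      # element-wise multiplication
total = marks + marks    # Series + Series matches elements by index label

first_two = marks.head(2)  # first 2 elements (labels a, b)
last_two = marks.tail(2)   # last 2 elements (labels d, e)

print(bonus)
print(first_two)
print(last_two)
```

If no number is passed, head() and tail() return 5 elements.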
4. Selection, Indexing, and Slicing
Theory
- Selection: Choosing item by index or label.
- Indexing: Accessing elements by label/position.
- Slicing: Accessing range of elements.
Selection and Indexing
- By position: s.iloc[0] (plain s[0] also works when the index is the default 0, 1, 2, ...)
- By label: s['a'] or s.loc['a']
Slicing
- s[1:3] (position based; the stop position 3 is excluded)
- s['b':'d'] (label based; the stop label 'd' is included)
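A minimal sketch of selection and slicing on one Series. The values and labels are sample data; .loc and .iloc are used here so that it is always clear whether a label or a position is meant:

```python
import pandas as pd

# Sample Series with string labels (illustrative data)
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = s['a']        # element with label 'a'
by_loc = s.loc['a']      # same element, explicit label access
by_position = s.iloc[0]  # element at position 0

pos_slice = s.iloc[1:3]       # positions 1 and 2; stop position excluded
label_slice = s.loc['b':'d']  # labels b, c, d; stop label included

print(pos_slice)
print(label_slice)
```

The same slice boundaries give 2 elements by position but 3 elements by label, because only the label slice includes its end point.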
5. DataFrames in Pandas
Theory
- 2D table with labeled rows and columns.
Creating DataFrames
- Dictionary of Series
- List of Dictionaries
- From CSV file
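The three ways listed above can be sketched as follows. The names, marks, and roll numbers are made-up sample data, and io.StringIO stands in for a CSV file on disk so that the example runs without any external file:

```python
import io
import pandas as pd

# 1. Dictionary of Series: keys become column names, rows align on the index
names = pd.Series(['Asha', 'Ravi', 'Meena'], index=[101, 102, 103])
marks = pd.Series([82, 74, 91], index=[101, 102, 103])
df1 = pd.DataFrame({'Name': names, 'Marks': marks})

# 2. List of dictionaries: each dictionary becomes one row
rows = [{'Name': 'Asha', 'Marks': 82}, {'Name': 'Ravi', 'Marks': 74}]
df2 = pd.DataFrame(rows)

# 3. CSV: the first line is read as the column header
csv_text = "Name,Marks\nAsha,82\nRavi,74\n"
df3 = pd.read_csv(io.StringIO(csv_text))

print(df1)
print(df2)
print(df3)
```

With a real file, the last step would pass the file name to pd.read_csv() in place of the StringIO object.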
6. Indexing and Slicing in DataFrames
- loc[]: Label-based indexing
- iloc[]: Integer-based indexing
Example:
df.loc[0]     # row whose index label is 0
df.iloc[0]    # first row, whatever its label
Slicing:
df[0:2]       # rows at positions 0 and 1
Differences:
loc is label based and includes the end label of a slice; iloc is integer-position based and excludes the end position of a slice.
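The difference is easiest to see when the row labels are not numbers. A minimal sketch, assuming a small sample DataFrame with the made-up row labels 'r1' to 'r4':

```python
import pandas as pd

# Sample DataFrame with string row labels (illustrative data)
df = pd.DataFrame(
    {'Name': ['Asha', 'Ravi', 'Meena', 'John'], 'Marks': [82, 74, 91, 65]},
    index=['r1', 'r2', 'r3', 'r4'],
)

row_by_label = df.loc['r1']    # row with label 'r1'
row_by_position = df.iloc[0]   # row at position 0 (the same row here)

loc_slice = df.loc['r1':'r3']  # 3 rows: end label 'r3' is included
iloc_slice = df.iloc[0:2]      # 2 rows: end position 2 is excluded
plain_slice = df[0:2]          # row slice by position, like iloc[0:2]

cell = df.loc['r2', 'Marks']   # single value: row 'r2', column 'Marks'

print(loc_slice)
print(iloc_slice)
```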
7. Head() and Tail() in DataFrames
- df.head(3)
- df.tail(2)
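A short sketch of both functions on a sample DataFrame of ten rows (the roll numbers and marks are made-up data):

```python
import pandas as pd

# Sample DataFrame with ten rows (illustrative data)
df = pd.DataFrame({
    'Roll': list(range(1, 11)),
    'Marks': [55, 62, 71, 48, 90, 83, 77, 66, 59, 95],
})

top3 = df.head(3)         # first 3 rows
bottom2 = df.tail(2)      # last 2 rows
default_head = df.head()  # no argument: first 5 rows

print(top3)
print(bottom2)
```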
8. Importing and Exporting CSV
Import:
df = pd.read_csv('file.csv')
Export:
df.to_csv('newfile.csv', index=False)
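A minimal sketch of a full export and import round trip. The data is made up, and a temporary folder is used (an assumption of this example) so that no file is left behind after the run:

```python
import os
import tempfile
import pandas as pd

# Sample DataFrame to export (illustrative data)
df = pd.DataFrame({'Name': ['Asha', 'Ravi'], 'Marks': [82, 74]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'newfile.csv')

    # index=False leaves the row labels 0, 1 out of the file
    df.to_csv(path, index=False)

    # Read the first line of the file to see the header that was written
    with open(path) as f:
        header = f.readline().strip()

    # Import the same file back into a new DataFrame
    df_back = pd.read_csv(path)

print(header)
print(df_back)
```

Without index=False, the row labels are written as an extra unnamed first column, which then shows up as a column when the file is read back.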
Full Flowchart:
Import pandas -> Create Series/DataFrame -> Index/Selection/Slicing -> Apply Operations -> Import/Export CSV
Summary
Series vs DataFrame: 1D array vs 2D table
loc vs iloc: label based vs integer based
End of Notes