Chapter 1: Python Pandas
Introduction & Need for Pandas
Pandas is a fast, powerful, and easy-to-use open-source library for data analysis and manipulation
in Python, built on top of NumPy. It provides two main data structures, Series and DataFrame, which
make data cleaning, analysis, and (through its Matplotlib-based plotting) visualization easier.
Series & DataFrame
• Series: A one-dimensional labeled array that can hold any data type.
• DataFrame: A two-dimensional labeled data structure with columns of potentially different types.
Difference Between Series and DataFrame
Aspect | Series | DataFrame
Dimension | 1-D | 2-D
Structure | Single column with an index | Multiple rows and columns
Creation | From a list, array, dict, or scalar | From a dict of lists, list of dicts, CSV file, etc.
Index | Single (row) index | Row index and column labels
Use case | Single-column data | Full tabular data
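The difference in the table can be checked directly. This is a minimal sketch with made-up sample marks: it compares ndim and shape of a Series and a DataFrame, and shows that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series: one-dimensional, with a single index
s = pd.Series([80, 90, 75], index=['A', 'B', 'C'])

# A DataFrame: two-dimensional, with a row index and column labels
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Marks': [80, 90, 75]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame returns a Series
col = df['Marks']
print(type(col).__name__)  # Series
```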
Creating Series & DataFrame
• From a list: pd.Series([10, 20, 30])
• From a dictionary: pd.Series({'a': 10, 'b': 20})
• From a dictionary of lists: pd.DataFrame({'Name': ['A', 'B'], 'Marks': [80, 90]})
• From a CSV file: pd.read_csv('file.csv')
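The creation methods above can be tried in one short script. In this sketch, io.StringIO stands in for 'file.csv' (an assumption made so the example runs without a file on disk); the list-of-dicts form from the comparison table is included as well.

```python
import io
import pandas as pd

# From a list: gets the default integer index 0, 1, 2
s_list = pd.Series([10, 20, 30])

# From a dictionary: keys become the index labels
s_dict = pd.Series({'a': 10, 'b': 20})

# From a dictionary of lists: keys become column names
df = pd.DataFrame({'Name': ['A', 'B'], 'Marks': [80, 90]})

# From a list of dicts: each dict is one row
df_records = pd.DataFrame([{'Name': 'A', 'Marks': 80}, {'Name': 'B', 'Marks': 90}])

# From CSV: StringIO replaces a real file path here
csv_text = "Name,Marks\nA,80\nB,90\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(s_dict['b'])               # 20
print(df_csv['Marks'].tolist())  # [80, 90]
```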
Indexing & Selection
• loc[] → Label-based selection
• iloc[] → Position-based (integer) selection
• Boolean indexing: df[df['Marks'] > 50]
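A small sketch of the three selection styles, using hypothetical row labels 'r1' to 'r3' so that labels and positions are clearly different:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['A', 'B', 'C'], 'Marks': [45, 80, 62]},
    index=['r1', 'r2', 'r3'],
)

# loc[]: select by labels (row label 'r2', column label 'Marks')
print(df.loc['r2', 'Marks'])    # 80

# iloc[]: select by integer position (third row, first column)
print(df.iloc[2, 0])            # C

# Boolean indexing: keep only the rows where the condition is True
passed = df[df['Marks'] > 50]
print(passed['Name'].tolist())  # ['B', 'C']
```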
Handling Missing Data
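• dropna(): Removes rows (by default) or columns (axis=1) that contain missing values
• fillna(): Fills missing values with a given value; to carry the previous or next value forward or
backward, use ffill() or bfill()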
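A minimal sketch of checking for, dropping, and filling missing values, on made-up data with one NaN. Filling with the column mean is just one common choice, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Marks': [80, np.nan, 70]})

# Count missing values per column first
print(df.isna().sum().tolist())   # [0, 1]

# dropna(): returns a new DataFrame without the rows that contain NaN
dropped = df.dropna()
print(dropped['Name'].tolist())   # ['A', 'C']

# fillna(): returns a new DataFrame with NaN replaced (here by the column mean, 75.0)
filled = df.fillna({'Marks': df['Marks'].mean()})
print(filled['Marks'].tolist())   # [80.0, 75.0, 70.0]

# ffill(): copy the previous valid value forward
print(df['Marks'].ffill().tolist())  # [80.0, 80.0, 70.0]
```

Note that neither call changes df itself; the original still contains the NaN.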
Adding/Deleting Columns & Sorting
• Add column: df['NewCol'] = data
• Delete column: df.drop('ColumnName', axis=1, inplace=True)
• Sort: df.sort_values(by='Column')
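The three operations above in one sketch; the 'Bonus' column is a hypothetical name used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Marks': [70, 90, 80]})

# Add a column computed from an existing one
df['Bonus'] = df['Marks'] + 5

# Delete a column in place: with inplace=True, drop() modifies df and returns None
df.drop('Bonus', axis=1, inplace=True)
print(df.columns.tolist())      # ['Name', 'Marks']

# Sort rows by a column: returns a new sorted DataFrame, df is unchanged
ranked = df.sort_values(by='Marks', ascending=False)
print(ranked['Name'].tolist())  # ['B', 'C', 'A']
```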
Aggregation & GroupBy
• Aggregate functions: sum(), mean(), median(), mode(), std(), count(), min(), max()
• GroupBy example: df.groupby('City')['Marks'].mean()
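A sketch of whole-column aggregation versus per-group aggregation, with made-up city and marks data:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Pune'],
    'Marks': [80, 60, 90, 70, 85],
})

# Aggregate over the whole column
print(df['Marks'].sum(), df['Marks'].max())  # 385 90

# GroupBy: split rows by City, then average Marks within each group
avg = df.groupby('City')['Marks'].mean()
print(avg.to_dict())  # {'Delhi': 85.0, 'Mumbai': 65.0, 'Pune': 85.0}

# agg(): apply several aggregate functions at once
summary = df.groupby('City')['Marks'].agg(['mean', 'count'])
print(summary)
```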
Descriptive Statistics
• df.describe() → Provides count, mean, std, min, max, and quartiles for numerical columns.
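A quick sketch of what describe() returns, on four made-up marks. The non-numeric Name column is skipped by default.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Marks': [40, 60, 80, 100]})

stats = df.describe()
print(stats.columns.tolist())  # ['Marks']
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc['mean', 'Marks'])  # 70.0
print(stats.loc['50%', 'Marks'])   # 70.0 (the median)
```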
Data Visualization
• Line plot: df['col'].plot(kind='line')
• Bar plot: df['col'].plot(kind='bar')
• Histogram: df['col'].plot(kind='hist')
• Box plot: df.boxplot()
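Pandas plotting draws with Matplotlib, which must be installed. This sketch assumes the non-interactive 'Agg' backend and a hypothetical output file 'marks_bar.png' so it runs without a display; in a notebook or desktop session you would normally call plt.show() instead of saving.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Subject': ['Maths', 'Science', 'English'], 'Marks': [80, 90, 70]})

# Bar plot of Marks, with Subject as the x-axis labels
ax = df.plot(kind='bar', x='Subject', y='Marks', legend=False)
ax.set_ylabel('Marks')
plt.savefig('marks_bar.png')  # hypothetical file name
plt.close('all')

# Histogram and box plot of the same column
df['Marks'].plot(kind='hist')
plt.close('all')
df.boxplot(column='Marks')
plt.close('all')
```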
Important Functions Table
Operation | Method / Function
Read CSV | pd.read_csv('filename.csv')
Write CSV | df.to_csv('filename.csv') (add index=False to leave out the row index)
Drop missing values | df.dropna()
Fill missing values | df.fillna(value=...)
Sort by value | df.sort_values(by='column_name')
Group by | df.groupby('column_name').agg(...)
Plot histogram | df['column'].hist() or plt.hist(df['column'])
Plot bar chart | plt.bar(x, y)
(Here plt is Matplotlib's pyplot module: import matplotlib.pyplot as plt.)
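A round trip through CSV shows why index=False matters. In this sketch, to_csv() is called without a path so it returns the CSV text, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B'], 'Marks': [80, 90]})

# With index=False the row index is not written
csv_text = df.to_csv(index=False)
print(csv_text)
# Name,Marks
# A,80
# B,90

back = pd.read_csv(io.StringIO(csv_text))
print(back.columns.tolist())  # ['Name', 'Marks']

# With the default index=True, the index comes back as an extra unnamed column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'Name', 'Marks']
```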
Tips & Common Errors
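• Always check for missing values (for example with df.isna().sum()) before performing operations.
• axis=0 refers to the row index: aggregations run down each column, and drop() removes rows.
axis=1 refers to the columns: aggregations run across each row, and drop() removes columns.
• Be careful with inplace=True when dropping columns: it modifies the original DataFrame and the
method returns None, so do not write df = df.drop(..., inplace=True).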
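The axis rule is easiest to see with a tiny made-up table of two students and two subjects:

```python
import pandas as pd

df = pd.DataFrame({'Maths': [80, 60], 'Science': [90, 70]}, index=['A', 'B'])

# axis=0 works down the rows: one result per column
print(df.sum(axis=0).tolist())  # [140, 160]

# axis=1 works across the columns: one result per row
print(df.sum(axis=1).tolist())  # [170, 130]

# For drop(), axis says which labels to remove: 0 -> rows, 1 -> columns
print(df.drop('A', axis=0).index.tolist())          # ['B']
print(df.drop('Science', axis=1).columns.tolist())  # ['Maths']

# Without inplace=True, drop() returns a new DataFrame and df is unchanged
print(df.shape)  # (2, 2)
```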
PYQs (Past Year Questions)
1. Differentiate between Series and DataFrame with an example.
2. Write a program to create a DataFrame with columns Name and Marks, and display the rows with
Marks > 50.
3. How can you handle missing values in a DataFrame?
4. Write the Python statement to group data by City and find the average Age.
5. Which function is used to display basic statistics of a DataFrame?