
Chapter 1 Python Pandas Complete

Uploaded by

vasishthaayaan

Chapter 1: Python Pandas

Introduction & Need for Pandas


Pandas is a fast, powerful, and easy-to-use open-source library for data analysis and manipulation
in Python, built on top of NumPy. It provides two main data structures, Series and DataFrame, which
make data cleaning, analysis, and visualization easier.

Series & DataFrame


• Series: A one-dimensional labeled array that can hold any data type.
• DataFrame: A two-dimensional labeled data structure with columns of potentially different types.

Difference Between Series and DataFrame


Aspect      Series                       DataFrame
Dimension   1-D                          2-D
Structure   Single column with index     Multiple rows & columns
Creation    From list, array, scalar     From dict of lists, list of dicts, CSV etc.
Index       Single index                 Row & Column index
Use Case    Handle single column data    Handle full tabular data
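
The differences in the table can be checked directly on small objects. The sketch below uses made-up sample data (the names and marks are assumptions for illustration) and prints the dimensions of each structure.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
marks = pd.Series([80, 90, 70], index=["A", "B", "C"], name="Marks")
students = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [80, 90, 70]})

print(marks.ndim)      # number of dimensions of the Series
print(students.ndim)   # number of dimensions of the DataFrame
print(students.shape)  # (rows, columns)

# Selecting a single column of a DataFrame gives back a Series.
marks_column = students["Marks"]
print(type(marks_column).__name__)
```

A DataFrame can therefore be thought of as several Series that share one row index.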

Creating Series & DataFrame


• From List: pd.Series([10, 20, 30])
• From Dictionary: pd.Series({'a':10, 'b':20})
• From Dictionary of Lists: pd.DataFrame({'Name':['A','B'],'Marks':[80,90]})
• From CSV: pd.read_csv('file.csv')
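
A minimal sketch of the in-memory creation routes follows. The list-of-dicts form is the one named in the comparison table; the variable names are assumptions chosen for illustration, and the CSV route is left out because it needs a file on disk.

```python
import pandas as pd

# Series from a list: pandas assigns a default integer index 0, 1, 2.
s_list = pd.Series([10, 20, 30])

# Series from a dictionary: the keys become the index labels.
s_dict = pd.Series({"a": 10, "b": 20})

# DataFrame from a dictionary of lists: each key becomes a column.
df_cols = pd.DataFrame({"Name": ["A", "B"], "Marks": [80, 90]})

# DataFrame from a list of dictionaries: each dictionary becomes a row.
df_rows = pd.DataFrame([{"Name": "A", "Marks": 80}, {"Name": "B", "Marks": 90}])

print(s_list)
print(s_dict)
print(df_cols)
print(df_rows)
```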

Indexing & Selection


• loc[] → Label-based selection
• iloc[] → Position-based selection
• Boolean indexing: df[df['Marks'] > 50]
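
The three selection styles can be compared on one small DataFrame. This is a sketch with assumed sample data; the row labels r1, r2, r3 are hypothetical and exist only to make the label/position distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Marks": [80, 45, 90]},
    index=["r1", "r2", "r3"],  # hypothetical row labels
)

by_label = df.loc["r2", "Marks"]   # row labelled 'r2', column labelled 'Marks'
by_position = df.iloc[1, 1]        # second row, second column (positions start at 0)
passed = df[df["Marks"] > 50]      # keep only rows where the condition is True

# loc slices include the end label; iloc slices exclude the end position.
label_slice = df.loc["r1":"r2"]
position_slice = df.iloc[0:2]

print(by_label, by_position)
print(passed)
```

Here both slices return the first two rows, but for different reasons: loc stops at the label r2 inclusive, while iloc stops before position 2.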

Handling Missing Data


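• dropna(): Removes rows (default) or columns that contain missing values
• fillna(): Fills missing values with a given value; forward and backward filling are available as
ffill() and bfill() (the older method= argument of fillna() is deprecated in recent pandas versions)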
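
A short sketch of both approaches, using assumed sample data with one missing mark. The isna() call is not mentioned in the notes above; it is included here only as a common way to count missing values first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [80.0, np.nan, 90.0]})

# Count missing values per column before deciding what to do.
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill missing marks with a constant, or with the column mean.
filled_zero = df.fillna({"Marks": 0})
filled_mean = df.fillna({"Marks": df["Marks"].mean()})

print(missing_per_column)
print(dropped)
print(filled_mean)
```

Both dropna() and fillna() return new objects here, so df itself still contains the missing value.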

Adding/Deleting Columns & Sorting


• Add Column: df['NewCol'] = data
• Delete Column: df.drop('ColumnName', axis=1, inplace=True)
• Sort: df.sort_values(by='Column')
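
The three operations can be chained on a small example. The column names Grade and Bonus are assumptions made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [80, 45, 90]})

# Add columns: from a list (length must match the number of rows)
# or computed from an existing column.
df["Grade"] = ["B", "D", "A"]
df["Bonus"] = df["Marks"] + 5

# Delete a column; drop(columns=...) is equivalent to drop(..., axis=1).
df = df.drop(columns="Bonus")

# Sort rows by Marks, highest first. The original row index travels with each row.
ranked = df.sort_values(by="Marks", ascending=False)

print(ranked)
```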

Aggregation & GroupBy


• Aggregate Functions: sum(), mean(), median(), mode(), std(), count(), min(), max()
• GroupBy Example: df.groupby('City')['Marks'].mean()
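
The GroupBy example can be run on a small table as follows. The city names and marks are assumed sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "Marks": [80, 60, 90, 70],
})

# One mean per city; the result is a Series indexed by City.
mean_by_city = df.groupby("City")["Marks"].mean()

# Several aggregates at once; the result is a DataFrame with one column per function.
summary = df.groupby("City")["Marks"].agg(["min", "max", "count"])

print(mean_by_city)
print(summary)
```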

Descriptive Statistics
• df.describe() → Provides count, mean, std, min, max, and quartiles for numerical columns.
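
As a small illustration with assumed sample marks, the rows returned by describe() can be read off by label:

```python
import pandas as pd

df = pd.DataFrame({"Marks": [40, 50, 60, 70, 80]})

stats = df.describe()

print(stats)
print(stats.loc["mean", "Marks"])  # mean of the Marks column
print(stats.loc["50%", "Marks"])   # median (second quartile)
```

The quartiles appear under the labels 25%, 50%, and 75%.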

Data Visualization
• Line Plot: df['col'].plot(kind='line')
• Bar Plot: df['col'].plot(kind='bar')
• Histogram: df['col'].plot(kind='hist')
• Box Plot: df.boxplot()
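
The four plot types can be drawn side by side. This sketch assumes matplotlib is installed (pandas plotting relies on it) and uses made-up marks; it selects the non-interactive Agg backend and renders into an in-memory buffer so that it runs without a display. In an interactive session, plt.show() would normally be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Marks": [40, 55, 60, 60, 75, 90]})

# A 2x2 grid of axes, one per plot type.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
df["Marks"].plot(kind="line", ax=axes[0, 0], title="Line")
df["Marks"].plot(kind="bar", ax=axes[0, 1], title="Bar")
df["Marks"].plot(kind="hist", ax=axes[1, 0], title="Histogram")
df.boxplot(column="Marks", ax=axes[1, 1])
fig.tight_layout()

# Render the figure as PNG bytes in memory instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```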

Important Functions Table


Operation         Method / Function
Read CSV          pd.read_csv('filename.csv')
Write CSV         df.to_csv('filename.csv')
Drop missing      df.dropna()
Fill missing      df.fillna(value=...)
Sort by value     df.sort_values(by='column_name')
Group by          df.groupby('column_name').agg(...)
Plot histogram    df['column'].hist() or plt.hist(df['column'])
Plot bar chart    plt.bar(x, y)
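
The read and write functions in the table can be tried without touching the disk. The sketch below swaps the file name for an in-memory text buffer (an assumption made so the example is self-contained); with a real file, the buffer would be replaced by a path such as 'filename.csv'.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Marks": [80, 90]})

# Write: index=False leaves the row index out of the CSV text.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

Without index=False, to_csv() also writes the row index as an extra unnamed first column, which then shows up as a column when the file is read back.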

Tips & Common Errors


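• Always check for missing values before performing operations.
• axis=0 refers to rows (the index) and axis=1 to columns: drop(..., axis=0) removes rows and
drop(..., axis=1) removes columns, while an aggregation such as sum(axis=0) runs down the rows and
returns one value per column.
• Pay attention to inplace=True when dropping columns: it modifies the original DataFrame and
returns None.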
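
The axis and inplace behaviour can be verified on a two-column example. The subject names are assumed sample data.

```python
import pandas as pd

df = pd.DataFrame({"Maths": [80, 90], "Science": [70, 60]})

# axis=0 collapses the rows: one total per column.
col_totals = df.sum(axis=0)

# axis=1 collapses the columns: one total per row.
row_totals = df.sum(axis=1)

# Without inplace, drop() returns a new DataFrame and leaves df unchanged.
without_science = df.drop("Science", axis=1)

# With inplace=True, df itself is modified and the call returns None.
returned = df.drop("Maths", axis=1, inplace=True)

print(col_totals)
print(row_totals)
print(df.columns.tolist())
print(returned)
```

Assigning the result of an inplace=True call back to df is a common mistake, since it replaces the DataFrame with None.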

PYQs (Past Year Questions)


1. Differentiate between Series and DataFrame with example.
2. Write a program to create a DataFrame with columns Name and Marks, and display rows with Marks > 50.
3. How can you handle missing values in a DataFrame?
4. Write the Python statement to group data by City and find average Age.
5. Which function is used to display basic statistics of a DataFrame?
