Module_4
1. Series
2. DataFrame
Output:
Essential Functionalities
📊 1. Reading Data
🔍 2. Exploring Data
🧹 3. Cleaning Data
🎯 4. Filtering and Selecting
🧮 6. Mathematical Operations
🕒 9. DateTime Handling
📤 10. Exporting Data
🧠 Performance Tips
🧾 Summary Table
Feature          Description
Data Structures  Series (1D), DataFrame (2D)
File I/O         Read/write CSV, Excel, JSON, SQL
Analysis Tools   Groupby, aggregation, filtering
Cleaning         Drop, fill, replace, rename
Integration      NumPy, Matplotlib, Scikit-learn, SQL
Applications     EDA, data prep, modeling, dashboards
A Pandas Series is a one-dimensional labeled array that can hold any data
type (integers, strings, floats, Python objects, etc.).
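A minimal sketch of building a Series, assuming pandas is installed. The data, labels, and the name `temp_c` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: temperatures labeled by weekday.
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temp_c")

print(temps)        # values shown next to their labels
print(temps.dtype)  # float64
print(temps.index)  # the labels

# A Series can also hold mixed Python objects; its dtype is then "object".
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```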
Pandas Series provides multiple ways to access (query) and index data efficiently.
By position:
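A short sketch of positional access with `.iloc`, with label access via `.loc` shown for contrast. The `scores` Series and its labels are hypothetical.

```python
import pandas as pd

scores = pd.Series([88, 92, 79, 95], index=["a", "b", "c", "d"])

first = scores.iloc[0]       # first element by integer position
last = scores.iloc[-1]       # negative positions count from the end
middle = scores.iloc[1:3]    # positions 1 and 2; the end position is excluded

by_label = scores.loc["b"]         # single value by label
label_slice = scores.loc["b":"c"]  # label slices include both endpoints

print(first, last, by_label)
print(middle)
print(label_slice)
```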
Output:
Output:
🔹 7. Modifying Values
Output:
Helps identify high readings for alerts in time-series or IoT sensor data.
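One way to flag high readings is a boolean mask over the Series. This is a sketch only: the readings, the timestamps, and the 30.0 threshold are assumed values, not taken from the notes.

```python
import pandas as pd

# Hypothetical sensor readings indexed by timestamp.
readings = pd.Series(
    [21.4, 35.2, 22.1, 40.7],
    index=pd.to_datetime([
        "2024-01-01 10:00",
        "2024-01-01 10:05",
        "2024-01-01 10:10",
        "2024-01-01 10:15",
    ]),
    name="temperature_c",
)

THRESHOLD = 30.0  # assumed alert level

# The comparison yields a boolean Series; indexing with it keeps the True rows.
high = readings[readings > THRESHOLD]
print(high)
```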
📘 What Is a DataFrame?
🧱 Structure of a DataFrame
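A DataFrame has rows (labeled by an index), columns (each with a name and a dtype), and the values stored where they intersect.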
Source      Function
CSV         pd.read_csv('file.csv')
Excel       pd.read_excel('file.xlsx')
JSON        pd.read_json('file.json')
SQL         pd.read_sql(query, conn)
Clipboard   pd.read_clipboard()
Dictionary  pd.DataFrame(dict)
Example – Loading from CSV
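A self-contained sketch of `pd.read_csv()`. To avoid depending on a file on disk, it reads from an in-memory string; with a real file you would pass the path instead. The column names and rows are invented.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as 'file.csv'.
csv_text = """name,age,city
Asha,28,Pune
Ben,35,Delhi
Chen,41,Mumbai
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows
print(df.dtypes)   # column types inferred while parsing
```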
🔍 Querying a DataFrame
1. Selecting Columns
2. Selecting Rows
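A brief sketch of both kinds of selection, using a small made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "age": [28, 35, 41],
    "city": ["Pune", "Delhi", "Mumbai"],
})

# Selecting columns
ages = df["age"]               # one column -> Series
subset = df[["name", "city"]]  # list of columns -> DataFrame

# Selecting rows
first_row = df.iloc[0]         # by integer position
row_label_1 = df.loc[1]        # by index label (here the default 0, 1, 2)
over_30 = df[df["age"] > 30]   # by boolean condition

print(subset)
print(over_30)
```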
🎯 Indexing in DataFrame
1. Default Index
Auto-generated: 0, 1, 2…
2. Custom Index
3. Set/Reset Index
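The three index cases above can be sketched as follows. The `emp_id` and `name` columns are hypothetical.

```python
import pandas as pd

# 1. Default index: 0, 1, 2, ...
df = pd.DataFrame({"emp_id": [101, 102, 103], "name": ["Asha", "Ben", "Chen"]})
print(df.index)

# 2. Custom index supplied at construction time
custom = pd.DataFrame({"name": ["Asha", "Ben", "Chen"]}, index=["a", "b", "c"])

# 3. Promote a column to the index, then turn it back into a column
indexed = df.set_index("emp_id")
restored = indexed.reset_index()

print(indexed.loc[102, "name"])
print(restored)
```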
🔁 Slicing Rows and Columns
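A minimal slicing sketch on an invented DataFrame. Note the difference in endpoints: `.iloc` excludes the end position, while `.loc` includes the end label.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [10, 20, 30, 40], "C": [100, 200, 300, 400]},
    index=["r1", "r2", "r3", "r4"],
)

top_two = df.iloc[:2]                     # first two rows, all columns
block = df.iloc[1:3, 0:2]                 # rows at positions 1-2, columns at 0-1
label_block = df.loc["r2":"r3", "A":"B"]  # same block selected by label

print(block)
```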
Adding a Column
Removing a Column
Changing Values
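The three modifications above, sketched on a hypothetical item table.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["pen", "book", "bag"],
    "price": [10.0, 50.0, 300.0],
    "qty": [5, 2, 1],
})

# Adding a column computed from existing ones
df["total"] = df["price"] * df["qty"]

# Removing a column (drop returns a new DataFrame)
df = df.drop(columns=["qty"])

# Changing values: .loc with a row condition and a column name
df.loc[df["item"] == "pen", "price"] = 12.0

# "total" was computed earlier and is not recalculated automatically.
print(df)
```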
🧮 Math Operations
🧪 Example: Filter and Compute
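A small sketch combining a column-wise math operation with a filter and an aggregate. The names, salaries, and the 50,000 cutoff are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "salary": [45000, 60000, 52000, 80000],
})

# Math operation: arithmetic applies element-wise to the whole column.
df["monthly"] = df["salary"] / 12

# Filter, then compute on the filtered rows.
high = df[df["salary"] > 50000]
avg_high = high["salary"].mean()

print(high)
print(avg_high)  # 64000.0
```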
✅ Summary Table
Feature            Command/Example
Load CSV           pd.read_csv()
Access Column      df['col'] or df.col
Access Row         df.loc[] / df.iloc[]
Filter Rows        df[df['col'] > value]
Add Column         df['new'] = ...
Group & Aggregate  df.groupby('key')['col'].mean()
Change Index       df.set_index() / df.reset_index()
Describe Stats     df.describe()
🧱 Syntax
Parameter          Description
left, right        DataFrames to merge
how                Type of join: inner, outer, left, right
on                 Column(s) to join on (common column)
left_on, right_on  Key column names in each DataFrame when they differ
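A sketch of how these parameters fit together in `pd.merge()`. The `employees` and `departments` tables are hypothetical; they share the key column `dept_id`.

```python
import pandas as pd

employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "dept_id": [10, 20, 10, 30],
})
departments = pd.DataFrame({
    "dept_id": [10, 20, 40],
    "dept_name": ["Sales", "IT", "HR"],
})

# inner: only keys present in both tables
inner = pd.merge(employees, departments, how="inner", on="dept_id")

# left: every employee; dept_name is NaN where no department matches
left = pd.merge(employees, departments, how="left", on="dept_id")

# outer: every key from either table
outer = pd.merge(employees, departments, how="outer", on="dept_id")

print(inner)
print(left)
print(outer)
```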
🧪 Example DataFrames
Result:
Result:
🔁 3. Right Join (all from right, match from left)
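A right join sketched on two invented tables: every row of `departments` is kept, and employee columns are NaN where no employee matches.

```python
import pandas as pd

employees = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "dept_id": [10, 20, 10, 30],
})
departments = pd.DataFrame({
    "dept_id": [10, 20, 40],
    "dept_name": ["Sales", "IT", "HR"],
})

right = pd.merge(employees, departments, how="right", on="dept_id")

# Dia (dept 30) is dropped; HR (dept 40) appears with name = NaN.
print(right)
```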
Result:
Result:
🧠 Advanced Parameters
🔹 left_on, right_on
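When the key columns are named differently, `left_on` and `right_on` name them explicitly. A sketch with hypothetical `orders` and `customers` tables:

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "cust_id": [100, 101, 100]})
customers = pd.DataFrame({"id": [100, 101], "customer": ["Asha", "Ben"]})

merged = pd.merge(
    orders,
    customers,
    left_on="cust_id",  # key column in the left table
    right_on="id",      # key column in the right table
    how="left",
)

# Both key columns are kept in the result; drop one if it is redundant.
print(merged)
```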
🔍 Merging on Index
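To merge on the index rather than a column, `pd.merge()` accepts `left_index=True` and `right_index=True`; `DataFrame.join()` is a shorthand that joins on the index by default. The data below is made up.

```python
import pandas as pd

left = pd.DataFrame({"name": ["Asha", "Ben", "Chen"]}, index=[1, 2, 3])
right = pd.DataFrame({"salary": [50000, 60000]}, index=[1, 3])

# Inner join on the index: only labels present in both (1 and 3)
merged = pd.merge(left, right, left_index=True, right_index=True, how="inner")

# join() defaults to a left join on the index; label 2 gets NaN salary
joined = left.join(right)

print(merged)
print(joined)
```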
Result:
The groupby() function in Pandas is used to split a DataFrame into groups based
on a specific column (or columns), apply a function (like mean, sum, count, etc.),
and then combine the results.
Think of it like SQL's GROUP BY, allowing aggregation and summarization of data.
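The split-apply-combine idea in a few lines, using an invented `Department`/`Salary` table:

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "IT", "IT", "HR", "Sales"],
    "Salary": [50000, 60000, 70000, 54000, 45000],
})

# Split by Department, apply mean() to Salary, combine into one Series.
mean_salary = df.groupby("Department")["Salary"].mean()
print(mean_salary)

# Roughly equivalent SQL:
#   SELECT Department, AVG(Salary) FROM df GROUP BY Department;
```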
🧱 Basic Syntax
To apply a function:
🧪 Example DataFrame
Output:
2. Group and Count
Output:
This brings the result back to a regular DataFrame (not Series or multi-index).
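Assuming the step described is `reset_index()`, a sketch looks like this; `as_index=False` is an alternative that gives the same flat layout. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "IT", "IT", "HR", "Sales"],
    "Salary": [50000, 60000, 70000, 54000, 45000],
})

# The grouped result is a Series indexed by Department ...
grouped = df.groupby("Department")["Salary"].sum()

# ... reset_index() moves Department back into an ordinary column.
summary = grouped.reset_index()

# Alternative: ask groupby not to use the keys as the index.
same = df.groupby("Department", as_index=False)["Salary"].sum()

print(summary)
```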
🔎 Filtering Groups
Using .filter():
Returns rows from groups where average salary > 55,000.
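A sketch of such a filter. The function passed to `.filter()` receives each group as a DataFrame and returns True to keep all of that group's rows. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "IT", "IT", "HR", "Sales"],
    "Salary": [50000, 60000, 70000, 54000, 45000],
})

# Keep only the rows of departments whose average salary exceeds 55,000.
well_paid = df.groupby("Department").filter(lambda g: g["Salary"].mean() > 55000)

# Here only IT qualifies (average 65,000), so both IT rows are returned.
print(well_paid)
```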
Output:
This shows the total purchase value per customer, which is useful for customer
segmentation.
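A per-customer total of this kind can be sketched as a groupby followed by `sum()`. The `Customer` and `Amount` columns and their values are assumptions.

```python
import pandas as pd

purchases = pd.DataFrame({
    "Customer": ["A", "B", "A", "C", "B"],
    "Amount": [200, 150, 300, 400, 50],
})

# Total spend per customer, largest first.
total = (
    purchases.groupby("Customer")["Amount"]
    .sum()
    .sort_values(ascending=False)
)
print(total)
```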
✅ Summary Table
Just like Excel pivot tables, Pandas' pivot_table() is used to group data across two
axes (rows and columns), apply an aggregation function, and extract insights.
🧱 Syntax of pivot_table()
Parameter   Description
data        DataFrame to use
values      Column to aggregate (e.g., sales, price)
index       Keys to group by on the rows
columns     Keys to group by on the columns
aggfunc     Aggregation function (mean, sum, count, etc.)
fill_value  Replace missing values (e.g., 0 instead of NaN)
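A sketch that uses each parameter once. The `sales` data is invented and deliberately has no West/B row, so the effect of `fill_value` is visible.

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["East", "East", "West", "East"],
    "Product": ["A", "B", "A", "A"],
    "Sales": [100, 150, 200, 25],
})

table = pd.pivot_table(
    sales,
    values="Sales",     # column to aggregate
    index="Region",     # row groups
    columns="Product",  # column groups
    aggfunc="sum",      # how to combine values in each cell
    fill_value=0,       # West/B has no data, so show 0 instead of NaN
)
print(table)
```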
🧪 Example Dataset
Output:
✅ This shows total Sales by Region and Product.
Gives a multi-index result with both sales and profit data summarized.
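One way to get such a result is to pass a list to `values`. This is an assumption about the missing example; with invented data, the columns become a MultiIndex of (value, Product) pairs.

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["East", "East", "West", "East"],
    "Product": ["A", "B", "A", "A"],
    "Sales": [100, 150, 200, 25],
    "Profit": [20, 30, 50, 5],
})

table = pd.pivot_table(
    sales,
    values=["Sales", "Profit"],  # aggregate two columns at once
    index="Region",
    columns="Product",
    aggfunc="sum",
    fill_value=0,
)
print(table)

# Cells are addressed with a (value, Product) tuple.
east_a_sales = table.loc["East", ("Sales", "A")]
east_a_profit = table.loc["East", ("Profit", "A")]
```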
📈 Visualization Idea
✅ Summary Table
🔍 Pro Tip:
pivot_table() is built on top of groupby, so it can replace several lines of
groupby-and-reshape logic with one clear line of summarization.
Date/Time Comparison
Convert to Timestamp
🌍 Time Zones (using pytz)
✅ Summary Table
Task                                Code Example
Get current datetime                datetime.now()
Format datetime to string           dt.strftime("%Y-%m-%d")
Parse string to datetime            datetime.strptime(str, format)
Add days                            dt + timedelta(days=1)
Extract year/month/day in Pandas    df['date'].dt.year
Resample time series                df.resample('M').mean()  ('ME' in pandas 2.2+)
Convert column to datetime          pd.to_datetime(df['column'])
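The table entries can be tried together in one short sketch. The dates and values are made up, and the resample uses daily bins ('D') so the example behaves the same across pandas versions.

```python
from datetime import datetime, timedelta

import pandas as pd

# Standard library: parse, format, add days, compare
dt = datetime.strptime("2024-03-15 08:30", "%Y-%m-%d %H:%M")
as_text = dt.strftime("%Y-%m-%d")
next_day = dt + timedelta(days=1)
is_later = next_day > dt  # datetimes compare chronologically

# pandas: convert a string column, extract parts, resample
df = pd.DataFrame({
    "date": ["2024-03-15 08:00", "2024-03-15 20:00", "2024-03-16 08:00"],
    "value": [10, 20, 40],
})
df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year

# resample() needs a datetime index; here each day's values are averaged.
daily = df.set_index("date")["value"].resample("D").mean()

print(as_text, next_day, is_later)
print(daily)
```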
📊 What is a DataFrame?
🔁 4. Sorting Data
🔗 8. Merging DataFrames
Visualization
Modeling (ML)
Export to CSV/Excel
Dashboarding