EDA LAB ASSIGNMENT2

The document outlines an assignment focused on exploratory data analytics using the Pandas library in Python, covering tasks such as merging datasets, reshaping data with melt() and pivot(), and aggregating total sales by region. It also introduces Matplotlib for creating various visualizations, including line plots, scatter plots, histograms, and bar charts, along with customization techniques. The assignment provides step-by-step instructions and sample code for practical application of these concepts.

Uploaded by

vinaynaidu6872


Name: kola vinay kumar    Subject: EXPLORATORY DATA ANALYTICS (EDA)

HT. No: 2403B05107    Semester: M. Tech (I/II)

ASSIGNMENT 02:

Data Transformation and Aggregation

Objective: Practice transforming and reshaping datasets.
Tasks:
• Merge multiple datasets (e.g., orders and customers datasets).
• Perform reshaping using melt() and pivot() functions in Pandas.
• Aggregate data (e.g., calculate the total sales for each region).
• Tools: Pandas.

Data Transformation and Aggregation with Pandas

The objective of this exercise is to practice transforming and reshaping datasets using the
Pandas library in Python. We will focus on merging datasets, reshaping data
using melt() and pivot(), and aggregating data to calculate total sales for each region.
1. Merging Datasets
Merging is the process of combining two or more datasets based on a common key. In Pandas, this is
typically done using the merge() function.
2. Reshaping Data
Reshaping refers to changing the layout of a DataFrame. This can be done using:
• Melt: Converts a DataFrame from wide format to long format.
• Pivot: Converts a DataFrame from long format to wide format.
3. Aggregating Data
Aggregation involves summarizing data, such as calculating totals, averages, or counts. In Pandas,
this can be done using functions like groupby() and agg().
Tools
Pandas: A powerful data manipulation and analysis library for Python.
Step 1: Import Libraries and Create Sample Datasets
import pandas as pd

# Sample customers dataset

customers_data = {
'customer_id': [1, 2, 3],
'customer_name': ['Alice', 'Bob', 'Charlie'],
'region': ['North', 'South', 'East']
}

customers_df = pd.DataFrame(customers_data)

# Sample orders dataset

orders_data = {
'order_id': [101, 102, 103, 104],
'customer_id': [1, 2, 1, 3],
'order_amount': [250, 150, 300, 200],
'order_date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04']
}
orders_df = pd.DataFrame(orders_data)

print("Customers DataFrame:")
print(customers_df)
print("\nOrders DataFrame:")
print(orders_df)

OUTPUT:

Customers DataFrame:
customer_id customer_name region
0 1 Alice North
1 2 Bob South
2 3 Charlie East

Orders DataFrame:
order_id customer_id order_amount order_date
0 101 1 250 2023-01-01
1 102 2 150 2023-01-02
2 103 1 300 2023-01-03
3 104 3 200 2023-01-04

Step 2: Merge Datasets

We will merge orders_df and customers_df on the customer_id column. An inner join (how='inner') keeps only the rows whose customer_id appears in both DataFrames.
# Merging datasets
merged_df = pd.merge(orders_df, customers_df, on='customer_id', how='inner')

print("\nMerged DataFrame:")
print(merged_df)

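Output:

Merged DataFrame:
order_id customer_id order_amount order_date customer_name region
0 101 1 250 2023-01-01 Alice North
1 103 1 300 2023-01-03 Alice North
2 102 2 150 2023-01-02 Bob South
3 104 3 200 2023-01-04 Charlie East

The row order of an inner merge depends on the pandas version: older versions group the rows by key as shown here, while recent versions (2.2 and later) keep the order of the left DataFrame (101, 102, 103, 104).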
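An inner join silently drops orders that have no matching customer. The sketch below, using hypothetical data in which order 105 refers to a customer_id (4) that is missing from the customers table, shows how a left join with indicator=True exposes such rows, and how validate="many_to_one" guards against duplicate customer IDs. The extra order and customer ID are illustrative assumptions, not part of the assignment data.

```python
import pandas as pd

# Hypothetical data: order 105 belongs to customer_id 4, who is missing from customers_df
customers_df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "customer_name": ["Alice", "Bob", "Charlie"],
    "region": ["North", "South", "East"],
})
orders_df = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "customer_id": [1, 2, 1, 3, 4],
    "order_amount": [250, 150, 300, 200, 120],
})

# how="left" keeps every order; indicator=True adds a "_merge" column that says
# whether each row matched ("both") or exists only in the left frame ("left_only").
# validate="many_to_one" raises MergeError if customer_id is not unique in customers_df.
left_df = pd.merge(orders_df, customers_df, on="customer_id", how="left",
                   indicator=True, validate="many_to_one")

# Orders whose customer_id has no match in customers_df
unmatched = left_df[left_df["_merge"] == "left_only"]
print(unmatched[["order_id", "customer_id", "customer_name"]])

# An inner join drops the unmatched order without any warning
inner_df = pd.merge(orders_df, customers_df, on="customer_id", how="inner")
print(len(inner_df), "rows after inner join,", len(left_df), "rows after left join")
```

Checking the "_merge" column right after a join is a cheap way to catch missing keys before they distort later aggregates.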

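Reshaping Using melt()

The melt() function converts a wide-format dataset into a long format: the id_vars columns are kept as identifiers, and every column listed in value_vars is unpivoted into (variable, value) rows.

# Reshaping the merged data using melt
# id_vars stay as columns; order_amount and region are stacked into rows

melted_df = pd.melt(merged_df, id_vars=['order_id', 'customer_id', 'customer_name'],
                    value_vars=['order_amount', 'region'])

print(melted_df)
OUTPUT:

order_id customer_id customer_name variable value

0 101 1 Alice order_amount 250
1 103 1 Alice order_amount 300
2 102 2 Bob order_amount 150
3 104 3 Charlie order_amount 200
4 101 1 Alice region North
...

Each order contributes one row per value_vars column, so melted_df has 8 rows (4 orders x 2 columns); the row order follows merged_df.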
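By default melt() names its output columns "variable" and "value". A minimal sketch, assuming a hypothetical wide table of quarterly sales per region, shows the var_name and value_name parameters giving those columns meaningful names:

```python
import pandas as pd

# Hypothetical wide table: one row per region, one column per quarter
wide_df = pd.DataFrame({
    "region": ["North", "South"],
    "Q1": [100, 80],
    "Q2": [120, 90],
})

# id_vars stay as identifier columns; value_vars are stacked into rows.
# var_name / value_name replace the default "variable" / "value" column names.
long_df = pd.melt(wide_df, id_vars=["region"], value_vars=["Q1", "Q2"],
                  var_name="quarter", value_name="sales")
print(long_df)
```

melt() stacks the value columns one after another, so the two Q1 rows come first, followed by the two Q2 rows: 2 regions x 2 quarters gives 4 rows.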
Reshaping Using pivot()
The pivot() function converts long format back to wide format.

# Pivot example: one row per order, one column per customer

pivot_df = merged_df.pivot(index='order_id', columns='customer_name', values='order_amount')

print(pivot_df)

OUTPUT:
customer_name Alice Bob Charlie
order_id
101 250.0 NaN NaN
102 NaN 150.0 NaN
103 300.0 NaN NaN
104 NaN NaN 200.0

The pivot() function rearranges the data so that customer names become columns and the order_amount values are filled in accordingly. Cells for order/customer combinations that do not exist are NaN, which is also why the amounts are displayed as floats.
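pivot() only works when every (index, columns) pair occurs once. When pairs repeat, pivot_table() is the usual alternative because it aggregates the duplicates. The sketch below uses hypothetical long-format sales data in which North/Q1 appears twice:

```python
import pandas as pd

# Hypothetical long-format sales: the North/Q1 pair appears twice
long_df = pd.DataFrame({
    "region": ["North", "North", "South", "North"],
    "quarter": ["Q1", "Q1", "Q1", "Q2"],
    "sales": [100, 50, 80, 120],
})

# pivot() needs every (index, columns) pair to be unique, so it raises ValueError here
pivot_failed = False
try:
    long_df.pivot(index="region", columns="quarter", values="sales")
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() aggregates the duplicates (aggfunc="sum" adds 100 + 50),
# and fill_value=0 fills combinations that have no rows (South/Q2)
wide_df = long_df.pivot_table(index="region", columns="quarter",
                              values="sales", aggfunc="sum", fill_value=0)
print(wide_df)
```

The choice of aggfunc matters: "sum" suits totals like sales, while "mean" or "count" would answer different questions about the same duplicates.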

Aggregating Data
We will calculate total sales per region.

# In this dataset each order's sale value is already stored in order_amount

# Grouping by region and summing the sales

sales_per_region = (merged_df.groupby('region')['order_amount']
                    .sum()
                    .reset_index(name='total_sales'))

print(sales_per_region)

OUTPUT:
region total_sales
0 East 200
1 North 550
2 South 150

• Each order's sale value is already in order_amount, so no quantity-times-price step is needed here.

• We group by region, sum order_amount, and name the result total_sales (North = 250 + 300 = 550).
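groupby() combined with agg() can compute several summaries in one pass. A minimal sketch using named aggregation, where each keyword becomes an output column defined as (source column, function); the frame below rebuilds a subset of the merged columns from Steps 1 and 2 so the snippet runs on its own:

```python
import pandas as pd

# Subset of the merged orders/customers data from Steps 1-2
merged_df = pd.DataFrame({
    "order_id": [101, 103, 102, 104],
    "customer_name": ["Alice", "Alice", "Bob", "Charlie"],
    "region": ["North", "North", "South", "East"],
    "order_amount": [250, 300, 150, 200],
})

# Named aggregation: output_column=(source_column, aggregation_function)
summary = merged_df.groupby("region").agg(
    total_sales=("order_amount", "sum"),
    order_count=("order_id", "count"),
    avg_order=("order_amount", "mean"),
).reset_index()
print(summary)
```

groupby() sorts the group keys by default, so the regions come out in alphabetical order (East, North, South).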
✔ Merged datasets on customer_id.
✔ Reshaped data using melt() (long format) and pivot() (wide format).
✔ Aggregated total sales by region.
This provides a complete workflow of merging, reshaping, and aggregating datasets using
Pandas.

2. Creating and Customizing Plots

• Objective: Use Matplotlib to create various types of visualizations.
• Tasks:
Create line plots, scatter plots, histograms, and bar charts.
Customize plots with legends, annotations, titles, and axis labels.
Generate subplots and explore 3D plotting.
Tools: Matplotlib.
Matplotlib is a powerful library in Python used for creating static, animated, and interactive
visualizations. It provides functionalities to create various types of plots, such as line plots, scatter
plots, histograms, and bar charts.
With Matplotlib, we can:
• Customize plots with titles, labels, legends, and annotations.
• Generate multiple subplots in one figure.
• Explore 3D visualizations using mpl_toolkits.mplot3d.

Importing Required Libraries

Before creating plots, we need to import Matplotlib.

import matplotlib.pyplot as plt

import numpy as np
from mpl_toolkits.mplot3d import Axes3D # For 3D plots
2. Creating Various Types of Plots
2.1 Line Plot
A line plot is used to visualize trends over time.
# Data for the line plot
x = np.linspace(0, 10, 100) # 100 points from 0 to 10
y = np.sin(x) # Sine function

# Creating the line plot

plt.figure(figsize=(8, 5)) # Set figure size
plt.plot(x, y, label='sin(x)', color='blue', linestyle='--', linewidth=2)

# Customization
plt.title("Line Plot of sin(x)", fontsize=14)
plt.xlabel("X values", fontsize=12)
plt.ylabel("Y values", fontsize=12)
plt.legend()
plt.grid(True)

# Show plot
plt.show()
OUTPUT:
A dashed blue sin(x) curve over x = 0 to 10, with a title, axis labels, a legend, and a grid.
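The same kind of plot can be written in Matplotlib's object-oriented style, which scales better once a figure holds more than one line or panel. A minimal sketch that draws sin(x) and cos(x) on one Axes; the Agg backend and the in-memory PNG are assumptions made only so it runs without a display (in a notebook or desktop session, drop matplotlib.use("Agg") and call plt.show() instead):

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Object-oriented style: work with explicit Figure and Axes objects
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, np.sin(x), label="sin(x)", linestyle="--")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_title("sin(x) and cos(x)")
ax.set_xlabel("X values")
ax.set_ylabel("Y values")
ax.legend()  # uses the label= given to each plot() call
ax.grid(True)

# Save to an in-memory PNG instead of calling plt.show()
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```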
2.2 Scatter Plot

A scatter plot is used to show relationships between two variables.

# Generate random data

np.random.seed(42)
x = np.random.rand(50) * 10 # Random x values
y = np.random.rand(50) * 10 # Random y values

# Creating the scatter plot

plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='red', marker='o', edgecolors='black', alpha=0.75)

# Customization
plt.title("Scatter Plot of Random Data", fontsize=14)
plt.xlabel("X Axis", fontsize=12)
plt.ylabel("Y Axis", fontsize=12)
plt.grid(True)

# Show plot
plt.show()
Output:
A scatter plot with red circles representing random data points.
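A scatter plot can also encode a third variable as color. In the sketch below the third variable is a hypothetical z = x + y, chosen only for illustration; c= maps each value through the colormap and a colorbar explains the scale. The Agg backend is used so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(50) * 10
y = rng.random(50) * 10
z = x + y  # hypothetical third variable, used only to color the points

fig, ax = plt.subplots(figsize=(8, 5))
# c= maps each point's z value to a color from the "viridis" colormap
points = ax.scatter(x, y, c=z, cmap="viridis", edgecolors="black")
fig.colorbar(points, ax=ax, label="x + y")  # adds a second Axes holding the colorbar
ax.set_title("Scatter Plot Colored by x + y")
ax.set_xlabel("X Axis")
ax.set_ylabel("Y Axis")
plt.close(fig)
```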

2.3 Histogram
A histogram is used to represent the distribution of a dataset.
# Generate random data
data = np.random.randn(1000) # 1000 data points following normal distribution

# Creating the histogram

plt.figure(figsize=(8, 5))
plt.hist(data, bins=30, color='green', edgecolor='black', alpha=0.7)

# Customization
plt.title("Histogram of Normally Distributed Data", fontsize=14)
plt.xlabel("Value", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.grid(True)

# Show plot
plt.show()

OUTPUT:

A histogram with 30 bins, showing a normal distribution.
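hist() also returns the numbers behind the picture: the count in each bin, the bin edges, and the drawn bars. A minimal sketch that inspects them (the seed and the Agg backend are assumptions for reproducibility and headless runs):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
data = np.random.randn(1000)

fig, ax = plt.subplots(figsize=(8, 5))
# hist() returns the bin counts, the bin edges and the drawn bar patches
counts, edges, patches = ax.hist(data, bins=30, color="green", edgecolor="black")

print("bins:", len(counts), "edges:", len(edges))  # 30 bins need 31 edges
print("total count:", counts.sum())                 # every value lands in some bin
peak = counts.argmax()
print("fullest bin spans", edges[peak], "to", edges[peak + 1])
plt.close(fig)
```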

2.4 Bar Chart
A bar chart is used to compare categorical data.
# Sample data
categories = ['A', 'B', 'C', 'D']
values = [10, 25, 15, 30]

# Creating the bar chart

plt.figure(figsize=(8, 5))
plt.bar(categories, values, color=['blue', 'orange', 'green', 'red'])

# Customization
plt.title("Bar Chart Example", fontsize=14)
plt.xlabel("Categories", fontsize=12)
plt.ylabel("Values", fontsize=12)
plt.grid(axis='y', linestyle='--')

# Show plot
plt.show()

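OUTPUT:
Four bars (A = 10, B = 25, C = 15, D = 30) colored blue, orange, green, and red, with dashed horizontal grid lines.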
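Bar charts are easier to read when each bar shows its value. A minimal sketch using Axes.bar_label(), which is available in Matplotlib 3.4 and later (an assumption about the installed version); the Agg backend is only for running without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [10, 25, 15, 30]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(categories, values, color=["blue", "orange", "green", "red"])
# bar_label() writes each bar's height above it and returns the text objects
labels = ax.bar_label(bars)
ax.set_title("Bar Chart with Value Labels")
ax.set_xlabel("Categories")
ax.set_ylabel("Values")
plt.close(fig)
```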

Customizing Plots: Adding Legends, Titles, and Annotations

Customization helps in better interpretation of data.
# Creating a simple line plot with annotations
x = np.linspace(0, 10, 100)
y = np.cos(x)

plt.figure(figsize=(8, 5))
plt.plot(x, y, label='cos(x)', color='purple')

# Adding text annotation

plt.annotate('Peak', xy=(0, 1), xytext=(2, 1.2),
             arrowprops=dict(facecolor='black', shrink=0.05))
plt.ylim(-1.5, 1.5)  # leave room above the curve for the annotation text

# Customization
plt.title("Customized Line Plot", fontsize=14)
plt.xlabel("X values", fontsize=12)
plt.ylabel("Y values", fontsize=12)
plt.legend()
plt.grid(True)

# Show plot
plt.show()

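OUTPUT:
A purple cos(x) curve with a legend, a grid, and an arrow labeled "Peak" pointing at the maximum at (0, 1).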
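Hard-coding the annotated point works for cos(x) at x = 0, but for real data the point is usually computed. A minimal sketch that finds the lowest sampled point with np.argmin and annotates it; the label text and offsets are illustrative choices, and the Agg backend is only for running without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.cos(x)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, y, label="cos(x)", color="purple")

# Locate the lowest sampled point instead of hard-coding its coordinates
i_min = np.argmin(y)
note = ax.annotate(f"Minimum near x = {x[i_min]:.2f}",
                   xy=(x[i_min], y[i_min]),                # point the arrow targets
                   xytext=(x[i_min] + 1, y[i_min] + 0.5),  # where the text is placed
                   arrowprops=dict(facecolor="black", shrink=0.05))
ax.set_ylim(-1.5, 1.5)
ax.legend()
plt.close(fig)
```

Because the point comes from the data, the annotation stays correct if the sampled range or the function changes.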

Subplots
We can create multiple plots in a single figure.

fig, axs = plt.subplots(2, 2, figsize=(10, 8))

# First subplot: Line plot

axs[0, 0].plot(x, y, color='blue')
axs[0, 0].set_title("Line Plot")

# Second subplot: Scatter plot

axs[0, 1].scatter(x, y, color='red')
axs[0, 1].set_title("Scatter Plot")

# Third subplot: Histogram

axs[1, 0].hist(data, bins=20, color='green')
axs[1, 0].set_title("Histogram")

# Fourth subplot: Bar Chart

axs[1, 1].bar(categories, values, color='orange')
axs[1, 1].set_title("Bar Chart")

# Adjust layout
plt.tight_layout()
plt.show()
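OUTPUT:
A 2 x 2 grid of panels: the cos(x) line plot, a scatter plot of the same points, a 20-bin histogram of the normal data, and an orange bar chart of the four categories.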
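When the panels show comparable quantities, sharing axes makes them easier to compare, and looping over axs.flat avoids repeating the same code for each panel. A minimal sketch with four trigonometric functions chosen for illustration (the Agg backend is only for running without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
funcs = {
    "sin(x)": np.sin,
    "cos(x)": np.cos,
    "sin(2x)": lambda t: np.sin(2 * t),
    "cos(2x)": lambda t: np.cos(2 * t),
}

# sharex/sharey give all four panels the same axis limits;
# axs is a 2x2 NumPy array of Axes, and axs.flat walks it row by row
fig, axs = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)
for ax, (name, f) in zip(axs.flat, funcs.items()):
    ax.plot(x, f(x))
    ax.set_title(name)
fig.suptitle("Four Functions on Shared Axes")
fig.tight_layout()
plt.close(fig)
```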

3D Plotting
Matplotlib allows 3D visualization using Axes3D.

fig = plt.figure(figsize=(8, 6))

ax = fig.add_subplot(111, projection='3d')

# Generate 3D data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

# Creating the 3D surface plot

ax.plot_surface(X, Y, Z, cmap='viridis')

# Customization
ax.set_title("3D Surface Plot")
ax.set_xlabel("X Axis")
ax.set_ylabel("Y Axis")
ax.set_zlabel("Z Axis")

# Show plot
plt.show()
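OUTPUT:
A 3D surface of Z = sin(sqrt(X^2 + Y^2)) over -5 to 5 on both axes, colored with the viridis colormap.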
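The same surface can be shown in other 3D styles and from other viewpoints. A minimal sketch that places a surface with a colorbar next to a wireframe with a custom camera angle; the grid size, strides and view angles are illustrative choices, and the Agg backend is only for running without a display. In Matplotlib 3.2 and later, projection="3d" works without explicitly importing Axes3D.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 40)
y = np.linspace(-5, 5, 40)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure(figsize=(12, 5))

# Left panel: shaded surface plus a colorbar explaining the Z scale
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
surf = ax1.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surf, ax=ax1, shrink=0.6, label="Z")
ax1.set_title("Surface")

# Right panel: wireframe drawing every 4th grid line, seen from a chosen angle
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_wireframe(X, Y, Z, rstride=4, cstride=4, color="gray")
ax2.view_init(elev=30, azim=45)  # camera elevation and azimuth in degrees
ax2.set_title("Wireframe")
plt.close(fig)
```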

Plot Type       Purpose

Line Plot       Visualize trends over time
Scatter Plot    Show relationships between two variables
Histogram       Display data distribution
Bar Chart       Compare categorical data
Subplots        Show multiple plots in one figure
3D Plot         Visualize complex 3D data

Key Customizations
✔ Titles (title())
✔ Axis labels (xlabel(), ylabel())
✔ Legends (legend())
✔ Annotations (annotate())
✔ Grid (grid(True))

Matplotlib provides a flexible and powerful way to visualize data. By mastering these concepts, you
can create highly customized and informative plots for data analysis and presentation.
