Exploratory Data Analysis Lab (N-PECCD503P)
S. B. JAIN INSTITUTE OF TECHNOLOGY, MANAGEMENT &
RESEARCH, NAGPUR.
Exploratory Data Analysis (N-PECCD503P)
Semester/Year: 5th Sem/3rd Year
Department of Emerging Technologies, CSE-(DS), S.B.J.I.T.M.R., Nagpur
Academic Session: 2025-26
List of Practicals
Name of Laboratory: Exploratory Data Analysis Lab (N-PECCD503P)
Year/Semester: III/V
Course Objective
To equip students with hands-on expertise in data analysis and visualization tools, focusing on
exploratory data analysis, statistical modelling, and interactive visual representation of diverse
datasets.
Course Outcomes
After successful completion of this course, the students will be able to:
CO1 | Apply: Utilize appropriate tools to install and perform data analysis and visualization effectively.
CO2 | Analyze: Conduct exploratory data analysis on datasets to extract meaningful insights using libraries.
CO3 | Create: Develop advanced techniques for data cleaning, mapping, and cartographic visualizations in time series.
Sr. No. | Name of Practical | CO Mapped
Pre-Lab | Utilize suitable data analysis tools to install and set up the environment for effective visualization and analysis. | CO1
Practical No. 1 | Analyze an email dataset to identify patterns and insights using Pandas and data visualization libraries. | CO2
Practical No. 2 | Apply fundamental Python libraries to manipulate data structures and create visualizations. | CO1
Practical No. 3 | Develop visualizations for time-series data to identify trends, seasonality, and patterns. | CO3
Practical No. 4 | Construct interactive map-based visualizations using geographic datasets and mouse rollover features. | CO3
Practical No. 5 | Design cartographic visualizations to represent multiple datasets across global and Indian regions. | CO3
Practical No. 6 | Analyze the Wine Quality dataset to explore feature relationships and assess data quality. | CO2
Practical No. 7 | Interpret bivariate relationships through scatter plots and correlation metrics to uncover associations. | CO2
Practical No. 8 | Utilize automated EDA tools to summarize and visualize key aspects of datasets. | CO2
Practical No. 9 | Analyze the distribution of individual features using histograms, bar charts, and pie charts. | CO1
Practical No. 10 | Analyze and visualize categorical variables using group-by, cross-tabulation, and stacked bar charts. | CO2
Post-Lab | Open Ended Practical | CO1, CO2, CO3
Pre-Lab
AIM:
OBJECTIVE:
THEORY:
CONCLUSION:
Practical No. 1
AIM: Analyze an email dataset to identify patterns and insights using Pandas and data
visualization libraries.
Objectives:
Load and understand the structure of an email dataset.
Clean and preprocess data for meaningful analysis.
Extract useful features like time of sending, sender/receiver patterns, and email length.
Visualize patterns using plots to derive insights.
Interpret those insights for real-world applications (e.g., productivity, spam detection,
email load).
Theory:
EDA is a critical initial step in data science and machine learning pipelines. It involves using
statistical tools and visual methods to:
Understand the structure of the data.
Detect anomalies and relationships.
Form hypotheses.
Guide pre-processing, modelling, and interpretation.
Data collection and understanding involves gathering raw data from different sources (files,
databases, APIs, web scraping, logs, etc.) and performing an initial inspection to:
Understand the schema, data types, and metadata.
Evaluate relevance to the business problem.
Identify data granularity and the unit of analysis.
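The initial inspection described above can be sketched in a few lines of Pandas. This is a minimal illustration, not part of the original practical: the `emails` sample data and the `summarize_schema` helper are hypothetical names introduced here, and a real dataset would be loaded from a file instead.

```python
import pandas as pd

# Hypothetical sample standing in for a real email export.
emails = pd.DataFrame({
    "from": ["a@example.com", "b@example.com", "a@example.com"],
    "to": ["b@example.com", "a@example.com,c@example.com", None],
    "date": ["2024-01-01 09:15:00", "2024-01-01 14:30:00", "2024-01-02 10:00:00"],
    "subject": ["Hello", "Re: Hello", "Report"],
})


def summarize_schema(df):
    """Return one row per column: data type, non-null count, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "n_unique": df.nunique(),
    })


schema = summarize_schema(emails)
print(schema)

# Granularity check: in this sample each row represents exactly one email.
print("Number of rows (emails):", len(emails))
```

A summary like this makes missing fields visible early (here the `to` column has one empty value) before any cleaning decisions are made.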
Real-world data is often incomplete, noisy, or inconsistent. Cleaning improves data quality by:
Handling missing values: Removing, imputing, or flagging.
Dealing with outliers: Detecting and deciding to cap, remove, or study further.
Fixing data type errors, duplicates, and inconsistent formatting.
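The three cleaning steps above could look roughly like this in Pandas. The sketch rests on stated assumptions: the `raw` data is made up, median imputation is only one of several reasonable choices, and the 1.5 x IQR rule is a common convention for flagging outliers rather than a method prescribed by this manual.

```python
import pandas as pd

# Hypothetical data: one duplicate row, one missing value, one extreme value.
raw = pd.DataFrame({
    "sender": ["a", "b", "b", "c", "d", "e"],
    "body_length": [120.0, 95.0, 95.0, None, 110.0, 5000.0],
})

# Duplicates: drop rows that repeat an earlier row exactly.
clean = raw.drop_duplicates().copy()

# Missing values: flag them first, then impute with the column median.
clean["length_was_missing"] = clean["body_length"].isna()
clean["body_length"] = clean["body_length"].fillna(clean["body_length"].median())

# Outliers: flag values outside 1.5 * IQR of the quartiles, then cap them.
q1 = clean["body_length"].quantile(0.25)
q3 = clean["body_length"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean["is_outlier"] = ~clean["body_length"].between(lower, upper)
clean["body_length_capped"] = clean["body_length"].clip(lower=lower, upper=upper)

print(clean)
```

Keeping the flag columns alongside the cleaned values preserves a record of what was changed, which helps when deciding later whether an outlier deserves further study instead of capping.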
Data visualization serves both exploratory and explanatory purposes. A good visualization should:
Reveal the structure and patterns of the data.
Communicate insights clearly and accurately.
Allow interactive exploration (when needed).
Code:
# Step 1: Load and explore the dataset.
import pandas as pd
# Load the email dataset as a DataFrame (replace the placeholder with your own file path).
df = pd.read_csv('write_your_csv/excel_file_name_here.csv')
# Display the first 5 rows to inspect the structure.
print(df.head())
# Show column names, non-null counts, and data types.
# info() prints its report directly and returns None, so it is not wrapped in print().
df.info()
# Descriptive statistics for numeric columns, e.g., counts, means, percentiles.
print(df.describe())
# Step 2: Clean the data.
# Convert the 'date' column from strings to pandas datetime values.
# errors='coerce' turns unparseable dates into NaT so that the next step can drop them.
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows where critical fields are missing; this ensures a cleaner analysis.
df.dropna(subset=['from', 'to', 'date', 'subject', 'body'], inplace=True)
# Step 3: Extract patterns.
# Extract hour of the day, weekday, and month.
# The hour supports time-of-day analysis; day_name() and month_name() return
# human-readable weekday and month names.
df['hour'] = df['date'].dt.hour
df['weekday'] = df['date'].dt.day_name()
df['month'] = df['date'].dt.month_name()
# Number of recipients per email, counted by splitting the 'to' field on commas.
df['num_recipients'] = df['to'].apply(lambda x: len(str(x).split(',')))
# Length of the email body in characters. Content length often reflects email detail or importance.
df['body_length'] = df['body'].astype(str).str.len()
# Step 4: Data visualization.
# Emails sent per weekday.
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x='weekday', order=['Monday', 'Tuesday', 'Wednesday', 'Thursday',
'Friday', 'Saturday', 'Sunday'])
plt.title("Emails Sent per Weekday")
plt.xlabel("Day of the Week")
plt.ylabel("Number of Emails")
plt.xticks(rotation=45)
plt.show()
# countplot(): Tallies emails for each weekday.
# order=...: Ensures logical day order.
# plt.xticks(rotation=45): Rotates labels for readability.
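Step 3 also extracts an `hour` column, which the weekday chart does not use. As a possible extension, the sketch below plots emails per hour of the day. It is an illustration under assumptions: the `sample` timestamps are made up to stand in for the parsed `date` column, and plain Matplotlib is used in place of Seaborn to keep the example self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical timestamps standing in for the parsed 'date' column.
sample = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-01 09:05", "2024-01-01 09:40", "2024-01-01 14:10",
        "2024-01-02 09:15", "2024-01-02 17:30",
    ])
})
sample["hour"] = sample["date"].dt.hour

# Count emails per hour, then reindex to 0..23 so quiet hours show as zero.
hourly_counts = sample["hour"].value_counts().reindex(range(24), fill_value=0)

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(hourly_counts.index, hourly_counts.values)
ax.set_title("Emails Sent per Hour")
ax.set_xlabel("Hour of the Day (0-23)")
ax.set_ylabel("Number of Emails")
ax.set_xticks(range(24))
plt.show()
```

Reindexing to all 24 hours keeps the x-axis complete, so gaps in activity (for example, overnight) are visible rather than silently omitted.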
Output:
Result: