
VELAMMAL INSTITUTE OF TECHNOLOGY

Velammal Knowledge Park, Panchetti

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

ODD SEMESTER LAB RECORD

ACADEMIC YEAR(2024-2025)

REGULATION-2021

AD3301 – DATA EXPLORATION AND VISUALIZATION LABORATORY

Name of the Student :

Department : ARTIFICIAL INTELLIGENCE & DATA SCIENCE

Name of the Laboratory : DATA EXPLORATION AND VISUALIZATION LAB

Lab Code : AD3301

Year/Semester/Sec : II/III/B
BONAFIDE CERTIFICATE

University Reg. No.

This is to certify that this is a bonafide record work done by

Mr. / Miss. Studying B.E./B.Tech.,

Department in the Laboratory for

semester.

Staff-in-charge Head of the Department

Submitted for the University practical examination held on

at Velammal Institute of Technology.

Internal Examiner External Examiner

AD3301–DATA EXPLORATION AND VISUALIZATION LABORATORY

INDEX

S. No. | DATE | NAME OF THE EXPERIMENTS | PAGE NO. | SIGN
1 | | Install the data Analysis and Visualization tool: R/Python/Tableau/Power BI. | |
2 | | Perform exploratory data analysis (EDA) on datasets like an email data set. Export all your emails as a dataset, import them inside a pandas data frame, visualize them and get different insights from the data. | |
3 | | Working with NumPy arrays, Pandas data frames, Basic plots using Matplotlib. | |
4 | | Explore various variable and row filters in R for cleaning data. Apply various plot features in R on sample datasets and visualize. | |
5 | | Perform Time Series Analysis and apply the various visualization techniques. | |
6 | | Perform Data Analysis and representation on a Map using various Map datasets with Mouse Rollover effect, user interaction, etc. | |
7 | | Build cartographic visualization for multiple datasets involving various countries of the world; states and districts in India etc. | |
8 | | Perform EDA on Wine Quality DataSet. | |
9 | | Use a case study on a data set and apply the various EDA and visualization techniques and present an analysis report. | |

Ex 1. Install the data Analysis and Visualization tool: R/Python/Power BI.
DATE:

Aim:

To install the data analysis and visualization tools (pandas and Matplotlib) in Python and use them to load, index and plot a dataset.

Algorithm:

1. Install pandas in Python.
2. Create a DataFrame in pandas from pd.Series objects.
3. Read the data in a CSV file into a pandas DataFrame.
4. Index the DataFrame by position using pandas.DataFrame.iloc.
5. Index the DataFrame by label using pandas.DataFrame.loc.
6. Install Matplotlib in Python.
7. Plot the DataFrame using pandas.
Program:

1. Installation:

pip install pandas

2. Creating A DataFrame in Pandas:

import pandas as pd

# assigning two series to s1 and s2
s1 = pd.Series([1, 2])
s2 = pd.Series(["Ashish", "Sid"])
# framing series objects into data (each series becomes one row)
df = pd.DataFrame([s1, s2])
# show the data frame
df

# data framing in another way, taking index and column values
dframe = pd.DataFrame([[1, 2], ["Ashish", "Sid"]],
                      index=["r1", "r2"],
                      columns=["c1", "c2"])
dframe

# framing in another way: dict-like container (each key becomes one column)
dframe = pd.DataFrame({
    "c1": [1, "Ashish"],
    "c2": [2, "Sid"]})
dframe
3. Importing Data with Pandas

# Import the pandas library, renamed as pd
import pandas as pd
# Read IND_data.csv into a DataFrame, assigned to df
df = pd.read_csv("IND_data.csv")
# Print the first 5 rows of the DataFrame (the default)
df.head()
# Print the number of rows and columns of the DataFrame
df.shape

4. Indexing DataFrames with Pandas

# print the first 5 rows and every column, which replicates df.head()
df.iloc[0:5, :]
# print all rows and columns
df.iloc[:, :]
# print the rows from position 5 onwards and the first 5 columns
df.iloc[5:, :5]

5. Indexing Using Labels in Pandas

# print the rows labelled 0 to 5 (.loc includes the end label) and every column of df
df.loc[0:5, :]
# select the rows from label 5 onwards and every column (stored separately so df stays intact)
df_from5 = df.loc[5:, :]
# print the "Time period" values of the rows labelled 0 to 5
df.loc[:5, "Time period"]
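The .iloc and .loc slices above look alike but treat the end point differently: .iloc slices by position and excludes the end, while .loc slices by label and includes it. A minimal sketch on a small DataFrame; the column names mirror IND_data.csv, but the values are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for IND_data.csv, with the default 0..n-1 integer index
df = pd.DataFrame({
    "Time period": [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017],
    "Observation Value": [5.1, 4.8, 5.6, 6.0, 5.9, 6.3, 6.1, 6.8],
})

by_position = df.iloc[0:5, :]   # positions 0-4: the end position 5 is excluded
by_label = df.loc[0:5, :]       # labels 0-5: the end label 5 is included

print(len(by_position))  # 5
print(len(by_label))     # 6

# .loc can also pick a single column by name while slicing rows by label
print(df.loc[:5, "Time period"].tolist())
```

With a non-default index (for example, dates), .loc slices by those labels instead, which is why the two selectors can return quite different rows.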

6. Installation
pip install matplotlib

7. Pandas Plotting

# import the required module
import matplotlib.pyplot as plt

# plot a histogram of the observation values
df['Observation Value'].hist(bins=10)
plt.show()

# box plot per time period; shows the presence of a lot of outliers/extreme values
df.boxplot(column='Observation Value', by='Time period')
plt.show()

# plot the points as a scatter plot
x = df["Observation Value"]
y = df["Time period"]
plt.scatter(x, y, label="stars", color="m", marker="*", s=30)
# x-axis label
plt.xlabel('Observation Value')
# y-axis label
plt.ylabel('Time period')
# function to show the plot
plt.show()

Output:

Result:

Thus the program to install the data analysis and visualization tool using Python has been executed successfully.
Ex 2. Perform exploratory data analysis (EDA) on datasets like the email data set. Export all your emails as a dataset, import them inside a pandas data frame, visualize them and get different insights from the data.
DATE:

AIM:

To perform exploratory data analysis (EDA) on datasets such as an email data set: export all your emails as a dataset, import them inside a pandas data frame, visualize them and get different insights from the data.

ALGORITHM:

1. Import the required libraries.
2. Create the email dataset in Python from a Gmail mbox export.
3. Export all the emails to a CSV file and import it into a pandas DataFrame.
4. Visualize the email dataset and derive insights from it.
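Step 2 can be tried without a Gmail export. The sketch below is a minimal example under stated assumptions: it writes a tiny mbox file of invented messages with the standard-library mailbox module, then applies the same cleaning steps as the program (extracting the address, labelling sent versus inbox, deriving the day of the week). The helper names, the addresses and the use of email.utils.parsedate_to_datetime instead of the program's pd.to_datetime call are all assumptions:

```python
import mailbox
import os
import re
import tempfile
from email.message import EmailMessage
from email.utils import parsedate_to_datetime

import numpy as np
import pandas as pd


def extract_email_id(value):
    """Return the address inside <...>, else the first token containing '@'."""
    if not isinstance(value, str):
        return np.nan
    found = re.findall(r"<(.+?)>", value)
    if not found:
        found = [tok for tok in value.split() if "@" in tok]
    return found[0] if found else np.nan


def build_sample_mbox(path):
    """Write a few hypothetical messages to an mbox file."""
    box = mailbox.mbox(path)
    samples = [
        ("Weekly report", "Alice <alice@example.com>", "me@example.com",
         "Mon, 02 Sep 2019 09:15:00 +0000"),
        ("Re: Weekly report", "me@example.com", "alice@example.com",
         "Mon, 02 Sep 2019 11:40:00 +0000"),
        ("Offer", "Shop <deals@shop.example>", "me@example.com",
         "Sat, 07 Sep 2019 18:05:00 +0000"),
    ]
    for subject, sender, to, date in samples:
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = sender
        msg["To"] = to
        msg["Date"] = date
        msg.set_content("body")
        box.add(msg)
    box.close()  # close() also flushes the messages to disk


def mbox_to_dataframe(path, my_email):
    """Read an mbox file into a DataFrame with cleaned sender, label and weekday."""
    box = mailbox.mbox(path)
    try:
        rows = [[m["subject"], m["from"], m["date"]] for m in box]
    finally:
        box.close()
    df = pd.DataFrame(rows, columns=["subject", "from", "date"])
    df["date"] = pd.to_datetime(df["date"].apply(parsedate_to_datetime), utc=True)
    df["from"] = df["from"].apply(extract_email_id)
    df["label"] = np.where(df["from"] == my_email, "sent", "inbox")
    df["dayofweek"] = df["date"].dt.day_name()
    return df


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    build_sample_mbox(path)
    emails = mbox_to_dataframe(path, my_email="me@example.com")

print(emails[["from", "label", "dayofweek"]])
```

Pointing mbox_to_dataframe at the exported gmail.mbox (and at your own address) gives a frame with the same columns the later plots rely on.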

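PROGRAM:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# mailbox is part of the Python standard library, so no pip install is needed
import mailbox

# Mount Google Drive (Colab) so that the exported Gmail .mbox file can be read
from google.colab import drive
drive.mount('/content/gdrive')

mboxfile = "gdrive/My Drive/Colab Notebooks/gmail.mbox"
mbox = mailbox.mbox(mboxfile)
mbox

# List the header fields available in the first message
for key in mbox[0].keys():
    print(key)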

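import csv

# Write the selected header fields of every message to a CSV file
with open('mailbox.csv', 'w', newline='', encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])
    for message in mbox:
        writer.writerow([message['subject'], message['from'], message['date'], message['to'],
                         message['X-Gmail-Labels'], message['X-GM-THRID']])

# names= is given explicitly, so the header row written above is read as row 0;
# its 'date' value is not a date, and the row is dropped by the notna() filter below
dfs = pd.read_csv('mailbox.csv', names=['subject', 'from', 'date', 'to', 'label', 'thread'])
dfs.dtypes

# Parse the dates (unparseable values become NaT) and keep only rows with a valid date
dfs['date'] = dfs['date'].apply(lambda x: pd.to_datetime(x, errors='coerce', utc=True))
dfs = dfs[dfs['date'].notna()]
dfs.to_csv('gmail.csv')
dfs.info()
dfs.head(10)
dfs.columns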

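import re

def extract_email_ID(string):
    # Non-string values (e.g. a missing 'from' header read as NaN) have no address
    if not isinstance(string, str):
        return np.nan
    # Prefer an address written as "Name <address>"
    email = re.findall(r'<(.+?)>', string)
    if not email:
        # Otherwise take the first whitespace-separated token containing '@'
        email = list(filter(lambda y: '@' in y, string.split()))
    return email[0] if email else np.nan

dfs['from'] = dfs['from'].apply(lambda x: extract_email_ID(x))

# Messages from this address are labelled 'sent', everything else 'inbox'
myemail = 'itsmeskm99@gmail.com'
dfs['label'] = dfs['from'].apply(lambda x: 'sent' if x == myemail else 'inbox')
dfs.drop(columns='to', inplace=True)
dfs.head(10)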

import datetime
import pytz

def refactor_timezone(x):
    # Convert a UTC timestamp to US Eastern time
    est = pytz.timezone('US/Eastern')
    return x.astimezone(est)

dfs['date'] = dfs['date'].apply(lambda x: refactor_timezone(x))

# Timestamp.weekday_name was removed in pandas 1.0; day_name() replaces it
dfs['dayofweek'] = dfs['date'].apply(lambda x: x.day_name())
dfs['dayofweek'] = pd.Categorical(dfs['dayofweek'], categories=[
    'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
    'Saturday', 'Sunday'], ordered=True)
# Time of day in fractional hours, e.g. 14:30 -> 14.5
dfs['timeofday'] = dfs['date'].apply(lambda x: x.hour + x.minute/60 + x.second/3600)
dfs['hour'] = dfs['date'].apply(lambda x: x.hour)
dfs['year_int'] = dfs['date'].apply(lambda x: x.year)
# Fractional year, e.g. 1 July 2019 -> about 2019.5
dfs['year'] = dfs['date'].apply(lambda x: x.year + x.dayofyear/365.25)

dfs.index = dfs['date']
del dfs['date']

# Date range covered by the emails, and the number of sent vs. received messages
print(dfs.index.min().strftime('%a, %d %b %Y %I:%M %p'))
print(dfs.index.max().strftime('%a, %d %b %Y %I:%M %p'))
print(dfs['label'].value_counts())

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

def plot_todo_vs_year(df, ax, color='C0', s=0.5, title=''):
    # Scatter plot of time of day (y) against fractional year (x), one point per email
    df.plot.scatter('year', 'timeofday', s=s, alpha=0.6, ax=ax, color=color)
    ax.set_ylim(0, 24)
    ax.yaxis.set_major_locator(MaxNLocator(8))
    # Show the hour ticks as clock times such as "03 PM"
    ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))), "%H").strftime("%I %p")
                        for ts in ax.get_yticks()])
    ax.set_xlabel('')
    ax.set_ylabel('')
    ax.set_title(title)
    ax.grid(ls=':', color='k')
    return ax

sent = dfs[dfs['label'] == 'sent']
received = dfs[dfs['label'] == 'inbox']

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 4))
plot_todo_vs_year(sent, ax[0], title='Sent')
plot_todo_vs_year(received, ax[1], title='Received')

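def plot_number_perday_per_year(df, ax, label=None, dt=0.3, **plot_kwargs):
    # Histogram of emails over time; each email is weighted by 1 / (days per bin),
    # so a bar's height is the average number of emails per day in that bin of dt years
    year = df[df['year'].notna()]['year'].values
    T = year.max() - year.min()
    bins = max(1, int(T / dt))
    weights = 1 / (np.ones_like(year) * dt * 365.25)
    ax.hist(year, bins=bins, weights=weights, label=label, **plot_kwargs)
    ax.grid(ls=':', color='k')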

from scipy import ndimage

def plot_number_perdhour_per_year(df, ax, label=None, dt=1, smooth=False,
                                  weight_fun=None, **plot_kwargs):
    # Distribution of emails over the hours of the day; by default the counts are
    # scaled by the number of days covered by the data
    tod = df[df['timeofday'].notna()]['timeofday'].values
    year = df[df['year'].notna()]['year'].values
    Ty = year.max() - year.min()
    T = tod.max() - tod.min()
    bins = max(1, int(T / dt))
    if weight_fun is None:
        weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt)
    else:
        weights = weight_fun(df)
    if smooth:
        # Smooth the histogram with a Gaussian filter, then draw it as a cubic-interpolated curve
        hst, xedges = np.histogram(tod, bins=bins, weights=weights)
        x = np.delete(xedges, -1) + 0.5*(xedges[1] - xedges[0])
        hst = ndimage.gaussian_filter(hst, sigma=0.75)
        f = interp1d(x, hst, kind='cubic')
        x = np.linspace(x.min(), x.max(), 10000)
        hst = f(x)
        ax.plot(x, hst, label=label, **plot_kwargs)
    else:
        ax.hist(tod, bins=bins, weights=weights, label=label, **plot_kwargs)
    ax.grid(ls=':', color='k')

    # Label the hour axis with clock times: the x axis for vertical plots,
    # the y axis for horizontal ones
    orientation = plot_kwargs.get('orientation')
    if orientation is None or orientation == 'vertical':
        ax.set_xlim(0, 24)
        ax.xaxis.set_major_locator(MaxNLocator(8))
        ax.set_xticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))), "%H").strftime("%I %p")
                            for ts in ax.get_xticks()])
    elif orientation == 'horizontal':
        ax.set_ylim(0, 24)
        ax.yaxis.set_major_locator(MaxNLocator(8))
        ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))), "%H").strftime("%I %p")
                            for ts in ax.get_yticks()])

import matplotlib.gridspec as gridspec

class TriplePlot:
    # Three panels: time of day vs. year (main), hourly distribution (right)
    # and daily volume over the years (top)
    def __init__(self):
        gs = gridspec.GridSpec(6, 6)
        self.ax1 = plt.subplot(gs[2:6, :4])
        self.ax2 = plt.subplot(gs[2:6, 4:6], sharey=self.ax1)
        plt.setp(self.ax2.get_yticklabels(), visible=False)
        self.ax3 = plt.subplot(gs[:2, :4])
        plt.setp(self.ax3.get_xticklabels(), visible=False)

    def plot(self, df, color='darkblue', alpha=0.8, markersize=0.5, yr_bin=0.1, hr_bin=0.5):
        plot_todo_vs_year(df, self.ax1, color=color, s=markersize)
        plot_number_perdhour_per_year(df, self.ax2, dt=hr_bin, color=color, alpha=alpha,
                                      orientation='horizontal')
        self.ax2.set_xlabel('Average emails per hour')
        plot_number_perday_per_year(df, self.ax3, dt=yr_bin, color=color, alpha=alpha)
        self.ax3.set_ylabel('Average emails per day')

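import matplotlib.patches as mpatches

# Compare the most frequent senders. The record does not show how addrs, idx and
# colors were defined; taking the four most frequent senders is an assumption.
addrs = dfs['from'].value_counts()
idx = np.arange(min(4, len(addrs)))
colors = [f'C{ii}' for ii in range(len(idx))]
plt.figure(figsize=(12, 12))
tpl = TriplePlot()
labels = []
for ct, addr in enumerate(addrs.index[idx]):
    tpl.plot(dfs[dfs['from'] == addr], color=colors[ct], alpha=0.3, yr_bin=0.5, markersize=1.0)
    labels.append(mpatches.Patch(color=colors[ct], label=addr, alpha=0.5))
plt.legend(handles=labels, bbox_to_anchor=[1.4, 0.9], fontsize=12, shadow=True)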

# Fraction of sent and received emails falling on each day of the week
sdw = sent.groupby('dayofweek').size() / len(sent)
rdw = received.groupby('dayofweek').size() / len(received)
df_tmp = pd.DataFrame(data={'Outgoing Email': sdw, 'Incoming Email': rdw})
df_tmp.plot(kind='bar', rot=45, figsize=(8, 5), alpha=0.5)
plt.xlabel('')
plt.ylabel('Fraction of weekly emails')
plt.grid(ls=':', color='k', alpha=0.5)

from scipy.interpolate import interp1d

# Smoothed hourly profile for each day of the week: solid lines are received emails,
# dashed lines are sent emails, each normalised by the total number received/sent
plt.figure(figsize=(8, 5))
ax = plt.subplot(111)
for ct, dow in enumerate(dfs.dayofweek.cat.categories):
    df_r = received[received['dayofweek'] == dow]
    weights_r = np.ones(len(df_r)) / len(received)
    plot_number_perdhour_per_year(df_r, ax, dt=1, smooth=True, color=f'C{ct}',
                                  alpha=0.8, lw=3, label=dow,
                                  weight_fun=lambda x, w=weights_r: w)
    df_s = sent[sent['dayofweek'] == dow]
    weights_s = np.ones(len(df_s)) / len(sent)
    plot_number_perdhour_per_year(df_s, ax, dt=1, smooth=True, color=f'C{ct}',
                                  alpha=0.8, lw=2, label=dow, ls='--',
                                  weight_fun=lambda x, w=weights_s: w)
ax.set_ylabel('Fraction of weekly emails per hour')
plt.legend(loc='upper left')

OUTPUT:

  | subject | from | date | label | thread
1 | New Books: The Python Journeyman + Understandi... | james@sitepoint.com | 2019-09-20 14:07:05+00:00 | inbox | 1645216686186738105
2 | iPhone 11 Pro og iPhone 11 er her | News_Europe@InsideApple.Apple.com | 2019-09-20 10:33:27+00:00 | inbox | 1645190169696380553
3 | =?utf-8?Q?Save=20on=20Burlap=20Bags=20Today=21... | support@totebagfactory.com | 2019-09-20 15:32:31+00:00 | inbox | 1645209548975264659
4 | Hi there, looking for the best Dashain deals? ... | info@email.daraz.com.np | 2019-09-17 06:19:10+00:00 | inbox | 1644916038153843699
5 | The file =?UTF-8?B?J0JyYW5kX0Jvb2sgdGVzdC5wZGY... | noreply@box.com | 2019-09-20 19:04:16+00:00 | inbox | 1645222431795507661
RESULT:

Thus the program to perform exploratory data analysis (EDA) on datasets like the email data set, export all your emails as a dataset, import them inside a pandas data frame, visualize them and get different insights from the data was implemented and executed successfully.
Ex 3. Working with NumPy arrays, Pandas data frames, Basic plots using Matplotlib.
DATE:
Aim:

To work with NumPy arrays, Pandas data frames and basic plots using Matplotlib.

3(a) NumPy arrays using Matplotlib

Algorithm:

1. Import the NumPy and Matplotlib libraries in Python.
2. Use the np.arange() function to create the values on the x axis; store the corresponding values on the y axis in another ndarray object, y.
3. Use plt.plot(), the basic Matplotlib function for plotting 2D data, to plot the equation y = 2x + 5.
4. Show the plot.
Program:

import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11)
y=2*x+5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
Output:

Result:

Thus the given program has been executed successfully.
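The program above uses only np.arange. A short sketch of other everyday NumPy array operations on the same x values (reshaping, aggregation along an axis, boolean masks and broadcasting); the variable names are illustrative:

```python
import numpy as np

x = np.arange(1, 11)           # 1, 2, ..., 10
y = 2 * x + 5                  # element-wise arithmetic, no Python loop

grid = x.reshape(2, 5)         # the same 10 values viewed as 2 rows x 5 columns
col_means = grid.mean(axis=0)  # mean of each column
row_sums = grid.sum(axis=1)    # sum of each row

evens = x[x % 2 == 0]          # a boolean mask selects the matching elements
# broadcasting: the 1-D row of multipliers is applied to both rows of grid
scaled = grid * np.array([1, 10, 100, 1000, 10000])

print(y)
print(grid.shape, col_means, row_sums)
print(evens)
```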

3(b) Pandas DataFrame using Matplotlib

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Name': ['John', 'Sammy', 'Joe'], 'Age': [45, 38, 90]})
# Bar chart with one bar per name; the bar height is the age
df.plot(x="Name", y="Age", kind="bar")
plt.show()

3(c) Basic plots

import matplotlib.pyplot as plt

x = [1,1.5,2,2.5,3,3.5,3.6]
y = [7.5,8,8.5,9,9.5,10,10.5]
x1=[8,8.5,9,9.5,10,10.5,11]
y1=[3,3.5,3.7,4,4.5,5,5.2]
plt.scatter(x,y, label='high income low saving',color='r')
plt.scatter(x1,y1,label='low income high savings',color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()
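The program above draws only a scatter plot. A minimal sketch of four other basic Matplotlib plot types side by side; the data (a random sample and made-up category counts) is hypothetical:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 11)
values = rng.normal(50, 10, size=200)   # synthetic sample for the histogram
categories = ["A", "B", "C", "D"]
counts = [23, 17, 35, 29]               # hypothetical category counts

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(x, 2 * x + 5, marker="o")
axes[0, 0].set_title("Line plot")

axes[0, 1].bar(categories, counts, color="tab:orange")
axes[0, 1].set_title("Bar chart")

axes[1, 0].hist(values, bins=15, edgecolor="black")
axes[1, 0].set_title("Histogram")

axes[1, 1].pie(counts, labels=categories, autopct="%1.0f%%")
axes[1, 1].set_title("Pie chart")

fig.tight_layout()
plt.show()
```

plt.subplots returns the figure and a 2 x 2 array of axes, so each plot is drawn on its own axes object instead of on the implicit "current" plot used by plt.scatter above.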

Result:

Thus the given program to work with NumPy arrays, Pandas data frames and basic plots using Matplotlib has been executed successfully.
Ex 4. Explore various variable and row filters in Python for cleaning data.
Apply various plot features in Python on sample data sets and visualize.
DATE:

Aim:

To explore various variable and row filters in Python for cleaning data, and to apply various plot features in Python on sample data sets and visualize them.

Algorithm:

1. Load the data.

2. View the first few rows.
3. Select subsets of rows and columns.
4. Filter the rows and columns based on a condition.
5. Create a scatter plot and add a line of best fit.
6. Similarly, create a bar plot, histogram and box plot.
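The programs below focus on missing values; steps 3 and 4 (subsetting and filtering rows and columns) can be sketched with pandas as follows. The DataFrame, its columns and its values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each of two columns
df = pd.DataFrame({
    "name": ["Asha", "Bala", "Chitra", "Dinesh", "Ezhil"],
    "dept": ["AIDS", "CSE", "AIDS", "ECE", "CSE"],
    "marks": [88, 72, np.nan, 65, 91],
    "age": [19, 20, 19, np.nan, 21],
})

# Variable (column) filters: keep only the columns that are needed
subset_cols = df[["name", "marks"]]
numeric_cols = df.select_dtypes(include="number")

# Row filters: boolean conditions (NaN compares as False)
high_scorers = df[df["marks"] > 80]
aids_or_cse = df[df["dept"].isin(["AIDS", "CSE"])]
same_with_query = df.query("dept == 'CSE' and marks >= 70")

# Cleaning with filters: drop rows missing 'marks', or keep only complete rows
has_marks = df[df["marks"].notna()]
complete_rows = df.dropna()

# Row and column filter together with .loc
selected = df.loc[df["age"] < 21, ["name", "dept"]]

print(high_scorers)
print(selected)
```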

Program:

# import the pandas and numpy libraries
import pandas as pd
import numpy as np

# 5 rows of random numbers, then reindex to 8 labels; the new labels b, d and g get NaN rows
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df)

Output:

one two three

a 0.077988 0.476149 0.965836
b NaN NaN NaN
c -0.390208 -0.551605 -2.301950
d NaN NaN NaN
e -2.000303 -0.788201 1.510072

Program: (Checking for missing values)

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# True where column 'one' is missing (NaN)
print(df['one'].isnull())

Output:

a False
b True
c False
d True
e False
f False
g True
h False
Name: one, dtype: bool

Program: (Filling missing data)

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 3), index=['a', 'c', 'e'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c'])
print(df)
print("NaN replaced with '0':")
print(df.fillna(0))

Output:

one two three

a -0.576991 -0.741695 0.553172
b NaN NaN NaN
c 0.744328 -1.735166 1.749580

NaN replaced with '0':

one two three
a -0.576991 -0.741695 0.553172
b 0.000000 0.000000 0.000000
c 0.744328 -1.735166 1.749580

Program: (Dropping missing values)

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# dropna() removes every row that contains at least one NaN
print(df.dropna())

Output:

one two three

a 0.077988 0.476149 0.965836
c -0.390208 -0.551605 -2.301950
e -2.000303 -0.788201 1.510072
f -0.930230 -0.670473 1.14661

Program: (Replacing missing or generic values)

import pandas as pd
import numpy as np
df = pd.DataFrame({'one': [10, 20, 30, 40, 50, 2000],
                   'two': [1000, 0, 30, 40, 50, 60]})
# Replace the outlier values 1000 and 2000 with 10 and 60
print(df.replace({1000: 10, 2000: 60}))
Output:

one two
0 10 10
1 20 0
2 30 30
3 40 40
4 50 50
5 60 60
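Steps 5 and 6 of the algorithm (a scatter plot with a line of best fit, then bar, histogram and box plots) have no matching program in the record. A minimal sketch on synthetic data; the columns x, y and group, the random seed and the use of np.polyfit for the least-squares line are assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical sample data: y depends roughly linearly on x, plus noise
df = pd.DataFrame({"x": np.arange(30, dtype=float)})
df["y"] = 3.0 * df["x"] + 7.0 + rng.normal(0, 5, size=len(df))
df["group"] = np.where(df["x"] < 15, "low", "high")

# Least-squares line of best fit: y ≈ slope * x + intercept
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].scatter(df["x"], df["y"], s=15)
axes[0, 0].plot(df["x"], slope * df["x"] + intercept, color="red",
                label=f"y = {slope:.2f}x + {intercept:.2f}")
axes[0, 0].set_title("Scatter plot with line of best fit")
axes[0, 0].legend()

df.groupby("group")["y"].mean().plot(kind="bar", ax=axes[0, 1], rot=0)
axes[0, 1].set_title("Bar plot of mean y per group")

axes[1, 0].hist(df["y"], bins=10, edgecolor="black")
axes[1, 0].set_title("Histogram of y")

df.boxplot(column="y", by="group", ax=axes[1, 1])
axes[1, 1].set_title("Box plot of y by group")

fig.tight_layout()
plt.show()
```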
Result:

Thus the given program to explore various variable and row filters in Python for cleaning data and to apply various plot features in Python has been executed successfully.
Ex 5. Perform Time Series Analysis and apply the various visualization
techniques.
DATE:

Aim:

To perform time series analysis and apply the various visualization techniques.

Algorithm:

1. Install the necessary packages.

2. Load the data.
3. Convert the column to date time.
4. Set date column as index.
5. Perform seasonal decomposition.
6. Plot the original data,trend,seasonal and residuals.
7. Show the plot.
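Steps 5 and 6 (seasonal decomposition and plotting the trend, seasonal and residual parts) are not in the program below. One way to sketch them, assuming a classical additive decomposition is what is intended, uses pandas only: a centred 2×12 moving average for the trend, the per-month average of the detrended values for the seasonal part, and the remainder as residuals. statsmodels' seasonal_decompose offers a ready-made version of the same idea; the helper names here are hypothetical:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def decompose_additive(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + resid.

    Needs at least two full periods of data.
    """
    if period % 2 == 0:
        # 2 x period centred moving average: half weight on the two end points
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = series.rolling(len(weights), center=True).apply(
        lambda w: np.dot(w, weights), raw=True)

    # Average the detrended values at each position in the cycle,
    # then centre the pattern so that it sums to zero over one period
    phase = np.arange(len(series)) % period
    seasonal_means = (series - trend).groupby(phase).mean()
    seasonal_means -= seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[phase], index=series.index)

    resid = series - trend - seasonal
    return pd.DataFrame({"observed": series, "trend": trend,
                         "seasonal": seasonal, "resid": resid})


def plot_decomposition(parts):
    """One panel each for the observed data, trend, seasonal part and residuals."""
    fig, axes = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
    for ax, column in zip(axes, parts.columns):
        ax.plot(parts.index, parts[column])
        ax.set_ylabel(column)
    fig.tight_layout()
    return fig
```

For the monthly a10 data loaded by the program below, decompose_additive(df['value'], period=12) followed by plot_decomposition(...) and plt.show() draws the four panels. Because the seasonal swings in that series grow with its level, decomposing np.log(df['value']) may separate the parts more cleanly.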

Program:

import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})

# Import as DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'])
df.head()

Output:

   Date        Value
0  1991-07-01  3.526591
1  1991-08-01  3.180891
2  1991-09-01  3.252221
3  1991-10-01  3.611003
4  1991-11-01  3.565869

# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')

# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16, 5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')

Output:

Result:
Thus the program to perform time series analysis and apply the various visualization techniques is executed successfully.
Ex 6. Perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction, etc.
DATE:

Aim:

To perform data analysis and representation on a map using various map data sets with mouse rollover effect and user interaction.

Algorithm:

1. Import the necessary libraries (folium, pandas)

2. Load the data set from a CSV file using pandas library
3. Create a map centered on a specific location using folium
4. Perform data analysis on the loaded data set (e.g. group by location and count the data points at each location).
5. Iterate through the location data and add a marker to the map for each location.
6. Attach a mouse rollover effect (a tooltip) to each marker to display additional information when the user hovers over it.
7. Add a button to the map to allow the user to toggle the visibility of the markers.
8. Display the map using folium.

Program:

import folium
import pandas as pd

# load your data set (the file is expected to have 'lat' and 'lon' columns)
data = pd.read_csv("your_data.csv")

# Create a map centered on a specific location
m = folium.Map(location=[45.523, -122.675], zoom_start=13)

# Perform some analysis on your data,
# e.g. group by location and count the data points at each location
location_data = data.groupby(['lat', 'lon']).size()

# Put the markers in a named layer so that LayerControl can toggle them
markers = folium.FeatureGroup(name="Locations")

# Add a marker for each location: the tooltip gives the mouse rollover effect
# (shown on hover) and the popup shows additional info when the marker is clicked
for (lat, lon), count in location_data.items():
    folium.Marker(
        location=[lat, lon],
        tooltip=f"Count: {count}",
        popup=folium.Popup("Additional Info"),
        icon=folium.Icon(color='red'),
    ).add_to(markers)

markers.add_to(m)

# Add a layer control button so the user can toggle the visibility of the markers
folium.LayerControl().add_to(m)

# Display the map (in a notebook, the last expression is rendered)
m
Output:

Result:

Thus, the program to perform data analysis and representation on a map using various map data sets with mouse rollover effect, user interaction, etc. is written and executed successfully.
Ex 7. Build cartographic visualization for multiple datasets involving various countries of the world, states and districts in India etc.
DATE:
Aim:

To build cartographic visualization for multiple datasets involving various countries of the world, states and districts in India etc.

Algorithm:

1. Collecting and cleaning the data: Use a library such as pandas to import and manipulate the data, remove missing values and make sure each column is in the correct format.

2. Mapping the data: Once the data is cleaned, use a library such as Folium or Plotly to create the map and overlay the data on it. Shapefiles (or GeoJSON files) may be needed to define the boundaries of countries, states and districts.

3. Styling the map: Customize the appearance of the map to make it easier to read, for example by changing the colors, adding labels and creating interactive elements such as hover-over text.

4. Exporting the map: Finally, export the map in a format that can be easily shared or embedded on a website, for example as a static image with Matplotlib's savefig, or as an interactive HTML page with Folium's save method or Plotly's write_html.

Program:

Output:

Result:

Thus the given program to build cartographic visualization for multiple datasets involving various countries of the world, states and districts in India has been executed successfully.
Ex 8. Perform EDA on Wine Quality Data Set
DATE:

Aim:

To perform EDA on the Wine Quality data set.

Algorithm:

1. Install the necessary packages.

2. Load the wine quality dataset.
3. Print the first 5 rows of the data.
4. Create histograms to visualize the distribution of each feature.
5. Create a scatter matrix to visualize the relationships between the features.

Program:
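The program for this experiment is missing from the record. A minimal sketch of the algorithm's steps with pandas and Matplotlib, assuming the red-wine file of the UCI Wine Quality data set (semicolon-separated) as the source; the URL and the helper names load_wine, summarize and plot_wine are assumptions:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed source: the red-wine file of the UCI Wine Quality data set
WINE_URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
            "wine-quality/winequality-red.csv")


def load_wine(path=WINE_URL):
    """Step 2: load the data set (the file uses ';' as separator)."""
    return pd.read_csv(path, sep=";")


def summarize(wine):
    """Basic EDA tables: shape, missing values, statistics, correlation with quality."""
    corr = wine.corr()["quality"].drop("quality")
    return {
        "shape": wine.shape,
        "missing": wine.isnull().sum(),
        "describe": wine.describe().T,
        "quality_counts": wine["quality"].value_counts().sort_index(),
        "corr_with_quality": corr.sort_values(ascending=False),
    }


def plot_wine(wine, features=None):
    """Steps 4 and 5: histograms of each feature, then a scatter matrix."""
    features = features or list(wine.columns)
    wine[features].hist(bins=20, figsize=(14, 10))
    plt.tight_layout()
    pd.plotting.scatter_matrix(wine[features], figsize=(14, 14), alpha=0.3)
    plt.show()
```

Typical use: call load_wine(), print wine.head() for step 3, inspect summarize(wine), then pass a handful of columns such as "alcohol", "volatile acidity", "sulphates" and "quality" to plot_wine, since a scatter matrix of all twelve columns is hard to read.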

Output:

Result:
Thus the given program to perform EDA on the Wine Quality data set is executed successfully.
Ex 9. Use a case study on a data set and apply the various EDA and
visualization techniques and present an analysis report
DATE:

Aim:

To use a case study on a data set, apply the various EDA and visualization
techniques and present an analysis report.

Algorithm:
1. Install the necessary packages.
2. Create a data set of date and price.
3. Fix the start date and end date.
4. Generate a price corresponding to the dates.
5. List the data by columns.
6. Group the columns by date.
7. Display the data.
8. Show the plot.
9. Present an analysis report.
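Step 9 asks for an analysis report, which the program below does not produce. A minimal sketch that turns a daily price series (such as the Price column of the grouped Date/Price frame) into a few summary lines; the function name and the choice of statistics are assumptions:

```python
import pandas as pd


def analysis_report(prices):
    """Summarise a price series indexed by date as a few report lines."""
    prices = prices.sort_index()
    # Percentage change between consecutive dates present in the data
    daily_change = prices.diff() / prices.shift(1) * 100
    lines = [
        f"Period: {prices.index.min():%Y-%m-%d} to {prices.index.max():%Y-%m-%d} "
        f"({len(prices)} days with data)",
        f"Mean price: {prices.mean():.2f} (std {prices.std():.2f})",
        f"Highest: {prices.max():.2f} on {prices.idxmax():%Y-%m-%d}",
        f"Lowest: {prices.min():.2f} on {prices.idxmin():%Y-%m-%d}",
        f"Largest day-to-day rise: {daily_change.max():.2f}%",
        f"Largest day-to-day fall: {daily_change.min():.2f}%",
    ]
    return "\n".join(lines)
```

With the frame built by the program, print(analysis_report(df['Price'])) prints the report; dates with no generated price are simply skipped, so a "day-to-day" change may span a gap.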

Program:

import datetime
import random

import pandas as pd
import radar   # third-party package used to generate random dates

def generateData(n):
    listdata = []
    for _ in range(n):
        # a random date in August 2019 and a random price between 900 and 1000
        date = radar.random_datetime(start='2019-08-1', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    # several prices can fall on the same date, so average them per date
    df = df.groupby(by='Date').mean()
    return df

# the number of records is not stated in the record; 50 is an assumed value
df = generateData(50)
df

Output (first rows):

Date        Price
2019-08-01  999.598900
2019-08-02  957.870150
2019-08-04  978.674200
2019-08-05  963.380375
2019-08-06  978.092900
2019-08-07  987.847700
2019-08-08  952.669900
2019-08-10  973.929400

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (14, 10)
plt.plot(df)
plt.show()

The plotted graph is shown below:

Output:

Result:

Thus the given program to apply the various EDA and visualization techniques and
presenting an analysis report has been executed successfully.
