

Dev Lab Record

Uploaded by

Prasanna Krishna

IT1302 – Data Exploration and Visualization (Lab Integrated) Dept of IT

Ex. No: 1 INSTALLING DATA ANALYSIS AND VISUALIZATION TOOL

AIM
To write the steps to install the data analysis and visualization tools R, Python, Tableau Public,
and Power BI.

PROCEDURE
1. R:
• Download R:
    • Visit the official R website (https://cran.r-project.org/) and download the
      installer for your operating system (Windows, macOS, or Linux).
• Install R by following the instructions provided in the installer.

2. Python:
• Download Python:
    • Visit the official website (https://www.python.org/downloads/) and download the
      Python installer for your OS (Windows, macOS, or Linux).
• Install Python by running the installer, making sure to check the option that adds Python to
  your system’s PATH during installation.

(i) INSTALL NUMPY WITH PIP

NumPy (Numerical Python) is an open-source core Python library for scientific
computing. It is a general-purpose package for processing arrays and matrices.

pip install numpy

(ii) INSTALL JUPYTER LAB

Install Jupyter Lab with pip:

pip install jupyterlab

St. Joseph’s College of Engineering 1



Once installed, launch Jupyter Lab with:

jupyter-lab

(iii) JUPYTER NOTEBOOK

Install the classic Jupyter Notebook with:

pip install notebook

To run the notebook:

jupyter notebook

(iv) INSTALL SCIPY

SciPy (Scientific Python) is a Python library for solving mathematical problems and running
numerical algorithms. It is built on top of the NumPy library.

pip install scipy

(v) INSTALL PANDAS

Pandas is a Python package that provides fast, flexible, and expressive data structures
designed to make working with “relational” or “labeled” data both easy and intuitive.

pip install pandas

(vi) INSTALL MATPLOTLIB

Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python.


pip install matplotlib
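
Once the packages above are installed, a quick check from Python can confirm that each one is visible to the interpreter. The sketch below is only illustrative: the helper name installed_versions is hypothetical, and it relies on the standard-library importlib.metadata module (available in Python 3.8 and later).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages installed with pip in the steps above
report = installed_versions(
    ["numpy", "scipy", "pandas", "matplotlib", "jupyterlab", "notebook"]
)
for name, version in report.items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with the matching pip command.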

3. Tableau Public:
• Tableau Public:
    • It is a web-based tool, so no installation is required. Simply visit the Tableau
      Public website (https://public.tableau.com/s/gallery) and create an account to start
      using it.

4. Power BI:
• Download Power BI Desktop:
    • Go to the official Power BI website (https://powerbi.microsoft.com/en-us/desktop/)
      and download Power BI Desktop.
• Install Power BI Desktop by running the installer.

PROGRAM 1
import numpy as np
import pandas as pd

# each list holds one row: name and age
hafeez = ['Hafeez', 19]
aslam = ['Aslam', 21]
kareem = ['kareem', 18]
dataframe = pd.DataFrame([hafeez, aslam, kareem], columns=['Name', 'Age'])
print(dataframe)

Output 1

PROGRAM 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# load the CSV file created in Excel
data = pd.read_csv("CountryData.csv")

# histogram of the values in the data frame
plt.hist(data)
plt.xlabel("code")
plt.ylabel("Total_personal_income")
plt.show()

CREATE A CSV FILE IN EXCEL:

• First, create a file in Excel with the columns ‘code’ and ‘Total_personal_income’.

• Save the file under the name used above, “CountryData”, with the .csv extension.
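
As an alternative to Excel, the same file can be written from Python with pandas. This is a minimal sketch: the codes and income figures are made-up placeholder values, not real data; only the column names and the file name come from the program above.

```python
import pandas as pd

# Placeholder rows (hypothetical values, for illustration only)
rows = {
    "code": [101, 102, 103, 104],
    "Total_personal_income": [1200, 850, 1430, 990],
}

country_data = pd.DataFrame(rows)

# index=False keeps the row numbers out of the file
country_data.to_csv("CountryData.csv", index=False)

# Read the file back to confirm that both columns were written
check = pd.read_csv("CountryData.csv")
print(check)
```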

Output 2


RESULT:


Ex. No: 2 WORKING WITH NUMPY ARRAYS, PANDAS DATA FRAMES, BASIC PLOTS USING MATPLOTLIB

AIM

To write the steps for working with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.

PROCEDURE

1. NumPy:

NumPy is a fundamental library for numerical computing in Python. It provides support for multi-
dimensional arrays and various mathematical functions. To get started, you’ll first need to install
NumPy if you haven’t already (you can use pip):

pip install numpy

Once NumPy is installed, you can use it as follows:


import numpy as np

# Creating NumPy array


arr = np.array([1, 2, 3, 4, 5])
print("Original Array:")
print(arr)

# Basic operations
mean = np.mean(arr)
total = np.sum(arr)  # named total so the built-in sum() is not shadowed
print("\nMean of the array:", mean)
print("Sum of the array:", total)

# Mathematical functions
square_root = np.sqrt(arr)
exponential = np.exp(arr)
print("\nSquare root of the array:")
print(square_root)


print("Exponential of the array:")


print(exponential)

# Indexing and Slicing


first_element = arr[0]
sub_array = arr[1:4]
print("\nFirst element of the array:", first_element)
print("Sub-array (elements from index 1 to 3):")
print(sub_array)

# Array Operations
combined_array = np.concatenate([arr, sub_array])
print("\nCombined array:")
print(combined_array)

OUTPUT:

Pandas:

Pandas is a powerful library for data manipulation and analysis. You can install pandas using pip:

pip install pandas


Working with Pandas DataFrames:

import pandas as pd

# Creating a DataFrame from a dictionary

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami']
}

df = pd.DataFrame(data)

# Display the entire DataFrame

print("DataFrame:")

print(df)

# Accessing specific columns

print("\nAccessing 'Name' Column:")

print(df['Name'])

# Adding a new column

df['Salary'] = [50000, 60000, 75000, 48000, 55000]

# Filtering data

print("\nPeople older than 30:")

print(df[df['Age'] > 30])


# Sorting by a column

print("\nSorting by 'Age' in descending order:")

print(df.sort_values(by='Age', ascending=False))

# Aggregating data

print("\nAverage age:")

print(df['Age'].mean())

# Grouping and aggregation

grouped_data = df.groupby('City')['Salary'].mean()

print("\nAverage salary by city:")

print(grouped_data)

# Applying a function to a column

df['Age_Squared'] = df['Age'].apply(lambda x: x ** 2)

# Removing a column

df = df.drop(columns=['Age_Squared'])

# Saving the DataFrame to a CSV file

df.to_csv('output.csv', index=False)

# Reading a CSV file into a DataFrame

new_df = pd.read_csv('output.csv')

print("\nDataFrame from CSV file:")

print(new_df)


OUTPUT
DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston
4    Emily   22        Miami

Accessing 'Name' Column:
0      Alice
1        Bob
2    Charlie
3      David
4      Emily
Name: Name, dtype: object

People older than 30:
      Name  Age     City  Salary
2  Charlie   35  Chicago   75000

Sorting by 'Age' in descending order:
      Name  Age         City  Salary
2  Charlie   35      Chicago   75000
1      Bob   30  Los Angeles   60000
3    David   28      Houston   48000
0    Alice   25     New York   50000
4    Emily   22        Miami   55000

Average age:
28.0

Average salary by city:
City
Chicago        75000.0
Houston        48000.0
Los Angeles    60000.0
Miami          55000.0
New York       50000.0
Name: Salary, dtype: float64

DataFrame from CSV file:
      Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   35      Chicago   75000
3    David   28      Houston   48000
4    Emily   22        Miami   55000


2. Matplotlib:

Matplotlib is a popular library for creating static, animated, or interactive plots and
graphs. Install Matplotlib using pip:
pip install matplotlib

Example of creating a basic plot:

import matplotlib.pyplot as plt


import numpy as np

# Sample data
data = np.random.normal(0, 1, 1000) # Normally distributed data for histogram and box plot
x = np.linspace(0, 10, 100) # Linear space for scatter plot
y = np.sin(x) # Sine wave for scatter plot

# 1. Histogram
plt.figure(figsize=(8, 6))
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title('Histogram of Normally distributed Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# 2. Box Plot
plt.figure(figsize=(8, 6))
plt.boxplot(data, vert=False, patch_artist=True, boxprops=dict(facecolor='lightgreen'))
plt.title('Box Plot of Normally Distributed Data')
plt.xlabel('Value')
plt.show()


# 3. Scatter Plot
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='red', marker='o')
plt.title('Scatter Plot of Sine Wave')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()


RESULT:


Ex. No: 3 WINE QUALITY ANALYSIS

1. Problem Statement: Write a Python program to perform Exploratory Data Analysis on the wine data set.

2. Expected Learning Outcomes: To train the students to understand different Exploratory Data Analysis techniques using visualization methods.

3. Problem Analysis: Take the wine data set and plot different charts and graphs.
Exploratory Data Analysis consists of methods for analyzing data in order to
extract meaningful insights and other useful characteristics of the data.

4. Algorithm:

Step 1: Load the data set obtained online.
Step 2: Examine the wine quality distribution using a bar chart.
Step 3: Find the correlated columns.
Step 4: Generate the heat map to understand the correlation.
Step 5: Plot the distribution of the alcohol level.
5. Coding:
# Red Wine Quality Analysis
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the data set; add delimiter=';' if the file is semicolon-separated
df_red = pd.read_csv(r'C:\Users\m_aks\college\wine.csv')

# Basic structure and summary statistics
print(df_red.columns)
print(df_red.describe())
print(df_red.shape)
print(df_red.info())

# Bar chart of the number of wines at each quality level
sns.set(rc={'figure.figsize': (14, 8)})
sns.countplot(x='quality', data=df_red)
plt.show()

# Pairwise relationships between the columns
sns.pairplot(df_red)
plt.show()

# Heat map of the correlation matrix
sns.heatmap(df_red.corr(), annot=True, fmt='.2f', linewidths=2)
plt.show()

# Alcohol distribution (histplot is used because distplot is deprecated in recent seaborn releases)
sns.histplot(df_red['alcohol'], kde=True)
plt.show()
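
Step 3 of the algorithm, finding the correlated columns, can also be done numerically rather than by reading the heat map. One possible way is sketched below on a small made-up frame; the sample values, the helper name correlated_pairs, and the 0.8 threshold are all assumptions for illustration, and the same function could be applied to df_red.

```python
import pandas as pd


def correlated_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for column pairs whose |Pearson r| >= threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:          # visit each unordered pair once
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, round(float(r), 3)))
    # strongest relationships first
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical sample values, not taken from the wine data set
sample = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0],
    "density": [0.9980, 0.9970, 0.9962, 0.9951, 0.9940],
    "pH":      [3.51, 3.20, 3.36, 3.16, 3.40],
})

for a, b, r in correlated_pairs(sample):
    print(f"{a} vs {b}: r = {r}")
```

A lower threshold lists more pairs; the sign of r shows whether the two columns rise together or move in opposite directions.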

Output:

Quality-wise analysis using a bar chart

Correlation among the columns

Heat map of the correlation between the columns

Alcohol level distribution

RESULT:


Ex. No: 4 CREATING LINE AND BAR GRAPH USING TABLEAU

AIM

To learn the basic Tableau functions for dashboards and to create basic data visualizations such as
a line chart and a bar graph.

PROCEDURE

Step 1: Import the data into the Tableau workspace from the computer.

Step 2: Under the Sheets tab, three sheets become visible: Orders, People, and Returns.
Double-click the Orders sheet, and it opens like a spreadsheet.

Step 3: Use the Data Interpreter, also present under the Sheets tab. Click on it to get a formatted
sheet.

Step 4: Go to the worksheet by clicking the Sheet 1 tab at the bottom left of the Tableau workspace.

Step 5: From Dimensions under the Data pane, drag Order Date to the Columns shelf.

Step 6: From the Measures tab, drag the Sales field onto the Rows shelf.


OUTPUT:

BAR CHART IN TABLEAU

AIM
To create a bar chart in Tableau.

PROCEDURE

Step 1: Create a new worksheet.

Step 2: Add State and Country from the Data pane to Detail on the Marks card. This produces the map
view.

Step 3: Drag Region to the Filters shelf, and then filter down to South only. The map view now
zooms in to the South region only, and a mark represents each state.


Step 4: Drag the Sales measure to the Color tab on the Marks card. We obtain a filled map with the
colors showing the range of sales in each state.

Step 5: We can change the color scheme by clicking Color on the Marks card and selecting Edit
Colors. We can experiment with the available palettes.

Step 6: In the Data pane, drag a field and drop it directly on top of another field or right-click the
field and select.

Step 7: Duplicate the Profit Map worksheet and name it Negative Profit Bar Chart.

Step 8: Click Show Me on the Negative Profit Bar Chart worksheet. Show Me presents the ways
in which a graph can be plotted from the fields used in the worksheet. From Show Me,
select the horizontal bar option; the view updates from vertical to horizontal bars instantly.


OUTPUT:

RESULT:


Ex. No: 5 ADDING FILTERS IN THE DATA AND COLORING THE GRAPH

AIM :
To add filters and colors to the data set.

PROCEDURE:
Step 1: Category is present under the Dimensions pane. Drag it to the Columns shelf and place
it next to Year (Order Date).
Step 2: Category should be placed to the right of the year.
Step 3: The chart type changes from a line chart to a bar chart. The chart shows the overall sales for every
product by year.
Step 4: To add labels to the view, click Show Mark Labels on the toolbar.
Step 5: Double-click or drag the Sub-Category dimension to the Columns shelf.
Step 6: The view displays a bar for every sub-category, broken down by category and year.
Step 7: Under Dimensions, right-click Order Date and select Show Filter. Repeat for the
Sub-Category field.
Step 8: In the Data pane, under Measures, drag Profit to Color on the Marks card.


OUTPUT:

RESULT:


Ex No: 6 CREATING INTERACTIVE DASHBOARD USING SUPERSTORE DATASET

AIM
To create an interactive dashboard using the Superstore data set in Tableau.

PROCEDURE

Step 1: Click the New Dashboard button.

Step 2: Drag the Sales in the South worksheet, created earlier, onto the empty dashboard.

Step 3: Drag the Profit Map worksheet to the dashboard and drop it on top of the Sales in the South
view. Both views can now be seen at once. The dashboard can then be arranged as desired so that
others can understand the data presented.

Step 4: On the Sales South worksheet in the dashboard view, click under the Region


OUTPUT:

RESULT:


Ex. No: 7 PYTHON PROGRAM TO VISUALIZE THE LOLLIPOP CHART USING PANDAS, USING COMPANY SALES DATA SET

1. Problem Statement: Write a Python program to visualize a lollipop chart using pandas, with the company sales data set.

2. Expected Learning Outcomes: To train the students to visualize data sets using a lollipop chart.

3. Problem Analysis: Take the company sales data set and plot it using a lollipop
chart. It is a method for analyzing data in order to extract meaningful insights
and other useful characteristics of the data.

4. Algorithm:

Step 1: Load the data set obtained online.
Step 2: Create an empty chart.
Step 3: Plot using plt.stem.
Step 4: Format the chart.

5. Coding:

# Company sales data analysis

# importing modules
import pandas as pd
import matplotlib.pyplot as plt

# reading CSV file
d = pd.read_csv("company_sales_data.csv")

# creating an empty chart
fig, axes = plt.subplots()

# plotting using stem (use_line_collection is omitted because
# newer Matplotlib releases no longer accept it)
axes.stem(d['month_number'], d['total_profit'], basefmt=' ')

# starting value of y-axis
axes.set_ylim(0)

# details and formatting of chart
plt.title('Total Profit')
plt.xlabel('Month')
plt.ylabel('Profit')
plt.xticks(d['month_number'])
plt.show()
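
To try the chart without the CSV file, the same steps can be run on inline data. This is a sketch under stated assumptions: the monthly profit figures are invented placeholders, and the output file name lollipop.png is hypothetical. The figure is saved to disk rather than shown, so the snippet also runs on a machine without a display.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data (hypothetical values, not from company_sales_data.csv)
d = pd.DataFrame({
    "month_number": list(range(1, 13)),
    "total_profit": [211, 183, 224, 222, 209, 191, 295, 364, 234, 266, 412, 300],
})

fig, axes = plt.subplots()

# stem() draws one vertical line per month with a marker on top: the "lollipop"
markerline, stemlines, baseline = axes.stem(
    d["month_number"], d["total_profit"], basefmt=" "
)
markerline.set_markersize(8)

axes.set_ylim(0)                      # start the y-axis at zero
axes.set_title("Total Profit")
axes.set_xlabel("Month")
axes.set_ylabel("Profit")
axes.set_xticks(d["month_number"])

fig.savefig("lollipop.png")           # hypothetical output file name
```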


Output:

RESULT:


Ex. No: 8 DATA TRANSFORMATION

1. Problem Statement: Write a Python program to create a data frame
and perform transformations such as identifying and removing duplicates,
handling missing values, and replacing values.

2. Expected Learning Outcomes: To train the students to understand
data transformations.

3. Problem Analysis: Create a data frame and perform data
transformation operations such as deduplication and handling missing
values.

4. Algorithm:

Step 1: Create a data frame with values.
Step 2: Display the data frame.
Step 3: Identify the duplicate rows.
Step 4: Perform deduplication.
Step 5: Assign the NaN values.
Step 6: Replace the NaN values with mean values.

5. Coding:

# Data frame creation and transformation
import pandas as pd
import numpy as np

data = pd.DataFrame({"a": ["one", "two"] * 3, "b": [1, 1, 2, 3, 2, 3]})
print(data)

# flag rows that repeat an earlier row
print(data.duplicated())

# drop the repeated rows
print(data.drop_duplicates())

# add a column of missing values, then replace NaN with the mean of column b
data["c"] = np.nan
data.replace([np.nan], data.b.mean(), inplace=True)
print(data)
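
In the program above every value of column c is missing, so a single replace call fills the whole column. When only some entries are missing, fillna with a per-column mean is a common alternative. The sketch below uses a made-up frame; the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one repeated row and two missing scores
df = pd.DataFrame({
    "name":  ["one", "two", "one", "three", "four"],
    "score": [1.0, np.nan, 1.0, 4.0, np.nan],
})

# Identify and drop rows that repeat an earlier row
print(df.duplicated())
deduped = df.drop_duplicates().reset_index(drop=True)

# Fill the remaining missing scores with the mean of the observed scores
mean_score = deduped["score"].mean()      # NaN entries are skipped by default
filled = deduped.fillna({"score": mean_score})
print(filled)
```

Passing a dictionary to fillna limits the replacement to the named column, so missing values in other columns are left untouched.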

Output:

Data frame creation and transformation


     a  b
0  one  1
1  two  1
2  one  2
3  two  3
4  one  2
5  two  3

0    False
1    False
2    False
3    False
4     True
5     True
dtype: bool

     a  b
0  one  1
1  two  1
2  one  2
3  two  3

     a  b    c
0  one  1  2.0
1  two  1  2.0
2  one  2  2.0
3  two  3  2.0
4  one  2  2.0
5  two  3  2.0

RESULT:


Ex. No: 9 CREATION, INDEXING AND SLICING OF 1D, 2D AND 3D ARRAYS USING NUMPY

1. Problem Statement: Write a Python program using NumPy package functions
and 2D arrays to perform simple matrix operations.

2. Expected Learning Outcomes: To train the students to understand
the fundamentals of NumPy arrays.

3. Problem Analysis: Create arrays and perform various operations.

4. Algorithm:

Step 1: Create a NumPy array.
Step 2: Print the array rank.
Step 3: Create a two-dimensional array.
Step 4: Create an array using a tuple.
Step 5: Perform indexing.
Step 6: Print the array values.

5. Coding:
# NumPy 1D and 2D array manipulation
# Python program for creation of arrays
import numpy as np

# Creating a rank 1 array
arr = np.array([1, 2, 3])
print("Array with Rank 1: \n", arr)

# Creating a rank 2 array
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Array with Rank 2: \n", arr)

# Creating an array from a tuple
arr = np.array((1, 3, 2))
print("\nArray created using passed tuple:\n", arr)

# Two-dimensional array
arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 8], [3, -7, 4, 2.0]])
print("Initial Array: ")
print(arr)

# Printing a range of the array with the use of the slicing method
sliced_arr = arr[:2, ::2]
print("Array with first 2 rows and alternate columns(0 and 2):\n", sliced_arr)

# Printing elements at specific indices
index_arr = arr[[1, 1, 0, 3], [3, 2, 1, 0]]
print("\nElements at indices (1, 3), (1, 2), (0, 1), (3, 0):\n", index_arr)
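
The exercise title also mentions 3D arrays, which the listing above does not cover. A minimal sketch of creating, indexing and slicing a rank 3 array follows; the variable names and the shape (2, 3, 4) are chosen only for illustration.

```python
import numpy as np

# A rank 3 array: 2 blocks, each with 3 rows and 4 columns, filled with 0..23
arr3d = np.arange(24).reshape(2, 3, 4)
print("Array with Rank 3:\n", arr3d)
print("Number of dimensions:", arr3d.ndim)

# Indexing: block 1, row 2, column 3
element = arr3d[1, 2, 3]
print("\nElement at (1, 2, 3):", element)

# Slicing: first row of every block
first_rows = arr3d[:, 0, :]
print("\nFirst row of each block:\n", first_rows)

# Slicing: block 0, first 2 rows, alternate columns (0 and 2)
sub_block = arr3d[0, :2, ::2]
print("\nBlock 0, first 2 rows, alternate columns:\n", sub_block)
```

The indexing rules are the same as in the 2D case, with one extra leading index that selects the block.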

Output:

Create and perform basic operations in Array


Array with Rank 1:
 [1 2 3]
Array with Rank 2:
 [[1 2 3]
 [4 5 6]]

Array created using passed tuple:
 [1 3 2]
Initial Array:
[[-1.   2.   0.   4. ]
 [ 4.  -0.5  6.   0. ]
 [ 2.6  0.   7.   8. ]
 [ 3.  -7.   4.   2. ]]
Array with first 2 rows and alternate columns(0 and 2):
 [[-1.  0.]
 [ 4.  6.]]

Elements at indices (1, 3), (1, 2), (0, 1), (3, 0):
 [0. 6. 2. 3.]

RESULT:
