Data Analysis Lab - Final - 23-24
Course Objectives:
Course Outcomes:
Course Content:
UNIT-I
UNIT-II
Data Loading, Storage, and File Formats: Reading and Writing Data in Text
Format, Binary Data Formats, Interacting with Web APIs, Interacting with
Databases.
Plotting and Visualization: A Brief matplotlib API Primer, Plotting with pandas
and seaborn.
UNIT-IV
Time Series: Date and Time Data Types and Tools, Time Series Basics, Date
Ranges, Frequencies, and Shifting, Time Zone Handling, Periods and Period
Arithmetic, Resampling and Frequency Conversion, Moving Window Functions
Learning Resources:
Textbook(s):
1. Wes McKinney, Python for Data Analysis: Data Wrangling with Pandas,
NumPy, and IPython, 2nd Edition, O’Reilly/SPD.
References:
List of Experiments:
1. Numpy Array operations
2. Iris Dataset
3. Pandas Series
4. Pandas Dataframes
5. Canada Pizza Price Prediction
6. Mobile Phone Price Dataset
7. National Universities Rankings
8. Adidas Sales Dataset
9. Movies Dataset
10. Avocado Prices
1. NumPy Array operations
Write a Python program that performs the following operations using the NumPy library.
This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of
the variables sepal length and width and petal length and width, respectively, for 50
flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and
virginica.
Format
iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width, and Species.
The header is: sepal length, sepal width, petal length, petal width, iris, Species No. The
Species No column has the value 1 for Iris setosa, 2 for Iris virginica, and 3 for Iris versicolor.
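A minimal sketch of reading a file in the layout described above with NumPy, assuming Iris.txt is comma-separated with a single header row (the delimiter is not stated, so this is an assumption). Six rows of the Fisher data stand in for the file, and SPECIES_NAMES is a hypothetical helper built from the numbering given above. Because np.genfromtxt parses every field as a float by default, the species-name column comes back as nan, which is presumably why question b) below drops it.

```python
import io

import numpy as np

# Stand-in for Iris.txt: six rows of the Fisher data in the assumed
# comma-separated layout (4 measurements, species name, species number).
SAMPLE = """sepal length,sepal width,petal length,petal width,iris,Species No
5.1,3.5,1.4,0.2,Iris-setosa,1
4.9,3.0,1.4,0.2,Iris-setosa,1
6.3,3.3,6.0,2.5,Iris-virginica,2
5.8,2.7,5.1,1.9,Iris-virginica,2
7.0,3.2,4.7,1.4,Iris-versicolor,3
6.4,3.2,4.5,1.5,Iris-versicolor,3
"""

# Species numbering as stated in the exercise (hypothetical helper name).
SPECIES_NAMES = {1: "Iris setosa", 2: "Iris virginica", 3: "Iris versicolor"}

# genfromtxt reads every field as float; non-numeric text becomes nan.
raw = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",", skip_header=1)
print(raw.shape)                  # (6, 6): six rows, six columns
print(np.isnan(raw[:, 4]).all())  # True: the species-name column is not numeric

# Count rows per species using the numeric code in the last column.
codes, counts = np.unique(raw[:, 5].astype(int), return_counts=True)
for code, n in zip(codes, counts):
    print(SPECIES_NAMES[code], n)
```

With the real file, pass "Iris.txt" to np.genfromtxt instead of the StringIO object.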
Questions:
a) Load the data in the file Iris.txt into a 2-D array called iris.
b) Drop the column with index 4 from the array iris.
c) Display the shape, dimensions and size of iris.
d) Split iris into three 2-D arrays, one for each species, and call them iris1,
iris2, and iris3.
e) Print the three arrays iris1, iris2, and iris3.
f) Create a 1-D array header with the elements “sepal length”, “sepal width”,
“petal length”, “petal width”, and “species No”, in that order.
g) Display the array header.
h) Find the max, min, mean, standard deviation, and variance of the columns of iris and
store the results in the arrays iris_max, iris_min, iris_avg, iris_std, and iris_var, respectively.
The results must be rounded to at most two decimal places.
i) Similarly, find the max, min, mean, and standard deviation of the columns of
iris1, iris2, and iris3, and store the results in arrays with appropriate names.
j) Compare the minimum sepal length, sepal width, petal length, and petal width
of each of the three species with the corresponding minimum for the data set as a
whole, and fill in the table below with True if the species value is greater than the
data set value and False otherwise.
              Iris setosa   Iris virginica   Iris versicolor
Sepal length
Sepal width
Petal length
Petal width
k) Compare Iris setosa’s average sepal width to that of Iris virginica.
l) Compare Iris setosa’s average petal length to that of Iris virginica.
m) Compare Iris setosa’s average petal width to that of Iris virginica.
n) Save the array iris_avg in a comma-separated file named IrisMeanValues.txt on
the hard disk.
o) Save the arrays iris_max, iris_avg, and iris_min in a comma-separated file named
IrisStat.txt on the hard disk.
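The questions above could be worked through roughly as follows. This is a minimal sketch rather than a reference solution: it assumes Iris.txt is comma-separated with one header row, uses six rows of the Fisher data in place of the full file, and the helper column_stats and per-species names such as iris1_avg are illustrative choices. Standard deviation and variance use NumPy's default population formulas (ddof=0).

```python
import io

import numpy as np

# Six rows of the Fisher data standing in for Iris.txt (assumed layout:
# comma-separated, one header row, species name at index 4).
SAMPLE = """sepal length,sepal width,petal length,petal width,iris,Species No
5.1,3.5,1.4,0.2,Iris-setosa,1
4.9,3.0,1.4,0.2,Iris-setosa,1
6.3,3.3,6.0,2.5,Iris-virginica,2
5.8,2.7,5.1,1.9,Iris-virginica,2
7.0,3.2,4.7,1.4,Iris-versicolor,3
6.4,3.2,4.5,1.5,Iris-versicolor,3
"""

# a) Load into a 2-D float array (pass "Iris.txt" instead of the StringIO).
iris = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",", skip_header=1)
# b) Drop the non-numeric species-name column (index 4).
iris = np.delete(iris, 4, axis=1)
# c) Shape, number of dimensions, and total number of elements.
print(iris.shape, iris.ndim, iris.size)

# d), e) Boolean masks on the species-number column (now at index 4).
iris1 = iris[iris[:, 4] == 1]  # Iris setosa
iris2 = iris[iris[:, 4] == 2]  # Iris virginica
iris3 = iris[iris[:, 4] == 3]  # Iris versicolor
print(iris1, iris2, iris3, sep="\n\n")

# f), g) Column labels as a 1-D string array.
header = np.array(["sepal length", "sepal width", "petal length",
                   "petal width", "species No"])
print(header)


def column_stats(arr):
    """Column-wise max, min, mean, std, var (ddof=0), rounded to 2 decimals."""
    return tuple(np.round(f(arr, axis=0), 2)
                 for f in (np.max, np.min, np.mean, np.std, np.var))


# h) Whole data set; i) one set of arrays per species.
iris_max, iris_min, iris_avg, iris_std, iris_var = column_stats(iris)
iris1_max, iris1_min, iris1_avg, iris1_std, iris1_var = column_stats(iris1)
iris2_max, iris2_min, iris2_avg, iris2_std, iris2_var = column_stats(iris2)
iris3_max, iris3_min, iris3_avg, iris3_std, iris3_var = column_stats(iris3)

# j) True where a species' minimum exceeds the overall minimum
#    (rows: the four measurements; columns: setosa, virginica, versicolor).
table = np.column_stack([sp[:, :4].min(axis=0) > iris[:, :4].min(axis=0)
                         for sp in (iris1, iris2, iris3)])
print(table)

# k)-m) Setosa vs. virginica means for sepal width, petal length, petal width.
for col in (1, 2, 3):
    bigger = "greater" if iris1_avg[col] > iris2_avg[col] else "not greater"
    print(f"setosa {header[col]}: {iris1_avg[col]} is {bigger} than {iris2_avg[col]}")

# n) One comma-separated row; reshape so savetxt does not write one value per line.
np.savetxt("IrisMeanValues.txt", iris_avg.reshape(1, -1), delimiter=",", fmt="%.2f")
# o) Three rows (max, avg, min) in one comma-separated file.
np.savetxt("IrisStat.txt", np.vstack([iris_max, iris_avg, iris_min]),
           delimiter=",", fmt="%.2f")
```

The statistics in h) also cover the species-number column because the question asks for all columns of iris; slicing with iris[:, :4] would restrict them to the measurements. Both files are written to the current working directory.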
5. Canada Pizza Price Prediction
Questions:
a) Count the number of null values in the pizza dataset and replace them with the
average of the corresponding columns.
b) Calculate the average price of pizza prepared by each company.
c) Find the companies that prepared pizzas of different variants with the same
diameter.
d) Which company has the most pizzas? Show the result with a graph.
e) Check whether the pizza dataset contains null values, count the number of null
values in the pizza dataset, and find the number of missing data points per column.
f) Rename the column price_cad to price.
g) Identify the number of companies in each category.
h) Identify which type of pizza is more expensive.
i) Find the diameter of jumbo-size pizzas.
j) If any jumbo pizza has a diameter less than 16, remove those rows.
k) Calculate the average price of a pizza prepared by company A.
l) Find the mean diameter and the average price of pizzas prepared by company C.
m) Find the companies that prepared pizzas of different variants with the same
diameter.
n) Find the pizza variants with extra_mushrooms and a chicken topping.
o) What is the most expensive pizza in each company?
p) Which company has the most pizzas on the menu? Show the result with a graph.
q) What is the average price of pizza in each company?
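A minimal pandas sketch covering most of the pizza questions. Only price_cad, diameter, and extra_mushrooms are named in the exercise; the other column names (company, topping, variant, size), the "yes"/"no" and "jumbo" values, and all six rows are invented assumptions for illustration. Questions g) and h) are left out because their "category" and "type" columns are not specified.

```python
import numpy as np
import pandas as pd

# Invented sample; column names beyond price_cad, diameter, and
# extra_mushrooms are assumptions modeled on the questions.
pizza = pd.DataFrame({
    "company": ["A", "A", "B", "C", "C", "C"],
    "price_cad": [12.0, 15.0, np.nan, 20.0, 18.0, 22.0],
    "diameter": [12.0, 12.0, 14.0, np.nan, 16.0, 18.5],
    "topping": ["chicken", "beef", "chicken", "chicken", "tuna", "chicken"],
    "variant": ["classic", "spicy", "classic", "classic", "bbq", "bbq"],
    "size": ["medium", "medium", "jumbo", "medium", "jumbo", "jumbo"],
    "extra_mushrooms": ["yes", "no", "no", "yes", "yes", "yes"],
})

# e) Any nulls? Total count, then count per column.
print(pizza.isnull().values.any(), pizza.isnull().sum().sum())
print(pizza.isnull().sum())

# a) Fill nulls in the numeric columns with each column's mean.
num = pizza.select_dtypes(include="number").columns
pizza[num] = pizza[num].fillna(pizza[num].mean())

# f) Rename price_cad to price.
pizza = pizza.rename(columns={"price_cad": "price"})

# b), q) Average price per company.
avg_price = pizza.groupby("company")["price"].mean()

# c), m) Companies offering more than one variant at the same diameter.
n_var = pizza.groupby(["company", "diameter"])["variant"].nunique()
multi_variant = n_var[n_var > 1].index.get_level_values("company").unique().tolist()

# i) Diameters of jumbo pizzas; j) drop jumbo rows with diameter < 16.
jumbo = pizza["size"] == "jumbo"
jumbo_diameters = pizza.loc[jumbo, "diameter"].unique()
pizza = pizza[~(jumbo & (pizza["diameter"] < 16))]

# k) Company A's average price; l) company C's mean diameter and price.
a_avg = pizza.loc[pizza["company"] == "A", "price"].mean()
c_means = pizza.loc[pizza["company"] == "C", ["diameter", "price"]].mean()

# n) Variants with extra mushrooms and a chicken topping.
mask = (pizza["extra_mushrooms"] == "yes") & (pizza["topping"] == "chicken")
mushroom_chicken = pizza.loc[mask, "variant"].unique()

# o) Most expensive pizza per company; d), p) pizzas per company.
most_expensive = pizza.loc[pizza.groupby("company")["price"].idxmax()]
counts = pizza["company"].value_counts()
print(avg_price, multi_variant, jumbo_diameters, a_avg, c_means,
      mushroom_chicken, most_expensive, counts, sep="\n\n")
```

For the graphs in d) and p), counts.plot(kind="bar") would draw a bar chart of pizzas per company, provided matplotlib is installed.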
Questions:
7. National Universities Rankings
Questions:
a) Find the universities, along with their states, whose tuition fee is between $25,000 and $30,000.
b) Find the universities where undergraduate enrollment is more than 25,000, containing
in-state students.
c) Find the states where universities are located in three or more cities.
d) Find the maximum and minimum tuition fee in each state.
e) Find the cities, along with their states, where the maximum difference in tuition fees
among the universities in that city is greater than $5,000.
f) Print the names of universities that have multiple branches, along with the names of the
branches.
g) Print each university’s name and its location.
h) Find the cities, along with their states, that have more than 2 universities.
i) Find the number of states and cities in which the top 100 universities are located.
j) Draw a plot showing the undergraduate enrollment of each university.
k) Draw a plot showing each university’s name and its corresponding tuition fee.
l) Plot the number of universities in each state with rank > 100.
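A minimal pandas sketch of several of these queries. The column names (Name, Location, Rank, Tuition and fees, In-state, Undergrad Enrollment), the "City, ST" location format, and the "$52,000"-style text for money values are assumptions about the data file, and the six rows are invented. Question b) is read here as "an in-state figure is present", which is only one possible interpretation.

```python
import pandas as pd

# Invented rows; column names follow an assumed CSV layout with
# Location as "City, ST" and money columns stored as "$52,000" text.
unis = pd.DataFrame({
    "Name": ["Univ A", "Univ B", "Univ C", "Univ D", "Univ E", "Univ F"],
    "Location": ["Boston, MA", "Boston, MA", "Boston, MA",
                 "Austin, TX", "Dallas, TX", "Houston, TX"],
    "Rank": [5, 40, 150, 99, 120, 101],
    "Tuition and fees": ["$52,000", "$45,000", "$27,500",
                         "$25,000", "$30,500", "$12,400"],
    "In-state": [None, None, None, "$10,200", None, "$9,800"],
    "Undergrad Enrollment": [6000, 4500, 3000, 40000, 12000, 28000],
})

# Clean-up: numeric tuition and separate City / State columns.
unis["Tuition and fees"] = (unis["Tuition and fees"]
                            .str.replace(r"[$,]", "", regex=True).astype(float))
unis[["City", "State"]] = unis["Location"].str.split(", ", n=1, expand=True)
tuition = unis["Tuition and fees"]

# a) Tuition between $25,000 and $30,000 (between() is inclusive).
fee_band = unis.loc[tuition.between(25000, 30000), ["Name", "State"]]

# b) Enrollment above 25,000 and an in-state figure present (assumed reading).
large = unis.loc[(unis["Undergrad Enrollment"] > 25000)
                 & unis["In-state"].notna(), "Name"]

# c) States whose universities span three or more distinct cities.
cities_per_state = unis.groupby("State")["City"].nunique()
multi_city_states = cities_per_state[cities_per_state >= 3].index.tolist()

# d) Maximum and minimum tuition per state.
state_fees = unis.groupby("State")["Tuition and fees"].agg(["max", "min"])

# e) Cities where the tuition spread among their universities exceeds $5,000.
by_city = unis.groupby(["City", "State"])["Tuition and fees"]
spread = by_city.max() - by_city.min()
wide_spread = spread[spread > 5000]

# g) Each university with its location.
print(unis[["Name", "Location"]].to_string(index=False))

# h) Cities (with state) that have more than two universities.
uni_counts = unis.groupby(["City", "State"]).size()
busy_cities = uni_counts[uni_counts > 2]

# i) Distinct states and (city, state) pairs among the top 100.
top = unis[unis["Rank"] <= 100]
n_states = top["State"].nunique()
n_cities = len(top[["City", "State"]].drop_duplicates())

# l) Universities ranked beyond 100, counted per state.
over_100 = unis.loc[unis["Rank"] > 100, "State"].value_counts()
print(fee_band, large, multi_city_states, state_fees, wide_spread,
      busy_cities, n_states, n_cities, over_100, sep="\n\n")
```

Cities are counted together with their state so that two cities sharing a name in different states stay distinct. The plots in j), k), and l) could be drawn with calls such as over_100.plot(kind="bar") once matplotlib is installed; f) is not covered because the assumed layout has no branch information.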
Questions:
9. Movies Dataset
Columns:
- Title
- US Gross
- Worldwide Gross
- US DVD Sales
- Production Budget
- Release Date
- MPAA Rating
- Running Time (min)
- Distributor
- Source
- Major Genre
- Creative Type
- Director
- Rotten Tomatoes Rating
- IMDB Rating
- IMDB Votes
Questions:
Questions: