
Data Analysis Lab - Final - 23-24

The document describes a data analysis lab course that covers: 1. Using Python libraries like NumPy and Pandas for data manipulation and visualization. 2. Performing data cleaning, wrangling, and various operations on data. 3. Visualizing data using Matplotlib and seaborn. The course content is divided into 4 units covering topics like NumPy, Pandas, data loading/storage, cleaning, wrangling, aggregation, time series analysis, and visualization. Experiments include tasks with NumPy, the Iris dataset, Series, DataFrames, and predictive analysis on various real-world datasets.

Uploaded by

forallofus435


CS352 Data Analysis Lab

Course Objectives:

The main objectives of the course are to:

1. Introduce Python libraries used for data manipulation and visualization
2. Create awareness of data cleaning, wrangling, and various operations on data
3. Impart knowledge of visualizing data using various plots

Course Outcomes:

On successful completion of the course, students will be able to:

1. Perform operations on data using basic concepts of NumPy and Pandas
2. Perform data cleaning and data wrangling operations
3. Visualize data using Matplotlib
4. Perform aggregation operations and operations on time series data

Course Content:

UNIT-I

NumPy Basics: Arrays and Vectorized Computation: The NumPy ndarray, Universal
Functions, Array-Oriented Programming with Arrays, File Input and Output with
Arrays, Linear Algebra, Pseudorandom Number Generation, Example: Random Walks

Pandas Data Structure: Introduction to pandas Data Structures, Essential
Functionality, Summarizing and Computing Descriptive Statistics

UNIT-II

Data Loading, Storage, and File Formats: Reading and Writing Data in Text
Format, Binary Data Formats, Interacting with Web APIs, Interacting with
Databases.

Data Cleaning and Preparation: Handling Missing Data, Data Transformation,
String Manipulation.

UNIT-III

Data Wrangling: Join, Combine, and Reshape: Hierarchical Indexing,
Combining and Merging Datasets, Reshaping and Pivoting

Plotting and Visualization: A Brief matplotlib API Primer, Plotting with pandas
and seaborn.

UNIT-IV

Data Aggregation and Group Operations: GroupBy Mechanics, Data Aggregation,
Apply: General split-apply-combine, Pivot Tables and Cross-Tabulation.

Time Series: Date and Time Data Types and Tools, Time Series Basics, Date
Ranges, Frequencies, and Shifting, Time Zone Handling, Periods and Period
Arithmetic, Resampling and Frequency Conversion, Moving Window Functions

Learning Resources:

Textbook(s):

1. Wes McKinney, Python for Data Analysis - Data Wrangling with Pandas,
NumPy, and IPython 2nd Edition. O’Reilly/SPD

References:

1. Jake VanderPlas, Python Data Science Handbook: Essential Tools for
Working with Data. O’Reilly/SPD
2. David Taieb, Data Analysis with Python: A Modern Approach, 1st
Edition. Packt Publishing

List of Experiments:

1. NumPy Array Operations
2. Iris Dataset
3. Pandas Series
4. Pandas DataFrames
5. Canada Pizza Price Prediction
6. Mobile Phone Price Dataset
7. National Universities Rankings
8. Adidas Sales Dataset
9. Movies Dataset
10. Avocado Prices

1. NumPy Array Operations

Write a Python program to do the following operations: Library: NumPy

a) Create a one-dimensional array and perform all operations on it.
b) Create multi-dimensional arrays and find their shape and dimensions.
c) Create a matrix full of zeros and a matrix full of ones.
d) Reshape and flatten data in the array.
e) Perform arithmetic operations on multi-dimensional arrays.
f) Append data vertically and horizontally.
g) Apply indexing and slicing on an array.
h) Use statistical functions on an array: Min, Max, Mean, Median and Standard Deviation.
i) Compute the dot (matrix) product of two arrays.
j) Compute the eigenvalues of a matrix.
k) Solve a linear matrix equation such as 3 * x0 + x1 = 9, x0 + 2 * x1 = 8.
l) Compute the multiplicative inverse of a matrix.
m) Compute the rank of a matrix.
n) Compute the determinant of an array.
o) Perform transpose and change of axes operations on arrays.
p) Perform splitting operations on arrays.
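
A minimal sketch of several of the operations above (reshaping, stacking, statistics, and the linear-algebra items), using NumPy only. The array values and variable names are illustrative assumptions; only the linear system in item k is taken from the exercise text.

```python
import numpy as np

# Illustrative 1-D array, reshaped to 2-D and flattened back (item d)
a = np.arange(1, 7)              # values 1..6
m = a.reshape(2, 3)              # shape (2, 3), ndim 2
flat = m.flatten()

# Append data vertically and horizontally (item f)
v = np.vstack([m, m])            # shape (4, 3)
h = np.hstack([m, m])            # shape (2, 6)

# Statistical functions (item h)
stats = {
    "min": a.min(), "max": a.max(), "mean": a.mean(),
    "median": np.median(a), "std": a.std(),
}

# Coefficient matrix and right-hand side for item k:
#   3 * x0 + x1 = 9
#   x0 + 2 * x1 = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)            # solution vector (item k)
eigenvalues = np.linalg.eigvals(A)   # item j
A_inv = np.linalg.inv(A)             # item l
rank = np.linalg.matrix_rank(A)      # item m
det = np.linalg.det(A)               # item n
product = A @ A_inv                  # matrix product (item i); close to identity

print("x =", x)
print("rank =", rank, "det =", round(det, 2))
```

For the system in item k the solver returns x0 = 2 and x1 = 3, which can be checked by substituting back into both equations.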

2. Fisher’s Iris Dataset

Description:

This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of
the variables sepal length and width and petal length and width, respectively, for 50
flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and
virginica.

Format

iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width, and Species.
The header is: sepal length, sepal width, petal length, petal width, iris, Species No.
Species No. has the value 1 for Iris setosa, 2 for Iris virginica and 3 for Iris versicolor.

Questions:

a) Load the data in the file Iris.txt in a 2-D array called iris.
b) Drop the column whose index=4 from the array iris.
c) Display the shape, dimensions and size of iris.
d) Split iris into three 2-D arrays, one array for each species. Call them iris1,
iris2, and iris3.
e) Print the three arrays iris1, iris2, iris3.
f) Create a 1-D array header having the elements “sepal length”, “sepal width”,
“petal length”, “petal width”, “species No” in that order.
g) Display the array header.
h) Find the max, min, mean, and standard deviation for the columns of iris and
store the results in the arrays iris_max, iris_min, iris_avg, iris_std, iris_var
respectively. The results must be rounded to not more than two decimal places.
i) Similarly find the max, min, mean, and standard deviation for the columns of
iris1, iris2, iris3 and store the results in arrays with appropriate names.
j) Check the minimum value for sepal length, sepal width, petal length, petal width
of the three species in comparison to the minimum value of sepal length, sepal
width, petal length, petal width for the data set as a whole, and fill the
table below with True if the species value is greater than the dataset value and
False otherwise.

                 Iris setosa    Iris virginica    Iris versicolor
Sepal length
Sepal width
Petal length
Petal width

k) Compare Iris setosa’s average sepal width to that of Iris virginica.
l) Compare Iris setosa’s average petal length to that of Iris virginica.
m) Compare Iris setosa’s average petal width to that of Iris virginica.
n) Save the array iris_avg in a comma separated file named IrisMeanValues.txt on
the hard disk.
o) Save the arrays iris_max, iris_avg, iris_min in a comma separated file named
IrisStat.txt on the hard disk.
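
The questions above could be approached roughly as follows. This is a sketch under stated assumptions: the six in-memory rows stand in for Iris.txt, the file is assumed to be comma separated with the species name at column index 4, and output goes to a buffer rather than to disk so the example stays self-contained.

```python
import io
import numpy as np

# Hypothetical stand-in for Iris.txt: two flowers per species, comma separated.
# Assumed column order: sepal length, sepal width, petal length, petal width,
# species name (column index 4), species number.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,setosa,1\n"
    "4.9,3.0,1.4,0.2,setosa,1\n"
    "6.3,3.3,6.0,2.5,virginica,2\n"
    "5.8,2.7,5.1,1.9,virginica,2\n"
    "7.0,3.2,4.7,1.4,versicolor,3\n"
    "6.4,3.2,4.5,1.5,versicolor,3\n"
)

# a) Load into a 2-D float array; the text column cannot be parsed and becomes NaN
iris = np.genfromtxt(sample, delimiter=",")

# b) Drop the column at index 4
iris = np.delete(iris, 4, axis=1)

# c) Shape, dimensions, size
print(iris.shape, iris.ndim, iris.size)

# d) One array per species, selected on the species-number column (now index 4)
iris1 = iris[iris[:, 4] == 1]
iris2 = iris[iris[:, 4] == 2]
iris3 = iris[iris[:, 4] == 3]

# f) Header array
header = np.array(["sepal length", "sepal width", "petal length",
                   "petal width", "species No"])

# h) Column statistics, rounded to two decimals
iris_max = np.round(iris.max(axis=0), 2)
iris_min = np.round(iris.min(axis=0), 2)
iris_avg = np.round(iris.mean(axis=0), 2)
iris_std = np.round(iris.std(axis=0), 2)

# j) True where a species minimum exceeds the whole-dataset minimum
#    (rows: the four measurements, columns: iris1, iris2, iris3)
overall_min = iris[:, :4].min(axis=0)
min_table = np.column_stack(
    [part[:, :4].min(axis=0) > overall_min for part in (iris1, iris2, iris3)]
)
print(min_table)

# n) Write the means as one comma-separated line (to a buffer here, not to disk)
out = io.StringIO()
np.savetxt(out, iris_avg.reshape(1, -1), delimiter=",", fmt="%.2f")
```

Passing a file name such as "IrisMeanValues.txt" to np.savetxt instead of the buffer would write the same line to disk.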

3. Pandas Series Programs

Write a Python program to do the following operations: Library: Pandas Series

a) To add, subtract, multiply and divide two pandas Series.
b) To convert all the string values to upper and lower case in a given pandas Series.
Also find the length of the string values.
c) To remove whitespace, left-sided whitespace and right-sided whitespace from the
string values of a given pandas Series.
d) To create a Series from a list, a NumPy array and a dict.
e) To calculate the number of characters in each word in a Series.
f) To compare the elements of two pandas Series.
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 10]
g) To convert a pandas Series to a Python list and display its type.
h) To create a Series from a list, a NumPy array and a dict.
i) To combine many Series to form a DataFrame.
j) To stack two Series vertically and horizontally.
k) To create and display a DataFrame from a specified dictionary which has the
index labels.
l) To identify frequency counts of unique items of a Series.
m) To get the items of Series A not present in Series B.
n) To convert a NumPy array to a DataFrame of a given shape.
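
A short sketch covering a selection of the Series tasks above. The two numeric Series are the sample from item f; every other value and all variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Sample Series from item f
a = pd.Series([2, 4, 6, 8, 10])
b = pd.Series([1, 3, 5, 7, 10])

# a) Element-wise arithmetic
total, diff, prod, quot = a + b, a - b, a * b, a / b

# f) Element-wise comparison
equal, greater = a == b, a > b

# m) Items of a that are not present in b
only_in_a = a[~a.isin(b)]

# d) Series from a list, a NumPy array, and a dict (illustrative values)
from_list = pd.Series([1, 2, 3])
from_array = pd.Series(np.array([1.5, 2.5]))
from_dict = pd.Series({"x": 10, "y": 20})

# b, c, e) String methods on an illustrative Series of words
words = pd.Series(["  NumPy ", "pandas", " lab"])
cleaned = words.str.strip()          # str.lstrip / str.rstrip trim one side only
upper, lengths = cleaned.str.upper(), cleaned.str.len()

# g) Convert to a Python list
as_list = a.tolist()

# i, j) Combine Series: side by side (columns) or end to end (rows)
side_by_side = pd.concat([a, b], axis=1)
stacked = pd.concat([a, b], axis=0, ignore_index=True)

# l) Frequency counts of unique items
counts = pd.Series(list("abca")).value_counts()

# n) NumPy array to a DataFrame of a given shape
frame = pd.DataFrame(np.arange(12).reshape(4, 3))

print(only_in_a.tolist(), upper.tolist(), lengths.tolist(), type(as_list))
```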

4. Pandas DataFrames Programs

Write a Python program to do the following operations: Library: Pandas DataFrames

I. Import and read a CSV file.
II. Generate a basic understanding of the given data.
a. Print the first 5 rows and the last 5 rows of the data.
b. Check the basic information of the data.
c. Extract the shape of the data.
d. Print the unique values of the marital status column.
e. Make the data consistent: “widow” and “widowed” are two different names for the same
category in the column values.
f. Check for duplicates and null values in the whole dataset.
III. Select and filter data based on conditions.
a. Select a subset of the columns (Birthdate, Education, and Income of every customer)
from the data frame.
b. Use the loc and iloc indexers to retrieve the first seven data points.
c. Filter data using loc and the isin() method. (Note: we choose the variable of interest
and we select the categories.)
d. Select the rows that satisfy two conditions, such as customers with an income
higher than 75,000 and with a master’s degree (using Python operators), and display the
output.
IV. Apply various data operation tools, such as creating new variables or changing data types.
We can apply different operations on the dataset using Pandas, such as:
a. setting a new index with the variable of our interest using the .set_index() method
b. sorting the data frame by one of the variables using .sort_values() in ascending or
descending order
c. creating a new variable which could be the result of a mathematical operation, such as
the sum of other variables
d. changing the datatype of variables into datetime or integer types
e. determining the age based on year of birth
f. creating the week date (calendar week and year) from the purchase date
V. Perform data aggregation using group by and pivot table methods.
After creating new variables, we can further aggregate and analyze the data by groups:
a. Apply the groupby() method to find the mean of income, recency, and number of web and
store purchases by educational group.
b. Apply the pivot_table() method to find the aggregated sum of purchases and mean of
recency per education and marital status group.
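
The steps in parts II to V could look roughly like the sketch below. The customer records are built in memory instead of being read from the lab's CSV, and every column name, value, and the reference year used for the age are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical customer records; column names are assumptions, not the lab's CSV
df = pd.DataFrame({
    "Year_Birth": [1970, 1985, 1990, 1962, 1978],
    "Education": ["Master", "PhD", "Master", "Graduation", "PhD"],
    "Marital_Status": ["Married", "Widow", "Single", "Widowed", "Married"],
    "Income": [82000, 56000, 91000, 40000, 76000],
    "Recency": [10, 40, 5, 60, 20],
    "NumWebPurchases": [4, 2, 6, 1, 3],
    "NumStorePurchases": [5, 3, 7, 2, 4],
})

# II. Basic understanding, then merge the two spellings of one category
print(df.head(), df.shape, df["Marital_Status"].unique())
df["Marital_Status"] = df["Marital_Status"].replace({"Widow": "Widowed"})
n_duplicates, n_nulls = df.duplicated().sum(), df.isna().sum().sum()

# III. Select and filter (three rows here, since the sample only has five)
subset = df[["Year_Birth", "Education", "Income"]]
first_three = df.iloc[:3]                         # positional slice, end exclusive
by_label = df.loc[0:2, ["Education", "Income"]]   # label slice, end inclusive
masters_phd = df.loc[df["Education"].isin(["Master", "PhD"])]
rich_masters = df[(df["Income"] > 75000) & (df["Education"] == "Master")]

# IV. New variables and sorting
df["Purchases"] = df["NumWebPurchases"] + df["NumStorePurchases"]
df["Age"] = 2024 - df["Year_Birth"]               # reference year is an assumption
by_income = df.sort_values("Income", ascending=False)

# V. Aggregation by group and with a pivot table
group_means = df.groupby("Education")[["Income", "Recency"]].mean()
pivot = df.pivot_table(
    index=["Education", "Marital_Status"],
    values=["Purchases", "Recency"],
    aggfunc={"Purchases": "sum", "Recency": "mean"},
)
print(pivot)
```

With a real file, pd.read_csv would replace the DataFrame constructor and the rest of the steps would carry over once the column names are adjusted.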

5. Canada Pizza Price Prediction

Columns:

company, price_cad, diameter, topping, variant, size, extra_sauce, extra_cheese,
extra_mushrooms

Questions:

a) Count the number of null values in the pizza dataset and replace null values with
the average of the concerned columns.
b) Calculate the average price of pizza prepared by each company.
c) Find the companies that prepared pizzas with different variants with the same
diameter.
d) Which company has the most pizzas? Show the result with a graph.
e) Check whether the pizza dataset contains null values, count the number of null
values in the dataset, and find the number of missing data points per column.
f) Rename the column price_cad as price.
g) Identify the number of companies in each category.
h) Identify which type of pizza is more expensive.
i) Find the diameter of jumbo size pizza.
j) If any jumbo pizza with a diameter less than 16 exists, remove such rows.
k) Calculate the average price of a pizza prepared by company A.
l) Find the mean of the diameter and the average price of pizzas prepared by company C.
m) Find the companies that prepared pizzas with different variants with the same
diameter.
n) Find the pizza variant with extra_mushrooms and topping with chicken.
o) What is the most expensive pizza in each company?
p) Which company has the most pizzas on the menu? Show the result with a graph.
q) What is the average price of pizza in each company?
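
As an illustration of the cleaning and grouping questions above, the sketch below works on a handful of made-up rows. It uses a subset of the listed column names; the values are hypothetical and not taken from the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using a subset of the listed columns
pizza = pd.DataFrame({
    "company": ["A", "A", "B", "B", "C"],
    "price_cad": [12.0, np.nan, 20.0, 16.0, 18.0],
    "diameter": [12.0, 14.0, np.nan, 16.5, 17.0],
    "variant": ["classic", "double_mix", "classic", "meat_lovers", "classic"],
    "size": ["medium", "large", "jumbo", "jumbo", "jumbo"],
})

# Missing values per column, then fill numeric gaps with the column mean
nulls_per_column = pizza.isna().sum()
pizza = pizza.fillna(pizza.mean(numeric_only=True))

# Rename price_cad to price
pizza = pizza.rename(columns={"price_cad": "price"})

# Remove jumbo pizzas whose diameter is below 16
too_small = (pizza["size"] == "jumbo") & (pizza["diameter"] < 16)
pizza = pizza[~too_small]

# Average price and most expensive pizza per company
avg_price = pizza.groupby("company")["price"].mean()
priciest = pizza.loc[pizza.groupby("company")["price"].idxmax()]

# Number of pizzas per company; counts.plot(kind="bar") would chart it
# if matplotlib is installed
counts = pizza["company"].value_counts()
print(avg_price, counts, sep="\n")
```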

6. Mobile Phone Price Dataset

Columns:

o Brand: the manufacturer of the phone
o Model: the name of the phone model
o Storage (GB): the amount of storage space (in gigabytes) available on the phone
o RAM (GB): the amount of RAM (in gigabytes) available on the phone
o Screen Size (inches): the size of the phone's display screen in inches
o Camera (MP): the megapixel count of the phone's rear camera(s)
o Battery Capacity (mAh): the capacity of the phone's battery in milliampere hours
o Price ($): the retail price of the phone in US dollars

Questions:

a) Identify the models released by each brand and their prices.
b) Identify the correlation between battery capacity and price.
c) Find how many models there are for each battery capacity with the same price.
d) Count the number of models in each brand with the highest storage. Draw the graph.
e) Identify how many models are released by each brand.
f) Find the RAM capacity of all models of every brand.
g) Identify the correlation between battery capacity and price.
h) Find how many models there are for each battery capacity.
i) Calculate the average price of each brand.
j) Find which mobile brand has the highest price.
k) Identify whether there are any missing values in the mobile phone price dataset.
l) Display all models associated with the Apple brand.
m) Find the mobile prices based on Camera (MP).
n) List the models, along with brands, which have the highest storage.
o) How many models in each brand have RAM > 6?
p) List the models having price > 600 and storage between 100 and 200.
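
A sketch of the correlation, grouping, and filtering questions above. The column names follow the list in this section; the rows, including the placeholder model names, are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; model names are placeholders
phones = pd.DataFrame({
    "Brand": ["Apple", "Apple", "Samsung", "Samsung", "Xiaomi"],
    "Model": ["P1", "P2", "S1", "S2", "X1"],
    "Storage (GB)": [128, 256, 128, 256, 64],
    "RAM (GB)": [6, 8, 8, 12, 4],
    "Battery Capacity (mAh)": [3000, 3500, 4500, 5000, 5000],
    "Price ($)": [800, 1000, 700, 900, 300],
})

# b) Pearson correlation between battery capacity and price
corr = phones["Battery Capacity (mAh)"].corr(phones["Price ($)"])

# e, i) Number of models and average price per brand
models_per_brand = phones.groupby("Brand")["Model"].count()
avg_price = phones.groupby("Brand")["Price ($)"].mean()

# l) All models of one brand
apple_models = phones.loc[phones["Brand"] == "Apple", "Model"].tolist()

# n) Models with the highest storage
top_storage = phones[phones["Storage (GB)"] == phones["Storage (GB)"].max()]

# o) Models per brand with RAM above 6 GB
ram_over_6 = phones[phones["RAM (GB)"] > 6].groupby("Brand").size()

# p) Price above 600 and storage between 100 and 200 GB (bounds inclusive)
selected = phones[
    (phones["Price ($)"] > 600) & phones["Storage (GB)"].between(100, 200)
]
print(selected[["Brand", "Model"]])
```

Series.between includes both bounds by default, so whether a phone with exactly 100 or 200 GB should count is a choice to make explicit when answering item p.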

7. National Universities Rankings

Columns:

o Name – institution name
o Location – city and state where located
o Rank – ranking according to U.S. News & World Report
o Description – snippet of text overview from U.S. News
o Tuition and fees – combined tuition and fees for out–of–state students
o In–state – tuition and fees for in–state students
o Undergraduate Enrollment – number of enrolled undergraduate students

Questions:

a) Find the universities, along with state, whose fee is between $25,000 and $30,000.
b) Find the universities with in-state students where undergraduate enrollment is
more than 25,000.
c) Find the states where universities are located in three or more cities.
d) Find the max and min tuition fee in each state.
e) Find the city and state where the maximum tuition fee difference among the
universities in that city is greater than 5,000.
f) Print the names of universities that have a number of branches, along with the
names of the branches.
g) Print each university name and where it is located.
h) Find cities having more than 2 universities, along with state.
i) Find the number of states and cities in which the top 100 universities are located.
j) Draw a plot to show the undergraduate enrollment of each university.
k) Draw a plot to show each university name and its corresponding tuition fee.
l) Plot the number of universities in each state having rank > 100.
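
Several of the questions above depend on separating city and state. The sketch below assumes the Location column follows a "City, ST" pattern, which should be checked against the real file; institution names and figures are placeholders.

```python
import pandas as pd

# Placeholder institutions; Location is assumed to follow "City, ST"
unis = pd.DataFrame({
    "Name": ["U1", "U2", "U3", "U4", "U5"],
    "Location": ["Boston, MA", "Boston, MA", "Boston, MA",
                 "Austin, TX", "Houston, TX"],
    "Rank": [5, 40, 120, 56, 150],
    "Tuition and fees": [50000, 43000, 28000, 35000, 26000],
    "Undergraduate Enrollment": [7000, 18000, 30000, 40000, 26000],
})

# Split Location into separate City and State columns
unis[["City", "State"]] = unis["Location"].str.split(", ", expand=True)

# a) Fee between 25,000 and 30,000 (inclusive)
mid_fee = unis[unis["Tuition and fees"].between(25000, 30000)][["Name", "State"]]

# d) Maximum and minimum tuition fee in each state
fee_range = unis.groupby("State")["Tuition and fees"].agg(["max", "min"])

# e) Cities where the spread between highest and lowest fee exceeds 5,000
spread = unis.groupby(["City", "State"])["Tuition and fees"].agg(
    lambda s: s.max() - s.min()
)
wide_spread = spread[spread > 5000]

# h) Cities with more than 2 universities
per_city = unis.groupby(["City", "State"]).size()
busy_cities = per_city[per_city > 2]

# l) Universities per state with rank above 100; .plot(kind="bar") would chart it
low_ranked = unis[unis["Rank"] > 100].groupby("State").size()
print(fee_range)
```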

8. Adidas Sales Dataset

Columns:

o Retailer ID, Invoice Date,
o Region, State,
o City, Product,
o Price per Unit, Units Sold,
o Total Sales, Operating Profit,
o Operating Margin, Sales Method

Questions:

a) List all the products sold in every region.
b) Find the cities and the retailers who sold women’s products.
c) Find the total sales of each women’s product sold by the in-store method.
d) For each product, find region-wise total sales and units sold.
e) For men’s and women’s products, find state-wise units sold and total sales.
f) Find states where more women’s products were sold than men’s products.
g) Find region-wise units sold for each product.
h) Find region-wise profit for every retailer.
i) Find the states, along with units sold, where products were sold in more than one
city in the state.
j) Draw a plot to show monthly sales in 2020 in every region.
k) Draw a plot to show year-wise sales in every region.
l) Draw plots to show region-wise sales in every year.
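
The grouping and date-based questions above might be sketched as follows. The rows are invented, the product names are assumed to start with "Women's" or "Men's", and the plots themselves are left out so the example runs without matplotlib.

```python
import pandas as pd

# Hypothetical rows using a subset of the listed columns
sales = pd.DataFrame({
    "Region": ["West", "West", "South", "South", "West"],
    "State": ["California", "California", "Texas", "Texas", "Nevada"],
    "Product": ["Women's Apparel", "Men's Apparel", "Women's Apparel",
                "Men's Apparel", "Women's Apparel"],
    "Invoice Date": ["2020-01-15", "2020-01-20", "2020-02-03",
                     "2021-02-10", "2021-03-05"],
    "Units Sold": [100, 80, 50, 70, 30],
    "Total Sales": [5000, 4000, 2500, 3500, 1500],
    "Sales Method": ["In-store", "Online", "In-store", "Outlet", "In-store"],
})
sales["Invoice Date"] = pd.to_datetime(sales["Invoice Date"])

# c) Total in-store sales of each women's product
women = sales["Product"].str.startswith("Women's")
women_in_store = (
    sales[women & (sales["Sales Method"] == "In-store")]
    .groupby("Product")["Total Sales"].sum()
)

# d) Region-wise total sales and units sold for each product
by_product_region = sales.groupby(["Product", "Region"])[
    ["Total Sales", "Units Sold"]
].sum()

# j) Monthly sales in 2020 per region (rows: month, columns: region)
in_2020 = sales[sales["Invoice Date"].dt.year == 2020]
monthly_2020 = (
    in_2020.groupby([in_2020["Invoice Date"].dt.to_period("M"), "Region"])
    ["Total Sales"].sum()
    .unstack(fill_value=0)
)

# k) Year-wise sales per region; yearly.plot(kind="bar") would chart it
yearly = (
    sales.groupby([sales["Invoice Date"].dt.year, "Region"])["Total Sales"]
    .sum()
    .unstack(fill_value=0)
)
print(monthly_2020, yearly, sep="\n")
```

Unstacking the region level gives one column per region, which is the shape the pandas plot methods expect when drawing one line or bar group per region.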

9. Movies Dataset

Columns:

o Title, US Gross,
o Worldwide Gross, US DVD Sales,
o Production Budget, Release Date,
o MPAA Rating, Running Time (min),
o Distributor, Source,
o Major Genre, Creative Type,
o Director, Rotten Tomatoes Rating,
o IMDB Rating, IMDB Votes

Questions:

a) Find the number of movies released under each genre in each year.
b) Find the movies with a loss every year for each distributor.
c) Find the directors who directed for each creative type with IMDB rating above 6.
d) Draw a plot to compare the number of movies released till now by each director.
e) Find the genres of the movies released in each year in ascending order.
f) Find the budgets of the movies released by each distributor, along with movie
names.
g) Find the movies with the same IMDB rating but with a different number of IMDB votes.
h) Write a Pandas program to get those movies whose revenue is more than 2 million
and that spent less than 1 million.
i) Find the number of movies in each genre under each source.
j) Find the number of movies released in each decade.
k) Draw a plot showing the number of movies released in each genre.
l) Show the number of movies not rated under each genre in each fiction.
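
A sketch of the counting and filtering questions above on invented rows. Two interpretations are assumptions: a loss is taken to mean a production budget above the worldwide gross, and revenue in item h is taken to mean worldwide gross.

```python
import pandas as pd

# Hypothetical rows; titles and distributors are placeholders
movies = pd.DataFrame({
    "Title": ["M1", "M2", "M3", "M4", "M5"],
    "Release Date": ["1995-06-01", "1999-11-12", "2003-03-07",
                     "2003-08-22", "2010-01-15"],
    "Major Genre": ["Drama", "Comedy", "Drama", "Drama", "Action"],
    "Distributor": ["D1", "D1", "D2", "D2", "D1"],
    "Production Budget": [500000, 3000000, 800000, 10000000, 900000],
    "Worldwide Gross": [2500000, 1000000, 4000000, 12000000, 600000],
})
movies["Release Date"] = pd.to_datetime(movies["Release Date"])
movies["Year"] = movies["Release Date"].dt.year

# a) Number of movies per genre in each year
per_genre_year = movies.groupby(["Year", "Major Genre"]).size()

# b) Movies that lost money (assumed: budget above worldwide gross)
loss = movies[movies["Production Budget"] > movies["Worldwide Gross"]]
loss_listing = loss.sort_values(["Year", "Distributor"])[
    ["Year", "Distributor", "Title"]
]

# h) Revenue above 2 million with spending below 1 million
efficient = movies[
    (movies["Worldwide Gross"] > 2_000_000)
    & (movies["Production Budget"] < 1_000_000)
]

# j) Number of movies released in each decade
movies["Decade"] = (movies["Year"] // 10) * 10
per_decade = movies["Decade"].value_counts().sort_index()

# k) Movies per genre; per_genre.plot(kind="bar") would chart it
per_genre = movies["Major Genre"].value_counts()
print(per_decade)
```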

10. Avocado Prices

Historical data on avocado prices and sales volume in multiple US markets

Some relevant columns in the dataset:

o Date - The date of the observation
o AveragePrice - the average price of a single avocado
o type - conventional or organic
o year - the year
o Region - the city or region of the observation
o Total Volume - Total number of avocados sold
o 4046 - Total number of avocados with PLU 4046 sold
o 4225 - Total number of avocados with PLU 4225 sold
o 4770 - Total number of avocados with PLU 4770 sold

About this file

Numerical column names refer to price lookup (PLU) codes:

o 4046 - small Hass
o 4225 - large Hass
o 4770 - extra large Hass

Questions:

a) Identify the unique values in the region column.
b) What is the maximum price for an avocado in the dataset?
c) Identify the type distribution and, for a single avocado in the dataset, find
the median, mean, and standard deviation of the price.
d) Find the highest and lowest prices for conventional avocados, along with the year
and location.
e) Draw plots of the distribution of average price for the different types of avocados.
f) Find the correlation matrix to measure the strength of the correlation between
variables.
g) Find out whether the volume of avocado sales has increased in the last 5 years.
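
The questions above can be sketched on a few invented observations as shown below. Column names follow the list in this section; the values are hypothetical, and the numeric_only argument of DataFrame.corr assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical observations using a subset of the listed columns
avocado = pd.DataFrame({
    "Date": ["2015-01-04", "2016-01-03", "2017-01-01",
             "2015-01-04", "2016-01-03", "2017-01-01"],
    "AveragePrice": [1.00, 1.10, 1.30, 1.60, 1.50, 1.90],
    "type": ["conventional"] * 3 + ["organic"] * 3,
    "year": [2015, 2016, 2017, 2015, 2016, 2017],
    "Region": ["Albany", "Boston", "Albany", "Albany", "Boston", "Boston"],
    "Total Volume": [1000.0, 1200.0, 1500.0, 100.0, 150.0, 210.0],
})

# a) Unique regions; b) maximum price
regions = avocado["Region"].unique()
max_price = avocado["AveragePrice"].max()

# c) Type distribution and price statistics per type
type_counts = avocado["type"].value_counts()
price_stats = avocado.groupby("type")["AveragePrice"].agg(
    ["median", "mean", "std"]
)

# d) Rows holding the highest and lowest conventional price
conv = avocado[avocado["type"] == "conventional"]
cols = ["year", "Region", "AveragePrice"]
highest = conv.loc[conv["AveragePrice"].idxmax(), cols]
lowest = conv.loc[conv["AveragePrice"].idxmin(), cols]

# f) Correlation matrix over the numeric columns
corr = avocado.corr(numeric_only=True)

# g) Total volume per year; a steadily increasing series suggests growing sales
volume_by_year = avocado.groupby("year")["Total Volume"].sum()
increased = volume_by_year.is_monotonic_increasing
print(price_stats, volume_by_year, sep="\n")
```

For item e, one option is avocado.groupby("type")["AveragePrice"].plot(kind="hist", alpha=0.5), which overlays one histogram per type when matplotlib is available.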
