

Data Visualization 1

Uploaded by ramanujbehera91
Copyright © All Rights Reserved

Unit – 1[ii]: Data Visualization [Marks: ~10]
Data visualization is the representation of data through the use of
common graphics, such as charts, plots, infographics and even
animations.

Python provides various libraries for visualizing data. Each library
comes with its own features and supports different types of graphs.

Data visualization provides a well-organized pictorial representation
of data, which makes the data easier to understand, observe and
analyze.
Data Visualization
Charts
Data can be visualized using charts such as:
1. Line chart: plot()
2. Bar chart: bar() for a vertical chart, barh() for a horizontal chart
3. Histogram: hist()
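The three chart functions listed above can be compared side by side. The sketch below is a minimal illustration, assuming made-up marks data and Matplotlib's non-interactive Agg backend (so it also runs without a display). It uses plt.subplots() to place all four charts on one figure and saves them to a hypothetical file chart_types.png instead of calling show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sample data, only for illustration
subjects = ["Maths", "Science", "English"]
marks = [78, 85, 69]
scores = [45, 52, 58, 61, 67, 70, 72, 75, 81, 88, 93]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(subjects, marks)                  # 1. line chart
axes[0, 0].set_title("plot()")

axes[0, 1].bar(subjects, marks)                   # 2a. vertical bar chart
axes[0, 1].set_title("bar()")

axes[1, 0].barh(subjects, marks)                  # 2b. horizontal bar chart
axes[1, 0].set_title("barh()")

axes[1, 1].hist(scores, bins=[40, 60, 80, 100])   # 3. histogram with 3 bins
axes[1, 1].set_title("hist()")

fig.tight_layout()
fig.savefig("chart_types.png")  # with an interactive backend, plt.show() would open a window
```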
Purpose of Data visualization
We use data visualization for the following purposes:
• Better analysis
• Identifying patterns
• Finding errors
• Understanding the story
• Exploring business insights
• Grasping the Latest Trends
Plotting library: Matplotlib
Matplotlib is a Python package (library) used to create 2D graphs and plots from
Python scripts. pyplot is a module in Matplotlib that supports a wide variety of
graphs and plots, such as histograms, bar charts, power spectra and error charts. It is
used along with NumPy to provide a MATLAB-like plotting environment.
Features: Matplotlib provides the following features for data visualization:
• Drawing: plots are drawn from the passed data using specific
functions.
• Customization: plots can be customized as required by specifying
arguments to the functions, such as color, line style (e.g. dotted)
and line width; labels, titles and legends can also be added.
• Saving: after drawing and customization, plots can be saved for
future use.
How to plot in matplotlib
Steps to plot in matplotlib
• Install matplotlib with the pip command at the command prompt:
>> pip install matplotlib
• Create a .py file and import the matplotlib library in it using the
following statement: >> import matplotlib.pyplot as plt
• Pass the data points to the plot() method of the plt object
• Customize the plot by changing different parameters
• Save the plot/graph if required using the savefig() method (call it
before show(), since the figure may be cleared once the plot window is closed)
• Call the show() method to display the plot
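The steps above can be sketched as one short script. The data and the file name temperature.png are hypothetical, and the Agg backend is an assumption so that the script runs without a display; in an interactive session you would drop the matplotlib.use() line and call plt.show() at the end.

```python
# Step 1: install once from the command prompt:  pip install matplotlib
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend for running without a display
import matplotlib.pyplot as plt

# Step 2: set the data points (hypothetical values)
days = [1, 2, 3, 4, 5]
temperature = [31, 33, 32, 35, 34]

# Step 3: pass the data points to plot()
plt.plot(days, temperature)

# Step 4: customize the plot
plt.title("Temperature over five days")
plt.xlabel("Day")
plt.ylabel("Temperature (°C)")

# Step 5: save before showing, so the saved image is not empty
plt.savefig("temperature.png")

# Step 6: in an interactive session, call plt.show() here to display the plot
```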
Matplotlib: line plot using plot() function
A line plot/chart is a graph that shows how data changes along a
number line, for example over time. It is drawn as a series of data points
connected by straight line segments. A line plot or line graph can be created using
the plot() function of the pyplot module. Besides plotting the line itself, we can
explicitly define the grid, the x- and y-axis scales and labels, the title,
display options, etc.
Example
import matplotlib.pyplot as plt
year = [2014, 2015, 2016, 2017, 2018]
jnv = [90, 92, 94, 95, 97]
kv = [89, 91, 93, 95, 98]
plt.plot(year, jnv, color='g', label='JNV')
plt.plot(year, kv, color='orange', label='KV')
plt.xlabel('Year')
plt.ylabel('Pass percentage')
plt.title('JNV KV PASS% till 2018')
plt.legend()
plt.show()
Customization of line (plot() function)
Plots can be customized through colors, line styles, marker styles and other configuration options:
Colors
b – blue
c – cyan
g – green
k – black
m – magenta
r – red
w – white
y – yellow
Line Styles
'-' – solid line
'--' – dashed line
'-.' – dash-dot line
':' – dotted line
Marker Styles
. – Point marker
, – Pixel marker
o – Circle marker
v – Triangle down marker
^ – Triangle up marker
< – Triangle left marker
> – Triangle right marker
1 – Tripod down marker
2 – Tripod up marker
3 – Tripod left marker
4 – Tripod right marker
s – Square marker
p – Pentagon marker
* – Star marker
Other configurations (keyword arguments of plot())
color or c
linestyle or ls
linewidth or lw
marker
markeredgewidth or mew
markeredgecolor or mec
markerfacecolor or mfc
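The options above can be given either as a short format string or as keyword arguments. A minimal sketch with made-up data, assuming the Agg backend and a hypothetical output file custom_lines.png:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 3, 5, 4]
y2 = [1, 3, 2, 4, 6]

# Short form: format string "[marker][line][color]" -> circle markers, dashed line, green
plt.plot(x, y1, "o--g", label="format string")

# Long form: settings spelled out as keyword arguments
plt.plot(x, y2, color="m", linestyle=":", linewidth=2,
         marker="s", markersize=8,
         markeredgewidth=2, markeredgecolor="k", markerfacecolor="y",
         label="keyword arguments")

plt.legend()
plt.savefig("custom_lines.png")
```

The format string is compact but only covers color, marker and line style; width, marker size and marker edge/face settings need keyword arguments.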
Opacity/transparency in a line: alpha attribute
• Matplotlib lets you regulate the transparency of a plotted line
using the alpha attribute.
• By default, alpha=1 (fully opaque).
• To make the plot more transparent, set alpha to a value less than 1,
such as 0.5 or 0.25.
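A minimal sketch of alpha, assuming made-up data, the Agg backend and a hypothetical file alpha_demo.png. The three thick red lines differ only in their alpha value; note that Matplotlib reports alpha as None when it was never set, which draws the line fully opaque.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
plt.plot(x, [1, 4, 9, 16], color="r", linewidth=6, label="alpha not set (opaque)")
plt.plot(x, [2, 5, 8, 13], color="r", linewidth=6, alpha=0.5, label="alpha=0.5")
plt.plot(x, [3, 6, 7, 10], color="r", linewidth=6, alpha=0.25, label="alpha=0.25")
plt.legend()
plt.savefig("alpha_demo.png")
```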
Q1. Write Python code to draw a line chart showing the relation between
time and distance.
# Step 1: Import the library
import matplotlib.pyplot as plt
# Step 2: Create the lists/arrays
d = [0, 5, 2, 7, 3, 4, 5, 2]
t = [0, 1, 2, 3, 4, 5, 6, 7]
# Step 3: Plot the graph
plt.plot(t, d, linestyle='-', marker='o')
# Step 4: Provide the details
plt.title("Distance vs Time")
plt.xlabel("Time")
plt.ylabel("Distance")
# Step 5: Save the plot
plt.savefig("speed.png")
# Step 6: Display the plot
plt.show()
Output
Q2. Write Python code to display a line chart with markers.
import matplotlib.pyplot as plt
year = ['2017 - 18','2018 - 19','2019 - 20','2020 - 21']
kvp = [83.4,89.7,88.7,91.2]
jnv = [87.3,88.3,82.5,90.2]
hcs = [90.2,89.0,83.7,93.5]
plt.plot(year, kvp, marker = 'o', label = 'KVP')
plt.plot(year, jnv, marker = '*', label = 'JNV')
plt.plot(year, hcs, marker = '^', label = 'HCS')
plt.title("Result Analysis")
plt.xlabel("Year")
plt.ylabel("Percentage")
plt.legend()
plt.savefig('Result.png')
plt.show()
Q3. Write a Python program to plot two or more lines with legends, different
widths and colors.
import matplotlib.pyplot as plt
x1 = [10,20,30]
y1 = [20,40,10]
x2 = [10,20,30]
y2 = [40,10,30]
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('Two or more lines with different widths and colors with suitable legends')
plt.plot(x1,y1, color='blue', linewidth = 3, label = 'line1-width-3')
plt.plot(x2,y2, color='red', linewidth = 5, label = 'line2-width-5')
# show a legend on the plot
plt.legend(loc='upper left',frameon=False)
plt.show()
Q4. What is the output of the following Python program?
import matplotlib.pyplot as plt
# x axis values
x = [1, 4, 5, 6, 7]
# y axis values
y = [2, 6, 3, 6, 3]
# plotting the points
plt.plot(x, y, color='red', linestyle='dashdot', linewidth=3, marker='o',
         markerfacecolor='blue', markersize=12)
# set the y-limits of the current axes
plt.ylim(1, 8)
# set the x-limits of the current axes
plt.xlim(1, 8)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('Display marker')
# function to show the plot
plt.show()
Output
Q5. Write the output of the following Python
code.
import matplotlib.pyplot as plt
seasons = ['Summer', 'Monsoon', 'Autumn', 'Winter', 'Spring']
ice_cream = [100, 80, 70, 45, 85]
plt.bar(seasons, ice_cream, linewidth=2, edgecolor='m')
plt.title('Ice-Cream Per Season')
plt.xlabel('Seasons')
plt.ylabel('Litres of Ice-cream')
plt.show()
Bar Graph
A bar graph is drawn using rectangular bars to show how large each
value is. The bars can be horizontal or vertical.

A bar graph makes it easy to compare data between different groups
at a glance. It represents categories on one axis and a discrete
value on the other; its goal is to show the relationship between the
two axes. A bar graph can also show big changes in data over time.
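A minimal sketch of the two orientations, assuming hypothetical fruit-sales data, the Agg backend and a hypothetical output file bar_vs_barh.png. For bar() the value sets the bar height and width sets its thickness; for barh() the value sets the bar length and height sets its thickness.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt

# Hypothetical data for illustration
fruits = ["Apple", "Banana", "Mango", "Orange"]
sales = [40, 25, 55, 30]

fig, (left, right) = plt.subplots(1, 2, figsize=(9, 4))

# Vertical bars: categories on the x-axis, values as bar heights
left.bar(fruits, sales, color="c", width=0.5, edgecolor="k")
left.set_title("bar()")

# Horizontal bars: categories on the y-axis, values as bar lengths
right.barh(fruits, sales, color="orange", height=0.5)
right.set_title("barh()")

fig.savefig("bar_vs_barh.png")
```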
Bar graph customization
Output
Q6. Write the output of the following Python
code.
import matplotlib.pyplot as plt
import numpy as np
label = ['Anil', 'Vikas', 'Dharma', 'Mahen', 'Manish', 'Rajesh']
per = [94,85,45,25,50,54]
index = np.arange(len(label))
print(index)
plt.bar(index, per)
plt.plot(index,per)
plt.xlabel('Student Name', fontsize=10)
plt.ylabel('Percentage', fontsize=10)
plt.xticks(index, label, fontsize=8 ,rotation=90)
plt.title('Percentage of Marks achieved by students of Class XII')
plt.show()
Output
Home Work
Q1. Plot a line graph to display the growth in population over the
past 7 decades. Use the following table data for this purpose:

Q2. Plot a bar graph to show the number of boys in each class from
6 to 12. The data may be imagined by the student.

Q3. Plot a bar graph for marks scored in different subjects. The data may be
imagined.
Histogram
A histogram is a graphical representation which organizes a group of data
points into user-specified ranges.
In other words “A histogram is a graph showing frequency distributions. It
is a graph showing the number of observations within each given interval.”

A histogram provides a visual interpretation of numerical data by showing
the number of data points that fall within each specified range of values
("bins"). It is similar to a vertical bar graph but without gaps between the
bars.

The matplotlib.pyplot.hist() function is used to compute and create a
histogram of x.
Syntax: plt.hist(x, bins, [weights], [edgecolor], [histtype])
Parameters :
x : array or sequence of arrays
bins : optional parameter; an integer, a sequence or a string
density : optional parameter; a boolean value
range : optional parameter; the lower and upper range of the bins
histtype : optional parameter; the type of histogram to draw ['bar', 'barstacked', 'step', 'stepfilled'],
default is 'bar'
weights : optional parameter; an array of weights with the same shape as x
orientation : the orientation of the histogram bars, 'horizontal' or 'vertical'.
The default is 'vertical'. (Optional)
rwidth : a float giving the width of the bars as a fraction of the bin width. (Optional)
log : a boolean value that, if True, sets the histogram axis to a log scale. (Optional)
color : the color(s) of the bars. (Optional)
label : a string that labels the dataset. (Optional)
stacked : a boolean value that, if True, stacks multiple datasets on top of each other. (Optional)
Example:
plt.hist([5,15,25,35,45, 55], bins=[0,10,20,30,40,50, 60], weights=[20,10,45,33,6,8], edgecolor="red")
Example:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11,20, 51, 5, 79, 31,27])
plt.hist(a, bins = [0, 20, 40, 60, 80,100], edgecolor='y')
# Show plot
plt.show()
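To see the numbers behind the bars, NumPy's np.histogram() applies the same binning as plt.hist() and returns the counts and bin edges instead of drawing. A minimal sketch using the array from the example above: each bin includes its left edge and excludes its right edge, except the last bin, which includes both, so the value 20 is counted in the 20-40 bin.

```python
import numpy as np

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
bins = [0, 20, 40, 60, 80, 100]

# Same counting that plt.hist() performs, returned as numbers
counts, edges = np.histogram(a, bins=bins)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low:>3}-{high:<3}: {n}")
# counts -> [3 4 5 2 1]
```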
For better understanding, we develop the same program with a minor
change:
import matplotlib.pyplot as plt
plt.hist([5, 15, 25, 35, 15, 55], bins=[0, 10, 20, 30, 40, 50, 60], weights=[20, 10, 45, 33, 6, 8],
         edgecolor="red")
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.savefig('Mydata.jpg')
plt.show()
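With weights, each value adds its weight (instead of 1) to the bar of the bin it falls in. A minimal sketch that computes the bar heights of the modified example with np.histogram(), which accepts the same weights argument: the two values of 15 both land in the 10-20 bin, so that bar is 10 + 6 = 16, and no value falls in the 40-50 bin.

```python
import numpy as np

values = [5, 15, 25, 35, 15, 55]
weights = [20, 10, 45, 33, 6, 8]
bins = [0, 10, 20, 30, 40, 50, 60]

# Each value contributes its weight to the count of its bin
counts, _ = np.histogram(values, bins=bins, weights=weights)
print(counts)  # heights of the six bars drawn by plt.hist()
```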
How to save plot
To keep a plot for future use, we save it with the savefig() method.
Plots can be saved in formats such as PDF, SVG, PNG and JPG; the file extension selects the format:
plt.savefig('line_plot.pdf')
plt.savefig('line_plot.svg')
plt.savefig('line_plot.png')
Parameters for saving plots, e.g.
plt.savefig('line_plot.jpg', dpi=600, pil_kwargs={'quality': 80, 'optimize': True, 'progressive': True})
(recent Matplotlib versions take these JPEG options through pil_kwargs rather than as direct keyword arguments)
Which Export Format to Use?
Exporting as vector-based SVG or PDF files is generally preferred over
bitmap-based PNG or JPG files, as they are richer formats that usually
give higher-quality plots.
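A minimal sketch that saves one plot in the three formats listed above and reports the file sizes, assuming the Agg backend and hypothetical file names. The dpi value only affects bitmap output such as PNG; PDF and SVG store the drawing as vectors.

```python
import os
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [10, 20, 15, 30], marker="o")
plt.title("Saved in several formats")

names = ["line_plot.pdf", "line_plot.svg", "line_plot.png"]
for name in names:
    plt.savefig(name, dpi=150)  # the extension decides the format

sizes = {name: os.path.getsize(name) for name in names}
print(sizes)
```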
Multiple Chart
import matplotlib.pyplot as plt
month = ['Jan','Feb','Mar','Apr','May','Jun']
Year2021 = [10,12,15,25,30,32]
Year2020 = [18,10,20,25,35,40]
plt.bar(month, Year2020, width=-0.4, align='edge', label='2020')
# Negative width with align='edge': the bars extend to the left of each tick
plt.bar(month, Year2021, width=0.4, align='edge', label='2021')
# Positive width with align='edge': the bars extend to the right of each tick
plt.title("Electricity Comparison 2020 vs 2021")
plt.xlabel("Month")
plt.ylabel("Units")
plt.legend()
plt.show()
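An alternative way to place the bars side by side, not the one used above, is to compute numeric bar positions with NumPy and shift each series by half a bar width. A minimal sketch with the same data, assuming the Agg backend and a hypothetical file electricity.png:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt

month = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
Year2021 = [10, 12, 15, 25, 30, 32]
Year2020 = [18, 10, 20, 25, 35, 40]

x = np.arange(len(month))  # one numeric position per month: 0, 1, ..., 5
w = 0.4                    # width of each bar

plt.bar(x - w / 2, Year2020, width=w, label='2020')  # centred left of the tick
plt.bar(x + w / 2, Year2021, width=w, label='2021')  # centred right of the tick
plt.xticks(x, month)       # put the month names back under the groups
plt.title("Electricity Comparison 2020 vs 2021")
plt.xlabel("Month")
plt.ylabel("Units")
plt.legend()
plt.savefig("electricity.png")
```

Using numeric positions makes it easy to extend the chart to three or more series by changing the offsets.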
[Figure: No. of Customers (in lakh) by Year for BSNL, AIRTEL and JIO]
[Figure: Admitted Students, No. of Students Enrolled and Leave Student TCs by Year]
Plot a DataFrame
• We can plot a DataFrame using its plot() method, but first we need a DataFrame to plot.
• We can create a DataFrame by passing a dictionary to the DataFrame() constructor
of the pandas library.

import pandas as pd
import matplotlib.pyplot as plt
data_dict = {'name':['p1','p2','p3','p4','p5','p6'],'age':
[20,20,21,20,21,20],'math_marks':[100,90,91,98,92,95],'physics_marks':
[90,100,91,92,98,95],'chem_marks':[93,89,99,92,94,92] }
df = pd.DataFrame(data_dict)
df.plot(kind = 'bar', x = 'name', y = 'physics_marks', color = 'green')
plt.title('BarPlot')
plt.show()
We have to pass some parameters to the plot() method to get a bar plot:

df.plot(kind='bar', x='some_column', y='some_column', color='some_color')

Example:
df.plot(kind='bar', x='name', y='physics_marks', color='green')
plt.title('BarPlot')
plt.show()
pandas.DataFrame.plot
DataFrame.plot(*args, **kwargs)
Make plots of Series or DataFrame.
Parameters
data : Series or DataFrame
The object for which the method is called.
x : label or position, default None
Only used if data is a DataFrame.
y : label, position or list of label, positions, default None
Allows plotting of one column versus another. Only used if data is a DataFrame.
kind : str
The kind of plot to produce:
'line' : line plot (default)
'bar' : vertical bar plot
'barh' : horizontal bar plot
'hist' : histogram
'box' : boxplot
'kde' : Kernel Density Estimation plot
'density' : same as 'kde'
'area' : area plot
'pie' : pie plot
'scatter' : scatter plot (DataFrame only)
'hexbin' : hexbin plot (DataFrame only)
xticks : sequence
Values to use for the xticks.
yticks : sequence
Values to use for the yticks.
xlim : 2-tuple/list
Set the x limits of the current axes.
ylim : 2-tuple/list
Set the y limits of the current axes.
xlabel : label, optional
Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the x-column name for planar plots.
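A minimal sketch of several of these parameters together, reusing part of the marks data from the earlier example. It assumes pandas 1.1 or later (for xlabel), the Agg backend and hypothetical output file names. Passing a list to y draws one line per column, and switching kind is enough to redraw the same data as horizontal bars.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as plt

data_dict = {'name': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6'],
             'math_marks': [100, 90, 91, 98, 92, 95],
             'physics_marks': [90, 100, 91, 92, 98, 95]}
df = pd.DataFrame(data_dict)

# x: column used for the x-axis; y: list of columns -> one line per column
ax = df.plot(kind='line', x='name', y=['math_marks', 'physics_marks'],
             ylim=(80, 105), xlabel='Student', title='Marks')
plt.savefig("marks_line.png")

# Same data as horizontal bars: only 'kind' (and the chosen column) changes
ax2 = df.plot(kind='barh', x='name', y='math_marks', color='green')
plt.savefig("marks_barh.png")
```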
Practical Questions
Q1. Given the school result data, analyse the performance of the
students on different parameters, e.g. subject-wise or class-wise.

Q2. For the DataFrames created above, analyse and plot appropriate
charts with a title and legend.

Q3. Take data of your interest from an open source (e.g. data.gov.in),
aggregate and summarize it. Then plot it using different plotting
functions of the Matplotlib library.
Sample Questions

Q1. What will be the output of the given code?
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5],
              index=['akram', 'brijesh', 'charu', 'deepika', 'era'])
print(s['charu'])

Q2. Write the output of the given program:
import pandas as pd
S1 = pd.Series([5, 6, 7, 8, 10], index=['v', 'w', 'x', 'y', 'z'])
l = [2, 6, 1, 4, 6]
S2 = pd.Series(l, index=['z', 'y', 'a', 'w', 'v'])
print(S1 - S2)
Answer:
Q1. 3

Q2.
a NaN
v -1.0
w 2.0
x NaN
y 2.0
z 8.0
dtype: float64
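The NaN values in the Q2 answer come from index alignment: pandas subtracts values with the same label, not the same position, and a label present in only one Series gives NaN (which also turns the result into float64). A minimal sketch that reproduces the answer and shows Series.sub() with fill_value, which treats a missing label as 0 instead.

```python
import pandas as pd

S1 = pd.Series([5, 6, 7, 8, 10], index=['v', 'w', 'x', 'y', 'z'])
S2 = pd.Series([2, 6, 1, 4, 6], index=['z', 'y', 'a', 'w', 'v'])

# Matched by label: 'a' exists only in S2 and 'x' only in S1 -> NaN
diff = S1 - S2
print(diff)

# fill_value=0 replaces the missing side with 0 before subtracting
filled = S1.sub(S2, fill_value=0)
print(filled)
```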
Q3. Observe the following figure. Identify the code for obtaining
this as output.
a. import matplotlib.pyplot as plt
plt.plot([1,2],[4,5])
plt.show()
b. import matplotlib.pyplot as plt
plt.plot([1,2,3],[4,5,1])
plt.show()
c. import matplotlib.pyplot as plt
plt.plot([2,3],[5,1])
plt.show()
d. import matplotlib.pyplot as plt
plt.plot([1,3],[4,1])
plt.show()
Ans: b
Q4. Ritika is new to Python pandas, though she knows some concepts of
Python. She has created some lists but is unable to create a DataFrame
from them. Help her by identifying the statement that will create the
DataFrame.
import pandas as pd
Name=['Manpreet','Kavil','Manu','Ria']
Phy=[70,60,76,89]
Chem=[30,70,50,65]
----------------------------
----------------------------
df=pd.DataFrame({'Phy':Phy,'Chem':Chem}, index=Name)
print(df)
Q5. Fill in the blank:
The command used to give the title "No. of Students" to the x-axis of the graph
is ____
a) plt.xlabel.name("No. of Students ")
b) plt.xlabel("No. of Students ")
c) plt.xaxis("No. of Students ")
d) plt.plot( No. of Students,x=True)
Ans: b
Q6. Write a program to draw a line chart for the following data, with suitable labels on the X-axis and
Y-axis and a title. Show the unemployment rate from 1930 to 2020.

import matplotlib.pyplot as plt
Year = [1930,1940,1950,1960,1970,1980,1990,2000,2010,2020]
Unemployment_Rate = [9.8, 12, 8, 7.2, 6.9, 7, 6.5, 6.2, 5.5, 9.3]

plt.plot(Year, Unemployment_Rate)
plt.xlabel('Year')
plt.ylabel('Unemployment Rate')
plt.title('Unemployment Rate from 1930 to 2020')
plt.show()
Q6. Consider the following graph. Write the code to plot it.
Q7. Consider the following DataFrame df and write code for questions
(i) to (v).
Id  name           m1  m2  m3  m4
1   Juvaraj Singh  54  14  30  34
2   Virat Kohli    48  37  21  25
3   Rohit Sharma   40  42  38  34
4   Sikhar Dhavan  42  13  34  30
5   Dinesh Kartik  15  10  28  64
(i) To display the minimum value of m1.
(ii) To display the first and second row values.
(iii) To remove the m4 column from the DataFrame.
(iv) To add a new column total, which is the sum of m1, m2, m3 and m4.
(v) Write code to transpose the DataFrame.
Answers:
i. df.m1.min()
ii. df.head(2)
iii. df.drop(['m4'], axis=1)
iv. df['total'] = df[['m1', 'm2', 'm3', 'm4']].sum(axis=1)
v. df.T (or df.transpose())
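A runnable sketch of answers (i) to (v) on the table above. The column names follow the table; keeping the result of drop() in a separate variable is a choice made here so that the m4 column is still available for the total in (iv).

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'name': ['Juvaraj Singh', 'Virat Kohli', 'Rohit Sharma', 'Sikhar Dhavan', 'Dinesh Kartik'],
    'm1': [54, 48, 40, 42, 15],
    'm2': [14, 37, 42, 13, 10],
    'm3': [30, 21, 38, 34, 28],
    'm4': [34, 25, 34, 30, 64],
})

print(df.m1.min())                     # (i) minimum value of m1
print(df.head(2))                      # (ii) first and second rows
without_m4 = df.drop(['m4'], axis=1)   # (iii) returns a copy without m4; df is unchanged
df['total'] = df[['m1', 'm2', 'm3', 'm4']].sum(axis=1)  # (iv) row-wise sum
print(df.T)                            # (v) rows become columns
```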
Q8. Write the python code to draw following bar graph
representing the number of students in each class.
Q9. Write a program in Python Pandas to create
the following DataFrame df1 from a list of
dictionaries:
   x     y     z
0  10  20.0   NaN
1  50  40.0  50.0
2  70   NaN  90.0
Perform the following operations on the DataFrame:
(i) Fill all NaN with 0 (zero)
(ii) Fill all NaN with 50 for column y
(iii) Fill all NaN with the value of the adjacent cell
Answers:
i. df1.fillna(0) or df1.replace(np.nan, 0)
ii. df1['y'] = df1['y'].fillna(50)
iii. df1.ffill() (copies the value from the cell above; df1.bfill() copies from the cell below)
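A minimal sketch of Q9: each dictionary becomes one row, and a key missing from a dictionary becomes NaN, which is why y and z are float columns. Each fill is applied to the original df1 (the variable names filled_zero, filled_y and forward are illustrative); ffill() leaves z in row 0 as NaN because there is no cell above it.

```python
import pandas as pd

# One dictionary per row; missing keys become NaN
rows = [{'x': 10, 'y': 20},
        {'x': 50, 'y': 40, 'z': 50},
        {'x': 70, 'z': 90}]
df1 = pd.DataFrame(rows)
print(df1)

filled_zero = df1.fillna(0)              # (i) every NaN -> 0

filled_y = df1.copy()                    # (ii) only column y -> 50
filled_y['y'] = filled_y['y'].fillna(50)

forward = df1.ffill()                    # (iii) copy the value from the row above
print(forward)
```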
Q9. Consider the following graph. Write the code to plot it.
Q10. Create a DataFrame with the following values:

   Brand       Price  Year
0  Samsung J7  22000  2015
1  Vivo V11    25000  2013
2  Honor play  27000  2018
3  Xiomi mi8   35000  2018

a) Write the command to sort the data on Brand name.
b) Write the command to sort the data on Brand name in descending order.
c) Write the command to sort the data first on Year and then on Price, in
ascending order.
Answers:
a) df.sort_values(by=['Brand'], inplace=True)
b) df.sort_values(by=['Brand'], ascending=False, inplace=True)
c) df.sort_values(by=['Year', 'Price'], inplace=True)
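A runnable sketch of the three sorts. It keeps each result in its own variable (instead of inplace=True) so that all three can be compared; for (c), rows with the same Year are ordered by Price.

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['Samsung J7', 'Vivo V11', 'Honor play', 'Xiomi mi8'],
                   'Price': [22000, 25000, 27000, 35000],
                   'Year': [2015, 2013, 2018, 2018]})

by_brand = df.sort_values(by=['Brand'])                        # a) A to Z
by_brand_desc = df.sort_values(by=['Brand'], ascending=False)  # b) Z to A
by_year_price = df.sort_values(by=['Year', 'Price'])           # c) Year first, then Price
print(by_year_price)
```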
Q11. Write a Pandas program to count the number of rows and
columns of a DataFrame. Sample Python dictionary data and list
labels:
import numpy as np
import pandas as pd
Exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
…………………………………………
…………………………………………
Expected Output:
Number of Rows: 10
Number of Columns: 4
Answer:
df = pd.DataFrame(Exam_data, index=labels)
print("Number of Rows:", len(df))
print("Number of Columns:", len(df.columns))
Q12. Write a Pandas program to select the rows where the number of
attempts in the examination is less than 2 and the score is greater than 15.
Sample Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Answer:
df = pd.DataFrame(exam_data, index=labels)
print(df[(df['attempts'] < 2) & (df['score'] > 15)])
Expected Output:
name score attempts qualify
j Jonas 19.0 1 yes
To get rows where a value lies between two values in a pandas DataFrame, use the query() method.
Example: df.query("15 <= score <= 20") returns rows c (Katherine, 16.5), f (Michael, 20.0) and j (Jonas, 19.0).
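A runnable sketch covering Q11 and Q12 on the same exam data. df.shape gives (rows, columns) in one call; in the boolean filter each condition needs its own parentheses because & binds more tightly than the comparisons. Rows whose score is NaN never satisfy a comparison, so they drop out of both filters.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael',
                      'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Q11: number of rows and columns
rows, cols = df.shape
print("Number of Rows:", rows)
print("Number of Columns:", cols)

# Q12: both conditions must hold
selected = df[(df['attempts'] < 2) & (df['score'] > 15)]
print(selected)

# query() accepts a chained range test
between = df.query("15 <= score <= 20")
print(between)
```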
Q13. What will be the output of the following code ?
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Sachin','Dhoni','Virat','Rohit','Shikhar']),
'Age':pd.Series([26,25,25,24,31]),
'Score':pd.Series([87,67,89,55,47])}
#Create a DataFrame
df= pd.DataFrame(d)
print (df)
print(df.count())
print("count age",df[['Age']].count())
print("sum of score",df[['Score']].sum())
print("maximum score",df[['Score']].max())
print("mean age",df[['Age']].mean())
print("mode of age",df[['Age']].mode())
Look at the following graph and select the appropriate code to
obtain this output. (Please assume that pandas and matplotlib
are already imported.)
a. zone = [1,2,3,4]
schools = [230,430,300,140]
plt.plot(zone, schools, 'School Survey')
plt.show()
b. zone = [1,2,3,4]
schools = [230,430,300,140]
plt.plot(schools, zone, 'School Survey')
plt.show()
c. zone = [1,2,3,4]
schools = [230,430,300,140]
plt.plot(zone, schools)
plt.title("School Survey")
plt.show()
d. zone = [1,2,3,4]
schools = [230,430,300,140]
plt.plot(schools, zone)
plt.title("School Survey")
plt.show()
Write Python code to draw the following chart.
Write Python code to draw the following chart.
Write Python code to draw the following chart.
Create a DataFrame 'df' with the following values and answer the given questions:

   Brand       Price  Year
0  Samsung J7  22000  2015
1  Vivo V11    25000  2013
2  Honor play  27000  2018
3  Xiomi mi8   35000  2018

a) Write the command to sort the data on Brand name.
b) Write the command to sort the data on Brand name in descending order.
c) Write the command to sort the data first on Year and then on Price, in
ascending order.
Q. Write a Python program to plot a line chart based on the given data to depict
the monthly scoring rate of a batsman for four months.
Month = [1,2,3,4]
Scoring_rate = [140,132,148,164]
Q. Write a Python program to plot a line chart based on the given data to depict
the changing weekly average marks of Raman over four weekly tests.
Week = [1,2,3,4]
Avg_Marks = [50,82,68,54]
Q. Consider the following graph.
Write a program in Python to draw it.
(Heights of the bars are 10, 1, 0, 33, 6, 8)
Q. Consider the following graph. Write the Python
program to plot line graphs showing daily sales of
"Drinks" and "Food" items for five days.
Q. What will be the output of the following code?
Q. Write the output of the given program:
import pandas as pd
S1 = pd.Series([5,6,7,8,10], index=['v','w','x','y','z'])
l = [2,6,1,4,6]
S2 = pd.Series(l, index=['z','y','a','w','v'])
print(S1 - S2)

Q. Which command will be used to delete the 3rd and 5th rows of the DataFrame?
Assume that the DataFrame is named DF.
a. DF.drop([2,4],axis=0)
b. DF.drop([2,4],axis=1)
c. DF.drop([3,5],axis=1)
d. DF.drop([3,5])
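A minimal sketch for checking the options, assuming DF has the default index 0, 1, 2, … (the marks column is hypothetical). With that index the 3rd and 5th rows carry the labels 2 and 4, and axis=0 tells drop() to remove rows rather than columns.

```python
import pandas as pd

# Hypothetical data; the default index labels are 0, 1, 2, 3, 4
DF = pd.DataFrame({'marks': [50, 60, 70, 80, 90]})

# 3rd row -> label 2, 5th row -> label 4; axis=0 means rows
result = DF.drop([2, 4], axis=0)
print(result)
```

If DF had a custom index, the labels passed to drop() would have to be those custom labels instead.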
Mr. Vijay works in the mobile app development industry. He is comparing various apps
available on the Play Store on the basis of their ratings, as shown in the given chart.
Answer questions (i) to (v).
He is trying to write code to plot the graph. Help Mr. Vijay fill in the blanks of the code
and get the desired output.
import ________________ as plt#Statement 1
apps=["ArogyaSetu","WPS Office","CamScanner","WhatsApp","Telegram"]
ps_rating=[3.9,4.5,4.6,4.2,4.3]
plt.__________(apps,ps_rating,color='m',label=__________)
#Statement 2, Statement 3
plt.xlabel("Apps")
plt._____________("Rating") #Statement 4
plt._________ #Statement 5
plt.________ #Statement 6
i) Write the appropriate statement for #statement 1 to import the module.
ii) Write the function name and label name as displayed in the output for #statement 2 and #statement 3
respectively.
iii) Which word should be used for #statement 4?
iv) Write appropriate method names for #Statement 5 to display legends and #Statement 6 to open the figure.
v) Mr. Vijay wants to change the chart type to a line chart. Which statement should be updated and which
method or function is used?
