DV Lab2 Updated
ggplot,ggplot2, plotly
(i) numpy :
NumPy stands for Numerical Python. NumPy is a Python library used for working with arrays. It
also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
Arrays are collections of elements (values) that can have one or more dimensions. An array of one
dimension is called a vector, and an array of two dimensions is called a matrix. NumPy arrays are of
type ndarray (n-dimensional array).
Output :
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
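A minimal sketch of code that prints these attributes, assuming a 2 x 3 integer array. The element values and the explicit dtype are illustrative assumptions, not taken from the original listing:

```python
import numpy as np

# Assumed example data: any 2 x 3 integer array gives the same attribute values.
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print("Array is of type:", type(arr))
print("No. of dimensions:", arr.ndim)
print("Shape of array:", arr.shape)
print("Size of array:", arr.size)
print("Array stores elements of type:", arr.dtype)
```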
1. Creation of 0-D array:
import numpy as np
arr = np.array(42)
print(arr)
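A 1-D array, whose elements are 0-D values, follows the same pattern. A minimal sketch (the values are illustrative):

```python
import numpy as np

# A 1-D array (vector) built from a flat Python list.
arr = np.array([1, 2, 3, 4, 5])
print(arr)       # [1 2 3 4 5]
print(arr.ndim)  # 1
```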
3. Creation of 2-D array: An array that has 1-D arrays as its elements is called a 2-D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
4. Creation of 3-D array: An array that has 2-D arrays (matrices) as its elements is called a 3-D
array.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Checking the number of dimensions:
NumPy arrays provide the ndim attribute, which returns an integer giving the number of
dimensions the array has.
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Array of zeros:
This routine is used to create a NumPy array with the specified shape, where each
item is initialized to 0. The items are floats unless a dtype is given.
Syntax: numpy.zeros(shape)
import numpy as np
b=np.zeros(5)
print(b)
Output: [0. 0. 0. 0. 0.]
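The shape argument can also be a tuple, and the element type can be chosen with dtype. A minimal sketch, with the shape (2, 3) picked only for illustration:

```python
import numpy as np

# A shape tuple gives a multi-dimensional array; dtype=int switches from the
# default float type to integers.
z = np.zeros((2, 3), dtype=int)
print(z)
print(z.shape)  # (2, 3)
```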
Array of ones:
It is used to create the numpy array with the specified shape where each numpy array item is
initialized to 1.
Syntax: np.ones(shape)
import numpy as np
d=np.ones(6,dtype=int)
print(d)
Output:
[1 1 1 1 1 1]
Array of random values:
random is a module in the NumPy library. It contains functions for generating random
numbers: simple random data generation methods, permutation and distribution functions,
and random generator functions. rand() returns values drawn uniformly from [0, 1) and takes
the dimensions as separate arguments rather than as a shape tuple.
Syntax: np.random.rand(d0, d1, ..., dn)
import numpy as np
e=np.random.rand(6)
print(e)
Output:
[0.02905376 0.59423152 0.25030791 0.60751057 0.52254074 0.80428618]
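The values above change on every run. When repeatable results are needed, a seeded generator can be used instead. The sketch below uses np.random.default_rng, NumPy's newer generator interface, rather than the rand function shown above; the seed value is arbitrary:

```python
import numpy as np

# Seeding the generator makes the "random" values repeatable across runs.
rng = np.random.default_rng(seed=42)  # the seed value 42 is arbitrary
e = rng.random(6)                     # six floats drawn uniformly from [0, 1)
print(e)

# A second generator with the same seed produces the same sequence.
same = np.random.default_rng(seed=42).random(6)
print(np.array_equal(e, same))  # True
```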
Array of your choice:
To create an array of a given shape in which every element is a value of your choice, use full().
import numpy as np
f=np.full((3,3),7)
print(f)
Output:
[[7 7 7]
[7 7 7]
[7 7 7]]
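Identity matrix:
eye() creates a 2-D array with ones on the diagonal and zeros elsewhere.
import numpy as np
g=np.eye(4)
print(g)
Output:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]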
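Shape of an array:
The shape of an array is a tuple giving the number of elements along each dimension. It is
available as the shape attribute or through numpy.shape().
Syntax: numpy.shape(array_name) or array_name.shape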
import numpy as np
a=np.array([[1,2,3],[4,5,6]])
print(a.shape)
Output:
(2, 3)
(ii) pandas
Introduction to Pandas:
Pandas was developed by Wes McKinney in 2008 and is used for data analysis. Data analysis
requires a lot of processing, such as restructuring and cleaning, and pandas is designed for that work.
In Python, pandas is an open-source library used for high-performance
data manipulation.
It provides high-level data structures and manipulation tools designed for fast and easy data
analysis in Python.
Pandas is built on top of NumPy and makes NumPy-centric applications easier and more
effective to write.
Example: Often it will be desirable to create a Series with an index identifying each data point:
import pandas as pd
s=pd.Series([-2, 5, 7, 9, 23], index=['a','b','c','d','e'])
print(s)
Output:
a -2
b 5
c 7
d 9
e 23
Now we can access the value at any index label, or at several labels at once:
print(s['d'])
print(s[['a','b','c']])
Output:
9
a -2
b 5
c 7
Operators on pandas:
We can also perform arithmetic and comparison operations on a Series. Operations such as
scalar multiplication or applying math functions work element by element.
import pandas as pd
k=pd.Series([-7,6,5,-21,5,65])
print(k)
Output:
0 -7
1 6
2 5
3 -21
4 5
5 65
Example: print(k*k)
Output:
0 49
1 36
2 25
3 441
4 25
5 4225
Example: print(k*2)
Output:
0 -14
1 12
2 10
3 -42
4 10
5 130
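The examples above cover arithmetic. A minimal sketch of the comparison and math-function cases on the same Series k, using NumPy's abs as the math function (the choice of function is illustrative):

```python
import numpy as np
import pandas as pd

k = pd.Series([-7, 6, 5, -21, 5, 65])

# Comparison operators return a boolean Series of the same length.
positive = k > 0
print(positive)

# A boolean Series can be used to select the matching elements.
print(k[positive])

# NumPy math functions are applied element-wise and keep the index.
print(np.abs(k))
```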
Output:
Ohio 35000
Texas 71000
Oregon 16000
Utah 5000
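A minimal sketch of how a Series like the one above can be built from a dictionary, assuming the same data as in the following example. The dictionary keys become the index labels:

```python
import pandas as pd

data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj = pd.Series(data)   # keys become index labels, values become the data
print(obj)
print(obj['Texas'])     # 71000
```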
We can give our own index to the Series; labels that are not present in the data get the value NaN.
Pandas can also check whether values are null or not null, using isnull() and notnull().
Example:
import pandas as pd
data={'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
place=['a','Ohio','Texas','Oregon','Utah','b']
obj=pd.Series(data,index=place)
print(obj)
Output:
a NaN
Ohio 35000.0
Texas 71000.0
Oregon 16000.0
Utah 5000.0
b NaN
a=pd.isnull(obj)
print(a)
Output:
a True
Ohio False
Texas False
Oregon False
Utah False
b True
b=pd.notnull(obj)
print(b)
Output:
a False
Ohio True
Texas True
Oregon True
Utah True
b False
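The boolean results of isnull() and notnull() can be used directly for counting and filtering. A minimal sketch on the same data (the variable name present is hypothetical):

```python
import pandas as pd

data = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj = pd.Series(data, index=['a', 'Ohio', 'Texas', 'Oregon', 'Utah', 'b'])

# True counts as 1, so summing the mask counts the missing entries.
print(pd.isnull(obj).sum())   # 2

# Indexing with the notnull mask keeps only the entries that have a value.
present = obj[pd.notnull(obj)]
print(present)
```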
Data Frames:
A DataFrame represents a tabular, spreadsheet-like data structure containing a collection of
columns, each of which can hold a different value type (numeric, string, Boolean, etc.). The
DataFrame has both a row index and a column index; it can be thought of as a dictionary of Series
that share the same index.
The data is stored as one or more two-dimensional blocks rather than as a list, dictionary,
or some other collection of one-dimensional arrays.
Although a DataFrame stores data internally in a two-dimensional format, higher-dimensional
data can easily be represented in tabular form with hierarchical indexing.
import pandas as pd
data={'Year':[2021,2020,2019,2018],'Month':['Jan','Feb','Mar','Apr']}
df=pd.DataFrame(data)
print(df)
Output:
Year Month
0 2021 Jan
1 2020 Feb
2 2019 Mar
3 2018 Apr
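The "dictionary of Series" view of a DataFrame can be sketched as follows. The index labels and values are illustrative assumptions:

```python
import pandas as pd

# Two Series sharing the same index labels (values are illustrative).
year = pd.Series([2021, 2020, 2019], index=['a', 'b', 'c'])
month = pd.Series(['Jan', 'Feb', 'Mar'], index=['a', 'b', 'c'])

# Each dictionary key becomes a column; the shared labels become the row index.
df = pd.DataFrame({'Year': year, 'Month': month})
print(df)

# Selecting one column gives back a Series that carries the row index.
col = df['Year']
print(type(col))
print(df.index.tolist(), df.columns.tolist())
```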
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,
6,3,9]})
print(k)
Output:
R.no Sname Rank
0 4401 ABC 7
1 4402 DEF 6
2 4403 GHI 3
3 4404 JKL 9
To set every record of a particular column to the same value, assign a scalar to that column.
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,6,3,9]})
k['Rank']=2
print(k)
Output:
R.no Sname Rank
0 4401 ABC 2
1 4402 DEF 2
2 4403 GHI 2
3 4404 JKL 2
To give a range of values to the records of a particular column in the above data set, assign an array of the same length as the DataFrame.
import numpy as np
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,
6,3,9]})
k['Rank']=np.arange(4)
print(k)
Output:
R.no Sname Rank
0 4401 ABC 0
1 4402 DEF 1
2 4403 GHI 2
3 4404 JKL 3
If we assign a Series to a column, its values are aligned with the DataFrame's index; rows whose labels are missing from the Series are set to NaN.
import numpy as np
import pandas as pd
k=pd.DataFrame({'R.no':[4401,4402,4403,4404],'Sname':['ABC','DEF','GHI','JKL'],'Rank':[7,6,3,9]})
val=pd.Series([11,22],index=[0,3])
k['Rank']=val
print(k)
Output:
R.no Sname Rank
0 4401 ABC 11.0
1 4402 DEF NaN
2 4403 GHI NaN
3 4404 JKL 22.0
import pandas as pd
a=pd.Series([1,2,3],index=['a','b','c'])
print(a)
Output:
a 1
b 2
c 3
The index can be replaced as a whole by assigning a new list of labels to the index attribute.
import pandas as pd
a=pd.Series([1,2,3],index=['a','b','c'])
a.index=['x','y','z']
print(a)
Output:
x 1
y 2
z 3
Although the index can be replaced as a whole, an individual index label cannot be changed in place, because Index objects are immutable.
a.index['x']=5
Output:
TypeError: Index does not support mutable operations
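A minimal sketch of catching this error, together with one way to change a single label without mutating the index. Using rename here is a suggestion, not something the original example shows:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

try:
    a.index[0] = 'p'          # item assignment on an Index is not allowed
except TypeError as err:
    print("TypeError:", err)

# rename returns a new Series with the selected label changed;
# the original Series is left untouched.
b = a.rename({'x': 'p'})
print(b.index.tolist())       # ['p', 'y', 'z']
print(a.index.tolist())       # ['x', 'y', 'z']
```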
Example: 2
Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):
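import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.array([1, 8]), np.array([3, 10]), 'o')  # 'o' draws the points without a connecting line
plt.show()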
OUTPUT:
To plot symbols rather than lines, provide an additional format string argument.
Symbols and line styles: - , -- , -. , : , . , , , o , ^ , v , < , > , s , + , x , D , d , 1 , 2 , 3 , 4 , h , H , p , | , _
Colors: b, g, r, c, m, y, k, w
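A color, a symbol, and a line style from the lists above can be combined in one format string. A minimal sketch, with sample points assumed only for illustration:

```python
import matplotlib.pyplot as plt

# Assumed sample points, used only to demonstrate the format string.
xpoints = [1, 2, 6, 8]
ypoints = [3, 8, 1, 10]

# 'g^--' = green (g), upward triangle markers (^), dashed line (--).
(line,) = plt.plot(xpoints, ypoints, 'g^--')
print(line.get_marker(), line.get_linestyle())
plt.show()
```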
Example: 3
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to position (8, 10)
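import matplotlib.pyplot as plt
xpoints, ypoints = [1, 2, 6, 8], [3, 8, 1, 10]  # x and y coordinates of the four positions
plt.plot(xpoints, ypoints)
plt.show()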
OUTPUT:
Example: 4
Mark each point with a circle:
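A minimal sketch, assuming the same four points as in Example 3 and Matplotlib's marker keyword argument:

```python
import matplotlib.pyplot as plt

# Assumption: the same four points as in Example 3.
xpoints = [1, 2, 6, 8]
ypoints = [3, 8, 1, 10]

# marker='o' draws a circle at each point and keeps the connecting line.
(line,) = plt.plot(xpoints, ypoints, marker='o')
plt.show()
```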
OUTPUT:
Example 5: Mark the points with red color diamond symbol
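import matplotlib.pyplot as plt
import numpy as np
# Assumption: the same points as in Example 3
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints,'rD')  # 'r' = red, 'D' = diamond marker
plt.show()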
Output:
There are several Python packages that provide a grammar of graphics. Here we focus
on plotnine since it's one of the most mature ones. plotnine is based on ggplot2 from the R
programming language, so if we have a background in R, we can consider plotnine the
equivalent of ggplot2 in Python.
output:
(v) seaborn
Seaborn is a library in Python predominantly used for making statistical graphics. Seaborn is a data
visualization library built on top of matplotlib and closely integrated with pandas data structures in
Python. Visualization is the central part of Seaborn which helps in exploration and understanding of
data.
One has to be familiar with NumPy, Matplotlib, and pandas to learn about Seaborn.
Seaborn offers the following functionalities:
1. Dataset oriented API to determine the relationship between variables.
2. Automatic estimation and plotting of linear regression plots.
3. It supports high-level abstractions for multi-plot grids.
4. Visualizing univariate and bivariate distribution.
These are only some of the functionalities offered by Seaborn; there are many more.
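As an illustration of the automatic linear regression plotting listed above, here is a minimal sketch using regplot(). The DataFrame is hypothetical data made up for this example:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data, made up for illustration: hours studied vs. score.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 68, 70, 77],
})

# regplot draws the scatter points and fits a linear regression line
# through them automatically.
ax = sns.regplot(x="hours", y="score", data=df)
ax.set_title("Linear fit estimated by seaborn")
plt.show()
```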
Example 1: How to get a list of all the datasets that are built into Seaborn
import seaborn as sns
print(sns.get_dataset_names())
Output:
['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise',
'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']
Example 2: A simple line plot which is created using the lineplot() method
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
sns.lineplot(x="sepal_length", y="sepal_width", data=data)
plt.title('Title using Matplotlib Function')
plt.show()
Example 3: A simple bar plot which is created using the barplot() method
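import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
# Assumed columns: mean sepal_length for each species
sns.barplot(x="species", y="sepal_length", data=data)
plt.show()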
output: