Certificate That This Is Bonafide Record of Practical Work Done in The Laboratory by The Candidate P.Hemanth During The Academic Year 2022 - 2023
H.T.NO. 21K61A1240
I) Internal II)External
INFORMATION DATA SCIENCES USING
credits-2
Course Objectives:
The main objective of the course is to build a basic understanding of Data Science and
its practical implementation using Python.
Course Outcomes:
Upon successful completion of the course, the students will be able to
List of experiments:
c. Label Encoding
d. One Hot Encoding
12. Perform following visualizations using matplotlib
a. Bar Graph
b. Pie chart
c. Box Plot
d. Histogram
e. Line Chart and Subplots
f. Scatter plot
NUMPY IN PYTHON
EXPNO:1 DATE:
1. CREATING A NUMPY ARRAY
a) Basic ndarray:
>>>import numpy as np #IMPORTING NUMPY
>>>np.nd
ERRORS RAISED:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
File "C:\python\lib\site-packages\numpy\__init__.py", line 284, in __getattr__
>>>a=np.nd
ERROR RAISED:
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
b) Array of zeros:
>>> b=np.zeros([3,3])
>>>b
OUTPUT:
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
c) Array of ones:
>>> a=np.ones(5,dtype=int)
>>>a
OUTPUT:
array([1, 1, 1, 1, 1])
>>>a=np.ones([3,3])
>>>a
OUTPUT:
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
d) Random numbers in ndarray
>>>a=np.random.rand(5,2)
>>>a
OUTPUT:
array([[0.77915258, 0.99248503],
[0.67258782, 0.21354799],
[0.03767875, 0.5003717 ],
[0.45841117, 0.49132479],
[0.76521603, 0.64683037]])
>>>a=np.random.rand(2,2)
>>>a
OUTPUT:
array([[0.17558914, 0.16360674],
[0.26162956, 0.96779824]])
e) An array of your choice:
>>>a=np.array([1,2,3])
>>>a[0]
OUTPUT:
1
>>>a
OUTPUT:
array([1, 2, 3])
f) matrix in numpy:
>>>a=np.matrix('10,20;30,40')
>>>a
OUTPUT:
matrix([[10, 20],
[30, 40]])
>>>a.size # KNOWING THE SIZE OF THE ARRAY
OUTPUT:
4
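The size attribute is easier to read next to shape and ndim. The following is a minimal sketch, assuming a 2 x 3 array chosen only for illustration (the name `arr` is not part of the original record):

```python
import numpy as np

# Hypothetical example array with 2 rows and 3 columns.
arr = np.arange(6).reshape(2, 3)

print(arr.shape)  # length of each axis: (rows, columns)
print(arr.ndim)   # number of axes
print(arr.size)   # total number of elements = product of the shape
```

For any array, size equals the product of the entries in shape, so a 2 x 3 array reports 6 and a 2 x 2 matrix reports 4.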
d) Reshaping a NumPy array:
>>>d=np.arange().reshape()
Error:
Traceback (most recent call last):
File "<pyshell#35>", line 1, in <module>
>>> d=np.arange(45).reshape(3,3,5)
>>>d
OUTPUT:
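array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]],
[[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44]]])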
>>>d[0]
OUTPUT:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
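A reshape only succeeds when the product of the new dimensions equals the number of elements. The sketch below illustrates that rule; the names `flat`, `cube` and `auto` are chosen here for illustration only:

```python
import numpy as np

flat = np.arange(45)             # 45 elements: 0 .. 44
cube = flat.reshape(3, 3, 5)     # 3 * 3 * 5 == 45, so this is valid
auto = flat.reshape(3, -1, 5)    # -1 asks NumPy to infer that axis (here 3)

print(cube.shape)
print(auto.shape)

try:
    flat.reshape(4, 4, 4)        # 4 * 4 * 4 == 64, which is not 45
except ValueError as exc:
    print("reshape failed:", exc)
```

Note that np.arange() with no argument also fails, because it needs at least a stop value.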
e) Flattening a NumPy array:
>>>c=np.array([[1,2,3],[4,5,6]])
>>>c.flatten #PARENTHESES MISSING, SO THE METHOD IS NOT CALLED
OUTPUT:
<built-in method flatten of numpy.ndarray object at 0x0000026E23BDB570>
>>>c.flatten()
OUTPUT:
array([1, 2, 3, 4, 5, 6])
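NumPy also offers ravel() for the same job. As a small sketch (the variable names `flat` and `rav` are illustrative), the difference is that flatten() always returns a copy, while ravel() returns a view when the memory layout allows it:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6]])

flat = c.flatten()   # always an independent copy
rav = c.ravel()      # a view of c here, because c is contiguous

flat[0] = 99         # does not touch c
print(c[0, 0])

rav[0] = 42          # writes through to c
print(c[0, 0])
```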
>>>a=np.transpose(c)
>>>a
OUTPUT:
array([[1, 4],
[2, 5],
[3, 6]])
>>>a=np.sort(a)
>>>a
OUTPUT:
array([ 9, 40, 65])
>>>a[0:4]
OUTPUT:
array([1, 2, 3, 4])
>>>a[0:5]
OUTPUT:
array([1, 2, 3, 4, 5])
>>>a[0:3,0:3]
OUTPUT:
array([[1, 2, 3],
[4, 5, 6]])
>>>d
OUTPUT:
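array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]],
[[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44]]])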
>>>d[0:3][0:3][0:2]
OUTPUT:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
>>>d[0:2][0:2][0:2]
OUTPUT:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
>>>d[0:2,0:2,0:2]
OUTPUT:
array([[[ 0, 1],
[ 5, 6]],
[[15, 16],
[20, 21]]])
>>>d[0:1,0:1,0:1]
OUTPUT:
array([[[0]]])
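The outputs above differ because chained brackets and comma-separated slices are not the same operation. A minimal sketch of the difference, using the same 3 x 3 x 5 array (the names `chained` and `per_axis` are illustrative):

```python
import numpy as np

d = np.arange(45).reshape(3, 3, 5)

# Each pair of brackets slices axis 0 of the previous result,
# so this is the same as d[0:2].
chained = d[0:2][0:2][0:2]

# One bracket with commas slices axis 0, axis 1 and axis 2 together.
per_axis = d[0:2, 0:2, 0:2]

print(chained.shape)
print(per_axis.shape)
```

This is why d[0:3][0:3][0:2] and d[0:2][0:2][0:2] both return the first two 3 x 5 blocks.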
>>>x=np.array([1,2,3])
>>>y=np.array([4,5,6])
>>> np.stack((x,y),axis=0)
OUTPUT:
array([[1, 2, 3],
[4, 5, 6]])
>>>np.stack((x,y),axis=1)
OUTPUT:
array([[1, 4],
[2, 5],
[3, 6]])
>>>a=np.array([[1,2],[3,4]])
>>>b=np.array([[5,6],[7,8]])
>>>c=np.concatenate((a,b),axis=1)
>>>c
OUTPUT:
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
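As a rough comparison of the two functions: stack() joins arrays along a new axis, while concatenate() joins them along an existing one. The sketch below reuses the arrays from the transcript and only inspects the resulting shapes:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.stack((x, y), axis=0).shape)        # new leading axis
print(np.concatenate((x, y)).shape)          # stays one-dimensional
print(np.concatenate((a, b), axis=0).shape)  # rows appended
print(np.concatenate((a, b), axis=1).shape)  # columns appended
print(np.stack((a, b)).shape)                # two 2x2 arrays become one 3-D array
```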
c) Broadcasting in NumPy array:
>>>a = np.arange(10,20,2)
>>>a
OUTPUT:
array([10, 12, 14, 16, 18])
>>>a*2
OUTPUT:
array([20, 24, 28, 32, 36])
>>>a+3
OUTPUT:
array([13, 15, 17, 19, 21])
>>>x=np.array([[1,2,3],[4,5,6]])
>>>y=np.array([7,8,9])
>>>x+y
OUTPUT:
array([[ 8, 10, 12],
[11, 13, 15]])
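Broadcasting works when, comparing shapes from the last axis backwards, each pair of lengths is equal or one of them is 1. A small sketch of that rule, assuming NumPy 1.20 or later for np.broadcast_shapes (the column vector `col` is an illustrative addition):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
y = np.array([7, 8, 9])                # shape (3,)  -> treated as (1, 3)
col = np.array([[10], [20]])           # shape (2, 1)

print(np.broadcast_shapes(x.shape, y.shape))  # shape of x + y
print(x + col)                                # col is repeated across columns

try:
    x + np.array([1, 2])               # shape (2,) does not match last axis 3
except ValueError as exc:
    print("cannot broadcast:", exc)
```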
a) Creating dataframe:
>>>import pandas as pd #IMPORTING PANDAS
>>>data=[['Hemanth',40],['Sanjay',65]]
>>>df=pd.DataFrame(data,columns=['Name','RegdNo'],index=[1,2])
>>>df
OUTPUT:
Name RegdNo
1 Hemanth 40
2 Sanjay 65
b) Concat:
>>>data1=[['Prasanth',9],['Koushik',37]]
>>>df1=pd.DataFrame(data1,columns=['Name','RegdNo'],index=[3,4])
>>>x=pd.concat([df,df1])
>>>x
OUTPUT:
Name RegdNo
1 Hemanth 40
2 Sanjay 65
3 Prasanth 9
4 Koushik 37
c) Setting Conditions:
>>>x.loc[x['RegdNo']<=40,'<=40']=True
>>>x
OUTPUT:
Name RegdNo <=40
1 Hemanth 40 True
2 Sanjay 65 NaN
3 Prasanth 9 True
4 Koushik 37 True
>>>x.loc[x['RegdNo']>40,'<=40']=False
>>>x
OUTPUT:
Name RegdNo <=40
1 Hemanth 40 True
2 Sanjay 65 False
3 Prasanth 9 True
4 Koushik 37 True
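The two .loc assignments above can also be written as one comparison, which avoids the intermediate NaN. This is an alternative sketch, not the method used in the record; the frame is rebuilt here so the block runs on its own:

```python
import pandas as pd

x = pd.DataFrame(
    {"Name": ["Hemanth", "Sanjay", "Prasanth", "Koushik"],
     "RegdNo": [40, 65, 9, 37]},
    index=[1, 2, 3, 4],
)

# The comparison yields a boolean Series that becomes the new column.
x["<=40"] = x["RegdNo"] <= 40
print(x)
```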
>>>gender=['M','M','M','M']
>>>x['Gender']=gender
>>>x
OUTPUT:
Name RegdNo <=40 Gender
1 Hemanth 40 True M
2 Sanjay 65 False M
3 Prasanth 9 True M
4 Koushik 37 True M
>>>data.sort_values(by= 'petal.length',axis=0,ascending=False)
OUTPUT:
sepal.length sepal.width petal.length petal.width variety
118 7.7 2.6 6.9 2.3 Virginica
122 7.7 2.8 6.7 2.0 Virginica
117 7.7 3.8 6.7 2.2 Virginica
105 7.6 3.0 6.6 2.1 Virginica
c) Groupby:
>>>data.groupby(by="variety",axis=0) #RETURNS A GROUPBY OBJECT, NOT AN ERROR
OUTPUT:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002604B9BC510>
>>>gb=data.groupby(by="variety",axis=0)
>>>gb.first()
OUTPUT:
variety sepal.length sepal.width petal.length petal.width
Setosa 5.1 3.5 1.4 0.2
Versicolor 7.0 3.2 4.7 1.4
Virginica 6.3 3.3 6.0 2.5
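A GroupBy object does nothing until an aggregation is requested. The sketch below uses four rows copied from the head() and tail() outputs of this record instead of the full iris.csv, so the frame `sample` is only a stand-in for the real dataset:

```python
import pandas as pd

cols = ["sepal.length", "sepal.width", "petal.length", "petal.width", "variety"]
rows = [
    [5.1, 3.5, 1.4, 0.2, "Setosa"],
    [4.9, 3.0, 1.4, 0.2, "Setosa"],
    [6.2, 3.4, 5.4, 2.3, "Virginica"],
    [5.9, 3.0, 5.1, 1.8, "Virginica"],
]
sample = pd.DataFrame(rows, columns=cols)

gb = sample.groupby("variety")
means = gb["petal.length"].mean()   # one mean per variety
counts = gb.size()                  # number of rows per variety

print(gb.first())
print(means)
print(counts)
```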
a) Text files:
>>>df=pd.read_csv("intro.txt",sep=" ")
>>>df
OUTPUT:
Hemanth 40
0 Sanjay 65
1 Koushik 37
2 Prasanth 9
>>>df=pd.read_csv("intro.txt",sep=" ",names=['Name','Regdno'])
>>>df
OUTPUT:
Name Regdno
0 Hemanth 40
1 Sanjay 65
2 Koushik 37
3 Prasanth 9
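By default read_csv treats the first line of the file as the header, which is why the first output above lost a data row. The sketch below reproduces both behaviours with an in-memory copy of the file contents; io.StringIO stands in for intro.txt and is an assumption made so the block runs without the file:

```python
import io
import pandas as pd

text = "Hemanth 40\nSanjay 65\nKoushik 37\nPrasanth 9\n"

# Default: the first line becomes the column names.
default = pd.read_csv(io.StringIO(text), sep=" ")

# names=...: every line is data and the given names are the header.
named = pd.read_csv(io.StringIO(text), sep=" ", names=["Name", "Regdno"])

print(default)
print(named)
```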
b) CSV files:
>>>data=pd.read_csv("iris.csv")
>>>data.head() #PRINTS THE FIRST 5 ROWS
OUTPUT:
sepal.length sepal.width petal.length petal.width variety
0 5.1 3.5 1.4 0.2 Setosa
1 4.9 3.0 1.4 0.2 Setosa
2 4.7 3.2 1.3 0.2 Setosa
3 4.6 3.1 1.5 0.2 Setosa
4 5.0 3.6 1.4 0.2 Setosa
>>>data.tail() #DISPLAYS THE LAST 5 ROWS
OUTPUT:
sepal.length sepal.width petal.length petal.width variety
145 6.7 3.0 5.2 2.3 Virginica
146 6.3 2.5 5.0 1.9 Virginica
147 6.5 3.0 5.2 2.0 Virginica
148 6.2 3.4 5.4 2.3 Virginica
149 5.9 3.0 5.1 1.8 Virginica
c) Excel files:
>>>df=pd.read_excel('Mailids.xlsx')
>>>df
OUTPUT:
Ex:3
import requests
from bs4 import BeautifulSoup
r=requests.get('http://www.google.com')
print(r)
s=BeautifulSoup(r.content,'html.parser')
print(s.prettify())
output:
<Response [200]>
<!DOCTYPE html>
<html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"/>
a) Bar Graph:
>>>import matplotlib.pyplot as plt
>>>plt.bar(data['sepal.length'],data['sepal.width'])
OUTPUT:
<BarContainer object of 150 artists>
>>>plt.show() #BAR GRAPH
OUTPUT:
>>>plt.ylabel("sepal.width") #y-axis
OUTPUT:
Text(0, 0.5, 'sepal.width')
>>>plt.bar(data['sepal.length'],data['sepal.width'])
OUTPUT:
<BarContainer object of 150 artists>
>>>plt.show()
OUTPUT:
b) Pie chart:
>>>d=np.array(data['sepal.length'])
>>>d
OUTPUT:
array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8,
4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5. ,
5. , 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5. , 5.5, 4.9, 4.4,
5.1, 5. , 4.5, 4.4, 5. , 5.1, 4.8, 5.1, 4.6, 5.3, 5. , 7. , 6.4,
6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5. , 5.9, 6. , 6.1, 5.6,
6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7,
6. , 5.7, 5.5, 5.5, 5.8, 6. , 5.4, 6. , 6.7, 6.3, 5.6, 5.5, 5.5,
6.1, 5.8, 5. , 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3,
6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5,
7.7, 7.7, 6. , 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2,
7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6. , 6.9, 6.7, 6.9, 5.8,
6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9])
>>>plt.pie(d)
>>>plt.show()
OUTPUT:
>>>import numpy as np
>>>x=np.array([10,20,30,40])
>>>mylabels=["x","y","z","w"]
>>>plt.pie(x,labels=mylabels)
OUTPUT:
([<matplotlib.patches.Wedge object at 0x000001CCD282C730>, <matplotlib.patches.Wedge
object at 0x000001CCD282CBB0>, <matplotlib.patches.Wedge object at
0x000001CCD282D030>, <matplotlib.patches.Wedge object at 0x000001CCD282D4E0>],
[Text(1.0461621663333946, 0.3399186987098808, 'x'), Text(0.33991867422268784,
1.0461621742897658, 'y'), Text(-1.0461621902025062, 0.3399186252483017, 'z'),
Text(0.3399188211458418, -1.0461621265515308, 'w')])
>>>plt.show()
OUTPUT:
c) Box plot:
>>>values=np.random.normal(100,20,200) #SEPARATE NAME SO THE IRIS DATAFRAME IN data IS KEPT
>>>fig=plt.figure(figsize=(10,7))
>>>plt.boxplot(values) #BOXPLOT
OUTPUT:
{'whiskers': [<matplotlib.lines.Line2D object at 0x000001CCCAC601F0>,
<matplotlib.lines.Line2D object at 0x000001CCCAC61ED0>], 'caps':
[<matplotlib.lines.Line2D object at 0x000001CCCAC62B90>, <matplotlib.lines.Line2D
object at 0x000001CCCAC63010>], 'boxes': [<matplotlib.lines.Line2D object at
0x000001CCCAC92710>], 'medians': [<matplotlib.lines.Line2D object at
0x000001CCCAC638B0>], 'fliers': [<matplotlib.lines.Line2D object at
0x000001CCCAC63F10>], 'means': []}
>>>plt.show()
OUTPUT:
d) Histogram:
>>> plt.hist(data['sepal.length'],color='blue')
OUTPUT:
(array([ 9., 23., 14., 27., 16., 26., 18., 6., 5., 6.]), array([4.3 , 4.66, 5.02, 5.38, 5.74,
6.1 , 6.46, 6.82, 7.18, 7.54, 7.9 ]), <BarContainer object of 10 artists>)
>>>plt.show()
OUTPUT:
e) Line graph:
>>>plt.xlabel('sepal.length')
OUTPUT:
Text(0.5, 0, 'sepal.length')
>>>plt.ylabel('sepal.width')
OUTPUT:
Text(0, 0.5, 'sepal.width')
>>>plt.plot(data['sepal.length'],data['sepal.width'])
OUTPUT:
[<matplotlib.lines.Line2D object at 0x000001CCCA9DB970>]
>>> plt.show()
OUTPUT:
>>> x=np.array([1,2,3])
>>>y=x*2
>>>plt.plot(x,y) #line plot
OUTPUT:
[<matplotlib.lines.Line2D object at 0x000001CCCA200C40>]
>>>plt.show()
OUTPUT:
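The experiment list also names subplots, which the transcript does not show. The following is a minimal sketch of one way to draw two line charts side by side; the Agg backend line is an assumption for running without a display, and the second curve (x squared) is an illustrative choice:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])

# One row, two columns of axes inside a single figure.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, x * 2)      # same line as in the transcript: y = 2x
axes[0].set_title("y = 2x")

axes[1].plot(x, x ** 2)     # illustrative second curve
axes[1].set_title("y = x^2")

fig.tight_layout()          # keep titles and tick labels from overlapping
```

In an interactive session, calling plt.show() after this block displays both panels in one window.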