DSF Lab Manual (OCS353T)

The document outlines a series of exercises focused on Python programming for data analytics, covering installation, working with Numpy arrays, Pandas data frames, basic plotting with Matplotlib, and statistical measures including frequency distributions, mean, mode, standard deviation, variance, normal curves, correlation, and scatter plots. Each exercise includes an aim, algorithm, program code, output, and result, demonstrating successful completion of various data analysis tasks. The exercises are dated between February and April 2024.

Uploaded by

KaviarasiRajendren


EX NO: 1
Download, install and explore the features of Python for data analytics
DATE: 15.02.24

AIM:
To download and install Python and explore its features for data
analysis.
ALGORITHM:
STEP 1: Search Google for "Python 3.14 64 bit download", download the
installer and double-click the .exe file to run it.
STEP 2: Open the command prompt.
STEP 3: Check the downloaded packages.
STEP 4: Confirm that all the packages were downloaded successfully.

FOR COMMAND PROMPT:

STEP 1: python --version
STEP 2: python -m pip --version
STEP 3: python.exe -m pip install --upgrade pip
STEP 4: python -m pip install numpy
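
The same checks can also be made from inside Python. The following is a minimal sketch, not part of the original exercise; the helper name describe_environment and the list of package names are assumptions chosen for illustration.

```python
import importlib.util
import sys


def describe_environment(packages=("numpy", "pandas", "matplotlib")):
    """Return the interpreter version and whether each package can be imported."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    status = {}
    for name in packages:
        # find_spec returns None when the package is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return version, status


version, status = describe_environment()
print("Python version:", version)
for name, installed in status.items():
    print(name, "installed" if installed else "missing")
```

A package reported as missing can be installed with python -m pip install followed by the package name, as in STEP 4.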

OUTPUT:

RESULT:
Thus Python 3.14.2 (64-bit) was downloaded, installed and explored
successfully.
EX NO: 2
Working with Numpy arrays
DATE: 17.02.24

AIM:
To write a python program to show working of numpy arrays in
python.
ALGORITHM:
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of the
array
Step4: Stop

PROGRAM
import numpy as np
arr = np.array( [[ 1, 2, 3], [ 4, 2, 5]] )
print("Array is of type: ", type(arr))
print("No. of dimensions: ", arr.ndim)
print("Shape of array: ", arr.shape)
print("Size of array: ", arr.size)
print("Array stores elements of type: ", arr.dtype)

OUTPUT
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32

Program to Perform Array Slicing:

a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])

Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]
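
Slicing also works on columns and on both axes at once. The sketch below reuses the array from the slicing program; the variable names (first_row, last_col, block, col_sums) are illustrative and do not appear in the original.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

first_row = a[0]          # a single row
last_col = a[:, -1]       # every row, last column
block = a[:2, 1:]         # first two rows, last two columns
col_sums = a.sum(axis=0)  # sum down each column

print("First row:", first_row)
print("Last column:", last_col)
print("Block:\n", block)
print("Column sums:", col_sums)
```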

RESULT:
Thus the working with Numpy arrays was successfully
completed.
EX NO: 3
Working with Pandas data frames
DATE: 22.02.24

AIM:
To work with Pandas data frames.

ALGORITHM:
Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM :
import numpy as np
import pandas as pd
data = np.array([['','Col1','Col2'], ['Row1',1,2],
['Row2',3,4]])
print(pd.DataFrame(data=data[1:,1:], index = data[1:,0],
columns=data[0,1:]))
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))
my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4),
columns=['A'])
print(pd.DataFrame(my_df))
my_series = pd.Series({"United Kingdom":"London",
"India":"New Delhi", "United States":"Washington",
"Belgium":"Brussels"})
print(pd.DataFrame(my_series))
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
print(df.shape)
print(len(df.index))
Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2
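
Once a data frame exists, values are usually read back by label or by position. The following is a small sketch under the assumption of a two-row frame like the first one above; the Total column is added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 3], "Col2": [2, 4]}, index=["Row1", "Row2"])

col = df["Col1"]        # one column, returned as a Series
row = df.loc["Row2"]    # one row, selected by index label
cell = df.iloc[0, 1]    # one value, selected by integer position

# Derive a new column from existing ones
df["Total"] = df["Col1"] + df["Col2"]

print(col)
print(row)
print(cell)
print(df)
```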

Result:
Thus the working with Pandas data frames
was successfully completed.
EX NO: 4
Basic plots using Matplotlib
DATE: 29.02.24

Aim:
To draw basic plots in Python program using Matplotlib.

ALGORITHM:
Step1: Start
Step2: import Matplotlib module
Step3: Create a Basic plots using Matplotlib
Step4: Print the output
Step5: Stop

PROGRAM:
import matplotlib.pyplot as plt
x = [1,2,3]
y = [2,4,1]
plt.plot(x, y)
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('My first graph!')
plt.show()

Output:

RESULT:
Thus the basic plots using Matplotlib in Python
program was successfully completed.
EX NO: 5
Statistical and Probability measures
(a) Frequency distributions
DATE: 07.03.24

Aim:
To count the frequency of occurrence of words in a body of
text, a task that is often needed during text
processing.

ALGORITHM:
Step 1: Start the Program
Step 2: Import the word_tokenize function and the gutenberg corpus
Step 3: Load the text file blake-poems.txt from the gutenberg corpus
Step 4: Write the code to count the frequency of occurrence
of a word in a body of text
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
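import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
# One-time downloads of the corpus and tokenizer data
# (newer NLTK releases may ask for "punkt_tab" instead of "punkt")
nltk.download("gutenberg")
nltk.download("punkt")
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
# Keep the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])
# Count how often each of those tokens occurs among the 50
wordfreq = [wlist.count(w) for w in wlist]
# zip() is lazy in Python 3, so build a list before printing
print("Pairs\n" + str(list(zip(wlist, wordfreq))))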
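
The counting step can also be sketched with only the standard library. This is an alternative, not the method used above: collections.Counter replaces the repeated list.count calls, a simple regular expression stands in for word_tokenize, and the sample sentence is a made-up placeholder rather than text from the corpus.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration
text = "the tiger burns bright and the lamb sleeps and the night is bright"

# Lower-case the text and split it into word tokens
tokens = re.findall(r"[a-z']+", text.lower())

# Counter builds the word -> frequency table in one pass
freq = Counter(tokens)

for word, count in freq.most_common(3):
    print(word, count)
```

Counter makes one pass over the tokens, whereas calling wlist.count(w) for every word rescans the list each time.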

OUTPUT:

RESULT:
Thus the program to count the frequency of occurrence of words
in a body of text using Python was successfully completed.
EX NO: 5
(B)Mean, Mode, Standard Deviation
DATE: 07.03.24

AIM:
To write a python program to calculate the Mean, Mode,
Standard Deviation.

PROGRAM:
MEAN:
import numpy as np
arr = np.array([2, 7, 5, 8, 9, 4])
print("Original array:",arr)
arr1 = np.mean(arr)
print("Arithmetic Mean of 1D array:",arr1)

OUTPUT:
Original array: [2 7 5 8 9 4]
Arithmetic Mean of 1D array: 5.833333333333333

MODE:
from scipy import stats as st
import numpy as np
abc = np.array([1, 1, 2, 2, 2, 3, 4, 5])
print(st.mode(abc))

OUTPUT:
ModeResult(mode=2, count=3)

STANDARD DEVIATION:

import numpy as np
a=np.array([[1,4,7,10],[2,5,8,11]])
b=np.std(a)
print(b)

OUTPUT:
3.391164991562634
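
The same three measures are available in Python's built-in statistics module. The sketch below reuses the numbers from the programs above and also shows the difference between population and sample standard deviation. np.std defaults to the population form, which is why it matches pstdev here.

```python
import statistics

values = [2, 7, 5, 8, 9, 4]
mean_value = statistics.mean(values)

mode_value = statistics.mode([1, 1, 2, 2, 2, 3, 4, 5])

# The 2-D array from the standard deviation program, flattened into one list
flat = [1, 4, 7, 10, 2, 5, 8, 11]
pop_std = statistics.pstdev(flat)    # divides by n, like np.std(a)
sample_std = statistics.stdev(flat)  # divides by n - 1

print("Mean:", mean_value)
print("Mode:", mode_value)
print("Population standard deviation:", pop_std)
print("Sample standard deviation:", sample_std)
```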

RESULT:
Thus the computation for the Mean, Mode, Standard
Deviation was successfully completed.
EX NO: 5
(c) Variability
DATE: 07.03.24

AIM:
To write a python program to calculate the variance.

ALGORITHM:
Step 1: Start the Program
Step 2: Import the variance function from the statistics module
(from statistics import variance)
Step 3: Import Fraction from the fractions module to use fractions
as parameter values (from fractions import Fraction as fr)
Step 4: Create tuples holding sets of positive and negative numbers
Step 5: Print the variance of each sample
Step 6: Stop the process

Program:
from statistics import variance
from fractions import Fraction as fr
sample1 = (1, 2, 5, 4, 8, 9, 12)
sample2 = (-2, -4, -3, -1, -5, -6)
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
print("Variance of Sample1 is % s " %(variance(sample1)))
print("Variance of Sample2 is % s " %(variance(sample2)))
print("Variance of Sample3 is % s " %(variance(sample3)))
print("Variance of Sample4 is % s " %(variance(sample4)))
print("Variance of Sample5 is % s " %(variance(sample5)))

Output :
Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006
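
To see what statistics.variance computes, the sample variance can be written out by hand: the sum of squared deviations from the mean, divided by n - 1. The helper name sample_variance below is an illustrative assumption.

```python
from statistics import variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


sample1 = (1, 2, 5, 4, 8, 9, 12)
sample2 = (-2, -4, -3, -1, -5, -6)

print("By hand:", sample_variance(sample1), sample_variance(sample2))
print("Library:", variance(sample1), variance(sample2))
```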

Result:
Thus the computation for variance was successfully
completed.
EX NO: 5
d) Normal curves
DATE: 14.03.24

AIM:
To create a normal curve using python program.

ALGORITHM:
Step 1: Start the Program
Step 2: Import packages scipy and call function scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process

PROGRAM:
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )
sb.set_style('whitegrid')
# Recent seaborn releases require x and y to be passed as keywords
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
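
The curve drawn above follows the normal density formula. The sketch below is a hand-written version of that formula, using the same mean (5.3) and standard deviation (1) as the program and checked against statistics.NormalDist. The function name normal_pdf is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std dev sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


peak = normal_pdf(5.3, 5.3, 1.0)            # height of the curve at its mean
reference = NormalDist(5.3, 1.0).pdf(5.3)   # same value from the standard library

print("Hand-written formula:", peak)
print("statistics.NormalDist:", reference)
```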

OUTPUT:

RESULT:
Thus the normal curve using python program was
successfully completed.
EX NO: 5
(e) Correlation and scatter plots
DATE: 14.03.24

AIM:
To write a python program for correlation with scatter
plot

ALGORITHM:
Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

OUTPUT:
[51 92 14 71 60 20 82 86 74 74]
[71, 86, 14]
array([71, 86, 60])
array([[71, 86], [60, 20]])
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
array([ 2, 5, 11])
array([[ 2, 1, 3],[ 6, 5, 7],[10, 9, 11]])
array([[0, 0, 0], [2, 1, 3],[4, 2, 6]])
[[ 0 1 2 3][ 4 5 6 7][ 8 9 10 11]]
array([10, 8, 9])
array([[ 6, 4, 5], [10, 8, 9]])
array([[ 0, 2],[ 4, 6],[ 8, 10]])
(100, 2)
array([16, 82, 94, 90, 64, 84, 99, 83, 69, 52, 14, 24, 40, 12,
57, 95, 48, 41, 71, 36])

Result:
Thus the Correlation and scatter plots using python
program was successfully completed.
EX NO: 5
(f) Correlation coefficient
DATE: 14.03.24

Aim:
To write a python program to compute correlation
coefficient.

ALGORITHM:
Step 1: Start the Program
Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
import numpy as np
rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)
[x[3], x[7], x[2]]
ind = [3, 7, 4]
x[ind]
ind = np.array([[3, 7],[4, 5]])
x[ind]
X = np.arange(12).reshape((3, 4))
X
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]
X[row[:, np.newaxis], col]
row[:, np.newaxis] * col
print(X)
X[2, [2, 0, 1]]
X[1:, [2, 0, 1]]
mask = np.array([1, 0, 1, 0], dtype=bool)
X[row[:, np.newaxis], mask]
mean = [0, 0]
cov = [[1, 2],[2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
plt.scatter(X[:, 0], X[:, 1]);
indices = np.random.choice(X.shape[0], 20, replace=False)
indices
selection = X[indices]
selection.shape
plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:, 1],
            facecolor='none', edgecolor='black', s=200);
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
correlation = y.corr(x)
correlation
plt.scatter(x, y)
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)),
         color='red')
plt.title('Correlation')
plt.xlabel('x axis')
plt.ylabel('y axis')

OUTPUT:
[51 92 14 71 60 20 82 86 74 74]
[71, 86, 14]
array([71, 86, 60])
array([[71, 86], [60, 20]])
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10,
11]])
array([ 2, 5, 11])
array([[ 2, 1, 3],[ 6, 5, 7],[10, 9, 11]])
array([[0, 0, 0], [2, 1, 3],[4, 2, 6]])
[[ 0 1 2 3][ 4 5 6 7][ 8 9 10 11]]
array([10, 8, 9])
array([[ 6, 4, 5], [10, 8, 9]])
array([[ 0, 2],[ 4, 6],[ 8, 10]])
(100, 2)
array([16, 82, 94, 90, 64, 84, 99, 83, 69, 52, 14, 24, 40,
12, 57, 95, 48, 41, 71, 36])
0.8603090020146067
<matplotlib.collections.PathCollection at 0x1ecb970fa10>
[<matplotlib.lines.Line2D at 0x1ecbfa73250>]
Text(0.5, 1.0, 'Correlation')
Text(0, 0.5, 'y axis')
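
The algorithm above speaks of defining a correlation coefficient function with the math package, while the program relies on the pandas corr method. A minimal sketch of such a function is given below, applied to the same x and y values. The function name pearson_r is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 3, 5, 4]
r = pearson_r(x, y)
print("Correlation coefficient:", r)
```

For these values the result agrees with the 0.8603090020146067 printed by y.corr(x).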

RESULT:
Thus the computation for correlation coefficient was
successfully completed.
EX NO: 5
(g) Regression
DATE: 14.03.24

AIM:
To write a python program for regression.

ALGORITHM:
Step 1: Start the Program
Step 2: import package of sklearn
Step 3: create the label name for x and y axis
Step 4: define the color of the output line
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
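import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Sample data reused from Ex. 5(f)
x = np.array([1, 2, 3, 4, 5, 6, 7])
y = np.array([1, 2, 3, 4, 3, 5, 4])
# scikit-learn expects a 2-D feature array of shape (n_samples, 1)
x = x.reshape(-1, 1)
model = LinearRegression()
model.fit(x, y)
y_pred = model.predict(x)
plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='skyblue')
plt.plot(x, y_pred, color='red', linewidth=2)
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()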
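
For a single predictor, the fitted line can also be obtained from the closed-form least-squares formulas. The sketch below uses the x and y values from Ex. 5(f) as assumed sample data; fit_line is an illustrative helper, not a scikit-learn function.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


x_values = [1, 2, 3, 4, 5, 6, 7]
y_values = [1, 2, 3, 4, 3, 5, 4]
slope, intercept = fit_line(x_values, y_values)
print("Slope:", slope)
print("Intercept:", intercept)
```

On the same data, LinearRegression should report matching values in its coef_ and intercept_ attributes.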

OUTPUT:

RESULT:
Thus the regression program was successfully completed.
EX NO: 6
(A)UNIVARIATE ANALYSIS
DATE: 28.03.24

AIM:
To process the univariate analysis for frequency, mean, median,
mode, variance, standard deviation, skewness and kurtosis.
ALGORITHM:
Step 1: Start the Program
Step 2: Check the attached zip file containing the dataset
Step 3: process the program
Step 4: stop
PROGRAM:
import pandas as pd
df=pd.read_csv("C:/Users/GOD/Desktop/DS Lab/diabetes.csv")
print(df)
df.value_counts("Age")
pd.crosstab(index=df['Age'],columns='Count')
print("Mean=", df['BloodPressure'].mean())
print(df.mean())
print("Median",df['Glucose'].median())
print(df.median())
print(df['Age'].mode())
print(df.mode())
print("Variance=",df['Age'].var())
print(df.var())
print("Standard Deviation=",df['Age'].std())
print(df.std())
data=df['Age']
skewness_value=df.skew(axis=0)
print("Measure of skewness column wise:\n",skewness_value)
kurtosis_value=df.kurt(axis=0)
print("kurtosis=",kurtosis_value)
print(df)
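
Skewness and kurtosis are built from central moments. The sketch below shows the plain population formulas on two made-up samples. It assumes no bias correction, so its values would differ slightly from those returned by df.skew() and df.kurt(), which apply sample-size adjustments.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical sample with a long right tail

print("Skewness (symmetric):", skewness(symmetric))
print("Skewness (right-skewed):", skewness(right_skewed))
print("Excess kurtosis (symmetric):", excess_kurtosis(symmetric))
```

A positive skewness, as for Insulin in the output, indicates a long right tail. A negative value, as for BloodPressure, indicates a long left tail.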

OUTPUT:
Pregnancies Glucose BloodPressure SkinThickness
Insulin BMI \
0 6 148 72 35
0 33.6
1 1 85 66 29
0 26.6
2 8 183 64 0
0 23.3
3 1 89 66 23
94 28.1
4 0 137 40 35
168 43.1
.. ... ... ... ...
... ...
763 10 101 76 48
180 32.9
764 2 122 70 27
0 36.8
765 5 121 72 23
112 26.2
766 1 126 60 0
0 30.1
767 1 93 70 31
0 30.4

DiabetesPedigreeFunction Age Outcome

0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1
.. ... ... ...
763 0.171 63 0
764 0.340 27 0
765 0.245 30 0
766 0.349 47 1
767 0.315 23 0
[768 rows x 9 columns]
Age
22 72
21 63
25 48
24 46
23 38
28 35
26 33
27 32
29 29
31 24
41 22
30 21
37 19
42 18
33 17
38 16
36 16
32 16
45 15
34 14
40 13
43 13
46 13
39 12
35 10
52 8
44 8
50 8
51 8
58 7
54 6
47 6
53 5
60 5
49 5
57 5
48 5
66 4
62 4
63 4
55 4
59 3
56 3
65 3
67 3
61 2
69 2
64 1
68 1
70 1
72 1
81 1
Name: count, dtype: int64
col_0 Count
Age
21 63
22 72
23 38
24 46
25 48
26 33
27 32
28 35
29 29
30 21
31 24
32 16
33 17
34 14
35 10
36 16
37 19
38 16
39 12
40 13
41 22
42 18
43 13
44 8
45 15
46 13
47 6
48 5
49 5
50 8
51 8
52 8
53 5
54 6
55 4
56 3
57 5
58 7
59 3
60 5
61 2
62 4
63 4
64 1
65 3
66 4
67 3
68 1
69 2
70 1
72 1
81 1

Mean= 69.10546875
Pregnancies 3.845052
Glucose 120.894531
BloodPressure 69.105469
SkinThickness 20.536458
Insulin 79.799479
BMI 31.992578
DiabetesPedigreeFunction 0.471876
Age 33.240885
Outcome 0.348958
dtype: float64
Median 117.0
Pregnancies 3.0000
Glucose 117.0000
BloodPressure 72.0000
SkinThickness 23.0000
Insulin 30.5000
BMI 32.0000
DiabetesPedigreeFunction 0.3725
Age 29.0000
Outcome 0.0000
dtype: float64
0 22
Name: Age, dtype: int64
Pregnancies Glucose BloodPressure SkinThickness Insulin
BMI \
0 1.0 99 70.0 0.0 0.0
32.0
1 NaN 100 NaN NaN NaN
NaN

DiabetesPedigreeFunction Age Outcome

0 0.254 22.0 0.0
1 0.258 NaN NaN

Variance= 138.30304589037365

Pregnancies 11.354056
Glucose 1022.248314
BloodPressure 374.647271
SkinThickness 254.473245
Insulin 13281.180078
BMI 62.159984
DiabetesPedigreeFunction 0.109779
Age 138.303046
Outcome 0.227483
dtype: float64
Standard Deviation= 11.76023154067868
Pregnancies 3.369578
Glucose 31.972618
BloodPressure 19.355807
SkinThickness 15.952218
Insulin 115.244002
BMI 7.884160
DiabetesPedigreeFunction 0.331329
Age 11.760232
Outcome 0.476951
dtype: float64
Measure of skewness column wise:
Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigreeFunction 1.919911
Age 1.129597
Outcome 0.635017
dtype: float64
kurtosis= Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072
Insulin 7.214260
BMI 3.290443
DiabetesPedigreeFunction 5.594954
Age 0.643159
Outcome -1.600930
dtype: float64
Pregnancies Glucose BloodPressure SkinThickness
Insulin BMI \
0 6 148 72 35
0 33.6
1 1 85 66 29
0 26.6
2 8 183 64 0
0 23.3
3 1 89 66 23
94 28.1
4 0 137 40 35
168 43.1
.. ... ... ... ...
... ...
763 10 101 76 48
180 32.9
764 2 122 70 27
0 36.8
765 5 121 72 23
112 26.2
766 1 126 60 0
0 30.1
767 1 93 70 31
0 30.4
DiabetesPedigreeFunction Age Outcome
0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1
.. ... ... ...
763 0.171 63 0
764 0.340 27 0
765 0.245 30 0
766 0.349 47 1
767 0.315 23 0

[768 rows x 9 columns]

RESULT:
The above program was successfully completed.
EX NO: 6
(b)BIVARIATE ANALYSIS
DATE: 04.04.24

AIM:
To process the bivariate analysis for linear and logistic
regression modelling.
ALGORITHM:
Step 1: Start the Program
Step 2: Check the attached zip file containing the dataset
Step 3: process the program
Step 4: stop
PROGRAM:
import pandas as pd
df=pd.read_csv("C:/Users/GOD/Desktop/DS Lab/diabetes.csv")
print(df)
df.isnull().values.any()
df.describe()
drop_Glu=df.index[df.Glucose==0].tolist()
drop_BP=df.index[df.BloodPressure==0].tolist()
drop_Skin=df.index[df.SkinThickness==0].tolist()
drop_Ins=df.index[df.Insulin==0].tolist()
drop_BMI=df.index[df.BMI==0].tolist()
c=drop_Glu+drop_BP+drop_Skin+drop_Ins+drop_BMI
dia=df.drop(df.index[c])
dia.info()
print(dia)
dia.describe()
dia1=dia[dia.Outcome==1]
dia0=dia[dia.Outcome==0]
print(dia1)
print(dia0)
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x=dia.Outcome)
plt.title("count plot for outcome")
Out0=len(dia[dia.Outcome==0])
Out1=len(dia[dia.Outcome==1])
Total=Out0+Out1
Pc_of_1=Out1*100/Total
Pc_of_0=Out0*100/Total
Pc_of_1,Pc_of_0
plt.figure(figsize=(20,6))
sns.set_style("dark")
# Panel 1: overall histogram (histplot draws on the current axes)
plt.subplot(1,3,1)
sns.histplot(dia.Pregnancies,kde=False)
plt.title("Histogram for Pregnancies")
# Panel 2: histograms split by outcome
plt.subplot(1,3,2)
sns.histplot(dia0.Pregnancies,kde=False,color="Blue",
             label="Preg for Outcome=0")
sns.histplot(dia1.Pregnancies,kde=False,color="Gold",
             label="Preg for Outcome=1")
plt.title("Histograms for Preg by outcome")
plt.legend()
# Panel 3: box plot by outcome
plt.subplot(1,3,3)
sns.boxplot(x=dia.Outcome,y=dia.Pregnancies)
plt.title("Boxplot for Preg by Outcome")
sns.pairplot(dia,
             vars=["Pregnancies","Glucose","BloodPressure","SkinThickness",
                   "Insulin","BMI","DiabetesPedigreeFunction","Age"],
             hue="Outcome")
plt.title("Pairplot of Variables by outcome")
cor=dia.corr(method='pearson')
sns.heatmap(cor)
cols=["Pregnancies","Glucose","BloodPressure","SkinThickness",
      "Insulin","BMI","DiabetesPedigreeFunction","Age"]
X=dia[cols]
y=dia.Outcome
import statsmodels.api as sm
logit_model=sm.Logit(y,X)
result=logit_model.fit()
print(result.summary())
cols4=["Pregnancies","Glucose","BloodPressure"]
# Refit the logit model on the reduced set of three predictors
X4=dia[cols4]
logit_model=sm.Logit(y,X4)
result=logit_model.fit()
print(result.summary())
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression()
cols4=["Pregnancies","Glucose","BloodPressure"]
X=dia[cols4]
y=dia.Outcome
logreg.fit(X,y)
y_pred=logreg.predict(X)
from sklearn.metrics import classification_report
print(classification_report(y,y_pred))
Y=dia.Outcome
x=dia.drop('Outcome',axis=1)
columns=x.columns
data_X=pd.DataFrame(x,columns=columns)
data_X
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split(
    data_X,Y,test_size=0.15, random_state=45)
import statsmodels.api as sm
x_trainglu=x_train['Glucose']
X_train_sm=sm.add_constant(x_trainglu)
print(str(X_train_sm))
lr=sm.OLS(y_train,X_train_sm).fit()
lr.params
lr.summary()
import matplotlib.pyplot as plt
fig=plt.gcf()
fig.set_size_inches(10,10)
plt.scatter(x_trainglu,y_train)
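# Draw the fitted line from the estimated parameters instead of hard-coded values
plt.plot(x_trainglu,lr.params['const']+lr.params['Glucose']*x_trainglu,'r')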
plt.show()
import matplotlib.pyplot as plt
plt.scatter(dia.Age, dia.BloodPressure)
plt.title('Age vs.Blood Pressure')
plt.xlabel('Age')
plt.ylabel('Blood Pressure')
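
Logistic regression turns a weighted sum of the predictors into a probability through the sigmoid function. The sketch below illustrates that step only. The coefficients, intercept and patient values are hypothetical numbers chosen for the example, not results fitted from diabetes.csv.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, coefficients, intercept):
    """Probability of Outcome=1 for one row of predictor values."""
    z = intercept + sum(w * v for w, v in zip(coefficients, features))
    return sigmoid(z)


# Hypothetical weights for Pregnancies, Glucose and BloodPressure
coefficients = [0.1, 0.03, -0.01]
intercept = -4.0
patient = [2, 120, 70]  # hypothetical patient

prob = predict_probability(patient, coefficients, intercept)
label = 1 if prob >= 0.5 else 0
print("Probability of Outcome=1:", prob)
print("Predicted class:", label)
```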

OUTPUT:
RESULT:
The above program was successfully completed.
EX NO: 7
(a)SUPERVISED LEARNING
DATE: 18.04.24

Aim:
To write a python program to compute supervised learning.

ALGORITHM:
Step 1: Start the Program
Step 2: Import sklearn package
Step 3: Define supervised learning function
Step 4: Plot the graph for supervised learning
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt
import numpy as np
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
print('Input Values')
print(diabetes_X_test)
diabetes_y_pred = regr.predict(diabetes_X_test)
print("Predicted Output Values")
print(diabetes_y_pred)
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='red',
linewidth=1)
plt.show()
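
A fitted regression model is usually judged by comparing its predictions with the held-out targets. The sketch below writes out two common measures, mean squared error and R squared, on small made-up lists. The helper names mse and r_squared are assumptions; scikit-learn provides equivalents in sklearn.metrics (mean_squared_error and r2_score).

```python
def mse(y_true, y_pred):
    """Mean of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

print("Mean squared error:", mse(y_true, y_pred))
print("R squared:", r_squared(y_true, y_pred))
```

In the program above, the same functions could be applied to diabetes_y_test and diabetes_y_pred.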

OUTPUT:
Input Values
[[ 0.07786339]
[-0.03961813]
[ 0.01103904]
[-0.04069594]
[-0.03422907]
[ 0.00564998]
[ 0.08864151]
[-0.03315126]
[-0.05686312]
[-0.03099563]
[ 0.05522933]
[-0.06009656]
[ 0.00133873]
[-0.02345095]
[-0.07410811]
[ 0.01966154]
[-0.01590626]
[-0.01590626]
[ 0.03906215]
[-0.0730303 ]]
Predicted Output Values
[225.9732401 115.74763374 163.27610621 114.73638965
120.80385422
158.21988574 236.08568105 121.81509832 99.56772822
123.83758651
204.73711411 96.53399594 154.17490936 130.91629517
83.3878227
171.36605897 137.99500384 137.99500384 189.56845268
84.3990668 ]

RESULT:
The above program was successfully completed.
EX NO: 7
(b)UNSUPERVISED LEARNING
DATE: 18.04.24

Aim:
To write a python program to compute unsupervised
learning.

ALGORITHM:
Step 1: Start the Program
Step 2: Import sklearn package
Step 3: Define unsupervised learning function
Step 4: Plot the graph for unsupervised learning
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
from sklearn import datasets
import matplotlib.pyplot as plt
iris_df = datasets.load_iris()
print(dir(iris_df))
print(iris_df.feature_names)
print(iris_df.target_names)
label = {0: 'red', 1: 'blue', 2: 'green'}
print(iris_df.target)
x_axis = iris_df.data[:, 0]
y_axis = iris_df.data[:, 2]
plt.scatter(x_axis, y_axis, c=iris_df.target)
plt.show()
from sklearn import datasets
from sklearn.cluster import KMeans
iris_df = datasets.load_iris()
model = KMeans(n_clusters=3)
model.fit(iris_df.data)
predicted_label = model.predict([[7.2, 3.5, 0.8, 1.6]])
all_predictions = model.predict(iris_df.data)
print(predicted_label)
print(all_predictions)
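
KMeans alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below shows those two steps in plain Python on six made-up 2-D points. The starting centroids are fixed by hand for illustration, which is a simplification of how scikit-learn initialises them.

```python
import math


def assign_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update_centroids(points, labels, k):
    """Move each centroid to the mean of the points assigned to it."""
    centroids = []
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        # Assumes every cluster keeps at least one member
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids


# Hypothetical points forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]  # hand-picked starting centroids

for _ in range(5):
    labels = assign_clusters(points, centroids)
    centroids = update_centroids(points, labels, k=2)

print("Labels:", labels)
print("Centroids:", centroids)
```

The cluster numbers carry no meaning of their own, so the labels printed by model.predict need not match the species codes in iris_df.target even when the grouping is the same.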

OUTPUT:

['DESCR', 'data', 'data_module', 'feature_names', 'filename',

'frame', 'target', 'target_names']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
'petal width (cm)']
['setosa' 'versicolor' 'virginica']
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2
2 2]

RESULT:
The above program was successfully completed.
EX NO: 8
Apply and explore various plotting functions on data set
DATE: 26.04.24

Aim:
To write a python program to explore the various plotting
functions on a data set.

ALGORITHM:
Step 1: Start the Program
Step 2: import pandas
Step 3: Load the Breast Cancer Wisconsin dataset from UCI
Step 4: input the label name
Step 5: Print the result
Step 6: Stop the process

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
names = ['id_number', 'diagnosis', 'radius_mean',
'texture_mean', 'perimeter_mean', 'area_mean',
'smoothness_mean', 'compactness_mean', 'concavity_mean',
'concave_points_mean', 'symmetry_mean',
'fractal_dimension_mean', 'radius_se', 'texture_se',
'perimeter_se', 'area_se', 'smoothness_se',
'compactness_se', 'concavity_se', 'concave_points_se',
'symmetry_se', 'fractal_dimension_se',
'radius_worst', 'texture_worst', 'perimeter_worst',
'area_worst', 'smoothness_worst',
'compactness_worst', 'concavity_worst',
'concave_points_worst', 'symmetry_worst',
'fractal_dimension_worst']
breast_cancer_df = pd.read_csv(url, names=names, header=None)
diagnosis_count = breast_cancer_df['diagnosis'].value_counts()
plt.figure(figsize=(8, 6))
diagnosis_count.plot(kind='bar', color=['skyblue', 'salmon'])
plt.title('Distribution of Diagnosis')
plt.xlabel('Diagnosis')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.show()
diagnosis_count = breast_cancer_df['diagnosis'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(diagnosis_count, labels=diagnosis_count.index,
colors=['skyblue', 'salmon'], autopct='%1.1f%%',
startangle=140)
plt.title('Distribution of Diagnosis')
plt.axis('equal')
plt.show()
plt.figure(figsize=(8, 6))
plt.scatter(breast_cancer_df['radius_mean'],
breast_cancer_df['texture_mean'], c='blue', alpha=0.5)
plt.title('Scatter Plot of Radius Mean vs Texture Mean')
plt.xlabel('Radius Mean')
plt.ylabel('Texture Mean')
plt.show()
plt.figure(figsize=(8, 6))
plt.hist(breast_cancer_df['radius_mean'], bins=20,
color='orange', alpha=0.7)
plt.title('Histogram of Radius Mean')
plt.xlabel('Radius Mean')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(8, 6))
sns.kdeplot(data=breast_cancer_df, x='radius_mean',
y='texture_mean', cmap='viridis', fill=True)
plt.title('Contour Plot of Radius Mean vs Texture Mean')
plt.xlabel('Radius Mean')
plt.ylabel('Texture Mean')
plt.show()

OUTPUT:
RESULT:
The above program was successfully completed.
