
DSF Lab Manual (OCS353T)

The document outlines a series of exercises focused on Python programming for data analytics, covering installation, working with Numpy arrays, Pandas data frames, basic plotting with Matplotlib, and statistical measures including frequency distributions, mean, mode, standard deviation, variance, normal curves, correlation, and scatter plots. Each exercise includes an aim, algorithm, program code, output, and result, demonstrating successful completion of various data analysis tasks. The exercises are dated between February and March 2024.

EX NO: 1
Download, install and explore the features of Python for data analytics
DATE: 15.02.24

AIM:
To download and install Python and explore its features for data
analysis.
ALGORITHM:
STEP 1: Search Google for "Python 3.14 64 bit download" and
double-click the downloaded .exe file.
STEP 2: Open the command prompt.
STEP 3: Check the installed packages.
STEP 4: Confirm that all the packages were downloaded successfully.

FOR COMMAND PROMPT:


STEP 1: python --version
STEP 2: python -m pip --version
STEP 3: python.exe -m pip install --upgrade pip
STEP 4: python -m pip install numpy

OUTPUT:

RESULT:
Python 3.14.2 (64-bit) was downloaded and explored
successfully.
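The package check done at the command prompt can also be done from inside Python. The sketch below is only an illustration: the function name installed_versions and the package list are assumptions, not part of the exercise. It uses the standard-library importlib.metadata module to report which packages are installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package list is illustrative; adjust it to the packages the lab needs.
    report = installed_versions(["pip", "numpy", "pandas", "matplotlib"])
    for name, version in report.items():
        print(name, version if version else "not installed")
```

A package reported as "not installed" can then be added with python -m pip install, as in the command prompt steps.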
EX NO: 2
Working with Numpy arrays
DATE: 17.02.24

AIM:
To write a Python program that demonstrates working with NumPy
arrays.
ALGORITHM:
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of the
array
Step4: Stop
Step4: Stop

PROGRAM
import numpy as np
arr = np.array( [[ 1, 2, 3], [ 4, 2, 5]] )
print("Array is of type: ", type(arr))
print("No. of dimensions: ", arr.ndim)
print("Shape of array: ", arr.shape)
print("Size of array: ", arr.size)
print("Array stores elements of type: ", arr.dtype)

OUTPUT
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32

Program to Perform Array Slicing:


a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])

Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]

RESULT:
Thus the working with Numpy arrays was successfully
completed.
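Row slicing with a[1:] is only one form of indexing. As a short sketch on the same array, the block below also slices columns, takes a sub-block, and filters with a boolean mask. The variable names are chosen for illustration only.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

second_column = a[:, 1]     # every row, column index 1
top_right = a[0:2, 1:]      # rows 0-1, columns 1 onward
greater_than_3 = a[a > 3]   # boolean mask, result is flattened in row order

print(second_column)
print(top_right)
print(greater_than_3)
```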
EX NO: 3
Working with Pandas data frames
DATE: 22.02.24

AIM:
To work with Pandas data frames.

ALGORITHM:
Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM :
import numpy as np
import pandas as pd
data = np.array([['','Col1','Col2'], ['Row1',1,2],
['Row2',3,4]])
print(pd.DataFrame(data=data[1:,1:], index = data[1:,0],
columns=data[0,1:]))
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))
my_df = pd.DataFrame(data=[4,5,6,7], index=range(0,4),
columns=['A'])
print(pd.DataFrame(my_df))
my_series = pd.Series({"United Kingdom":"London",
"India":"New Delhi", "United States":"Washington",
"Belgium":"Brussels"})
print(pd.DataFrame(my_series))
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
print(df.shape)
print(len(df.index))
Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2

Result:
Thus the working with Pandas data frames
was successfully completed.
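Once a data frame exists, individual values are usually read back by label or by position. The following is a minimal sketch, assuming a small frame with hypothetical row and column labels similar to those in the program.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, chosen to mirror the Row/Col naming in the program
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
                  index=["Row1", "Row2"],
                  columns=["Col1", "Col2", "Col3"])

value = df.loc["Row2", "Col3"]    # label-based lookup
first_row = df.iloc[0].tolist()   # position-based lookup
col2_total = df["Col2"].sum()     # aggregate over one column

print(value, first_row, col2_total)
```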
EX NO: 4
Basic plots using Matplotlib
DATE: 29.02.24

Aim:
To draw basic plots in Python program using Matplotlib.

ALGORITHM:
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Display the output
Step5: Stop

PROGRAM:
import matplotlib.pyplot as plt
x = [1,2,3]
y = [2,4,1]
plt.plot(x, y)
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('My first graph!')
plt.show()

Output:

RESULT:
Thus the basic plots using Matplotlib in Python
program was successfully completed.
EX NO: 5
Statistical and Probability measures
(a) Frequency distributions
DATE: 07.03.24

Aim:
To count the frequency of occurrence of words in a body of
text, a task that is often needed during text processing.

ALGORITHM:
Step 1: Start the Program
Step 2: Load the text file blake-poems.txt from the NLTK Gutenberg corpus
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence
of a word in a body of text
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
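# Requires the NLTK 'gutenberg' corpus and tokenizer data (see nltk.download)
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
# Keep the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])
# Count how often each token occurs among those 50 tokens
wordfreq = [wlist.count(w) for w in wlist]
# list() is needed in Python 3 to display the zipped pairs
print("Pairs\n" + str(list(zip(wlist, wordfreq))))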

OUTPUT:

RESULT:
Thus the program to count the frequency of occurrence of
words in a body of text (frequency distribution) was
successfully completed using Python.
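The same counting idea can be tried without NLTK. The sketch below is an alternative, not the method used in the program: it uses collections.Counter from the standard library on a short, hypothetical sentence, and the helper name word_frequencies is an assumption.

```python
from collections import Counter


def word_frequencies(text):
    """Count lowercase, whitespace-separated words in text."""
    return Counter(text.lower().split())


sample_text = "the tiger and the lamb and the rose"  # hypothetical text
freq = word_frequencies(sample_text)

# The two most frequent words with their counts
print(freq.most_common(2))
```

Counter stores each distinct word once, so it avoids the repeated pairs that appear when every token is counted separately.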
EX NO: 5
(B)Mean, Mode, Standard Deviation
DATE: 07.03.24

AIM:
To write a python program to calculate the Mean, Mode,
Standard Deviation.

PROGRAM:
MEAN:
import numpy as np
arr = np.array([2, 7, 5, 8, 9, 4])
print("Original array:",arr)
arr1 = np.mean(arr)
print("Arithmetic Mean of 1D array:",arr1)

OUTPUT:
Original array: [2 7 5 8 9 4]
Arithmetic Mean of 1D array: 5.833333333333333

MODE:
from scipy import stats as st
import numpy as np
abc = np.array([1, 1, 2, 2, 2, 3, 4, 5])
print(st.mode(abc))

OUTPUT:
ModeResult(mode=2, count=3)

STANDARD DEVIATION:

a=np.array([[1,4,7,10],[2,5,8,11]])
b=np.std(a)
print(b)

OUTPUT:
3.391164991562634

RESULT:
Thus the computation for the Mean, Mode, Standard
Deviation was successfully completed.
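The three results can be cross-checked with the standard-library statistics module. This is a minimal sketch using the same numbers as the programs above. Note that np.std computes the population standard deviation by default, so pstdev (not stdev) is the matching function.

```python
import statistics

values = [2, 7, 5, 8, 9, 4]
mean_value = statistics.mean(values)

mode_value = statistics.mode([1, 1, 2, 2, 2, 3, 4, 5])

# pstdev is the population standard deviation, matching np.std's default (ddof=0)
std_value = statistics.pstdev([1, 4, 7, 10, 2, 5, 8, 11])

print(mean_value, mode_value, std_value)
```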
EX NO: 5
(c) Variability
DATE: 07.03.24

AIM:
To write a python program to calculate the variance.

ALGORITHM:
Step 1: Start the Program
Step 2: Import the variance function from the statistics module
(from statistics import variance)
Step 3: Import Fraction from the fractions module for use as
parameter values (from fractions import Fraction as fr)
Step 4: Create tuples of positive and negative numbers
Step 5: Print the variance of each sample
Step 6: Stop the process

Program:
from statistics import variance
from fractions import Fraction as fr
sample1 = (1, 2, 5, 4, 8, 9, 12)
sample2 = (-2, -4, -3, -1, -5, -6)
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
print("Variance of Sample1 is % s " %(variance(sample1)))
print("Variance of Sample2 is % s " %(variance(sample2)))
print("Variance of Sample3 is % s " %(variance(sample3)))
print("Variance of Sample4 is % s " %(variance(sample4)))
print("Variance of Sample5 is % s " %(variance(sample5)))

Output :
Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006

Result:
Thus the computation for variance was successfully
completed.
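statistics.variance returns the sample variance, which divides the sum of squared deviations by n - 1. The sketch below writes that formula out by hand for sample2 and compares it with the library. The helper name sample_variance is an assumption for illustration.

```python
from statistics import variance, pvariance


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** 2 for v in data) / (n - 1)


sample2 = (-2, -4, -3, -1, -5, -6)

print("By formula      :", sample_variance(sample2))
print("statistics      :", variance(sample2))
# pvariance divides by n instead of n - 1 (population variance)
print("Population value:", pvariance(sample2))
```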
EX NO: 5
d) Normal curves
DATE: 14.03.24

AIM:
To create a normal curve using python program.

ALGORITHM:
Step 1: Start the Program
Step 2: Import the norm distribution from scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process

PROGRAM:
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )
sb.set_style('whitegrid')
# Recent seaborn versions require x and y as keyword arguments
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()

OUTPUT:

RESULT:
Thus the normal curve using python program was
successfully completed.
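norm.pdf evaluates the normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). As a sketch of what the library call computes, the function below (the name normal_pdf is an assumption) writes the formula out with the math module, using loc = 5.3 and scale = 1 as in the program.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std. dev. sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


# The curve peaks at the mean and is symmetric around it
peak = normal_pdf(5.3, 5.3, 1.0)
left = normal_pdf(4.3, 5.3, 1.0)
right = normal_pdf(6.3, 5.3, 1.0)
print(peak, left, right)
```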
EX NO: 5
(e) Correlation and scatter plots
DATE: 14.03.24

AIM:
To write a python program for correlation with scatter
plot

ALGORITHM:
Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
import numpy as np
rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)
[x[3], x[7], x[2]]
ind = [3, 7, 4]
x[ind]
ind = np.array([[3, 7],[4, 5]])
x[ind]
X = np.arange(12).reshape((3, 4))
X
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]
X[row[:, np.newaxis], col]
row[:, np.newaxis] * col
print(X)
X[2, [2, 0, 1]]
X[1:, [2, 0, 1]]
mask = np.array([1, 0, 1, 0], dtype=bool)
X[row[:, np.newaxis], mask]
mean = [0, 0]
cov = [[1, 2],[2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
plt.scatter(X[:, 0], X[:, 1]);
indices = np.random.choice(X.shape[0], 20, replace=False)
indices
selection = X[indices]
selection.shape
plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:,1],facecolor='none',
s=200);

OUTPUT:
[51 92 14 71 60 20 82 86 74 74]
[71, 86, 14]
array([71, 86, 60])
array([[71, 86], [60, 20]])
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]])
array([ 2, 5, 11])
array([[ 2, 1, 3],[ 6, 5, 7],[10, 9, 11]])
array([[0, 0, 0], [2, 1, 3],[4, 2, 6]])
[[ 0 1 2 3][ 4 5 6 7][ 8 9 10 11]]
array([10, 8, 9])
array([[ 6, 4, 5], [10, 8, 9]])
array([[ 0, 2],[ 4, 6],[ 8, 10]])
(100, 2)
array([16, 82, 94, 90, 64, 84, 99, 83, 69, 52, 14, 24, 40, 12,
57, 95, 48, 41, 71, 36])

Result:
Thus the Correlation and scatter plots using python
program was successfully completed.
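The scatter plot shows the relationship visually, but the program does not put a number on it. The sketch below is an assumption-laden illustration: it draws data with the same mean and covariance, but uses np.random.default_rng instead of the RandomState object in the program, so the points differ. It then compares the sample correlation from np.corrcoef with the value implied by the covariance matrix.

```python
import numpy as np

mean = [0, 0]
cov = [[1, 2], [2, 5]]

# default_rng is used here for illustration; the program uses RandomState(42)
rng = np.random.default_rng(42)
X = rng.multivariate_normal(mean, cov, 100)

# Sample correlation between the two columns
r_sample = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

# Correlation implied by the covariance matrix: cov_xy / (sigma_x * sigma_y)
r_theory = cov[0][1] / np.sqrt(cov[0][0] * cov[1][1])

print(r_sample, r_theory)
```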
EX NO: 5
(f) Correlation coefficient
DATE: 14.03.24

Aim:
To write a python program to compute correlation
coefficient.

ALGORITHM:
Step 1: Start the Program
Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
import numpy as np
rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)
[x[3], x[7], x[2]]
ind = [3, 7, 4]
x[ind]
ind = np.array([[3, 7],[4, 5]])
x[ind]
X = np.arange(12).reshape((3, 4))
X
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]
X[row[:, np.newaxis], col]
row[:, np.newaxis] * col
print(X)
X[2, [2, 0, 1]]
X[1:, [2, 0, 1]]
mask = np.array([1, 0, 1, 0], dtype=bool)
X[row[:, np.newaxis], mask]
mean = [0, 0]
cov = [[1, 2],[2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()
plt.scatter(X[:, 0], X[:, 1]);
indices = np.random.choice(X.shape[0], 20, replace=False)
indices
selection = X[indices]
selection.shape
plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:,
1],facecolor='none', s=200);
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
correlation = y.corr(x)
correlation
plt.scatter(x, y)
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))
(np.unique(x)), color='red')
plt.title('Correlation')
plt.scatter(x, y)
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y,
1))(np.unique(x)), color='red')
plt.xlabel('x axis')
plt.ylabel('y axis')

OUTPUT:
[51 92 14 71 60 20 82 86 74 74]
[71, 86, 14]
array([71, 86, 60])
array([[71, 86], [60, 20]])
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10,
11]])
array([ 2, 5, 11])
array([[ 2, 1, 3],[ 6, 5, 7],[10, 9, 11]])
array([[0, 0, 0], [2, 1, 3],[4, 2, 6]])
[[ 0 1 2 3][ 4 5 6 7][ 8 9 10 11]]
array([10, 8, 9])
array([[ 6, 4, 5], [10, 8, 9]])
array([[ 0, 2],[ 4, 6],[ 8, 10]])
(100, 2)
array([16, 82, 94, 90, 64, 84, 99, 83, 69, 52, 14, 24, 40,
12, 57, 95, 48, 41, 71, 36])
0.8603090020146067
<matplotlib.collections.PathCollection at 0x1ecb970fa10>
[<matplotlib.lines.Line2D at 0x1ecbfa73250>]
Text(0.5, 1.0, 'Correlation')
Text(0, 0.5, 'y axis')

RESULT:
Thus the computation for correlation coefficient was
successfully completed.
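The algorithm for this exercise mentions importing the math package and defining a correlation coefficient function, while the program relies on pandas. A minimal sketch of those algorithm steps follows, using the same x and y values as the program. The function name correlation_coefficient is an assumption.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation computed from the raw-sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 3, 5, 4]
r = correlation_coefficient(x, y)
print(r)
```

The result should agree with the value returned by y.corr(x) up to floating-point rounding.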
EX NO: 5
(g) Regression
DATE: 14.03.24

AIM:
To write a python program for regression.

ALGORITHM:
Step 1: Start the Program
Step 2: import package of sklearn
Step 3: create the label name for x and y axis
Step 4: define the color of the output line
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
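import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# x and y are assumed to be the data from Exercise 5(f)
x = np.array([1, 2, 3, 4, 5, 6, 7])
y = np.array([1, 2, 3, 4, 3, 5, 4])
# scikit-learn expects a 2-D feature array of shape (n_samples, 1)
x = x.reshape(-1, 1)
model = LinearRegression()
model.fit(x, y)
y_pred = model.predict(x)
plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='skyblue')
plt.plot(x, y_pred, color='red', linewidth=2)
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()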

OUTPUT:

RESULT:
Thus the regression program was successfully completed.
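For a single predictor, the line fitted by LinearRegression has a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below computes both by hand. The data are assumed to be the x and y values from Exercise 5(f), and the function name fit_line is an assumption.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Assumed data: the series used in Exercise 5(f)
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 3, 5, 4]
slope, intercept = fit_line(x, y)
print("slope =", slope, "intercept =", intercept)
```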
EX NO: 6
(A)UNIVARIATE ANALYSIS
DATE: 28.03.24

AIM:
To process the univariate analysis for frequency, mean, median,
mode, variance, standard deviation, skewness and kurtosis.
ALGORITHM:
Step 1: Start the Program
Step 2: Check the attached zip file
Step 3: Process the program
Step 4: stop
PROGRAM:
import pandas as pd
df=pd.read_csv("C:/Users/GOD/Desktop/DS Lab/diabetes.csv")
print(df)
df.value_counts("Age")
pd.crosstab(index=df['Age'],columns='Count')
print("Mean=", df['BloodPressure'].mean())
print(df.mean())
print("Median=",df['Glucose'].median())
print(df.median())
print(df['Age'].mode())
print(df.mode())
print("Variance=",df['Age'].var())
print(df.var())
print("Standard Deviation=",df['Age'].std())
print(df.std())
data=df['Age']
skewness_value=df.skew(axis=0)
print("Measure of skewness column wise:\n",skewness_value)
kurtosis_value=df.kurt(axis=0)
print("kurtosis=",kurtosis_value)
print(df)

OUTPUT:
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0              6      148             72             35        0  33.6
1              1       85             66             29        0  26.6
2              8      183             64              0        0  23.3
3              1       89             66             23       94  28.1
4              0      137             40             35      168  43.1
..           ...      ...            ...            ...      ...   ...
763           10      101             76             48      180  32.9
764            2      122             70             27        0  36.8
765            5      121             72             23      112  26.2
766            1      126             60              0        0  30.1
767            1       93             70             31        0  30.4

     DiabetesPedigreeFunction  Age  Outcome
0                       0.627   50        1
1                       0.351   31        0
2                       0.672   32        1
3                       0.167   21        0
4                       2.288   33        1
..                        ...  ...      ...
763                     0.171   63        0
764                     0.340   27        0
765                     0.245   30        0
766                     0.349   47        1
767                     0.315   23        0

[768 rows x 9 columns]
Age
22 72
21 63
25 48
24 46
23 38
28 35
26 33
27 32
29 29
31 24
41 22
30 21
37 19
42 18
33 17
38 16
36 16
32 16
45 15
34 14
40 13
43 13
46 13
39 12
35 10
52 8
44 8
50 8
51 8
58 7
54 6
47 6
53 5
60 5
49 5
57 5
48 5
66 4
62 4
63 4
55 4
59 3
56 3
65 3
67 3
61 2
69 2
64 1
68 1
70 1
72 1
81 1
Name: count, dtype: int64
col_0 Count
Age
21 63
22 72
23 38
24 46
25 48
26 33
27 32
28 35
29 29
30 21
31 24
32 16
33 17
34 14
35 10
36 16
37 19
38 16
39 12
40 13
41 22
42 18
43 13
44 8
45 15
46 13
47 6
48 5
49 5
50 8
51 8
52 8
53 5
54 6
55 4
56 3
57 5
58 7
59 3
60 5
61 2
62 4
63 4
64 1
65 3
66 4
67 3
68 1
69 2
70 1
72 1
81 1

Mean= 69.10546875
Pregnancies 3.845052
Glucose 120.894531
BloodPressure 69.105469
SkinThickness 20.536458
Insulin 79.799479
BMI 31.992578
DiabetesPedigreeFunction 0.471876
Age 33.240885
Outcome 0.348958
dtype: float64
Median= 117.0
Pregnancies 3.0000
Glucose 117.0000
BloodPressure 72.0000
SkinThickness 23.0000
Insulin 30.5000
BMI 32.0000
DiabetesPedigreeFunction 0.3725
Age 29.0000
Outcome 0.0000
dtype: float64
0 22
Name: Age, dtype: int64
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0          1.0       99           70.0            0.0      0.0  32.0
1          NaN      100            NaN            NaN      NaN   NaN

   DiabetesPedigreeFunction   Age  Outcome
0                     0.254  22.0      0.0
1                     0.258   NaN      NaN

Variance= 138.30304589037365

Pregnancies 11.354056
Glucose 1022.248314
BloodPressure 374.647271
SkinThickness 254.473245
Insulin 13281.180078
BMI 62.159984
DiabetesPedigreeFunction 0.109779
Age 138.303046
Outcome 0.227483
dtype: float64
Standard Deviation= 11.76023154067868
Pregnancies 3.369578
Glucose 31.972618
BloodPressure 19.355807
SkinThickness 15.952218
Insulin 115.244002
BMI 7.884160
DiabetesPedigreeFunction 0.331329
Age 11.760232
Outcome 0.476951
dtype: float64
Measure of skewness column wise:
Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigreeFunction 1.919911
Age 1.129597
Outcome 0.635017
dtype: float64
kurtosis= Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072
Insulin 7.214260
BMI 3.290443
DiabetesPedigreeFunction 5.594954
Age 0.643159
Outcome -1.600930
dtype: float64
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0              6      148             72             35        0  33.6
1              1       85             66             29        0  26.6
2              8      183             64              0        0  23.3
3              1       89             66             23       94  28.1
4              0      137             40             35      168  43.1
..           ...      ...            ...            ...      ...   ...
763           10      101             76             48      180  32.9
764            2      122             70             27        0  36.8
765            5      121             72             23      112  26.2
766            1      126             60              0        0  30.1
767            1       93             70             31        0  30.4

     DiabetesPedigreeFunction  Age  Outcome
0                       0.627   50        1
1                       0.351   31        0
2                       0.672   32        1
3                       0.167   21        0
4                       2.288   33        1
..                        ...  ...      ...
763                     0.171   63        0
764                     0.340   27        0
765                     0.245   30        0
766                     0.349   47        1
767                     0.315   23        0

[768 rows x 9 columns]

RESULT:
The above program was successfully completed.
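Because the program reads the data from a local CSV path, the same univariate measures can be tried first on a small in-memory series. The sketch below uses hypothetical age values, not values taken from diabetes.csv, and collects each measure with the same pandas methods the program calls.

```python
import pandas as pd

# Hypothetical values for illustration; not taken from diabetes.csv
ages = pd.Series([21, 22, 22, 25, 30, 41], name="Age")

summary = {
    "mean": ages.mean(),
    "median": ages.median(),
    "mode": ages.mode().iloc[0],   # mode() returns a Series; take the first
    "variance": ages.var(),        # sample variance (ddof=1)
    "std": ages.std(),
    "skewness": ages.skew(),
    "kurtosis": ages.kurt(),
}

for name, value in summary.items():
    print(name, "=", value)
```

A positive skewness here reflects the long right tail created by the largest value.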
EX NO: 6
(b)BIVARIATE ANALYSIS
DATE: 04.04.24

AIM:
To process the bivariate analysis for linear and logistic
regression modelling.
ALGORITHM:
Step 1: Start the Program
Step 2: Check the attached zip file
Step 3: Process the program
Step 4: stop
PROGRAM:
import pandas as pd
df=pd.read_csv("C:/Users/GOD/Desktop/DS Lab/diabetes.csv")
print(df)
df.isnull().values.any()
df.describe()
drop_Glu=df.index[df.Glucose==0].tolist()
drop_BP=df.index[df.BloodPressure==0].tolist()
drop_Skin=df.index[df.SkinThickness==0].tolist()
drop_Ins=df.index[df.Insulin==0].tolist()
drop_BMI=df.index[df.BMI==0].tolist()
c=drop_Glu+drop_BP+drop_Skin+drop_Ins+drop_BMI
dia=df.drop(df.index[c])
dia.info()
print(dia)
dia.describe()
dia1=dia[dia.Outcome==1]
dia0=dia[dia.Outcome==0]
print(dia1)
print(dia0)
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x=dia.Outcome)
plt.title("count plot for outcome")
Out0=len(dia[dia.Outcome==0])
Out1=len(dia[dia.Outcome==1])
Total=Out0+Out1
Pc_of_1=Out1*100/Total
Pc_of_0=Out0*100/Total
Pc_of_1,Pc_of_0
plt.figure(figsize=(20,6))
plt.subplot(1,3,1)
sns.set_style("dark")
plt.title("Histogram for Pregnancies")
# histplot draws on the current subplot and replaces the deprecated distplot
sns.histplot(dia.Pregnancies,kde=False)
plt.subplot(1,3,2)
sns.histplot(dia0.Pregnancies,kde=False,color="Blue",
label="Preg for Outcome=0")
sns.histplot(dia1.Pregnancies,kde=False,color="Gold",
label="Preg for Outcome=1")
plt.title("Histograms for Preg by outcome")
plt.legend()
plt.subplot(1,3,3)
sns.boxplot(x=dia.Outcome,y=dia.Pregnancies)
plt.title("Boxplot for Preg by Outcome")
sns.pairplot(dia,
vars=["Pregnancies","Glucose","BloodPressure","SkinThickness",
"Insulin","BMI","DiabetesPedigreeFunction","Age"],hue="Outcome")
plt.title("Pairplot of Variables by outcome")
cor=dia.corr(method='pearson')
sns.heatmap(cor)
cols=["Pregnancies","Glucose","BloodPressure","SkinThickness",
"Insulin","BMI","DiabetesPedigreeFunction","Age"]
X=dia[cols]
y=dia.Outcome
import statsmodels.api as sm
logit_model=sm.Logit(y,X)
result=logit_model.fit()
print(result.summary())
cols4=["Pregnancies","Glucose","BloodPressure"]
# Refit using only the three selected columns
X=dia[cols4]
logit_model=sm.Logit(y,X)
result=logit_model.fit()
print(result.summary())
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression()
cols4=["Pregnancies","Glucose","BloodPressure"]
X=dia[cols4]
y=dia.Outcome
logreg.fit(X,y)
y_pred=logreg.predict(X)
from sklearn.metrics import classification_report
print(classification_report(y,y_pred))
Y=dia.Outcome
x=dia.drop('Outcome',axis=1)
columns=x.columns
data_X=pd.DataFrame(x,columns=columns)
data_X
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split(
data_X,Y,test_size=0.15, random_state=45)
import statsmodels.api as sm
x_trainglu=x_train['Glucose']
X_train_sm=sm.add_constant(x_trainglu)
print(str(X_train_sm))
lr=sm.OLS(y_train,X_train_sm).fit()
lr.params
lr.summary()
import matplotlib.pyplot as plt
fig=plt.gcf()
fig.set_size_inches(10,10)
plt.scatter(x_trainglu,y_train)
plt.plot(x_trainglu,-0.5884+0.0075*x_trainglu,'r')
plt.show()
import matplotlib.pyplot as plt
plt.scatter(dia.Age, dia.BloodPressure)
plt.title('Age vs.Blood Pressure')
plt.xlabel('Age')
plt.ylabel('Blood Pressure')

OUTPUT:
Text(0, 0.5, 'Blood Pressure')
RESULT:
The above program was successfully completed.
EX NO: 7
(a)SUPERVISED LEARNING
DATE: 18.04.24

Aim:
To write a Python program that demonstrates supervised learning.

ALGORITHM:
Step 1: Start the Program
Step 2: Import sklearn package
Step 3: Define supervised learning function
Step 4: Plot the graph for supervised learning
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt
import numpy as np
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
print('Input Values')
print(diabetes_X_test)
diabetes_y_pred = regr.predict(diabetes_X_test)
print("Predicted Output Values")
print(diabetes_y_pred)
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='red',
linewidth=1)
plt.show()

OUTPUT:
Input Values
[[ 0.07786339]
[-0.03961813]
[ 0.01103904]
[-0.04069594]
[-0.03422907]
[ 0.00564998]
[ 0.08864151]
[-0.03315126]
[-0.05686312]
[-0.03099563]
[ 0.05522933]
[-0.06009656]
[ 0.00133873]
[-0.02345095]
[-0.07410811]
[ 0.01966154]
[-0.01590626]
[-0.01590626]
[ 0.03906215]
[-0.0730303 ]]
Predicted Output Values
[225.9732401 115.74763374 163.27610621 114.73638965
120.80385422
158.21988574 236.08568105 121.81509832 99.56772822
123.83758651
204.73711411 96.53399594 154.17490936 130.91629517
83.3878227
171.36605897 137.99500384 137.99500384 189.56845268
84.3990668 ]

RESULT:
The above program was successfully completed.
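The program prints predictions but does not measure how good they are. As a sketch of one common way to do that, the block below repeats the same train/test split and scores the predictions with mean squared error and the coefficient of determination from sklearn.metrics. The shortened variable names are chosen for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]   # single feature, as in the program

# Same split as the program: last 20 samples are held out for testing
X_train, X_test = diabetes_X[:-20], diabetes_X[-20:]
y_train, y_test = diabetes.target[:-20], diabetes.target[-20:]

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)

mse = mean_squared_error(y_test, y_pred)   # lower is better
r2 = r2_score(y_test, y_pred)              # 1.0 would be a perfect fit
print("Mean squared error:", mse)
print("Coefficient of determination:", r2)
```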
EX NO: 7
(b)UNSUPERVISED LEARNING
DATE: 18.04.24

Aim:
To write a Python program that demonstrates unsupervised
learning.

ALGORITHM:
Step 1: Start the Program
Step 2: Import sklearn package
Step 3: Define unsupervised learning function
Step 4: Plot the graph for unsupervised learning
Step 5: Print the result
Step 6: Stop the process

PROGRAM:
from sklearn import datasets
import matplotlib.pyplot as plt
iris_df = datasets.load_iris()
print(dir(iris_df))
print(iris_df.feature_names)
print(iris_df.target_names)
label = {0: 'red', 1: 'blue', 2: 'green'}
print(iris_df.target)
x_axis = iris_df.data[:, 0]
y_axis = iris_df.data[:, 2]
plt.scatter(x_axis, y_axis, c=iris_df.target)
plt.show()
from sklearn import datasets
from sklearn.cluster import KMeans
iris_df = datasets.load_iris()
model = KMeans(n_clusters=3)
model.fit(iris_df.data)
predicted_label = model.predict([[7.2, 3.5, 0.8, 1.6]])
all_predictions = model.predict(iris_df.data)
print(predicted_label)
print(all_predictions)

OUTPUT:

['DESCR', 'data', 'data_module', 'feature_names', 'filename',
'frame', 'target', 'target_names']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
'petal width (cm)']
['setosa' 'versicolor' 'virginica']
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2
2 2]

RESULT:
The above program was successfully completed.
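K-means assigns arbitrary cluster numbers, so its output cannot be compared with the species labels element by element. One possible check, sketched below, is the adjusted Rand index from sklearn.metrics, which ignores how clusters are numbered. The n_init and random_state values are assumptions added to make the run repeatable; they are not part of the program.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()

# n_init and random_state are illustrative choices for a repeatable run
model = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = model.fit_predict(iris.data)

# 1.0 means the clusters match the species exactly; values near 0 mean chance level
ari = adjusted_rand_score(iris.target, cluster_labels)
print("Adjusted Rand index:", ari)
```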
EX NO: 8
Apply and explore various plotting functions on data set
DATE: 26.04.24

Aim:
To write a Python program to explore various plotting
functions on a data set.

ALGORITHM:
Step 1: Start the Program
Step 2: import pandas
Step 3: Load the Breast Cancer Wisconsin dataset from UCI
Step 4: input the label name
Step 5: Print the result
Step 6: Stop the process

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
names = ['id_number', 'diagnosis', 'radius_mean',
'texture_mean', 'perimeter_mean', 'area_mean',
'smoothness_mean', 'compactness_mean', 'concavity_mean',
'concave_points_mean', 'symmetry_mean',
'fractal_dimension_mean', 'radius_se', 'texture_se',
'perimeter_se', 'area_se', 'smoothness_se',
'compactness_se', 'concavity_se', 'concave_points_se',
'symmetry_se', 'fractal_dimension_se',
'radius_worst', 'texture_worst', 'perimeter_worst',
'area_worst', 'smoothness_worst',
'compactness_worst', 'concavity_worst',
'concave_points_worst', 'symmetry_worst',
'fractal_dimension_worst']
breast_cancer_df = pd.read_csv(url, names=names, header=None)
diagnosis_count = breast_cancer_df['diagnosis'].value_counts()
plt.figure(figsize=(8, 6))
diagnosis_count.plot(kind='bar', color=['skyblue', 'salmon'])
plt.title('Distribution of Diagnosis')
plt.xlabel('Diagnosis')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.show()
diagnosis_count = breast_cancer_df['diagnosis'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(diagnosis_count, labels=diagnosis_count.index,
colors=['skyblue', 'salmon'], autopct='%1.1f%%',
startangle=140)
plt.title('Distribution of Diagnosis')
plt.axis('equal')
plt.show()
plt.figure(figsize=(8, 6))
plt.scatter(breast_cancer_df['radius_mean'],
breast_cancer_df['texture_mean'], c='blue', alpha=0.5)
plt.title('Scatter Plot of Radius Mean vs Texture Mean')
plt.xlabel('Radius Mean')
plt.ylabel('Texture Mean')
plt.show()
plt.figure(figsize=(8, 6))
plt.hist(breast_cancer_df['radius_mean'], bins=20,
color='orange', alpha=0.7)
plt.title('Histogram of Radius Mean')
plt.xlabel('Radius Mean')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(8, 6))
sns.kdeplot(data=breast_cancer_df, x='radius_mean',
y='texture_mean', cmap='viridis', fill=True)
plt.title('Contour Plot of Radius Mean vs Texture Mean')
plt.xlabel('Radius Mean')
plt.ylabel('Texture Mean')
plt.show()

OUTPUT:
RESULT:
The above program was successfully completed.
