CS3362 Data Science Laboratory Manual 2022-23

The document provides information on performing univariate and bivariate analysis on diabetes datasets. For univariate analysis, descriptive statistics like frequency, mean, median, mode, variance, standard deviation, skewness and kurtosis are calculated on the Pima Indian Diabetes dataset. For bivariate analysis, linear and logistic regression models are applied to analyze the relationship between variables and predict outcomes. The algorithms and Python programs for both analyses are included. Relevant packages like Pandas, NumPy, Scikit-learn, and Matplotlib are imported and used.

Uploaded by

velavanenterprises835


CS3362

DATA SCIENCE
LABORATORY
MANUAL
2022-23
1) Download, Install and Explore
the Features of NumPy, Jupyter,
SciPy, pandas and statsmodels

Aim:
To download, install and explore the features of the
NumPy, SciPy, Jupyter, statsmodels and pandas
packages.

Procedure:
1. Select and download the version of Python to
install.
2. Download the Python executable installer.
3. Run the executable installer.
4. Verify that Python was installed.
5. Verify that pip was installed.
6. Install all the packages through pip.
7. Then install the Jupyter Notebook.
Code:
# Open a command prompt and execute the following commands
>> pip install numpy
>> pip install jupyter
>> pip install scipy
>> pip install statsmodels
>> pip install pandas
>> pip install matplotlib
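
Steps 4 to 6 of the procedure ask for the installation to be verified. One possible way to do that from Python is sketched below. It uses only the standard library (Python 3.8 or later); the helper name installed_versions is an illustrative choice, not part of the manual.

```python
from importlib import metadata

# Packages installed in the procedure above.
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas", "matplotlib"]


def installed_versions(names):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so missing installs are easy to spot.
for name, version in installed_versions(PACKAGES).items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with the matching pip command above.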

Features:
NumPy:
❖It is open-source software.
❖NumPy arrays are faster and more compact than
Python lists.
❖NumPy uses much less memory to store data.
❖It can be used to perform mathematical operations.
SciPy:
❖SciPy contains a variety of sub-packages.
❖It is used for scientific computation.
❖It is easy to use and understand, and offers
computational power.
❖It operates on NumPy arrays.
Jupyter:
❖Jupyter notebooks run locally as a web
application.
❖Python is the default programming language for Jupyter.
❖Jupyter notebooks support many programming
languages.
statsmodels:
❖statsmodels is a Python module that provides
estimation of statistical models.
❖It is an open-source library.
❖As its name implies, statsmodels is a Python
library for statistical models.
pandas:
❖It is an open-source library.
❖Fast and efficient DataFrame object.
❖Reshaping and pivoting of data sets.
❖Time series functionality.
Result:
Thus all the packages were downloaded, installed and
verified successfully.
2) Working with NumPy arrays

Aim:
To work with NumPy arrays.
Algorithm 1:
1. Import the numpy package.
2. Declare a variable for the array.
3. Create a NumPy array and assign it to the variable.
4. Print the type of the array.
5. Print the array.
Algorithm 2:
1. Start the program.
2. Import the numpy package.
3. Create an array.
4. Print the type of the array.
5. Print the shape of the array.
6. Print the size of the array.
7. Print the type of the elements.
8. Stop the program.
Program:
import numpy as np
arr = np.array([1, 4, 3, 2, 5])
print("The type of array is:", type(arr))
print("The array is:", arr)

Program:
import numpy as np
# Both rows need the same length to form a regular 2-D array.
arra = np.array([[1, 2, 3], [4, 3, 2]])
print("Array is of type:", type(arra))
print("Shape of array:", arra.shape)
print("Size of array:", arra.size)
print("No. of dimensions:", arra.ndim)
print("Array stores elements of type:", arra.dtype)
Result:
Thus the program for working with NumPy arrays was
executed successfully.
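
The attributes printed in Algorithm 2 can be collected in one place. The sketch below uses a small 2 x 3 array chosen for illustration (the values are not from the manual) and also shows that reshaping changes the layout but not the number of elements.

```python
import numpy as np

# A 2-D array: two rows, three columns (rows must have equal length).
arr = np.array([[1, 2, 3], [4, 5, 6]])

info = {
    "type": type(arr).__name__,  # class of the object
    "ndim": arr.ndim,            # number of dimensions (axes)
    "shape": arr.shape,          # length along each axis
    "size": arr.size,            # total number of elements
    "dtype": arr.dtype,          # element type; integer width depends on the platform
}
for key, value in info.items():
    print(key, ":", value)

# Reshaping keeps the same six elements but changes the layout.
flat = arr.reshape(6)
column = arr.reshape(3, 2)
print(flat)
print(column)
```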
3) Working with pandas DataFrames

Aim:
To work with DataFrames using the pandas library in
Python.

Algorithm 1:
1. Start the program.
2. Import the pandas package as pd.
3. Store a list of strings in a variable.
4. Create the DataFrame from the list.
5. Print the DataFrame.
6. Stop the program.

Algorithm 2:
1. Start the program.
2. Import the pandas package as pd.
3. Define a dictionary containing the data.
4. Then convert the dictionary into a DataFrame.
5. Then print the specific columns.
6. Stop the program.
Program:
import pandas as pd
lst = ["Mercury", "Venus", "Earth", "Mars", "Jupiter",
       "Saturn", "Uranus", "Neptune"]
df = pd.DataFrame(lst)
print(df)

Program:
import pandas as pd
data = {
    "Name": ["Virat", "Johnny Depp", "Hemsworth", "Vijay"],
    "Age": [34, 59, 39, 48],
    "Nation": ["India", "America", "Australia", "India"],
    "Profession": ["Cricketer", "Actor", "Actor", "Actor"],
}
df = pd.DataFrame(data)
print(df)
# Print specific columns only
print(df[["Name", "Age"]])
Result:
Thus the Python program for working with the pandas
library was coded and executed successfully.
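
Step 5 of Algorithm 2 mentions printing specific columns. A minimal sketch of column selection and row filtering on the same dictionary is given below; the variable names subset and over_40 are illustrative.

```python
import pandas as pd

data = {
    "Name": ["Virat", "Johnny Depp", "Hemsworth", "Vijay"],
    "Age": [34, 59, 39, 48],
    "Nation": ["India", "America", "Australia", "India"],
    "Profession": ["Cricketer", "Actor", "Actor", "Actor"],
}
df = pd.DataFrame(data)

# Select specific columns by passing a list of column names.
subset = df[["Name", "Profession"]]
print(subset)

# Boolean indexing: keep only the rows that satisfy a condition.
over_40 = df[df["Age"] > 40]
print(over_40)
```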
4) Descriptive Analytics on the Iris
Data set

Aim:
To read data from text files, Excel and the web, and
to explore various commands for descriptive
analytics on the Iris data set.

Algorithm:
1. Start the program.
2. Import the NumPy, pandas, Matplotlib and Seaborn
packages.
3. Download and import the Iris dataset from the UCI
website.
4. Store the file path in a variable.
5. Read the file with the read_csv method of pandas.
6. Assign a column heading to each column.
7. Replace and simplify the values in the target column.
8. Print the unique values in the target column.
9. Print the first five rows.
10. Print the information about the dataset.
11. Start the exploratory data analysis.
12. Analyze the dataset and display the result of
every operation.
13. Using the Seaborn and Matplotlib packages, display
the graphs, pie charts and histograms.
14. Stop the program.

Program:
# Import necessary packages
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

filepath = "dataset/iris.csv"
# If the file has no header row (as in the raw UCI file), pass header=None here.
df = pd.read_csv(filepath)
df.columns = ["sepal_length", "sepal_width",
              "petal_length", "petal_width", "target"]
df["target"] = df["target"].replace({"Iris-setosa": "setosa",
                                     "Iris-versicolor": "versicolor",
                                     "Iris-virginica": "virginica"})
print(df["target"].unique())

# Exploratory Data Analysis

print(df.nunique())
print(df.head())
df.info()
print(df.describe())
print(df.corr(numeric_only=True))
print(df["target"].value_counts())

# Graphs and plots

sns.FacetGrid(df, hue="target", height=6).map(
    plt.scatter, "sepal_length", "sepal_width").add_legend()
sns.FacetGrid(df, hue="target", height=6).map(
    plt.scatter, "petal_length", "petal_width").add_legend()
plt.figure()
plt.hist(df["sepal_length"], bins=25)
# sns.distplot is deprecated in recent Seaborn releases; histplot with kde=True replaces it.
sns.FacetGrid(df, hue="target", height=6).map(
    sns.histplot, "petal_width", kde=True).add_legend()
sns.FacetGrid(df, hue="target", height=6).map(
    sns.histplot, "petal_length", kde=True).add_legend()
plt.figure()
sns.boxplot(x="target", y="petal_length", data=df)
plt.show()
Result:
Thus the Python program to plot various graphs
and explore the Iris dataset was coded and executed
successfully.
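
The label clean-up and the descriptive commands can be tried without downloading the file. The sketch below builds a six-row frame in the same column layout; the rows are a small illustrative sample, not the full dataset, and the prefix is removed with str.replace instead of the dictionary used in the program.

```python
import pandas as pd

# A handful of illustrative rows in the layout of the Iris file.
rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
    [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
    [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
    [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
]
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "target"]
df = pd.DataFrame(rows, columns=columns)

# Strip the "Iris-" prefix so the labels are shorter.
df["target"] = df["target"].str.replace("Iris-", "", regex=False)

# Frequency of each species and the per-species mean of every numeric column.
counts = df["target"].value_counts()
species_means = df.groupby("target").mean(numeric_only=True)
print(counts)
print(species_means)
```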
5a) Using Diabetes Data Set perform
Univariate Analysis

Aim:
Using the Pima Indian Diabetes dataset, get the
frequency, mean, median, mode, variance, standard
deviation, skewness and kurtosis.

Algorithm:
1. Start the program.
2. Import the numpy and pandas packages.
3. Download and import the Pima Indian Diabetes
dataset from UCI or any other website.
4. Read the file with the read_csv method of pandas.
5. Gather information about the dataset.
6. Then find the frequency, mean, median, mode,
variance, standard deviation, skewness and kurtosis
with the corresponding commands.
7. Display the result.
8. Stop the program.
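Program:
import numpy as np
import pandas as pd

# Column names below assume the common diabetes.csv layout with an "Outcome" column.
df = pd.read_csv("diabetes.csv")
print(df.head().T)
print(df.shape)
print(df.isnull().values.any())
print(df.dtypes)
df["Outcome"] = df["Outcome"].astype("bool")
df.info()
print(df.describe().T)
# Frequency of each class in the Outcome column
print(df["Outcome"].value_counts())
print("Mean values:")
print(df.mean())
print("Median values:")
print(df.median())
print("Mode values:")
print(df.mode())
print("Variance:")
print(df.var())
print("Standard deviation:")
print(df.std())
print("Skewness:")
print(df.skew(axis=0, skipna=True))
print("Kurtosis:")
print(df.kurtosis())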
Result:
Thus the Python program to perform univariate analysis on
the Pima Indian Diabetes dataset was coded and executed
successfully.
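
The statistics listed in the aim can be checked on a short series where the answers are easy to see. The sketch below uses seven hypothetical readings (not taken from the diabetes dataset) and the corresponding pandas Series methods.

```python
import pandas as pd

# Hypothetical readings; not taken from the diabetes dataset.
values = pd.Series([85, 90, 90, 100, 110, 120, 160], name="toy_glucose")

summary = {
    "frequency": values.value_counts().to_dict(),  # how often each value occurs
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().tolist(),  # a list, because several modes are possible
    "variance": values.var(),        # sample variance (ddof=1)
    "std": values.std(),             # sample standard deviation (ddof=1)
    "skewness": values.skew(),       # positive when the right tail is longer
    "kurtosis": values.kurt(),       # excess kurtosis (0 for a normal distribution)
}
for name, value in summary.items():
    print(name, ":", value)
```

The single large reading (160) pulls the mean above the median and makes the skewness positive, which is the usual sign of a right-skewed column.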
5b) Using Diabetes Dataset perform
Bivariate Analysis

Aim:
Using the Pima Indian Diabetes dataset, analyze
linear and logistic regression modelling and prediction.

Algorithm:
1. Start the program.
2. Import the pandas, Matplotlib, NumPy, Seaborn and
sklearn packages.
3. Load the diabetes dataset into the df variable.
4. Analyze the diabetes dataset.
5. Start the linear regression modelling.
6. Split the dataset into two variables (x and y).
7. Using the sklearn LinearRegression() model, train
on the split dataset.
8. Then predict with the trained model.
9. Show the prediction score.
10. Show the linear regression graph.
11. Then start the logistic regression.
12. Split the dataset into two variables (x and y).
13. Split the x and y variables into training and test sets.
14. Then create the function for the ROC curve.
15. Then train the model with sklearn
LogisticRegression.
16. Show the logistic regression graph.
17. Stop the program.

Program:
# Import necessary packages
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Column names below assume the common diabetes.csv layout (Pregnancies, Glucose, ..., Outcome).
df = pd.read_csv("diabetes.csv")
print(df.head())
print(df.keys())
print("Shape of the DataFrame:", df.shape)
print("Information about the dataset:")
df.info()

# Modelling
# Import packages and models
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score, f1_score, roc_curve, r2_score,
                             mean_squared_error)

# Linear Regression
x = df[["Age"]]
y = df["Pregnancies"]
sc_x = StandardScaler()
x_std = sc_x.fit_transform(x)
y_std = y

x_std_train, x_std_test, y_std_train, y_std_test = train_test_split(
    x_std, y_std, test_size=0.25)

regr = LinearRegression()
regr.fit(x_std_train, y_std_train)
print("Regression score (R^2):", regr.score(x_std_test, y_std_test))
y_std_pred = regr.predict(x_std_test)
plt.scatter(x_std_test, y_std_test, color="b")
# Fitted regression line
plt.plot(x_std_test, y_std_pred, color="black", linewidth=3)
plt.show()

# The coefficients
print("Coefficients:\n", regr.coef_)

# Logistic Regression
x = df[["Pregnancies", "Glucose", "BMI", "Age"]]
y = df["Outcome"]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

def plot_roc_curve(fpr, tpr, label=None):
    plt.plot(fpr, tpr, linewidth=2, label=label)
    plt.plot([0, 1], [0, 1], "k--")
    plt.xlabel("False Positive Rate", fontname="monospace",
               fontsize=15, weight="semibold")
    plt.ylabel("True Positive Rate (Recall)", fontname="monospace",
               fontsize=15, weight="semibold")
    plt.title("ROC Curve", fontname="monospace",
              fontsize=17, weight="bold")
    plt.axis([0, 1, 0, 1])
    plt.show()

models, auc_scores = [], []

log_reg_clf = LogisticRegression(random_state=42, max_iter=500)
log_reg_pred = cross_val_predict(log_reg_clf, x_train, y_train, cv=5)
log_reg_scores = cross_val_predict(log_reg_clf, x_train, y_train,
                                   cv=3, method="decision_function")
log_reg_fpr, log_reg_tpr, _ = roc_curve(y_train, log_reg_scores)
plot_roc_curve(log_reg_fpr, log_reg_tpr)
log_reg_auc = roc_auc_score(y_train, log_reg_scores)
print("ROC AUC score:", log_reg_auc)
Result:
Thus the Python program to perform bivariate
analysis on the Pima Indian Diabetes dataset was coded
and executed successfully.
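
The ROC curve and its area can also be computed by hand, which may help when reading the plot. The sketch below is a NumPy-only illustration, not the scikit-learn implementation: it assumes binary labels 0/1 and no tied scores, and the helper names roc_points and auc are hypothetical.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (fpr, tpr) obtained by lowering a threshold over the scores.

    Assumes labels are 0/1 and that no two scores are equal.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)  # true positives accepted so far
    fps = np.cumsum(y_sorted == 0)  # false positives accepted so far
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy labels and classifier scores, chosen for illustration.
y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_toy, s_toy)
print("AUC:", auc(fpr, tpr))
```

An area of 0.5 corresponds to the dashed diagonal drawn by plot_roc_curve, and an area of 1.0 to a classifier that ranks every positive above every negative.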
5c) Using Diabetes Dataset Perform
Multiple Regression

Aim:
Using the Pima Indian Diabetes dataset, perform
multiple regression and predict with the model.

Algorithm:
1. Start the program.
2. Import pandas, sklearn and the other essential packages.
3. Load the dataset into a variable.
4. Cleanse the dataset.
5. Analyse the basic information about the
dataset.
6. Split the dataset into the independent and dependent
variables x and y.
7. Print the initial shape of x and y, then split them into
training and test sets and print the shape of the training x and y.
8. Using sklearn LinearRegression(), fit the model
on the training set.
9. Print the intercept, coefficients, mean squared
error and R2 score.
10. Create a DataFrame with the actual and predicted
values.
11. Visualize the actual and predicted values with the
Matplotlib package.
12. Stop the program.

Program:
import math
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Column names below assume the common diabetes.csv layout.
df = pd.read_csv("diabetes.csv")
print(df.head())

me_ins = math.floor(df.Insulin.median())
print("Median of Insulin column:", me_ins)

x = df[["Pregnancies", "Glucose", "BloodPressure",
        "SkinThickness", "Insulin", "BMI",
        "DiabetesPedigreeFunction", "Age"]]
y = df["Outcome"]

print("Shape of x and y before split:", x.shape, y.shape)

train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.25, random_state=99)
print("Shape of x and y after split:", train_x.shape, train_y.shape)

le = linear_model.LinearRegression()
le.fit(train_x, train_y)
print("Intercept:", le.intercept_)
print("Coefficients:\n", le.coef_)
y_pred = le.predict(test_x)

print("Mean squared error:", mean_squared_error(test_y, y_pred))
print("R2 score:", r2_score(test_y, y_pred))

plt.figure(figsize=(7, 7))
plt.scatter(test_y, y_pred)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.title("Actual vs. Predicted")
plt.show()

result = pd.DataFrame({"Actual": test_y, "Predicted": y_pred})
print("Actual vs. Predicted")
print(result.head(10))
print("Prediction for one row:",
      le.predict([[6, 148, 72, 35, 0, 33.6, 0.627, 50]]))
Result:
Thus the Python program to perform multiple
regression on the Pima Indian Diabetes dataset was coded
and executed successfully.
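
The intercept, coefficients, mean squared error and R2 score printed in step 9 can be reproduced with ordinary least squares in NumPy. The sketch below swaps scikit-learn for np.linalg.lstsq and uses noise-free toy data generated from y = 1 + 2*x1 + 3*x2, so the expected coefficients are known in advance.

```python
import numpy as np

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first fitted value is the intercept.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coefficients = beta[0], beta[1:]

# Predictions and the two metrics used in the program above.
y_pred = A @ beta
mse = float(np.mean((y - y_pred) ** 2))
r2 = float(1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2))

print("Intercept:", intercept)
print("Coefficients:", coefficients)
print("Mean squared error:", mse)
print("R2 score:", r2)
```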
5d) Compare the Results of the
Analysis of Two Datasets

Aim:
To compare the results of the analysis of two
data sets.

Algorithm:
1. Start the program.
2. Install the datacompy package with pip install
datacompy.
3. Import the pandas and datacompy packages.
4. Load the diabetes dataset.
5. Compare the two datasets with the datacompy
package and store the comparison in a variable.
6. Then print the report of the comparison.
7. Stop the program.
Program:
import pandas as pd
import datacompy as dc
df1 = pd.read_csv('diabetes.csv')
df2 = pd.read_csv('diabetes.csv')
compare = dc.Compare(df1, df2, join_columns='Outcome',
                     abs_tol=0.0001, rel_tol=0,
                     df1_name='olddiabetes', df2_name='newdiabetes')
print(compare.report())
Result:
Thus the python program to compare the results of
the analysis of two datasets was coded and executed
successfully.
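
The role of the abs_tol setting can be illustrated without datacompy. The sketch below is a pandas and NumPy stand-in, not the datacompy algorithm: it assumes both frames have the same shape, column names and row order, and that every column is numeric. The function name count_mismatches and the two toy frames are hypothetical.

```python
import numpy as np
import pandas as pd


def count_mismatches(df1, df2, abs_tol=0.0001):
    """Count, per column, the cells that differ by more than abs_tol.

    Assumes identical shape, column names and row order, and numeric columns.
    """
    close = np.isclose(df1.to_numpy(dtype=float), df2.to_numpy(dtype=float),
                       rtol=0.0, atol=abs_tol)
    return pd.Series((~close).sum(axis=0), index=df1.columns)


# Two toy frames standing in for an "old" and a "new" version of a dataset.
old = pd.DataFrame({"Glucose": [148.0, 85.0, 183.0], "BMI": [33.6, 26.6, 23.3]})
new = old.copy()
new.loc[1, "BMI"] = 27.0            # a real change, larger than the tolerance
new.loc[2, "Glucose"] = 183.00001   # within the tolerance, so not counted

mismatches = count_mismatches(old, new)
print(mismatches)
```

Because the program above reads the same file twice, its report is expected to show no differing values; a tolerance only matters once the two files actually diverge.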
6a) Normal Curves

Aim:
Using UCI data sets to show the Normal Curves.

Algorithm:
1. Start the program.
2. Import numpy, matplotlib and the essential
packages.
3. Create a NumPy series of data in the range 1 to
50.
4. Create a function for the normal curve.
5. Calculate the mean and standard deviation.
6. Apply the function to the data to get the values to plot.
7. Plot the results with the matplotlib package.
8. Stop the program.
Program:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1, 50, 200)
def normal_dist(x, mean, sd):
    # Normal density: 1/(sd*sqrt(2*pi)) * exp(-0.5*((x-mean)/sd)**2)
    prob_density = (1 / (sd * np.sqrt(2 * np.pi))) * np.exp(
        -0.5 * ((x - mean) / sd) ** 2)
    return prob_density
mean = np.mean(x)
sd = np.std(x)
pdf = normal_dist(x, mean, sd)
plt.plot(x, pdf, color='red')
plt.xlabel('Data Points')
plt.ylabel('Probability Density')
plt.show()
Result:
Thus the Python Program to plot the Normal
Curves was coded and executed Successfully.
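
A quick numerical check of a normal curve is that the area under it is 1 and that its peak sits at the mean. The sketch below assumes hypothetical parameters (mean 25, standard deviation 5) and a grid wide enough that the tails are negligible.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd**2) at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


mean, sd = 25.0, 5.0  # hypothetical parameters
x = np.linspace(mean - 8 * sd, mean + 8 * sd, 4001)
pdf = normal_pdf(x, mean, sd)

dx = x[1] - x[0]
area = float(np.sum(pdf) * dx)   # Riemann-sum approximation of the integral
peak = float(x[np.argmax(pdf)])  # grid point where the density is largest

print("Area under the curve:", area)
print("Location of the peak:", peak)
```

If the computed area is far from 1, the normalising factor 1/(sd*sqrt(2*pi)) is the first thing to check.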
6b) Density and Contour Plots

Aim:
Using the UCI datasets, plot the density and
contour plots.

Algorithm:
1. Start the program.
2. Import numpy, matplotlib and the essential
packages.
3. Create the two variables feature_x and
feature_y.
4. Create the NumPy arrays with the np.arange function.
5. Create the two-dimensional grid of features with the
np.meshgrid function.
6. Create the subplots.
7. Set the title, x label and y label of the
contour plot.
8. Stop the program.
Program:
import matplotlib.pyplot as plt
import numpy as np
feature_x = np.arange(0, 50, 2)
feature_y = np.arange(0, 50, 3)
[x, y] = np.meshgrid(feature_x, feature_y)
fig, ax = plt.subplots(1, 1)
z = np.cos(x / 2) + np.sin(y / 4)
ax.contour(x, y, z)
ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')
plt.show()
Result:
Thus the Python program to plot the Density
contour Plot was Coded and executed Successfully.
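
The grid built in step 5 is the part of this program that is easiest to get wrong, so the shapes are worth inspecting. The sketch below repeats the same np.arange and np.meshgrid calls without plotting.

```python
import numpy as np

feature_x = np.arange(0, 50, 2)  # 25 values: 0, 2, ..., 48
feature_y = np.arange(0, 50, 3)  # 17 values: 0, 3, ..., 48

# Every row of x is a copy of feature_x and every column of y is a copy of
# feature_y, so both grids have shape (len(feature_y), len(feature_x)).
x, y = np.meshgrid(feature_x, feature_y)

# One height value per grid point; this is what ax.contour draws.
z = np.cos(x / 2) + np.sin(y / 4)

print("x shape:", x.shape)
print("y shape:", y.shape)
print("z shape:", z.shape)
```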
6c) Correlation and Scatter plots.

Aim:
Using the concrete dataset, plot the correlation and
scatter plots.

Algorithm:
1. Start the program.
2. Import the pandas and Seaborn packages.
3. Download and import the 'concrete.csv' dataset
from the GitHub website.
4. Read the CSV file and load it into a variable.
5. Gather information about the dataset.
6. Change the type of the cement column to category.
7. Create the scatter plot using Seaborn.
8. Create the lmplot using Seaborn.
9. Stop the program.
Program:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Column names below assume lowercase headers (cement, water, coarseagg, ...).
con = pd.read_csv('concrete.csv')
con['cement'] = con['cement'].astype('category')
ax = sns.scatterplot(x="water", y="coarseagg", data=con)
# Labels match the plotted columns
ax.set_title("Coarse Aggregate vs. Water")
ax.set_xlabel("Water")
ax.set_ylabel("Coarse aggregate")
sns.lmplot(x="water", y="coarseagg", data=con)
plt.show()
Result:
Thus the Python program to create the correlation and
scatter plots with the concrete dataset was coded and executed
successfully.
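
The scatter plot shows the relationship visually; the correlation itself is a number that pandas can compute with DataFrame.corr. The sketch below uses a hypothetical five-row frame (not rows from concrete.csv) in which one column falls exactly linearly as the other rises.

```python
import pandas as pd

# Hypothetical mix quantities; not rows from concrete.csv.
toy = pd.DataFrame({
    "water": [150.0, 160.0, 170.0, 180.0, 190.0],
    "coarseagg": [1100.0, 1080.0, 1060.0, 1040.0, 1020.0],  # falls as water rises
    "age": [3, 28, 7, 90, 14],
})

corr_matrix = toy.corr()  # Pearson correlation by default
r_water_coarse = corr_matrix.loc["water", "coarseagg"]

print(corr_matrix)
print("water vs. coarseagg:", r_water_coarse)
```

A value near -1 or +1 corresponds to points lying close to a straight line in the scatter plot, and a value near 0 to no linear trend.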
6d) Histograms

Aim:
To create NumPy arrays and plot histograms
using matplotlib.

Algorithm:

1. Start the program.

2. Import the numpy and matplotlib packages.
3. Create the NumPy array.
4. Create the figure and axes using matplotlib.
5. Pass the NumPy array values to the histogram.
6. Plot the histogram using the hist function.
7. Create a dataset after fixing the seed with np.random.seed.
8. Then create the distribution in the x
variable.
9. Plot and show the histogram.
10. Stop the program.
Program:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([22, 87, 5, 43, 56, 73, 55, 53, 8, 20, 51, 5,
              79, 31, 21])
fig, ax = plt.subplots(figsize=(10, 7))
ax.hist(a, bins=[0, 25, 50, 75, 100])
plt.show()
from matplotlib import colors
from matplotlib.ticker import PercentFormatter
np.random.seed(23685752)
n_points = 10000
n_bins = 20
x = np.random.randn(n_points) + 25
fig, axs = plt.subplots(1, 1, figsize=(10, 7), tight_layout=True)
axs.hist(x, bins=n_bins)
plt.show()
Result:
Thus the Python program to create histograms
using numpy and matplotlib was coded and
executed successfully.
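
The bar heights that the hist call draws can be obtained as numbers with np.histogram. The sketch below uses the sample values from the first program, reading the first value as 22 (an assumption, since the printed listing is garbled at that point).

```python
import numpy as np

# Sample values from the program above; the first value is assumed to be 22.
a = np.array([22, 87, 5, 43, 56, 73, 55, 53, 8, 20, 51, 5, 79, 31, 21])
edges = [0, 25, 50, 75, 100]

# counts[i] is the number of values in [edges[i], edges[i+1]);
# the last bin also includes its right edge.
counts, bin_edges = np.histogram(a, bins=edges)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:3.0f} to {right:3.0f}: {count}")
```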
6e) Three Dimensional Plotting.

Aim:
To implement three-dimensional plotting
using the matplotlib package.

Algorithm:
1. Start the program.
2. Import the numpy and matplotlib packages.
3. Define each of the x, y and z axes.
4. Store the values for each axis in a variable.
5. Pass the variables to the scatter plot function.
6. Then show the 3D scatter plot.
7. Stop the program.
Program:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
fig = plt.figure()
ax = plt.axes(projection='3d')
z = np.linspace(0, 1, 100)
x = 2 * np.sin(25 * z)
y = 2 * np.cos(25 * z)
c = x + y
ax.scatter(x, y, z, c=c)
ax.set_title('3d Scatter Plot')
plt.show()
Result:
Thus the Python program to plot the three
Dimensional plot is coded and executed Successfully.
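
The shape traced by the scatter points can be confirmed numerically: x and y are a sine and cosine of the same angle, so every point lies at distance 2 from the z axis and the points wind upward in a helix. The sketch below checks this with NumPy only; the names radius and turns are illustrative.

```python
import numpy as np

z = np.linspace(0, 1, 100)
x = 2 * np.sin(25 * z)
y = 2 * np.cos(25 * z)

# Distance of each point from the z axis; constant for a helix.
radius = np.sqrt(x**2 + y**2)

# Number of full revolutions as z goes from 0 to 1 (angle 25 radians in total).
turns = 25 / (2 * np.pi)

print("Radius range:", radius.min(), radius.max())
print("Turns:", turns)
```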
7) Visualizing Geographic Data with
Basemap

Aim:
To implement the visualization of geographic data with
Basemap.

Algorithm:
1. Start the program.
2. Install the basemap package using pip install
basemap.
3. Import the numpy, matplotlib and basemap packages.
4. Plot the globe with the orthographic projection.
5. Then show a particular location by its latitude and longitude.
6. Then show the coastlines of the world map.
7. Install and import the shapefile and geopandas
packages.
8. Download the shapefile of the Indian political map from
the GitHub website.
9. Stop the program.
Program:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=20.5937, lon_0=78.9629)
m.bluemarble(scale=0.5)

fig = plt.figure(figsize=(8, 8))

m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6, lat_0=8.5562,
            lon_0=77.9710)
m.etopo(scale=0.5, alpha=0.5)
# Convert (longitude, latitude) to map coordinates
x, y = m(77.9710, 8.5562)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, 'Nazareth', fontsize=12)
fig = plt.figure(figsize=(12, 12))
m = Basemap()
m.drawcoastlines()
m.drawcoastlines(linewidth=1.0, linestyle='dashed',
                 color='red')
plt.title("Coast lines", fontsize=20)
plt.show()

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import shapefile as shp
from shapely.geometry import Point

sns.set_style("whitegrid")
fp = r'india-Polygon.shp'
map_df = gpd.read_file(fp)
map_df.plot()
Result:
Thus the Python program to visualize geographic
data with Basemap was coded and executed
successfully.
