FINAL FDS MANUAL Print

The document provides an overview of several essential Python packages, including NumPy for array processing, SciPy for scientific computations, Pandas for data manipulation, Statsmodels for statistical modeling, and Jupyter for interactive computing. It also outlines the installation process for Python and these packages, along with example code snippets demonstrating basic functionalities such as array creation, manipulation, and data analysis. The document serves as a comprehensive guide for users looking to leverage these tools for data science and statistical analysis.

Uploaded by

durga

FEATURES OF PYTHON PACKAGES:

1. NUMPY
One of the most fundamental packages in Python, NumPy is a general-purpose array-processing
package. It provides high-performance multidimensional array objects and tools for working
with them, and it serves as an efficient container for generic multidimensional data.
NumPy's main object is the homogeneous multidimensional array: a table of elements
(usually numbers) of the same data type, indexed by a tuple of non-negative integers. In
NumPy, dimensions are called axes, and the number of axes (ndim) is sometimes called the
rank. NumPy's array class is called ndarray, also known by the alias array.
 Basic array operations: add, multiply, slice, flatten, reshape, index arrays
 Advanced array operations: stack arrays, split into sections, broadcast arrays
 Work with DateTime or Linear Algebra
 Basic Slicing and Advanced Indexing in NumPy Python.
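The operations listed above can be sketched in a few lines. This is a minimal illustration with made-up values; the variable names are chosen only for this example.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3): two axes
row = np.array([10, 20, 30])           # shape (3,): one axis

# Broadcasting: the 1-D row is stretched across both rows of a
broadcast_sum = a + row

# Stacking: append the row underneath a, giving shape (3, 3)
stacked = np.vstack((a, row))

# Splitting: cut the stacked array back into three one-row sections
parts = np.split(stacked, 3, axis=0)

print(a.ndim, a.shape)
print(broadcast_sum)
print(stacked)
print(len(parts))
```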

2. SCIPY
The SciPy library is one of the core packages that make up the SciPy stack. There is a
difference between the SciPy stack and SciPy, the library. The stack is a collection of
tools that includes NumPy, Matplotlib, Pandas, and SymPy. The SciPy library builds on the
NumPy array object and contains modules for efficient mathematical routines such as linear
algebra, interpolation, optimization, integration, and statistics. Various scientific
computing problems arise while working in data science.
 SciPy provides a variety of sub-packages to solve these problems efficiently.
 The SciPy library is fast and easy to use.
 It operates on NumPy arrays and offers optimized versions of many routines that are
also available in NumPy.
 Along with the GNU Scientific Library, SciPy is one of the most widely used scientific
libraries.

3. PANDAS
Pandas is an open-source Python package that provides high-performance, easy-to-use
data structures and data analysis tools for labeled data. The name is derived from
"panel data" and is also read as Python Data Analysis Library. Pandas is well suited to
data wrangling (munging). It is designed for quick and easy data manipulation, reading,
aggregation, and visualization. Pandas takes data from a CSV or TSV file or a SQL database
and creates a Python object with rows and columns called a DataFrame. The DataFrame is
very similar to a table in spreadsheet or statistical software such as Excel or SPSS.

 Index, manipulate, rename, sort, and merge DataFrames
 Update, add, and delete columns of a DataFrame
 Impute missing values and handle missing data (NaNs)
 Plot data with a histogram or box plot

4. STATSMODELS
Statsmodels is built for rigorous statistics. The core of the Statsmodels library is
"production ready". Traditional models such as robust linear models and generalized linear
models (GLM) have been around for a long time and have been validated against R and
Stata. It also contains a time series analysis section, which includes vector
autoregression (VAR), AR, and ARMA models.
 Linear/ Multiple regression – Linear regression is a statistical method for modeling
the linear relationship between a dependent variable and one or more explanatory
variables.
 Logistic regression – The logistic model is used in statistics to model the
likelihood of a specific event/class occurring such as win/lose, pass/fail, etc.
 Time series analysis – It refers to the analysis of time series data to retrieve
meaningful statistics and many other data characteristics
 Statistical tests – Refers to the many statistical tests that can be done using the
Statsmodels Library.
5. JUPYTER
Project Jupyter is a suite of software products used in interactive computing. Packages
under the Jupyter project include:
Jupyter Notebook − A web-based interface to programming environments for Python,
Julia, R and many others
QtConsole − Qt-based terminal for Jupyter kernels, similar to IPython
nbviewer − Facility to share Jupyter notebooks
JupyterLab − Modern web-based integrated interface for all products
IPython, the Python kernel used by Jupyter, provides the following features:
 Offers a powerful interactive Python shell.
 Acts as the main kernel for Jupyter Notebook and other front-end tools of Project
Jupyter.
 Supports object introspection, the ability to check the properties of an object at
runtime.
 Syntax highlighting.
 Stores the history of interactions.
 Tab completion of keywords, variables and function names.
 Magic command system for controlling the Python environment and performing OS tasks.
PYTHON INSTALLATION
 Open the python official web site. (https://www.python.org/)
 Downloads ==> Windows ==> Select Recent Release. (Requires Windows 10 or above
versions)
 Install "python-3.10.6-amd64.exe"

PACKAGE INSTALLATION
Open a command prompt and enter the following command to check whether Python was
installed properly: python --version. If the installation succeeded, it returns the
Python version.

Enter the following command to check whether the Python package manager was installed
properly: pip --version. If the installation succeeded, it returns the version of the
package manager.

 Enter the following command to install the NumPy library: pip install numpy
 Enter the following command to install the SciPy library: pip install scipy
 Enter the following command to install the Statsmodels library: pip install statsmodels
 Enter the following command to install the Pandas library: pip install pandas
 Enter the following command to install Jupyter: pip install jupyter
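After the installs finish, the result can also be checked from Python itself. The sketch below uses the standard-library importlib.metadata module; the helper name installed_versions is hypothetical and is not part of any of the packages.

```python
from importlib import metadata

PACKAGES = ["numpy", "scipy", "pandas", "statsmodels", "jupyter"]

def installed_versions(names):
    """Map each package name to its version string, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

A package reported as "not installed" can be added with the matching pip install command.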
OUTPUT:
PROGRAM:

1. Creating Arrays:

 0-D Arrays
Each value in an array is a 0-D array.

import numpy as np
arr = np.array(42)
print(arr)
 1-D Arrays
An array that has 0-D arrays as its elements is called 1-D array.

import numpy as np
arr = np.array([1, 2,3, 4, 5])
print(arr)
 2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
 3-D arrays
An array that has 2-D arrays (matrices) as its elements is called 3-D array.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
2. Array Dimensions:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
3. Access 2-D Arrays:
To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])

4. Access 3-D Arrays:

To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])

5. Array Slicing:
Slicing in python means taking elements from one given index to another given index.
We pass slice instead of index like this: [start:end]. We can also define the step, like
this: [start:end:step].

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
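The same [start:end:step] notation also accepts negative values and works along each axis of a 2-D array. A short sketch with illustrative values:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

last_three = arr[-3:]      # from the third-last element to the end
every_other = arr[::2]     # whole array, step 2
reversed_arr = arr[::-1]   # a negative step walks backwards

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
block = grid[0:2, 1:4]     # rows 0 to 1, columns 1 to 3

print(last_three)
print(every_other)
print(reversed_arr)
print(block)
```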

6. Data Types:
NumPy has some extra data types and refers to data types with one character, such as i
for integers and u for unsigned integers. The example below uses S, the byte-string type.
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)
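The one-character codes can be combined with a byte size, and astype() converts an existing array to another type. The values below are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3, 4], dtype='i4')            # 4-byte signed integers
floats = ints.astype('f8')                           # converted copy, 8-byte floats
truncated = np.array([1.9, 2.1, 3.7]).astype('i')    # the fractional part is dropped
flags = np.array([1, 0, 3]).astype(bool)             # zero becomes False

print(ints.dtype, ints.itemsize)
print(floats.dtype)
print(truncated)
print(flags)
```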

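7. Copy & View:

A copy owns its data, so changes to the original array do not affect the copy.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)

8. Make a view:
A view shares data with the original array, so changes to the original appear in the view.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)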
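Whether an array owns its data can be checked with the base attribute, which is None for an array that owns its data and refers to the original array for a view. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
c = arr.copy()
v = arr.view()

arr[0] = 42                        # modify the original after copying

print(c[0])                        # the copy keeps the old value
print(v[0])                        # the view sees the change
print(c.base is None)              # the copy owns its data
print(v.base is arr)               # the view refers back to arr
print(np.shares_memory(arr, v))
```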

9. Array Shape & Reshaping:

NumPy arrays have an attribute called shape that returns a tuple giving the number of
elements along each dimension.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)

10. Array Reshaping:

Reshaping means changing the shape of an array. By reshaping we can add or remove
dimensions or change number of elements in each dimension.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
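Two details of reshape() are worth a short sketch: one dimension may be given as -1 so that NumPy infers it, and a shape whose element count does not match raises an error. The values are illustrative.

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1 .. 12

two_rows = arr.reshape(2, -1)     # -1 lets NumPy infer the second dimension (6)
flat = two_rows.reshape(-1)       # back to one dimension

try:
    arr.reshape(5, 5)             # 25 slots cannot hold 12 elements
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("ValueError:", exc)

print(two_rows.shape)
print(flat.shape)
```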

11. Array Iterating:

Iterating means going through elements one by one. As we deal with multi-
dimensional arrays in numpy, we can do this using basic for loop of python.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    print(x)

12. Joining Array:

Joining means putting contents of two or more arrays in a single array.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

13. Splitting Array:

Splitting is reverse operation of Joining. Joining merges multiple arrays into one and
Splitting breaks one array into multiple.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)

14. Searching Arrays:

We can search an array for a certain value, and return the indexes that get a match. To
search an array, use the where() method.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
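Beyond an exact match, where() accepts any condition and can also choose between two values element by element. For sorted arrays, searchsorted() returns the position at which a value would be inserted. A brief sketch with illustrative values:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

even_idx = np.where(arr % 2 == 0)[0]     # indexes of the even values
clipped = np.where(arr > 3, arr, 0)      # keep values above 3, zero the rest

sorted_arr = np.array([6, 7, 8, 9])
pos = np.searchsorted(sorted_arr, 7)     # index where 7 would be inserted to keep the order

print(even_idx)
print(clipped)
print(pos)
```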

15. Sorting:
Sorting means putting elements in an ordered sequence. An ordered sequence is any
sequence that has an order corresponding to its elements, such as numeric or alphabetical,
ascending or descending. NumPy provides a function called sort() that returns a sorted
copy of the specified array.
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
16. Filtering Arrays:
Getting some elements out of an existing array and creating a new array out of them is
called filtering. In NumPy, you filter an array using a boolean index list.
import numpy as np
arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]
print(newarr)
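In practice the boolean list is rarely written by hand. A comparison on the array produces the mask directly, as in this short sketch:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

above_42 = arr[arr > 42]                    # the mask is built from a condition
evens = arr[arr % 2 == 0]
between = arr[(arr > 41) & (arr < 44)]      # combine conditions with & and parentheses

print(above_42)
print(evens)
print(between)
```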

OUTPUT:
PROGRAM:

import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])

print("The first matrix value is ::>",a)

b = np.array([[2,3,4],[5,6,7], [8,9,10]])

print("The second matrix value is ::>",b)

mul= np.multiply(a,b)

add= np.add(a,b)

sub=np.subtract(a,b)

div=np.divide(a,b)

print("Addition Matrix Resultant is ::>",add)

print("Subtraction Matrix Resultant is ::>",sub)

print("Division Matrix Resultant is ::>",div)

print("Multiplication Matrix Resultant is ::>",mul)
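Note that np.multiply() multiplies element by element. If the matrix product in the linear-algebra sense is wanted, the @ operator (or np.matmul) can be used instead. The sketch below contrasts the two on the same matrices.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[2, 3, 4], [5, 6, 7], [8, 9, 10]])

elementwise = np.multiply(a, b)   # same as a * b: each entry times the matching entry
matrix_product = a @ b            # rows of a combined with columns of b

print("Element-wise product:\n", elementwise)
print("Matrix product:\n", matrix_product)
```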

OUTPUT:
PROGRAM:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_Of_Birth': ['17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]})

print("Original DataFrame:")

print(df)

df1 = df.copy(deep = True)

df = df.drop([0, 1])

df1 = df1.drop([2])

print("\nNew DataFrames:")

print(df)
print(df1)

print('\n"one_to_one": check if merge keys are unique in both left and right datasets:')

df_one_to_one = pd.merge(df, df1, validate = "one_to_one")

print(df_one_to_one)

print('\n"one_to_many" or "1:m": check if merge keys are unique in left dataset:')

df_one_to_many = pd.merge(df, df1, validate = "one_to_many")

print(df_one_to_many)

print('\n"many_to_one" or "m:1": check if merge keys are unique in right dataset:')

df_many_to_one = pd.merge(df, df1, validate = "many_to_one")

print(df_many_to_one)
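The validate argument only becomes visible when a check fails. The sketch below uses two small made-up frames (the column names key, left_val and right_val are hypothetical) in which one key is duplicated on the left side.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 2], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2], "right_val": ["x", "y"]})

# many_to_one passes: the keys are unique in the right frame
ok = pd.merge(left, right, on="key", validate="many_to_one")
print(ok)

# one_to_one fails: key 2 appears twice in the left frame
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    raised = False
except pd.errors.MergeError as exc:
    raised = True
    print("MergeError:", exc)
```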
PROGRAM:

#DATA COLLECT

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

dataset=pd.read_csv("iris.txt")

dataset.head()

dataset=pd.read_excel("iris.xlsx")

dataset.head()

dataset=pd.read_csv("iris.csv")

dataset.head()

dataset.info()

dataset.Species.unique()

#EDA

dataset.describe()

dataset.corr(numeric_only=True)

dataset.Species.value_counts()

sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Sepal.Length", "Sepal.Width").add_legend()

sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Petal.Length", "Petal.Width").add_legend()

sns.pairplot(dataset, hue="Species")

plt.hist(dataset["Sepal.Length"], bins=25);

sns.FacetGrid(dataset, hue="Species", height=6).map(sns.histplot, "Sepal.Width", kde=True).add_legend();

sns.boxplot(x='Species', y='Petal.Length', data=dataset)

#PREPROCESSING

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()

x = dataset.drop(['Species'], axis=1)

y = dataset['Species']

scaler = ss.fit(x)

x_stdscaler = scaler.transform(x)

x_stdscaler

from sklearn.preprocessing import LabelEncoder

le=LabelEncoder()

y=le.fit_transform(y)

#SPLITTING

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)

x_train.value_counts

#MODEL SELECTION

from sklearn.svm import SVC

svc=SVC(kernel="linear")

svc.fit(x_train,y_train)

y_pred=svc.predict(x_test)

y_pred

from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)

#PREDICTION

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)

knn.fit(x_train, y_train)

y_pred = knn.predict(x_test)

accuracy_score(y_test, y_pred)
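The StandardScaler step in the preprocessing section subtracts each column's mean and divides by its population standard deviation (ddof=0). The NumPy-only sketch below reproduces that arithmetic; the first column mimics the row-number column 1 to 150 of the dataset, and the second column holds made-up values for illustration.

```python
import numpy as np

row_ids = np.arange(1, 151, dtype=float)    # same values as the row-number column
made_up = np.linspace(4.3, 7.9, 150)        # hypothetical second feature
x = np.column_stack((row_ids, made_up))

mean = x.mean(axis=0)                       # per-column mean
std = x.std(axis=0)                         # per-column population standard deviation
x_scaled = (x - mean) / std

print(x_scaled[0, 0])                       # compare with the first value in the PREPROCESSING output
print(x_scaled.mean(axis=0), x_scaled.std(axis=0))
```

After scaling, every column has mean 0 and standard deviation 1, which keeps features with large ranges from dominating distance-based models such as k-nearest neighbours.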

OUTPUT:

DATASET HEADS:

   Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           1           5.1          3.5           1.4          0.2  setosa
1           2           4.9          3.0           1.4          0.2  setosa
2           3           4.7          3.2           1.3          0.2  setosa
3           4           4.6          3.1           1.5          0.2  setosa
4           5           5.0          3.6           1.4          0.2  setosa

DATASET INFORMATION:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype
 0   Unnamed: 0    150 non-null    int64
 1   Sepal.Length  150 non-null    float64
 2   Sepal.Width   150 non-null    float64
 3   Petal.Length  150 non-null    float64
 4   Petal.Width   150 non-null    float64
 5   Species       150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

DATASET UNIQUE:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

DATASET SPECIES VALUE COUNTS:

setosa 50

versicolor 50

virginica 50

Name: Species, dtype: int64

DATASET DESCRIPTION:

       Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
count  150.000000    150.000000   150.000000    150.000000   150.000000
mean    75.500000      5.843333     3.057333      3.758000     1.199333
std     43.445368      0.828066     0.435866      1.765298     0.762238
min      1.000000      4.300000     2.000000      1.000000     0.100000
25%     38.250000      5.100000     2.800000      1.600000     0.300000
50%     75.500000      5.800000     3.000000      4.350000     1.300000
75%    112.750000      6.400000     3.300000      5.100000     1.800000
max    150.000000      7.900000     4.400000      6.900000     2.500000

DATASET CORRELATION:

              Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
Unnamed: 0      1.000000      0.716676    -0.402301      0.882637     0.900027
Sepal.Length    0.716676      1.000000    -0.117570      0.871754     0.817941
Sepal.Width    -0.402301     -0.117570     1.000000     -0.428440    -0.366126
Petal.Length    0.882637      0.871754    -0.428440      1.000000     0.962865
Petal.Width     0.900027      0.817941    -0.366126      0.962865     1.000000

SCATTER PLOT:
PAIRPLOT:

HISTOGRAM:
BOXPLOT:

PREPROCESSING:

array([[-1.72054204e+00, -9.00681170e-01,  1.01900435e+00, -1.34022653e+00, -1.31544430e+00],
       [-1.69744751e+00, -1.14301691e+00, -1.31979479e-01, -1.34022653e+00, -1.31544430e+00],
       [-1.67435299e+00, -1.38535265e+00,  3.28414053e-01, -1.39706395e+00, -1.31544430e+00],
       [-1.65125846e+00, -1.50652052e+00,  9.82172869e-02, -1.28338910e+00, -1.31544430e+00],
       [-1.58197489e+00, -1.50652052e+00,  7.88807586e-01, ...],
       ...,
       [-2.42492502e-01, -2.94841818e-01, -3.62176246e-01,  7.62758269e-01,  7.90670654e-01]])

SPLITTING:

<bound method DataFrame.value_counts of      Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width

81 82 5.5 2.4 3.7 1.0

133 134 6.3 2.8 5.1 1.5

137 138 6.4 3.1 5.5 1.8

75 76 6.6 3.0 4.4 1.4

109 110 7.2 3.6 6.1 2.5

.. ... ... ... ... ...

71 72 6.1 2.8 4.0 1.3

106 107 4.9 2.5 4.5 1.7

14 15 5.8 4.0 1.2 0.2

92 93 5.8 2.6 4.0 1.2

102 103 7.1 3.0 5.9 2.1

[105 rows x 5 columns]>

MODEL SELECTION:

1.0

PREDICTION:

1.0

PROGRAM:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

df=pd.read_csv("diabetes_csv.csv")

df.head()

df.skin.value_counts()

df.mean(axis=0, numeric_only=True)
print(df.loc[:, 'skin'].mean())
df.mean(axis=1, numeric_only=True)[0:5]

df.median(numeric_only=True)
print(df.loc[:, 'skin'].median())
df.median(axis=1, numeric_only=True)[0:5]

df.mode()

df.std(numeric_only=True)
print(df.loc[:, 'skin'].std())
df.std(axis=1, numeric_only=True)[0:5]

df.var(numeric_only=True)
print(df.skew(numeric_only=True))

df.describe()
df.describe(include='all')

print(df.kurtosis(numeric_only=True))

norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10));

# Plot black line at mean
plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0, color="black");

# Plot red line at median
plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red");
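One detail behind these statistics is easy to miss: pandas computes the variance and standard deviation with the sample formula (ddof=1), whereas NumPy defaults to the population formula (ddof=0). The sketch below shows the difference on the first five skin values that appear in the head of the data; it is an illustration, not a rerun of the full dataset.

```python
import numpy as np
import pandas as pd

skin = pd.Series([35, 29, 0, 23, 35])       # first five 'skin' values from the head of the data

mean = skin.mean()
median = skin.median()
mode = skin.mode()[0]                       # most frequent value

sample_var = skin.var()                     # pandas default: divide by n - 1
population_var = np.var(skin.to_numpy())    # NumPy default: divide by n

print(mean, median, mode)
print(sample_var, population_var)
```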

OUTPUT:

HEAD DATA:

preg plas pres skin insu mass pedi age class

0 6 148 72 35 0 33.6 0.627 50 tested_positive

1 1 85 66 29 0 26.6 0.351 31 tested_negative

2 8 183 64 0 0 23.3 0.672 32 tested_positive

3 1 89 66 23 94 28.1 0.167 21 tested_negative

4 0 137 40 35 168 43.1 2.288 33 tested_positive

FREQUENCY:

0 227
32 31
30 27
27 23
23 22
33 20
28 20
18 20
31 19
19 18
39 18
29 17
40 16
25 16

MEAN:

20.536458333333332

0 43.153375

1 29.868875

2 38.871500

3 40.283375

4 57.298500

dtype: float64

MODE:
preg plas pres skin insu mass pedi age class

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 tested_negative

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN

MEDIAN:

23.0

0 34.30

1 27.80

2 15.65

3 25.55

4 37.50

dtype: float64
STANDARD DEVIATION:

15.952217567727677

0 49.397286

1 31.519803

2 62.253392

3 37.591100

4 61.533847

VARIANCE:

preg 11.354056

plas 1022.248314

pres 374.647271

skin 254.473245

insu 13281.180078

mass 62.159984

pedi 0.109779

age 138.303046

dtype: float64

SKEWNESS:

preg 0.901674

plas 0.173754

pres -1.843608

skin 0.109372

insu 2.272251

dtype: float64

KURTOSIS:
preg 0.159220

plas 0.640780

pres 5.180157

skin -0.520072

insu 7.214260

mass 3.290443

pedi 5.594954

age 0.643159

dtype: float64

GRAPH:
PROGRAM:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

df=pd.read_csv("pima-indians-diabetes.csv")

df.head()

df.mean(axis = 0)

print(df.loc[:,'35'].mean())

df.mean(axis = 1)[0:5]

df.median()

print(df.loc[:,'33.6'].median())

df.median(axis = 1)[0:5]

df.mode()

df.std()

print(df.loc[:,'35'].std())

df.std(axis = 1)[0:5]

df.var()

print(df.skew())

print(df.kurtosis())
norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density",figsize=(10,10));

# Plot black line at mean

plt.vlines(norm_data.mean(), ymin=0, ymax=0.4, linewidth=5.0, color="black");

# Plot red line at median

plt.vlines(norm_data.median(), ymin=0, ymax=0.4, linewidth=2.0, color="red");

OUTPUT:

HEAD DATA:

6 148 72 35 0 33.6 0.627 50 1

0 1 85 66 29 0 26.6 0.351 31 0

1 8 183 64 0 0 23.3 0.672 32 1

2 1 89 66 23 94 28.1 0.167 21 0

3 0 137 40 35 168 43.1 2.288 33 1

4 5 116 74 0 0 25.6 0.201 30 0

MEAN:

20.517601043024772

0 26.550111

1 34.663556

2 35.807444

3 51.043111

4 27.866778

dtype: float64
MODE:

6 148 72 35 0 33.6 0.627 50 1

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 0.0

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN

MEDIAN:

32.0

0 26.6

1 8.0

2 23.0

3 35.0

4 5.0

dtype: float64

STANDARD DEVIATION:

15.954059060433842

0 31.119744

1 59.585320

2 37.639873

3 60.541569

4 41.114755

dtype: float64

VARIANCE:
6 11.362809

148 1022.622445

72 375.125415

35 254.532001

0 13290.194335

33.6 62.237755

0.627 0.109890

50 138.116452

1 0.227226

dtype: float64

SKEWNESS:

6 0.903976

148 0.176412

72 -1.841911

35 0.112058

0 2.270630

33.6 -0.427950

0.627 1.921190

50 1.135165

1 0.638949

dtype: float64

KURTOSIS:

6 0.161293

148 0.642992

72 5.168578

35 -0.518325

0 7.205266

33.6 3.282498
0.627 5.593374

50 0.660872

1 -1.595913

dtype: float64

GRAPH:
PROGRAM:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn import datasets

%matplotlib inline

diabetes=pd.read_csv("C:\\Users\\KSK\\Documents\\diabetes.csv")

diabetes.head()

diabetes = datasets.load_diabetes()

print(diabetes.DESCR)

diabetes.feature_names

# Now we will split the data into the independent and dependent variables

X = diabetes.data[:, np.newaxis, 3]

Y = diabetes.target

# We will split the data into training and testing data

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

# Linear Regression

from sklearn.linear_model import LinearRegression

reg=LinearRegression()

reg.fit(x_train,y_train)

y_pred = reg.predict(x_test)

Coef=reg.coef_

print(Coef)

from sklearn.metrics import mean_squared_error, r2_score

MSE = mean_squared_error(y_test, y_pred)

R2 = r2_score(y_test, y_pred)

print(R2, MSE)

plt.scatter(y_pred, y_test)

plt.title('Predicted data vs Real Data')

plt.xlabel('y_pred')

plt.ylabel('y_test')

plt.show()

plt.scatter(x_test, y_test)

plt.plot(x_test, y_pred, linewidth=2)

plt.title('Linear Regression')

plt.xlabel('x_test')

plt.ylabel('y_test')

plt.show()

# Logistic Regression

from sklearn.linear_model import LogisticRegression

from sklearn import metrics

model = LogisticRegression()

model.fit(x_train, y_train)

y_predict = model.predict(x_test)

model_score = model.score(x_test, y_test)

print(model_score)

print(metrics.confusion_matrix(y_test, y_predict))
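The two scores used for the linear model, mean squared error and R2, are simple enough to compute by hand. The sketch below fits a straight line with np.polyfit on synthetic data (not the diabetes set; the slope 3, intercept 2 and noise level are made up) and evaluates both scores from their definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)   # synthetic data for illustration

slope, intercept = np.polyfit(x, y, deg=1)    # least-squares straight line
y_pred = slope * x + intercept

mse = np.mean((y - y_pred) ** 2)              # mean squared error
ss_res = np.sum((y - y_pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination

print(slope, intercept)
print(r2, mse)
```

An R2 close to 1 means the line explains most of the variation in y, while a value near 0, as in the output of the program above, means a single feature explains little of it.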
OUTPUT:

DIABETES DESCRIPTION:

Diabetes dataset

Ten baseline variables (age, sex, body mass index, average blood pressure, and six blood
serum measurements) were obtained for each of n = 442 diabetes patients, as well as the
response of interest, a quantitative measure of disease progression one year after baseline.

Data Set Characteristics:

: Number of Instances: 442

: Number of Attributes: First 10 columns are numeric predictive values

: Target: Column 11 is a quantitative measure of disease progression one year after

baseline

: Attribute Information:

- Age age in years

- Sex

- bmi body mass index

- bp average blood pressure

- s1 tc, total serum cholesterol

- s2 ldl, low-density lipoproteins

- s3 hdl, high-density lipoproteins

- s4 tch, total cholesterol / HDL

- s5 ltg, possibly log of serum triglycerides level

- s6 glu, blood sugar level

COEFFICIENT VALUE:

[731.87600042]

R2 VALUE AND MEAN SQUARED ERROR:

0.16465773342986756 & 4765.090270861111

PREDICTED DATA VS REAL DATA:

LINEAR REGRESSION:
MODEL SCORE FOR LOGISTIC REGRESSION:

0.007518796992481203

CONFUSION MATRIX FOR LOGISTIC REGRESSION:

[[130 17]

[ 38 46]]
PROGRAM:

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

from sklearn import datasets

%matplotlib inline

diabetes=pd.read_csv("C:\\Users\\KSK\\Documents\\FDS LAb\\diabetes.csv")

diabetes.head()

import statsmodels.api as sm

from statsmodels.stats.anova import anova_lm

X = diabetes[["Age", "BMI"]]  # the input variables

y = diabetes["Glucose"]  # the output variable, the one you want to predict

X = sm.add_constant(X)  # add an intercept (beta_0) to the model

# Note the argument order: the response y comes first, then the predictors X

model2 = sm.OLS(y, X).fit()

predictions = model2.predict(X)  # make the predictions with the model

# Print out the statistics

model2.summary()
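What sm.add_constant and OLS do can be sketched with NumPy alone: a column of ones is added for the intercept, and the coefficients are the least-squares solution of the resulting system. The Age and BMI values below are the six rows visible in the data heads of this manual; the response is a made-up, noise-free combination, so the known coefficients should be recovered.

```python
import numpy as np

age = np.array([50.0, 31.0, 32.0, 21.0, 33.0, 30.0])
bmi = np.array([33.6, 26.6, 23.3, 28.1, 43.1, 25.6])
y = 70.0 + 0.7 * age + 0.9 * bmi            # hypothetical response, chosen for illustration

# Design matrix: intercept column, then the two predictors
X = np.column_stack((np.ones_like(age), age, bmi))

beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(beta)    # intercept, Age coefficient, BMI coefficient
print(rank)
```

With real, noisy data the recovered coefficients would only approximate the underlying relationship, which is what the standard errors and confidence intervals in the OLS summary quantify.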

OUTPUT:

HEAD DATA:

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1

OLS Regression Results

Dep. Variable: Glucose R-squared: 0.114

Model: OLS Adj. R-squared: 0.112

Method: Least Squares F-statistic: 49.33

Date: Tue, 08 Nov 2022 Prob (F-statistic): 7.05e-21

Time: 22:28:35 Log-Likelihood: -3703.7

No. Observations: 768 AIC: 7413.

Df Residuals: 765 BIC: 7427.

Df Model: 2

Covariance Type: nonrobust

coef std err t P>|t| [0.025 0.975]

const 70.2952 5.402 13.013 0.000 59.691 80.899

Age 0.6955 0.093 7.514 0.000 0.514 0.877

BMI 0.8589 0.138 6.220 0.000 0.588 1.130

Omnibus: 18.855 Durbin-Watson: 1.836

Prob(Omnibus): 0.000 Jarque-Bera (JB): 38.868

Skew: -0.007 Prob(JB): 3.63e-09

Kurtosis: 4.102 Cond. No. 235.

PROGRAM:

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

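import seaborn as sns

from scipy.stats import norm

%matplotlib inline

df=pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

mean = df.loc[:,'Fare'].mean()

sd = df.loc[:,'Fare'].std()

# x_axis is not defined in the listing; it is assumed here to span three standard deviations around the mean

x_axis = np.linspace(mean - 3 * sd, mean + 3 * sd, 200)

plt.plot(x_axis, norm.pdf(x_axis, mean, sd))

plt.show()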
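The curve drawn by norm.pdf follows the normal density formula f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)). The NumPy-only sketch below evaluates that formula directly; the mean and standard deviation are placeholder values chosen for illustration, and normal_pdf is a hypothetical helper name.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

mean, sd = 32.2, 49.7                          # placeholder values, not computed from the data
x_axis = np.linspace(mean - 4 * sd, mean + 4 * sd, 1001)
density = normal_pdf(x_axis, mean, sd)

# Trapezoid rule: the area under the curve should be close to 1
area = np.sum((density[:-1] + density[1:]) / 2 * np.diff(x_axis))
peak_x = x_axis[np.argmax(density)]            # the curve peaks at the mean

print(area)
print(peak_x)
```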
OUTPUT:

NORMAL CURVE:
PROGRAM:

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

import seaborn as sns

%matplotlib inline

df=pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

sns.histplot(df["Fare"], kde=True, stat="density")

sns.histplot(df["Age"], kde=True, stat="density")

plt.contour(df[["Fare","Parch"]])
OUTPUT:

DENSITY PLOT:

CONTOUR PLOT:
PROGRAM:

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

df=pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

plt.figure(figsize=(8,8))

sns.scatterplot(x="Age", y="Fare", hue="Sex", data=df)

plt.show()

df.corr(numeric_only=True)

# plotting correlation heatmap

dataplot = sns.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)

# displaying heatmap

plt.show()
OUTPUT:

SCATTER PLOT:

HEAT MAP:
PROGRAM:

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

df=pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

plt.hist(df["Fare"])
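plt.hist returns the bin counts, the bin edges, and the drawn bars. The counts and edges can also be obtained without plotting through np.histogram, as in this sketch on a small made-up sample (not the Titanic fares).

```python
import numpy as np

fares = np.array([7.25, 71.28, 7.92, 53.1, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07])  # made-up sample

counts, edges = np.histogram(fares, bins=10)   # 10 equal-width bins between min and max

print(counts)          # how many values fall into each bin
print(edges)           # 11 edges delimit the 10 bins
print(edges[1] - edges[0])   # common bin width
```

With 10 bins there are always 11 edges, and the bin width equals (max - min) / 10.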
OUTPUT:

HISTOGRAM:

(array([732., 106., 31., 2., 11., 6., 0., 0., 0., 3.]),
 array([ 0. , 51.23292, 102.46584, 153.69876, 204.93168, 256.1646 ,
 307.39752, 358.63044, 409.86336, 461.09628, 512.3292 ]),
 <BarContainer object of 10 artists>)

PROGRAM:

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

from mpl_toolkits import mplot3d

%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")

df.head()

fig = plt.figure(figsize=(8,8))

ax = plt.axes(projection='3d')

# Three-dimensional line

zline = np.linspace(0, 15, 1000)

xline = np.sin(zline)

yline = np.cos(zline)

ax.plot3D(xline, yline, zline, 'gray')

# Three-dimensional scatter plot

zdata = df["Fare"]

xdata = df["Age"]

ydata = df["Parch"]

ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');

OUTPUT:

THREE DIMENSIONAL LINES:

THREE DIMENSIONAL SCATTERPLOT:

PROGRAM:

%matplotlib inline

import numpy as np

import matplotlib.pyplot as plt

from mpl_toolkits.basemap import Basemap

plt.figure(figsize=(8, 8))

m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)

m.bluemarble(scale=0.5);

fig = plt.figure(figsize=(8, 8))

m = Basemap(projection='lcc', resolution=None, width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)

m.etopo(scale=0.5, alpha=0.5)

# Map (longitude, latitude) to (x, y) for plotting

x, y = m(-122.3, 47.6)

plt.plot(x, y, 'ok', markersize=5)

plt.text(x, y, ' Seattle', fontsize=12);

fig = plt.figure(figsize=(8, 6), edgecolor='w')

m = Basemap(projection='cyl', resolution=None, llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )

# draw_map is a helper function that is not defined in this listing; define it before running the cells below

draw_map(m)

fig = plt.figure(figsize=(8, 6), edgecolor='w')

m = Basemap(projection='moll', resolution=None, lat_0=0, lon_0=0)

draw_map(m)

fig = plt.figure(figsize=(8, 8))

m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)

draw_map(m);

fig = plt.figure(figsize=(8, 8))

m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50, lat_1=45,

lat_2=55, width=1.6E7, height=1.2E7)

draw_map(m)

OUTPUT:

ORTHO PROJECTION:
MAPPING LONGITUDE AND LATITUDE:

CYLINDRICAL PROJECTIONS:
PSEUDO-CYLINDRICAL PROJECTIONS:

PERSPECTIVE PROJECTION:
CONIC PROJECTION:

No ratings yet
University Institute of Technology Rajiv Gandhi Proudyogikivishwavidyalaya Bhopal (M.P.)
44 pages
Assignment-2 Section-B
No ratings yet
Assignment-2 Section-B
4 pages
Essential Python Libraries
100% (1)
Essential Python Libraries
41 pages
Decimal Solutions
No ratings yet
Decimal Solutions
20 pages
CS8711 Set4
No ratings yet
CS8711 Set4
2 pages
CS8711 Set3
No ratings yet
CS8711 Set3
2 pages
Summer Internship Report
No ratings yet
Summer Internship Report
27 pages
Python Abstract
No ratings yet
Python Abstract
7 pages
Interview Questions About Python Programming
No ratings yet
Interview Questions About Python Programming
16 pages
Python Syllabus
No ratings yet
Python Syllabus
10 pages
Smart Irrigation System
0% (1)
Smart Irrigation System
61 pages
It8761 Set 2
No ratings yet
It8761 Set 2
3 pages
Class Test One 06.11.24
No ratings yet
Class Test One 06.11.24
9 pages
Introduction To Jupyter Notebook - Python Numerical Methods
No ratings yet
Introduction To Jupyter Notebook - Python Numerical Methods
3 pages
Lab 1.1-JupyterNotebook-TheBasics - MD
No ratings yet
Lab 1.1-JupyterNotebook-TheBasics - MD
2 pages
Numpy Handbook
No ratings yet
Numpy Handbook
16 pages
Deep Learning With PyTorch: Object Classification - Filliat Et Al
No ratings yet
Deep Learning With PyTorch: Object Classification - Filliat Et Al
3 pages
Face Mask Detection Project
0% (1)
Face Mask Detection Project
57 pages
Jupyter Cheat Sheet Python For Data Science: Working With Different Programming Languages Widgets
No ratings yet
Jupyter Cheat Sheet Python For Data Science: Working With Different Programming Languages Widgets
1 page