
FINAL FDS MANUAL Print

The document provides an overview of several essential Python packages, including NumPy for array processing, SciPy for scientific computations, Pandas for data manipulation, Statsmodels for statistical modeling, and Jupyter for interactive computing. It also outlines the installation process for Python and these packages, along with example code snippets demonstrating basic functionalities such as array creation, manipulation, and data analysis. The document serves as a comprehensive guide for users looking to leverage these tools for data science and statistical analysis.

Uploaded by

durga
Copyright © All Rights Reserved

FEATURES OF PYTHON PACKAGES:

1. NUMPY
NumPy is one of the most fundamental packages in Python: a general-purpose array-processing
package that provides high-performance multidimensional array objects and tools for working
with them. It is an efficient container for generic multidimensional data.
NumPy's main object is the homogeneous multidimensional array: a table of elements (usually
numbers) that all share the same data type, indexed by a tuple of non-negative integers. In
NumPy, dimensions are called axes, and the number of axes (historically called the rank) is
available as the ndim attribute. NumPy's array class is called ndarray; arrays are usually
created with the np.array() function.
• Basic array operations: add, multiply, slice, flatten, reshape, index arrays
• Advanced array operations: stack arrays, split into sections, broadcast arrays
• Work with dates and times (datetime64) and with linear algebra
• Basic slicing and advanced indexing
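A minimal sketch of the ideas above (axes, ndim, shape, a single shared dtype, broadcasting, stacking and flattening). The array values are arbitrary illustrations.

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) and 3 columns (axis 1), all with one shared dtype
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.ndim, a.shape, a.dtype)   # number of axes, size along each axis, element type

# Broadcasting: the 1-D row is stretched across both rows without copying data
row = np.array([10, 20, 30])
b = a + row                       # -> [[11 22 33] [14 25 36]]
print(b)

# Stacking adds a new axis; flattening returns a 1-D copy
stacked = np.stack([a, a])        # shape (2, 2, 3)
flat = a.flatten()                # -> [1 2 3 4 5 6]
print(stacked.shape, flat)
```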

2. SCIPY
The SciPy library is one of the core packages that make up the SciPy stack. Note the
difference between the SciPy stack and the SciPy library: the library builds on the NumPy
array object, while the stack also includes tools such as Matplotlib, Pandas and SymPy.
The SciPy library contains modules with efficient routines for linear algebra, interpolation,
optimization, integration and statistics, which cover many of the scientific-computing
problems that arise while working in data science.
• SciPy provides a variety of sub-packages to solve these problems efficiently.
• SciPy is fast and easy to use.
• It operates on NumPy arrays and provides optimized versions of several routines that are
also found in NumPy (for example in scipy.linalg).
• Alongside the GNU Scientific Library (GSL), SciPy is one of the most widely used scientific libraries.
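A short sketch touching four of the SciPy sub-packages named above (integration, optimization, linear algebra, statistics). The functions and numbers are illustrative choices, not taken from the manual.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Integration: area under sin(x) from 0 to pi (the exact value is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: the minimum of (x - 3)^2 + 1 is at x = 3
res = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

# Linear algebra: solve 3x + y = 9 and x + 2y = 8 -> x = 2, y = 3
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
solution = linalg.solve(A, rhs)

# Statistics: one-sample t-test of made-up measurements against a mean of 5.0
sample = np.array([5.1, 4.9, 5.3, 5.0, 5.2])
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print(area, res.x, solution, p_value)
```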

3. PANDAS
Pandas is an open-source Python package that provides high-performance, easy-to-use data
structures and data analysis tools for labeled data. Its name is commonly expanded as
"Python Data Analysis Library". Pandas is well suited to data wrangling (munging): it is
designed for quick and easy data manipulation, reading, aggregation and visualization.
Pandas reads data from a CSV or TSV file or a SQL database and creates a Python object with
rows and columns called a DataFrame, which is similar to a table in a spreadsheet or in
statistical software such as Excel or SPSS.

• Indexing, manipulating, renaming, sorting and merging DataFrames
• Updating, adding and deleting columns of a DataFrame
• Imputing missing values and handling missing data (NaN)
• Plotting data with histograms or box plots
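A minimal sketch of the DataFrame operations listed above (impute, add, rename, sort, group, delete) on a small made-up table; the column names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dana"],      # made-up example data
    "score": [88.0, np.nan, 75.0, 92.0],
    "dept": ["A", "B", "A", "B"],
})

df["score"] = df["score"].fillna(df["score"].mean())   # impute the NaN with the column mean (85.0)
df["passed"] = df["score"] >= 80                       # add a column
df = df.rename(columns={"dept": "team"})               # rename a column
df = df.sort_values("score", ascending=False)          # sort rows, highest score first
team_means = df.groupby("team")["score"].mean()        # A: 81.5, B: 88.5
df = df.drop(columns=["passed"])                       # delete a column

print(df)
print(team_means)
```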

4. STATSMODELS
Statsmodels is built for rigorous statistical modeling. The core of the Statsmodels library
is "production ready": traditional models such as robust linear models and generalized linear
models (GLM) have been around for a long time and have been validated against R and Stata.
It also contains a time series analysis module, which includes AR, ARMA and vector
autoregression (VAR) models.
• Linear/multiple regression: a statistical method for modeling the linear relationship
between a dependent variable and one or more explanatory variables.
• Logistic regression: models the probability of a specific event or class occurring, such
as win/lose or pass/fail.
• Time series analysis: analysis of time series data to extract meaningful statistics and
other characteristics of the data.
• Statistical tests: the library implements a wide range of statistical tests.

5. JUPYTER
Project Jupyter is a suite of software products for interactive computing. Packages under
the Jupyter project include:
• Jupyter Notebook: a web-based interface to programming environments for Python, Julia, R
and many other languages
• QtConsole: a Qt-based terminal for Jupyter kernels, similar to IPython
• nbviewer: a service for sharing Jupyter notebooks
• JupyterLab: a modern web-based integrated interface for all of these products
IPython, the interactive shell that provides Jupyter's default Python kernel, offers:
• A powerful interactive Python shell.
• The main Python kernel for Jupyter Notebook and the other Project Jupyter front ends.
• Object introspection, i.e. the ability to check the properties of an object at runtime.
• Syntax highlighting.
• A stored history of interactions.
• Tab completion of keywords, variables and function names.
• A magic command system for controlling the Python environment and performing OS tasks.
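IPython's introspection (for example typing obj? in a notebook cell) relies on the kind of runtime information that plain Python already exposes. A small sketch using only the standard library's inspect module; the scale function is a made-up example.

```python
import inspect

def scale(values, factor=2):
    """Multiply every value by factor."""
    return [v * factor for v in values]

print(type(scale).__name__)          # function
print(inspect.signature(scale))      # (values, factor=2)
print(inspect.getdoc(scale))         # Multiply every value by factor.
# public methods of a list object, found at runtime
print([name for name in dir([]) if not name.startswith("_")][:3])   # ['append', 'clear', 'copy']
```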
PYTHON INSTALLATION
• Open the official Python website (https://www.python.org/).
• Go to Downloads ==> Windows and select a recent release (requires Windows 10 or later).
• Run the downloaded installer, e.g. "python-3.10.6-amd64.exe".

PACKAGE INSTALLATION
Open a command prompt and enter "python --version" to check whether Python was installed
properly. If the installation is correct, it prints the Python version.

Enter "pip --version" to check whether the Python package manager (pip) was installed
properly. If the installation is correct, it prints the pip version.

• To install the NumPy library, enter: pip install numpy
• To install the SciPy library, enter: pip install scipy
• To install the Statsmodels library, enter: pip install statsmodels
• To install the Pandas library, enter: pip install pandas
• To install Jupyter, enter: pip install jupyter
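After installing, a quick way to confirm the packages are importable from Python itself. A minimal sketch; installed_versions is a hypothetical helper name, and packages without a __version__ attribute are reported as "unknown".

```python
import importlib

def installed_versions(names):
    """Return {package name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:              # also catches ModuleNotFoundError
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions

for name, version in installed_versions(["numpy", "scipy", "pandas", "statsmodels"]).items():
    print(f"{name:12} {version or 'NOT INSTALLED'}")
```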
OUTPUT:
PROGRAM:

1. Creating Arrays:

• 0-D Arrays
Each value in an array is a 0-D array.

import numpy as np
arr = np.array(42)
print(arr)
• 1-D Arrays
An array that has 0-D arrays as its elements is called a 1-D array.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
• 2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array.

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
• 3-D Arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
2. Array Dimensions:
The ndim attribute returns the number of dimensions (axes) of an array.
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)   # 0
print(b.ndim)   # 1
print(c.ndim)   # 2
print(d.ndim)   # 3
3. Access 2-D Arrays:
To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])

4. Access 3-D Arrays:


To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])

5. Array Slicing:
Slicing means taking the elements from one given index up to (but not including) another
given index. We pass a slice instead of an index, like this: [start:end]. We can also define
the step, like this: [start:end:step].

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])   # elements at indexes 1 and 3 -> [2 4]
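A few more slicing forms, sketched on small example arrays: negative indexes and steps, slicing both axes of a 2-D array, and the fact that a basic slice is a view of the original data.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
last_three = arr[-3:]            # negative indexes count from the end -> [5 6 7]
backwards = arr[::-1]            # a negative step walks backwards -> [7 6 5 4 3 2 1]

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
column = grid[0:2, 2]            # index 2 from both rows -> [3 8]
block = grid[1, 1:4]             # row 1, indexes 1 to 3 -> [7 8 9]

# A basic slice is a view, so writing through it changes the original array
data = np.arange(5)              # [0 1 2 3 4]
window = data[1:3]
window[0] = 99                   # data is now [0, 99, 2, 3, 4]
print(data)
```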

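6. Data Types:
NumPy has some extra data types and refers to data types with a one-character code, such as
i for integers, u for unsigned integers, f for floats and S for (byte) strings. Here
dtype='S' stores the values as byte strings.
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)         # [b'1' b'2' b'3' b'4']
print(arr.dtype)   # |S1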
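Converting between data types uses astype(), and the one-character codes (optionally followed by a size in bytes) map to full dtype objects. A short sketch with example values:

```python
import numpy as np

floats = np.array([1.7, 2.2, 3.9])
ints = floats.astype('i')        # 'i' = 32-bit signed integer; the fraction is truncated
print(ints)                      # [1 2 3]

flags = np.array([1, 0, 3]).astype(bool)   # 0 -> False, anything else -> True
print(flags)                     # [ True False  True]

# One-character codes plus a byte size name a concrete dtype
print(np.dtype('u1'), np.dtype('f8'), np.dtype('i4'))   # uint8 float64 int32
```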

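7. Copy & View:
A copy owns its data, so changes made to the original array do not affect the copy.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)   # the original shows 42
print(x)     # the copy still starts with 1

8. Make a view:
A view shares its data with the original array, so changes made to the original also appear in the view.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)     # the view also starts with 42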
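To check whether an array owns its data, inspect its base attribute, or ask NumPy directly whether two arrays share memory. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
c = arr.copy()
v = arr.view()

# .base is None for an array that owns its data and points to the owner for a view
print(c.base is None)            # True
print(v.base is arr)             # True
print(np.shares_memory(arr, v), np.shares_memory(arr, c))   # True False
```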

9. Array Shape & Reshaping:
Array Shape: NumPy arrays have an attribute called shape that returns a tuple whose entries
give the number of elements along each dimension.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)   # (2, 4)

10. Array Reshaping:
Reshaping means changing the shape of an array. By reshaping we can add or remove dimensions
or change the number of elements in each dimension; the total number of elements must stay
the same.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
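Passing -1 for one dimension lets NumPy infer it, and a shape whose element count does not match raises a ValueError. A short sketch:

```python
import numpy as np

arr = np.arange(1, 13)           # 12 elements: 1..12
auto = arr.reshape(3, -1)        # -1 is inferred as 12 / 3 = 4 -> shape (3, 4)
flat = auto.reshape(-1)          # back to 1-D

try:
    arr.reshape(5, 3)            # 15 slots cannot hold 12 elements
except ValueError as err:
    print("cannot reshape:", err)
```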

11. Array Iterating:
Iterating means going through elements one by one. For multi-dimensional NumPy arrays we can
use Python's basic for loop, which iterates over the first axis: for this 3-D array each x is
a 2-D array.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    print(x)
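To reach every scalar regardless of the number of dimensions, NumPy provides np.nditer, and np.ndenumerate also yields each element's index. A minimal sketch:

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# np.nditer visits every scalar element, whatever the number of dimensions
scalars = [int(x) for x in np.nditer(arr)]

# np.ndenumerate yields (index tuple, value) pairs
positions = {idx: int(val) for idx, val in np.ndenumerate(arr)}
print(positions[(1, 0, 2)])      # 9
```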

12. Joining Array:
Joining means putting the contents of two or more arrays into a single array.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)   # [1 2 3 4 5 6]
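For 2-D arrays the axis argument decides the joining direction, and np.stack joins along a new axis instead of an existing one. A sketch with small example arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a, b), axis=0)   # along an existing axis -> shape (4, 2)
cols = np.concatenate((a, b), axis=1)   # -> [[1 2 5 6] [3 4 7 8]]
new_axis = np.stack((a, b))             # adds a new leading axis -> shape (2, 2, 2)
h = np.hstack((np.array([1, 2]), np.array([3])))   # -> [1 2 3]
```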

13. Splitting Array:
Splitting is the reverse operation of joining: joining merges multiple arrays into one, and
splitting breaks one array into multiple arrays.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)   # three arrays: [1 2], [3 4], [5 6]
print(newarr)
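np.array_split also accepts a number of sections that does not divide the array evenly, while np.split insists on equal sections. A short sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
parts = np.array_split(arr, 4)   # uneven split is allowed: sizes 2, 2, 1, 1
sizes = [len(p) for p in parts]

try:
    np.split(arr, 4)             # np.split requires equal sections
except ValueError as err:
    print("np.split failed:", err)
```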

14. Searching Arrays:
We can search an array for a certain value and return the indexes where it matches, using
the np.where() function.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)   # a tuple holding the matching indexes 3, 5 and 6
print(x)
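np.where works with any boolean condition and, given three arguments, acts as a vectorized if/else; on sorted data, np.searchsorted finds insertion positions. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])
even_idx = np.where(arr % 2 == 0)[0]   # indexes of even values -> [1 3 5 6]

# np.where(cond, a, b) picks a where cond is True and b elsewhere
capped = np.where(arr > 3, 3, arr)     # -> [1 2 3 3 3 3 3]

# On a sorted array, searchsorted returns the index where a value would be inserted
sorted_arr = np.array([1, 3, 5, 7])
pos = np.searchsorted(sorted_arr, 6)   # -> 3
```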

15. Sorting:
Sorting means putting elements in an ordered sequence, i.e. any sequence whose order
corresponds to the elements, such as numeric or alphabetical, ascending or descending.
The np.sort() function returns a sorted copy of an array; the ndarray.sort() method sorts
the array in place.
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))   # [0 1 2 3]
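Related sorting tools, sketched on small example arrays: argsort returns the sorting order, reversing gives descending order, strings sort alphabetically, 2-D arrays sort along the last axis by default, and the method form sorts in place.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])
order = np.argsort(arr)          # indexes that would sort arr -> [2 3 1 0]
descending = np.sort(arr)[::-1]  # reverse the ascending result -> [3 2 1 0]

words = np.array(['banana', 'cherry', 'apple'])
alpha = np.sort(words)           # alphabetical -> ['apple' 'banana' 'cherry']

grid = np.array([[3, 2, 4], [5, 0, 1]])
row_sorted = np.sort(grid)       # default axis=-1 sorts each row -> [[2 3 4] [0 1 5]]

arr.sort()                       # in place: arr itself is now [0 1 2 3]
```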
16. Filtering Arrays:
Getting some elements out of an existing array and creating a new array out of them is
called filtering. In NumPy, you filter an array using a boolean index list: the elements at
positions marked True are kept.
import numpy as np
arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]
print(newarr)   # [41 43]
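In practice the boolean list is usually built from a condition on the array itself. A minimal sketch:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])
mask = arr > 42                      # -> [False False  True  True]
high = arr[mask]                     # -> [43 44]
even = arr[arr % 2 == 0]             # condition written inline -> [42 44]
both = arr[(arr > 41) & (arr < 44)]  # combine with & or |, each side in parentheses -> [42 43]
```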

OUTPUT:
PROGRAM:

import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])

print("The first matrix value is ::>",a)

b = np.array([[2,3,4],[5,6,7], [8,9,10]])

print("The second matrix value is ::>",b)

# element-wise operations: each entry of a is combined with the entry in the same position of b

mul = np.multiply(a, b)

add = np.add(a, b)

sub = np.subtract(a, b)

div = np.divide(a, b)

print("Addition Matrix Resultant is ::>", add)

print("Subtraction Matrix Resultant is ::>", sub)

print("Division Matrix Resultant is ::>", div)

print("Element-wise Multiplication Matrix Resultant is ::>", mul)
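np.multiply multiplies matching entries; the matrix product from linear algebra is written with the @ operator (or np.matmul). A sketch with small example matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = np.multiply(a, b)  # same as a * b -> [[ 5 12] [21 32]]
matrix_product = a @ b           # same as np.matmul(a, b) -> [[19 22] [43 50]]
print(elementwise)
print(matrix_product)
```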


OUTPUT:
PROGRAM:

import pandas as pd

df = pd.DataFrame({'Name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
                   'Date_Of_Birth': ['17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
                   'Age': [18.5, 21.2, 22.5, 22, 23]})

print("Original DataFrame:")
print(df)

df1 = df.copy(deep=True)
df = df.drop([0, 1])
df1 = df1.drop([2])

print("\nNew DataFrames:")
print(df)
print(df1)

# With no "on" argument, merge joins on all common columns (Name, Date_Of_Birth, Age)
print('\n"one_to_one" or "1:1": check if merge keys are unique in both left and right datasets:')
df_one_to_one = pd.merge(df, df1, validate="one_to_one")
print(df_one_to_one)

print('\n"one_to_many" or "1:m": check if merge keys are unique in left dataset:')
df_one_to_many = pd.merge(df, df1, validate="one_to_many")
print(df_one_to_many)

print('\n"many_to_one" or "m:1": check if merge keys are unique in right dataset:')
df_many_to_one = pd.merge(df, df1, validate="many_to_one")
print(df_many_to_one)
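The program above only shows validations that pass. A sketch on made-up key/value tables of how the how and indicator arguments reveal which rows matched, and of a validation that fails with pandas' MergeError:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "c", "d"], "y": [10, 20, 30, 40]})

# how="outer" keeps every key; indicator=True adds a _merge column (left_only / right_only / both)
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
print(outer)

# "c" appears twice on the right, so a one-to-one check fails
try:
    pd.merge(left, right, on="key", validate="one_to_one")
except pd.errors.MergeError as err:
    print("validation failed:", err)

# one-to-many passes, because keys are unique on the left side
ok = pd.merge(left, right, on="key", validate="one_to_many")
print(ok)
```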
PROGRAM:

#DATA COLLECT

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# the same iris data loaded from three file formats; keep whichever file you have
dataset = pd.read_csv("iris.txt")
dataset.head()
dataset = pd.read_excel("iris.xlsx")   # reading .xlsx files needs the openpyxl package
dataset.head()
dataset = pd.read_csv("iris.csv")
dataset.head()
dataset.info()
dataset.Species.unique()

#EDA

dataset.describe()
dataset.corr(numeric_only=True)   # correlate only the numeric columns (Species is text)
dataset.Species.value_counts()

sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Sepal.Length", "Sepal.Width").add_legend()
sns.FacetGrid(dataset, hue="Species", height=6).map(plt.scatter, "Petal.Length", "Petal.Width").add_legend()
sns.pairplot(dataset, hue="Species")
plt.hist(dataset["Sepal.Length"], bins=25);
sns.FacetGrid(dataset, hue="Species", height=6).map(sns.histplot, "Sepal.Width").add_legend();
sns.boxplot(x='Species', y='Petal.Length', data=dataset)

#PREPROCESSING

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
# note: "Unnamed: 0" is the row number stored in the CSV; because the rows are ordered by
# species it carries label information, so consider dropping it as well
x = dataset.drop(['Species'], axis=1)
y = dataset['Species']
scaler = ss.fit(x)
x_stdscaler = scaler.transform(x)   # standardized features (mean 0, standard deviation 1)
x_stdscaler

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)   # setosa / versicolor / virginica -> 0 / 1 / 2

#SPLITTING

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
x_train   # display the training features

#MODEL SELECTION

from sklearn.svm import SVC
svc = SVC(kernel="linear")
svc.fit(x_train, y_train)
y_pred = svc.predict(x_test)
y_pred
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

#PREDICTION

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
accuracy_score(y_test, y_pred)

OUTPUT:

DATASET HEADS:

   Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           1           5.1          3.5           1.4          0.2  setosa
1           2           4.9          3.0           1.4          0.2  setosa
2           3           4.7          3.2           1.3          0.2  setosa
3           4           4.6          3.1           1.5          0.2  setosa
4           5           5.0          3.6           1.4          0.2  setosa

DATASET INFORMATION:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype
 0   Unnamed: 0    150 non-null    int64
 1   Sepal.Length  150 non-null    float64
 2   Sepal.Width   150 non-null    float64
 3   Petal.Length  150 non-null    float64
 4   Petal.Width   150 non-null    float64
 5   Species       150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

DATASET UNIQUE:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

DATASET SPECIES VALUE COUNTS:

setosa 50

versicolor 50

virginica 50

Name: Species, dtype: int64

DATASET DESCRIPTION:

       Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
count  150.000000    150.000000   150.000000    150.000000   150.000000
mean    75.500000      5.843333     3.057333      3.758000     1.199333
std     43.445368      0.828066     0.435866      1.765298     0.762238
min      1.000000      4.300000     2.000000      1.000000     0.100000
25%     38.250000      5.100000     2.800000      1.600000     0.300000
50%     75.500000      5.800000     3.000000      4.350000     1.300000
75%    112.750000      6.400000     3.300000      5.100000     1.800000
max    150.000000      7.900000     4.400000      6.900000     2.500000


DATASET CORRELATION:

              Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
Unnamed: 0      1.000000      0.716676    -0.402301      0.882637     0.900027
Sepal.Length    0.716676      1.000000    -0.117570      0.871754     0.817941
Sepal.Width    -0.402301     -0.117570     1.000000     -0.428440    -0.366126
Petal.Length    0.882637      0.871754    -0.428440      1.000000     0.962865
Petal.Width     0.900027      0.817941    -0.366126      0.962865     1.000000

SCATTER PLOT:
PAIRPLOT:

HISTOGRAM:
BOXPLOT:

PREPROCESSING:

array([[-1.72054204e+00, -9.00681170e-01,  1.01900435e+00, -1.34022653e+00, -1.31544430e+00],
       [-1.69744751e+00, -1.14301691e+00, -1.31979479e-01, -1.34022653e+00, -1.31544430e+00],
       [-1.67435299e+00, -1.38535265e+00,  3.28414053e-01, -1.39706395e+00, -1.31544430e+00],
       [-1.65125846e+00, -1.50652052e+00,  9.82172869e-02, -1.28338910e+00, -1.31544430e+00],
       [-1.58197489e+00, -1.50652052e+00,  7.88807586e-01, ...],
       ...,
       [-2.42492502e-01, -2.94841818e-01, -3.62176246e-01,  7.62758269e-01,  7.90670654e-01]])

SPLITTING:

     Unnamed: 0  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
81           82           5.5          2.4           3.7          1.0
133         134           6.3          2.8           5.1          1.5
137         138           6.4          3.1           5.5          1.8
75           76           6.6          3.0           4.4          1.4
109         110           7.2          3.6           6.1          2.5
..          ...           ...          ...           ...          ...
71           72           6.1          2.8           4.0          1.3
106         107           4.9          2.5           4.5          1.7
14           15           5.8          4.0           1.2          0.2
92           93           5.8          2.6           4.0          1.2
102         103           7.1          3.0           5.9          2.1

[105 rows x 5 columns]

MODEL SELECTION:

1.0

PREDICTION:

1.0
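Both classifiers above score 1.0 on a single 45-row test split. A sketch of a 5-fold cross-validated estimate instead, assuming scikit-learn's bundled copy of the iris data (load_iris) in place of the CSV files, so only the four measurements are used (no row-number column). Scaling sits inside a pipeline so it is re-fitted on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)    # 150 rows, 4 measurement columns, labels 0/1/2

models = {
    "linear SVC": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "3-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3)),
}
cv_means = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # accuracy on each of the 5 folds
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```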

PROGRAM:

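import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("diabetes_csv.csv")
df.head()
df.skin.value_counts()              # frequency of each skin-thickness value

# numeric_only=True skips the text column "class" (required by recent pandas versions)
df.mean(axis=0, numeric_only=True)
print(df.loc[:, 'skin'].mean())
df.mean(axis=1, numeric_only=True)[0:5]

df.median(numeric_only=True)
print(df.loc[:, 'skin'].median())
df.median(axis=1, numeric_only=True)[0:5]
df.mode()

df.std(numeric_only=True)
print(df.loc[:, 'skin'].std())
df.std(axis=1, numeric_only=True)[0:5]

df.var(numeric_only=True)
print(df.skew(numeric_only=True))
df.describe()
df.describe(include='all')
print(df.kurtosis(numeric_only=True))

# density plot of 100,000 draws from a standard normal distribution
norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10));

# Plot black line at mean
plt.vlines(norm_data[0].mean(), ymin=0, ymax=0.4, linewidth=5.0, colors="black");
# Plot red line at median
plt.vlines(norm_data[0].median(), ymin=0, ymax=0.4, linewidth=2.0, colors="red");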
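To see what each statistic in the program computes, a sketch on a tiny made-up series whose answers can be checked by hand. Note that pandas' std() and var() divide by n - 1 (sample statistics) unless ddof=0 is passed.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])   # made-up values chosen to give round answers

print(s.mean())        # 5.0  (sum 40 / 8 values)
print(s.median())      # 4.5  (average of the two middle values, 4 and 5)
print(s.mode()[0])     # 4    (most frequent value)
print(s.std(ddof=0))   # 2.0  (population standard deviation: sqrt(32 / 8))
print(s.var())         # about 4.571 (sample variance: 32 / 7)
print(s.skew() > 0)    # True (the long tail is on the right, towards 9)
```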

OUTPUT:

HEAD DATA:

   preg  plas  pres  skin  insu  mass   pedi  age            class
0     6   148    72    35     0  33.6  0.627   50  tested_positive
1     1    85    66    29     0  26.6  0.351   31  tested_negative
2     8   183    64     0     0  23.3  0.672   32  tested_positive
3     1    89    66    23    94  28.1  0.167   21  tested_negative
4     0   137    40    35   168  43.1  2.288   33  tested_positive

FREQUENCY:

0 227
32 31
30 27
27 23
23 22
33 20
28 20
18 20
31 19
19 18
39 18
29 17
40 16
25 16

MEAN:

20.536458333333332

0 43.153375

1 29.868875

2 38.871500

3 40.283375

4 57.298500

dtype: float64

MODE:
preg plas pres skin insu mass pedi age class

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 tested_negative

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN

MEDIAN:

23.0

0 34.30

1 27.80

2 15.65

3 25.55

4 37.50

dtype: float64
STANDARD DEVIATION:

15.952217567727677

0 49.397286

1 31.519803

2 62.253392

3 37.591100

4 61.533847

VARIANCE:

preg 11.354056

plas 1022.248314

pres 374.647271

skin 254.473245

insu 13281.180078

mass 62.159984

pedi 0.109779

age 138.303046

dtype: float64

SKEWNESS:

preg 0.901674

plas 0.173754

pres -1.843608

skin 0.109372

insu 2.272251

dtype: float64

KURTOSIS:
preg 0.159220

plas 0.640780

pres 5.180157

skin -0.520072

insu 7.214260

mass 3.290443

pedi 5.594954

age 0.643159

dtype: float64

GRAPH:
PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# this file has no header row, so pandas uses the first record as the column names
# ('6', '148', '72', '35', '0', '33.6', ...); pass header=None and names=[...] to avoid this
df = pd.read_csv("pima-indians-diabetes.csv")
df.head()

df.mean(axis=0)
print(df.loc[:, '35'].mean())       # column '35' holds the skin-thickness values
df.mean(axis=1)[0:5]

df.median()
print(df.loc[:, '33.6'].median())   # column '33.6' holds the BMI values
df.median(axis=1)[0:5]
df.mode()

df.std()
print(df.loc[:, '35'].std())
df.std(axis=1)[0:5]
df.var()

print(df.skew())
print(df.kurtosis())

norm_data = pd.DataFrame(np.random.normal(size=100000))
norm_data.plot(kind="density", figsize=(10, 10));

# Plot black line at mean
plt.vlines(norm_data[0].mean(), ymin=0, ymax=0.4, linewidth=5.0, colors="black");
# Plot red line at median
plt.vlines(norm_data[0].median(), ymin=0, ymax=0.4, linewidth=2.0, colors="red");
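Reading a header-less file with header=None keeps the first record as data and lets you choose meaningful column names. A sketch using the first three records shown in the output below; the column names are borrowed from the diabetes_csv.csv example and are an assumption for this file.

```python
import io
import pandas as pd

# first three records of the file, copied from the output below
raw = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
)
# assumed column names, taken from the diabetes_csv.csv example
names = ["preg", "plas", "pres", "skin", "insu", "mass", "pedi", "age", "class"]
df = pd.read_csv(raw, header=None, names=names)
print(df.shape)                # (3, 9): the first record is kept as data
print(df["skin"].mean())       # (35 + 29 + 0) / 3
```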

OUTPUT:

HEAD DATA:

   6  148  72  35    0  33.6  0.627  50  1
0  1   85  66  29    0  26.6  0.351  31  0
1  8  183  64   0    0  23.3  0.672  32  1
2  1   89  66  23   94  28.1  0.167  21  0
3  0  137  40  35  168  43.1  2.288  33  1
4  5  116  74   0    0  25.6  0.201  30  0


MEAN:

20.517601043024772

0 26.550111

1 34.663556

2 35.807444

3 51.043111

4 27.866778

dtype: float64
MODE:

6 148 72 35 0 33.6 0.627 50 1

0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 0.0

1 NaN 100 NaN NaN NaN NaN 0.258 NaN NaN

MEDIAN:

32.0

0 26.6

1 8.0

2 23.0

3 35.0

4 5.0

dtype: float64

STANDARD DEVIATION:

15.954059060433842

0 31.119744

1 59.585320

2 37.639873

3 60.541569

4 41.114755

dtype: float64

VARIANCE:
6 11.362809

148 1022.622445

72 375.125415

35 254.532001

0 13290.194335

33.6 62.237755

0.627 0.109890

50 138.116452

1 0.227226

dtype: float64

SKEWNESS:

6 0.903976

148 0.176412

72 -1.841911

35 0.112058

0 2.270630

33.6 -0.427950

0.627 1.921190

50 1.135165

1 0.638949

dtype: float64

KURTOSIS:

6 0.161293

148 0.642992

72 5.168578

35 -0.518325

0 7.205266

33.6 3.282498
0.627 5.593374

50 0.660872

1 -1.595913

dtype: float64

GRAPH:
PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, r2_score

%matplotlib inline

diabetes = pd.read_csv("C:\\Users\\KSK\\Documents\\diabetes.csv")
diabetes.head()

# the rest of the program uses scikit-learn's built-in diabetes dataset
diabetes = datasets.load_diabetes()
print(diabetes.DESCR)
diabetes.feature_names

# Now we will split the data into the independent and dependent variables
X = diabetes.data[:, np.newaxis, 3]   # feature 3 ("bp") as a 2-D column
Y = diabetes.target

# We will split the data into training and testing data
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

# Linear Regression
reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
Coef = reg.coef_
print(Coef)

MSE = mean_squared_error(y_test, y_pred)
R2 = r2_score(y_test, y_pred)
print(R2, MSE)

plt.scatter(y_pred, y_test)
plt.title('Predicted data vs Real Data')
plt.xlabel('y_pred')
plt.ylabel('y_test')
plt.show()

plt.scatter(x_test, y_test)
plt.plot(x_test, y_pred, linewidth=2)
plt.title('Linear Regression')
plt.xlabel('x_test (bp)')
plt.ylabel('y_test / y_pred')
plt.show()

# LogisticRegression is a classifier: fitted to the continuous progression score, it treats
# every distinct target value as a separate class, so its accuracy here is close to zero
model = LogisticRegression()
model.fit(x_train, y_train)
y_predict = model.predict(x_test)
model_score = model.score(x_test, y_test)
print(model_score)
print(metrics.confusion_matrix(y_test, y_predict))
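Logistic regression expects a categorical target, such as a 0/1 outcome column. A sketch of the same fit/score/confusion-matrix calls on synthetic binary data from make_classification, an assumption standing in for a real labelled dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# synthetic binary data (an assumption, not the diabetes measurements)
X, y = make_classification(n_samples=500, n_features=4, n_informative=3, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000)
clf.fit(x_train, y_train)
y_hat = clf.predict(x_test)

acc = accuracy_score(y_test, y_hat)      # fraction of correct predictions
cm = confusion_matrix(y_test, y_hat)     # rows: true class 0/1, columns: predicted class 0/1
print(acc)
print(cm)
```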
OUTPUT:

DIABETES DESCRIPTION:

Diabetes dataset

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n = 442
diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442

:Number of Attributes: First 10 columns are numeric predictive values

:Target: Column 11 is a quantitative measure of disease progression one year after baseline

:Attribute Information:
    - age     age in years
    - sex
    - bmi     body mass index
    - bp      average blood pressure
    - s1      tc, total serum cholesterol
    - s2      ldl, low-density lipoproteins
    - s3      hdl, high-density lipoproteins
    - s4      tch, total cholesterol / HDL
    - s5      ltg, possibly log of serum triglycerides level
    - s6      glu, blood sugar level

COEFFICIENT VALUE:

[731.87600042]

R2 VALUE AND MEAN SQUARE ERROR:

0.16465773342986756 & 4765.090270861111
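The pair printed by print(R2, MSE) is R² followed by MSE. For n test points with true values y_i, predictions ŷ_i and mean ȳ, MSE = (1/n) Σ (y_i − ŷ_i)² and R² = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)². A small check of these formulas against scikit-learn, using made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # made-up values for illustration
y_hat = np.array([2.5, 5.5, 6.0, 9.5])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)                        # (0.25 + 0.25 + 1 + 0.25) / 4 = 0.4375
ss_res = np.sum(residuals ** 2)                      # 1.75
ss_tot = np.sum((y_true - y_true.mean()) ** 2)       # 9 + 1 + 1 + 9 = 20
r2 = 1 - ss_res / ss_tot                             # 1 - 1.75 / 20 = 0.9125
print(mse, mean_squared_error(y_true, y_hat))
print(r2, r2_score(y_true, y_hat))
```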

PREDICTED DATA VS REAL DATA:

LINEAR REGRESSION:
MODEL SCORE FOR LOGISTIC REGRESSION:

0.007518796992481203

CONFUSION MATRIX FOR LOGISTIC REGRESSION:

[[130 17]

[ 38 46]]
PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
import statsmodels.api as sm
from statsmodels.stats.anova import anova_lm
%matplotlib inline

diabetes = pd.read_csv("C:\\Users\\KSK\\Documents\\FDS LAb\\diabetes.csv")
diabetes.head()

X = diabetes[["Age", "BMI"]]   # the input variables
y = diabetes["Glucose"]        # the output variable, the one you want to predict
X = sm.add_constant(X)         # add an intercept (beta_0) to the model

# Note the argument order: sm.OLS takes y first, then X (unlike scikit-learn's fit(X, y))
model2 = sm.OLS(y, X).fit()
predictions = model2.predict(X)   # make the predictions with the model

# Print out the statistics
model2.summary()
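Statsmodels also offers an R-style formula interface, which adds the intercept automatically, and anova_lm (imported above but not used) produces an ANOVA table from a fitted model. A sketch on synthetic data shaped like the diabetes columns; the numbers are made up, not the real measurements.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# synthetic stand-in for the diabetes CSV (made-up data)
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"Age": rng.integers(21, 70, n), "BMI": rng.normal(32, 6, n)})
df["Glucose"] = 70 + 0.7 * df["Age"] + 0.9 * df["BMI"] + rng.normal(0, 25, n)

# the formula adds the intercept itself, so no add_constant call is needed
model = smf.ols("Glucose ~ Age + BMI", data=df).fit()
print(model.params)
table = anova_lm(model)   # sequential (type I) ANOVA table
print(table)
```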

OUTPUT:

HEAD DATA:

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1

OLS Regression Results

Dep. Variable:      Glucose            R-squared:              0.114
Model:              OLS                Adj. R-squared:         0.112
Method:             Least Squares      F-statistic:            49.33
Date:               Tue, 08 Nov 2022   Prob (F-statistic):     7.05e-21
Time:               22:28:35           Log-Likelihood:         -3703.7
No. Observations:   768                AIC:                    7413.
Df Residuals:       765                BIC:                    7427.
Df Model:           2
Covariance Type:    nonrobust

            coef    std err        t    P>|t|    [0.025    0.975]
const    70.2952      5.402   13.013    0.000    59.691    80.899
Age       0.6955      0.093    7.514    0.000     0.514     0.877
BMI       0.8589      0.138    6.220    0.000     0.588     1.130

Omnibus:          18.855   Durbin-Watson:        1.836
Prob(Omnibus):     0.000   Jarque-Bera (JB):    38.868
Skew:             -0.007   Prob(JB):           3.63e-09
Kurtosis:          4.102   Cond. No.              235.

PROGRAM:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import norm
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

mean = df.loc[:, 'Fare'].mean()
sd = df.loc[:, 'Fare'].std()

# x values at which to draw the curve (assumed range: mean +/- 4 standard deviations)
x_axis = np.linspace(mean - 4 * sd, mean + 4 * sd, 500)
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
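norm.pdf evaluates the normal density f(x) = 1 / (σ √(2π)) · exp(−(x − μ)² / (2σ²)), where μ is the mean and σ the standard deviation. A small sketch writing the formula out by hand and comparing it with SciPy at an arbitrary point:

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# the hand-written formula agrees with SciPy's norm.pdf
print(normal_pdf(1.3, 0.5, 2.0), norm.pdf(1.3, 0.5, 2.0))
```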
OUTPUT:

NORMAL CURVE:
PROGRAM:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

# density plot of each column
sns.kdeplot(df["Fare"])
plt.show()
sns.kdeplot(df["Age"])
plt.show()

# contour plot of the joint density of Fare and Parch
sns.kdeplot(data=df, x="Fare", y="Parch")
plt.show()
OUTPUT:

DENSITY PLOT:

CONTOUR PLOT:
PROGRAM:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

plt.figure(figsize=(8, 8))
sns.scatterplot(x="Age", y="Fare", hue="Sex", data=df)
plt.show()

df.corr(numeric_only=True)   # correlations between the numeric columns only

# plotting correlation heatmap
dataplot = sns.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)
# displaying heatmap
plt.show()
OUTPUT:

SCATTER PLOT:

HEAT MAP:
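Each cell of the heat map is a Pearson correlation coefficient: r = Σ (x_i − x̄)(y_i − ȳ) / √(Σ (x_i − x̄)² · Σ (y_i − ȳ)²). A sketch computing it by hand on four made-up Age/Fare pairs and comparing with DataFrame.corr():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [20, 30, 40, 50], "Fare": [10.0, 25.0, 20.0, 45.0]})   # made-up values
dx = df["Age"] - df["Age"].mean()      # -15, -5, 5, 15
dy = df["Fare"] - df["Fare"].mean()    # -15, 0, -5, 20

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)) = 500 / sqrt(500 * 650)
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
r_pandas = df.corr().loc["Age", "Fare"]
print(r_manual, r_pandas)
```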
PROGRAM:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

plt.hist(df["Fare"])   # returns (counts, bin edges, bar patches); 10 bins by default
OUTPUT:

HISTOGRAM:

(array([732., 106.,  31.,   2.,  11.,   6.,   0.,   0.,   0.,   3.]),
 array([  0.     ,  51.23292, 102.46584, 153.69876, 204.93168, 256.1646 ,
        307.39752, 358.63044, 409.86336, 461.09628, 512.3292 ]),
 <BarContainer object of 10 artists>)
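The tuple above holds the bar heights and the bin edges: 10 equal-width bins from the minimum to the maximum fare, each 512.3292 / 10 = 51.23292 wide. np.histogram computes the same two arrays without plotting; a sketch on made-up values:

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3, 9])
counts, edges = np.histogram(values, bins=4)   # 4 equal-width bins from min (1) to max (9)
# bins are [1, 3), [3, 5), [5, 7), [7, 9]; only the last bin includes its right edge
print(counts)   # [3 3 0 1]
print(edges)    # [1. 3. 5. 7. 9.]
```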


PROGRAM:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d   # registers the '3d' projection on older Matplotlib versions
%matplotlib inline

df = pd.read_csv("C:\\Users\\KSK\\Documents\\train.csv")
df.head()

# three-dimensional line: a helix
fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d')
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# three-dimensional scatter plot of Age, Parch and Fare, coloured by Fare
fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d')
zdata = df["Fare"]
xdata = df["Age"]
ydata = df["Parch"]
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');


OUTPUT:

THREE DIMENSIONAL LINES:

THREE DIMENSIONAL SCATTERPLOT:


PROGRAM:

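%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# draw_map is used below but not defined in this listing; this is an assumed minimal
# version: a shaded-relief background plus a latitude/longitude grid
def draw_map(m, scale=0.2):
    m.shadedrelief(scale=scale)
    m.drawparallels(np.linspace(-90, 90, 13), color='w')
    m.drawmeridians(np.linspace(-180, 180, 13), color='w')

# orthographic projection
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);

# mapping longitude and latitude: mark Seattle on a Lambert conformal conic map
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, width=8E6, height=8E6, lat_0=45, lon_0=-100)
m.etopo(scale=0.5, alpha=0.5)
x, y = m(-122.3, 47.6)   # convert (longitude, latitude) to map coordinates
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

# cylindrical projection
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None, llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180)
draw_map(m)

# pseudo-cylindrical (Mollweide) projection
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None, lat_0=0, lon_0=0)
draw_map(m)

# perspective (orthographic) projection
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)
draw_map(m);

# conic projection
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50, lat_1=45, lat_2=55,
            width=1.6E7, height=1.2E7)
draw_map(m)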

OUTPUT:

ORTHO PROJECTION:
MAPPING LONGITUDE AND LATITUDE:

CYLINDRICAL PROJECTIONS:
PSEUDO-CYLINDRICAL PROJECTIONS:

PERSPECTIVE PROJECTION:
CONIC PROJECTION:
