28/06/2023, 00:19 Logistic Regression
Logistic Regression
Steps of Development of ML in Python
Importing necessary packages
Data preparation and preprocessing
Segregation of Data (Independent and Dependents)
Splitting the dataset into train data and test data
Choosing the model
Training the model
Testing model
Evaluation of the model
Prediction
Importing necessary packages
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
Data preparation and preprocessing
In [7]:
dataset = pd.read_csv("diabetes.csv")
dataset
Out[7]: Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction A
0 6 148 72 35 0 33.6 0.627
1 1 85 66 29 0 26.6 0.351
2 8 183 64 0 0 23.3 0.672
3 1 89 66 23 94 28.1 0.167
4 0 137 40 35 168 43.1 2.288
... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.171
764 2 122 70 27 0 36.8 0.340
765 5 121 72 23 112 26.2 0.245
766 1 126 60 0 0 30.1 0.349
767 1 93 70 31 0 30.4 0.315
768 rows × 9 columns
localhost:8888/nbconvert/html/ML/Logistic Regression.ipynb?download=false 1/4
In [8]:
print(dataset.columns)
print(dataset.shape)
print(dataset.info())
print(dataset.isnull().sum())
Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
dtype='object')
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
Segregation of Data (Independent and Dependents)
In [9]:
# Features: every column except the target; target: the Outcome column
X = dataset.drop('Outcome', axis=1)
Y = dataset['Outcome']
Splitting the dataset into train data and test data
In [10]:
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2)
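The split above draws a different random 20% each time it runs, so the scores further down will vary between runs. A minimal sketch of a reproducible, class-balanced split follows, assuming synthetic stand-in data; `X_demo`, `y_demo`, the seed `42` and the use of `stratify` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the features above (768 rows, 8 columns).
# The roughly one-third positive rate is an assumption made for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(768, 8))
y_demo = (rng.random(768) < 0.35).astype(int)

# random_state fixes the shuffle; stratify keeps the class ratio similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

With 768 rows and `test_size=0.2`, the test set holds 154 rows, which matches the 154 predictions printed later in this notebook.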
Choosing the model
In [11]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
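For orientation, logistic regression turns a weighted sum of the features into a probability with the sigmoid function and then thresholds that probability at 0.5. The sketch below shows only that idea in plain Python; the weights, bias and inputs are hypothetical values, not coefficients learned from the diabetes data.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_one(weights, bias, features):
    """Probability of class 1 for one sample under a linear score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical two-feature model, for illustration only.
weights = [0.8, -0.4]
bias = -0.2

p = predict_proba_one(weights, bias, [1.5, 0.5])  # score z = 0.8
label = int(p >= 0.5)                              # threshold at 0.5
print(p, label)
```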
Training the model
In [12]:
model.fit(X_train,Y_train)
C:\Users\chinu\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Out[12]: LogisticRegression()
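The ConvergenceWarning above names two remedies: raise `max_iter` or scale the data. A minimal sketch that applies both is shown below, assuming synthetic features on very different scales; `X_demo`, `y_demo`, `scaled_model` and the value `max_iter=1000` are illustrative and not taken from the original run.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features with very different ranges (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3)) * np.array([100.0, 10.0, 0.3]) + np.array([120.0, 30.0, 0.5])
noise = rng.normal(scale=20.0, size=300)
y_demo = (X_demo[:, 0] + 5.0 * X_demo[:, 1] + noise > 270.0).astype(int)

# Standardise each feature, then fit logistic regression with a higher iteration cap.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

# n_iter_ reports how many solver iterations were actually used.
n_iter = scaled_model.named_steps["logisticregression"].n_iter_
print(n_iter)
print(scaled_model.score(X_demo, y_demo))
```

Putting the scaler inside the pipeline means the same scaling learned during `fit` is applied automatically whenever `predict` is called.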
Testing model
In [13]:
predictions = model.predict(X_test)
print(predictions)
[0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0
1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1
0 1 0 0 0 0]
Evaluation of the model
In [25]:
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score
print(accuracy_score(Y_test,predictions))
print(recall_score(Y_test,predictions))
print(precision_score(Y_test,predictions))
print(confusion_matrix(Y_test,predictions))
0.7662337662337663
0.4807692307692308
0.7352941176470589
[[93 9]
[27 25]]
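The three scores printed above can be recomputed by hand from the confusion matrix, which helps when reading them. The sketch below uses the four counts from the output above and scikit-learn's layout (rows are actual classes, columns are predicted classes); the F1 score is an extra figure that the original cell does not print.

```python
# Counts taken from the confusion matrix printed above:
# [[TN FP]
#  [FN TP]]
tn, fp = 93, 9
fn, tp = 27, 25

total = tn + fp + fn + tp            # 154 test samples
accuracy = (tp + tn) / total         # share of all predictions that are correct
recall = tp / (tp + fn)              # share of actual positives that were found
precision = tp / (tp + fp)           # share of predicted positives that are correct
f1 = 2 * precision * recall / (precision + recall)  # not printed in the original cell

print(accuracy, recall, precision, f1)
```

The recall of about 0.48 means the model misses roughly half of the positive cases in this test split, even though accuracy is about 0.77.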
In [29]:
sns.distplot(predictions,hist=False, color = 'r', label = 'Predicted Values')
sns.distplot(Y_test, hist=False, color = 'b', label = 'Actual Values')
plt.legend(loc = "upper left")
plt.show()
C:\Users\chinu\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).
  warnings.warn(msg, FutureWarning)
Prediction
In [51]:
Pregnancies = 8
Glucose = 183
BloodPressure = 64
SkinThickness = 0
Insulin = 0
BMI = 23.3
DiabetesPedigreeFunction = 0.672
Age = 32
new_data = [[Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,
DiabetesPedigreeFunction,Age]]
prediction = model.predict(new_data)
print(prediction)
[1]
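`predict` returns only the class label. When the model was fitted on a DataFrame, passing the new sample as a one-row DataFrame with the same column names keeps the feature order explicit, and `predict_proba` shows how confident the model is. The sketch below assumes a small model trained on synthetic data; `demo_model`, `X_demo`, `y_demo` and the sample values are hypothetical, while the column names are the ones listed earlier in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

feature_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                 "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Synthetic training data (illustration only): the label depends on one column.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 8)), columns=feature_names)
y_demo = (X_demo["Glucose"] > 0).astype(int)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# A new sample as a one-row DataFrame with matching column names.
new_row = pd.DataFrame([[0.5, 1.2, 0.0, 0.0, 0.0, -0.3, 0.1, 0.4]],
                       columns=feature_names)

proba = demo_model.predict_proba(new_row)  # columns: P(class 0), P(class 1)
label = demo_model.predict(new_row)
print(proba, label)
```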