Logistic Regression

The document outlines a logistic regression analysis of a heart disease dataset containing various patient features and a target variable indicating the presence of heart disease. It walks through data loading, inspection, a train/test split, model training, and evaluation, reporting perfect accuracy, precision, and recall on the held-out test set. A confusion matrix heatmap illustrates the model's performance.


logisticregression

[1]: import numpy as np


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

[2]: heartdf=pd.read_csv("heart.csv")

[3]: heartdf.head()

[3]:    Unnamed: 0  age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  \
     0           0   63    1   3       145   233    1        0      150      0
     1           1   37    1   2       130   250    0        1      187      0
     2           2   41    0   1       130   204    0        0      172      0
     3           3   56    1   1       120   236    0        1      178      0
     4           4   57    0   0       120   354    0        1      163      1

        oldpeak  slope  ca  thal  target
     0      2.3      0   0     1       1
     1      3.5      0   0     2       1
     2      1.4      2   0     2       1
     3      0.8      2   0     2       1
     4      0.6      2   0     2       1
[4]: heartdf.tail()

[4]:      Unnamed: 0  age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  \
     298         298   57    0   0       140   241    0        1      123      1
     299         299   45    1   3       110   264    0        1      132      0
     300         300   68    1   0       144   193    1        1      141      0
     301         301   57    1   0       130   131    0        1      115      1
     302         302   57    0   1       130   236    0        0      174      0

          oldpeak  slope  ca  thal  target
     298      0.2      1   0     3       0
     299      1.2      1   0     3       0
     300      3.4      1   2     3       0
     301      1.2      1   1     3       0
     302      0.0      1   1     2       0
[5]: heartdf.shape

[5]: (303, 15)

[6]: heartdf.columns

[6]: Index(['Unnamed: 0', 'age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
            'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
           dtype='object')
[7]: heartdf.dtypes

[7]: Unnamed: 0 int64


age int64
sex int64
cp int64
trestbps int64
chol int64
fbs int64
restecg int64
thalach int64
exang int64
oldpeak float64
slope int64
ca int64
thal int64
target int64
dtype: object
[10]: heartdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 15 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  303 non-null    int64
 1   age         303 non-null    int64
 2   sex         303 non-null    int64
 3   cp          303 non-null    int64
 4   trestbps    303 non-null    int64
 5   chol        303 non-null    int64
 6   fbs         303 non-null    int64
 7   restecg     303 non-null    int64
 8   thalach     303 non-null    int64
 9   exang       303 non-null    int64
 10  oldpeak     303 non-null    float64
 11  slope       303 non-null    int64
 12  ca          303 non-null    int64
 13  thal        303 non-null    int64
 14  target      303 non-null    int64
dtypes: float64(1), int64(14)
memory usage: 35.6 KB
[11]: heartdf.describe()

[11]:        Unnamed: 0         age         sex          cp    trestbps        chol  \
      count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000
      mean   151.000000   54.366337    0.683168    0.966997  131.623762  246.264026
      std     87.612784    9.082101    0.466011    1.032052   17.538143   51.830751
      min      0.000000   29.000000    0.000000    0.000000   94.000000  126.000000
      25%     75.500000   47.500000    0.000000    0.000000  120.000000  211.000000
      50%    151.000000   55.000000    1.000000    1.000000  130.000000  240.000000
      75%    226.500000   61.000000    1.000000    2.000000  140.000000  274.500000
      max    302.000000   77.000000    1.000000    3.000000  200.000000  564.000000

                    fbs     restecg     thalach       exang     oldpeak       slope  \
      count  303.000000  303.000000  303.000000  303.000000  303.000000  303.000000
      mean     0.148515    0.528053  149.646865    0.326733    1.039604    1.399340
      std      0.356198    0.525860   22.905161    0.469794    1.161075    0.616226
      min      0.000000    0.000000   71.000000    0.000000    0.000000    0.000000
      25%      0.000000    0.000000  133.500000    0.000000    0.000000    1.000000
      50%      0.000000    1.000000  153.000000    0.000000    0.800000    1.000000
      75%      0.000000    1.000000  166.000000    1.000000    1.600000    2.000000
      max      1.000000    2.000000  202.000000    1.000000    6.200000    2.000000

                     ca        thal      target
      count  303.000000  303.000000  303.000000
      mean     0.729373    2.313531    0.544554
      std      1.022606    0.612277    0.498835
      min      0.000000    0.000000    0.000000
      25%      0.000000    2.000000    0.000000
      50%      0.000000    2.000000    1.000000
      75%      1.000000    3.000000    1.000000
      max      4.000000    3.000000    1.000000
[12]: heartdf.isnull().sum()

[12]: Unnamed: 0    0
      age           0
      sex           0
      cp            0
      trestbps      0
      chol          0
      fbs           0
      restecg       0
      thalach       0
      exang         0
      oldpeak       0
      slope         0
      ca            0
      thal          0
      target        0
      dtype: int64
[13]: heartdf.duplicated().sum()

[13]: 0

[14]: x=heartdf.drop('target',axis=1)
y=heartdf['target']
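
One optional refinement, not part of the original notebook: the 'Unnamed: 0' column is just the saved row index and carries no patient information, so it could be excluded from the feature matrix. A minimal sketch, assuming the same heartdf DataFrame:

x = heartdf.drop(columns=['Unnamed: 0', 'target'])   # hypothetical variant: drop the saved index as well
y = heartdf['target']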

[15]: from sklearn.model_selection import train_test_split

[16]: xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2,random_state=42)

[17]: from sklearn.linear_model import LogisticRegression

[18]: classifier=LogisticRegression()

[21]: classifier.fit(xtrain,ytrain)

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\linear_model\_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[21]: LogisticRegression()
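
The warning above suggests scaling the features or raising max_iter. A minimal sketch of that fix, not part of the original notebook, assuming the same xtrain/ytrain split (StandardScaler, make_pipeline, and the max_iter parameter are standard scikit-learn facilities):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Standardise the features, then refit logistic regression with a larger
# iteration budget so the lbfgs solver can converge without the warning.
scaled_classifier = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_classifier.fit(xtrain, ytrain)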
[22]: ypredict=classifier.predict(xtest)

[23]: ypredict[0:5]
[23]: array([0, 0, 1, 0, 1], dtype=int64)

[25]: from sklearn import metrics

[27]: conmatrix=metrics.confusion_matrix(ytest,ypredict)
conmatrix

[27]: array([[29,  0],
             [ 0, 32]], dtype=int64)

[30]: acc=metrics.accuracy_score(ytest,ypredict)
prec=metrics.precision_score(ytest,ypredict)
rec=metrics.recall_score(ytest,ypredict)
print("Accuracy: ",acc," Precision: ",prec,"
Recall: ",rec)

Accuracy: 1.0 Precision: 1.0 Recall: 1.0
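
For a fuller per-class summary (an optional addition, not in the original notebook), scikit-learn's classification_report combines precision, recall, F1-score, and support for both classes on the same test split:

# Per-class precision, recall, F1-score, and support.
print(metrics.classification_report(ytest, ypredict))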

[33]: plt.title("Confusion matrix")
sns.heatmap(pd.DataFrame(conmatrix),annot=True,cmap='YlGnBu')
plt.show()
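
A small optional refinement to the plot above, not part of the original notebook: labelling the heatmap axes makes it explicit which dimension holds the actual classes and which holds the predictions:

plt.title("Confusion matrix")
sns.heatmap(pd.DataFrame(conmatrix), annot=True, cmap='YlGnBu')
plt.xlabel("Predicted label")   # columns of the confusion matrix
plt.ylabel("Actual label")      # rows of the confusion matrix
plt.show()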

[ ]:
