03-Supervised Machine Learning Classification

Final Project ML

Classification: Heart
Disease Prediction

IBM Machine Learning Professional Certificate

Course 03: Supervised Machine Learning: Classification

By Shubham Bohra
Contents
• Dataset Description
• Main objectives of the analysis.
• Applying various classification models.
• Machine learning analysis and findings.
• Model flaws and advanced steps.

Supervised Machine Learning: Classification 2

Data Description Section


Introduction

Predicting and diagnosing heart disease is one of the major challenges in the medical industry, because a diagnosis depends on several factors, including a physical examination and the various symptoms and signs the patient presents. Heart disease is considered one of the deadliest diseases in the world: a diseased heart cannot pump the amount of blood that the other organs need to perform their regular functions. Several factors affect heart disease, including but not limited to cholesterol levels, smoking habits, obesity, family history of disease, blood pressure, and work environment. Today, ML algorithms play an essential role in heart disease prediction. Rapid advances in technology allow machine learning to be integrated with Big Data tools to manage the exponentially growing volume of unstructured data, which includes medical data for patients around the world. Heart disease can be predicted from features such as age, sex, and heart rate, and earlier prediction can in turn help reduce the death rate among heart patients. In this report we use machine learning algorithms and the Python language to do that!


Dataset Description 01

• age: The person’s age in years
• sex: The person’s sex (1 = male, 0 = female)
• cp: chest pain type:
Value 0: asymptomatic
Value 1: atypical angina
Value 2: non-anginal pain
Value 3: typical angina
• trestbps: The person’s resting blood pressure (mm Hg on admission to the hospital).
• chol: The person’s cholesterol measurement in mg/dl.
• fbs: The person’s fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false).

Dataset Description 02

• restecg: resting electrocardiographic results
Value 0: showing probable or definite left ventricular hypertrophy by Estes’ criteria
Value 1: normal
Value 2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV).
• thalach: The person’s maximum heart rate achieved.
• exang: Exercise induced angina (1 = yes; 0 = no)
• oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot).
• slope: the slope of the peak exercise ST segment
0: upsloping; 1: flat; 2: downsloping
• ca: The number of major vessels (0–3)
• thal: A blood disorder called thalassemia
Value 0: NULL (dropped from the dataset previously)
Value 1: fixed defect (no blood flow in some part of the heart)
Value 2: normal blood flow
Value 3: reversible defect (a blood flow is observed but it is not normal)
• target: Heart disease (0 = no, 1 = yes)
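
Because several features are stored as integer codes, it can help to decode them into readable labels before plotting. The following is a minimal sketch of that step, assuming the data is held in a pandas DataFrame. The mappings are copied from the feature list above; the three sample rows and the names `CP_LABELS`, `SLOPE_LABELS`, and `sample` are made up for illustration.

```python
import pandas as pd

# Code-to-label mappings copied from the feature list above.
CP_LABELS = {0: "asymptomatic", 1: "atypical angina",
             2: "non-anginal pain", 3: "typical angina"}
SLOPE_LABELS = {0: "upsloping", 1: "flat", 2: "downsloping"}

# Three made-up rows, only to demonstrate the decoding step.
sample = pd.DataFrame({"cp": [0, 2, 3], "slope": [2, 1, 0]})

# Replace each integer code with its label; the original frame is untouched.
decoded = sample.assign(
    cp=sample["cp"].map(CP_LABELS),
    slope=sample["slope"].map(SLOPE_LABELS),
)
print(decoded)
```

The models themselves still need the numeric codes (or one-hot columns), so decoding of this kind is only useful for exploration and plots.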

Dataset Description 03
Checking for Null values

Great, there are no missing values within our features!
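
The check itself was shown as a screenshot that did not survive extraction. A null check of this kind can be sketched as follows, assuming the data is held in a pandas DataFrame. The small frame below is hypothetical and contains one deliberate gap so that the output is not all zeros; on the heart disease data the report finds no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame with one deliberately missing value.
df = pd.DataFrame({
    "age": [63, 37, 41],
    "chol": [233.0, np.nan, 204.0],
    "target": [1, 1, 0],
})

# Count of missing (NaN) values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# True if any column has at least one missing value.
has_missing = bool(missing_per_column.any())
```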


Data Analysis Section


Main Objective of the analysis:
In this section I show the correlation between the features to find the features that most influence our target, Target (heart disease existence).

After that I build different classification models using techniques such as grid search, ML pipelines, and hyperparameter tuning to get the best predictive model in terms of accuracy, and I discuss the flaws of each model.


Data Analysis 01
- Identifying categorical features and continuous features:
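
The code for this step is not preserved in the slides. One common way to separate the two groups is to count the distinct values in each column, sketched below under assumptions: the cut-off `MAX_CATEGORIES = 10` and the sample rows are hypothetical, and only the column names come from the dataset description.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the dataset description.
df = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 44, 52, 61, 48, 59, 50, 66],
    "sex": [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
    "cp": [3, 2, 1, 1, 0, 1, 2, 0, 2, 0, 1, 3],
    "chol": [233, 250, 204, 236, 354, 263, 199, 330, 275, 221, 244, 226],
})

MAX_CATEGORIES = 10  # assumed cut-off, not taken from the slides

# A column with few distinct values is treated as categorical.
categorical = [c for c in df.columns if df[c].nunique() <= MAX_CATEGORIES]
continuous = [c for c in df.columns if df[c].nunique() > MAX_CATEGORIES]

print("categorical:", categorical)
print("continuous:", continuous)
```

The cut-off is a heuristic: a numeric column with only a handful of observed values would be misclassified, so the resulting lists are worth checking against the dataset description.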


Data Analysis 02
Viewing the status of people in the data set:

We have 165 people with heart disease and 138 healthy people (about 54% and 46% of the 303 records), so the target variable we want to predict is reasonably balanced.
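
The class counts and shares can be reproduced with a short sketch. It rebuilds the target column from the two counts reported above rather than loading the original file, and the variable names are illustrative.

```python
import pandas as pd

# Target column rebuilt from the counts reported above:
# 165 people with heart disease (1) and 138 without (0).
target = pd.Series([1] * 165 + [0] * 138, name="target")

counts = target.value_counts()                         # absolute counts per class
shares = target.value_counts(normalize=True).round(3)  # fraction per class

print(counts)
print(shares)
```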


Data Analysis 03
Study of the relationship between the categorical features and heart disease:

cp (chest pain): People with chest pain of type cp = 1, 2, or 3 tend to have heart disease more often than people without any chest pain (cp = 0).

restecg (resting ECG results): People with a value of 1 (having an abnormal heart rhythm, which can range from mild symptoms to severe problems) are more likely to have heart disease.

exang (exercise-induced angina): People without exercise-induced angina (value 0) are more likely to have heart disease than those with exercise-induced angina (value 1).


Data Analysis 04
slope (the slope of the peak exercise ST segment): People with a downsloping slope (value 2) show signs of an unhealthy heart and are therefore more likely to have heart disease than people with an upsloping slope (value 0) or a flat slope (value 1: minimal change, typical of a healthy heart).

ca (number of major vessels (0-3)): the more blood flow, the better for the heart, so people with ca equal to 0 are more likely to have heart disease.

thal (a blood disorder called thalassemia): People with a thal value of 2 are more likely to have heart disease.
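
Statements like the ones above about categorical features can be checked with a row-normalised cross-tabulation, which gives the share of each target class per category value. This is a sketch of the technique only: the ten rows below are made up and are not the report's data.

```python
import pandas as pd

# Made-up rows for illustration only; not the report's data.
df = pd.DataFrame({
    "cp":     [0, 0, 0, 0, 1, 1, 2, 2, 2, 3],
    "target": [0, 0, 0, 1, 1, 1, 1, 1, 0, 1],
})

# Row-normalised table: for each cp value, the share of each target class.
rates = pd.crosstab(df["cp"], df["target"], normalize="index")
print(rates.round(2))
```

Reading along a row shows how often people with that category value have heart disease, which is the comparison the bar charts on these slides were making.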


Data Analysis 05
Study of the relationship between the continuous features and heart disease:

trestbps: A resting blood pressure above 130-140 mm Hg is a cause for concern.

chol: A cholesterol level above 200 mg/dL is a very dangerous indicator, as shown in the graphic above.

thalach: People with a maximum heart rate above 140 are more likely to have heart disease.
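
A threshold statement such as the one for thalach can be checked by splitting the rows at the threshold and comparing the heart disease rate on each side. The sketch below uses six made-up rows; only the threshold of 140 comes from the text.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "thalach": [150, 172, 160, 120, 130, 110],
    "target":  [1, 1, 1, 0, 0, 0],
})

THRESHOLD = 140  # heart-rate threshold quoted in the text

# Label each row by the side of the threshold it falls on.
group = df["thalach"].gt(THRESHOLD).map({True: "above 140", False: "140 or below"})

# Mean of a 0/1 target is the fraction of people with heart disease.
rate_by_group = df.groupby(group)["target"].mean()
print(rate_by_group)
```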


Data Analysis 06
- Studying the correlations between features using a heat map!

The goal of this matrix is to show the relationships between features, which is useful for feature engineering. What matters most to us in this analysis, however, is the relationship between the target variable (whether a person has heart disease or not) and the rest of the features, so our focus is on the last row of the matrix.

1. fbs and chol are the features least related to the target variable.

2. All other features have a comparatively strong correlation with the target variable.
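
The row of the matrix that relates each feature to the target can also be extracted and ranked directly, without reading it off the heat map. This is a minimal sketch on made-up rows, assuming Pearson correlation (the pandas default); the values it prints say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "thalach": [150, 172, 160, 120, 130, 110],
    "oldpeak": [0.0, 0.2, 0.5, 2.3, 1.8, 3.0],
    "fbs":     [0, 1, 0, 1, 0, 1],
    "target":  [1, 1, 1, 0, 0, 0],
})

corr = df.corr()  # Pearson correlation matrix of all columns

# Keep only the correlations with the target, excluding target itself.
with_target = corr["target"].drop("target")

# Rank features by the strength (absolute value) of that correlation.
order = with_target.abs().sort_values(ascending=False).index
ranked = with_target.reindex(order)
print(ranked.round(2))
```

Ranking by absolute value matters because a strongly negative correlation is as informative for prediction as a strongly positive one.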

Feature Engineering 01
Converting Categorical features into Numerical features :
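
The conversion code is not preserved in the slides. A common way to do it is one-hot encoding with `pandas.get_dummies`, sketched below on three made-up rows; whether the report used this exact call is an assumption.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "age": [63, 37, 41],
    "cp": [3, 2, 0],
    "thal": [1, 2, 3],
})

# One-hot encode the listed categorical columns; other columns pass through.
encoded = pd.get_dummies(df, columns=["cp", "thal"])
print(encoded.columns.tolist())
```

Each categorical column is replaced by one indicator column per observed value, so the models do not treat the integer codes as ordered quantities.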

Machine Learning
Analysis & Findings


Machine Learning Analysis & Findings

In the following analysis I compare four classification models, Logistic Regression, KNN, SVM, and XGBoost, in terms of predicting heart disease. I use the following techniques to help develop robust models:

Standard scaling, cross-validation, grid search, and metrics such as accuracy, precision, and F1 score.
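
As a reminder of what these metrics measure, here is a small worked sketch using scikit-learn on ten made-up labels. It also shows that support is a count of samples per class rather than a percentage. The label lists are hypothetical.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Ten made-up labels, only to show how the metrics are computed.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

# Share of all predictions that are correct: 7 of 10.
accuracy = accuracy_score(y_true, y_pred)

# Per-class arrays, ordered by label (index 0 = class 0, index 1 = class 1).
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None
)

print(f"accuracy:  {accuracy:.2f}")
print(f"precision: {precision[1]:.2f}")  # 4 true positives / 5 predicted positives
print(f"recall:    {recall[1]:.2f}")     # 4 true positives / 6 actual positives
print(f"f1-score:  {f1[1]:.2f}")
print(f"support:   {support.tolist()}")  # number of true samples per class
```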


Machine Learning Analysis 01
Data Splitting:
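
The splitting code is not preserved in the slides. A typical split with standard scaling can be sketched as follows. The data here is a synthetic stand-in with the same shape as the heart data (303 rows, 13 features), and `test_size=0.3`, `random_state=42`, and `stratify=y` are assumptions; a 30% split is consistent with the support value of 91 reported with the metrics in this report.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the heart data.
X, y = make_classification(n_samples=303, n_features=13, random_state=42)

# Assumed settings: 30% test split, stratified on the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training rows only keeps information about the test rows out of the preprocessing step.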


Machine Learning Analysis 02
Logistic Regression Model:
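
The model code was shown as a screenshot that is not preserved. A baseline of this kind might look like the sketch below, which chains scaling and a default logistic regression in a scikit-learn pipeline. The data is a synthetic stand-in, and `max_iter=1000` and the split settings are assumptions, so the printed accuracy is not the report's result.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the heart data.
X, y = make_classification(n_samples=303, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scaling and the classifier are fitted together on the training split.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.2f}")
```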


Machine Learning Analysis 03
Logistic Regression Model with penalty = L1:


Machine Learning Analysis 04
Logistic Regression Model with penalty = L2:
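
The two penalised variants can be compared side by side. The sketch below fits both on a synthetic stand-in dataset and counts how many coefficients each penalty drives to exactly zero. The `liblinear` solver and `C=0.1` are assumed values, not the report's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the heart data.
X, y = make_classification(n_samples=303, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

results = {}
for penalty in ("l1", "l2"):
    # liblinear supports both penalties; C=0.1 is an assumed strength.
    clf = LogisticRegression(penalty=penalty, solver="liblinear", C=0.1)
    pipe = make_pipeline(StandardScaler(), clf).fit(X_train, y_train)
    results[penalty] = {
        "accuracy": pipe.score(X_test, y_test),
        "zero_coefficients": int((pipe[-1].coef_ == 0).sum()),
    }

print(results)
```

The L1 penalty tends to set some coefficients to exactly zero, which acts as a form of feature selection, while the L2 penalty shrinks coefficients without usually eliminating them.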


Analysis & Findings
Logistic Regression Models Findings:

The best model in terms of prediction performance is Logistic Regression with penalty = L2:

• Accuracy : 80%
• Precision : 80%
• Recall : 80%
• F1-score : 80%
• Support : 91 test samples


Machine Learning Analysis 05
KNN Algorithm

• Accuracy : 84%
• Precision : 85%
• Recall : 84%
• F1-score : 83%
• Support : 91 test samples
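
A grid search for KNN of the kind described in this report can be sketched as follows, with scaling and the classifier combined in one pipeline so that each cross-validation fold is scaled on its own training part. The data is a synthetic stand-in, and the search range for `n_neighbors` and `cv=5` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the heart data.
X, y = make_classification(n_samples=303, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [3, 5, 7, 9, 11]}  # assumed search range

# 5-fold cross-validated search over the number of neighbours.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(f"best k: {best_k}, test accuracy: {test_accuracy:.2f}")
```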

Machine Learning Analysis 06
Support Vector Machine Model:

• Accuracy : 80%
• Precision : 80%
• Recall : 80%
• F1-score : 80%
• Support : 91 test samples
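
The same pipeline-plus-grid-search pattern applies to a support vector machine. In this sketch the grid over `C` and `kernel` is an assumed example, and the data is a synthetic stand-in, so the selected parameters and accuracy are not the report's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as the heart data.
X, y = make_classification(n_samples=303, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Assumed grid: three regularisation strengths and two kernels.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, f"test accuracy: {test_accuracy:.2f}")
```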

Machine Learning Analysis 07
XGBoost Algorithm


Machine Learning Analysis 08
XGBoost Algorithm

• Accuracy : 80%
• Precision : 82%
• Recall : 80%
• F1-score : 80%
• Support : 91 test samples

Machine Learning Analysis 09
XGBoost Algorithm

• Accuracy : 80%
• Precision : 82%
• Recall : 80%
• F1-score : 80%
• Support : 91 test samples

Machine Learning Analysis 10
Models Comparison

As shown in the previous analysis, all the models give good prediction results (accuracy between 80% and 84%), and the results are close to each other. In the end we must choose one model for our dataset, and the choice depends on the highest result.
Below, the models are ordered from best to worst:
1. KNN
2. XGBoost
3. Logistic Regression with L2
4. Support Vector Machine
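
The ordering can be reproduced from the metrics reported on the preceding slides. Because three models share 80% accuracy, the sketch below uses precision as a tie-breaker; that tie-breaker is an assumption, since the slides do not say how ties were resolved.

```python
# Metrics as reported on the preceding slides (percent).
reported = {
    "Logistic Regression with L2": {"accuracy": 80, "precision": 80, "recall": 80, "f1": 80},
    "KNN": {"accuracy": 84, "precision": 85, "recall": 84, "f1": 83},
    "Support Vector Machine": {"accuracy": 80, "precision": 80, "recall": 80, "f1": 80},
    "XGBoost": {"accuracy": 80, "precision": 82, "recall": 80, "f1": 80},
}

# Sort by accuracy, then precision (assumed tie-breaker), highest first.
# Python's sort is stable, so fully tied models keep their listed order.
ranking = sorted(
    reported,
    key=lambda name: (reported[name]["accuracy"], reported[name]["precision"]),
    reverse=True,
)

for position, name in enumerate(ranking, start=1):
    print(position, name)
```

Logistic Regression with L2 and the Support Vector Machine are tied on every reported metric, so their relative order is arbitrary.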


Models flaws and strengths
and advanced steps


Machine Learning Analysis 11
Model flaws and strengths, and further suggestions:

In terms of simplicity, Logistic Regression gave high predictive results while being the simplest and fastest model in terms of parameters and training. KNN gave the best results, but it is slower at prediction time, because classifying a single point requires calculating the distance from that point to every point in the training data.

XGBoost also performed very well, but in contrast to KNN its training took longer, since we used grid search to find the best-fitting parameters. In the end it is a trade-off: with a bigger dataset, such models can be expected to perform better, but training will take longer.
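
The prediction cost of KNN mentioned above is easy to see in a brute-force sketch: every prediction computes one distance per training row. The function `knn_predict` and the six 2-D points are hypothetical and written for illustration; library implementations may use tree-based indexes to avoid some of this work.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify one point by majority vote among its k nearest training points."""
    # One Euclidean distance per training row: this is the per-prediction cost.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the class labels among those neighbours and pick the most common.
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Made-up 2-D points for illustration only.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

label_a = knn_predict(X_train, y_train, np.array([0.05, 0.05]))
label_b = knn_predict(X_train, y_train, np.array([1.0, 0.95]))
print(label_a, label_b)
```

Training here amounts to storing the data, which is why KNN is fast to fit but grows slower to query as the dataset grows.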


Thank you
IBM Machine Learning Professional Certificate
Supervised Machine Learning: Classification
