Diabetes Prediction Report

This document discusses a project that aims to predict diabetes using machine learning models. It analyzes a dataset of family members, some with diabetes and some without, to train models. The models take in inputs like pregnancies, glucose, skin thickness, age, blood pressure, and BMI to predict whether a person has diabetes. A literature review discusses previous research applying machine learning algorithms like KNN, SVM, Naive Bayes and logistic regression to clinical datasets, achieving prediction accuracies between 73-98%. The document then outlines the methodology, which applies supervised algorithms KNN, SVM, Naive Bayes and logistic regression to the dataset.

DATA ANALYTICS (CS-055, CS-065)

Project report
Diabetes prediction using machine learning
Submitted by:
AREESHA ABASS (CS-055)
MAHAM (CS-065)

ABSTRACT
Diabetes mellitus is one of the major diseases of this era. Contributing factors include people's
carelessness about their health (unhealthy food, irregular sleep, and lack of exercise) as well as
increasing age, hereditary diabetes, and high blood pressure. People who have diabetes are at high
risk of other conditions such as heart disease, kidney disease, eye problems, and nerve damage.
The current hospital procedure is to collect the information required for a diabetes diagnosis
through various tests and then give suitable therapy depending on the diagnosis. Big data
analytics is extremely important in the healthcare industry, which holds vast datasets. It can be
used to examine large datasets, uncover hidden information and trends, extract knowledge from the
data, and anticipate outcomes. The current prevalence of type 2 diabetes mellitus in Pakistan is
11.77%. In males, the prevalence is 11.20% and in females 9.19%. The mean prevalence in Sindh
province is 16.2% in males and 11.70% in females. Previous research used datasets of hospitalized
persons, but this research uses a dataset of my family members, some of whom have diabetes and
some of whom are free from the disease. Our diabetes prediction model, built with machine learning
in Python, predicts from the input data whether a given member has diabetes mellitus or is free
from the disease. The input data includes pregnancies, glucose, skin thickness, age, blood
pressure, and BMI; after these values are entered, the model predicts whether or not the person is
a diabetic patient.

INTRODUCTION
Diabetes is a major challenge for the healthcare community worldwide, and its effects are severe.
According to the World Health Organization (WHO), it was the seventh leading cause of death in
2016, and approximately 1.6 million people die of diabetes each year. WHO joined with partners
from around the world to highlight the impact of diabetes on World Diabetes Day in 2018. The issue
is getting worse every day: according to a WHO report, one in every three women is overweight.
Diabetes is considered a key cause of heart attacks, kidney failure, stroke, and blindness.
The National Institute of Diabetes and Digestive and Kidney Diseases provided the data to which
this project applies machine learning algorithms, including KNN, Naïve Bayes, SVM, and LR. The
main goal of this project is to diagnose whether or not a patient has diabetes based on previous
screening measurements, so that the model predicts diabetic patients with high accuracy.

LITERATURE REVIEW
The research in [1], presented in September 2018, focused on big data analytics, predictive
analytics, and machine learning in healthcare. It used the PIMA Indian dataset from the UCI
machine learning repository, which includes the records of 768 women with 9 features. The main
goal of this research was to predict whether a person is a diabetic patient or not according to
previous measurements. The research applied the machine learning algorithms SVM, KNN, LR, DT, RF,
and NB to the PIMA Indian dataset and compared their accuracy in predicting diabetes.

In research [2], the main goal is to decide whether a person is diabetic or not. To answer this
question, the authors used a convolutional neural network and conventional machine learning
methods, including Support Vector Machine, Random Forest, and others, and applied them to a
dataset of 768 instances with 8 attributes. The attributes include pregnancy count, plasma glucose
concentration, diastolic blood pressure, skin thickness, and serum insulin. These are used to
predict the person's sugar level, giving an accuracy of around 73.94 percent.

The research in [3] works on big data, which plays an important role in uncovering hidden values
in a dataset. The challenge of this research is to predict diabetes with high accuracy, and the
accuracy obtained is compared against that on a previous dataset. The research uses a pipeline
model to give accurate results, and the algorithms used include supervised, unsupervised, and
semi-supervised learning to determine the accuracy of the diabetes prediction results.

The research in [4] proposed a sugar level prediction with an accuracy of 97.11% and a sensitivity
of 96.25% by using a deep neural network and evaluation metrics, applied to the Pima Indian
Dataset (PID). This research uses deep learning to predict diabetes through steps including data
collection, data preparation, implementation of the deep network, and evaluation criteria, and it
also reports predicting diabetes with an accuracy of 98.35%.

The main motive of the research in [5] is to provide a model that predicts diabetes with maximum
accuracy. The authors apply machine learning algorithms including decision tree, SVM, and Naive
Bayes to the PIDD (Pima Indian Diabetes Database), which is collected from UCI. The research
reports the precision, F-measure, and accuracy of the model; the accuracy is around 76.30 percent.
These results are also verified using Receiver Operating Characteristic (ROC) curves.

The author of research [6] focuses mainly on machine learning, data mining, support vector
machines, artificial neural networks, and decision trees to predict diabetes through steps such as
splitting the data into training and testing sets, pre-processing, feature extraction, and
defining the target dataset. The research uses machine learning algorithms including the decision
tree method, artificial neural network, SVM, and Naive Bayes classifier, and evaluates them with
machine learning metrics consisting of precision, recall, and F1-score.

Machine learning offers different classifiers that help us in the real world. The main challenge
of the research in [7] is the accurate and robust prediction of diabetes mellitus. In this work,
machine learning models are built from different classifiers (K-Nearest Neighbor, Decision Trees,
Random Forest, AdaBoost, Naive Bayes, and XGBoost), and the weight of each model is estimated from
its area under the ROC curve (AUC). All of these classifiers are applied to the Pima Indian
dataset. The resulting framework for diabetes prediction reports metric values of 0.789, 0.934,
and 0.092, and an improvement of 2.00 percent in AUC.

Data plays an important role worldwide, and data mining is also significant in this era. The
research paper [8] takes into consideration both data and data mining techniques. It covers
results for diabetic patients' data concerning diabetic complications, prediction, and the
background of the patients. This is done with the help of machine learning algorithms such as SVM
(Support Vector Machine), employed with different types of learning: supervised approaches account
for 85 percent of the cases and unsupervised approaches for 15 percent.

Machine learning algorithms are used on complex real-world problems. The main purpose of the
research in [9] is to find the best machine learning algorithm for predicting diabetes from
clinical data. The machine learning algorithms are trained on different datasets; the algorithms
used are K-Nearest Neighbor (KNN), Random Forest (RF), Gradient Boosting (GB), Logistic Regression
(LR), and Support Vector Machine (SVM). The authors also increase accuracy through pre-processing
techniques such as label encoding and normalization. To assess the results, they compare their
models with previous ones and report an accuracy difference of around 2.71 percent to 13.13
percent. The research concludes by implementing the model in a smart web application developed
using Python, which can give higher accuracy in the prediction of diabetes mellitus.

Diabetes has become a very widespread disease nowadays. The research in [10] proposed a system
that determines with high accuracy whether a patient has type 1 or type 2 diabetes. To predict the
type of diabetes, the authors used parameters including pregnancies, skin thickness, blood
pressure, insulin, and BMI. The prediction is made using machine learning algorithms such as SVM,
ANN, DT, and LR, which are applied to the dataset to give better accuracy of diabetes prediction.

METHODOLOGY
FLOW CHART OF DIABETES PREDICTION METHODOLOGY

Data mining plays a significant role in the era of data: industries, companies, offices,
hospitals, and banking systems all rely on data, and various techniques, including machine
learning, are used to predict disease. This research therefore works on diabetes prediction using
various machine learning algorithms. Machine learning provides many tools and techniques that can
be used to transform raw data into a useful dataset. There are many types of machine learning
algorithms, but in this research paper we focus only on supervised algorithms: K-Nearest Neighbors
(KNN), Support Vector Machine (SVM), Naive Bayes, and Logistic Regression (LR). These algorithms
are applied to the PIMA Indian dataset. The steps used to predict diabetes are shown in the flow
chart in Figure 1.

Figure 1: Diabetes Prediction Methodology Flow Chart

In the previous models, three machine learning algorithms are used:

1. KNN (K-Nearest Neighbor)
2. Naïve Bayes
3. SVM (Support Vector Machine)

Now I use another algorithm, LR (Logistic Regression):

4. Logistic Regression Algorithm

1. KNN (K-NEAREST NEIGHBOR)
• KNN is a supervised machine learning algorithm, which means that the dataset used to train the
model is labeled. The algorithm can solve both classification and regression problems.
• K represents the number of nearest neighbors considered when predicting the unknown data point.
• KNN is a distance-based algorithm: it finds the classes of the nearest neighbors around the
unknown data point.

Let's take an example of a cats-and-dogs dataset in which 4 cats and 1 dog surround the unknown
data point. KNN calculates the distances from the unknown point to all the points and finds those
with the shortest distances. In Figure 2 the value of K is 5, so the algorithm predicts that the
unknown point is a cat.

Figure 2: This figure is used as an example of the KNN Algorithm

Mathematical Representation of KNN Algorithm:


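The KNN algorithm states that, for a given value of K, it finds the K nearest neighbors of an
unseen data point and then assigns to the unseen data point the class that has the most data
points among the K neighbors.

For the distance metric, we will use the Euclidean distance:

$$d(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x'_i)^2}$$

Finally, the input x is assigned to the class with the largest probability, where the probability
of class j is the fraction of the K nearest neighbors (the set A) that belong to class j:

$$P(y = j \mid X = x) = \frac{1}{K} \sum_{i \in A} I\left(y^{(i)} = j\right)$$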
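The KNN steps described in this section can be sketched in a few lines of Python. This is a
minimal illustration, not the project's actual code: the function names `euclidean` and
`knn_predict` and the toy cat/dog points are assumptions made up for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=5):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote over the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: four cats and one dog close to the query, one dog far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (1.4, 1.0), (5.0, 5.0)]
labels = ["cat", "cat", "cat", "cat", "dog", "dog"]
prediction = knn_predict(points, labels, query=(1.0, 1.1), k=5)
print(prediction)
```

With K = 5 the five closest points contain four cats and one dog, so the vote goes to "cat",
which mirrors the cats-and-dogs example above.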

2. NAÏVE BAYES ALGORITHM

• This algorithm is based on Bayes' theorem, with the assumption that all the attributes used to
predict the target value are independent of each other.
• It calculates the probability of each class and then selects the one with the highest
probability.
• It is commonly used for natural language processing (NLP) problems.

Bayes' theorem describes the probability of an event according to prior knowledge of conditions
that may be related to the event.

Mathematical Representation of Naïve Bayes Algorithm:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
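As a small worked example of Bayes' theorem, the sketch below computes the probability that a
person is diabetic given a high glucose reading. All of the probabilities are hypothetical
numbers chosen for illustration; they are not taken from the dataset used in this report.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical probabilities (assumptions for illustration only).
p_diabetic = 0.10              # P(A): prior probability of diabetes
p_high_given_diabetic = 0.80   # P(B|A): high glucose among diabetic people
p_high_given_healthy = 0.20    # P(B|not A): high glucose among non-diabetic people

# P(B) from the law of total probability.
p_high = (p_high_given_diabetic * p_diabetic
          + p_high_given_healthy * (1 - p_diabetic))

p_diabetic_given_high = posterior(p_diabetic, p_high_given_diabetic, p_high)
print(round(p_diabetic_given_high, 4))
```

A Naïve Bayes classifier repeats this calculation for every class, multiplying the per-attribute
likelihoods under the independence assumption, and picks the class with the highest result.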


3. SVM (SUPPORT VECTOR MACHINE)

• A support vector machine is also a supervised learning algorithm that is used for both
classification and regression problems.
• The goal of the SVM is to create a line, called the hyperplane, that separates the different
classes. Two other lines run parallel to the hyperplane; the distance between these two lines is
the margin, and the points nearest to these lines are called support vectors. The SVM is
illustrated in Figure 3:

Figure 3: SVM Illustration

Mathematical Representation of SVM Algorithm:

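Let's see the equations of SVM:

The distance d of any line ax + by + c = 0 from a given point (x0, y0) is given by:

$$d = \frac{|a x_0 + b y_0 + c|}{\sqrt{a^2 + b^2}}$$

Similarly, the distance of a point x0 from the hyperplane w^T x + b = 0 is given by:

$$d_H(x_0) = \frac{|w^T x_0 + b|}{\|w\|_2}$$

The Euclidean norm for the length of w is given by:

$$\|w\|_2 = \sqrt{w_1^2 + w_2^2 + \dots + w_n^2}$$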
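The point-to-hyperplane distance used by SVM can be checked numerically with a short sketch. The
helper names `norm` and `distance_to_hyperplane` and the example line 3x + 4y - 10 = 0 are
assumptions chosen for illustration.

```python
import math

def norm(w):
    """Euclidean norm (length) of the weight vector w."""
    return math.sqrt(sum(wi ** 2 for wi in w))

def distance_to_hyperplane(w, b, x):
    """Distance of point x from the hyperplane w . x + b = 0."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)

# Hypothetical example: the line 3x + 4y - 10 = 0 and the point (2, 6).
# |3*2 + 4*6 - 10| / sqrt(3^2 + 4^2) = 20 / 5
d = distance_to_hyperplane([3.0, 4.0], -10.0, [2.0, 6.0])
print(d)
```

In two dimensions this reduces to the familiar point-to-line distance; the support vectors are
the training points for which this distance to the separating hyperplane is smallest.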

4. LR (LOGISTIC REGRESSION ALGORITHM)

• Logistic regression is also a supervised (labeled dataset) algorithm used to estimate or
predict a target value. The nature of the target or dependent value is dichotomous, which means
there are only two possible classes.

Mathematical Representation of LR Algorithm:

• In mathematical representation, logistic regression models estimate P(Y=1) as a function of x:

$$P(Y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^T x)}}$$

It is one of the simplest machine learning algorithms and is usually applied to problems such as
diabetes prediction, spam detection, and cancer detection.
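To make the idea of estimating P(Y=1) as a function of x concrete, the following sketch applies
the logistic (sigmoid) function to a weighted sum of the inputs. The weights, bias, and the 0.5
decision threshold are hypothetical values for illustration, not fitted parameters from this
project.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated P(Y=1 | x) for a logistic regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict(weights, bias, x, threshold=0.5):
    """Class label: 1 (diabetic) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical model with two already-scaled features (assumed values).
weights = [1.5, 0.8]
bias = -1.0
print(predict_proba(weights, bias, [1.2, 0.5]))
print(predict(weights, bias, [1.2, 0.5]))
```

In practice the weights and bias are learned from the training data; the sketch only shows how a
fitted model turns inputs into one of the two classes.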

EXPLORATORY DATA ANALYSIS (EDA):

Do you want to make an amazing data science project, but need a dataset that is free from
mistakes? Is there any way to find out the details of the dataset?
Yes! Exploratory Data Analysis (EDA) is a method for finding out the details of the dataset
related to your project. It also serves other purposes:
• To conduct preliminary data analysis to discover variations
• To identify outliers
• To test hypotheses
• To validate assumptions using statistical results and visualization
TECHNIQUES OF EDA:
• UNIVARIATE NON-GRAPHICAL
The simplest EDA technique, in which only a single variable of the data is analyzed. Because only
one variable is involved, the analyst does not deal with relationships, so this technique does not
show the complete picture of the data.
• UNIVARIATE GRAPHICAL
In univariate analysis, we pick one variable (column) from the dataset and determine the output on
that basis. Because the outputs for a single variable may overlap with each other, we move from
univariate to bivariate analysis. Univariate graphical techniques include graphs such as
stem-and-leaf plots, histograms, and box plots.
• BIVARIATE ANALYSIS
The technique in which data experts take two variables and determine the output. It deals with the
relationship between the two variables.
• MULTIVARIATE NON-GRAPHICAL
This technique includes more than two variables. Non-graphical multivariate analysis demonstrates
the relationships between more than two variables with the help of statistics and
cross-tabulation.
• MULTIVARIATE GRAPHICAL
This technique includes more than two variables and shows the relationships between them
graphically. The graphs can be a bar chart, heat map, bubble chart, run chart, scatter plot, or
multivariate chart.
EDA TECHNIQUES APPLYING TO DIABETES DATABASE

1. Univariate Analysis:

When I apply the univariate technique to the diabetes dataset, I take only one variable and then
determine the output.
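A non-graphical univariate summary of a single column can be sketched with Python's standard
library. The glucose values below are hypothetical and stand in for one column of the dataset.

```python
import statistics

# Hypothetical glucose readings (assumed values, not taken from the dataset).
glucose = [85, 89, 116, 137, 148, 183]

# Summary statistics for this one variable.
summary = {
    "count": len(glucose),
    "mean": statistics.mean(glucose),
    "median": statistics.median(glucose),
    "stdev": statistics.stdev(glucose),
    "min": min(glucose),
    "max": max(glucose),
}
print(summary)
```

The same numbers are what a histogram or box plot of the column would show graphically.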

2. Bivariate Analysis

In this technique, I take two variables and determine the result or output from the relationship
between them.
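One common way to quantify the relationship between two variables is the Pearson correlation
coefficient; using it here is an assumption, since the report does not name a specific measure.
The `pearson` helper and the sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired values for two columns (assumed, for illustration).
age = [21, 30, 35, 42, 50]
glucose = [85, 110, 120, 140, 165]
print(round(pearson(age, glucose), 3))
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one. A
feature heat map is typically a grid of such coefficients for every pair of columns.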

3. Multivariate Analysis

In this technique, we can determine the relationship between more than two variables and see the
output through the graph.

4. Missing Values:

In the missing-values step, we find out which columns or variables have missing values in the
dataset and handle them by filling in those missing values to complete the columns.
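The missing-value step can be sketched as follows. Filling with the column median is an
assumption made for this example; the report does not state which fill strategy was used, and the
column values are hypothetical.

```python
import statistics

def fill_missing_with_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [value for value in column if value is not None]
    median = statistics.median(observed)
    return [median if value is None else value for value in column]

# Hypothetical columns with missing entries marked as None.
columns = {
    "Glucose": [148, None, 183, 89],
    "BMI": [33.6, 26.6, None, 28.1],
}

# Count the missing entries per column, then fill them.
missing_counts = {name: sum(v is None for v in values) for name, values in columns.items()}
filled = {name: fill_missing_with_median(values) for name, values in columns.items()}
print(missing_counts)
print(filled)
```

The median is less sensitive to outliers than the mean, which is why it is a frequent default for
clinical measurements.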

RESULTS
This research works on diabetes prediction by applying machine learning algorithms (KNN, NB, SVM,
and Logistic Regression) to the PIMA Indian dataset and comparing their accuracy. The dataset is
divided into 75% for training and 25% for testing. The figure below shows the features of the
dataset's columns and rows (data points).
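A 75%/25% split like the one described here can be sketched with the standard library. The
function name `split_rows`, the fixed seed, and the use of row indices in place of real records
are assumptions for illustration.

```python
import random

def split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Row indices standing in for the 768 records of the PIMA Indian dataset.
rows = list(range(768))
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting avoids any ordering in the file leaking into the train or test set,
and fixing the seed makes the split repeatable.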
The feature heat map in Figure 4 shows the attributes as a graphical image in which the data
values are represented through colors, which helps you to see your dataset in depth.

Figure 4: Featured Heat map

Figure 5 represents the accuracy of the diabetes prediction after applying the machine learning
algorithms. In the confusion matrix, (0,0) means true negative and (1,1) means true positive. The
cell (0,1) counts cases where a person is negative but the predicted value is positive (false
positive), and the cell (1,0) counts cases where the person is positive but the predicted value is
negative (false negative). The model predicts the diagonal cells accurately, and it may become
more accurate if multiple models are combined in an ensemble.
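The confusion-matrix cells and the accuracy derived from them can be computed with a short
sketch. The label lists below are hypothetical and are not the results of this project.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 matrix indexed as matrix[actual][predicted] for labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def accuracy(matrix):
    """Share of correct predictions: (TN + TP) / all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical actual and predicted labels (assumed values).
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
print(cm, acc)
```

Here `cm[0][0]` is the true-negative count, `cm[1][1]` the true-positive count, `cm[0][1]` the
false positives, and `cm[1][0]` the false negatives.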

Figure 5: Accuracy of Algorithms


REFERENCES

[1] Sarwar, M.A., Kamal, N., Hamid, W. and Shah, M.A., 2018, September. Prediction of
diabetes using machine learning algorithms in healthcare. In 2018 24th international conference
on automation and computing (ICAC) (pp. 1-6). IEEE.
[2] Yahyaoui, A., Jamil, A., Rasheed, J., and Yesiltepe, M., 2019, November. A decision support
system for diabetes prediction using machine learning and deep learning techniques. In 2019 1st
International Informatics and Software Engineering Conference (UBMYK) (pp. 1-4). IEEE.

[3] Mujumdar, A. and Vaidehi, V., 2019. Diabetes prediction using machine learning
algorithms. Procedia Computer Science, 165, pp.292-299.

[4] Ayon, S.I. and Islam, M.M., 2019. Diabetes prediction: a deep learning
approach. International Journal of Information Engineering and Electronic Business, 12(2),
p.21.

[5] Sisodia, D. and Sisodia, D.S., 2018. Prediction of diabetes using classification
algorithms. Procedia computer science, 132, pp.1578-1585.

[6] Sonar, P. and JayaMalini, K., 2019, March. Diabetes prediction using different machine
learning approaches. In 2019 3rd International Conference on Computing Methodologies and
Communication (ICCMC) (pp. 367-371). IEEE.

[7] M. K. Hasan, M. A. Alam, D. Das, E. Hossain, and M. Hasan, "Diabetes Prediction Using
Ensembling of Different Machine Learning Classifiers," in IEEE Access, vol. 8, pp. 76516-76531,
2020, DOI: 10.1109/ACCESS.2020.2989857.

[8] Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I. and Chouvarda, I.,
2017. Machine learning and data mining methods in diabetes research. Computational and
structural biotechnology journal, 15, pp.104-116.

[9] Ahmed, N., Ahammed, R., Islam, M.M., Uddin, M.A., Akhter, A., Talukder, M.A.A. and
Paul, B.K., 2021. Machine learning-based diabetes prediction and development of smart web
application. International Journal of Cognitive Computing in Engineering, 2, pp.229-241.

[10] Naz, H. and Ahuja, S., 2020. Deep learning approach for diabetes prediction using PIMA
Indian dataset. Journal of Diabetes & Metabolic Disorders, 19(1), pp.391-403.

[11] M. A. Sarwar, N. Kamal, W. Hamid, and M. A. Shah, "Prediction of Diabetes Using
Machine Learning Algorithms in Healthcare," 2018 24th International Conference on
Automation and Computing (ICAC), 2018, pp. 1-6, DOI: 10.23919/IConAC.2018.8748992.

[12] Bano Farhana et al 2021 J. Phys.: Conf. Ser. 2089 012002

100% (10)
Introduction To Data ScienceA Python Approach To Concepts, Techniques and Applications PDF
227 pages
EBOOK - Python Crash Course For Data Analysis
100% (12)
EBOOK - Python Crash Course For Data Analysis
168 pages
Hackers Guide To Machine Learning With Python PDF
100% (15)
Hackers Guide To Machine Learning With Python PDF
272 pages
Learning The Pandas Library Python Tools For Data Munging Analysis and Visual PDF
100% (18)
Learning The Pandas Library Python Tools For Data Munging Analysis and Visual PDF
208 pages
Python Cheat Sheets
97% (33)
Python Cheat Sheets
11 pages
Data Analysis From Scratch With Python - Beginner Guide Using Python, Pandas, NumPy, Scikit-Learn, IPython, TensorFlow and
100% (10)
Data Analysis From Scratch With Python - Beginner Guide Using Python, Pandas, NumPy, Scikit-Learn, IPython, TensorFlow and
104 pages


Figure 1: Diabetes Prediction Methodology Flow Chart

1. KNN (K-Nearest Neighbor)

Now we use another algorithm, LR (Logistic Regression).

4. Logistic Regression Algorithm

Figure 2: This figure is used as an example of the KNN Algorithm

Mathematical Representation of KNN Algorithm:

For the distance metric, we use the Euclidean distance:
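As a minimal sketch of how the Euclidean distance drives a KNN prediction, the snippet below classifies one query point by majority vote among its k nearest neighbours. The function names, the two features (glucose, BMI), and the toy values are assumptions made for illustration only; they are not taken from the project's dataset or code.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    # Sort training points by their distance to the query, closest first.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean_distance(pair[0], query),
    )
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Made-up toy data: (glucose, BMI) -> 1 = diabetic, 0 = not diabetic.
train_X = [(85, 22.0), (90, 24.5), (150, 33.0), (165, 35.5), (140, 31.0)]
train_y = [0, 0, 1, 1, 1]

prediction = knn_predict(train_X, train_y, (155, 34.0), k=3)
print(prediction)
```

In practice the features would usually be scaled first, since a wide-ranging feature such as glucose otherwise dominates the distance.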

2. NAÏVE BAYES ALGORITHM

Mathematical Representation of Naïve Bayes Algorithm:
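Naïve Bayes scores each class by its prior probability multiplied by the per-feature likelihoods, treating the features as conditionally independent given the class. The sketch below assumes a Gaussian likelihood for each feature, which is one common choice and not necessarily the variant used in this project. All function names and the toy data are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log of the normal density, computed in log space to avoid underflow."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature (mean, variance) for every class."""
    model = {}
    for label in set(y):
        rows = [row for row, target in zip(X, y) if target == label]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_nb(model, row):
    """Return the class with the largest log prior plus log likelihoods."""
    def log_posterior(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            gaussian_log_pdf(v, mean, var) for v, (mean, var) in zip(row, stats)
        )
    return max(model, key=log_posterior)


# Made-up toy data: (glucose, BMI) -> 1 = diabetic, 0 = not diabetic.
X = [(85, 22.0), (90, 24.5), (150, 33.0), (165, 35.5), (140, 31.0)]
y = [0, 0, 1, 1, 1]
nb_model = fit_gaussian_nb(X, y)
print(predict_nb(nb_model, (155, 34.0)))
```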

3. SVM (SUPPORT VECTOR MACHINE)

Figure 3: SVM Illustration

Mathematical Representation of SVM Algorithm:

Let’s see the equation of SVM:

The distance from a point to the hyperplane is given below:

The Euclidean norm (length) of w is given by:
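For a linear SVM, the separating hyperplane is the set of points x with w·x + b = 0, and the distance from a point x to it is |w·x + b| / ||w||. The sketch below works through that arithmetic. The weight vector, bias, and sample point are arbitrary illustrative values, not parameters learned from the diabetes data.

```python
import math


def euclidean_norm(w):
    """Length of the weight vector: sqrt(w_1^2 + ... + w_n^2)."""
    return math.sqrt(sum(w_i ** 2 for w_i in w))


def decision_value(w, b, x):
    """Signed value of w.x + b; its sign gives the predicted side."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w.x + b = 0."""
    return abs(decision_value(w, b, x)) / euclidean_norm(w)


# Arbitrary illustrative values (not fitted to any data).
w, b = (3.0, 4.0), -5.0
point = (3.0, 4.0)

print(euclidean_norm(w))                    # length of w
print(decision_value(w, b, point))          # w.x + b
print(distance_to_hyperplane(w, b, point))  # |w.x + b| / ||w||
```

Training an SVM amounts to choosing w and b so that this distance is as large as possible for the training points closest to the hyperplane (the support vectors).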

4. LR (LOGISTIC REGRESSION ALGORITHM)

Mathematical Representation of LR Algorithm:

• In its mathematical representation, logistic regression models estimate P(Y=1) as a function
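In the standard formulation, logistic regression passes a linear combination of the inputs through the sigmoid function, so P(Y=1) = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn)). The sketch below shows that calculation. The weights, bias, and the two features (glucose, BMI) are hypothetical values chosen for illustration, not coefficients fitted in this project.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimate P(Y=1) as the sigmoid of bias + weights . features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients for (glucose, BMI); not fitted to real data.
weights, bias = (0.05, 0.1), -9.0

p_diabetic = predict_proba(weights, bias, (150, 33.0))
predicted_class = 1 if p_diabetic >= 0.5 else 0
print(p_diabetic, predicted_class)
```

A threshold of 0.5 is the usual default for turning the probability into a class label; a screening setting might lower it to catch more positive cases.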

EXPLORATORY DATA ANALYSIS (EDA):

Figure 4: Feature Heat Map

Figure 5: Accuracy of Algorithms

[11] M. A. Sarwar, N. Kamal, W. Hamid, and M. A. Shah, "Prediction of Diabetes Using

[12] Bano Farhana et al 2021 J. Phys.: Conf. Ser. 2089 012002
