Multiple Disease Prediction System Using Machine Learning

Parvati Balappa Todalabagi


Department of CSA
REVA University
Bangalore, India
parvatitodalabagi7@gmail.com

Abstract – The purpose of this paper is to predict multiple diseases. With multiple disease prediction, more than one disease can be predicted in one place, so the user does not need to visit different sites to get a prediction for each disease. Many existing systems can predict only one disease, and often with lower accuracy. As chronic diseases such as diabetes, heart conditions, and Parkinson’s become increasingly common, the need for early and accurate diagnosis is becoming more critical. In this study, we propose a unified machine learning framework capable of predicting all three conditions through hybrid modeling. We experimented with several algorithms for each disease and combined the top two using ensemble methods. These hybrid models achieved between 3% and 6% higher accuracy compared to individual models. The system was built using Streamlit, allowing for real-time predictions via a simple and interactive interface. Overall, the results suggest that hybrid approaches may help improve diagnostic accuracy in cases involving multiple chronic illnesses and may also help reduce the risk of incorrect diagnoses.

Keywords: Multiple Disease Prediction, Machine Learning, Hybrid Models, Streamlit, Ensemble Techniques, Health Diagnostics

I. Introduction

These days, illnesses like diabetes, heart disease, and Parkinson’s are still causing major health problems around the world. Doctors and researchers agree that catching these diseases early can make a big difference, but the usual methods aren’t always fast or accurate enough. Some of these tools focus too much on one disease at a time, and they often struggle when symptoms overlap. Lately, though, machine learning (ML) has started gaining traction as a better way to analyze a person’s health history and clinical signs to catch diseases faster and with better precision [1].

Limitations of Current Systems

Machine learning (ML) is showing a lot of potential for improving healthcare, but right now, most systems only focus on predicting one disease at a time. Many studies have used ML to tackle diseases like diabetes [4,5,6], heart disease [7,8,9,10], and Parkinson’s [11,12,13], but they rarely combine them into a single, unified system for multiple conditions. This way of working creates a few problems:

Single-Disease Focus: Most ML models are designed to spot just one condition, which means they’re not as useful for broader diagnostics.
Fragmented Infrastructure: Because different systems are used for different diseases, the whole setup becomes more complicated and less efficient.
Misdiagnosis Risk: People who already have one chronic illness are more likely to develop others, like heart disease or neurological disorders [2], so a system that checks for only one condition can miss the rest.
Proposed Solution

This paper proposes a multi-disease prediction system using machine learning models specifically designed for diabetes, heart disease, and Parkinson’s. To improve upon the limitations of existing systems, we use the following approach:
We first test a variety of ML algorithms, including SVC, Random Forest, Gradient Boosting, KNN, and Logistic Regression. From there, we select the top two algorithms for each disease based on how well they perform. To boost the system’s accuracy, we combine these models into hybrid ones using ensemble techniques. Once the models are trained, we deploy them through a simple, user-friendly web app built with Streamlit, which allows for real-time predictions. The final models are trained on clean, preprocessed data and then saved using Pickle for easy and efficient deployment.

II. Literature Survey

The majority of current research focuses on forecasting a single disease at a time, so users have to switch between models for conditions such as heart disease and diabetes, which makes the procedure time-consuming and inefficient. Furthermore, there is a greater chance of a mistake and of postponed treatment if a patient has several conditions but the algorithm can only predict one. The goal of this article is to use machine learning techniques to create a Multiple Disease Prediction Model that can diagnose several illnesses at once [2].

L. D. Gopisetti et al. [18] present a straightforward system to predict three common diseases (Diabetes, Heart Disease, and Parkinson’s) using machine learning. It employs a Support Vector Machine (SVM) to detect Diabetes and Parkinson’s, and Logistic Regression to detect Heart Disease. Everything is packaged in an easy-to-use application developed with Streamlit. The reported accuracies are high, ranging from 87.71% for Parkinson’s to 97.40% for Diabetes. The greatest benefit is the way the system integrates several models into a single easy-to-use platform. However, as it was only evaluated on comparatively small data sets and not in actual medical settings, its reliability in real-life situations remains unknown. Nonetheless, it is a step in the right direction toward making early diagnosis affordable and accessible to all.

Diabetes Prediction Using Machine Learning

Vinod Jain and Anuj Mangal [4] used machine learning techniques to predict diabetes. They evaluated how well Random Forest and Logistic Regression models predict the disease, using a Kaggle dataset that includes information on roughly 520 people in the 20–65 age range. They achieved 99.03% accuracy with the Random Forest model and 94.23% with Logistic Regression. They assessed the models’ prediction accuracy and report that K-Fold validation was used to increase it. The RF model’s accuracy of around 99% is far higher than what other researchers found [2][3]. In the future, the RF model with K-Fold could be used to predict more diseases, and the suggested K-Fold approach could be applied to more diabetes datasets.

Shrivastava Pragya et al. [5] investigate machine learning models for diabetes prediction; the highest accuracy, up to 94.87%, is demonstrated by SVM and ANN. The researchers concluded that machine learning improves early diagnosis, which helps with efficient disease management. They recommended employing ensemble approaches, enhancing feature selection, and refining supervised models. Problems with generalization and data quality were identified, underscoring the need for more study to integrate big data for clinical applications and to validate models across various datasets.

KM Jyoti Rani [6] assesses five machine learning models for diabetes prediction: Random Forest, K-Nearest Neighbors, Logistic Regression, Support Vector Machine, and Decision Tree. According to the study, Random Forest and Logistic Regression produced the best accuracy. The researchers concluded that machine learning improves patient
outcomes by effectively enhancing early diabetes detection. For improved performance, they recommended addressing class imbalance, improving data preparation, and honing feature selection. More research is required to integrate deep learning approaches, validate models on larger datasets, and enhance real-world clinical usage.

Nelly Elsayed et al. [14] introduce an affordable and efficient approach for early diabetes detection using an Extreme Learning Machine (ELM) trained on questionnaire-based data. The model reached an accuracy of 98.07% with zero false positives when using a multiquadratic activation function. Thanks to ELM’s fast training and strong generalization, achieved by randomly assigning hidden layer weights, the model runs quickly and doesn’t demand much computing power. It’s especially practical for rural or underserved communities. Still, since it depends on self-reported answers and hasn’t been compared with other algorithms, its scope is somewhat limited. Even so, it offers a valuable early-warning system that encourages timely medical consultation.

Lakhwani K. et al. [17] discuss how machine learning can help identify whether a person could have diabetes using a simple method called a Decision Tree. The model was trained on a small data set (the PIMA Indian data set) and reached an accuracy of 73.91%, which is quite good for such a rudimentary approach. One of the main advantages of a Decision Tree is that it is easy to understand: you can literally see how the model comes to its conclusions. Nevertheless, there are a few drawbacks. Firstly, the study only checks one particular approach, so we have no idea whether other models could make better decisions. Secondly, because the data set is relatively small, the outcome may not hold for bigger or more complex groups. Overall, it’s a solid beginning, but there’s certainly room for improvement with higher-end models and more data.

Heart Disease Prediction Using ML

I looked into a study by Nurbaity Sabri et al. [7], where they used Categorical Naïve Bayes for predicting heart disease. After balancing the dataset, the accuracy came out somewhere between 71% and 73%. It’s not the highest result out there, but it’s still useful, especially for areas where resources are limited. The authors thought it could get better results by adding more features and improving how the data is cleaned or preprocessed. They also mentioned that testing the model on more types of data could help it perform better in real-world situations.

In another paper, Shuge Ouyang [8] tested different machine learning models on the same problem. The results varied: Decision Tree had 77.55%, Random Forest got 82.17%, Naïve Bayes ranged from 74% to 86.1%, SVM reached 94.6%, and a hybrid GA-SVM scored 84.07%. SVM gave the best results overall. Random Forest also helped prevent overfitting. The study mentioned that there’s still a lot to improve, like making the models easier to interpret, optimizing parameters, and cleaning the data better. There was also a point about needing consistent medical data formats so that these models can be used in real clinics.

Then I read Narendra Mohan et al. [9], where they compared K-Nearest Neighbors, Naïve Bayes, Random Forest, and Logistic Regression. Logistic Regression came out as the most accurate at 90.2%. The rest were between 82% and 86.9%. They concluded that these models are helpful in predicting heart disease, and that Logistic Regression seemed the most stable. They recommended trying other models too and focusing on improving how data is prepared before training.

Finally, in the research by M. Gagoriyal and M. K. Khandelwal [10], they tried out six models: Random Forest, Decision Tree, Logistic Regression, SVM, KNN, and XGBoost. Their hybrid model gave the highest accuracy (91.3%), while KNN had the lowest at 82.8%. They believed ML can boost heart disease prediction. One interesting suggestion was to try cutting down on the number of features to make things more efficient. They also thought neural networks might be a good direction to explore. Still, just like the others, they stressed the need for testing on more varied datasets to make sure the models hold up in different cases.

Lakshmi A. et al. [16] analyze heart disease
prediction through machine learning, comparing models such as Decision Tree, Random Forest, SVM, and KNN, with the best performance achieved by Random Forest. A significant plus is the side-by-side comparison, which makes it easy to see which model performs better. Employing a common dataset further contributes to reproducibility and consistency. Nevertheless, the paper falls short in a few areas. It does not state how the data were cleaned or how the features were chosen, both important steps, particularly in healthcare-related research. The dataset itself is rather small and not diverse, which makes it more difficult to trust the results in real-world scenarios. The biggest deficiency is the lack of a deep consideration of key procedures such as preprocessing, which leaves the study feeling a bit incomplete, however promising its direction.

Parkinson’s Disease Prediction Using ML

While going through the work by Ezhilin Freeda S et al. [11], I noticed they focused on predicting Parkinson’s disease using machine learning models like Random Forest, Decision Tree, and XGBoost. XGBoost ended up giving the best results in terms of accuracy. What stood out to me is that they used speech data for detection, which seems pretty efficient. They did mention some improvements that could help, like selecting better features, cleaning the data more carefully, and even adding extra health indicators. They also pointed out that testing on bigger and more diverse data is important if this is ever going to be used seriously in hospitals.

Then, in another paper by C. Amali et al. [12], they tried combining Random Forest and XGBoost to create a hybrid model for predicting Parkinson’s. Again, XGBoost did really well. This study also added something new: they looked into using wearable devices and video recordings for early detection. I thought that was a cool addition. The authors suggested more work on real-time monitoring, picking out the right features, and making the model run more smoothly. Just like the previous study, they emphasized that testing on larger data sets and making the model explainable for doctors is still a big deal.

Lastly, in the paper by D. Ambujam Vigneswari and Aravinth J. [13], they tested several models like SVM, Naïve Bayes, KNN, Random Forest, Logistic Regression, and Decision Tree. Out of all those, SVM and Random Forest did the best, with accuracy going over 91%. Like the others, this research also leaned on speech data for predictions. They said using deep learning could push performance even more, and also talked about exploring non-invasive ways to gather useful health info. Same as the rest, they concluded that the models need to be validated with real-world data before anyone can rely on them in actual medical practice.

Sandhiya S et al. [15] report comparatively high accuracy on a biomedical voice dataset from the UCI repository. The advantages of the study lie in simplicity, replicability, and ease of visualizing outcomes. Nevertheless, it falls short because it considers accuracy alone, without incorporating performance metrics such as precision or F1-score. The dataset is small, and there was no cross-validation or hyperparameter tuning. The most significant deficiency is the lack of comparative analysis with other algorithms, which makes the study less credible and limits insight into how KNN behaves compared to competing models in real-world diagnostic applications.

III. Proposed Methodology

We’ve split the process into five simple steps that make sense when you see them laid out.

1. Data Acquisition and Preprocessing

For this project, we used three publicly available datasets, each one focused on a different disease:

Disease         Samples   Features   Target Variable
Diabetes        768       8          Outcome (0 or 1)
Heart Disease   303       13         Target (0 or 1)
Parkinson’s     2,105     34         Diagnosis (0 or 1)
Before diving into the models, we made sure the data was all set. Here’s what we took care of:
Feature scaling: To keep the features on a comparable scale, we adjusted them using normalization or standardization.
Converting categories: We changed any non-numeric data into numbers so the models could work with it.
Feature selection: We chose the most important features by looking at correlations or at what the models told us were key.

[Figures: For Diabetes, For Heart, For Parkinson’s]
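
The preprocessing steps above can be sketched in plain Python. This is only an illustration under our own assumptions, not the code used in the study: the function names (standardize, min_max, encode_categories, pearson, rank_features) are hypothetical, and in practice a machine learning library would normally handle these steps.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(column):
    """Standardization: subtract the mean, divide by the population std. dev."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # constant column carries no information
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def min_max(column):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def encode_categories(values):
    """Map each distinct label to an integer code (labels sorted for stability)."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Order feature names by absolute correlation with the target, strongest first."""
    return sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
```

For example, min_max([2, 4, 6]) gives [0.0, 0.5, 1.0], and rank_features puts a column that rises with the target ahead of one that is unrelated to it. Correlation-based ranking is only one of the two selection routes mentioned above; model-based importance is not shown here.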

2. Model Evaluation and Selection

When it came to modeling, we tested six different algorithms for each disease:
a. Logistic Regression (LR)
b. Support Vector Classifier (SVC)
c. K-Nearest Neighbors (KNN)
d. Random Forest (RF)
e. Gradient Boosting (GB)
f. AdaBoost
To make sure the models were reliable, we used 5-Fold Cross Validation. Then we evaluated them based on:
a. Accuracy
b. Precision, Recall, and F1-score
c. ROC-AUC Score

3. Hybrid Model Construction

Once we figured out which two models worked best for each disease, we thought: why settle for just one? Instead, we combined them to see if we could get even better results. It’s like getting a second opinion before making a decision. For each disease, we picked the top two models and blended their predictions using soft voting: each model outputs class probabilities, the probabilities are averaged, and the class with the higher average is the final result. This gave us a noticeable improvement, with accuracy going up by around 3 to 6 percent compared to using just one model on its own.
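
The evaluation, selection, and soft-voting steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the study’s implementation: the function names are hypothetical, and the fold accuracies in example_scores are made-up numbers chosen only to show the ranking, not results from this paper.

```python
from statistics import mean


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def top_two(fold_scores):
    """Rank algorithms by mean cross-validated accuracy and keep the best two."""
    ranked = sorted(fold_scores, key=lambda name: mean(fold_scores[name]),
                    reverse=True)
    return ranked[:2]


def soft_vote(probs_a, probs_b):
    """Average two models' positive-class probabilities, then pick the class.

    A sample is labeled 1 only if the averaged probability exceeds 0.5,
    so an exact tie falls to class 0.
    """
    averaged = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    return [1 if p > 0.5 else 0 for p in averaged]


# Made-up 5-fold accuracies for three algorithms (illustration only).
example_scores = {
    "LR": [0.80, 0.82, 0.78, 0.81, 0.79],
    "SVC": [0.83, 0.81, 0.84, 0.80, 0.82],
    "KNN": [0.74, 0.76, 0.75, 0.73, 0.77],
}
best_pair = top_two(example_scores)
```

With these made-up scores, best_pair is ["SVC", "LR"]. The middle sample in soft_vote([0.9, 0.4, 0.2], [0.7, 0.7, 0.1]) shows why averaging helps: one model leans negative (0.4) and the other positive (0.7), and the average of 0.55 settles it as positive.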
Here’s how we paired them up:
Diabetes: Support Vector Classifier + Logistic Regression
Heart Disease: Logistic Regression + K-Nearest Neighbors
Parkinson’s: Gradient Boosting + Random Forest
Once these hybrid models were tested, they consistently outperformed the single models. On average, we saw an increase in prediction accuracy of around 3–6%, which is a solid gain when you’re aiming for reliability in health-related predictions.

IV. Result
The proposed system successfully predicts three major diseases (Diabetes, Heart Disease, and Parkinson’s) using hybrid machine learning models. After evaluating multiple algorithms for each disease, the top two were combined using soft voting to enhance performance. The resulting hybrid models achieved strong accuracies: 98.18% for Diabetes (SVC + Logistic Regression), 90.16% for Heart Disease (Logistic Regression + KNN), and 95.23% for Parkinson’s (Gradient Boosting + Random Forest).
To make the system accessible and user-friendly, a web application was built using Streamlit. The interface allows users to enter medical inputs and instantly receive predictions for each disease. Screenshots of the prediction interfaces are shown in the figures below.
Fig. 1: User Interface for Diabetes Prediction
Fig. 2: User Interface for Heart Disease Prediction
Fig. 3: User Interface for Parkinson’s Disease Prediction
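
The save-then-serve flow behind such an interface might look roughly like the sketch below. It is a hedged illustration, not the application’s actual code: the dict-based model with "weights" and "bias" is a hypothetical stand-in for a trained classifier, and save_model, load_model, and predict_label are names we made up. Only the use of Pickle for persistence comes from the paper.

```python
import math
import pickle


def save_model(model, path):
    """Serialize a trained model object to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_label(model, inputs):
    """Turn one row of user inputs into a readable result.

    `model` is a hypothetical stand-in: a dict holding logistic-style
    'weights' and a 'bias'. A real deployment would load the trained
    hybrid classifier instead and call its own prediction method.
    """
    score = model["bias"] + sum(w * x for w, x in zip(model["weights"], inputs))
    probability = 1.0 / (1.0 + math.exp(-score))
    label = "Positive" if probability > 0.5 else "Negative"
    return label, probability
```

In a web app, the model would typically be loaded once at startup and predict_label called each time the user submits the form, with the returned label shown on the page.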
V. Conclusion

So, the main goal here was to see if it was actually possible to build one system that could predict more than one disease using machine learning. Most tools focus on just one thing at a time, but that doesn’t really reflect how messy real health problems can be. People often have overlapping symptoms or multiple chronic conditions. That’s where the idea came from: why not build a smarter system that does more?

We tested a bunch of models for each disease and then picked the two best for each. Turns out, combining those into hybrid models (using soft voting) gave a noticeable boost in accuracy. Not a massive jump, but it did go up by a few percent, and that’s not nothing when it comes to medical predictions. We even made a small web app for it using Streamlit, which lets people test it out live. It’s not perfect, but it shows that this kind of system could actually help in real life.

VI. Future Scope

First off, there’s definitely room to add more diseases. Right now it only covers diabetes, Parkinson’s, and heart disease, but things like cancer, or even early mental health issues, could be added in the future.

We didn’t get to connect the system to any live health data, but that’s something that could really improve things. If it pulled info from wearables or health apps, predictions could be much more personalized.

I kept thinking about deep learning too. We used pretty classic models here, but it might be worth trying neural networks next time, especially for the Parkinson’s data.

One problem with ML is that the predictions don’t always explain themselves. If doctors could see why the system made a call, they’d probably be more likely to trust it.

This whole thing still needs to be tested on real clinical data. Public datasets are fine for testing, but they don’t show how well it’d work in an actual hospital or clinic.

Making this into a mobile app or something cloud-based would be a smart move. It’d make it easier to access, especially in places with fewer resources.

And last, this system should really support other languages and local setups. Healthcare isn’t one-size-fits-all, so adjusting it for different regions could help it go much further.

References

[1]. S. Chaudhari, P. Deo, P. Deshmukh, A. Deshpande, Priya Shelke, & A. Chitre. (2023). Multiple disease prediction using ML learning algorithm. In Proceedings of the 7th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, August 18-19, 2023.
[2]. R. Shanthakumari, C. Nalini, S. Vinothkumar, E. M. Roopadevi, & B. Govindaraj. (2022). Multi disease prediction system using Random Forest algorithm in healthcare system. In Proceedings of the 2022 International Mobile and Embedded Technology Conference (MECON).
[3]. Mohammed Azeez, M. Adnan, M. Mehboob, M., & S. Patil. (2024). Multiple disease prediction system using machine learning. International Research Journal of Modernization in Engineering Technology and Science. Retrieved from www.irjmets.com
[4]. A. Mangal & V. Jain. (2022). Performance analysis of machine learning models for prediction of diabetes. In Proceedings of the 2nd International Conference on Innovative Sustainable Computational Technologies (CISCT). IEEE.
[5]. P. Shrivastava, Aradya Kumari, Shimpu Kumari, A., & Praveen Bajaj. (2023). A comprehensive review on the prediction of diabetes disease using machine learning. In Proceedings of the 11th International Conference on Intelligent Systems and Embedded Design (ISED).
[6]. KM Jyoti Rani. (2020). Diabetes prediction using machine learning. International Journal of Scientific
Research in Computer Science, Engineering and Information Technology.
[7]. N. Sabri, A. Shari, K. A. F. Abu Samah, M. R. M. Noordin, A. S. Shari, F. M. Ishak, W. M. Wan Mustapha, & M. F. A. N. R. Affendi. (2023). Heart disease prediction of an individual using Naïve Bayes algorithm. In Proceedings of the 2023 IEEE 11th Conference on Systems, Process & Control (ICSPC), Malacca, Malaysia, December 16, 2023.
[8]. Shuge Ouyang. (2022). Research of Heart Disease Prediction Based on Machine Learning. In Proceedings of the 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE).
[9]. Narendra Mohan, Vinod Jain, & Gauranshi Agrawal. (2021). Heart Disease Prediction Using Supervised Machine Learning Algorithms. In Proceedings of the 2021 5th International Conference on Information Systems and Computer Networks (ISCON), GLA University, Mathura, India, Oct 22-23, 2021.
[10]. Mamta Gagoriyal & M. K. Khandelwal. (2023). Heart disease prediction analysis using hybrid machine learning approach. In Proceedings of the 2023 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE).
[11]. Ezhilin Freeda S, Ezhil Selvan T C, & Vishnu Durai R S. (2022). Prediction of Parkinson's Disease Using XGBoost. In Proceedings of the 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS).
[12]. C. Amali, R. Rajesh, J. Sai Abrameyan, G. Rajasekar, & N. Mohammed Muhaseen. (2023). An Enhanced Hybrid Machine Learning Model for Diagnosis of Parkinson's Disease. In Proceedings of the International Conference on Innovative Data Communication Technologies and Application (ICIDCA-2023). IEEE Xplore.
[13]. D. Ambujam Vigneswari & Aravinth J. (2021). Parkinson's Disease Diagnosis Using Voice Signals by Machine Learning Approach. In Proceedings of the 2021 6th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), August 27-28, 2021.
[14]. N. Elsayed, Z. ElSayed, & M. Ozer. (2022). Early Stage Diabetes Prediction via Extreme Learning Machine. In Proceedings of the IEEE SoutheastCon 2022. IEEE.
[15]. Sandhiya S, Prabhu V, Ashok S, K. Mohanraj, G. Vishnu Vardhan Rao, & R. Azhagumurugan. (2022). Parkinson Disease Prediction using Machine Learning Algorithm. In Proceedings of the 2022 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS). IEEE.
[16]. A. Lakshmi & R. Devi. (2023). Heart Disease Prediction Using Enhanced Whale Optimization Algorithm Based Feature Selection With Machine Learning Techniques. In Proceedings of the 2023 12th International Conference on System Modeling & Advancement in Research Trends (SMART). IEEE.
[17]. K. Lakhwani, M. M. Bundele, S. Bhargava, D. Somwanshi, & K. K. Hiran. (2020). Prediction of the onset of diabetes using artificial neural network and Pima Indians diabetes dataset. In Proceedings of the 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE).
[18]. L. D. Gopisetti, S. Kuna, S. K. L. Kummera, N. Parsi, S. R. Pattamsetti, & H. P. Kodali. (2023). Multiple Disease Prediction System using Machine Learning and Streamlit. In Proceedings of the 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), Hyderabad, India, 2023.
