Disease Prediction
Using Patient Data
By
•P. Tanmai Sai (22HP1A4431)
•K. Hema (22HP1A4406)
•T. Sai Meghana (22HP1A4423)
•Ch. Navya Sri (22HP1A4418)
Aim of the Project
•To develop a machine learning
model that predicts possible
diseases based on the symptoms
provided by the user, enhancing
early detection and awareness.
Project Workflow
1.Problem Statement
•Predict diseases based on symptoms using machine learning.
2.Data Collection
•Gathered disease-symptom dataset.
3.Data Preprocessing
•Encoding labels.
•Handling missing data.
•Balancing the dataset.
4.Model Building
•Trained models: SVM, Naive Bayes, Random Forest.
5.Model Evaluation
•Confusion matrix and accuracy score for each model.
6.Model Integration
•Combined model predictions using majority voting.
7.Prediction System
•User inputs symptoms.
•System predicts possible disease.
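The preprocessing steps in the workflow (encoding labels, handling missing data, balancing the dataset) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The toy DataFrame, the column name `disease`, the helper `preprocess`, the decision to treat a missing symptom as absent, and the use of scikit-learn's `resample` for random oversampling are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import resample

# Hypothetical toy dataset: binary symptom columns plus a "disease" label.
df = pd.DataFrame({
    "itching":  [1, 1, 0, 0, np.nan, 0],
    "fatigue":  [0, 1, 1, 1, 1, 0],
    "vomiting": [0, 0, 1, 1, 0, 1],
    "disease":  ["Fungal infection", "Fungal infection", "Hepatitis",
                 "Hepatitis", "Hepatitis", "Gastroenteritis"],
})


def preprocess(frame, label_col="disease", seed=0):
    """Fill missing symptoms, encode labels, and oversample minority classes."""
    frame = frame.copy()
    symptom_cols = [c for c in frame.columns if c != label_col]

    # Handling missing data: assume an unrecorded symptom means "absent".
    for col in symptom_cols:
        frame[col] = frame[col].fillna(0).astype(int)

    # Encoding labels: map disease names to integers 0..n_classes-1.
    encoder = LabelEncoder()
    frame[label_col] = encoder.fit_transform(frame[label_col])

    # Balancing: keep every original row, then add randomly repeated rows
    # to each smaller class until it matches the largest class.
    target = frame[label_col].value_counts().max()
    parts = []
    for _, group in frame.groupby(label_col):
        parts.append(group)
        extra = target - len(group)
        if extra > 0:
            parts.append(resample(group, replace=True, n_samples=extra,
                                  random_state=seed))
    balanced = pd.concat(parts, ignore_index=True)
    return balanced[symptom_cols], balanced[label_col], encoder


X, y, encoder = preprocess(df)
print(y.value_counts().to_dict())   # class counts after balancing
print(list(encoder.classes_))       # integer code -> disease name
```

Random oversampling only repeats existing rows, so it should be applied to the training data and not to the data used for evaluation.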
How Each Model Is Used
•Dataset contains symptoms like itching, joint pain, rashes, ulcer, stomach pain, vomiting,
muscle wasting, burning micturition, spotting, fatigue, weight gain, anxiety, cold hands, etc.
•SVM Model:
• Converts symptom inputs into binary values (present = 1, absent = 0).
• Can capture patterns that involve combinations of symptoms, not just single symptoms.
• Classifies diseases by finding the best hyperplane separating classes.
•Naive Bayes Model:
• Assumes each symptom contributes independently to the disease.
• Calculates the probability of each disease based on the combination of symptoms
present.
• Trains and predicts quickly, even on large datasets, because it only estimates per-symptom probabilities for each disease.
•Random Forest Model:
• Builds many decision trees, each trained on a random sample of the data and random subsets of symptoms.
• Each tree votes for a disease prediction.
• Final decision is made based on the majority vote, improving robustness.
•Final Prediction:
• Outputs from all three models are combined using majority voting.
• Can improve accuracy, because an error by one model can be outvoted by the other two.
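The three models and the majority vote described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the synthetic data (three hypothetical diseases, six binary symptoms, 5% random noise), the linear SVM kernel, the choice of `BernoulliNB` as the Naive Bayes variant, and the helper `majority_vote` are all made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Synthetic data (assumption): each of three hypothetical diseases has a
# typical pattern over six binary symptoms (present = 1, absent = 0).
rng = np.random.default_rng(0)
prototypes = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
y = np.repeat(np.arange(3), 40)          # 40 patients per disease
X = prototypes[y].copy()
flip = rng.random(X.shape) < 0.05        # flip about 5% of entries as noise
X = np.where(flip, 1 - X, X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "SVM": SVC(kernel="linear"),         # kernel choice is an assumption
    "Naive Bayes": BernoulliNB(),        # NB variant is an assumption
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, predictions[name]))


def majority_vote(stacked):
    """Return the most frequent label per sample.

    `stacked` has shape (n_models, n_samples) and holds non-negative
    integer labels. If all models disagree, the smallest label wins.
    """
    return np.array([np.bincount(col).argmax() for col in stacked.T])


stacked = np.vstack(list(predictions.values()))
final = majority_vote(stacked)
print("Combined accuracy:", accuracy_score(y_test, final))
print(confusion_matrix(y_test, final))
```

With three voters and more than two diseases, a three-way disagreement is possible, so a real system needs an explicit tie-breaking rule. Here the smallest label is chosen only because that is how `argmax` behaves.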
Steps Involved
•Import necessary libraries (NumPy, Pandas, Seaborn, Scikit-learn).
•Load and preprocess dataset.
•Visualize class distribution.
•Apply oversampling to balance classes.
•Train classifiers:
• Support Vector Machine (SVM)
• Naive Bayes
• Random Forest
•Combine predictions by taking the mode (the most frequently predicted disease).
•Build prediction function for user inputs.
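The last step, a prediction function for user inputs, might look something like the sketch below. Everything specific in it is hypothetical: the symptom list `SYMPTOMS`, the tiny `TRAINING` set and its disease names, the helpers `encode` and `predict_disease`, and the model settings. It only shows the general idea of turning symptom names into a binary vector and taking the mode of the three model outputs.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# Hypothetical symptom vocabulary; column order defines the feature vector.
SYMPTOMS = ["itching", "skin_rash", "vomiting", "stomach_pain",
            "joint_pain", "fatigue"]

# Hypothetical toy training cases: disease -> list of symptom sets.
TRAINING = {
    "Fungal infection": [["itching", "skin_rash"], ["itching"],
                         ["skin_rash"], ["itching", "skin_rash"]],
    "Gastroenteritis": [["vomiting", "stomach_pain"], ["vomiting"],
                        ["stomach_pain"], ["vomiting", "stomach_pain"]],
    "Arthritis": [["joint_pain", "fatigue"], ["joint_pain"],
                  ["fatigue"], ["joint_pain", "fatigue"]],
}


def encode(symptoms):
    """Turn symptom names into a 1 x n_symptoms binary row (present = 1)."""
    present = {s.strip().lower().replace(" ", "_") for s in symptoms}
    unknown = present - set(SYMPTOMS)
    if unknown:
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    return np.array([[1 if s in present else 0 for s in SYMPTOMS]])


X = np.vstack([encode(case) for cases in TRAINING.values() for case in cases])
y = np.array([disease for disease, cases in TRAINING.items() for _ in cases])

models = [
    SVC(kernel="linear"),
    BernoulliNB(),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
for model in models:
    model.fit(X, y)


def predict_disease(symptom_text):
    """Predict from a comma-separated symptom string using the mode of votes."""
    features = encode(symptom_text.split(","))
    votes = [str(model.predict(features)[0]) for model in models]
    # most_common(1) returns the mode; on a tie the first-seen vote wins.
    final, _ = Counter(votes).most_common(1)[0]
    return {"votes": votes, "final": final}


print(predict_disease("Itching, skin rash"))
```

Rejecting unknown symptom names, as `encode` does here, is one possible design choice. A user-facing system could instead ignore them or suggest the closest known symptom.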
Conclusion
•Our project predicts likely diseases from
the common symptoms a patient reports.
•The combined model helps improve the
reliability of predictions.
•This system can assist in early disease
detection and provide preliminary awareness
to users.
Thank You
•Thank you for your attention!
•Questions are welcome.