A PROJECT REPORT

DIABETES PREDICTION USING MACHINE

LEARNING

Submitted to
KIIT Deemed to be University

In Partial Fulfilment of the Requirement for the Award of

BACHELOR’S DEGREE IN
INFORMATION TECHNOLOGY

BY
ANAND PRAKASH 22053924
TUSHIT MITTAL 22053995
AMAN YADAV 22051316
AAROH THAPA 22051477
HARSHDEEP DAS 2205378

UNDER THE GUIDANCE OF

Dr. MUKESH KUMAR

SCHOOL OF COMPUTER ENGINEERING

KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
BHUBANESWAR, ODISHA - 751024
April 2025
KIIT Deemed to be University
School of Computer Engineering
Bhubaneswar, ODISHA 751024

CERTIFICATE
This is to certify that the project entitled
DIABETES PREDICTION USING MACHINE LEARNING
submitted by

ANAND PRAKASH 22053924

TUSHIT MITTAL 22053995
AAROH THAPA 22051477
AMAN YADAV 22051316
HARSHDEEP DAS 2205378

is a record of bonafide work carried out by them, in partial fulfilment of the
requirement for the award of the Degree of Bachelor of Engineering (Computer
Science & Engineering OR Information Technology) at KIIT Deemed to be University,
Bhubaneswar. This work was done during the year 2024-2025, under our guidance.

Date: 07/04/2025

Dr. MUKESH KUMAR

Project Guide
Acknowledgements

We are profoundly grateful to Dr. MUKESH KUMAR, School of Computer
Engineering, KIIT University. His able and mature guidance, insights, advice,
co-operation, suggestions, keen interest, and constant encouragement throughout
the period of the project made it possible for us to complete this work.

We are very grateful to all the faculty members of our college for the precious
time and untiring effort they spent acquainting us with the nuances and
intricacies of the work.

We extend our sincere gratitude to our guide for providing us the opportunity and
resources to work on this project. Undertaking the training and the project
simultaneously was a great learning experience, which enriched our knowledge and
developed our outlook towards becoming better professionals.

It is our pleasant duty to thank everyone who directly or indirectly extended a
helping hand during the course of this project. Above all, we gratefully
acknowledge the constant support, encouragement, and patience of our families and
friends throughout the project. We hope this project will be an asset to our
academic profiles. Thank you!

ANAND PRAKASH
TUSHIT MITTAL
AAROH THAPA
AMAN YADAV
HARSHDEEP DAS
ABSTRACT
Diabetes mellitus is a chronic metabolic disorder that poses serious health risks and
has become increasingly prevalent across the globe. Early detection of diabetes is
critical in reducing complications and improving patient outcomes. With the
advancement of artificial intelligence and data analytics, machine learning has
shown significant promise in enhancing diagnostic accuracy and supporting clinical
decision-making. This study explores the application of various machine learning
algorithms to predict diabetes using features derived from medical records. The
analysis focuses on key clinical parameters such as glucose level, insulin, BMI, and
age, which are known indicators of diabetes risk.

In this work, models such as Logistic Regression, Support Vector Machine,
Random Forest, and Decision Trees were implemented and evaluated. These
models were trained and tested using publicly available datasets, including the
PIMA Indian Diabetes dataset. Evaluation metrics such as accuracy, precision,
recall, and F1-score were used to compare model performance. Among the tested
approaches, ensemble methods like Random Forest achieved superior results,
indicating better generalization and predictive capability. This research highlights
the potential of machine learning in developing reliable and efficient diagnostic
tools to aid healthcare professionals in the early detection of diabetes.

Keywords: Diabetes Prediction, Machine Learning, PIMA Dataset, Classification,
Medical Diagnosis, Random Forest, Logistic Regression, Healthcare Analytics
Contents
1 Introduction
2 Basic Concepts / Literature Review
2.1 Basic Concepts in Machine Learning for Classification
2.2 Data Preprocessing Techniques
2.3 Performance Evaluation Metrics
2.4 Review of Related Work
2.5 Conclusion of Literature Review
3 Problem Statement / Requirement Specifications
3.1 Project Planning
3.2 Project Analysis
3.3 System Design
3.3.1 Design Constraints
3.3.2 Block Diagram
4 Implementation
4.1 Data Acquisition and Preprocessing Implementation
4.2 Model Development and Training Implementation
4.2.1 Model Selection
4.2.2 Model Training
4.2.3 Hyperparameter Tuning
4.2.4 Model Saving
4.3 Model Evaluation and Comparison Implementation
4.4 Testing / Verification Plan
4.5 Result Analysis
5 Standards Adopted
5.1 Coding Standards
5.2 Data Handling and Ethics
5.3 Model Evaluation Standards
5.4 Documentation Standards
6 Conclusion and Future Scope
6.1 Conclusion
6.2 Future Scope
References
Individual Contribution Report
Plagiarism Report
Diabetes Prediction Using Machine Learning

Chapter 1

Introduction
Diabetes mellitus is a chronic and potentially life-threatening disease affecting
millions globally. Characterized by high blood sugar levels due to insulin
resistance or insufficient insulin production, diabetes can lead to severe health
complications including cardiovascular disease, kidney failure, and vision
impairment. According to the International Diabetes Federation, approximately 537
million adults were living with diabetes in 2021, and this number is expected to
rise significantly in the coming years. In a country like India, where the burden
of non-communicable diseases is high, early detection and proactive
management of diabetes are critical.

Traditional methods for diabetes diagnosis often rely on blood tests and clinical
evaluation, which may not always be accessible or affordable, particularly in
resource-constrained settings. In this context, machine learning presents an
effective alternative for predictive diagnostics by leveraging patterns in medical
data to forecast disease onset.

This project focuses on building a machine learning-based system to predict the
likelihood of diabetes using patient health data. The dataset used for this study is
the PIMA Indian Diabetes dataset, which includes features such as glucose level,
BMI, blood pressure, insulin, and number of pregnancies. Multiple classification
algorithms—including Logistic Regression, Decision Trees, Random Forests,
Support Vector Machines, and XGBoost—are explored and evaluated to identify
the most accurate and reliable model.

By applying systematic preprocessing, feature engineering, and performance
evaluation techniques, this project aims to demonstrate how machine learning
can enhance preventive healthcare. The ultimate goal is to assist healthcare
professionals in identifying high-risk patients, enabling timely intervention and
improved patient outcomes.

School of Computer Engineering, KIIT, BBSR 1

CHAPTER 2: BASIC CONCEPTS / LITERATURE REVIEW

Machine learning (ML) has emerged as a powerful tool in the field of medical
diagnostics, particularly for classification problems such as disease prediction. In
the context of diabetes, various ML models have been applied to analyze patient
data and detect patterns that may indicate a higher risk of developing the disease.
The foundation of this approach lies in supervised learning, where models are
trained on labeled datasets to learn relationships between input features and
outcomes.

2.1 Basic Concepts in Machine Learning for Classification

Supervised Learning: This involves training models on input-output
pairs. In the case of diabetes prediction, the model learns to associate input
features (like glucose level, BMI, insulin, etc.) with an output label
(diabetic or non-diabetic).

Classification Algorithms: Various algorithms can be used for
classification tasks. Each comes with its own strengths and is suitable for
different data distributions and problem types.

Logistic Regression: A statistical model that estimates the probability of a
binary outcome.

Decision Trees: A tree-like model used for making decisions based on
feature values.

Random Forest: An ensemble learning method that combines multiple
decision trees to improve predictive performance and reduce overfitting.

Support Vector Machines (SVM): A robust classifier that finds the
optimal hyperplane separating classes in high-dimensional space.

K-Nearest Neighbors (KNN): A non-parametric method that classifies
based on the most common label among the nearest data points.

XGBoost: A gradient boosting framework known for its speed and
accuracy in structured data problems.
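
To make the idea of a classifier concrete, the K-Nearest Neighbors rule described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used in the project: the function name `knn_predict`, the toy points, and the choice of k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled (glucose, BMI) pairs; 1 = diabetic, 0 = non-diabetic.
toy_points = [(0.2, 0.3), (0.3, 0.2), (0.25, 0.35),
              (0.8, 0.7), (0.9, 0.8), (0.85, 0.75)]
toy_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_points, toy_labels, (0.82, 0.72)))
```

An odd k is used so that a two-class vote cannot end in a tie.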

2.2 Data Preprocessing Techniques

Effective data preprocessing is essential to ensure the quality and reliability of
ML models:

Handling Missing Values: Missing data is often filled using mean,
median, or other imputation techniques.

Normalization and Scaling: Standardizing the data helps in improving
the convergence of models.

Outlier Detection: Statistical techniques like Z-score or IQR are used to
detect and manage outliers that can skew results.

Feature Selection: Identifying and selecting the most relevant features
can enhance model accuracy and reduce training time.
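
The first three techniques above can be illustrated with Python's standard `statistics` module. The sketch below is a minimal example under stated assumptions: the helper names (`impute_median`, `standardize`, `iqr_outliers`), the use of `None` to mark a missing reading, and the glucose values are hypothetical and are not taken from the project code or the dataset.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical glucose readings with one missing entry and one extreme value.
glucose = [85, 90, None, 100, 105, 110, 300]
filled = impute_median(glucose)
scaled = standardize(filled)

print(filled)
print(iqr_outliers(filled))
```

In a real pipeline the median, mean, and standard deviation would be computed on the training portion only and then reused for the validation and test data, so that no information leaks from the held-out records.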

2.3 Performance Evaluation Metrics

To comprehensively assess classifier performance, we employ:

Accuracy: Proportion of correct predictions over all instances.

Precision: Fraction of true positives among predicted positives,
indicating how many positive predictions are false alarms.

Recall (Sensitivity): Fraction of true positives among actual
positives, indicating detection capability.

F1 Score: Harmonic mean of precision and recall, balancing the two.

AUC (Area Under the ROC Curve): Measures discrimination ability
across classification thresholds.

AUPR (Area Under the Precision-Recall Curve): Focuses on
performance for the positive class, particularly informative for
imbalanced data.
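
The first four metrics follow directly from the confusion-matrix counts. The short sketch below shows the arithmetic; the function name `classification_metrics` and the ten example labels are assumptions made for illustration and are not results from this project.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here one diabetic patient is missed (a false negative) and one healthy patient is flagged (a false positive), so precision and recall are both 3/4 while accuracy is 8/10.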

2.4 Review of Related Work

Numerous studies have shown the efficacy of ML in diabetes prediction. For
example, research published in medical journals and on platforms like Kaggle
demonstrates how models trained on the PIMA Indian Diabetes dataset have
achieved accuracies ranging from 70% to 85%, depending on preprocessing and
algorithm choice.

Ensemble methods like Random Forest and boosting techniques such as
XGBoost have consistently outperformed individual models in terms of accuracy
and robustness. Feature importance analysis in these studies highlights the
critical role of glucose levels, BMI, and insulin in predicting diabetes risk.
Additionally, hybrid approaches combining feature selection and ensemble
learning have shown promise in enhancing diagnostic performance.

Recent developments also focus on explainable AI (XAI), which aims to make
model decisions transparent and interpretable to healthcare professionals,
thereby increasing trust in automated prediction systems.

2.5 Conclusion of Literature Review

The collective findings from existing literature support the use of ML models for
early diabetes prediction. They also emphasize the importance of proper data
handling, algorithm selection, and evaluation metrics in developing an effective
diagnostic tool. This project builds upon these insights to design and implement
a predictive system tailored to the PIMA dataset, with the objective of
contributing to accessible and scalable healthcare analytics.

Chapter 3

Problem Statement / Requirement Specifications

Diabetes mellitus is a chronic metabolic disorder that, if left undiagnosed or
untreated, can lead to serious complications such as heart disease, kidney failure,
nerve damage, and vision problems. A significant number of individuals with
diabetes remain undiagnosed due to lack of early symptoms or access to
diagnostic facilities. Therefore, there is a growing need for automated systems
that can assist in the early detection and prediction of diabetes based on patient
data.

This project aims to develop a machine learning-based classification system that
can predict whether a person is likely to have diabetes using a set of
physiological and clinical attributes. The primary goal is to build a model that
can serve as a preliminary diagnostic tool for healthcare professionals or as a
personal health assistant for patients, especially in resource-constrained settings.

The model will be trained and validated using the PIMA Indian Diabetes
Dataset, which contains multiple medical variables collected from female
patients of Pima Indian heritage aged 21 years and older.

Objectives:

To build an intelligent system capable of predicting diabetes status based
on a patient’s health attributes.

To evaluate and compare the performance of different machine learning
algorithms on the dataset.

To use data preprocessing and feature selection techniques to improve the
accuracy and reliability of the prediction.

To identify the most influential features contributing to diabetes
prediction.

To present the model performance using evaluation metrics such as
accuracy, precision, recall, F1-score, and ROC-AUC.

Functional Requirements:

Data Input Module: Load and preprocess the dataset, including handling
missing values, normalization, and data splitting.

Model Training Module: Train models using Logistic Regression,
Decision Tree, Random Forest, SVM, KNN, and XGBoost classifiers.

Evaluation Module: Evaluate model performance with various metrics
and a confusion matrix.

Visualization Module: Generate visualizations such as heatmaps, pair
plots, ROC curves, and feature importance charts.

Reporting Module: Output final accuracy scores and identify the
best-performing model for prediction.

Non-Functional Requirements:

Accuracy: The model should aim for high accuracy (ideally 75% or
above) with balanced precision and recall.

Scalability: The system should be capable of being expanded with more
data and additional features.

Usability: Results should be easy to interpret for non-technical users.

Efficiency: The model should be optimized to run efficiently on standard
computing resources.

3.1 Project Planning:

Phase | Duration (Estimated) | Key Milestones
Phase 1: Data Collection & Preparation | 4 Weeks | Identification of relevant datasets, Data acquisition, Data cleaning, Handling missing values, Feature engineering, Data splitting
Phase 2: Model Development & Training | 8 Weeks | Implementation and training of multiple machine learning models (e.g., Logistic Regression, SVM, Decision Tree, Random Forest, etc.), Hyperparameter tuning
Phase 3: Model Evaluation & Comparison | 4 Weeks | Evaluation of model performance using appropriate metrics, Comparison of model results, Selection of the best-performing model(s)
Phase 4: Conceptual UI Design | 3 Weeks | User interface requirements defined, Wireframes and mockups for potential web/application interface created
Phase 5: Documentation & Reporting | 2 Weeks | Project documentation completed, Final report generated

3.2 Project Analysis:

1. Data Quality and Ambiguity
a. Assess completeness and consistency of clinical measurements
b. Verify class balance and address any skew through resampling if
necessary

2. Model Selection Criteria

a. Ensure fair comparison by fixing random seeds and
hyperparameter grids
b. Avoid data leakage by strictly separating training, validation, and
test sets

3. Performance Trade-offs
a. Balance sensitivity (recall) against specificity and precision,
depending on clinical priorities
b. Determine acceptable thresholds for false negatives to minimize
missed diagnoses

4. Resource Constraints
a. Estimate computation time for each model training and tuning step
b. Plan for hardware limitations (CPU/GPU availability, memory)
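
The performance trade-off in point 3 can be made concrete by sweeping the decision threshold applied to a model's predicted probabilities. The following is a minimal sketch under stated assumptions: `rates_at_threshold`, the scores, and the three thresholds are hypothetical and do not come from the trained models in this report.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (recall, specificity, precision) when predicting 1 for score >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, specificity, precision


# Hypothetical labels and predicted probabilities of diabetes for eight patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.75):
    recall, specificity, precision = rates_at_threshold(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  recall={recall:.2f}  "
          f"specificity={specificity:.2f}  precision={precision:.2f}")
```

Lowering the threshold catches more true cases (fewer missed diagnoses) at the cost of more false alarms, which is why the acceptable false-negative rate should be fixed before a threshold is chosen.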

3.3 System Design:

3.3.1 Design Constraints:

Programming Language: Python

Development Environment: Jupyter Notebook / Google Colab / VS Code

Libraries Used:

pandas, numpy – for data manipulation and preprocessing

matplotlib, seaborn – for data visualization

scikit-learn – for machine learning models and evaluation metrics

xgboost – for gradient boosting model.

3.3.2 Block Diagram:

The flowchart illustrates a standard machine learning pipeline. It starts
with Data Validation and Preprocessing. Then, Feature Engineering
prepares the data for Model Selection and Training using various
algorithms. Finally, Model Evaluation helps choose the best model.

Chapter 4

Implementation

This section outlines the planned approach for implementing the Diabetes
Prediction System. The implementation will be an iterative process, starting with
data exploration and model development.

4.1. Data Acquisition and Preprocessing Implementation:

 4.1.1. Dataset Identification: Identify and acquire a suitable dataset for
diabetes prediction. Prioritize datasets relevant to the demographic and
health characteristics of the population in Patna, Bihar, if available.
Publicly available datasets like the Pima Indians Diabetes Database will be
used as an initial resource if local data is not immediately accessible.
 4.1.2. Data Exploration and Analysis: Perform exploratory data analysis
(EDA) to understand the characteristics of the dataset, identify potential
issues (e.g., missing values, outliers), and gain insights into the
relationships between features and the target variable (diabetes status).
 4.1.3. Data Cleaning and Handling Missing Values: Implement
strategies to handle missing values, such as imputation techniques (e.g.,
mean, median, mode) or removal of incomplete records, based on the
extent and nature of the missing data.
 4.1.4. Feature Engineering (If Applicable): Explore the creation of new
features from existing ones that might improve the predictive power of the
models (e.g., BMI calculation from weight and height).
 4.1.5. Feature Scaling and Normalization: Apply appropriate scaling or
normalization techniques (e.g., standardization, min-max scaling) to
ensure that features with different ranges do not disproportionately
influence the machine learning models.
 4.1.6. Data Splitting: Divide the preprocessed dataset into training,
validation, and testing sets to train the models, tune hyperparameters, and
evaluate their final performance on unseen data.
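
The data splitting step (4.1.6) can be sketched as a reproducible, stratified three-way split. This is an illustrative sketch only: the function `stratified_split`, the 60/20/20 fractions, the seed value, and the label counts are assumptions chosen for the example, not the settings used in the experiments.

```python
import random


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split row indices into train/validation/test sets, preserving class ratios."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then cut them by the requested fractions.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical labels: 65 non-diabetic (0) and 35 diabetic (1) records.
labels = [0] * 65 + [1] * 35
train, val, test = stratified_split(labels)

print(len(train), len(val), len(test))
```

Splitting each class separately keeps the proportion of diabetic records roughly equal across the three sets, and scaling or imputation statistics should then be fitted on the training indices only.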

4.2. Model Development and Training Implementation:

 4.2.1. Model Selection: Implement the chosen machine learning models
using libraries like scikit-learn in Python. The initial set of models will
likely include:
o Logistic Regression
o Support Vector Machines (SVM)
o Decision Trees
o Random Forests
o Naive Bayes
o K-Nearest Neighbors (KNN)
o Potentially a simple Artificial Neural Network (ANN) using
TensorFlow or Keras.
 4.2.2. Model Training: Train each selected model on the training dataset
using appropriate training algorithms and parameters.
 4.2.3. Hyperparameter Tuning: Optimize the hyperparameters of each
model using techniques like cross-validation and grid search or
randomized search to achieve the best possible performance on the
validation dataset.
 4.2.4. Model Saving: Save the trained and tuned models for later
evaluation and potential deployment.
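
The hyperparameter tuning step (4.2.3) combines two ideas: a grid of candidate values and k-fold cross-validation to score each candidate. The sketch below shows that logic in plain Python for a one-feature K-Nearest Neighbors classifier. It is a simplified stand-in for what a library such as scikit-learn provides; the helper names, the glucose values, and the candidate grid (1, 3, 5) are hypothetical.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k training values closest to x (one feature)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(xs, ys, k, n_folds=5):
    """Mean validation accuracy over n_folds folds (fold f holds samples with i % n_folds == f)."""
    fold_scores = []
    for fold in range(n_folds):
        val_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                      for i in val_idx)
        fold_scores.append(correct / len(val_idx))
    return sum(fold_scores) / n_folds


def grid_search(xs, ys, candidate_ks):
    """Return (best_k, scores), where scores maps each candidate k to its CV accuracy."""
    scores = {k: cross_val_accuracy(xs, ys, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical glucose readings and labels (1 = diabetic), with some overlap.
xs = [80, 150, 85, 160, 90, 145, 95, 170, 100, 155,
      105, 165, 110, 140, 120, 130, 125, 135, 115, 175]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
      0, 1, 0, 1, 1, 0, 0, 1, 0, 1]

best_k, scores = grid_search(xs, ys, candidate_ks=(1, 3, 5))
print(best_k, scores)
```

Only the training portion of the data should be passed to a search like this, so that the held-out test set remains untouched until the final evaluation.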

4.3. Model Evaluation and Comparison Implementation:

Performance Metric Calculation: Evaluate the performance of each
trained model on the held-out test dataset using relevant classification
metrics such as accuracy, precision, recall, F1-score, and AUC-ROC.
Generate confusion matrices to further analyze the model's predictions.

Results Visualization: Visualize the evaluation results (e.g., ROC curves,
bar charts of performance metrics) to facilitate comparison between the
models.

Statistical Analysis (If Necessary): Perform statistical tests to determine
if the differences in performance between the models are statistically
significant.
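
AUC-ROC has a useful interpretation that also gives a simple way to compute it: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the example scores are assumptions for illustration; they are not outputs of the models evaluated in this report.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for eight patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1]

print(roc_auc(y_true, scores))
```

With these values 13 of the 16 positive-negative pairs are ordered correctly, giving an AUC of 0.8125. The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of this size.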

4.4 Testing / Verification Plan

After the development of the models, a series of tests is performed to
verify that the project meets its objectives. The testing plan includes:

Test ID | Test Case Title | Test Condition | System Behavior | Expected Result
T01 | Data Loading & Preprocessing | CSV files are uploaded and correctly parsed | Dataset is successfully loaded, split, and standardized | The system displays a summary of the dataset with correct feature types and dimensions
T02 | Model Training & Prediction | Model training on training set and testing on validation/test sets | Each machine learning model is trained and outputs predictions | Accuracy, Precision, and Recall values within expected ranges for each model
T03 | Evaluation Metrics & ROC Curves | Evaluation of predictions using confusion matrix and ROC analysis | Generation of performance metrics and visual plots (ROC, PR curves) | ROC curve displays an area under the curve (AUC) consistent with model performance; confusion matrix correctly reflects prediction outcomes

Testing verifies not only the accuracy of the predictions but also the
stability of data preparation and model integration.
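
A check in the spirit of test case T01 could look like the following sketch. It assumes the column names of the commonly distributed CSV version of the Pima dataset, and the two records are made-up placeholders rather than real rows; the function `load_and_summarize` is hypothetical and is not part of the project code.

```python
import csv
import io

# Assumed column names of the commonly distributed Pima CSV file.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_and_summarize(csv_text):
    """Parse CSV text into rows of floats and return (rows, summary)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{name: float(value) for name, value in record.items()}
            for record in reader]
    summary = {
        "n_rows": len(rows),
        "n_columns": len(reader.fieldnames),
        "columns": list(reader.fieldnames),
    }
    return rows, summary


# Two hypothetical records standing in for the real file.
sample_csv = (
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,"
    "BMI,DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,80,28.5,0.45,33,0\n"
    "5,165,82,32,150,35.1,0.62,51,1\n"
)

rows, summary = load_and_summarize(sample_csv)

# T01-style checks: expected dimensions, column names, and numeric types.
assert summary["columns"] == EXPECTED_COLUMNS
assert summary["n_columns"] == 9
assert all(isinstance(v, float) for row in rows for v in row.values())
print(summary["n_rows"], summary["n_columns"])
```

Running the same checks against the full file would confirm that every value parses as a number before any model is trained.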

4.5 Result Analysis:

The results are analyzed through a comparative evaluation of the models'
performance, presented in both tables and plots. The major findings are:

Performance Tables:

Performance metrics such as Accuracy, Precision, Recall, ROC-AUC,
and AUPR are reported for each model tested. For instance,
Logistic Regression and Random Forest both achieved encouraging accuracy
(0.80 to 0.94), while boosting methods such as XGBoost and
Gradient Boosting showed improved AUC and precision.
The models were also evaluated under different train/test splits:
1. 80% train / 20% test
2. 70% train / 30% test
3. 60% train / 40% test

Chapter 5

Standards Adopted
The development of this project adhered to various technical, ethical, and
procedural standards to ensure the quality, reliability, and transparency of the
diabetes prediction system. These standards span across programming practices,
data handling, model evaluation, documentation, and collaborative development.

5.1 Coding Standards

The project follows PEP 8 coding guidelines to ensure readable and
maintainable code.

Functions and variables are consistently named using lowercase and
underscores.

All code is modularized using functions and classes to enhance
reusability and clarity.

Proper comments and docstrings are included to document the logic of
each function and section.

5.2 Data Handling and Ethics

The dataset used (PIMA Indian Diabetes Dataset) is open-source and
publicly available for educational and research purposes.

No personally identifiable information (PII) is involved, ensuring
patient anonymity and data privacy.

Any data preprocessing steps, such as imputation or feature scaling, were
applied uniformly without introducing bias.

5.3 Model Evaluation Standards

A consistent set of performance metrics (accuracy, precision, recall,
F1-score, and ROC-AUC) was applied across all models for fair comparison.

Cross-validation was used to ensure that models generalize well to
unseen data and to reduce the variance in performance.

Models were trained and tested on separate datasets using an 80:20
train-test split to avoid data leakage.

5.4 Documentation Standards

Every stage of the project, including data exploration, preprocessing,
modeling, and evaluation, was thoroughly documented using Jupyter
Notebooks.

Visualizations such as heatmaps, ROC curves, and confusion matrices
were included to enhance interpretability.

The final report includes proper citations and a bibliography of all
datasets, tools, and libraries used.

Chapter 6

Conclusion and Future Scope

6.1 Conclusion
The project titled “Diabetes Prediction Using Machine Learning” successfully
demonstrates how data-driven techniques can assist in predicting medical
conditions such as diabetes. Using the PIMA Indian Diabetes dataset and various
classification algorithms, we built, evaluated, and compared machine learning
models that predict whether a person is diabetic or not based on health-related
attributes.

Through rigorous implementation and evaluation, it was found that ensemble
models like Random Forest and XGBoost delivered superior performance in
terms of accuracy, precision, recall, and overall robustness. Key features such as
glucose level, BMI, and insulin were identified as the most influential in the
prediction task. The project emphasizes the importance of data preprocessing,
model tuning, and evaluation strategies in achieving reliable predictions.

Overall, the project not only provides a practical solution to a real-world
healthcare challenge but also contributes to understanding how machine learning
can be used to support early diagnosis, thus enabling timely medical intervention
and reducing long-term health complications.

6.2 Future Scope

While the current project has achieved its objectives, there are several areas
where future improvements and expansions can be made:

Integration with Real-World Healthcare Systems: The predictive
model can be integrated with hospital management systems or mobile
applications for real-time screening.

Larger and More Diverse Datasets: Using a more diverse and larger
dataset would improve the generalization and reliability of the model
across different demographics and geographies.

Advanced Algorithms: Incorporating deep learning models such as
Artificial Neural Networks (ANNs) or Convolutional Neural Networks
(CNNs) could enhance prediction performance, especially with complex
feature interactions.

Explainable AI (XAI): Future versions can integrate interpretability tools
like SHAP or LIME to explain model predictions to doctors and patients
for better trust and understanding.

Time-Series Prediction: For patients with long-term data records,
recurrent neural networks (RNNs) and LSTMs can be used to make
sequential predictions and monitor the progression of diabetes risk.

Lifestyle-Based Feature Expansion: Including lifestyle-related features
like physical activity, sleep patterns, diet, and genetic history could enrich
the dataset and improve prediction accuracy.

User Interface Development: A user-friendly web or mobile interface
could be developed to allow non-technical users to access the prediction
model easily.

7. References

INDIVIDUAL CONTRIBUTION REPORT:

DIABETES PREDICTION USING MACHINE LEARNING
Anand Prakash
(22053924)

Abstract:
Anand played a pivotal role in the data preprocessing and model development
phase. He implemented data cleaning techniques, handled missing values,
performed feature engineering, and applied normalization. He also contributed to
building core machine learning models such as Logistic Regression and Random
Forests.

Individual Contribution and Findings:

Anand focused on preparing the dataset for machine learning by performing
detailed Exploratory Data Analysis (EDA), feature scaling, and encoding. His
work ensured that the input data was clean and well-structured, which
significantly improved model accuracy and robustness.

Report Preparation Contribution:

He was responsible for writing the Implementation chapter, explaining each
step of preprocessing, model training, and testing in detail. He also reviewed the
structure of the dataset and discussed feature importance and selection.

Presentation and Demonstration Contribution:

Anand created and presented slides on EDA, data cleaning, and the importance
of preprocessing. He demonstrated how these steps impacted model performance
and discussed challenges faced during data transformation.

Full Signature of Supervisor: Full signature of the student:

……………………………. ……………………………..

INDIVIDUAL CONTRIBUTION REPORT:

DIABETES PREDICTION USING MACHINE LEARNING
Tushit Mittal
(22053995)

Abstract:
Tushit worked primarily on model evaluation and visualization. He developed
performance metrics and generated plots to visualize confusion matrices, ROC
curves, and classification results.

Individual Contribution and Findings:

Tushit applied evaluation techniques including accuracy, precision, recall, F1-
score, and ROC-AUC to analyze the performance of various models. He
compared algorithms and interpreted results to determine the most suitable
model for deployment.

Report Preparation Contribution:

He authored the Results and Discussion section, where he interpreted
evaluation metrics, discussed model strengths and weaknesses, and visualized
comparative results using graphs.

Presentation and Demonstration Contribution:

Tushit prepared and explained slides on model evaluation, comparison of
classifiers, and graphical summaries. He also highlighted insights gained from
confusion matrices and ROC curves.

Full Signature of Supervisor: Full signature of the student:

……………………………. ……………………………..

INDIVIDUAL CONTRIBUTION REPORT:

DIABETES PREDICTION USING MACHINE LEARNING
Aaroh Thapa
(22051477)

Abstract:
Aaroh led the literature review and contextual research. He surveyed related
work on machine learning in healthcare and documented the relevance of various
algorithms in diabetes prediction.

Individual Contribution and Findings:

He analyzed previous research, identified gaps in traditional approaches, and
helped justify the use of selected models like Random Forest and XGBoost. He
was also involved in feature research and medical interpretation of the dataset.

Report Preparation Contribution:

Aaroh was responsible for writing the Introduction and Literature Review
chapters. He structured the review to highlight the evolution of ML in healthcare
and referenced relevant journals and research papers.

Presentation and Demonstration Contribution:

He presented slides on the background of diabetes, motivation for the study, and
summary of previous work. He emphasized how the current project builds upon
and improves past research.

Full Signature of Supervisor: Full signature of the student:

……………………………. ……………………………..

INDIVIDUAL CONTRIBUTION REPORT:

DIABETES PREDICTION USING MACHINE LEARNING
Aman Yadav
(22051316)

Abstract:
Aman played a major role in report documentation and final compilation. He
ensured the report was cohesive, well-formatted, and met all university
guidelines. He also contributed to performance analysis and feature importance
interpretation.

Individual Contribution and Findings:

He participated in result analysis and presentation of model outputs. He analyzed
which features (like glucose and BMI) had the most predictive power and helped
fine-tune model parameters.

Report Preparation Contribution:

Aman wrote the Conclusion and Future Scope chapter. He compiled and edited
all chapters, checked citations, and ensured consistency across formatting,
graphs, and tables.

Presentation and Demonstration Contribution:

He was responsible for compiling the final presentation and handled the Q&A
during the demonstration. He summarized the project flow, contributions, and
future enhancements during the final presentation.

Full Signature of Supervisor: Full signature of the student:

……………………………. ……………………………..

INDIVIDUAL CONTRIBUTION REPORT:

DIABETES PREDICTION USING MACHINE LEARNING
Harshdeep Das
(2205378)

Abstract:
Harshdeep handled the technical integration, model optimization, and debugging
process. He implemented advanced models like XGBoost and tuned them using
Grid Search and Random Search methods.

Individual Contribution and Findings:

Harshdeep worked on improving model performance through hyperparameter
tuning and ensemble methods. His optimization techniques helped achieve the
highest accuracy among the models tested.
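
The difference between Grid Search and Random Search can be sketched as follows. This is a stdlib-only illustration, not the project's tuning code: `toy_score` is a hypothetical stand-in for a cross-validated accuracy, and the parameter names and candidate values are assumptions chosen only to resemble typical boosted-tree settings.

```python
import itertools
import random


def grid_search(score_fn, param_grid):
    """Score every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(score_fn, param_grid, n_iter, seed=0):
    """Score n_iter randomly sampled combinations; return (best_params, best_score)."""
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {n: rng.choice(param_grid[n]) for n in names}
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(max_depth, learning_rate):
    """Hypothetical objective that peaks at max_depth=4, learning_rate=0.1.

    In a real pipeline this would train a model and return a validation score.
    """
    return 1.0 - 0.01 * (max_depth - 4) ** 2 - (learning_rate - 0.1) ** 2


# Assumed candidate values, for illustration only.
param_grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}

grid_params, grid_score = grid_search(toy_score, param_grid)
rand_params, rand_score = random_search(toy_score, param_grid, n_iter=5, seed=0)
print(grid_params, grid_score)
print(rand_params, rand_score)
```

Grid search evaluates all nine combinations here, while random search evaluates only five sampled ones, so it is cheaper but can miss the best setting. That trade-off matters more as the number of hyperparameters grows.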

Report Preparation Contribution:

He contributed to the Implementation and Standards Adopted chapters,
documenting model training processes, coding standards, and reproducibility
methods.

Presentation and Demonstration Contribution:

He presented the architecture of the machine learning pipeline, explained
optimization techniques, and demonstrated live testing of the model.

Full Signature of Supervisor: Full signature of the student:

……………………………. ……………………………..

TURNITIN PLAGIARISM REPORT

(This report is mandatory for all projects, and plagiarism
must be below 25%.)
