A PROJECT REPORT
on
DIABETES PREDICTION USING MACHINE
LEARNING
Submitted to
KIIT Deemed to be University
In Partial Fulfilment of the Requirement for the Award of
BACHELOR’S DEGREE IN
INFORMATION TECHNOLOGY
BY
ANAND PRAKASH 22053924
TUSHIT MITTAL 22053995
AMAN YADAV 22051316
AAROH THAPA 22051477
HARSHDEEP DAS 2205378
UNDER THE GUIDANCE OF
Dr. MUKESH KUMAR
SCHOOL OF COMPUTER ENGINEERING
KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
BHUBANESWAR, ODISHA - 751024
April 2025
KIIT Deemed to be University
School of Computer Engineering
Bhubaneswar, ODISHA 751024
CERTIFICATE
This is to certify that the project entitled
DIABETES PREDICTION USING MACHINE LEARNING
submitted by
ANAND PRAKASH 22053924
TUSHIT MITTAL 22053995
AAROH THAPA 22051477
AMAN YADAV 22051316
HARSHDEEP DAS 2205378
is a record of bonafide work carried out by them, in partial fulfilment of the
requirement for the award of the Bachelor’s Degree in Information Technology
at KIIT Deemed to be University,
Bhubaneswar. This work was done during the year 2024-2025, under our guidance.
Date: 07/04/2025
Dr. MUKESH KUMAR
Project Guide
Acknowledgements
We are profoundly grateful to Dr. MUKESH KUMAR, School of Computer
Engineering, KIIT University. His able and mature guidance, insights, advice,
cooperation, suggestions, keen interest, and constant encouragement
throughout the project made it possible for us to complete this work.
We are very grateful to all the faculty members of our college for the precious
time and untiring effort they spent acquainting us with the nuances and
intricacies of the work.
We extend our sincere gratitude to our guide for providing us with the
opportunity and resources to work on this project. Undertaking the training
and the project simultaneously has been a great learning experience that
enriched our knowledge and broadened our outlook as future professionals.
It is our pleasant duty to thank everyone who directly or
indirectly extended a helping hand during the course of this project.
Above all, we gratefully acknowledge the constant support, encouragement, and
patience of our families and friends throughout the project.
We hope this project will be an asset to our academic profiles. Thank you!
ANAND PRAKASH
TUSHIT MITTAL
AAROH THAPA
AMAN YADAV
HARSHDEEP DAS
ABSTRACT
Diabetes mellitus is a chronic metabolic disorder that poses serious health risks and
has become increasingly prevalent across the globe. Early detection of diabetes is
critical in reducing complications and improving patient outcomes. With the
advancement of artificial intelligence and data analytics, machine learning has
shown significant promise in enhancing diagnostic accuracy and supporting clinical
decision-making. This study explores the application of various machine learning
algorithms to predict diabetes using features derived from medical records. The
analysis focuses on key clinical parameters such as glucose level, insulin, BMI, and
age, which are known indicators of diabetes risk.
In this work, models such as Logistic Regression, Support Vector Machine,
Random Forest, and Decision Trees were implemented and evaluated. These
models were trained and tested using publicly available datasets, including the
PIMA Indian Diabetes dataset. Evaluation metrics such as accuracy, precision,
recall, and F1-score were used to compare model performance. Among the tested
approaches, ensemble methods like Random Forest achieved superior results,
indicating better generalization and predictive capability. This research highlights
the potential of machine learning in developing reliable and efficient diagnostic
tools to aid healthcare professionals in the early detection of diabetes.
Keywords: Diabetes Prediction, Machine Learning, PIMA Dataset, Classification,
Medical Diagnosis, Random Forest, Logistic Regression, Healthcare Analytics
Contents
1 Introduction
2 Basic Concepts / Literature Review
2.1 Basic Concepts in Machine Learning for Classification
2.2 Data Preprocessing Techniques
2.3 Performance Evaluation Metrics
2.4 Review of Related Work
2.5 Conclusion of Literature Review
3 Problem Statement / Requirement Specifications
3.1 Project Planning
3.2 Project Analysis
3.3 System Design
3.3.1 Design Constraints
3.3.2 Block Diagram
4 Implementation
4.1 Data Acquisition and Preprocessing Implementation
4.2 Model Development and Training Implementation
4.2.1 Model Selection
4.2.2 Model Training
4.2.3 Hyperparameter Tuning
4.2.4 Model Saving
4.3 Model Evaluation and Comparison Implementation
4.4 Testing / Verification Plan
4.5 Result Analysis
5 Standards Adopted
5.1 Coding Standards
5.2 Data Handling and Ethics
5.3 Model Evaluation Standards
5.4 Documentation Standards
6 Conclusion and Future Scope
6.1 Conclusion
6.2 Future Scope
References
Individual Contribution Report
Plagiarism Report
Diabetes Prediction Using Machine Learning
Chapter 1
Introduction
Diabetes mellitus is a chronic and potentially life-threatening disease affecting
millions globally. Characterized by high blood sugar levels due to insulin
resistance or insufficient insulin production, diabetes can lead to severe health
complications including cardiovascular disease, kidney failure, and vision
impairment. According to the International Diabetes Federation, over 537
million adults were living with diabetes in 2021, and this number is expected to
rise significantly in the coming years. In a country like India, where the burden
of non-communicable diseases is high, early detection and proactive
management of diabetes are critical.
Traditional methods for diabetes diagnosis often rely on blood tests and clinical
evaluation, which may not always be accessible or affordable, particularly in
resource-constrained settings. In this context, machine learning presents an
effective alternative for predictive diagnostics by leveraging patterns in medical
data to forecast disease onset.
This project focuses on building a machine learning-based system to predict the
likelihood of diabetes using patient health data. The dataset used for this study is
the PIMA Indian Diabetes dataset, which includes features such as glucose level,
BMI, blood pressure, insulin, and number of pregnancies. Multiple classification
algorithms—including Logistic Regression, Decision Trees, Random Forests,
Support Vector Machines, and XGBoost—are explored and evaluated to identify
the most accurate and reliable model.
By applying systematic preprocessing, feature engineering, and performance
evaluation techniques, this project aims to demonstrate how machine learning
can enhance preventive healthcare. The ultimate goal is to assist healthcare
professionals in identifying high-risk patients, enabling timely intervention and
improved patient outcomes.
School of Computer Engineering, KIIT, BBSR 1
Chapter 2
Basic Concepts / Literature Review
Machine learning (ML) has emerged as a powerful tool in the field of medical
diagnostics, particularly for classification problems such as disease prediction. In
the context of diabetes, various ML models have been applied to analyze patient
data and detect patterns that may indicate a higher risk of developing the disease.
The foundation of this approach lies in supervised learning, where models are
trained on labeled datasets to learn relationships between input features and
outcomes.
2.1 Basic Concepts in Machine Learning for Classification
Supervised Learning: This involves training models on input-output
pairs. In the case of diabetes prediction, the model learns to associate input
features (like glucose level, BMI, insulin, etc.) with an output label
(diabetic or non-diabetic).
Classification Algorithms: Various algorithms can be used for
classification tasks. Each comes with its own strengths and is suitable for
different data distributions and problem types.
Logistic Regression: A statistical model that estimates the probability of a
binary outcome.
Decision Trees: A tree-like model used for making decisions based on
feature values.
Random Forest: An ensemble learning method that combines multiple
decision trees to improve predictive performance and reduce overfitting.
Support Vector Machines (SVM): A robust classifier that finds the
optimal hyperplane separating classes in high-dimensional space.
K-Nearest Neighbors (KNN): A non-parametric method that classifies
based on the most common label among the nearest data points.
XGBoost: A gradient boosting framework known for its speed and
accuracy in structured data problems.
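To make the idea of a probabilistic classifier concrete, the following is a minimal sketch of how a fitted Logistic Regression model turns feature values into a probability. The weights, bias, feature choice (standardized glucose and BMI), and function names are hypothetical values chosen for illustration; they are not coefficients obtained in this project.

```python
import math

def predict_proba(features, weights, bias):
    """Logistic regression output: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(features, weights, bias, threshold=0.5):
    """Label a patient diabetic (1) when the probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical coefficients for two standardized features (glucose, BMI).
weights = (1.2, 0.6)
bias = -0.5

average_patient = (0.0, 0.0)    # both features at the mean
elevated_patient = (1.5, 1.0)   # glucose and BMI above the mean

p_average = predict_proba(average_patient, weights, bias)
p_elevated = predict_proba(elevated_patient, weights, bias)
print(round(p_average, 3), round(p_elevated, 3))
```

In practice the weights would be learned from the training data by a library implementation rather than set by hand.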
2.2 Data Preprocessing Techniques
Effective data preprocessing is essential to ensure the quality and reliability of
ML models:
Handling Missing Values: Missing data is often filled using mean,
median, or other imputation techniques.
Normalization and Scaling: Standardizing the data helps in improving
the convergence of models.
Outlier Detection: Statistical techniques like Z-score or IQR are used to
detect and manage outliers that can skew results.
Feature Selection: Identifying and selecting the most relevant features
can enhance model accuracy and reduce training time.
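The first three techniques above can be sketched in plain Python as follows. The insulin readings are made-up values, and treating a reading of 0 as a missing measurement is an assumption of this sketch (zeros in clinical columns of the PIMA data are often handled this way); the function names are illustrative only.

```python
import statistics

def impute_median(values, missing=0):
    """Replace entries equal to `missing` with the median of the observed values."""
    observed = [v for v in values if v != missing]
    median = statistics.median(observed)
    return [median if v == missing else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical insulin readings; 0 is assumed to mark a missing measurement.
insulin = [94, 0, 168, 88, 0, 543, 110, 130]

filled = impute_median(insulin)
scaled = standardize(filled)
outliers = iqr_outliers(filled)
print(filled, outliers)
```

The median is used here rather than the mean because a single extreme reading would otherwise pull the imputed value upward.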
2.3 Performance Evaluation Metrics
To comprehensively assess classifier performance, we employ:
Accuracy: Proportion of correct predictions over all instances.
Precision: Fraction of true positives among predicted positives;
higher precision means fewer false alarms among positive predictions.
Recall (Sensitivity): Fraction of true positives among actual
positives, indicating detection capability.
F1 Score: Harmonic mean of precision and recall, balancing the
two.
AUC (Area Under the ROC Curve): Measures discrimination ability
across classification thresholds.
AUPR (Area Under the Precision-Recall Curve): Focuses on
performance for the positive class, particularly informative for
imbalanced data.
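The metrics above can be computed directly from the confusion-matrix counts. The sketch below does so in plain Python on a small set of made-up labels and scores; the 0.5 decision threshold and all function names are assumptions of this example. AUC is computed through its ranking interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth and predicted probabilities for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```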
2.4 Review of Related Work
Numerous studies have shown the efficacy of ML in diabetes prediction. For
example, research published in medical journals and on platforms like Kaggle
demonstrates that models trained on the PIMA Indian Diabetes dataset have
achieved accuracies ranging from 70% to 85%, depending on preprocessing and
algorithm choice.
Ensemble methods like Random Forest and boosting techniques such as
XGBoost have consistently outperformed individual models in terms of accuracy
and robustness. Feature importance analysis in these studies highlights the
critical role of glucose levels, BMI, and insulin in predicting diabetes risk.
Additionally, hybrid approaches combining feature selection and ensemble
learning have shown promise in enhancing diagnostic performance.
Recent developments also focus on explainable AI (XAI), which aims to make
model decisions transparent and interpretable to healthcare professionals,
thereby increasing trust in automated prediction systems.
2.5 Conclusion of Literature Review
The collective findings from existing literature support the use of ML models for
early diabetes prediction. They also emphasize the importance of proper data
handling, algorithm selection, and evaluation metrics in developing an effective
diagnostic tool. This project builds upon these insights to design and implement
a predictive system tailored to the PIMA dataset, with the objective of
contributing to accessible and scalable healthcare analytics.
Chapter 3
Problem Statement / Requirement Specifications
Diabetes mellitus is a chronic metabolic disorder that, if left undiagnosed or
untreated, can lead to serious complications such as heart disease, kidney failure,
nerve damage, and vision problems. A significant number of individuals with
diabetes remain undiagnosed due to lack of early symptoms or access to
diagnostic facilities. Therefore, there is a growing need for automated systems
that can assist in the early detection and prediction of diabetes based on patient
data.
This project aims to develop a machine learning-based classification system that
can predict whether a person is likely to have diabetes using a set of
physiological and clinical attributes. The primary goal is to build a model that
can serve as a preliminary diagnostic tool for healthcare professionals or as a
personal health assistant for patients, especially in resource-constrained settings.
The model will be trained and validated using the PIMA Indian Diabetes
Dataset, which contains multiple medical variables collected from female
patients of Pima Indian heritage aged 21 years and older.
Objectives:
To build an intelligent system capable of predicting diabetes status based
on a patient’s health attributes.
To evaluate and compare the performance of different machine learning
algorithms on the dataset.
To use data preprocessing and feature selection techniques to improve the
accuracy and reliability of the prediction.
To identify the most influential features contributing to diabetes
prediction.
To present the model performance using evaluation metrics such as
accuracy, precision, recall, F1-score, and ROC-AUC.
Functional requirements:
Data Input Module: Load and preprocess the dataset including handling
missing values, normalization, and data splitting.
Model Training Module: Train models using Logistic Regression,
Decision Tree, Random Forest, SVM, KNN, and XGBoost classifiers.
Evaluation Module: Evaluate model performance with various metrics
and confusion matrix.
Visualization Module: Generate visualizations such as heatmaps, pair
plots, ROC curves, and feature importance charts.
Reporting Module: Output final accuracy scores and identify the best
performing model for prediction.
Non-Functional Requirements:
Accuracy: The model should aim for a high accuracy (ideally 75% or
above) with balanced precision and recall.
Scalability: The system should be capable of being expanded with more
data and additional features.
Usability: Results should be easy to interpret for non-technical users.
Efficiency: The model should be optimized to run efficiently on standard
computing resources.
3.1 Project Planning:
Phase | Duration (Estimated) | Key Milestones
Phase 1: Data Collection & Preparation | 4 Weeks | Identification of relevant datasets, data acquisition, data cleaning, handling missing values, feature engineering, data splitting
Phase 2: Model Development & Training | 8 Weeks | Implementation and training of multiple machine learning models (e.g., Logistic Regression, SVM, Decision Tree, Random Forest, etc.), hyperparameter tuning
Phase 3: Model Evaluation & Comparison | 4 Weeks | Evaluation of model performance using appropriate metrics, comparison of model results, selection of the best-performing model(s)
Phase 4: Conceptual UI Design | 3 Weeks | User interface requirements defined, wireframes and mockups for potential web/application interface created
Phase 5: Documentation & Reporting | 2 Weeks | Project documentation completed, final report generated
3.2 Project Analysis:
1. Data Quality and Ambiguity
a. Assess completeness and consistency of clinical measurements
b. Verify class balance and address any skew through resampling if
necessary
2. Model Selection Criteria
a. Ensure fair comparison by fixing random seeds and
hyperparameter grids
b. Avoid data leakage by strictly separating training, validation, and
test sets
3. Performance Trade-offs
a. Balance sensitivity (recall) against specificity (true negative rate)
depending on clinical priorities
b. Determine acceptable thresholds for false negatives to minimize
missed diagnoses
4. Resource Constraints
a. Estimate computation time for each model training and tuning step
b. Plan for hardware limitations (CPU/GPU availability, memory)
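The performance trade-off in item 3 can be illustrated by sweeping the decision threshold and observing how sensitivity and specificity move in opposite directions. The sketch below uses made-up labels and scores; the candidate thresholds, the requirement of 100% sensitivity, and the helper names are all assumptions chosen for illustration rather than clinical recommendations.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when predicting positive at score >= threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if t == 1 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    tn = sum(1 for t, s in pairs if t == 0 and s < threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def pick_threshold(y_true, scores, min_sensitivity, candidates):
    """Highest candidate threshold whose sensitivity is at least min_sensitivity."""
    feasible = [
        t for t in candidates
        if sensitivity_specificity(y_true, scores, t)[0] >= min_sensitivity
    ]
    return max(feasible) if feasible else None

# Hypothetical ground truth and predicted probabilities for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
candidates = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

for t in candidates:
    sens, spec = sensitivity_specificity(y_true, scores, t)
    print(f"threshold={t:.1f} sensitivity={sens:.2f} specificity={spec:.2f}")

# Require that no diabetic case is missed, then keep specificity as high as possible.
chosen = pick_threshold(y_true, scores, min_sensitivity=1.0, candidates=candidates)
print("chosen threshold:", chosen)
```

Lowering the threshold reduces missed diagnoses at the cost of more false alarms, which is why the acceptable false-negative rate has to be fixed before a threshold is chosen.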
3.3 System Design:
3.3.1 Design Constraints:
Programming Language: Python
Development Environment: Jupyter Notebook / Google Colab / VS Code
Libraries Used:
pandas, numpy – for data manipulation and preprocessing
matplotlib, seaborn – for data visualization
scikit-learn – for machine learning models and evaluation metrics
xgboost – for gradient boosting model.
3.3.2 Block Diagram:
The flowchart illustrates a standard machine learning pipeline. It starts
with Data Validation and Preprocessing. Then, Feature Engineering
prepares the data for Model Selection and Training using various
algorithms. Finally, Model Evaluation helps choose the best model.
Chapter 4
Implementation
This section outlines the planned approach for implementing the Diabetes
Prediction System. The implementation will be an iterative process, starting with
data exploration and model development.
4.1. Data Acquisition and Preprocessing Implementation:
4.1.1. Dataset Identification: Identify and acquire a suitable dataset for
diabetes prediction. Prioritize datasets relevant to the demographic and
health characteristics of the population in Patna, Bihar, if available.
Publicly available datasets like the Pima Indians Diabetes Database will be
used as an initial resource if local data is not immediately accessible.
4.1.2. Data Exploration and Analysis: Perform exploratory data analysis
(EDA) to understand the characteristics of the dataset, identify potential
issues (e.g., missing values, outliers), and gain insights into the
relationships between features and the target variable (diabetes status).
4.1.3. Data Cleaning and Handling Missing Values: Implement
strategies to handle missing values, such as imputation techniques (e.g.,
mean, median, mode) or removal of incomplete records, based on the
extent and nature of the missing data.
4.1.4. Feature Engineering (If Applicable): Explore the creation of new
features from existing ones that might improve the predictive power of the
models (e.g., BMI calculation from weight and height).
4.1.5. Feature Scaling and Normalization: Apply appropriate scaling or
normalization techniques (e.g., standardization, min-max scaling) to
ensure that features with different ranges do not disproportionately
influence the machine learning models.
4.1.6. Data Splitting: Divide the preprocessed dataset into training,
validation, and testing sets to train the models, tune hyperparameters, and
evaluate their final performance on unseen data.
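Steps 4.1.5 and 4.1.6 interact: to avoid leaking information from held-out data, the scaling parameters should be estimated on the training set only and then reused for the other sets. The sketch below shows one way to do this in plain Python. The 60/20/20 proportions, the fixed seeds, the synthetic glucose values, and the function names are assumptions of this example, not settings taken from the project.

```python
import random
import statistics

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Split row indices into train/validation/test, preserving class proportions."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]
    return train, val, test

def fit_scaler(values):
    """Estimate mean and standard deviation from the training values only."""
    return statistics.fmean(values), statistics.pstdev(values)

def apply_scaler(values, mean, std):
    """Standardize values with previously fitted parameters."""
    return [(v - mean) / std for v in values]

# Hypothetical data: 12 non-diabetic (0) and 8 diabetic (1) glucose readings.
rng = random.Random(0)
labels = [0] * 12 + [1] * 8
glucose = [rng.gauss(100, 10) if y == 0 else rng.gauss(150, 15) for y in labels]

train_idx, val_idx, test_idx = stratified_split(labels)

# Fit on the training rows, then reuse the same parameters for the test rows.
mean, std = fit_scaler([glucose[i] for i in train_idx])
train_scaled = apply_scaler([glucose[i] for i in train_idx], mean, std)
test_scaled = apply_scaler([glucose[i] for i in test_idx], mean, std)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class keeps the proportion of diabetic cases similar across the three sets, which matters when the classes are imbalanced.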
4.2. Model Development and Training Implementation:
4.2.1. Model Selection: Implement the chosen machine learning models
using libraries like scikit-learn in Python. The initial set of models will
likely include:
o Logistic Regression
o Support Vector Machines (SVM)
o Decision Trees
o Random Forests
o Naive Bayes
o K-Nearest Neighbors (KNN)
o Potentially a simple Artificial Neural Network (ANN) using
TensorFlow or Keras.
4.2.2. Model Training: Train each selected model on the training dataset
using appropriate training algorithms and parameters.
4.2.3. Hyperparameter Tuning: Optimize the hyperparameters of each
model using techniques like cross-validation and grid search or
randomized search to achieve the best possible performance on the
validation dataset.
4.2.4. Model Saving: Save the trained and tuned models for later
evaluation and potential deployment.
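The tuning and saving steps can be sketched as a grid search scored by k-fold cross-validation, followed by serialization of the selected configuration. To stay dependency-free, the example uses a one-feature K-Nearest Neighbors classifier as a stand-in model with the number of neighbors as the only hyperparameter. The synthetic data, the grid, the fold count, and every function name are assumptions of this sketch; the project's own models would normally be tuned with a library routine such as scikit-learn's GridSearchCV.

```python
import pickle
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training rows closest to query (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k_neighbors, n_folds=5, seed=0):
    """Mean validation accuracy of KNN over n_folds shuffled folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        hits = sum(knn_predict(train, data[i][0], k_neighbors) == data[i][1] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / n_folds

def grid_search(data, grid):
    """Score every candidate in grid and return (best_k, scores_by_k)."""
    results = {k: cross_val_accuracy(data, k) for k in grid}
    best_k = max(results, key=results.get)
    return best_k, results

# Hypothetical (glucose, label) rows drawn from two overlapping distributions.
rng = random.Random(1)
data = [(rng.gauss(105, 15), 0) for _ in range(30)]
data += [(rng.gauss(150, 20), 1) for _ in range(30)]

best_k, results = grid_search(data, grid=[1, 3, 5, 7])

# Serialize the chosen hyperparameter with the rows the model needs at prediction time.
blob = pickle.dumps({"k": best_k, "train": data})
restored = pickle.loads(blob)
print(best_k, results)
```

Only odd neighbor counts are searched so that a two-class vote can never tie. Writing the bytes to disk with pickle.dump would make the model available for later evaluation.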
4.3. Model Evaluation and Comparison Implementation:
Performance Metric Calculation: Evaluate the performance of each
trained model on the held-out test dataset using relevant classification
metrics such as accuracy, precision, recall, F1-score, and AUC-ROC.
Generate confusion matrices to further analyze the model's predictions.
Results Visualization: Visualize the evaluation results (e.g., ROC curves,
bar charts of performance metrics) to facilitate comparison between the
models.
Statistical Analysis (If Necessary): Perform statistical tests to determine
if the differences in performance between the models are statistically
significant.
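One common choice for comparing two classifiers evaluated on the same test set is McNemar's test, which looks only at the cases where exactly one of the two models is correct. The text above does not name a specific test, so its use here is an assumption; the labels and predictions below are made up, and the sketch implements the exact (binomial) two-sided version.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test; returns (b, c, p_value).

    b counts cases where model A is right and model B is wrong,
    c counts cases where model A is wrong and model B is right.
    """
    rows = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, m in rows if a == t and m != t)
    c = sum(1 for t, a, m in rows if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis each discordant case favors either model with probability 0.5.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical test set of 20 patients and the predictions of two models.
y_true = [1] * 10 + [0] * 10
pred_a = [1 - t if i in {0, 1} else t for i, t in enumerate(y_true)]
pred_b = [1 - t if i in {1, 2, 3, 4, 5, 6, 7, 8} else t for i, t in enumerate(y_true)]

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)
```

A small p-value suggests that the difference in error patterns is unlikely to be due to chance alone; with a test set this small, even a visibly better model may not reach conventional significance levels.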
4.4 Testing / Verification Plan
After the development of the models, a series of tests are performed to
verify that the project meets its objectives. The testing plan includes:
Test ID | Test Case Title | Test Condition | System Behavior | Expected Result
T01 | Data Loading & Preprocessing | CSV files are uploaded and correctly parsed | Dataset is successfully loaded, split, and standardized | The system displays a summary of the dataset with correct feature types and dimensions
T02 | Model Training & Prediction | Model training on training set and testing on validation/test sets | Each machine learning model is trained and outputs predictions | Accuracy, Precision, and Recall values within expected ranges for each model
T03 | Evaluation Metrics & ROC Curves | Evaluation of predictions using confusion matrix and ROC analysis | Generation of performance metrics and visual plots (ROC, PR curves) | ROC curve displays an area under the curve (AUC) consistent with model performance; confusion matrix correctly reflects prediction outcomes
The testing process verifies not only the quality of the predictions but
also the stability of data preparation and model integration.
4.5 Result Analysis:
The results are analyzed by comparing the performance of the different
models using both tables and graphical plots. The major
findings are:
Performance Tables:
Performance metrics such as Accuracy, Precision, Recall, ROC-AUC,
and AUPR are reported for each model that was tested. For instance,
Logistic Regression and Random Forest both achieved encouraging accuracy
(0.80 to 0.94), while ensemble methods such as XGBoost and
Gradient Boosting showed improved AUC and
precision.
The models were also evaluated under different train/test splits:
1. 80% train / 20% test
2. 70% train / 30% test
3. 60% train / 40% test
Chapter 5
Standards Adopted
The development of this project adhered to various technical, ethical, and
procedural standards to ensure the quality, reliability, and transparency of the
diabetes prediction system. These standards span across programming practices,
data handling, model evaluation, documentation, and collaborative development.
5.1 Coding Standards
The project follows PEP 8 coding guidelines to ensure readable and
maintainable code.
Functions and variables are consistently named using lowercase and
underscores.
All code is modularized using functions and classes to enhance
reusability and clarity.
Proper comments and docstrings are included to document the logic of
each function and section.
5.2 Data Handling and Ethics
The dataset used (PIMA Indian Diabetes Dataset) is open-source and
publicly available for educational and research purposes.
No personally identifiable information (PII) is involved, ensuring
patient anonymity and data privacy.
Any data preprocessing steps, such as imputation or feature scaling, were
applied uniformly without introducing bias.
5.3 Model Evaluation Standards
A consistent set of performance metrics—accuracy, precision, recall, F1-
score, and ROC-AUC—was applied across all models for fair comparison.
Cross-validation was used to ensure that models generalize well to
unseen data and to reduce the variance in performance.
Models were trained and tested on separate datasets using an 80:20 train-
test split to avoid data leakage.
5.4 Documentation Standards
Every stage of the project, including data exploration, preprocessing,
modeling, and evaluation, was thoroughly documented using Jupyter
Notebooks.
Visualizations such as heatmaps, ROC curves, and confusion matrices
were included to enhance interpretability.
The final report includes proper citations and a bibliography of all
datasets, tools, and libraries used.
Chapter 6
Conclusion and Future Scope
6.1 Conclusion
The project titled “Diabetes Prediction Using Machine Learning” successfully
demonstrates how data-driven techniques can assist in predicting medical
conditions such as diabetes. Using the PIMA Indian Diabetes dataset and various
classification algorithms, we built, evaluated, and compared machine learning
models that predict whether a person is diabetic or not based on health-related
attributes.
Through rigorous implementation and evaluation, it was found that ensemble
models like Random Forest and XGBoost delivered superior performance in
terms of accuracy, precision, recall, and overall robustness. Key features such as
glucose level, BMI, and insulin were identified as the most influential in the
prediction task. The project emphasizes the importance of data preprocessing,
model tuning, and evaluation strategies in achieving reliable predictions.
Overall, the project not only provides a practical solution to a real-world
healthcare challenge but also contributes to understanding how machine learning
can be used to support early diagnosis, thus enabling timely medical intervention
and reducing long-term health complications.
6.2 Future Scope
While the current project has achieved its objectives, there are several areas
where future improvements and expansions can be made:
Integration with Real-World Healthcare Systems: The predictive
model can be integrated with hospital management systems or mobile
applications for real-time screening.
Larger and More Diverse Datasets: Using a more diverse and larger
dataset would improve the generalization and reliability of the model
across different demographics and geographies.
Advanced Algorithms: Incorporating deep learning models such as
Artificial Neural Networks (ANNs) or Convolutional Neural Networks
(CNNs) could enhance prediction performance, especially with complex
feature interactions.
Explainable AI (XAI): Future versions can integrate interpretability tools
like SHAP or LIME to explain model predictions to doctors and patients
for better trust and understanding.
Time-Series Prediction: For patients with long-term data records,
recurrent neural networks (RNNs) and LSTMs can be used to make
sequential predictions and monitor the progression of diabetes risk.
Lifestyle-Based Feature Expansion: Including lifestyle-related features
like physical activity, sleep patterns, diet, and genetic history could enrich
the dataset and improve prediction accuracy.
User Interface Development: A user-friendly web or mobile interface
could be developed to allow non-technical users to access the prediction
model easily.
7. References
INDIVIDUAL CONTRIBUTION REPORT:
DIABETES PREDICTION USING MACHINE LEARNING
Anand Prakash
(22053924)
Abstract:
Anand played a pivotal role in the data preprocessing and model development
phase. He implemented data cleaning techniques, handled missing values,
performed feature engineering, and applied normalization. He also contributed to
building core machine learning models such as Logistic Regression and Random
Forests.
Individual Contribution and Findings:
Anand focused on preparing the dataset for machine learning by performing
detailed Exploratory Data Analysis (EDA), feature scaling, and encoding. His
work ensured that the input data was clean and well-structured, which
significantly improved model accuracy and robustness.
Report Preparation Contribution:
He was responsible for writing the Implementation chapter, explaining each
step of preprocessing, model training, and testing in detail. He also reviewed the
structure of the dataset and discussed feature importance and selection.
Presentation and Demonstration Contribution:
Anand created and presented slides on EDA, data cleaning, and the importance
of preprocessing. He demonstrated how these steps impacted model performance
and discussed challenges faced during data transformation.
Full Signature of Supervisor: Full signature of the student:
……………………………. ……………………………..
INDIVIDUAL CONTRIBUTION REPORT:
DIABETES PREDICTION USING MACHINE LEARNING
Tushit Mittal
(22053995)
Abstract:
Tushit worked primarily on model evaluation and visualization. He developed
performance metrics and generated plots to visualize confusion matrices, ROC
curves, and classification results.
Individual Contribution and Findings:
Tushit applied evaluation techniques including accuracy, precision, recall, F1-
score, and ROC-AUC to analyze the performance of various models. He
compared algorithms and interpreted results to determine the most suitable
model for deployment.
Report Preparation Contribution:
He authored the Results and Discussion section, where he interpreted
evaluation metrics, discussed model strengths and weaknesses, and visualized
comparative results using graphs.
Presentation and Demonstration Contribution:
Tushit prepared and explained slides on model evaluation, comparison of
classifiers, and graphical summaries. He also highlighted insights gained from
confusion matrices and ROC curves.
Full Signature of Supervisor: Full signature of the student:
……………………………. ……………………………..
INDIVIDUAL CONTRIBUTION REPORT:
DIABETES PREDICTION USING MACHINE LEARNING
Aaroh Thapa
(22051477)
Abstract:
Aaroh led the literature review and contextual research. He surveyed related
work on machine learning in healthcare and documented the relevance of various
algorithms in diabetes prediction.
Individual Contribution and Findings:
He analyzed previous research, identified gaps in traditional approaches, and
helped justify the use of selected models like Random Forest and XGBoost. He
was also involved in feature research and medical interpretation of the dataset.
Report Preparation Contribution:
Aaroh was responsible for writing the Introduction and Literature Review
chapters. He structured the review to highlight the evolution of ML in healthcare
and referenced relevant journals and research papers.
Presentation and Demonstration Contribution:
He presented slides on the background of diabetes, motivation for the study, and
summary of previous work. He emphasized how the current project builds upon
and improves past research.
Full Signature of Supervisor: Full signature of the student:
……………………………. ……………………………..
INDIVIDUAL CONTRIBUTION REPORT:
DIABETES PREDICTION USING MACHINE LEARNING
Aman Yadav
(22051316)
Abstract:
Aman played a major role in report documentation and final compilation. He
ensured the report was cohesive, well-formatted, and met all university
guidelines. He also contributed to performance analysis and feature importance
interpretation.
Individual Contribution and Findings:
He participated in result analysis and presentation of model outputs. He analyzed
which features (like glucose and BMI) had the most predictive power and helped
fine-tune model parameters.
Report Preparation Contribution:
Aman wrote the Conclusion and Future Scope chapter. He compiled and edited
all chapters, checked citations, and ensured consistency across formatting,
graphs, and tables.
Presentation and Demonstration Contribution:
He was responsible for compiling the final presentation and handled the Q&A
during the demonstration. He summarized the project flow, contributions, and
future enhancements during the final presentation.
Full Signature of Supervisor: Full signature of the student:
……………………………. ……………………………..
INDIVIDUAL CONTRIBUTION REPORT:
BRAIN TUMOR DETECTION AND
SEGMENTATION: INTEGRATING CNN AND PSO
Harshdeep Das
(2205378)
Abstract:
Harshdeep handled the technical integration, model optimization, and debugging
process. He implemented advanced models like XGBoost and tuned them using
Grid Search and Random Search methods.
Individual Contribution and Findings:
Harshdeep worked on improving model performance through hyperparameter
tuning and ensemble methods. His optimization techniques helped achieve the
highest accuracy among the models tested.
Report Preparation Contribution:
He contributed to the Implementation and Standards Adopted chapters,
documenting model training processes, coding standards, and reproducibility
methods.
Presentation and Demonstration Contribution:
He presented the architecture of the machine learning pipeline, explained
optimization techniques, and demonstrated live testing of the model.
Full Signature of Supervisor: Full signature of the student:
……………………………. ……………………………..
TURNITIN PLAGIARISM REPORT