
Comparison of Machine Learning Models in Graduate Admission Prediction

A CAPSTONE PROJECT REPORT

Submitted in partial fulfillment of the


requirements for the award of the
Degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING

by

Rachamsetty Rohith Basava Sai (21BCE8151)


Dodda Venkata Sridhara Reddy (21BCE9241)
Ravagondu Nithesh (21BCE8318)
Bhimineni Akhil (21BCE8319)

Under the Guidance of

Dr. Hajarathaiah K

SCHOOL OF ELECTRONICS ENGINEERING


VIT-AP UNIVERSITY
AMARAVATI- 522237

DECEMBER 2024
CERTIFICATE

This is to certify that the Capstone Project work titled “Comparison of Machine Learning
Models in Graduate Admission Prediction”, being submitted by Rachamsetty Rohith
Basava Sai (21BCE8151), Dodda Venkata Sridhara Reddy (21BCE9241), Ravagondu
Nithesh (21BCE8318), and Bhimineni Akhil (21BCE8319) in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology, is a record of bonafide
work done under my guidance. The contents of this project work, in full or in part, have neither
been taken from any other source nor been submitted to any other institute or university for the
award of any degree or diploma, and the same is certified.

Dr. Hajarathaiah K
Guide

The thesis is satisfactory

Prof. LOGANATHAN PAVITHRA          Prof. M. RAMMOHAN

Internal Examiner 1                Internal Examiner 2

Approved by

HoD, Department of ...


School of Computer Science and Engineering

ACKNOWLEDGEMENTS

We wish to express our heartfelt gratitude to everyone who supported and guided us throughout
this capstone project. First and foremost, we sincerely thank our mentor, Dr. Hajarathaiah K, for
their exceptional guidance, unwavering support, and constructive feedback. Their expertise and
encouragement provided the foundation for this project's successful completion.

We extend our gratitude to Vellore Institute of Technology and the School of Computer Science
and Engineering for offering a conducive environment and access to essential resources, such as
modern computational tools and infrastructure. This support was instrumental in executing our
research and experimentation effectively.
We are deeply thankful to our team members, whose dedication, resilience, and collaborative
spirit transformed this ambitious vision into reality. Each member’s unique contribution was vital
to overcoming challenges and achieving milestones. Additionally, we acknowledge the role of our
families and friends, whose patience, motivation, and emotional support sustained us during
intense phases of the project.
Our appreciation also goes to the open-source community and numerous online platforms that
facilitated our work through access to datasets, libraries, and comprehensive documentation.
These resources were pivotal in implementing cutting-edge machine learning techniques and
refining our predictive models. We further extend our gratitude to the evaluators, reviewers, and
peers who offered critical feedback and inspired us to pursue excellence.
This project represents a confluence of collective effort, perseverance, and innovative thinking. It
reflects not only our academic growth but also the invaluable support and guidance we received
along the way. We are proud to present this work as a culmination of our learning journey and as
a testament to the possibilities of collaboration and technological progress.

ABSTRACT

This capstone project focuses on the development and deployment of a predictive system for
graduate admissions using advanced machine learning techniques. The goal is to conduct a
comprehensive comparative analysis of machine learning models—including Logistic Regression,
K-Nearest Neighbors (KNN), Random Forest, XGBoost, LightGBM, Support Vector Machines
(SVM), and Gradient Boosting—to determine the most accurate and efficient approach for
predicting admissions outcomes. The project integrates several stages, from data collection and
preprocessing to model optimization, evaluation, and deployment as an interactive web
application.

In the initial phase, datasets containing attributes such as undergraduate CGPA, standardized test
scores, and past admission decisions were gathered and carefully processed to ensure data
integrity. Preprocessing steps addressed challenges like missing values, outliers, and
normalization, ensuring readiness for training robust machine learning models. A systematic
exploration of feature engineering techniques and hyperparameter tuning was undertaken to
enhance the models' predictive performance. Key performance metrics—including accuracy,
precision, recall, F1-score, and ROC-AUC—served as benchmarks for comparison. Following
this, the best-performing model was selected and integrated into a user-friendly web interface
capable of delivering real-time predictions. This web application aims to assist prospective
students in assessing their admission probabilities while providing insights into critical
influencing factors.

Additionally, the findings and methodologies were documented in a research paper prepared for
submission to IEEE conferences. This paper highlights not only the comparative evaluation of
various models but also the technical and practical challenges encountered during model design
and deployment. Beyond its academic value, the project demonstrates the potential of machine
learning and web technologies to innovate within the educational domain, offering decision-
making support to institutions and applicants alike. By bridging theoretical understanding with
practical implementation, this capstone serves as a milestone in applying artificial intelligence for
societal benefit.

TABLE OF CONTENTS

Sl. No.   Chapter   Title                                      Page No.
1.                  Acknowledgements                           2
2.                  Abstract                                   3
3.                  List of Figures and Tables                 5, 6
4.        1         Introduction                               7
          1.1       Objectives                                 7
          1.2       Background and Literature Survey           8
          1.3       Organization of the Report                 8
5.        2         Proposed System and Methodology            10
          2.1       Proposed System                            10
          2.2       Working Methodology                        10
          2.3       Standards                                  14
          2.4       Software Details                           15
6.        3         Cost Analysis                              16
7.        4         Results and Discussion                     18
8.        5         Conclusion and Future Work                 23
9.        6         Appendix                                   25
10.       7         References                                 42

List of Tables

TABLE NO.   TITLE             PAGE NO.
1           MODEL METRICS     19
2           DATASET SAMPLE    31

List of Figures

FIG. NO.    TITLE             PAGE NO.
1           METHODOLOGY       10
2           WEB UI            13
3           ACCURACY          19
4           RECALL            20
5           F1-SCORE          20
6           PRECISION         20
7           WEB PAGE RESULT   21

CHAPTER 1
INTRODUCTION

The advancement of technology has significantly impacted various industries, including
education. Machine learning (ML), as a subset of artificial intelligence, has emerged as a
transformative tool in data-driven decision-making. In this project, we focus on leveraging
machine learning models to predict graduate admissions, using parameters such as undergraduate
CGPA, standardized test scores, and other academic achievements.

Predicting graduate admissions involves analyzing complex datasets and identifying patterns that
influence the admission decisions of universities. This application of machine learning not only
automates the process but also provides prospective students with actionable insights into their
chances of acceptance, enabling them to make informed decisions.

By employing robust preprocessing techniques and systematic evaluation of multiple machine
learning models, this project bridges the gap between raw academic data and practical
decision-making tools. The culmination of this work is a web application that integrates the
selected machine learning model, offering users a seamless interface for predictions.

1.1 Objectives
The objectives of this project are as follows:
● To collect, preprocess, and analyze data related to graduate admissions, ensuring the
dataset is clean and reliable for model training.
● To evaluate and compare the performance of various machine learning models, including
Logistic Regression, KNN, Random Forest, XGBoost, LightGBM, Gradient Boosting, and
SVM.
● To implement hyperparameter tuning and feature selection techniques for optimizing
model performance.
● To design and develop a user-friendly web application that integrates the best-performing
model for real-time predictions.
● To prepare a research paper for IEEE conferences, emphasizing the comparative analysis
of machine learning models and their application in education technology.

1.2 Background and Literature Survey

Machine learning has emerged as a pivotal tool for predictive analytics in various domains,
including education. Predictive models have revolutionized the way institutions handle data,
offering insights into student performance, enrollment trends, and admissions. By leveraging
these models, universities can refine their admissions processes, identify candidates with strong
potential, and allocate resources more efficiently. The rising interest in applying machine learning
to graduate admissions highlights its potential to streamline decision-making and improve
institutional outcomes.

The foundation of this project is rooted in the exploration of machine learning's role in education,
where algorithms have been utilized to analyze academic performance and admission trends. Past
studies have showcased the effectiveness of models like Random Forest and Support Vector
Machines (SVM) in predicting student outcomes. However, many of these efforts have been
limited to theoretical implementations or static analyses, lacking practical deployment avenues
that address user accessibility and interactivity.

While prior research has focused on model performance, there has been a gap in combining
predictive analytics with user-oriented tools. Most studies have emphasized specific algorithms or
datasets without a comprehensive framework for model comparison and selection. Additionally,
limited work has been done to integrate real-time prediction capabilities, which are essential for
practical applications in admissions processes. These challenges underscore the need for a
solution that balances accuracy with usability.

This project aims to address these gaps by incorporating advanced machine learning techniques
and deploying them through a dynamic web application. By integrating ensemble methods and
offering real-time predictions, the system ensures both precision and practicality. The approach
not only evaluates and compares multiple models for their predictive strengths but also provides a
seamless platform for users, making it an innovative step forward in the realm of graduate
admission prediction.

1.3 Organization of the Report

The remaining chapters of the project report are described as follows:


Chapter 2: Proposed System and Methodology
● This chapter provides a detailed explanation of the proposed system, working
methodology, and an overview of the software tools and technologies used in the project.

Chapter 3: Cost Analysis
● This chapter presents the cost analysis of the software tools and resources utilized,
focusing on free and open-source technologies where applicable.
Chapter 4: Results and Discussion
● This chapter discusses the results obtained from data preprocessing, model training,
testing, and evaluation. It includes a comparative analysis of the machine learning models
used in the project.
Chapter 5: Conclusion and Future Work
● This chapter concludes the project by summarizing key outcomes and identifying potential
areas for improvement and expansion, such as using more advanced datasets or exploring
deep learning techniques.
Chapter 6: Appendix
● This chapter includes additional materials such as code snippets, dataset descriptions, and
supplementary diagrams that support the project.
Chapter 7: References
● This chapter provides a comprehensive list of all references cited in the report, including
research papers, online resources, and datasets.

CHAPTER 2
PROPOSED SYSTEM AND METHODOLOGY

2.1 Proposed System

The proposed system aims to predict graduate admissions using a machine learning-based web
application. The system integrates multiple components, including data preprocessing, model
training, evaluation, and deployment. A key aspect is the comparative analysis of several machine
learning models to identify the most efficient one for integration.

The system follows a structured workflow:

● Data Collection and Preprocessing: Raw data is collected from reliable sources, cleaned,
and transformed into a format suitable for machine learning.
● Model Training and Testing: Various models are trained and evaluated using performance
metrics like accuracy, precision, recall, and F1-score.
● Web Application Development: The selected model is deployed into a user-friendly web
application to enable real-time predictions.
● IEEE Paper Preparation: Insights from the project are documented in a research paper,
highlighting model comparison results.

2.2 Working Methodology


The project follows a structured methodology divided into several key phases, ensuring
systematic development and accurate results. Each phase is detailed below:

FIG 1: METHODOLOGY
2.2.1 Dataset Gathering and Preprocessing
Dataset Gathering:

The dataset used in this project is sourced from publicly available educational datasets related to
graduate admissions. These datasets typically include attributes such as:
● GRE Score (Quantitative and Verbal)
● TOEFL Score
● Undergraduate CGPA
● Research Experience (Binary: Yes/No)
● SOP Strength (Rating: 1-5)
● LOR Strength (Rating: 1-5)
● University Rating (Rating: 1-5)
● Chance of Admission (Target variable)
Preprocessing Steps:
To prepare the data for machine learning models:
1. Handling Missing Values:
● Identify and handle missing values using appropriate imputation techniques (e.g., mean
or median imputation for numerical data).
2. Outlier Detection and Removal:
● Detect outliers using boxplots or Z-scores, and handle them either by removing or
capping extreme values.
3. Normalization:
● Scale numerical features (e.g., GRE, TOEFL scores) to bring them into a uniform range,
often using Min-Max scaling or Standard scaling.
4. Encoding Categorical Data:
● Use one-hot encoding or label encoding for binary or multi-class categorical features,
such as Research Experience.
5. Data Splitting:
● Split the dataset into training (70%), validation (15%), and testing (15%) sets to ensure
proper evaluation and avoid overfitting.
Key Deliverables:
● A clean and ready-to-use dataset optimized for model training.
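
The preprocessing pipeline described above could be sketched roughly as follows with pandas and scikit-learn. This is an illustrative sketch only: the file name admissions.csv, the column names (e.g., "GRE Score", "CGPA", "Admit"), and the choice of median imputation and IQR capping are assumptions rather than the project's actual code or schema; in particular, the binary "Admit" label is assumed to have been derived from the Chance of Admission column.

# Minimal preprocessing sketch (illustrative; column and file names are assumed).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("admissions.csv")

# 1. Impute missing numerical values with the column median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 2. Cap outliers with the IQR rule instead of dropping rows.
for col in ["GRE Score", "TOEFL Score", "CGPA"]:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 3. Separate features from the (assumed) binary admission label.
X = df.drop(columns=["Admit"])
y = df["Admit"]

# 4. Min-Max scale all features into a uniform [0, 1] range.
X_scaled = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)

# 5. Split into 70% train, 15% validation, 15% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_scaled, y, test_size=0.30, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp)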

2.2.2 Model Selection and Evaluation:


Model Selection:

A variety of machine learning models are evaluated to ensure comprehensive coverage of
classification techniques. The models include:
1. Logistic Regression: A simple yet effective model for binary classification problems.
2. K-Nearest Neighbors (KNN): A non-parametric algorithm that classifies based on feature
similarity.
3. Random Forest: An ensemble learning method that builds multiple decision trees and averages
their outputs.
4. Gradient Boosting: A sequential ensemble method that improves predictions through gradient
descent optimization.
5. XGBoost and LightGBM: Advanced gradient boosting methods known for speed and
performance.
6. Support Vector Machines (SVM): A model that finds the optimal hyperplane for classification
tasks.
Evaluation Metrics:
Each model is trained and evaluated using the following metrics to ensure robust comparison:
● Accuracy: Measures overall correctness of predictions.
● Precision: Measures the proportion of correctly identified positive instances.
● Recall: Evaluates the model's ability to identify all positive instances.
● F1-Score: A harmonic mean of precision and recall.
● ROC-AUC: Evaluates the tradeoff between true positive and false positive rates.
Hyperparameter Tuning:
Hyperparameters are fine-tuned using grid search or randomized search methods to improve
model performance. Examples include:
● Number of trees in Random Forest.
● Learning rate in Gradient Boosting algorithms.
● Kernel type and regularization parameter in SVM.
Key Deliverables:
● Comparison table showing the performance metrics of all models.
● Identification of the best-performing model for deployment.
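
A compact sketch of this evaluation and tuning loop is given below, reusing the train/test splits from the preprocessing sketch in Section 2.2.1. The specific hyperparameter grids, cross-validation settings, and scoring choice are illustrative assumptions rather than the values actually used in the project.

# Illustrative model comparison loop; grids and settings are assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "Random Forest": (RandomForestClassifier(random_state=42), {"n_estimators": [100, 300]}),
    "Gradient Boosting": (GradientBoostingClassifier(random_state=42), {"learning_rate": [0.05, 0.1]}),
    "XGBoost": (XGBClassifier(eval_metric="logloss"), {"learning_rate": [0.05, 0.1]}),
    "LightGBM": (LGBMClassifier(), {"learning_rate": [0.05, 0.1]}),
    "SVM": (SVC(probability=True), {"C": [1, 10], "kernel": ["rbf", "linear"]}),
}

results = {}
for name, (estimator, grid) in models.items():
    # Grid search with 5-fold cross-validation on the training split.
    search = GridSearchCV(estimator, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    best = search.best_estimator_

    # Score the tuned model on the held-out test split.
    y_pred = best.predict(X_test)
    y_prob = best.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_prob),
    }

The resulting dictionary can then be tabulated into the comparison table referenced above.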

2.2.3 Web Application Development:

Frontend Development:
● Built using HTML and CSS for an intuitive and interactive user interface.
Key Features:
● Input form for users to enter academic details (GRE, CGPA, etc.).
● A "Predict" button to fetch predictions from the backend.

Backend Development:
● Built using the Flask framework.
● Receives user inputs from the frontend.
● Passes the inputs to the machine learning model.
● Returns the prediction result to the frontend.

Model Integration:
● The best-performing machine learning model is serialized using libraries like
joblib or pickle in Python.
● The serialized model is integrated into the backend, ensuring real-time prediction.
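
A minimal sketch of this serialization and integration step is shown below, assuming joblib for persistence. The model file name, route, form field names, and template name are placeholders; the project's actual backend code is listed in the appendix.

# Minimal Flask integration sketch; file, route, and field names are assumed.
import joblib
from flask import Flask, request, render_template

# After training, the chosen model would be persisted once, e.g.:
#   joblib.dump(best_model, "model.pkl")
model = joblib.load("model.pkl")

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Read the academic details submitted by the frontend form.
    features = [[
        float(request.form["gre"]),
        float(request.form["toefl"]),
        float(request.form["cgpa"]),
    ]]
    # Probability of the positive (admit) class from the serialized model.
    probability = model.predict_proba(features)[0][1]
    return render_template("result.html", chance=round(probability * 100, 2))

if __name__ == "__main__":
    app.run(debug=True)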

Deployment:
● The application is deployed on cloud platforms such as Render for scalability and
accessibility.
● The backend APIs are tested using Postman to ensure seamless communication
between the frontend and backend.

Key Deliverables:
● A fully functional web application capable of predicting graduate admission chances in
real time.

FIG 2: WEB USER INTERFACE

2.2.4 IEEE Paper Preparation:


Research Paper Focus:
The paper highlights the following:
● The methodology for evaluating machine learning models.
● Performance analysis of the models and justification for selecting the best one.
● Challenges encountered, such as dataset inconsistencies or model overfitting, and the
strategies used to address them.

IEEE Guidelines:
● The paper adheres to IEEE formatting and citation guidelines.
● It includes sections on methodology, results, discussions, and conclusions, with clear
graphical representations of findings.

Key Deliverables:
● A completed research paper ready for submission to IEEE conferences.

2.3 Standards
To ensure a robust and secure system, the following standards and best practices were adopted
during the development of the project:

Data Privacy and Security:


● The system avoids storing sensitive user data by processing inputs directly for predictions
without retaining them.
● HTTPS is used for secure communication between the user and the deployed application
on Render.

Model Serialization:
● The trained machine learning model was serialized using Python’s pickle library, adhering
to best practices for model deployment to ensure seamless integration with the Flask
backend.

Code Modularity and Reusability:


● The project followed modular design principles, ensuring that the backend logic, model
handling, and frontend components are decoupled and reusable.

Deployment Standards:
● The application was deployed on Render, following guidelines for hosting Flask
applications. Deployment ensured high availability and minimal downtime.

HTML and CSS Compliance:


● The frontend adheres to HTML5 and CSS3 standards to ensure cross-browser
compatibility and responsiveness.

Testing Standards:
● APIs were tested using Postman, ensuring correct functionality and data exchange
between the frontend and backend.
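
The same checks performed in Postman could also be scripted, for example with Python's requests library. The /predict route and form field names below are assumptions carried over from the backend sketch in Section 2.2.3, not the deployed application's documented API.

# Hypothetical API smoke test; route and field names are assumed.
import requests

payload = {"gre": 320, "toefl": 110, "cgpa": 8.7}
response = requests.post(
    "https://graduate-admission-predictor.onrender.com/predict",
    data=payload,
    timeout=10,
)

# A 200 status and a non-empty body indicate the frontend-backend exchange works.
assert response.status_code == 200
print(response.text[:200])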
2.4 Software Details
The project leverages a comprehensive set of software tools and technologies for data gathering,
preprocessing, model training, and application development. Key components are listed below:

Programming Languages:

● Python: Used for data scraping, preprocessing, model training, and backend development
with Flask.
● HTML/CSS: Used for the static frontend design with internal styling.

Libraries and Frameworks:


● Selenium: For automated web scraping during data gathering, enabling dynamic content
extraction from websites.
● BeautifulSoup: For parsing and extracting structured data from HTML pages.
● Pandas: For data manipulation and preprocessing.
● NumPy: For numerical computations.
● Scikit-learn: For implementing and evaluating machine learning models.
● Flask: For building the backend API.
● Pickle: For serializing the trained machine learning model.

Development and Testing Tools:


● Jupyter Notebook: Used for exploratory data analysis and model training.
● Postman: For testing backend APIs.

Deployment Platform:
● Render: Used for deploying the Flask-based web application. Render provides a reliable
and scalable hosting solution for Python applications.

Version Control:
● Git and GitHub: Used for version control and collaboration during project development.

CHAPTER 3
COST ANALYSIS
In this section, we analyze the costs associated with developing and deploying the machine
learning-based web application for graduate admission prediction. The costs are divided into two
categories:
● Development Costs
● Operational Costs.

3.1. Development Costs:

Development costs include the resources and tools required for building the project. The
following factors were considered in estimating the development costs:
Software and Tools:
Programming Languages & Libraries: The project utilizes Python for machine learning models,
Flask for the backend, and HTML/CSS for the frontend. All these tools and libraries are open-
source and free to use.

Machine Learning Models: The models used, including logistic regression, KNN, random forest,
XGBoost, LightGBM, SVM, and gradient boosting, are implemented using libraries such as
scikit-learn, xgboost, and lightgbm, which are free and open-source.

Web Development Tools: The web application is built using Flask (open-source) and hosted
using the Render platform. Render offers a free tier, which was used for the initial deployment,
but costs may arise for scaling based on traffic and resource usage.

Development Time:

Dataset Gathering and Preprocessing: Collecting data, cleaning, and preprocessing took
approximately 29 hours. This includes the time spent scraping data using Selenium and
BeautifulSoup.

Model Training and Testing: The training and testing of multiple machine learning models took
approximately 15 hours. This time involved hyperparameter tuning, cross-validation, and model
evaluation.
Web Application Development: Developing the web application for hosting the machine learning
models took around 10 hours. This includes backend API integration and frontend design.

3.2. Operational Costs


Operational costs refer to the ongoing costs associated with running and maintaining the project
after deployment.

Cloud Hosting: The web application is deployed on the Render platform, which provides a free
tier suitable for small-scale applications. Scaling beyond the free tier may increase costs
depending on the usage of resources such as CPU, memory, and storage. Based on the expected
usage, the estimated monthly cost for Render hosting is currently zero.

Storage and Database: The storage costs for storing the dataset and the trained models are
minimal due to the limited size of the data. If additional data or models are added, storage costs
might increase depending on the cloud service provider’s pricing.

Compute Resources: Since machine learning model training is computationally intensive, the
costs for cloud-based compute resources (e.g., GPUs or high-performance CPUs) might be
incurred if larger datasets or more complex models are used. The project currently uses personal
resources for training, so there are no immediate costs. However, future scaling might require
additional compute resources at an estimated cost of 2 USD per hour, depending on the selected
provider.

3.3 Miscellaneous Costs
● Data Collection: If data were acquired from paid sources rather than open web scraping, the
cost of purchasing datasets could be a factor in the project. In this case, data acquisition was
free.
● Maintenance: Post-deployment, the project might require periodic updates or fixes to
improve the system’s performance, maintain security, or upgrade libraries. These
maintenance tasks may incur costs for developer time, server resources, or external
dependencies.

Total Estimated Costs


The total cost of the project is dependent on the scaling requirements and the use of paid resources
for cloud services. Based on the current usage and development, the approximate total cost for
this project is:

● Development Costs: 0 USD


● Operational Costs: 0 USD/month (for hosting and compute resources)
● Miscellaneous Costs: 0 USD (for data or additional maintenance, if applicable)

In conclusion, the cost of developing and deploying the project is relatively low due to the use of
open-source tools and free cloud tiers. However, as the project scales or requires more resources,
operational costs may increase. Future improvements and maintenance will also contribute to the
overall cost.

CHAPTER 4

RESULTS AND DISCUSSION

This section presents the results of the machine learning models used for the graduate admission
prediction task. We will compare the performance of the models based on various evaluation
metrics, such as precision, recall, F1-score, and accuracy. Additionally, we will discuss the
deployment and performance of the web application that integrates the selected model for real-
time predictions.
4.1. Machine Learning Model Comparison
The following machine learning models were trained and tested using the prepared dataset:
● Logistic Regression
● Random Forest
● Support Vector Machine (SVM)
● Gradient Boosting
● K-Nearest Neighbors (KNN)
● XGBoost
● LightGBM
Model Evaluation Metrics
For each model, the following metrics were evaluated:
● Precision: The proportion of true positives among all predicted positives.
● Recall: The proportion of true positives among all actual positives.
● F1-Score: The harmonic mean of precision and recall.
● Accuracy: The percentage of correctly predicted instances.

Model Performance Summary:


The models were evaluated on the test set, and their performance metrics are summarized below:

TABLE 1: MODEL METRICS
Based on the table above, Logistic Regression achieved the highest accuracy (72.3%) and recall
(72.3%) compared to the other models. It also demonstrated a balanced performance with a
relatively high F1-score (70.8%) and precision (70.7%). While other models, such as Gradient
Boosting and XGBoost, performed well, Logistic Regression was chosen due to its simpler
implementation and strong predictive performance on this dataset.
Model Comparison Graphs
The following graphs illustrate the comparison of model performance across key metrics:
● Accuracy Comparison: A bar graph showing the accuracy of each model, with Logistic
Regression outperforming the other models.
● Precision and Recall Comparison: A bar graph showing the precision and recall values, where
Logistic Regression again leads with the highest recall.
● F1-Score Comparison: A comparison of F1-scores, where Logistic Regression shows a
competitive balance between precision and recall.
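
Bar charts of this kind could be produced from the results dictionary built in the Section 2.2.2 sketch, for example with matplotlib; the plotting style below is illustrative and not the project's actual charting code.

# Illustrative bar charts from the assumed 'results' dictionary.
import matplotlib.pyplot as plt

for metric in ["accuracy", "precision", "recall", "f1"]:
    names = list(results.keys())
    values = [results[name][metric] for name in names]
    plt.figure(figsize=(8, 4))
    plt.bar(names, values)
    plt.title(f"{metric.capitalize()} comparison across models")
    plt.ylabel(metric.capitalize())
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.show()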

FIG 3: ACCURACY

FIG 4: RECALL

FIG 5: F1-SCORE

FIG 6: PRECISION
4.2. Web Application Results
The Logistic Regression model, chosen for its simplicity and effectiveness, was integrated into a
web application built using Flask. This web application allows users to input their undergraduate
CGPA and test scores and receive a prediction regarding their likelihood of admission to a
graduate program.
The web application is deployed on the Render platform and can be accessed at
https://graduate-admission-predictor.onrender.com/. The application provides real-time
predictions based on user input. The frontend features a simple form where users enter their
data, and the backend handles the prediction process using the trained Logistic Regression model.
User Interface and Experience
The web application offers an intuitive and user-friendly interface, which includes:
● Input Section: Users input their undergraduate CGPA and test scores.
● Prediction Result: The model predicts the probability of admission based on the provided
data.
The application is responsive and works well with minimal latency between input submission and
prediction. Since the application is hosted on Render's free tier, it performs adequately for low to
moderate traffic, but may require scaling for larger user bases.

FIG 7: WEB PAGE RESULT


4.3 Discussion
Model Performance: The Logistic Regression model showed strong performance in terms of
accuracy, recall, and F1-score. It was chosen for the web application due to its balanced
performance and ease of deployment. While models like XGBoost and Gradient Boosting showed

similar performance, Logistic Regression's simplicity and interpretability made it a more practical
choice for real-time deployment.
Scalability and Application: The web application works effectively under normal conditions.
However, as the user base grows, additional resources may be required to ensure smooth
performance. Future scaling options may include upgrading the Render plan or migrating to a
more powerful hosting service.
Future Improvements: The project could benefit from exploring more advanced features such as
hyperparameter tuning or the inclusion of additional features in the dataset. Additionally,
implementing a user authentication system, providing real-time data visualizations, and collecting
user feedback could further enhance the application.
Limitations: The model was trained on a relatively small dataset, which may affect its
generalizability. Moreover, the Logistic Regression model, while effective for this task, may not
perform as well with more complex datasets or in real-world scenarios where feature interactions
are more intricate. Future work may involve gathering more diverse data to improve the model's
robustness.
In conclusion, the Logistic Regression model provides a solid solution for predicting graduate
admission likelihood, and the web application serves as an accessible platform for users to utilize
this prediction model. With additional improvements and future work, this project can be
expanded to include more advanced features and scale to meet growing demand.

CHAPTER 5
CONCLUSION AND FUTURE WORK

5.1 Conclusion
This project aimed to predict the likelihood of graduate school admissions based on undergraduate
CGPA and standardized test scores using machine learning techniques. We employed several
machine learning models, including Logistic Regression, Random Forest, SVM, Gradient
Boosting, K-Nearest Neighbors, XGBoost, and LightGBM, and evaluated them based on various
performance metrics such as precision, recall, F1-score, and accuracy.
Among all the models, Logistic Regression demonstrated the best overall performance, achieving
the highest accuracy (72.3%) and recall (72.3%), making it the most suitable choice for this
application. The model was integrated into a web application built using Flask and deployed on
the Render platform. This web app allows users to input their CGPA and test scores to predict
their likelihood of graduate school admission in real-time.
The key findings of the project are as follows:
● Logistic Regression outperformed other models in terms of key evaluation metrics,
making it an ideal candidate for deployment in the web application.
● The web application, hosted on Render, provides an efficient and user-friendly interface
for users to access predictions.
● The performance of the application is satisfactory under typical usage conditions,
providing real-time predictions based on user input.
Overall, the project successfully demonstrated the use of machine learning for graduate admission
prediction and the practical deployment of the model in a web application.
5.2. Future Work
While this project has provided valuable insights, there are several avenues for future
improvement and expansion:
● Model Optimization: Although Logistic Regression performed well, further optimization
of the models could improve their accuracy. Techniques such as hyperparameter tuning
(e.g., grid search, randomized search) could be employed to enhance model performance.
Additionally, exploring other advanced algorithms like Neural Networks or Ensemble
Methods could provide even better predictions.
● Data Expansion and Diversity: The model was trained on a relatively small dataset, which
may limit its generalizability. Future work should focus on gathering a larger, more
diverse dataset that includes more variables, such as extracurricular activities,
recommendation letters, or research experience, to improve the robustness of the model.

● Feature Engineering: Exploring additional features or performing more advanced feature
engineering could potentially boost the model’s performance. For example, incorporating
the number of research papers published, internships, or other academic achievements may
better capture the complexity of graduate admission decisions.
● Web Application Enhancements: The web application can be enhanced with additional
features, such as:
1. User Authentication: Enabling users to save their predictions or track their
admission chances over time.
2. Visualization Tools: Adding charts or graphs to help users better
understand how their CGPA and test scores impact the prediction.
3. Mobile App Development: Expanding the web application to a mobile
platform to increase accessibility.
● Scalability and Deployment: While the application currently works well under typical
usage conditions, future work could focus on scaling the web application to handle higher
traffic. This might involve optimizing the backend or upgrading the hosting service to
ensure smooth performance during peak usage.
● Model Explainability and Transparency: For users to trust the predictions made by the
model, implementing model interpretability techniques such as SHAP (Shapley Additive
Explanations) or LIME (Local Interpretable Model-agnostic Explanations) could be
helpful. These techniques would provide users with insights into why certain predictions
were made, improving the transparency of the model.
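
As an illustration only, SHAP could be applied to the deployed model roughly as follows; final_model is a hypothetical name for the fitted Logistic Regression, and the splits are assumed to come from the earlier preprocessing sketch.

# Hypothetical SHAP sketch; 'final_model' and the splits are assumptions.
import shap

# A linear explainer suits Logistic Regression's coefficient-based decisions.
explainer = shap.LinearExplainer(final_model, X_train)
shap_values = explainer.shap_values(X_test)

# Summary plot showing how each feature (GRE, TOEFL, CGPA, ...) shifts the
# predicted admission outcome for individual applicants.
shap.summary_plot(shap_values, X_test)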
In conclusion, while this project serves as a solid foundation for predicting graduate school
admission likelihood using machine learning, there is significant potential for improvement and
expansion. By incorporating more data, optimizing models, and enhancing the web application’s
functionality, this project could evolve into a powerful tool for students and admissions
committees alike.

CHAPTER 6

APPENDIX

6.1 Web Scraping Code

Dataset Sample:

TABLE 2: DATASET SAMPLE
6.2 Machine Learning Code

AllModels.py

6.3 Web Application Code:

predict.py

App.py

Index.html Code:

Result.html Code:

CHAPTER 7

REFERENCES

[1] Anna Kye. “Comparative Analysis of Classification Performance for US College Enrollment Predictive Modeling Using Four Machine Learning Algorithms (Logistic Regression, Decision Tree, Support Vector Machine, Artificial Neural Network)”. PhD thesis. Loyola University Chicago, 2023.
[2] Aashish Singhal and Saurabh Gautam. “Graduate University Admission Predictor using Machine Learning”. In: Journal for Modern Trends in Science and Technology 6.12 (2020), pp. 474–478.
[3] Ashiqul Haque Ahmed et al. “Predicting the Possibility of Student Admission into Graduate Admission by Regression Model: A Statistical Analysis”. In: Journal of Mathematics and Statistics Studies 4.4 (2023), pp. 97–105.
[4] Selvaprabu Jeganathan, Saravanan Parthasarathy, Arun Raj Lakshminarayanan, PM Ashok Kumar, and Md Khurshid Alam Khan. “Predicting the post graduate admissions using classification techniques”. In: 2021 International Conference on Emerging Smart Computing and Informatics (ESCI). IEEE. 2021, pp. 346–350.
[5] Zain Bitar and Amjed Al-Mousa. “Prediction of graduate admission using multiple supervised machine learning models”. In: 2020 SoutheastCon. IEEE. 2020, pp. 1–6.
[6] Aga Maulana et al. “Optimizing University Admissions: A Machine Learning Perspective”. In: Journal of Educational Management and Learning 1.1 (2023), pp. 1–7.
[7] Mohan S Acharya, Asfia Armaan, and Aneeta S Antony. “A comparison of regression models for prediction of graduate admissions”. In: 2019 International Conference on Computational Intelligence in Data Science (ICCIDS). IEEE. 2019, pp. 1–5.
[8] Adrita Iman and Xiaoguang Tian. “A comparison of classification models in predicting graduate admission decision”. In: Journal of Higher Education Theory and Practice 21.7 (2021).
[9] Sara Aljasmi, Ali Bou Nassif, Ismail Shahin, and Ashraf Elnagar. “Graduate admission prediction using machine learning”. In: Int. J. Comput. Commun 14 (2020), pp. 79–83.
[10] Ronald A Fisher. “The use of multiple measurements in taxonomic problems”. In: Annals of Eugenics 7.2 (1936), pp. 179–188.
[11] Evelyn Fix. Discriminatory Analysis: Nonparametric Discrimination, Consistency Properties. Vol. 1. USAF School of Aviation Medicine, 1985.
[12] Tianqi Chen and Carlos Guestrin. “XGBoost: A scalable tree boosting system”. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, pp. 785–794.
[13] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[14] Guolin Ke et al. “LightGBM: A highly efficient gradient boosting decision tree”. In: Advances in Neural Information Processing Systems 30 (2017).
[15] Leo Breiman. “Random forests”. In: Machine Learning 45 (2001), pp. 5–32.
[16] Jerome Friedman, Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. “Discussion of boosting papers”. In: Annals of Statistics 32 (2004), pp. 102–107.
[17] Tumul Buch, Sumeet Jain, and Kashyap Matani. Yocket - Study Abroad Guide. https://yocket.com/. Accessed: 2024-10-30.
