Final DM

The document discusses analyzing campus placements in India using machine learning algorithms. It first collects a campus placement dataset from Kaggle containing attributes on students like CGPA, internships, projects, etc. It then preprocesses the data, selects relevant features, and splits the data into 80% for model training and 20% for testing. Various machine learning classification algorithms will be applied and evaluated on the test data to predict student placements based on their attributes.

Uploaded by

Love Gates
Analysis of Campus Placements in India

Tejansh Sachdeva (2110110555), Computer Science and Engineering, Shiv Nadar University, ts879@[Link]
Chaitanya Tandon (2110110171), Computer Science and Engineering, Shiv Nadar University, ct765@[Link]
Mitaali Singhal (2110110883), Computer Science and Engineering, Shiv Nadar University, ms923@[Link]

I. INTRODUCTION

There was a time when the success of an educational institute was judged by the skills and knowledge of its students. Today, for most undergraduate and postgraduate courses, the success of education is measured by successful campus placements. Placing all students is considered an institutional obligation and a mark of merit, and institutions are ranked by the number of students placed and the average salary offered. Campus placement is a program organized by a university or educational institute in collaboration with various companies to provide job opportunities to students who are nearing the completion of their studies. It is a widely used practice in the education industry.

This pivotal phase not only influences the professional trajectory of students but also significantly impacts the standing of colleges and universities. Nevertheless, forecasting whether a student will secure a placement in a coveted company is a formidable challenge, as it hinges on many variables, including academic achievements, personal background, prior work experience, and more.

The dataset used covers a wide range of attributes, including secondary and higher education percentages, the number of internships undertaken, projects completed, workshops attended, and more. Using data mining techniques and machine learning algorithms, this paper aims to predict the placement outcomes of students based on these attributes.

II. LITERATURE REVIEW

The significance of campus recruitment for both educational institutions and corporations is well established in the literature. Research highlights a prevailing mismatch between students' skills and industry expectations. Beyond technical expertise and subject knowledge, soft skills are emphasized as key factors in the campus recruitment process. To bridge this gap, industries are encouraged to engage with campuses through internships, curriculum development, and student workshops. Studies underscore the characteristics of the campus recruitment process and note that engineering students primarily base their career choices on intrinsic factors. Notably, software services companies in India play a prominent role in campus recruitment, seeking students with logical and problem-solving abilities. Building a positive brand image on campuses is recognized as a pivotal factor in attracting top talent, especially among non-computer science/IT students with multiple options for career choices.

In a recent study [2], random forest algorithms were employed to classify a dataset of campus-placed and non-placed students, achieving an 86% accuracy rate. The study introduced a recommendation framework capable of predicting five distinct placement statuses for scholars, enhancing their technical and social skills.

This model serves placement cells within academic institutions by identifying and advancing students with potential based on their academic performance in 10th, 12th, and graduation, along with current backlog status. The evaluation criteria encompass various metrics, including accuracy scores, percentage accuracy scores, confusion matrices, heat maps, and classification reports covering precision, recall, f1-score, and support. Several classification algorithms, such as Support Vector Machine, Gaussian Naive Bayes, K-Nearest Neighbor, Random Forest, Stochastic Gradient Descent, Logistic Regression, and Neural Networks, were applied to develop these classifiers.

In another study by Pal and Pal [4], the Naïve Bayes classifier emerged as the most effective choice for placement predictions. Ramanathan, Swarnalatha, and Gopal [5] adopted a different approach, using the sum of difference method to predict student placements from attributes such as age, academic records, and achievements, offering valuable insights for higher learning institutions to improve education quality.

III. METHODOLOGY

Drawing from the insights presented in the literature review, a diverse array of robust analysis methods, including K-Nearest Neighbor and Random Forest, have been widely adopted due to their credibility and high [Link] our research, we follow a structured methodology commencing with comprehensive data preprocessing to ensure data quality and uniformity.

Subsequently, we carry out feature selection, assessing the relevance of each feature to the target feature. To test the predictive accuracy of our model, we split the dataset into a training subset and a smaller testing subset, with a testing-to-training ratio of 1:4 (20% testing, 80% training). This partition allows us to measure the performance of the model on data it has not seen during training. We then proceed to the core of our analysis, applying data mining classification techniques such as Support Vector Machines, Decision Trees, and more.

The final step is an evaluation of the predictive accuracy and performance of each classification technique. This evaluation forms the basis of our analysis, showing how effective each method is at forecasting campus placements and guiding our conclusions.

The steps involved in this system are as follows.

A. Data Acquisition
The campus placement dataset is collected from Kaggle ([Link]kar/campus-placement-data-for-engineering-colleges/data).
The dataset consists of attributes such as CGPA, Internships, Projects, Workshops, Aptitude Score, SoftSkillsRating, Extracurricular Activities, Placement Training (taken or not), and High School marks.

B. Handling Categorical Data:
Since the algorithms cannot work with categorical values directly, these values are mapped to numbers. Attributes such as Extracurricular Activities and Placement Training take the values 'Yes' and 'No'. We replace these with the binary values 1 and 0 using the map function in Python.
For example:
df['training'] = df['training'].map({'Yes': 1, 'No': 0})

Additional data preprocessing is not required, since the data is clean and has no null values in any row.

C. Feature Selection:
In this section we evaluate the features (attributes) and their correlation with the target feature. For example, we analyze the number of students placed with respect to the internships they did, the number of projects they completed, and whether or not they took placement training. We then deduce whether the placement count of students depends on these features.
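
The analysis described above can be sketched as a placement rate computed per feature value. This is a minimal, dependency-free illustration; the records, the column names (internships, training, placed), and the helper placement_rate_by are hypothetical and are not taken from the Kaggle dataset.

```python
from collections import defaultdict

# Hypothetical toy records; column names and values are illustrative only.
students = [
    {"internships": 0, "training": 0, "placed": 0},
    {"internships": 0, "training": 1, "placed": 0},
    {"internships": 1, "training": 0, "placed": 0},
    {"internships": 1, "training": 1, "placed": 1},
    {"internships": 2, "training": 1, "placed": 1},
    {"internships": 2, "training": 1, "placed": 1},
]

def placement_rate_by(records, feature, target="placed"):
    """Return {feature value: fraction of students placed} for one feature."""
    placed = defaultdict(int)
    total = defaultdict(int)
    for row in records:
        total[row[feature]] += 1
        placed[row[feature]] += row[target]
    return {value: placed[value] / total[value] for value in sorted(total)}

rates_internships = placement_rate_by(students, "internships")
rates_training = placement_rate_by(students, "training")
print(rates_internships)
print(rates_training)
```

A placement rate that changes clearly across the values of a feature suggests that the feature is informative for the target; a flat rate suggests it may be dropped.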

D. Split Data:
Here, the data is divided into two parts: training data and testing data. 80% of the data is used to train the machine learning algorithm, and the remaining 20% is used to test whether the trained model works correctly.
We can also use external datasets to test the model and measure the accuracy of its predictions of the target feature.
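
A minimal sketch of the 80/20 split, using only the Python standard library. The function name split_rows, the fixed seed, and the stand-in list of 100 records are assumptions made for illustration.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 student records
train, test = split_rows(rows)
print(len(train), len(test))
```

Shuffling before splitting avoids a biased test set when the rows are stored in some meaningful order, for example sorted by CGPA.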

E. Machine Learning Algorithm:


a) Logistic Regression:
Logistic regression is a statistical method used to estimate the probability of a binary dependent variable (y) from the values of the independent variables (x). In our problem, the dependent variable is the placement status and the independent variables are the features selected in the previous step.
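
As an illustration of how logistic regression maps features to a placement probability, the sketch below fits a weight vector and a bias by batch gradient descent on the log loss. It is a simplified stand-in and not necessarily the implementation used in this work; the single standardized CGPA feature, the learning rate, and the number of epochs are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # error = predicted probability minus true label
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

def predict_proba(x, w, b):
    """Probability that the student is placed (y = 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical data: one standardized CGPA feature; label 1 = placed.
X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba([1.0], w, b), predict_proba([-1.0], w, b))
```

A student is predicted as placed when the returned probability exceeds a chosen cut-off, commonly 0.5.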

b) Decision Tree:
A decision tree is a graphical structure resembling a tree, where nodes are the points at which we choose a feature and pose a question, edges represent the answers to these questions, and leaves represent the final output or class label.
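
The node, edge, and leaf roles can be made concrete with a small hand-built tree. The features (cgpa, internships), the thresholds, and the labels below are hypothetical and chosen only to show how a prediction walks from the root to a leaf; a trained tree would learn its questions from the data.

```python
# Hypothetical hand-built tree; features and thresholds are illustrative only.
tree = {
    "feature": "cgpa", "threshold": 7.5,          # node: question on a feature
    "yes": {                                      # edge: answer "cgpa >= 7.5"
        "feature": "internships", "threshold": 1,
        "yes": {"label": "Placed"},               # leaf: class label
        "no": {"label": "Not Placed"},
    },
    "no": {"label": "Not Placed"},
}

def predict(node, student):
    """Walk from the root to a leaf, answering one question per node."""
    while "label" not in node:
        answer = "yes" if student[node["feature"]] >= node["threshold"] else "no"
        node = node[answer]
    return node["label"]

print(predict(tree, {"cgpa": 8.2, "internships": 2}))
```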

c) K-Nearest Neighbor

K-NN classifies a new data point by comparing it with labelled data points and assigning the class that is most common among its k most similar neighbors. It can consider a wide array of student attributes, such as academic performance, internships, etc., to make predictions based on similarity. This provides practical insights into the factors influencing campus placements, making it a suitable choice for our task.
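
A minimal sketch of K-NN with Euclidean distance and a majority vote. The two features (CGPA, internships), the toy points, and k = 3 are assumptions for illustration only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical features: (CGPA, internships)
points = [(6.0, 0), (6.5, 0), (7.0, 1), (8.5, 2), (9.0, 1), (8.8, 2)]
labels = ["Not Placed", "Not Placed", "Not Placed", "Placed", "Placed", "Placed"]

print(knn_predict(points, labels, (8.7, 2)))
print(knn_predict(points, labels, (6.2, 0)))
```

Because the vote depends on raw distances, features measured on different scales are usually rescaled before K-NN is applied; the sketch skips that step for brevity.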

d) Random Forest

The Random Forest classifier consists of several decision trees, each trained on a different subset of the dataset. The outputs of all the trees are combined (by averaging or by majority vote) to improve the accuracy of the prediction.
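
The idea of training many trees on different subsets and combining their outputs can be sketched as follows. This is a deliberate simplification, not a full Random Forest: each "tree" is a one-level stump, only the rows are resampled (a real Random Forest also samples features at each split), and the outputs are combined by majority vote. The data, function names, and parameters are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Pick the (feature, threshold) whose >= rule misclassifies the fewest rows."""
    best = None
    for feature in range(len(rows[0][0])):
        for features, _ in rows:
            threshold = features[feature]
            errors = sum((x[feature] >= threshold) != bool(y) for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]

def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(fit_stump(sample))
    return forest

def forest_predict(forest, x):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(int(x[f] >= t) for f, t in forest)
    return votes.most_common(1)[0][0]

# Hypothetical rows: ((CGPA, internships), placed)
data = [((6.0, 0), 0), ((6.5, 1), 0), ((7.0, 0), 0),
        ((8.0, 1), 1), ((8.5, 2), 1), ((9.0, 1), 1)]
forest = fit_forest(data)
print(forest_predict(forest, (9.5, 2)), forest_predict(forest, (5.5, 0)))
```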

e) Naïve Bayes

Naïve Bayes offers several advantages, including its simplicity, efficiency, and suitability for text and categorical data. In the context of campus placements, the algorithm's ability to calculate conditional probabilities allows us to make informed predictions based on a wide array of student attributes.
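
The conditional-probability calculation can be sketched for categorical features as below. This is a minimal illustration under stated assumptions: two binary features (placement training, extracurricular activities), add-one (Laplace) smoothing, and hypothetical toy data and function names.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_naive_bayes(model, row, n_values=2):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # log P(feature value | class), smoothed to avoid log(0)
            score += math.log((seen + 1) / (count + n_values))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical binary features: (placement training, extracurricular activities)
rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = ["Placed", "Placed", "Placed", "Not Placed", "Not Placed", "Not Placed"]
model = fit_naive_bayes(rows, labels)
print(predict_naive_bayes(model, (1, 1)), predict_naive_bayes(model, (0, 0)))
```

The "naïve" part is the assumption that features are independent given the class, which is why the per-feature log probabilities are simply added.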

f) Support Vector Machine

SVM is known for its versatility in binary classification tasks and is particularly valuable when dealing with datasets that have diverse features and complex decision boundaries. The use of SVM in campus placement prediction involves feature selection and preprocessing, data splitting, model training, and evaluation.
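
As a rough illustration of the model-training step, the sketch below trains a linear SVM by sub-gradient descent on the regularized hinge loss. It covers only a linear decision boundary (no kernels) and is not necessarily the variant used in this work; the standardized toy features, the labels encoded as +1 (placed) and -1 (not placed), and all hyperparameters are assumptions.

```python
def fit_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=100):
    """Sub-gradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularizer always shrinks w; the hinge term acts only
            # when the point is inside the margin or misclassified.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical standardized features; +1 = placed, -1 = not placed.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = fit_linear_svm(points, labels)
print(svm_predict(w, b, (4, 4)), svm_predict(w, b, (-4, -4)))
```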

F. Evaluation of Results
The evaluation of results in this scenario will primarily
focus on accuracy, a fundamental performance metric for
classification models. The accuracy score will be calculated
using the formula: Accuracy = (True Positives + True
Negatives) / (True Positives + False Positives + True
Negatives + False Negatives). True Positives represent
correctly identified placements, while True Negatives
denote correctly identified non-placements. False Positives
and False Negatives correspond to incorrect predictions of
placements and non-placements, respectively. This
comprehensive approach to evaluation will provide insights
into the model's ability to correctly predict placement
outcomes, offering a valuable assessment of its performance
and reliability.
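
The accuracy formula above can be checked on a small example. The label vectors below are hypothetical (1 = placed, 0 = not placed), as are the helper names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification result."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))
```

Here 8 of the 10 predictions are correct (4 true positives and 4 true negatives), so the accuracy is 0.8.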

IV. REFERENCES

[1] [Link]
placement-data-for-engineering-colleges/data
[2] Laxmi Shanker Maurya, Md Shadab Hussain, and Sarita Singh, “Developing Classifiers through Machine Learning Algorithms for Student Placement Prediction Based on Academic Performance”, Applied Artificial Intelligence, vol. 35, no. 6, pp. 403-420, 2021, doi: 10.1080/08839514.2021.1901032.
[3] Oktariani Nurul Pratiwi, “Predicting student placement class using data mining”, Proceedings of 2013 IEEE International Conference on Teaching, Assessment and Learning of Engineering (TALE), IEEE, 2013.
[4] A. K. Pal and S. Pal, “Analysis and Mining of Educational Data for Predicting the Performance of Students”, International Journal of Electronics Communication and Computer Engineering (IJECCE), vol. 4, no. 5, pp. 1560-1565, ISSN: 2278-4209, 2013.
[5] L. Ramanathan, P. Swarnalatha, and G. D. Gopal, “Mining Educational Data for Students’ Placement Prediction using Sum of Difference Method”, International Journal of Computer Applications, vol. 99, no. 18, pp. 36-39, 2014.
