Automating Resume Screening
I. INTRODUCTION
In recent years, companies have received an enormous number of applications for each job opening, and the HR department must pick the best candidates from among them. It is very difficult for the hiring department to read every resume, analyse it for education, skills, and certifications, and shortlist candidates manually, and this slows the hiring process down. As the corporate world runs on tight deadlines, a delay in adding team members could lead to heavy losses. Furthermore, a candidate who applied for the job must wait a long time for a response from the hiring department. Manual processes also require a lot of manpower and are prone to human error. Manual analysis and shortlisting of resumes are therefore highly inefficient.

A recent study also reported that in the upcoming years there will be 75–100 applicants for every job opening. In addition, many companies are opening their jobs to remote workers, so an applicant need not live in the country of the job opening. This will further increase the number of applicants and make manual screening of resumes even more inefficient.

To overcome the above-mentioned limitations, this research study presents an intelligent system that automates the tasks of resume analysis and classification. The system takes resumes as input and classifies them according to whether they are suitable for the job or not. The parameters for the classification are education, skills, certifications, and experience. This, in turn, will save a lot of time in the hiring process.

Fig 1: Basic Flow of the Model

Although many models have been proposed for the automation of the resume screening process, they have some limitations. We plan to build a hybrid model based on those models that removes these limitations and achieves high accuracy.

There are two main modules in this system:

1.) Text Analysis: Converting the unstructured resumes into a structured format and breaking the job description into vectors for training the classifier.
2.) Resume Classification: The classifier will use the job description as a parameter and classify whether the resumes are suitable for the job or not.

II. RELATED WORK
There have been many approaches and techniques proposed for the automation of resume analysis and classification. In this context, some approaches attempt to classify the resumes using cosine similarity as their basis for
Authorized licensed use limited to: VIT University. Downloaded on February 07,2025 at 17:59:28 UTC from IEEE Xplore. Restrictions apply.
shown in Figure 2, the proposed system comprises several
modules that are organized as follows:
Fig 5: Flow chart for the Structurisation of Job Description

The data stored in the data frame will be compared with the data in the corpus, and both together will be used for training the SVM model and updating the corpus.

The classification will consist of parameters such as "education required", "skill required", "work experience" (if mandatory), and "projects".

Now, we will see the conversion of unstructured data from resumes to a structured format. This is much more complex than for the job description, because resumes come in different formats, colours, and types, and the model needs to convert them correctly. If it does not, it will lead to a lot of errors, resulting in a decrease in accuracy.

Now, each resume will be classified with the help of the support vector machine. The SVM will classify the resumes into two categories:

1.) Eligible for the job.
2.) Not eligible for the job.

The eligibility of a candidate for the job will be based on whether the skills mentioned in the resume match the job description and the corpus; if any mandatory work experience is mentioned in the job description, the classifier will also check for it in the resume and classify the resume accordingly. The model adds additional score to a resume if it contains any certificates or certifications that match the job description.

Need for Corpus: The corpus is what differentiates this model from all other existing models. It helps us increase the accuracy of the system. Most of the time, the applicant will list a high-level skill that consists of many subskills. When the job description mentions a subskill as a requirement and the classifier can't find that subskill, it reduces the score of the resume, thereby making it unsuitable for the job. Let's look at an example. Consider an applicant who listed "React" on their resume as a skill. A person who has learned React should be familiar with other skills such as HTML, CSS, and JavaScript. Now consider a JD that has HTML, CSS, and JS as its required skills. When the classifier is not able to identify the subskills, it makes the applicant ineligible for the job, which brings the accuracy of the model down. This is where the corpus comes into play: the model looks up the React skill in the corpus and gets all of its subskills, which helps increase the accuracy of the model.

V. EXPERIMENTAL EVALUATION
This section shows the evaluation of the model. Here we compare the scores of our model with manual scores and see how it performed.

Table 1: Score comparison between manual and model

Job title            Resume index  Manual score  Model's score  Difference (Manual - Model)  Result
Java Full Stack Dev  R1            0.65          0.65            0.00                        Perfect match
Java Full Stack Dev  R2            0.70          0.63            0.07                        Underqualified
Java Full Stack Dev  R3            0.44          0.44            0.00                        Perfect match
Java Full Stack Dev  R4            0.80          0.95           -0.15                        Overqualified
Data analyst         R5            0.50          0.38            0.12                        Underqualified
Data analyst         R6            0.82          0.82            0.00                        Perfect match
Data analyst         R7            0.57          0.57            0.00                        Perfect match
Data analyst         R8            0.63          0.72           -0.09                        Overqualified

As shown in Table 1, we have taken two job postings and eight resumes. The first job post is for a Java Full Stack Developer, and it has the following requirements: 2 years of experience in web development using Java, front-end languages such as HTML, CSS, and JavaScript, frameworks such as React and Django, and a strong grip on databases such as MySQL and MongoDB; certification in Red Hat will be helpful. Job Post 1 has four resumes, namely R1, R2, R3, and R4. For resumes R1 and R3, the difference is 0, so each is a perfect match. For resume R2, the difference is 0.07 because the manual score is higher than the model score, so it is underqualified. For resume R4, the difference is -0.15 because the model score is higher than the manual score, so it is overqualified.

Now we look at Job Post 2, which is Data Analyst. Its requirements are 1 year of experience, a strong grasp of languages such as Python, and tools such as Tableau, Weka, and PowerBI. Certification in Hadoop will be an added advantage. We have taken four resumes, namely R5, R6, R7, and R8. For resumes R6 and R7, the difference is 0 because the manual score and the model score are equal, so each is a perfect match. For resume R5, the difference is 0.12 because the manual score is greater than the model score, so it is underqualified. For resume R8, the difference is -0.09 because the model score is higher than the manual score, so it is overqualified.
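The labelling rule behind the "Result" column of Table 1 can be written down in a few lines. The following is a minimal Python sketch, not the authors' implementation: the function name `label_difference` and the rounding to two decimals are assumptions made for illustration, while the rule itself (difference = manual score minus model score; zero is a perfect match, positive is underqualified, negative is overqualified) follows the discussion of Table 1.

```python
from typing import Tuple


def label_difference(manual_score: float, model_score: float) -> Tuple[float, str]:
    """Return (difference, result) in the sense of Table 1."""
    # Rounding to two decimals is an assumption; it hides float noise
    # such as 0.06999999999999995 for 0.70 - 0.63.
    diff = round(manual_score - model_score, 2)
    if diff == 0:
        return diff, "Perfect match"
    if diff > 0:
        return diff, "Underqualified"
    return diff, "Overqualified"


# Scores for R1, R2, and R4 taken from Table 1.
for resume, manual, model in [("R1", 0.65, 0.65), ("R2", 0.70, 0.63), ("R4", 0.80, 0.95)]:
    print(resume, *label_difference(manual, model))
```

Applied to the three rows shown, the sketch reproduces the differences 0.0, 0.07, and -0.15 and the corresponding results reported in the table.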
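To make the role of the corpus concrete, the subskill lookup described under "Need for Corpus" can be sketched in Python. This is only an illustration under stated assumptions: the corpus is modelled as a plain dictionary (`SKILL_CORPUS`), the alias table and the helper names `normalise`, `expand_skills`, and `missing_skills` are hypothetical, and the single React entry mirrors the example in the text. The paper does not specify how the corpus is stored or queried.

```python
from typing import Dict, Iterable, Set

# Hypothetical corpus: each high-level skill maps to the subskills it implies.
SKILL_CORPUS: Dict[str, Set[str]] = {
    "react": {"html", "css", "javascript"},
}

# Assumed aliases so that "JS" in a job description matches "JavaScript".
ALIASES: Dict[str, str] = {"js": "javascript"}


def normalise(skill: str) -> str:
    """Lower-case a skill name and resolve known aliases."""
    key = skill.strip().lower()
    return ALIASES.get(key, key)


def expand_skills(resume_skills: Iterable[str]) -> Set[str]:
    """Return the listed skills plus every subskill found in the corpus."""
    expanded: Set[str] = set()
    for skill in resume_skills:
        key = normalise(skill)
        expanded.add(key)
        # One level of expansion only; nested subskills are not followed.
        expanded |= SKILL_CORPUS.get(key, set())
    return expanded


def missing_skills(jd_skills: Iterable[str], resume_skills: Iterable[str]) -> Set[str]:
    """Return the JD skills not covered by the expanded resume skills."""
    required = {normalise(s) for s in jd_skills}
    return required - expand_skills(resume_skills)


jd = ["HTML", "CSS", "JS"]
print(missing_skills(jd, ["React"]))  # empty set: React implies all three
```

Without the expansion step, a resume listing only "React" would miss all three required skills; with it, nothing is missing, which is the behaviour the corpus is meant to provide.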
Table 2: Comparison with other approaches

Job title                 Resume index  Manual score  Cosine similarity score  KNN score  Model score
Marketing Manager         R1            0.56          0.20                     0.37       0.51
Marketing Manager         R2            0.89          0.56                     0.60       0.78
Marketing Manager         R3            0.40          0.20                     0.30       0.45
Human Resources Operator  R4            0.61          0.55                     0.50       0.65
Human Resources Operator  R5            0.46          0.35                     0.20       0.46
Human Resources Operator  R6            0.53          0.41                     0.35       0.54
Software Engineer         R7            0.35          0.20                     0.20       0.35
Software Engineer         R8            0.70          0.61                     0.70       0.75
Software Engineer         R9            0.20          0.20                     0.25       0.25

Now we compare the results of the model with the manual score and with other models, namely KNN and cosine similarity.

KNN stands for the k-nearest neighbours algorithm; it is a supervised learning technique used for classification. The algorithm assigns an object to a class according to the labels of its K nearest neighbours. For the resume classification application, we choose K as 2, and there are two categories: suitable for the job and not suitable for the job. We pass the resumes into the model so the algorithm can determine whether they are suitable for the job description or not.

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

This is the Euclidean distance formula used in the KNN algorithm, where p and q are the two n-dimensional feature vectors being compared; it helps us determine whether the resume is suitable for the job or not.

Cosine Similarity: Cosine similarity is used to measure the similarity between two documents or any two sequences in general. This is one of the most important concepts in data science. The application takes two files as input; the files might be of any format (.docx, .txt, .xlsx, .csv). It analyzes the similarity between them and gives us a score.

communication skills, and a Certificate in Human Resources Management from York University or SHRM certification, which will be an added advantage. It has three resumes, namely R4, R5, and R6. As shown in Table 2, the manual scores and model scores are almost the same. KNN performed second best, and cosine similarity performed worst.

The third job posting is Software Engineer, and its requirements are a B.Tech. or B.S. degree in computer science, a good understanding of C/C++ and Java, and Python full-stack development; AWS certification will be an added advantage. It has three resumes, namely R7, R8, and R9. Similar to the results of the previous two job posts, the scores of our model and the manual scores are almost equal. This time, cosine similarity came in second and KNN came in third.

The results of the three job postings show that the model is not only working well but also performing better than the other available models and algorithms.

The reason why our model performed better than cosine similarity is that cosine similarity just looks at the similarity between the job description and the resume and gives out a score; there is much more to look at than that. For instance, a candidate might have all the skills and certifications mentioned in the job description but not satisfy the experience requirement. As most of the content is similar, the candidate still gets a score even though the requirement is not satisfied.

The reason why our model performed better than the KNN algorithm is that the KNN algorithm doesn't use its previous classification insights to classify the resumes. For this reason, every classification is a new classification. Because of this, the KNN algorithm is also called a lazy learning algorithm. It uses the same old data for all the resumes because it is one-time trained data for all the classifications. Our model overcomes this with the help of the corpus, which helps the model achieve high accuracy.

VI. RESULTS AND DISCUSSION
We have passed a list of resumes into the application along with a job description, and these are the results.
These are the people who have the highest model score between their job description and their resume, i.e., people whose skills are very close to their job description.

Now let's investigate the accuracy, precision, and other metrics.

These are the accuracy scores in detail, which further show that our model is a success and can work more accurately than any manual model.

Fig 7: Accuracy, precision, and other metrics

The accuracy of the model came out to be 99%. We have also used the application on our peers' resumes and calculated their chances of getting a job manually. The results came out to be accurate and matched those of the manual screening. The application performed the job on the 10th attempt at manual screening.

VII. CONCLUSION
In conclusion, resume classification using support vector machines has been demonstrated to be an effective and efficient method for automating the resume screening process. By training on a dataset of labelled resumes, the model can learn to accurately classify new resumes as suitable for the job or not using fields such as education level, work experience, skills, and certifications. This can save significant time and resources for HR departments and recruiters by reducing the amount of time spent manually reviewing resumes.

Most present-day resume classifiers are trained only once, so their accuracy will go down at some future point as the data changes. Our application overcomes this. We have implemented a corpus that updates the available data with every classification that the application performs and trains the model on a day-to-day basis, thereby ensuring that the model's accuracy won't decrease at any point in the future.
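For reference, the cosine-similarity baseline compared in Table 2 can be sketched in plain Python. This is an illustration under stated assumptions rather than the application used in the experiments: it works on strings instead of files, uses raw term counts as vectors, and the tokenisation pattern and the function names `term_counts` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up job description and resume, for illustration only.
jd = "python tableau powerbi 1 year experience"
resume = "python tableau powerbi no experience"
print(round(cosine_similarity(jd, resume), 2))  # 0.73
```

The made-up resume above lacks the required experience yet still scores about 0.73, because the measure reflects word overlap only. This is the weakness of cosine similarity noted in Section V.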
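The KNN baseline from Table 2, with Euclidean distance used to find the nearest neighbours, can be sketched in plain Python as well. The feature vectors, labels, and function names below are hypothetical and chosen only for illustration, since the paper does not state which features were used. Here `k` is the number of nearest labelled neighbours consulted in the vote and is left as a parameter.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_classify(query: Sequence[float], training: List[Sample], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, the label seen first (the nearer neighbour) is returned.
    return votes.most_common(1)[0][0]


# Hypothetical features: (skill match ratio, years of experience).
training = [
    ((0.9, 3.0), "suitable"),
    ((0.8, 2.0), "suitable"),
    ((0.3, 0.0), "not suitable"),
    ((0.2, 1.0), "not suitable"),
]
print(knn_classify((0.85, 2.5), training, k=3))  # suitable
```

Because the training points are stored once and only consulted at query time, nothing is learned from earlier classifications. This is the "lazy learning" behaviour that Section V contrasts with the corpus-based model.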