resumes with job posts and calculate a match score. Based on the match score, the resumes are ranked, significantly reducing the time and labor required to screen hundreds or thousands of resumes.

PROBLEM STATEMENTS
Tedious to manage and analyze a high volume of resumes
Traditional screening methods introduce human bias and subjectivity
Manual analysis and ranking are time-consuming and inefficient
OBJECTIVES
To build a custom NER model for parsing relevant
information from resumes and job posts
To build a system that compares resumes and job posts, calculates a match score, and ranks them

NLP & NER
Natural Language Processing (NLP) is a field of
artificial intelligence that enables machines to understand, interpret, and generate human language.
Named Entity Recognition (NER) is a subtask of NLP
that focuses on identifying and classifying specific entities (e.g., names, dates, locations) in text.
spaCy is a fast and efficient open-source library for NLP in Python. It provides tools for tasks like tokenization, part-of-speech tagging, dependency parsing, and NER.

METHODOLOGY

TOOLS & TECHNOLOGIES
● NLP Libraries: spaCy
● Programming Language: Python
● Machine Learning: Scikit-Learn, TensorFlow, PyTorch
● Database: MongoDB
● Deployment: AWS, Docker, or Google Cloud
● Development Tools: Jupyter Notebook, GitHub, VS Code
● Web Framework: Django, Flask, or Streamlit
● Other Libraries & Modules: Pandas, NumPy, Matplotlib, Seaborn, Pytesseract, Pillow, PyMuPDF, python-docx…
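As a quick illustration of spaCy's NER API, the sketch below runs a pretrained English pipeline over a sample sentence; it assumes the en_core_web_sm package is installed (the custom models trained later are loaded the same way, by passing their directory path to spacy.load):

    import spacy

    # Load a pretrained English pipeline (assumed to be installed via
    # `python -m spacy download en_core_web_sm`).
    nlp = spacy.load("en_core_web_sm")

    # Run the pipeline over a sample sentence and inspect the detected entities.
    doc = nlp("Jane Doe worked as a data scientist at Acme Corp in Kathmandu from 2019 to 2022.")
    for ent in doc.ents:
        print(ent.text, ent.label_)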
DATA COLLECTION

A resume dataset uploaded by Mr. Roman Shilpakar to Google Drive, containing 1,014 resumes
659 job descriptions collected from various online sources and compiled into a single text file
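Before any parsing can happen, each resume has to be reduced to plain text. A minimal sketch of that step, assuming the resumes arrive as PDF or DOCX files and using PyMuPDF and python-docx from the tools list (the file name in the usage comment is hypothetical):

    from pathlib import Path

    import fitz  # PyMuPDF
    from docx import Document  # python-docx

    def extract_text(path: str) -> str:
        """Return the raw text of a resume file (PDF, DOCX, or plain text)."""
        suffix = Path(path).suffix.lower()
        if suffix == ".pdf":
            # Concatenate the text of every page in the PDF.
            with fitz.open(path) as pdf:
                return "\n".join(page.get_text() for page in pdf)
        if suffix == ".docx":
            # Join the text of every paragraph in the Word document.
            return "\n".join(p.text for p in Document(path).paragraphs)
        # Fall back to reading the file as plain text.
        return Path(path).read_text(encoding="utf-8", errors="ignore")

    # Example (hypothetical file name):
    # print(extract_text("resumes/sample_resume.pdf")[:500])

Scanned resumes would additionally need OCR, which is what Pytesseract and Pillow on the tools list are for.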
DATA PREPROCESSING

The text file containing the job descriptions is manually annotated using an online NER annotator called “arunmozhi”
After annotation, a dataset is obtained in JSON format containing the custom annotations of the various entities.
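Before training, the exported annotations have to be converted into spaCy's binary .spacy format. A minimal sketch of that conversion, assuming the JSON export follows the common [text, {"entities": [[start, end, label], ...]}] layout used by this style of annotator; the file names train_data.json and train.spacy are hypothetical:

    import json

    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")   # blank English pipeline, used only for tokenization
    doc_bin = DocBin()

    # Load the exported annotations (hypothetical file name).
    with open("train_data.json", encoding="utf-8") as f:
        data = json.load(f)

    # Assumed layout: {"annotations": [[text, {"entities": [[start, end, label], ...]}], ...]}
    for text, annotation in data["annotations"]:
        doc = nlp.make_doc(text)
        spans = []
        for start, end, label in annotation["entities"]:
            span = doc.char_span(start, end, label=label, alignment_mode="contract")
            if span is not None:          # skip spans that do not align to token boundaries
                spans.append(span)
        doc.ents = spans
        doc_bin.add(doc)

    # Serialize to the format expected by spaCy's CLI training.
    doc_bin.to_disk("train.spacy")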
MODEL GENERATION

Using this dataset, a custom NER model is trained with spaCy’s command-line interface
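With spaCy v3, the training itself is a two-step CLI workflow: generate a config, then train on the converted .spacy files. The paths below are hypothetical:

    # Generate a base training config for an English NER-only pipeline
    python -m spacy init config config.cfg --lang en --pipeline ner

    # Train the model, writing checkpoints to ./output
    python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy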
MODEL EVALUATION

Training the custom NER model on the resume dataset of 1,014 resumes took over 3 hours and achieved a score of 85%
Training the custom NER model on the job post dataset of 659 job posts took over 2 hours and achieved a score of 70%
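The deck does not name the metric, but the reported score is presumably the overall NER F-score that spaCy computes on held-out data. A trained pipeline can be re-checked from the CLI, which prints precision, recall, and F-score per entity type; the paths are hypothetical:

    # Evaluate the best checkpoint on the held-out .spacy file
    python -m spacy evaluate ./output/model-best ./dev.spacy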
MODEL DEPLOYMENT

The custom NER models were saved to a local directory for deployment in the Python app
This stage involves applying the NER models to parse the resumes and job descriptions; the extracted entities will later be compared to calculate a match score, as sketched below
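A minimal sketch of that comparison step, assuming the two custom models were saved to hypothetical directories output_resume/model-best and output_jobpost/model-best and that both annotation schemes include a SKILLS label (the real labels depend on how the data was annotated):

    import spacy

    # Load the two custom NER models from their local directories (hypothetical paths).
    resume_nlp = spacy.load("output_resume/model-best")
    job_nlp = spacy.load("output_jobpost/model-best")

    def extract_skills(nlp, text, label="SKILLS"):
        """Return the set of lower-cased entity texts carrying the given label."""
        return {ent.text.lower() for ent in nlp(text).ents if ent.label_ == label}

    def match_score(resume_text, job_text):
        """Fraction of the job post's required skills that appear in the resume (0 to 1)."""
        resume_skills = extract_skills(resume_nlp, resume_text)
        job_skills = extract_skills(job_nlp, job_text)
        if not job_skills:
            return 0.0
        return len(resume_skills & job_skills) / len(job_skills)

    # Example: rank a batch of resumes for one job post by descending match score.
    # ranking = sorted(resume_texts, key=lambda r: match_score(r, job_post_text), reverse=True)

Skill overlap is only one simple choice of score; cosine similarity over embeddings of the extracted entities, or weighting by entity type, would fit the same interface.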
WORK PROGRESS

TASKS COMPLETED
We collected a text file containing 659 job descriptions
We manually annotated the job descriptions to create a dataset for training a model
We trained custom NER models on the publicly available resume dataset and the self-annotated job post dataset to parse key entities from resumes and job posts

SAMPLE OUTPUT (RESUME)

SAMPLE OUTPUT (JOB POST)

REMAINING TASKS