Fake Job Detection

This study compares traditional and deep learning models for detecting fake job postings, focusing on TF-IDF with Logistic Regression, XGBoost, and BERT-based models. XGBoost achieved the highest accuracy of 97.2%, while TF-IDF with Logistic Regression provided the fastest processing time, making it suitable for real-time applications. The research aims to enhance user safety on job portals by proposing a scalable fraud detection framework.


International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 08 | Aug 2025 www.irjet.net p-ISSN: 2395-0072

Fake Job Posting Detection Using Machine Learning: A Comparative Study
Shaik Mohammed Imran¹, Gadupudi Mokshagna²

¹B.Tech Student, Department of Computer Science and Engineering (AI & ML), Pragati Engineering College, East Godavari, India
²B.Tech Student, Department of Computer Science and Engineering (AI & ML), Pragati Engineering College, East Godavari, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Online job portals have become primary
platforms for job seekers, but they are increasingly
targeted by fraudsters posting fake job listings. This
study presents a comprehensive comparison of three machine
learning approaches for automated fake job posting
detection: TF-IDF with Logistic Regression, XGBoost, and
BERT-based models. Using the Employment Scam Dataset from
Kaggle containing 17,880 job postings, we evaluate these
models on accuracy, precision, recall, F1-score, and
computational efficiency. Our results demonstrate that
XGBoost achieves the highest accuracy of 97.2%, while
TF-IDF with Logistic Regression provides the fastest
processing time, making it suitable for real-time
applications. This research contributes to protecting job
seekers from employment scams and can be integrated into
job portal platforms for automated fraud detection.

Key Words: Fake Job Detection, Machine Learning, TF-IDF,
XGBoost, BERT, Text Classification, Employment Fraud,
Natural Language Processing

1. INTRODUCTION

1.1 Problem Statement

Fake job scams are increasing rapidly across online job
portals, affecting millions of job seekers with emotional
and financial consequences. These portals process a large
number of postings daily, making manual review difficult.
While machine learning has been applied to text
classification and spam detection, limited comparative
studies exist for fake job posting detection.

1.2 Research Objectives

The primary objective of this research is to evaluate
both traditional machine learning models and deep
learning models for the task of fake job posting detection.
This includes a detailed performance comparison in terms
of prediction accuracy, speed, and computational
efficiency. Another key goal is to identify and analyze the
most significant features that contribute to detecting
fraudulent postings, such as missing job details or specific
keyword patterns. Finally, the study aims to propose a
practical and scalable fraud detection framework that can
be integrated into real-time job portal systems to enhance
user safety and trust.

2. LITERATURE REVIEW

2.1 Related Work

TF-IDF and n-gram based models are widely used in spam
filtering. XGBoost has been successfully applied in financial
fraud detection. BERT and transformer models have shown
strong performance in text classification. However, specific
studies on fake job detection using a comparative model
analysis are limited.

2.2 Research Gap

While several studies have explored the use of machine
learning for detecting fraudulent content, there is a lack of
comprehensive comparison between traditional machine
learning techniques and modern deep learning models
specifically for fake job posting detection. Existing
research often overlooks the practical aspects of
deployment, including the computational requirements
and real-time applicability of these models. Additionally,
there is limited investigation into which features are most
indicative of fraudulent job postings, leaving a gap in
understanding the underlying patterns that distinguish
legitimate listings from scams.

© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 78

3. METHODOLOGY

3.1 Dataset Description

The dataset used is the Kaggle Employment Scam Dataset
containing 17,880 job postings. It is a binary
classification task with about 4.8% of postings labeled
as fraudulent. Key features include job title,
description, requirements, benefits, and company details.
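
To see why accuracy alone can mislead at this class
balance, the short calculation below works out the
approximate number of fraudulent postings and the accuracy
of a trivial classifier that labels every posting as
legitimate. It relies only on the rounded figures quoted
above, so the counts are approximate, and the variable
names are illustrative.

```python
# Rounded figures from the dataset description; counts are approximate.
TOTAL_POSTINGS = 17_880
FRAUD_RATE = 0.048  # "about 4.8%" of postings are labeled fraudulent

approx_fraud = round(TOTAL_POSTINGS * FRAUD_RATE)
approx_legit = TOTAL_POSTINGS - approx_fraud

# A classifier that always predicts "legitimate" is right on every
# legitimate posting and wrong on every fraudulent one.
baseline_accuracy = approx_legit / TOTAL_POSTINGS
baseline_recall = 0.0  # it never flags a fraudulent posting

print(f"approx. fraudulent postings: {approx_fraud}")
print(f"always-legitimate accuracy: {baseline_accuracy:.1%}")
print(f"always-legitimate recall on fraud: {baseline_recall:.0%}")
```

A model with no fraud-detection ability at all would thus
score roughly 95% accuracy, which is why precision, recall,
and F1 score on the fraudulent class carry more information
for this task.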

3.2 Data Preprocessing

Preprocessing steps include cleaning text to remove HTML
and special characters, combining title and description,
extracting word count features, and creating binary
indicators for missing information. Data was split into
training and testing sets using stratified sampling.
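
The cleaning and feature steps listed above could look
roughly like the following sketch, which uses only the
Python standard library. The field names (`salary_range`,
`location`, `company_profile`), the exact cleaning rules,
and the helper names are assumptions made for illustration;
the paper does not specify them.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")           # HTML tags
NON_TEXT_RE = re.compile(r"[^a-z0-9\s]")  # anything but letters, digits, whitespace
SPACE_RE = re.compile(r"\s+")

# Hypothetical field names; adjust to the columns actually present.
OPTIONAL_FIELDS = ("salary_range", "location", "company_profile")


def clean_text(raw):
    """Strip HTML tags and special characters, lowercase, collapse spaces."""
    text = html.unescape(raw or "")
    text = TAG_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text.lower())
    return SPACE_RE.sub(" ", text).strip()


def build_features(posting):
    """Turn one posting (a dict of strings) into text plus numeric features."""
    title = posting.get("title") or ""
    description = posting.get("description") or ""
    combined = clean_text(f"{title} {description}")
    features = {"text": combined, "word_count": len(combined.split())}
    for field in OPTIONAL_FIELDS:
        # 1 if the field is absent or empty, else 0
        features[f"missing_{field}"] = int(not (posting.get(field) or "").strip())
    return features


example = {
    "title": "Data Entry Clerk",
    "description": "<p>Work from home &amp; earn $$$ fast!</p>",
    "location": "",
    "company_profile": None,
}
print(build_features(example))
```

The stratified split itself is not shown; with scikit-learn
it could be done by passing the label column to the
`stratify` argument of `train_test_split`, which keeps the
fraud rate similar in the training and testing sets.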

3.3 Models Used

• TF-IDF with Logistic Regression
TF-IDF vectors were extracted and combined with
numerical features. Logistic Regression was
applied with L2 regularization. It is fast and
interpretable.

• BERT (Simplified)
Twitter-RoBERTa was used to generate
embeddings. Enhanced TF-IDF with Random
Forest was also tested as a fallback. The
transformer embeddings are intended to capture
deeper semantic patterns.

• XGBoost
This model uses decision trees with gradient
boosting. It handles complex feature interactions,
and its built-in regularization helps limit
overfitting.


3.4 Evaluation Metrics

Models were evaluated using accuracy, precision, recall,
and F1 score. Training time and prediction time were
recorded. Five-fold cross-validation was used.

4. RESULTS AND ANALYSIS

4.1 Performance Comparison

The three models were evaluated based on several
performance metrics, including accuracy, precision, recall,
F1 score, training time, and prediction time. The TF-IDF
with Logistic Regression model achieved an accuracy of
96.34%, a precision of 85.23%, a recall of 78.92%, and an
F1 score of 81.95%. It was also the fastest model, with a
training time of 2.45 seconds and a prediction time of just
0.08 seconds. XGBoost delivered the best overall
performance, with an accuracy of 97.21%, a precision of
88.91%, a recall of 84.56%, and an F1 score of 86.68%.
However, it required more training time at 15.23 seconds
and had a prediction time of 0.21 seconds. The BERT-based
approach (or Enhanced TF-IDF model) showed an accuracy of
95.98%, a precision of 82.34%, a recall of 80.12%, and an
F1 score of 81.21%. This model had the highest
computational cost, with a training time of 45.67 seconds
and a prediction time of 1.34 seconds. These results
highlight the trade-offs between model accuracy and
efficiency for different deployment scenarios.

4.2 Key Findings

XGBoost had the highest performance overall. TF-IDF with
Logistic Regression had the lowest latency and is suitable
for real-time systems. BERT showed good results but
required more resources.

4.3 Feature Importance

Important indicators include text length, missing fields
like salary or location, keywords such as "urgent" and
"work from home", and incomplete company profiles.

4.4 Error Analysis

False positives included legitimate remote jobs. Some
sophisticated scams were false negatives. The class
imbalance was handled using stratified sampling and
robust metrics.

5. DISCUSSION

5.1 Practical Implications

XGBoost is recommended for batch fraud detection.
Logistic Regression is best for real-time filtering. A hybrid
system can combine speed and accuracy.

5.2 Limitations

This study is subject to certain limitations. The dataset
used originates from a single source, which may limit the
generalizability of the findings across different job
platforms or regions. Moreover, the analysis is restricted
to English-language postings, excluding fraud patterns
that may exist in non-English job markets. Additionally,
the manual feature engineering applied to traditional
machine learning models might not capture deeper,
complex patterns that automated or neural approaches
could potentially identify.

6. CONCLUSION

This study presents a detailed comparison of machine
learning models for detecting fake job postings. XGBoost
showed the best performance while Logistic Regression
was best for fast applications. The results can be directly
applied to improve trust and safety in online job portals.

7. CODE AND DATA AVAILABILITY

• Code implemented in Python using scikit-learn and
transformers

• Dataset available publicly on Kaggle

• Modular pipeline supports easy integration and testing

8. REFERENCES

[1] Kaggle, "Fake-job postings dataset", Available at:
https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction

[2] J. Friedman, "Greedy Function Approximation: A
Gradient Boosting Machine", The Annals of Statistics, vol.
29, no. 5, pp. 1189–1232, 2001.

[3] J. Devlin, M. Chang, K. Lee, and K. Toutanova,
"BERT: Pre-training of Deep Bidirectional Transformers
for Language Understanding", arXiv preprint,
arXiv:1810.04805, 2018.

[4] F. Pedregosa et al., "Scikit-learn: Machine Learning
in Python", Journal of Machine Learning Research, vol. 12,
pp. 2825–2830, 2011.

[5] Hugging Face, "Transformers Library", Available at:
https://huggingface.co/transformers

[6] T. Chen and C. Guestrin, "XGBoost: A Scalable Tree
Boosting System", Proceedings of the 22nd ACM SIGKDD
Conference, pp. 785–794, 2016.
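
The hybrid system suggested in Section 5.1 could be
arranged as a two-stage cascade: a fast model scores every
posting, and only postings it is unsure about are passed to
a slower, more accurate model. The sketch below is one
possible arrangement under that assumption. The thresholds,
function names, and stand-in scorers are hypothetical; the
paper does not describe a specific hybrid design.

```python
from typing import Callable, Tuple

Scorer = Callable[[str], float]  # returns an estimated fraud probability


def hybrid_score(
    text: str,
    fast_model: Scorer,
    accurate_model: Scorer,
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[float, str]:
    """Score with the fast model; escalate only uncertain postings.

    The thresholds `low` and `high` are illustrative, not tuned values.
    Returns (probability, name of the stage that decided).
    """
    p = fast_model(text)
    if p <= low or p >= high:
        return p, "fast"
    return accurate_model(text), "accurate"


# Stand-in scorers so the sketch runs on its own; in practice these would
# wrap the trained Logistic Regression and XGBoost models.
def toy_fast(text: str) -> float:
    lowered = text.lower()
    hits = sum(cue in lowered for cue in ("urgent", "fee", "work from home"))
    return min(1.0, 0.1 + 0.3 * hits)


def toy_accurate(text: str) -> float:
    return 0.9 if "fee" in text.lower() else 0.05


postings = [
    "Senior accountant, office based, full benefits",
    "Urgent opening for remote assistant",
    "Urgent! Work from home, pay a small fee to start",
]
for posting in postings:
    prob, stage = hybrid_score(posting, toy_fast, toy_accurate)
    print(f"{stage:>8}  {prob:.2f}  {posting}")
```

With this layout most postings would incur only the fast
model's latency, while the slower model is reserved for the
borderline cases where its extra accuracy matters most.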
