ANOMALY DETECTION IN
NETWORK TRAFFIC USING ML
Muhamad Nizam Azmi
Mataram University
WHY ANOMALY DETECTION?
Anomaly Detection Overview
Crucial task: identifying deviations from normal behavior
Applications: fraud detection, industrial maintenance, cybersecurity
Challenges
High-dimensional data
Large-scale distributed systems
Vast amounts of data
Advancements in Machine Learning
Powerful tools for pattern recognition
Effective in complex and high-dimensional
datasets
NEXT: PROJECT
Distinguishing between normal and anomalous
behaviors
Focus: Unsupervised Learning Algorithms
No need for labeled attack data
Ideal for real-world applications with scarce
labeled data
3 TYPES OF ANOMALY
Point anomaly
Contextual anomaly
Collective anomaly
Project focus: Point anomaly detection
COMPARISON OF VARIOUS MACHINE LEARNING ALGORITHMS
Aim: Identify the most effective techniques for different anomaly detection tasks
Provide insights and guidelines for future applications in cybersecurity, industrial monitoring, and beyond
AdaBoost, Naive Bayes, Gradient Boosting,
Logistic Regression, K-Nearest Neighbors (KNN), SVM,
Random Forest, Decision Tree, Neural Network (NN)
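The comparison above can be sketched in scikit-learn. This is a minimal benchmark on synthetic data with default hyperparameters; the project's actual dataset, preprocessing, and settings may differ.

```python
# Sketch: fit and score the nine compared classifiers on synthetic data.
# Hyperparameters are library defaults, assumed for illustration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network": MLPClassifier(max_iter=500, random_state=0),
}

# Test-set accuracy per model, highest first.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} accuracy = {acc:.3f}")
```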
WHY CHOOSE THESE ALGORITHMS?
AdaBoost: chosen for its ability to enhance the performance of simple models, making it effective in scenarios where data may have a lot of noise or complex patterns.
Naive Bayes: selected for its simplicity, speed, and effectiveness in handling large datasets, making it suitable for real-time anomaly detection tasks.
Gradient Boosting: chosen for its high predictive accuracy and ability to handle a variety of data types and distributions, which is crucial for detecting subtle anomalies.
Random Forest: chosen for its high accuracy, ability to handle large datasets with higher dimensionality, and robustness against overfitting.
Logistic Regression: selected for its interpretability and effectiveness in binary classification problems, making it useful for understanding and explaining anomaly detection results.
K-Nearest Neighbors (KNN): chosen for its simplicity and ability to perform well with small to medium-sized datasets, particularly when the data is not linearly separable.
SVM: selected for its robustness in high-dimensional spaces and its effectiveness in cases where the anomaly classes are not linearly separable.
Neural Network (NN): chosen for its flexibility and ability to model complex, non-linear relationships in data, which is essential for accurately detecting anomalies in high-dimensional datasets.
These algorithms collectively provide a robust framework for identifying anomalies, leveraging their individual strengths to enhance the accuracy and reliability of the anomaly detection system.
PREVIOUS RESEARCH
Evaluation of Distributed ML Algorithms for Anomaly Detection
Astekin, M., Zengin, H., and Sözer, H. (2018) compared distributed machine
learning algorithms for system log analysis. They focused on scalability
and efficiency, highlighting the strengths of certain algorithms in handling
large datasets.
Metaheuristics and Machine Learning for Anomaly Detection in Big Data
Cavallaro, C., Cutello, V., Pavone, M., and Zito, F. (2023) reviewed the use of
metaheuristics combined with machine learning for anomaly detection.
Their study showed improved detection accuracy and adaptability to
different datasets.
Industrial Anomaly Detection with Neural Network Architectures
Siegel, B. (2020) compared neural network architectures for detecting
industrial anomalies. The study focused on real-time detection capabilities
and accuracy, discussing implementation challenges and solutions.
Additional related studies are listed in the references.
DATASET INFORMATION
Purpose: The KDD Cup 1999 dataset is used to build a predictive model to distinguish
between "bad" connections (intrusions or attacks) and "good" (normal) connections in
a computer network. It aims to protect the network from unauthorized users, including
potential insiders.
Background: The dataset is based on the 1998 DARPA Intrusion Detection Evaluation
Program, managed by MIT Lincoln Labs. The program's objective was to evaluate
research in intrusion detection using a standard set of data that includes various
intrusions simulated in a military network environment.
Data Collection:
Environment: Simulated a typical U.S. Air Force LAN with multiple simulated attacks.
Duration: Data was collected over nine weeks (seven weeks for training, two weeks for testing).
Data Size:
Training data: 4 gigabytes of compressed binary TCP dump data, resulting in about five million connection
records.
Test data: Around two million connection records.
Connection Records: Each connection is a sequence of TCP packets between a source IP address and a target IP
address, labeled as either normal or a specific type of attack.
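A connection record is one labeled row per TCP session. As a rough sketch, records in this style can be loaded with pandas; the column subset and the two sample rows below are illustrative, not taken from the actual dataset.

```python
# Sketch: loading KDD-style connection records with pandas.
# Only 6 of the dataset's columns are shown; the sample rows are made up.
import io

import pandas as pd

cols = ["duration", "protocol_type", "service", "src_bytes", "dst_bytes", "label"]
raw = io.StringIO(
    "0,tcp,http,215,45076,normal.\n"
    "0,icmp,ecr_i,1032,0,smurf.\n"
)
df = pd.read_csv(raw, names=cols)
df["label"] = df["label"].str.rstrip(".")  # KDD labels carry a trailing dot
print(df)
```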
TYPES OF ATTACKS
DOS: Denial of Service, e.g., SYN flood
R2L: Remote to Local, e.g., guessing passwords
U2R: User to Root, e.g., buffer overflow attacks
Probing: e.g., port scanning
RESULT
ICMP: The most frequent protocol
type with over 250,000
occurrences.
TCP: The second most common
protocol type with over 150,000
occurrences.
UDP: The least frequent protocol
type with fewer than 50,000
occurrences.
RESULT
logged_in flag:
0: Not logged in
1: Successfully logged in
The number of records without a login (0) is significantly higher than those with a successful login (1).
RESULT
Categories:
dos: Denial of Service attacks -
391,458 instances
normal: Normal traffic (no
attack) - 97,278 instances
probe: Surveillance and
probing attacks - 4,107
instances
r2l: Remote to local attacks -
1,126 instances
u2r: User to root attacks - 52
instances
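Counts like these come from mapping each attack label to one of the five categories and tallying. A minimal sketch, assuming a hand-written mapping (only a few of the dataset's labels are shown here):

```python
# Sketch: group KDD attack labels into the five categories above.
# The mapping and sample labels are a small illustrative subset.
import pandas as pd

category_of = {
    "normal": "normal",
    "smurf": "dos", "neptune": "dos",
    "ipsweep": "probe", "portsweep": "probe",
    "guess_passwd": "r2l",
    "buffer_overflow": "u2r",
}
labels = pd.Series(["smurf", "normal", "neptune", "ipsweep", "guess_passwd"])
counts = labels.map(category_of).value_counts()
print(counts)
```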
RESULT
num_root: Removed due to high correlation with
num_compromised (Correlation = 0.9938).
srv_serror_rate: Removed due to high correlation
with serror_rate (Correlation = 0.9984).
srv_rerror_rate: Removed due to high correlation
with rerror_rate (Correlation = 0.9947).
dst_host_srv_serror_rate: Removed due to high
correlation with srv_serror_rate (Correlation =
0.9993).
dst_host_serror_rate: Removed due to high
correlation with rerror_rate (Correlation = 0.9870).
dst_host_rerror_rate: Removed due to high
correlation with srv_rerror_rate (Correlation =
0.9822).
dst_host_srv_rerror_rate: Removed due to high
correlation with rerror_rate (Correlation = 0.9852).
dst_host_same_srv_rate: Removed due to high
correlation with dst_host_srv_count (Correlation =
0.9737).
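The removals above follow a standard recipe: from each pair of features whose absolute correlation exceeds a threshold, drop one. A minimal sketch on synthetic data (the 0.97 threshold matches the correlations reported above; column names are illustrative):

```python
# Sketch: drop one feature from each highly correlated pair (|r| > 0.97).
# Synthetic data; "srv_serror_rate" is built as a near-copy of "serror_rate".
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=500)
df = pd.DataFrame({
    "serror_rate": a,
    "srv_serror_rate": a + rng.normal(scale=0.01, size=500),  # near-duplicate
    "src_bytes": rng.normal(size=500),
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.97).any()]
reduced = df.drop(columns=to_drop)
print("dropped:", to_drop)
```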
DECISION TREE RESULT
RANDOM FOREST RESULT
SVM RESULT
KNN RESULT
LOGISTIC REGRESSION RESULT
NEURAL NETWORK RESULT
GRADIENT BOOSTING RESULT
NAIVE BAYES RESULT
ADABOOST RESULT
ROC CURVE & FEATURE IMPORTANCES RESULT
DOS (Class 0): AUC (Area Under the Curve) = 1.00
Normal (Class 1): AUC = 1.00
Probe (Class 2): AUC = 0.99
R2L (Class 3): AUC = 0.97
U2R (Class 4): AUC = 0.82
High Performance:
Classes 0 and 1 have perfect AUC scores of 1.00,
indicating essentially flawless separation of
these classes from the rest.
Class 2 also demonstrates very high
performance with an AUC of 0.99.
Moderate Performance:
Class 3 has an AUC of 0.97, showing strong
performance with minimal false positives.
Class 4 has a lower AUC of 0.82, indicating
room for improvement in distinguishing this
class from others.
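Per-class AUC values like these come from treating each class one-vs-rest. A minimal sketch with scikit-learn, using synthetic 3-class data in place of the 5 KDD categories and a Random Forest as a stand-in model:

```python
# Sketch: per-class (one-vs-rest) ROC AUC. Synthetic data, illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)                      # shape (n, 3)
y_bin = label_binarize(y_te, classes=clf.classes_)   # one 0/1 column per class

aucs = {}
for i, cls in enumerate(clf.classes_):
    aucs[cls] = roc_auc_score(y_bin[:, i], proba[:, i])
    print(f"class {cls}: AUC = {aucs[cls]:.2f}")
```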
ROC CURVE & FEATURE IMPORTANCES RESULT
Dominant Feature: The srv_count feature
overwhelmingly dominates the feature
importance, indicating it has a critical
impact on the model's performance.
Other Features: Although other features
contribute less, they still hold importance for
the model, affecting specific aspects of its
predictions.
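Importance scores like these can be read directly from a fitted tree ensemble. A minimal sketch (synthetic data; the feature names are illustrative, and the target is built so the first feature dominates, mimicking the srv_count pattern above):

```python
# Sketch: rank features by importance from a fitted Random Forest.
# Synthetic data; "srv_count" is constructed to be the decisive feature.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)          # only the first feature matters
names = ["srv_count", "src_bytes", "duration"]

clf = RandomForestClassifier(random_state=0).fit(X, y)
ranked = sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])
for name, imp in ranked:
    print(f"{name:10s} {imp:.3f}")
```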
CONCLUSION
Effectiveness of Machine Learning Algorithms:
This project successfully demonstrated that various machine learning
algorithms such as AdaBoost, Naive Bayes, Gradient Boosting, Logistic
Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM),
Random Forest, Decision Tree, and Neural Networks (NN) are capable of
detecting anomalies in large and complex datasets.
Recommendations for Further Development:
For future research, it is recommended to apply advanced data
augmentation techniques and feature engineering to further enhance
model performance. Additionally, combining multiple algorithms or using
ensemble approaches can help improve the accuracy and robustness of
anomaly detection.
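The ensemble direction suggested above can be sketched with scikit-learn's soft-voting combiner; this is one possible combination of three of the compared classifiers on synthetic data, not the project's implementation.

```python
# Sketch: soft-voting ensemble of three of the compared classifiers.
# Synthetic data and default hyperparameters, for illustration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=800, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average the members' predicted probabilities
)
acc = ensemble.fit(X_tr, y_tr).score(X_te, y_te)
print(f"ensemble accuracy = {acc:.3f}")
```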
REFERENCES
Astekin, M., Zengin, H., & Sözer, H. (2018). Evaluation of Distributed Machine Learning Algorithms for Anomaly Detection from
Large-Scale System Logs: A Case Study. Proceedings of the IEEE International Conference on Big Data, 862-1967. Get In Touch
With Us
Cavallaro, C., Cutello, V., Pavone, M., & Zito, F. (2023). Discovering anomalies in big data: a review focused on the application
of metaheuristics and machine learning techniques. Frontiers in Big Data, 6. Get In Touch With Us
Shabat, G., Segev, D., & Averbuch, A. (2017). Uncovering Unknown Unknowns in Financial Services Big Data by Unsupervised
Methodologies: Present and Future trends. Proceedings of the Machine Learning Research, 71, 8-19. Get In Touch With Us
Siegel, B. (2020). Industrial Anomaly Detection: A Comparison of Unsupervised Neural Network Architectures. IEEE Sensors
Journal, 4(8), 1-4. Get In Touch With Us
Zoppi, T., Ceccarelli, A., & Bondavalli, A. (2020). Into the Unknown: Unsupervised Machine Learning Algorithms for Anomaly-
Based Intrusion Detection. Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN),
50200, 44. Get In Touch With Us
THANK YOU FOR
YOUR ATTENTION