Machine learning for anomaly detection
December 2024
Agenda
1. Understanding techniques, applications, and best practices
2. Case studies
3. Points to remember
4. Resources and further reading
5. Questions and discussion
01
UNDERSTANDING TECHNIQUES, APPLICATIONS, AND BEST PRACTICES
Artificial Intelligence vs Machine Learning
AI vs ML?
Artificial intelligence (AI) is a broad concept that describes a machine's ability to mimic human intelligence.
Machine learning (ML) is a specific application of AI that teaches machines to perform tasks by learning from data.
WHAT IS MACHINE LEARNING?
Machine Learning Overview
• Machine learning is a subset of AI that enables systems to learn and improve from experience without explicit programming.
• Key focus: patterns, predictions, and decision-making.
Process
WHAT IS ANOMALY DETECTION?
Anomaly detection refers to identifying patterns in data that do not conform to expected behavior.
It is significant in applications like fraud detection, network security, and predictive maintenance.
It helps mitigate risks and improve decision-making processes.
Anomaly detection identifies suspicious activity that falls outside your established normal patterns of behavior. An anomaly detection solution can protect your system in real time from incidents that could result in significant financial losses, data breaches, and other harmful events.
TYPES OF ANOMALIES
Point Anomalies
Data points significantly different from the majority (e.g., a sudden spike in network traffic).
Contextual Anomalies
Unusual only within a specific context (e.g., high temperature during winter).
Collective Anomalies
A collection of related data points that deviate as a group (e.g., a distributed denial-of-service attack).
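The difference between point and contextual anomalies can be sketched with a simple z-score rule. This is a minimal illustration, not a recommended detector: the temperature readings, the season labels, and the threshold of 2.0 are all made-up assumptions. A global comparison catches only the extreme reading, while a per-season comparison also catches the reading that is unusual only for winter.

```python
from statistics import mean, stdev

# Hypothetical (season, temperature in deg C) readings; all values are illustrative.
readings = (
    [("summer", t) for t in (29, 31, 30, 32, 28, 30, 31, 29)]
    + [("winter", t) for t in (2, 4, 3, 1, 5, 3, 2, 4)]
    + [("winter", 24)]   # plausible in summer, unusual in winter
    + [("summer", 75)]   # far from every other reading
)

def zscore_flags(values, threshold=2.0):
    """Return True for each value more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) / sigma > threshold for v in values]

# Point anomalies: compare each reading against ALL readings.
global_flags = zscore_flags([t for _, t in readings])
point_anomalies = [r for r, flagged in zip(readings, global_flags) if flagged]

# Contextual anomalies: compare each reading only against its own season.
contextual_anomalies = []
for season in ("summer", "winter"):
    group = [r for r in readings if r[0] == season]
    flags = zscore_flags([t for _, t in group])
    contextual_anomalies += [r for r, flagged in zip(group, flags) if flagged]

print("point:", point_anomalies)
print("contextual:", contextual_anomalies)
```

The winter reading of 24 sits close to the overall average, so it only stands out once the season is taken into account.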
SUPERVISED ANOMALY DETECTION
• Supervised machine learning builds a predictive model using a labeled training set with normal and anomalous samples.
• The most common supervised methods include Bayesian networks, k-nearest neighbors, decision trees, supervised neural networks, and SVMs.
• The advantage of supervised models is that they may offer a higher rate of detection than unsupervised techniques.
UNSUPERVISED ANOMALY DETECTION
• Unsupervised methods do not demand manual labeling of training data. Instead, they operate on the presumption that anomalies are rare and differ from the bulk of the data.
• The most popular unsupervised anomaly detection algorithms include autoencoders, K-means, GMMs, hypothesis-test-based analysis, and PCA.
• These techniques thus assume that collections of frequent, similar instances are normal and flag infrequent data groups as malicious.
SEMI-SUPERVISED ANOMALY DETECTION
• Semi-supervised anomaly detection may refer to building a model of normal data from a data set that contains both normal and anomalous samples but is unlabeled.
• The most common semi-supervised methods include linear regression, outlier detection, and graph-based methods.
• A semi-supervised anomaly detection algorithm might also work with a data set that is only partially labeled. It then builds a classifier on just the labeled subset and uses that model to predict the status of the remaining data.
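The partially labeled case can be sketched as follows. This is a minimal illustration under stated assumptions: the 2-D samples and labels are invented, and a small k-nearest-neighbors vote stands in for whatever classifier would actually be chosen. The classifier is built only on the labeled subset and then predicts the status of the unlabeled samples.

```python
from collections import Counter
from math import dist

# Hypothetical 2-D samples; the label is "normal", "anomaly", or None (unlabeled).
data = [
    ((1.0, 1.1), "normal"), ((0.9, 1.0), "normal"),
    ((1.2, 0.8), "normal"), ((1.1, 1.2), "normal"),
    ((5.0, 5.2), "anomaly"), ((4.8, 5.1), "anomaly"), ((5.3, 4.9), "anomaly"),
    ((1.05, 0.95), None), ((0.8, 1.15), None), ((5.1, 5.0), None), ((1.3, 1.0), None),
]

# Split into the labeled subset (training data) and the rest (to be predicted).
labeled = [(x, y) for x, y in data if y is not None]
unlabeled = [x for x, y in data if y is None]

def knn_predict(point, train, k=3):
    """Majority vote among the k labeled samples closest to `point`."""
    nearest = sorted(train, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

predictions = {x: knn_predict(x, labeled) for x in unlabeled}
for point, label in predictions.items():
    print(point, "->", label)
```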
WHY USE MACHINE LEARNING FOR ANOMALY DETECTION?
Advantages of ML
• Handles complex and large datasets effectively.
• Learns from data to adapt to new patterns dynamically.
• Can provide higher accuracy than traditional statistical methods on complex data.
Challenges
• Data imbalance: anomalies are rare compared to normal data.
• Dynamic and non-stationary data: data evolves over time, requiring adaptive models.
• High dimensionality: complex data structures make anomalies harder to detect.
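The non-stationarity challenge can be illustrated with a small sketch. It assumes a synthetic metric with a slow upward drift and one injected spike; the window size, threshold, and function names are hypothetical choices for illustration only. A baseline frozen on early data starts raising false alarms as the data drifts, while a rolling baseline keeps adapting.

```python
from collections import deque
from statistics import mean, stdev

def rolling_zscore_anomalies(stream, window=20, threshold=3.0):
    """Flag indices whose value is far from the mean of the last `window` accepted values."""
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(stream):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(i)
                continue  # keep the anomaly out of the baseline
        history.append(value)
    return flagged

def static_zscore_anomalies(stream, window=20, threshold=3.0):
    """Same rule, but the baseline is frozen after the first `window` values."""
    mu, sigma = mean(stream[:window]), stdev(stream[:window])
    return [i for i, v in enumerate(stream[window:], start=window)
            if abs(v - mu) / sigma > threshold]

# Synthetic metric: slow upward drift, a small repeating wobble, one spike at index 60.
stream = [100 + 0.2 * i + (i % 5 - 2) for i in range(100)]
stream[60] += 40

adaptive = rolling_zscore_anomalies(stream)
static = static_zscore_anomalies(stream)
print("adaptive baseline flags:", adaptive)
print("static baseline flags:", len(static), "points")
```

The rolling version is only one simple way to adapt; periodic retraining of a learned model follows the same idea.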
COMMON ALGORITHMS IN ANOMALY DETECTION
Algorithm Types
• Supervised: Random Forest, SVM for binary classification.
• Unsupervised: PCA, k-Means, Isolation Forest for detecting patterns.
• Deep Learning: Autoencoders, RNNs for complex data types like time series.
Anomaly Detection Algorithm Techniques To Know
• Isolation Forest
• Local Outlier Factor (LOF)
• Robust Covariance
• One-class support vector machine (SVM)
• One-class SVM with stochastic gradient descent (SGD)
• K-means clustering
• Long short-term memory (LSTM)
• Angle-based outlier detection
Techniques
Isolation Forest
Isolation Forest isolates anomalies by creating random partitions in the data. Anomalies are isolated faster than normal points due to their distinct properties.
Local Outlier Factor
LOF identifies anomalies by comparing the local density of a point to its neighbors. Points with significantly lower density than their neighbors are flagged as outliers.
Robust Covariance
Robust covariance is a statistical method that computes the covariance matrix to identify data points deviating from the multivariate distribution.
One-Class Support Vector Machine (SVM)
A One-Class SVM creates a boundary around normal data points in a high-dimensional space, classifying points outside the boundary as anomalies.
One-Class SVM with SGD
This method optimizes One-Class SVM using Stochastic Gradient Descent to handle large-scale datasets efficiently.
K-Means Clustering
K-Means groups data into clusters, and points far from any cluster center are considered anomalies.
Long Short-Term Memory (LSTM)
LSTMs are a type of recurrent neural network that learns temporal dependencies in sequential data. They identify anomalies by analyzing deviations from learned patterns.
Angle-Based Outlier Detection
This method calculates the angle between points in high-dimensional space to detect anomalies. Anomalies are identified based on deviations from expected angular distributions.
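Four of the techniques above have implementations in scikit-learn, and a side-by-side run can make their behavior concrete. The sketch below assumes NumPy and scikit-learn are installed; the synthetic data (a Gaussian cloud plus eight planted points on a circle of radius 8), the contamination value of 0.05, and the variable names are illustrative assumptions, not settings from this deck. `EllipticEnvelope` is scikit-learn's robust-covariance detector.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Synthetic data: 200 normal points plus 8 planted outliers on a circle of radius 8.
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
angles = np.linspace(0.0, 2.0 * np.pi, num=8, endpoint=False)
outliers = 8.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([inliers, outliers])
is_planted = np.r_[np.zeros(200, dtype=bool), np.ones(8, dtype=bool)]

contamination = 0.05  # assumed share of anomalies; a tuning choice, not a known truth
detectors = {
    "Isolation Forest": IsolationForest(contamination=contamination, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "Robust Covariance": EllipticEnvelope(contamination=contamination, random_state=0),
    "One-Class SVM": OneClassSVM(nu=contamination, kernel="rbf", gamma="scale"),
}

results = {}
for name, model in detectors.items():
    labels = model.fit_predict(X)  # -1 marks an anomaly, 1 marks a normal point
    results[name] = labels
    found = int(np.sum((labels == -1) & is_planted))
    print(f"{name}: flagged {int(np.sum(labels == -1))} points, "
          f"{found} of 8 planted outliers")
```

All four estimators share the same `fit_predict` convention, which makes it straightforward to swap one technique for another while comparing results on the same data.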
[Figures: Isolation Forest, Local Outlier Factor (LOF), One-Class Support Vector Machine (SVM), Long Short-Term Memory (LSTM), K-Means Clustering]
EXAMPLES OF ALGORITHM APPLICATIONS
Isolation Forest Example
Detecting fraudulent transactions in credit card data using an Isolation Forest algorithm.
Local Outlier Factor (LOF) Example
Identifying unusual behavior in user activity logs for cybersecurity.
Robust Covariance Example
Detecting unusual patterns in multivariate sensor data in manufacturing processes.
One-Class Support Vector Machine (SVM) Example
Detecting abnormal network traffic in IT infrastructure.
One-Class SVM with SGD Example
Detecting outliers in massive customer behavior datasets in e-commerce.
K-Means Clustering Example
Identifying rare diseases in patient medical records by analyzing cluster distances.
Long Short-Term Memory (LSTM) Example
Detecting anomalies in time-series data, such as server logs or stock market fluctuations.
Angle-Based Outlier Detection Example
Detecting outliers in large, high-dimensional datasets like gene expression data.
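The cluster-distance idea behind the K-Means example can be sketched on synthetic data. The sketch assumes NumPy and scikit-learn; the two Gaussian clusters, the three planted points, and the "mean plus three standard deviations" threshold are illustrative assumptions rather than a rule taken from this deck. Each point is scored by its distance to the nearest cluster center, and unusually distant points are flagged.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: two tight clusters plus three planted far-away points (rows 200-202).
rng = np.random.default_rng(42)
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2))
cluster_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(100, 2))
planted = np.array([[10.0, -3.0], [-4.0, 6.0], [2.5, -4.0]])
X = np.vstack([cluster_a, cluster_b, planted])

# Fit K-Means, then score every point by its distance to the nearest center.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
distances = kmeans.transform(X).min(axis=1)

# Illustrative threshold: flag points whose distance is unusually large.
threshold = distances.mean() + 3.0 * distances.std()
anomaly_idx = np.flatnonzero(distances > threshold)
print("flagged rows:", anomaly_idx.tolist())
```

The number of clusters and the threshold both have to be chosen for the data at hand; a percentile of the distance distribution is a common alternative to the rule used here.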
INFERENCE
Key Inference
Anomaly detection techniques are vital for uncovering irregularities in various domains.
Choosing the right algorithm depends on the:
1. Dataset
2. Scale
3. Application requirements
PRACTICAL WORKFLOW FOR ANOMALY DETECTION
Workflow
Step 1: Data preprocessing. Handle missing data, outliers, and normalization.
Step 2: Algorithm selection based on the data and problem type.
Step 3: Model evaluation using key metrics like F1-score.
Step 4: Deploy the model and monitor its performance.
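The four steps can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions: the data is synthetic, Isolation Forest is just one possible choice for Step 2, the contamination value is a guess, and labels are available only because the anomalies were planted. In practice, labeled anomalies for evaluation are often scarce.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.impute import SimpleImputer
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic data (illustrative): 300 normal rows, 15 scattered anomalies, 3 features.
normal = rng.normal(0.0, 1.0, size=(300, 3))
anomalies = rng.uniform(5.0, 9.0, size=(15, 3)) * rng.choice([-1.0, 1.0], size=(15, 3))
X = np.vstack([normal, anomalies])
y_true = np.r_[np.zeros(300, dtype=int), np.ones(15, dtype=int)]  # 1 = anomaly
X[rng.random(X.shape) < 0.03] = np.nan  # simulate missing values

# Steps 1 and 2: preprocessing and the chosen algorithm in one pipeline.
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),   # fill missing values
    StandardScaler(),                   # normalize feature scales
    IsolationForest(contamination=0.05, random_state=0),
)

# Step 3: evaluate against the known labels (-1 from the model means anomaly).
y_pred = (pipeline.fit_predict(X) == -1).astype(int)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")

# Step 4: one simple quantity to monitor after deployment is the share of flagged rows.
flagged_rate = y_pred.mean()
print(f"flagged rate: {flagged_rate:.1%}")
```

F1 is reported alongside precision and recall because anomalies are rare, so plain accuracy would look high even for a model that flags nothing.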
02
CASE STUDIES
03 Resources and Further Reading
1. Books: 'Anomaly Detection Principles and Algorithms' by Mehrotra, Mohan, and Huang.
2. Courses: 'Machine Learning for All', AI/ML at IIT
3. Datasets: UCI Machine Learning Repository
Q&A
Thank you