[go: up one dir, main page]

0% found this document useful (0 votes)
20 views12 pages

Graph Mining For APT

Uploaded by

Xitao Wen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views12 pages

Graph Mining For APT

Uploaded by

Xitao Wen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 12

Graph Mining for APT

Detection
Xitao Wen

1
Problem Scope
• Inputs
– Multi-granularity Heterogeneous
Information Network (HIN)
– Accurate provenance tagging
• Objectives
– Causality Identification
– Connectivity Anomaly Detection
– Behavior Anomaly Detection

2
Heterogeneous Information
Network
• Directed graphs with entities and events
representing temporal information flows
• Features
– Heterogeneity
• Entities: processes/threads, hosts, routers…
Anything else?
• Events: run-on, send-to, read…
– Multi-granularity
• Any examples for each granularity?
– Large scale and dynamic
• Any estimation on scale and velocity?
3
Topic 1: Causality Identification
• Definition
– Given an (anomalous) event, back-propagate in
HIN and find the events that may lead to this
event
• Input
– HIN
– Anomalous event
– Searching parameters (Optional)
• Output
– Subgraph of events that the anomalous event
depends on
4
Causality Identification
• Basic technique
– Graph searching with dynamic pruning heuristics
• Possible parameters/heuristics
– Search depth
– Branch factor
– Granularity
• Challenges
– Combinatorial explosion of search space
– Quality of search results
– Modeling domain-specific pruning parameters

5
Causality Identification
• Clarification questions
– How to determine two events are
related?
– How is HIN stored? What are the query
complexity?

6
Topic 2: Connectivity Anomaly
Detection
• Definition
– An anomalous connection or a missing
connection between two entities
• Input
– Clean information flows (for training)
– Test information flows
• Output
– Anomalous connections or missing
connections in test set
7
Connectivity Anomaly Detection

• Basic Techniques
– Design statistical models to conduct
likelihood estimation
– Unsupervised learning to spot
anomalous connections through analysis
of clusters
– (They are entirely beyond my
knowledge base…)

8
Topic 3: Behavior Anomaly
Detection
• Definition
– Identify frequent behavior patterns (i.e.
subgraphs) and find instances that are
similar but deviate slightly from the pattern
• Input
– A large collection individual activity
subgraphs
• Output
– Behavior patterns
– Anomalous instances
9
Behavior Anomaly Detection
• Basic Technique
– Frequent pattern mining in graphs (a decades-long research topic)
• Challenges
– Scalability
– Deal with noise in data
– Differentiating benign deviations and malicious deviations
– Quantifying the severity of anomalies
• References cited in the proposal
– William Eberle and Lawrence B. Holder. Anomaly detection in data
represented as graphs. Intell. Data Anal., 07.
– William Eberle and Lawrence B. Holder. Discovering structural
anomalies in graph-based data. In MGCS ’07
– William Eberle and Lawrence B. Holder. Applying graph-based
anomaly detection approaches to the discovery of insider threats.
ISI ’09

10
More Previous Work on Frequent
Subgraph Pattern Mining
• Approaches
– Apriori-based approach:
• AGM by Inokuchi et al.
• FSG by Kuramochi and Karypis,
• an edge-disjoint path-join algorithm by Vanetik et al.
– Pattern-Growth Approach
• gSpan by Yan and Han
• MoFa by Borgelt and Berthold
• FFSM by Huan et al.
• SPIN by Huan et al.
• Gaston by Nijssen and Kok
• More scalable approaches
– Significant subgraph
– Representative subgraph
– Dense subgraph
• Fundamental challenge
– Combinatorial explosion: search space grows exponentially with the
pattern size

11
12

You might also like