
DEEP WALK ALGORITHM

How Graphs Represent Relationships (Social Networks & Web Pages)


1. Social Networks as Graphs
In a social network (e.g., Facebook, Twitter, LinkedIn), users are nodes, and their
connections (friendships, follows) are edges.
Example: Simple Social Network Graph
Consider four people: Alice, Bob, Charlie, and David
 Alice is friends with Bob and Charlie
 Bob is also friends with David
 Charlie and David are not directly connected
We can represent this as an undirected graph:
Alice ---- Bob
  |          |
Charlie    David
🔹 Nodes = {Alice, Bob, Charlie, David}
🔹 Edges = {(Alice, Bob), (Alice, Charlie), (Bob, David)}
👉 Use Case: Find "friend recommendations" using graph embeddings (DeepWalk).
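As a minimal sketch, this graph can be stored as a plain adjacency list in Python (the dict-of-sets representation and variable name are our own illustrative choices, not part of the original example):

# The social graph above as an adjacency list (dict of neighbor sets).
social_graph = {
    "Alice":   {"Bob", "Charlie"},
    "Bob":     {"Alice", "David"},
    "Charlie": {"Alice"},
    "David":   {"Bob"},
}

# Undirected: every edge appears in both endpoints' neighbor sets.
assert "Alice" in social_graph["Bob"] and "Bob" in social_graph["Alice"]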

2. Web Pages as Graphs (Google’s PageRank Algorithm)


The internet is a directed graph where:
 Web pages (URLs) are nodes
 Hyperlinks from one page to another are directed edges
Example: Small Web Graph
Page A → Page B
  ↓         ↓
Page C → Page D
🔹 Nodes = {A, B, C, D} (Webpages)
🔹 Edges (Hyperlinks) = {(A → B), (A → C), (B → D), (C → D)}
👉 Use Case: Google uses PageRank to rank pages based on incoming links.
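As a sketch of this idea, the small web graph above can be built and ranked with the networkx library (an assumed dependency; treat the exact scores as illustrative, since real PageRank runs on billions of pages):

import networkx as nx  # assumed dependency

# The small web graph from above, as a directed graph.
web = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])

# PageRank scores pages by their incoming links (higher = more "important").
scores = nx.pagerank(web)
print(scores)  # D should rank highest: it receives links from both B and C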

What is a Random Walk?


A random walk on a graph is a path where each step is chosen randomly from the neighbors
of the current node. It's a way of exploring the graph structure by following edges in a
probabilistic manner.
🔹 Example:
 Start at a node.
 Move to a random neighbor.
 Repeat for a fixed number of steps.
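A minimal sketch of such a walk in Python (the adjacency-list representation and function name are our own, for illustration):

import random

def random_walk(graph, start, walk_length):
    """Follow randomly chosen edges from `start` for `walk_length` steps."""
    walk = [start]
    for _ in range(walk_length):
        neighbors = list(graph[walk[-1]])
        if not neighbors:          # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk

# Example on the social graph from earlier:
graph = {"Alice": ["Bob", "Charlie"], "Bob": ["Alice", "David"],
         "Charlie": ["Alice"], "David": ["Bob"]}
print(random_walk(graph, "Alice", 4))  # e.g. ['Alice', 'Bob', 'David', 'Bob', 'Alice']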
Why Do Random Walks Capture Graph Structure?
Random walks encode structural information because they naturally reflect local and
global graph properties.
1. Proximity Information
 If two nodes are frequently visited together, they are likely structurally close.
2. Connectivity Patterns
 The probability of reaching a node depends on the graph topology.
 Hubs (high-degree nodes) are visited more often, capturing influential nodes.
3. Structural Similarity
 Nodes with similar neighborhoods will have similar random walk distributions.
 This helps in clustering and classification.
4. Adaptability
 Random walks can be biased to explore specific structures:
o DFS-like (depth-first) walks explore deep hierarchical structures.
o BFS-like (breadth-first) walks capture local communities.
What Are Embeddings?
Embeddings are dense vector representations of objects (e.g., words, nodes, or images) in a
lower-dimensional space, where similar objects have similar numerical representations.
Why Are Embeddings Useful?
 Dimensionality Reduction: Converts high-dimensional data (graphs, text, etc.) into
compact numerical vectors.
 Similarity Measurement: Nodes with similar roles in a graph have similar
embeddings.
 Machine Learning Input: Helps apply ML techniques like clustering and
classification on graphs.

Example: Word Embeddings vs. Node Embeddings


1. Word Embeddings (Word2Vec)
o "King" → [0.12, -0.34, 0.56, ...]
o "Queen" → [0.15, -0.32, 0.58, ...]
o Since "King" and "Queen" are related, their embeddings are close.
2. Node Embeddings (DeepWalk)
o Graph Example: Social Network (People as nodes, friendships as edges)
o "Alice" → [0.2, 0.4, -0.1, ...]
o "Bob" → [0.22, 0.38, -0.08, ...]
o Since Alice and Bob have similar connections, their embeddings are similar.
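Similarity between embeddings is usually measured with cosine similarity. A quick sketch using the illustrative vectors above, truncated to three dimensions (numpy is an assumed dependency):

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

alice = np.array([0.2, 0.4, -0.1])
bob   = np.array([0.22, 0.38, -0.08])
print(cosine_similarity(alice, bob))  # close to 1.0, i.e. very similar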

Word2Vec
Word2Vec is a popular algorithm used in Natural Language Processing (NLP) to learn
vector representations (or embeddings) of words in a continuous vector space. The key idea
behind Word2Vec is that words that appear in similar contexts should have similar
representations. There are two primary models in Word2Vec:
1. Skip-Gram
2. Continuous Bag of Words (CBOW).

Skip-Gram Model (Word2Vec)


Goal:
 The Skip-Gram model tries to predict the context words given a target word.
 Example: Given the word "king", the model will try to predict its surrounding words,
like "royal", "queen", "throne", etc., based on the training data.

Example:
Let’s take a sentence:
"The cat sat on the mat."
Step-by-Step Breakdown
 Sentence:
"The cat sat on the mat."
 Context Window:
We define a context window size. For simplicity, let’s use a window size of 1,
meaning we only look at the one word before and one word after the target word.
 Step 1: Generate Training Data
We’ll create pairs of (target word, context word) from the sentence.
For each word in the sentence, we take it as a target word and consider the words within the
context window as context words.
o For the word "cat":
 Target word = cat
 Context words = "The" (left), "sat" (right)
o For the word "sat":
 Target word = sat
 Context words = "cat" (left), "on" (right)
o For the word "on":
 Target word = on
 Context words = "sat" (left), "the" (right)
o For the word "the":
 Target word = the
 Context words = "on" (left), "mat" (right)
Training pairs from this sentence:
(The, cat)
(cat, The), (cat, sat)
(sat, cat), (sat, on)
(on, sat), (on, the)
(the, on), (the, mat)
(mat, the)
Note that boundary words ("The" at the start, "mat" at the end) have only one context word each.
 Step 2: Learn Embeddings (Training the Model)
The Skip-Gram model uses these pairs to train a neural network. The network learns the weights (embeddings) for each word based on how often the words appear together as target-context pairs. The output of this training is a vector representation (embedding) for each word.
What Does the Output Look Like?
 The word embeddings are dense vectors (real-valued numbers). These vectors
capture the semantic meaning of words.
After training, words like "cat" and "mat" (which appear in the same contexts, e.g. right after "the") will
have similar embeddings, meaning they are closer in the vector space. Similarly, words like
"cat" and "dog" will tend to have similar embeddings, since across a large corpus they appear in similar contexts.
Word Embeddings Example:
 cat: [0.1, 0.3, -0.4, 0.8, -0.1]
 sat: [0.2, 0.1, -0.3, 0.7, -0.2]
 mat: [0.15, 0.35, -0.45, 0.85, -0.1]
Here, the values are made-up, illustrative numbers meant to show that "cat" and "mat" end up closer in
the vector space than "cat" and an unrelated word like "apple."
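As a hedged sketch, here is how Skip-Gram could be trained on this toy sentence with the gensim library (an assumed dependency; sg=1 selects Skip-Gram and window=1 matches the context window used above; one sentence is far too little data for meaningful embeddings, so treat the output as illustrative only):

from gensim.models import Word2Vec  # assumed dependency

sentences = [["the", "cat", "sat", "on", "the", "mat"]]

# sg=1 selects Skip-Gram; window=1 matches the context window used above.
model = Word2Vec(sentences, vector_size=5, window=1, sg=1, min_count=1, seed=42)

print(model.wv["cat"])                  # a 5-dimensional embedding for "cat"
print(model.wv.similarity("cat", "mat"))  # cosine similarity of two embeddings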

Key Insights from Word2Vec:


1. Semantic Similarity:
o Words that appear in similar contexts will have similar embeddings. For
instance, "king" and "queen" will be closer in the vector space because they
often appear in similar contexts, like "royalty" or "throne".
2. Analogies:
o Once trained, you can use vector arithmetic to solve analogies. For example:
 "king" - "man" + "woman" ≈ "queen".
 This is because the vector difference between "king" and "man" (a
male king) plus "woman" should yield a vector close to "queen".
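With well-trained vectors, libraries such as gensim expose this arithmetic directly. A sketch, assuming pretrained GloVe vectors downloaded via gensim's data downloader (the model name is one published option; the exact score is illustrative):

import gensim.downloader as api  # assumed dependency; downloads pretrained vectors

# Load small pretrained GloVe word vectors (an assumption: any pretrained set works).
vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman, expressed as a most-similar query.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically returns 'queen' as the closest word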

Summary Table

Feature       | Random Walks on Graphs                           | Word2Vec (NLP)
Nodes         | Entities (e.g., people, items)                   | Words
Edges         | Relationships (e.g., friendships, interactions)  | Contextual relationships (co-occurring words)
Goal          | Learn node embeddings based on graph structure   | Learn word embeddings based on word context
Method        | Random movement through the graph                | Context-based prediction (Skip-Gram/CBOW)
Output        | Node embeddings (vector representations)         | Word embeddings (vector representations)
Applications  | Node classification, link prediction, clustering | Word similarity, text classification, NLP tasks

Conclusion
Both random walks and Word2Vec model relationships by exploring proximity (in graphs
or in text). The underlying concept of learning embeddings based on local structures makes
them highly effective for their respective domains.

Deep Walk Algorithm


DeepWalk is an algorithm for learning latent representations of nodes in a graph. It is based
on random walks and Skip-gram (word2vec) to generate vector embeddings for nodes in a
network. These embeddings help in various machine learning tasks like node classification,
clustering, and link prediction.
Steps of DeepWalk Algorithm
1. Random Walks:
o Perform short random walks from each node in the graph.
o These walks capture the local structure of the graph.
2. Generating Training Data:
o Treat each random walk as a sentence (sequence of nodes).
o Each node is like a word in natural language processing.
3. Skip-gram Model (Word2Vec):
o Use Skip-gram to learn node embeddings.
o The goal is to predict surrounding nodes given a central node.

4. Generating Node Embeddings:


o After training, each node gets a low-dimensional embedding.
o These embeddings can be used in machine learning tasks.
Example
Consider a simple graph:
A -- B -- C
|    |    |
D -- E -- F
Step 1: Random Walks
A random walk simulates how a node is connected to others by following links.
Random walk from 'A' (walk of 4 nodes):
A→B→E→F
Repeat this process multiple times for each node to generate training sequences.
Another walk, from B: B → C → F → E

Step 2: Training Data for Skip-gram


Each random walk is treated like a sentence in NLP.
For example, if we collect multiple walks:
['A', 'B', 'E', 'F']
['B', 'C', 'F', 'E']
['D', 'E', 'B', 'A']
Each node (word) is associated with its neighbors within a window size.
For Skip-gram, we take a central node and predict nearby nodes:

Center Node | Context Nodes (window = 2)
A           | B, E
B           | A, E, C
C           | B, F
E           | B, F, A, D

(The context sets shown are an illustrative subset of all pairs the walks above generate; a sketch of the pair-generation step follows.)
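A small sketch of how such (center, context) pairs can be generated from the walks (the function name and details are our own, for illustration):

def skipgram_pairs(walks, window=2):
    """Yield (center, context) pairs from node walks, as Skip-gram training data."""
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield (center, walk[j])

walks = [['A', 'B', 'E', 'F'], ['B', 'C', 'F', 'E'], ['D', 'E', 'B', 'A']]
print(sorted(set(skipgram_pairs(walks))))  # all distinct (center, context) pairs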

Step 3: Learning Embeddings Using Word2Vec


Skip-gram model (from Word2Vec) is used to learn node embeddings.
Mathematical Formulation
For a node v, the probability of encountering a node u in its random-walk context is modeled with a softmax over all nodes:

P(u | v) = exp(vec(u) · vec(v)) / Σ_{w ∈ V} exp(vec(w) · vec(v))

where:
vec(v) is the embedding of node v.
V is the set of all nodes.
The (v, u) training pairs are drawn from the random walks w1, w2, ... generated in Step 1.

Step 4: Gradient Descent Optimization


The embeddings are updated using Stochastic Gradient Descent (SGD):
 Start with random embeddings.
 Compute prediction error using Skip-gram.
 Update embeddings based on gradient descent.
 Repeat until convergence.

Final Embeddings
After training, each node gets a low-dimensional vector that captures graph structure:
Example output (assuming vector_size=4):
Node A: [0.21, -0.33, 0.44, 0.56]
Node B: [0.18, -0.29, 0.51, 0.61]
Node E: [0.24, -0.35, 0.42, 0.59]
Nodes with similar roles in the graph will have closer embeddings.
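Putting the four steps together, here is a compact end-to-end sketch of DeepWalk on the example graph, assuming the networkx and gensim libraries (hyperparameters such as walk length, walks per node, and vector_size are illustrative choices, not the paper's defaults):

import random
import networkx as nx               # assumed dependency
from gensim.models import Word2Vec  # assumed dependency

# Step 0: the example graph A--B--C over D--E--F, with the vertical edges shown above.
G = nx.Graph([("A", "B"), ("B", "C"), ("A", "D"), ("B", "E"), ("C", "F"),
              ("D", "E"), ("E", "F")])

# Step 1: short random walks from every node.
def random_walk(graph, start, length):
    walk = [start]
    for _ in range(length):
        walk.append(random.choice(list(graph.neighbors(walk[-1]))))
    return walk

walks = [random_walk(G, node, 4) for node in G.nodes() for _ in range(10)]

# Steps 2-4: treat the walks as sentences and train Skip-gram (sg=1) on them.
model = Word2Vec(walks, vector_size=4, window=2, sg=1, min_count=1, epochs=20)

print(model.wv["A"])                       # a 4-dimensional embedding for node A
print(model.wv.most_similar("A", topn=2))  # structurally close nodes rank high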
Applications
1. Social Network Analysis
 Use Case: Identifying communities or groups within social networks based on user
interactions.
 How DeepWalk Helps: By representing users as nodes and interactions as edges,
DeepWalk can embed users in a continuous vector space, capturing their behavioral
patterns and relationships. This is useful for tasks like friend recommendations or
identifying influencers.
2. Link Prediction
 Use Case: Predicting missing edges or connections in a graph.
 How DeepWalk Helps: By learning node representations, DeepWalk can capture the
likelihood of a connection between two nodes. This is useful in social networks,
recommendation systems, or knowledge graphs where new relationships need to be
predicted.
3. Recommendation Systems
 Use Case: Suggesting products, services, or content based on user-item interactions.
 How DeepWalk Helps: In collaborative filtering, DeepWalk can learn embeddings of
users and items from interaction graphs (such as users interacting with products),
allowing it to make personalized recommendations based on learned similarities.
4. Biological Network Analysis
 Use Case: Analyzing protein-protein interaction networks or gene regulatory
networks.
 How DeepWalk Helps: DeepWalk can help identify clusters of proteins or genes with
similar functions by learning latent representations of the nodes, which can be used
for drug discovery or disease prediction.
5. Graph-based Search Engines
 Use Case: Ranking and searching over graph data, such as academic papers, patents,
or websites.
 How DeepWalk Helps: The algorithm can embed both documents and terms into a
shared vector space, facilitating semantic search and better ranking of search results
based on relationships between concepts.

6. Natural Language Processing (NLP)


 Use Case: Word embeddings and document classification.
 How DeepWalk Helps: In text-based applications, DeepWalk can model words and
their relationships as nodes in a graph (e.g., co-occurrence graph of words), and then
it learns embeddings that capture semantic meanings of words or documents.
7. Graph Visualization
 Use Case: Visualizing large-scale graphs and networks.
 How DeepWalk Helps: The learned embeddings can be used for dimensionality
reduction techniques such as t-SNE or PCA to visualize complex graphs in 2D or 3D
space, helping researchers and practitioners understand the structure of large datasets.
8. Knowledge Graph Construction
 Use Case: Building knowledge graphs for structured data representation.
 How DeepWalk Helps: DeepWalk can represent relationships in large knowledge
graphs, helping automate the extraction of facts and relationships from unstructured
data sources.
9. Anomaly Detection
 Use Case: Identifying unusual behavior or anomalies in graph data.
 How DeepWalk Helps: By learning the representations of nodes, DeepWalk can
identify outliers or anomalies by detecting nodes whose embeddings deviate
significantly from the majority of nodes in the graph.
10. Graph Classification
 Use Case: Classifying entire graphs based on their structure.
 How DeepWalk Helps: In applications where the goal is to classify entire graphs
(e.g., chemical compounds or social network communities), DeepWalk can embed
graphs into vector spaces and then use those embeddings for classification tasks.
In all these cases, DeepWalk works by generating random walks over the graph, treating them
as sentences (similar to word2vec), and then learning to represent the nodes in a low-
dimensional vector space. This allows the model to capture the local and global structures of
the graph in its node embeddings.
