Graph Analytics for Big Data
Slides is adapted from CS224W:
Machine Learning with Graphs
Prerequisites
▪ Course title: Graph Analytics for Big Data
▪ Good background in:
▪ Machine Learning (ML)
▪ Algorithms and graph theory
▪ Probability and statistics
▪ Programming:
▪ You should be able to write non trivial programs (in
Python)
▪ Familiarity with Python libraries in ML
Graph machine learning tools
▪ PyG (PyTorch Geometric)
▪ The ultimate library for Graph Neural Networks
▪ Python igraph
▪ NetworkX
▪ GsPan
▪ Top python libraries for data science: numpy,
pandas, scikit-learn,
Instructors
▪ Dr. Van Hao Can
▪ Dr. Van Hoan Do
Textbooks & Communication
▪ Text book
▪ Nagiza F. Samatova, William Hendrix, John Jenkins, Kanchana
Padmanabhan, Arpan Chakraborty, Practical Graph Mining with R,
Chapman and Hall/CRC, 2013.
▪ Graph Representation Learning Book by Will Hamilton
▪ Homework and slides will be sent through Moodle.
▪ E-mail
▪ Van Hoan Do: vanhoan310@gmail.com
▪ Van Hao Can: cvhao89@gmail.com
Grading
▪ Attendance/Attitude: 10%
▪ Midterm Exam: 40%
▪ Written or presentation.
▪ Homework assignment (+1, +2 points)
▪ Final Exam: 50%
▪ Written exam.
▪ 50% of exercises and 50% of code in Python.
ML with graphs
Representation learning
Representation learning
Graph Theory
Representing Graphs:
Adjacency Lists
Definition: An adjacency list can be used to
represent a graph with no multiple edges by
specifying the vertices that are adjacent to each
vertex of the graph.
Example:
Example:
Special Types of Simple Graphs: Complete
Graphs
A complete graph on n vertices, denoted by Kn,
is the simple graph that contains exactly one
edge between each pair of distinct vertices.
Special Types of Simple Graphs: Cycles
and Wheels
A cycle Cn for n ≥ 3 consists of n vertices v1, v2
,⋯ , vn, and edges {v1, v2}, {v2, v3} ,⋯ , {vn-1, vn},
{vn, v1}.
A wheel Wn is obtained by adding an additional
vertex to a cycle Cn for n ≥ 3 and connecting this
new vertex to each of the n vertices in Cn by new
edges.
New Graphs from Old
Definition: A subgraph of a graph G = (V,E) is a graph (W,F), where W ⊂
V and F ⊂ E. A subgraph H of G is a proper subgraph of G if H ≠ G.
Example: Here we show K5 and
one of its subgraphs.
Definition: Let G = (V, E) be a simple graph. The subgraph induced by a
subset W of the vertex set V is the graph (W,F), where the edge set F
contains an edge in E if and only if both endpoints are in W.
Example: Here we show K5 and the subgraph
induced by W = {a,b,c,e}.
New Graphs from Old 2
Definition: The union of two simple graphs
G1 = (V1, E1) and G2 = (V2, E2) is the simple
graph with vertex set V1 ⋃ V2 and edge set E1 ⋃
E2. The union of G1 and G2 is denoted by G1 ⋃
G2.
Example:
Trees
Definition: A tree is a connected undirected graph with no simple circuits.
Example: Which of these
graphs are trees?
Solution: G1 and G2 are trees - both are connected and have no simple
circuits. Because e, b, a, d, e is a simple circuit, G3 is not a tree. G4 is not a
tree because it is not connected.
Definition: A forest is a graph that has no simple circuit,
but is not connected. Each of the connected
components in a forest is a tree.
Trees (continued)
Theorem: An undirected graph is a tree if and only if there is a
unique simple path between any two of its vertices.
Proof: Assume that T is a tree. Then T is connected with no simple
circuits. Hence, if x and y are distinct vertices of T, there is a simple
path between them (by Theorem 1 of Section 10.4). This path must
be unique - for if there were a second path, there would be a simple
circuit in T (by Exercise 59 of Section 10.4). Hence, there is a unique
simple path between any two vertices of a tree.
Now assume that there is a unique simple path between any two
vertices of a graph T. Then T is connected because there is a path
between any two of its vertices. Furthermore, T can have no simple
circuits since if there were a simple circuit, there would be two paths
between some two vertices.
Hence, a graph with a unique simple path between any two vertices
is a tree.
Properties of Trees
Theorem 2: A tree with n vertices has n − 1 edges.
Proof (by mathematical induction):
BASIS STEP: When n = 1, a tree with one vertex has no
edges. Hence, the theorem holds when n = 1.
INDUCTIVE STEP: Assume that every tree with k
vertices has k − 1 edges.
Suppose that a tree T has k + 1 vertices and that v is a
leaf of T. Let w be the parent of v. Removing the vertex v
and the edge connecting w to v produces a tree T′ with k
vertices. By the inductive hypothesis, T′ has k − 1
edges. Because T has one more edge than T′, we see
that T has k edges. This completes the inductive step.
Spanning Trees
Definition: Let G be a simple graph. A spanning tree of G is a
subgraph of G that is a tree containing every vertex of G.
Example: Find the spanning tree of this
simple graph:
Solution: The graph is connected, but is not a tree because it
contains simple circuits. Remove the edge {a, e}. Now one simple
circuit is gone, but the remaining subgraph still has a simple circuit.
Remove the edge {e, f} and then the edge {c, g} to produce a simple
graph with no simple circuits. It is a spanning tree, because it
contains every vertex of the original graph.