Graph Theory and
Its Applications
GRAPHS IN BIO-INFORMATICS
What is a Graph?
a b
i d
e
h
f
g
Vertices
GRAPHS IN BIO-INFORMATICS
What is a Graph?
a b
i d
e
h
f
g
Edges
GRAPHS IN BIO-INFORMATICS
What is a Graph?
a b
i d
e
h Vertices
f
g
Edges
GRAPHS IN BIO-INFORMATICS
A Real-life Puzzle
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
Can you find a
route around
the city that
crosses every
bridge only
once?
GRAPHS IN BIO-INFORMATICS
Let us map the
puzzle to a
graph.
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
GRAPHS IN BIO-INFORMATICS
The problem
reduces to
finding an
Eulerian
Circuit in a
graph
GRAPHS IN BIO-INFORMATICS
The Königsberg Bridge Problem is a story of 1736.
Map Coloring
Frequency Assignment Problem
Resource Partitioning
Motivation: Tale of an Ancient King
Treasure Island
Tale of an Ancient King
Treasure Island
Tale of an Ancient King
Treasure Island
Tale of an Ancient King
Treasure Island
Tale of an Ancient King
Treasure Island
Tale of an Ancient King
u1
u2
# r1 = 6
G # r2 = 5
Tale of an Ancient King
u1
u2
# r1 = 6
G # r2 = 5
Applications: Grid Computing
❑ Terminals delegating subtasks are base vertices.
❑ Computing elements are resource vertices
Irrigation Canal NETWORKS
Applications
Canal
Water Network
Pump
Plots in a region to be irrigated
47
Applications
Plots in a region to be irrigated
48
Applications
Find a set of boundaries
that induces
• Connected canals
• Supply water in all the plots
of the region
• Length of the induced canals
is minimum.
Region to be irrigated
Modeling 49
Applications
Find a set of roads
that induces
• Connected road network
• Supply gas in all the regions
of the locality
Gas • Length of the induced road
Pipeline network is minimum.
Gas Field
Roads Network of a Locality
50
Modeling
VLSI LAYOUT
VLSI Layout
VLSI Floorplanning
B
A
E F
G C
Interconnection graph
VLSI Floorplanning
B
B
A A
E F F
E
G C
G C
D
D
Interconnection graph VLSI floorplan
VLSI Floorplanning
B
B
A A
E F F
E
G C
G C
D
D
Interconnection graph VLSI floorplan
VLSI Floorplanning
B
B
A A
E F F
E
G C
G C
D
D
Interconnection graph VLSI floorplan
VLSI Floorplanning
B
B
A A
E F F
E
G C
G C
D
D
Interconnection graph VLSI floorplan
B
A
E F
G C
D
Dual-like graph
VLSI Floorplanning
B
B
A A
E F F
E
G C
G C
D
D
Interconnection graph VLSI floorplan
B B
A A
F E F
E
G G C
C
D D
Dual-like graph Add four corners
VLSI Floorplanning
B
B
A A
E F F
E
G C Rectangular
G C
D drawing
D
Interconnection graph VLSI floorplan
B B
A A
F E F
E
G G C
C
D D
Dual-like graph Add four corners
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
An Early Application
In the 1950s, Seymour Benzer applied graph theory to
show that genes are linear.
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
An Early Application
Benzer’s problem was equivalent to deciding whether the graph
obtained from his bacteriophage experiment represented an
interval graph.
1 1 2
2
3 3
6
4
5
6 5 4
Linear Structure Interval Graph
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Benzer’s theorem, “genes are linear” played a significant
role in Watson and Crick’s discovery of the double-helix
structure of DNA.
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Solve
Problems Solution to
Graph using
in Genomics
Model Graph
Genomics Problems
Algorithms
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Problem: DNA Sequencing by Hybridization (SBH)
Given an array of subsequences,
S = { ATG TGG TGC GTG GGC GCA GCG CGT}
All subsequences of length 3 of a sequence of length 10.
Find the DNA Sequence of length 10 which contains all
the subsequences.
Ordering of the subsequences is not known.
Probable Solutions:
1. ATGCGTGGCA
2. ATGGCGTGCA
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Modeling SBH as a Hamiltonian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}
A Hamiltonian path from the above graph will lead to a solution.
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Modeling SBH as a Hamiltonian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}
Graph
Solution 1:
ATGCGTGGCA
Solution 2:
ATGGCGTGCA
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Modeling SBH as a Hamiltonian Path Problem
Finding a Hamiltonian Path in a graph is difficult.
The problem is thus addressed by searching for
Eulerian Paths.
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Modeling SBH as an Eulerian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}
GT CG
TG GC
AT CA
GG
An Eulerian path of the above graph will lead to a solution.
GRAPHS IN BIO-INFORMATICS
Graphs in Bioinformatics
Modeling SBH as an Eulerian Path Problem
S = { ATG TGG TGC GTG GGC GCA GCG CGT}
GT CG
TG GC
AT CA
GT CG
GT CG
GG
TG GC
TG GC
AT CA
AT CA
GG
GG
Solution 1: Solution 2:
ATGGCGTGCA ATGCGTGGCA
Pairwise Compatibility Graphs
Pairwise Compatible Graphs: Easy to Construct
h
0.5 0.5
f i
1 1 1 1 3 dmin= 4
k j
e b a
4 3
dmax= 7
d c
A pairwise compatible graph of (T, dmin , dmax )
❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct
5 ∈ [4,7]
h
0.5 0.5
f i
1 1 dmin= 4 5 a/
1 1 3
k j
e b a
4 3 e/ b/
dmax= 7
d c d/ c/
T
G
A pairwise compatible graph of (T, dmin , dmax )
❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct
3 ∉ [4,7]
h
0.5 0.5
f i
1 1 dmin= 4 5 a/
1 1 3
k j 3
e b a
4 3 e /
b/
dmax= 7
d c d/ c/
T
G
A pairwise compatible graph of (T, dmin , dmax )
❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct
7 ∈ [4,7]
h
0.5 0.5
f i
1 1 dmin= 4 5 a/
1 1 3
k j
e b a 7
4 3 e/ b/
dmax= 7
d c d/ c/
T
G
A pairwise compatible graph of (T, dmin , dmax )
❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: Easy to Construct
h
0.5 0.5
f i
1 dmin= 4 a/
1 1 How
1 3 hard is the
k j
e
4 3
Reverse
b a Problem? e/ b/
dmax= 7
d c d/
c/
T
G
A pairwise compatible graph of (T, dmin , dmax )
❑ has n vertices that correspond to the n leaves of T, and
❑ two vertices are adjacent in G if and only if their tree distance is in [dmin, dmax]
Pairwise Compatible Graphs: How to Recognize?
h
f 0.5 0.5
i
1 1 dmin= 4 a/
1 1 3
k j
e b ,d a
2 (T,3 dmin max) ?
e/ b/
dmax= 7
d c d/
c/
T G
A pairwise compatible graph of (T, dmin , dmax )
Given a graph
❑ has G, isthat
n vertices there an edge weighted
correspond tree T of
to the n leaves andT, two
and non-negative integers
dminare
❑ two vertices , dadjacent
max, suchinthat
G ifGand
is aonly
PCGif of (T,tree
their dmin , dmax)?is in [dmin, dmax]
distance
Recognizing PCG by Constructing (T, dmin, dmax)
dmin = 2
dmax = 3.5
Output
Input
dmin = 1.5
dmax = 2
Given a graph G, is there an edge weighted tree T and two non-negative integers
dmin, dmax, such that G is
Another (T, dmin, dOutput
a PCG of possible max)?
Pairwise Compatibility Tree Construction
1
1 1
1 1
G
Pairwise-Compatibility Tree
Complete Graphs of G for dmin=2, dmax=2
1 1
1 1
1
1 1
G
Complete Bipartite Graphs Pairwise-Compatibility Tree
of G for dmin=3, dmax=3
79
Historical Background
[Kearney, Munro, Phillips 2003]
PCGs are used to model evolutionary relationships among a set of organisms.
20
10
20
20
10
Relation Graph Evolutionary Tree
Rigidity Theory
26-Jun-19 Academy Lecture 81
Combinatorial characterization of rigidity
• (J.C.Maxwell, 1870) necessary condition that plane
truss with n joints and m bars is rigid: m≧2n-3
m=13
n=8
2n-3=13
• Maxwell’s rule --- classical and famous theorem
– implies that the rigidity of a truss is determined by its
topology
26-Jun-19 Academy Lecture 83
Theorem (Katoh-Tanigawa 2009,Molecular Theorem)
~
Panel-hinge framework in generic position is rigid in 3D ⇔ G contains 6 edge-
disjoint spanning trees.
Panel-Hinge framework Panel-Hinge graph G
G is rigid panel-hinge graph⇒ (G, q) is rigid for almost all hinge configurations
Srinakharinwirot University, Bangkok
Rigidity analysis for probing protein function:
… some examples Monitor effect of mutations on rigidity
No Drug With drug
HIV Protease Non-mutant Mutant
Jacobs et al: Structure, Function, and Genetics, 2001 Nucleotide binding domain of CFTR
SLIDE – drug docking/screening software (Kuhn, Michigan) Sljoka, TVO, 2011
Simulation of Thermal Protein
Unfolding
Cytochrom c
Hespenheide et al: J. Mol. Graph. & Model., 2002
We can replicate costly and cumbersome experiments using rigidity theory.
Understanding stability of Hyperthermophile and disordered proteins:
Hydrogen-Deuterium Exchange
(HDX) data of Sso Acp -
Acylphosphatase from
hyperthermophile Sulfolobus
solfataricus (Hyperthermophile
protein – solfataricus organism lives
in hot volcanic springs around 80-
90oC)
(Sljoka and Wilson, Physical Biology,
2013)
Rigidity analysis used to
characterize structure and
stability of Tau protein – an
intrinsically flexible protein
responsible in amyloid formation
in Alzheimers, Parkinsons,
dementia and other
neurodegenerative diseases.
Difficult to study experimentally and
with standard computational techniques
(Sljoka, Wilson et al, PLOS ONE 2015)
Social Network Analysis
Can we identify
social
communities?
Nodes: Users
Edges: Friendships
Interesting!
Active users are from the
same community
Critical
Person