Clustering Hierarchical2
Clustering Hierarchical2
Clustering Algorithms
Note to teachers: Teacher notes appear in dark red in the module, allowing faculty to pull
these notes off the teacher version to create a student version of the module.
Target Audience:
Second or third-year undergraduates
Prerequisites:
Students beginning this module should know elementary graph theory and matrix algebra.
Topics:
Introductory network science concepts, techniques, technology, applications.
Learning Outcomes:
Students completing this module will be able to:
• State currently used definitions for network, community, and modularity of a network
• State and apply the definitions for the following types of graphs: simple, directed,
undirected, weighted
• State and calculate the following topological features of a graph: degree sequence,
shortest paths, edge betweenness, connected, disconnected, density, cliques
• State and apply the definitions for the intra-cluster density and the inter-cluster density of
for any subgraph of a network
• For a given partition of a network into a family of clusters, state and apply the definitions
used in Albert-László Barabási's book, Network Science, for (1) the modularity of a
cluster in the partition, and (2) the modularity of the whole network
• Describe the agglomerative and divisive community detection algorithms and be able to
apply these algorithms to detect communities in small, engineered networks. Also, be
able to draw the nested diagram and dendrogram visualizations that result from carrying
out community detection with hierarchical clustering algorithms
• Utilize text and social network analyzer and visualization software (Netlytic) to model,
visualize, and describe social networks
• Interpret the findings from the social network analysis and communicate the results from
community detection
3
Table of Contents
1. Introduction 5
7. Technology Tutorial 47
7. 1 Introduction to Netlytic 47
7.2 Creating a Netlytic Account 47
7.3 Basic Activity to see what Netlytic can create 47
7.4 Discussion Questions 48
7.5 Additional Resources 49
7.6 Homework Exercises 49
8. Suggested Projects 50
9. Endnotes 51
10. References 52
5
1. Introduction
Do you have an idea for a Kickstarter project, say a new electronic instrument or a window
farm? Or are you thinking about running for political office? How do you get backers for your
great idea?
Since its launch on April 28, 2009, Kickstarter has backed over 100,000 successfully funded
projects."i Recent examples include Mary Mattingly's floating garden, Swale (see Figure 1
below) a garden on a 130-by-40-foot steel barge; De La Soul’s album, And the Anonymous
Nobody; and Massoud Hassani’s Mine Kafon drone which finds and detonates landmines.
How did these projects get funded? "The majority of initial funding usually comes from the fans
and friends of each project. If they like it, they'll spread the word to their friends, and so on.
Press, blogs, Twitter, Facebook, and Kickstarter itself are also big sources of traffic and
pledges."ii
Figure 1.1 Swale: "A Floating Food Forest in New York City"iii
Whatever your creative idea might be, completing this module will give you important social
network analysis tools for tapping into communities which can support you and help bring your
great idea to fruition!
6
The quality of our daily lives depends on the successful functioning of many ny kinds of complex
networks, including communications networks such as the World Wide Web, power networks
like the electrical grid, and transportation networks (e.g., airline, train, and highway networks).
Some of the most promising uses of networks lie in biology where researchers are mapping
neural networks and gene regulatory networks in hopes of understanding and intervening with
mechanisms underlying disease. Other important applications lie in public health
h and marketing,
where network analysis helps to find superspreaders of infectious diseases and influencers who
spread news of new products or innovations.
7
In network science, the things that are connected in a network are called nodes. The connections
between the nodes are called links.
links
In 2011, as he was finishing up his Ph.D. at Caltech, Eric Bellm joined LinkedIn and he wrote a
three-part
part blog where he described and displayed social networks to which he belonged -
LinkedIn, Facebook, and Twitter. Today Bellm is Project Scientist at the Palomar Observatory
Observator at
CalTech. Back in 2011 he had belonged to Facebook for seven years. Using the Gephi
visualization tool, here is how he described the communities in his Facebook network, shown in
Figure 2.2:
"While I have not plotted the names for privacy reasons, Gephi
Gephi quite accurately detects the
communities here. Red points are my college friends; red
red-orange
orange are from a college club.
Green points are high school friends, hometown folks, and family. Blue points are grad school
classmates, and peach are members ofof a grad school club. The size of the nodes in this plot scale
with the age of the account; as expected, my college friends (red) have the oldest accounts, while
the newest accounts are generally hometown/family (green)."vi
The use of social network analysis in political science, to study political participation and
conversation, has made visible the polarization in U.S. politics.
In their 2005 paper, "The Political Blogosphere and the 2004 Presidential Election: Divided
They Blog," Lada Adamic and Natalie Glance wrote:
"The 2004 U.S. Presidential Election was the first presidential election in the United States in
which blogging played an important role. ...
We analyze the posts of 40 “A-list” blogs over the period of two months preceding the U.S.
Presidential Election of 2004, to study how often they referred to one another and to quantify
the overlap in the topics they discussed, both within the liberal and conservative communities,
and also across communities."xii
The network in Figure 2.3, from Adamic and Glance, shows the community structure of political
blogs (nodes = political blogs; a link from blog A to blog B means blog A referred to blog B. The
colors indicate political affiliation (red = Republican, blue= Democrat). Orange links go from
liberal to conservative, while purple links go from conservative to liberal. The size of each blog
reflects the number of blogs that link to it.xiii
9
In biology,
y, networks are being used to gain insight into the mechanisms that underlie disease. In
his new book, Network Science
Science, Albert-László Barabási, the Robert Gray Dodge Professor of
Network Science at Northeastern University, observes: "Communities play a particularly
important role in understanding human diseases. Indeed, proteins that are involved in the same
disease tend to interact with each other."xv
etwork"xvi shown in Figure 1.5, "each node is a gene, with two genes
In the “Disease Gene Network"
being connected if they are implicated in the same disorder. The size of each node is proportional
to the number of disorders in which the gene is implicated (see key). ... Only nodes with at least
one link are shown."xvii
10
We end this section with one of the best known examples of a social network, the Zachary
Karate Club network. This network is used to benchmark virtually every community detection
algorithm.
Example 2.3 The Zachary Karate Club network is named for sociologist Wayne Zacharyxviii who
sought to understand the way in which a university karate club split into two groups after a
disagreement arose between the club's instructor and administrator. The club had 34 members
who, because the club was small, all knew each other. Over the three-year the period 1970-1972,
Zachary recorded the number of regular interactions that were occurring between club members
outside the club. From his data set of 34 nodes and 78 links, he constructed the network of
friendships shown in Figure 2.5. Applying the Ford–Fulkerson algorithm, Zachary correctly
assigned all but one member of the club to the groups they actually joined after the split. As
authors M. Bautista, D. Beers, M. Goodloe wrote in their CCICADA module, Who is Really in
Charge":
"The use of social network analysis and visualization techniques helps us to understand,
after the fact, why the members split the way they did. In the right-hand figure below [Figure
2.6], we can see that the members of the club formed two clusters: friends of person 1 (the
instructor) and the friends of person 34 (the club's administrator). The members of these
clusters had friendships almost exclusively with each other and led to the formation of
two new clubs after the break-up."xix
11
Figure 2.5 Zachary's Karate Club before splitxx Figure 2.6 Karate Club after the splitxxi
Having looked
ooked at a variety of networks and informally iintroduced the notion of community, we
now develop the mathematical ideas and tools
to that will help us to approach community more
rigorously and understand the mathematics behind community detection.
Networks are represented by mathematical models called graphs. Regarding terminology, the
word network is often used when the focus is on a real-world, complex system,
system while graph
usually refers to the mathematical model and its properties.
A graph is an ordered pair (V, E), where V is a set of vertices (nodes) and E is a multiset of
edges (links) between the vertices in V. Throughout this module we let N be the number of
vertices in V which are denoted 1, 2, 3, ..., N. Also, we let L be the number of edges in E.
Example 3.1 The graph G = (V, E) in Figure 3.1 has vertex set, V = {1, 2, 3} and multiset
E = {{1, 1}, {2, 1}, {1, 3}, {1, 3}, {2, 3}}. A loop is an edge that begins and ends at itself.
The graph has a loop from vertex 1 to itself.
A graph which has no loops and no repeated edges is a simple graph. For the remainder of this
module we will be working exclusively with simple graphs.
*****
Every graph has an equivalent representation called its adjacency matrix that records the nodes
in the graph and the edge connections between them. Specifically, for a graph G with N vertices,
1, 2, ..., N, the adjacency matrix for G, denoted A = [A ij ], is the N x N matrix defined as
follows:
A ij = 1, if there is an edge from vertex i to vertex j,
otherwise A ij = 0, for all i, j, where 1 ≤ i, j ≤ N.
Its vertex set is V = {1, 2, 3, 4, 5} and E = {{1,2}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {4,5}}.
Here is its adjacency matrix:
Vertex 1 2 3 4 5
1 0 1 0 0 1
2 1 0 1 1 1
3 0 1 0 1 0
4 0 1 1 0 1
5 1 1 0 1 0
Activity 3.1 Give the adjacency matrix for the graph in Figure 3.3.
13
Answer. The 5 x 5 table below is the adjacency matrix for the graph in Figure 3.3.
Vertex a b x y z
a 0 0 1 1 1
b 0 0 1 1 1
x 1 1 0 0 0
y 1 1 0 0 0
z 1 1 0 0 0
A directed graph (digraph) is a graph whose edges have a direction associated with them; a
graph whose edges are two-way is an undirected graph. Among social networks, a Twitter
network is a directed graph (e.g., in a "follows" network, nodes are users of the network; an edge
or arc is drawn from person A to person B if A follows B) while an acquaintance graph (nodes
are users; a link between two users means that they are acquainted with each other) is undirected.
A directed arc from A to B is represented by the order pair (A, B).
Activity 3.2 Give the vertex set, the edge set, and adjacency matrix for the following digraph.
Answer. The directed graph in Figure 3.4 has vertex set V = {1, 2, 3, 4) and edge set
E = {(1, 2), (1, 3), (3, 2), (3, 4), (4, 3)}. Its adjacency matrix is the following 4 x 4 table.
14
Vertices 1 2 3 4
1 0 1 1 0
2 0 0 0 0
3 0 1 0 1
4 0 0 1 0
A graph is weighted if its edges are labeled with numerical values called weights. Weights can
have different meanings, depending on the context.
Example 3.3 In the graph in Figure 3. 3.5, vertices represent U.S. airports, edges represent direct
flightss between the airports, and weights on the edges are the number of air miles between the
airports. Another way to label the edges would be to label each edge with the roundtrip airfare
cost between the connected airports.
3. A weighted graphxxii
Figure 3.5
A graph is defined by the way in which its vertices and edges are arranged, i.e., by the way its
vertices and edges are 'wired together.'xxiv
Example 4.1. In Figure 4.1, the graphs displayed have the same number of vertices and edges
but they have different structures. For example, in part (a) every vertex is adjacent to exactly two
vertices, but in part (b) the graph has a vertex that is adjacent to four vertices.
16
(a) (b)
Figure 4.1 Two graphs with 5 nodes and 5 links but different structures
We look at the structure of a graph from three perspectives: vertices, edges, and global.
Let G be a graph with N vertices and L edges and denote its vertices by 1, 2, ..., N.
The degree of a vertex is the number of edges that are incident with it. In an undirected, simple
graph, the degree of a vertex ranges from 0, when the vertex is adjacent to no other vertices in
the graph, to N-1, when the vertex is connected to all the other vertices in the graph.
When the degrees k 1 , k 2 , ..., k N are written in nonincreasing order as d 1 , d 2 , ..., d N , we call
this sequence the degree sequence of the graph.
Example 4.2 The graph in Figure 3.2 has vertices 1, 2, 3, 4, 5. Their degrees are
k 1 = 2, k 2 = 4, k 3 = 2, k 4 = 3, k 5 = 3. The degree sequence of the graph is 4, 3, 3, 2, 2.
Activity 4.1 Give the degree sequences for the graphs in Figure 4.1 (above).
Answer. Graph (a) has degree sequence 2, 2, 2, 2, 2. Graph (b) has degree sequence 4, 2, 2, 1, 1.
Undirected graphs
There are two important relationships between the structure of an undirected graph, as
represented in its adjacency matrix A = [A ij ], and the degrees of the vertices. They are as
follows:
17
N
(1) ki = ∑ Aij , for any row i in the adjacency matrix, where 1 ≤
j =1
i ≤ N, and
N
(2) ki = ∑ Aji , for any column i in the adjacency matrix, where 1 ≤
j =1
i ≤ N.
Equation (1) follows from the fact that every row in the adjacency matrix consists only of 0s and
1s. When you add up the numbers in row i, this sum is the number of 1's in row ii, i.e., it is the
number of vertices in the graph that are adjacent to i. Equation (2) may similarly be explained.
An additional relationship, the degree sum formula, holdss true for every directed graph:
N
(3) ∑ ki = 2L
i =1
Activity 4.2
(a) Find the degree sequence of the following bipartite graph (Figure 4.2)
(b) Draw two graphs which have different structures but which have degree sequence
2, 2, 2, 2, 2, 2. Explain how your graphs are different.
Answers.
(a) 3, 3, 2, 2, 2
(b) The graphs shown below each have degree sequence 2,2,2, 2, 2,2. The graph on the left has
one component while the graph on the right has two components.
18
Example 4.3 Both graphs in Figure 4.3 have degree sequence 3,2,2,2,1,1,1 but they are wired
differently. While each graph has exactly one vertex of degree 3, the vertex of degree 3 in (a) is
adjacent to a vertex of degree one, while the vertex of degree 3 in (b) is not adjacent to a vertex
of degree 1. (It is adjacent to two vertices of degree 2.)
(a) (b)
Figure 4.3 Two graphs with the same degree sequence but different structures
For a directed graph, we distinguish between the indegree and outdegree of a vertex. For vertex
i, where 1 ≤ i ≤ N, the indegree of i is the number of edges entering i, while the outdegree of i
is the number of edges leaving i.
Activity 4.3 The weighted, directed network in Figure 4.4 was derived from observing the
behavior of elephants over a two month period. The elephants observed were in the Chester Zoo,
which is located in the UK. Each node represents an elephant. A link drawn from elephant A to
elephant B means that A initiated play interaction with B. The weight on the link records the
number of times this occurred over the two month period.
Figure 4.4 Graphic from "The application of social network theory to animal behaviour" xxv
19
Construct a table which shows the indegree, outdegree, and total degree for each elephant. Based on your
table, tell which elephant is most active and speculate on the reasons why.
Answer. vertex Tunga Su Up Sit Ram Sheba Birma Thi Jangoli Maya
indegree 4 2 1 2 2 0 0 0 0 0
outdegree 4 2 1 1 3 0 0 0 0 0
total degree 8 4 2 3 5 0 0 0 0 0
Note that Tunga's indegree is 4, his outdegree is also 4, and his total degree is 8. He has the highest
indegree, outdegree, and total degree of all the elephants.
"Tunga is a popular member of the play network, being linked with every other in
this subgrouping, implying that he occupies a central position. The weights ...
indicate that Tunga is the initiator of much play activity, shown by the high values
associated with his outdegrees."xxvi
Note. For the remainder of this module we will restrict our attention to undirected graphs.
*****
In the popular culture, the phrase "six degrees of separation" conveys the common belief that
every pair of living human beings is separated by no more than six steps; that is, if we were to
draw an acquaintance network of all living human beings, then "anyone on the planet can be
connected to any other person on the planet through a chain of acquaintances that has no more
than five intermediaries."xxvii We now consider how to define the distance between two vertices
in a graph.
A path between two nodes in a graph is a sequence of vertices (no repeats) π : v 1 , v 2 , ... , v k ,
where consecutive pairs of vertices are edges in the graph. The length of the path is k-1.
A shortest path between two nodes in a graph is a path of shortest length between them. Its
length is called the geodesic distance between the vertices.
20
The shortest paths between each pair of nodes are listed in Table 4.1 below.
A topological measure associated with every edge in a graph is its edge betweenness
etweenness. As we will
see in Section 5,, edge betweenness is the basis for an important class of community detection
algorithms, divisive hierarchical clustering algorithms.
algorithms Edges with the highest betweenness in a
xxviii
network are "bridges." "[W]ithout the edge, paths between many pairs of nodes may have to
"[W]ithout
be “re-routed” a longer way."xxix
The betweenness of an edge is the number of shortest paths between pairs of vertices that pass
through that edge.xxx "For
For nodes A and B connected by a path, assume 1 unit of “flow
“flow.” If there
are k shortest paths between A and B, [d]iv
[d]ivide flow evenly along all possible shortest paths from
A to B - if k shortest paths from A and B, then 1/k units of flow pass along each."xxxi
21
Example 4.5 Consider the graph in Figure 4.5 (above). To find the betweenness of edge 3-4, we
follow these steps: (1) Find all the shortest paths between all node pairs in the graph. (2) For
each shortest path, check whether it passes through the edge 3-4; if affirmative, record the units
of traffic flow for this shortest path. (3) Last, sum up all the units of traffic flow to get the
betweenness of edge 3-4. Table 4.2 shows all of our steps. The betweenness of edge 3-4 is 9.
Find the shortest path(s) between Does it include edge {3, 4}? Units of traffic flow
each pair of nodes
The shortest path from 1 to 2 is π : 1, 3, 2 NO 0
The shortest path from 1 to 3 is π : 1,3 NO 0
The shortest path from 1 to 4 is π : 1,3,4 YES 1
The shortest path from 1 to 5 is π : 1,3,4,5 YES 1
The shortest path from 1 to 6 is π : 1,3,4,6 YES 1
The shortest path from 2 to 3 is π : 2,3 NO 0
The shortest path from 2 to 4 is π : 2,3,4 YES 1
The shortest path from 2 to 5 is π : 2,3,4,5 YES 1
The shortest path from 2 to 6 is π : 2,3,4,6 YES 1
The shortest path from 3 to 4 is π : 3,4 YES 1
The shortest path from 3 to 5 is π : 3,4,5 YES 1
The shortest path from 3 to 6 is π : 3,4,6 YES 1
The shortest path from 4 to 5 is π : 4,5 NO 0
The shortest path from 4 to 6 is π : 4,6 NO 0
The shortest path from 5 to 6 is π : 5,6 NO 0
Total flow = 9
Table 4.2 Calculating the betweenness of edge 3-4
An alternative way for analyzing the betweenness of edges is demonstrated in the presentation,
"Betweenness Measures and Graph Partitioning."xxxii We demonstrate this approach by using it
to compute the betweenness of edge 3-4, and then to compute the edge betweenness of the rest of
the edges in the graph in Figure 4.5.
We arrange the edge betweenness values we have found in the matrix in Figure 4.6 where the
22
(i,j)th entry is the edge betweenness for the edge i-j, for all i and j, where 1 ≤ i, j ≤ 6.
nodes 1 2 3 4 5 6
1 0 0 1 0 0 0
2 0 0 1 0 0 0
3 1 1 0 9 0 0
4 0 0 9 0 4 4
5 0 0 0 4 0 1
6 0 0 0 4 1 0
Figure 4.6 Edge Betweenness Table for Example 4.4
Activity 4.4 Find the betweenness score for each of the edges in the graph in Figure 4.7.
Answer The following tablexxxiv gives the betweenness score for each of the edges in
Figure 4.7.
23
In everyday usage, a community is a "social unit (a group of people) who have something in
common, such as norms, values, identity, and often a sense of place that is situated in a given
geographical area (e.g. a village, town, or neighborhood)."xxxv
Example 4.6 In Figure 4.8 you see two connected graphs that have 4 vertices.
Activity 4.5 Draw two connected graphs, each with 4 vertices, whose wiring diagrams are
different from the ones
es in Example 4.
4.8.
Example 4.7 Here are two graphs, each with 4 vertices, that are disconnected.
Example 4.8 The graphs in Figure 4.11 both have degree sequence 3, 3, 2, 2, 2 but they are
wired differently. In the left-hand
hand graph, the two vertices of degree 3 are adjacent; however, the
two vertices of degree 3 are not adjacent.
adjacent
25
Figure 4.11 Two connected graphs with the same degree sequence but different structure
L 2L
(4) δ(G)= =
Lmax N(N-1)
The density of any graph is between 0 to 1 because 0 ≤ L ≤ L max . A graph with density 1,
where all node pairs are linked,
linked is called a complete graph or clique. The symbol K n is used to
denote the complete graph on n vertices. The graphs in Figure 4.12 are the complete graphs K n ,
where 1 ≤ n ≤ 7.
ill-defined. For example, consider the following description of a dense graph (the italics are
ours): "A dense graph is a graph whose number of edges is close to the maximal number of
edges. The opposite, a graph with only a few edges, is a sparse graph. The distinction between
sparse and dense graphs is rather vague, and depends on the context."xl
Example 4.9 Figure 4.13 displays three graphs, all with 5 vertices but of varying density. The
graphs, from left to right, have the following density values: 1, 0.8, 0.5.
Activity 4.8 Find the densities for the following "Star", "Ring", and "Line" networks.
Answer "Star" and "Line" have density 6/21 while "Ring" has density 7/21.
We recall from Section 2 that M.E.J. Newman's definition of as "groups of vertices such that
there are many edges within groups and few edges between groups" is vague. As Newman points
out, what we mean by "many" or "few" is open to question.xlii
As there is no single definition of "community," this means that the community detection
problem is ill-defined. "There are no universal protocols on the fundamental ingredients, like the
definition of community itself, nor on other crucial issues, like the validation of algorithms and
the comparison of their performances."xliii Different algorithms performed on the same network
can produce different clusters of nodes, depending on their sets of assumptions.
27
In order to make precise the intuition that communities in a network are subgraphs whose
vertices are more densely connected to each other than to the rest of the graph, metrics have been
proposed for quantifying the interconnectivity of groups of nodes, as well their intraconnectivity
to other nodes in a network. We present measures that are used by Santo Fortunato, Professor of
Complex Systems in the School of Informatics and Computing at Indiana University.
For any subgraph, C, which has n C nodes and L C links, Fortunato xliv defines two density-related
metrics:
Fortunato describes their value for community detection (the italics are ours): "For C to be a
community, we expect δ int (C ) , to be appreciably higher than δ (G). On the other hand, δ ext (C )
"has to be much smaller than δ int (C ) . Searching for the best tradeoff between a large δ int (C ) , and
a small δ ext (C ) is implicitly or explicitly the goals of most clustering algorithms."
Again, the terms "appreciably greater than" and "much smaller than" and so these metrics are
indistinct, like the notion of community, are indistinct.
Example 5.1 For the network in Figure 5.1 (below), we look to see if the indicated subgraphs
satisfy Fortunato's expectations for communities. We note the graph has N = 6 nodes and L = 6
6 6
edges. Also, the density of G is δ (G)= = = 0.40.
15
6
2
For each subgraph, we carry out the relevant calculations and comparisons of δ (G), δ int (C ), and
δext (C) and list them in Table 5.1.While the inequalities δ int (C ) > δ (G) and δ int (C ) > δ ext (C ) are in
the right direction for both subgraphs, the vagueness of the meaning of "appreciably greater
than" prevents any further conclusions.
28
Subgraph C nC LC # of inter
inter- δint (C ) δ ext (C ) δint (C ) >> δ (G)? δ int (C ) >> δ ext (C )?
cluster edges
Red 3 2 1 0.667 0.1667 0.667 >> 0.40 0.667 >> 0.1667?
Blue 3 3 1 1.00 0.1667 0.67 >> 0.40? 1.00 >> 0.1667?
Activity 5.1 The network in Figure 5.2 appears in a paper by M.E.J. Newman."xlvii Check
whether the three subgraphs (blue, green, red) meet Fortunato's expectations for a community by
computing and comparing δ int (C ), δ ext (C ), and δ(G).
27 27
Answer =
G has parameters N = 15 and L = 27. Its density is δ(G)= = 0.2571.
15 105
2
For each subgraph, we carry out the relevant calculations and comparisons of δ(G), δint (C ), and
δext (C) and list them in the table below.While the inequalities δint (C ) > δ(G) and δint (C)>δext (C)
are in the right direction for all three subgraphs, the ambiguity of the phrase "appreciably greater
than" makes it impossible to draw further conclusions.
Subgraph C nC LC # of inter- δint (C ) δ ext (C ) δint (C ) >> δ (G)? δint (C ) >> δ ext (C )?
cluster edges
Blue 4 5 3 0.5 0.25 0.5 >> 0.2571? 0.5 >> 0.25?
Green 5 8 2 0.8 0.10 0.8 >> 0.2571? 0.8 >> 0.10?
Red 6 10 3 0.67 0.10 0.67 >> 0.2571? 0.67 >> 0.10?
5.2 For a given partition of a network into a family of clusters, how do we measure the
modularity of the partition?
One of the most commonly used optimization methods in community detection is modularity
maximization. In Section 6, we will present two community detection methods that are based on
optimizing the modularity of a network.
To derive an expression for the expected fraction of edges in the random model , we turn to the
concept of a Configuration Model.
According to Wikipedia:
To calculate the probability of there being an edge that connects node i to node j in the random
model, we follow the common practice of introducing the notion of "stubs" or half-edges. Under
30
the Configuration Model, for a particular stub leaving node i, there are 2L-1 possible stubs to
which it might connect; and, for any node j, the number of possible stubs from node j to which a
stub from node i could connect is k j . Therefore, the probability that a stub from node i connects
kj kj
to a stub of j is p j = ≈ , if L is large. So, the expected number of full edges between
2L − 1 2L
ki k j
node i to node j, p ij , is given by: pij =
, if L is large. Now, in the actual network (assumed
2L
simple and connected), the actual number of edges from node i to node j is given by aij .
Therefore, the fraction of the edges that fall within the given groups [i.e., the modules or
communities] minus the expected fraction if edges were distributed at random" is calculated by
finding the following difference:
aij − pij aij pij
= − , for each pair of nodes i and j in the group.
2L 2L 2L
In his book, Network Science, Albert-László Barabási follows the Configuration Model. For a
given partition of a network into a set of n c communities and for each community C c in the
partition, he defines the modularity of C c (where 1 ≤ c ≤ n c ), denoted M c , and the
modularity, denoted M, of the overall network that has been so partitioned as follows:
1
Mc = ∑ ( Aij − pij )
2 L (i , j )∈Cc
(9.9)
where (i, j) means that the nodes i and j belong to the same community and where
ki k j
pi j = . (9.10)
2L
Barabási observes:
"If M c > 0, then the subgraph C c has more links than expected by chance, so it
represents a potential community. If M c = 0, then the connectivity between the N c nodes
is random, fully described by the degree distribution. Finally, if M c < 0, then the nodes
of C c do not form a community. ..."li
He adds:
"Using (9.10) we can derive a simpler form for the modularity (9.9) (Advanced topics 9.B)
2
L k
Mc = c − c , (9.11)
L 2L
where L c is the total number of links within the community C c and k c is the total degree of the
31
"[W]e define the partition's modularity by summing (9.11) over all n c communities:
nc L k 2
M = ∑ c − c . (9.12)"liii
L 2 L
c =1
*****
We apply Barabási's definition of the modularity of a community structure in his formula (9.9) in
the next example.
Example 5.2 Suppose that G is the ring network shown in Figure 5.3. What is the modularity
score, M, for the community structure indicated below where nodes 1, 4, and 3 are in the Blue
community and nodes 2 and 5 are in the Red community?
Using Barabási's formula in (9.9), we will calculate the modularity scores for the red community
and the blue community and add them up to obtain the overall modularity, M, of the whole
network. Then we will compute the network modularity M by using Barabási's formula (9.12).
Figure 5.3 Find the modularity score, M, for this community structure
2*2
We note that because every node has degree 2, then pij = , for all i and j, 1 ≤ i, j ≤ 5.
2*5
Next, we find the adjacency matrix for G:
node 1 2 3 4 5
1 0 1 1 0 0
2 1 0 0 0 1
[ A ij ] = .
3 1 0 0 1 0
4 0 0 1 0 1
5 0 1 0 1 0
32
To make the blue and red communities visible, we reorder the rows and columns of the
adjacency matrix as follows:
node 1 3 4 2 5
1 0 1 0 1 0
3 1 0 1 0 0
[ A ij ] =
4 0 1 0 0 1
2 1 0 0 0 1
5 0 0 1 1 0
Now we compute the modularity for the blue and red communities.
To calculate the modularity of the blue community whose nodes are {1,3, 4}, go through the
terms in the adjacency matrix [ Aij ] which correspond to the blue node pairs; those terms are
highlighted in blue. There are 9 blue node pairs, so there are there are 9 terms, Aij − pij , that are
multiplied by 1 and included in the modularity score. Note that for node pairs where not both
nodes are blue, the difference Aij − pij is multiplied by 0 and so is not included in the
modularity. So, the modularity score for the blue community is:
1
M Blue = {[( A11 − p11 ) + ( A13 − p13 ) + ( A14 − p14 )] + [( A31 − p31 ) + ( A33 − p33 ) + ( A34 − p34 )] +
2*5
[( A41 − p41 ) + ( A43 − p43 ) + ( A44 − p44 )]}
1
= {[(0 − p11 ) + (1 − p13 ) + (0 − p14 )] + [(1 − p31 ) + (0 − p33 ) + (1 − p34 )] +
2*5
[(0 − p41 ) + (1 − p43 ) + (0 − p44 )]}
1 2* 2 2* 2 2* 2 2* 2 2* 2 2* 2
= {[(0 − ) + (1 − ) + (0 − )] + [(1 − ) + (0 − ) + (1 − )] +
2*5 2*5 2*5 2*5 2*5 2*5 2*5
2* 2 2* 2 2* 2
[(0 − ) + (1 − ) + (0 − )]}
2* 5 2*5 2* 5
1 4 4
= {5(0 − ) + 4(1 − )}
2* 5 10 10
= 0.04
Next, we calculate M Re d .
33
1
M Re d = {[( A22 − p22 ) + ( A25 − p25 )] + ( A52 − p52 )] + [( A55 − p55 )]}
2 *5
1
= {[(0 − p22 ) + (1 − p25 )] + (1 − p52 )] + [(0 − p55 )]}
2 *5
1 2*2 2*2 2*2 2*2
= {[(0 − ) + (1 − )] + (1 − )] + [(0 − )]}
2 *5 2*5 2*5 2*5 2*5
1 2* 2 2*2
= {2(0 − ) + 2(1 − )}
2 *5 2 *5 2 *5
1 2* 2 2*2
= {2(0 − ) + 2(1 − )}
2 *5 2 *5 2 *5
1 2* 2 2* 2
= {2(0 − ) + 2(1 − )}
2*5 2*5 2 *5
= 0.04
2
L k
Next, we recalculate M Re d and M Blue by using the formula (9.11), M c = c − c .
L 2L
We observe that L = 5, LBlue = 2 , kBlue = 6, LRed = 1 , kRed = 4. So:
2 2
LRed k Re d 1 4
M Red = − = − = 0.04, and
L 2 L 5 2*5
2 2
LBlue k Blue 2 6
M Blue = − = − = 0.04.
L 2L 5 2*5
Activity 5.2 Give the modularity scores for the community structures shown in Figure 5.4.
As we have noted in this module, no consensus has yet developed around a formal definition of
community. Accordingly, a variety of community detection methods have been developed (e.g.,
clique-based, graph partitioning, and modularity maximizationlv detection methods) in order to
meet the assumptions of different definitions. It is important to note that the brute-force approach
of finding all possible partitions of a network and choosing the one that best fits a particular
definition of community is not feasible as it is an NP-hard problem. Using brute force to solve
even the simplest community-finding problem, graph bisection, is NP-hard.lvi As Barabási states,
"We therefore need algorithms that can identify communities without inspecting all partitions."lvii
In this section we will describe some of the most widely used algorithms, called hierarchical
clustering methods, whose goal is to maximize modularity.
As the word nearest indicates, AGNES is based on distance. Carrying out hierarchical clustering
requires two decisions: (1) the choice of a distance function, d, for measuring the distance d(i, j)
between any two nodes, i and j, in the network; and (2) the decision of how to extend the
distance function, d, to define the distance d(A,B) between any two clusters of nodes, A and B.
There are several options for (1), defining the distance between two nodes. These options access
the topological structure of the network through the degrees of the nodes, k i (1 ≤ i ≤ N), the
1 N
means of the rows in the adjacent matrix, < A i > = ∑ A , and the number of neighbors that
N k =1 ik
two nodes i and j share in common, nij = ∑ Aik Akj (the dot product of rows A i and A j in the
k
adjacency matrix); a neighbor of a node is a node to which it is adjacent. Three commonly used
ways to measure the distance between two nodes i and j in an undirected network are as follows:
35
• the cosine similarity measure, σ ij , which uses the dot product of the vectors formed by
rows i and j in the adjacency matrix
N
∑ Aik Akj
(5) σ ij = k =1
N N
∑ A ik ∑ A jk
k =1
2
k =1
2
• the standard Pearson correlation coefficient, rij , which use the variance of rows i and j
in the adjacency matrix:
N
where cov( Ai , Aj ) denotes the covariance of rows i and j in the adjacency matrix
Very important, each of these metrics can be expressed succinctly in a network-centric way:
Once a distance metric, d, for nodes has been selected, we define the N x N matrix D = [d(i , j)]
to be the distance matrix for the network.
For step (2), i.e., selecting a way to extend d to define the distance d(A,B) between two clusters,
A and B, there are several options, such as:
• minimum or single-linkage clustering, where d(A, B) = min{d(a,b) | a ∈ A, b ∈ B};
36
The sequential steps in hierarchical clustering can be displayed in a nested diagram that shows
the sequential merging of clusters, as well as by a tree called a dendrogram.
We will carry out single-linkage, hierarchical clustering using the squared Euclidean distance
formula (7), dij = ki + k j − 2nij , for measuring the distance between nodes i and j.
Next, we arrange the values n ij (i.e., the number of neighbors of nodes i and j) are as follows:
2 1 1 2 1
1 4 1 2 2
[nij ] = 1 1 2 1 2
2 2 1 3 1
1 2 2 1 3
0 4 2 1 3
4 0 4 3 3
D = [d ] = 2 4 0 3 1 .
ij
1 3 3 0 4
3 3 1 4 0
clusters 1 2 3 4 5
1 0 4 2 1 3
2 4 0 4 3 3
3 2 4 0 3 1
4 1 3 3 0 4
5 3 3 1 4 0
ii. We merge cluster 1 and cluster 4 into one cluster {1,4} at a distance of 1; we also merge cluster
3 and cluster 5 into one cluster {3,5} at a distance of one. The minimum distance between the
clusters is 2.
iii. We merge cluster {1,4} and cluster {3,5} into cluster {{1,4}, {3,5}} at a distance of 2.
The minimum distance between clusters is 3.
iv. We merge cluster {{1,4},{3,5}} and cluster 2 into one cluster {{{1,4},{3,5}},2} at a
distance of 2. The last cluster contains all the objects, thus the agglomerative algorithm
is concluded.
38
In Figure 6.1 we show two visualizations of the sequential steps that we carried out in
hierarchical clustering. In the dendrogram, the nodes of the networks are arranged along the x-
axis. The y-axis scale represents distances between the clusters.
Note. One way to subdivide a network into clusters is to choose a height along the y-axis of the
dendrogram, and draw a horizontal line at that height (see the red line in the above dendrogram),
and count how many vertical lines it crosses. Each vertical line corresponds to a grouping of
clusters, and the members of that grouping are represented by the branches of the dendrogram
that lie below the line. In the above dendrogram, the red horizontal line intersects two vertical
lines. The associated community structure is the set of two clusters: A = {1,4,3,5} and B = {2}.
*****
In general, how can we use the dendrogram to determine how many clusters best fit our data?
Because the distance metric we choose is intended to intended to measure the similarity of nodes,
"The best choice of the no. of clusters is the no. of vertical lines in the dendrogram cut by a
horizontal line that can transverse the maximum distance vertically without intersecting a
cluster."lviii
To illustrate, in the dendrogram shown in Figure 6.2 below, the distance between the two red
lines indicates the difference between the clusters; the longer the line, the greater the difference.
39
These lines cut four vertical lines. For each vertical line, the nodes beneath it form a community.
So, there are four communities: {9,23,17,6,11,3,15,7}, {14,19,16,24,22},
{1,13,12,5,21,4,10,20}, {2,18,8,25}.
Activity 6.1 Consider the following distance matrix for a network of 5 nodes 1, 2, 3, 4, 5.
distance 1 2 3 4 5
1 0 0.10 0.90 0.35 0.80
2 0.10 0 0.30 0.40 0.50
3 0.90 0.30 0 0.60 0.70
4 0.35 0.40 0.60 0 0.20
5 0.80 0.50 0.70 0.20 0
Using minimum distance (single-linkage), give the dendrogram and nested diagram that result
from carrying out agglomerative hierarchical clustering.
Answer We begin by assigning each of the five nodes to its individual cluster.
Step 1 The minimum distance in the table is the distance between nodes 1 and 2.
This distance is 0.10. So we group 1 and 2.
40
cluster 1 2 3 4 5
0
.
1
0
1 0 0.90 0.35 0.80
2 0.10 0 0.30 0.40 0.50
3 0.90 0.30 0 0.60 0.70
4 0.35 0.40 0.60 0 0.20
5 0.80 0.50 0.70 0.20 0
Step 2 The minimum distance in the new table is the distance between nodes 4 and 5.
The distance if 0.20. We now group 4 and 5.
cluster 1,2 3 4 5
1,2 0 0.30 0.35 0.50
3 0.30 0 0.60 0.70
0
.
2
0
4 0.35 0.60 0
5 0.50 0.70 0.20 0
Step 3 The minimum distance in the new table is the distance between {1,2} and 3.
The distance is 0.30. We now cluster 3 with {1,2}.
1,2 0 0.35
3 0.30 0 0.60
4,5 0.35 0.60 0
Step 4 The minimum distance in the new table is the distance between {{1,2},3} and {4,5}.
The distance is 0.35. We now cluster {{1,2},3} with {4,5}.
1,2,3 0
0
.
3
5
4,5 0
Step 5 The minimum distance in the new table is 0. We have one cluster. The process is
complete.
cluster {{{1,2},3},{4,5}}
{{{1,2},3},{4,5}} 0
41
The dendrogram and nested diagrams below visualize our results. The dendrogram is drawn
based on the distances to merge the clusters.
Agglomerative nesting algorithms are 'bottom-up.' They start with each node of the network
assigned to its own cluster.. Next, two nearest clusters are merged into the same cluster.
c This
process is iterated until only one cluster is left.
Divisive analysis clustering (DIANA) algorithms are the reverse of agglomerative nesting
algorithms. They are 'top-down
down.' Initially all the nodes in a network are placed in a single cluster.
Next, based
ased on the intuition that edges with the highest betweenness values are 'bridges' between
communities, groups are subdivided into two group recursively until each node of the network
forms a separate cluster of its own.
In Example 4.5,, we calculated the edge betweenness scores for every edge in the network in
Figure 4.5 (displayed above) and summarized them in the following betweenness table (the
j, for all i and j, where 1 ≤ i,
number in the (i,j)th position is the edge betweenness for the edge ii-j,
j ≤ 6):
nodes 1 2 3 4 5 6
1 0 0 1 0 0 0
2 0 0 1 0 0 0
3 1 1 0 9 0 0
4 0 0 9 0 4 4
5 0 0 0 4 0 1
6 0 0 0 4 1 0
Step 1. Since the edge 3-44 has the highest betweenness score,
score we remove it from the network.
Here is the new network that results:
Step 2. We re-compute
compute the betweenness score
scoress for the edges in the new network
network.
nodes 1 2 3 64 5
1 0 0 2 0
0 0
2 0 0 2 00 0
3 2 2 0 00 0
4 0 0 0 0 1 1
5 0 0 0 1 0 1
6 0 0 0 1 1 0
Edges 1-3 and 2-3 have the highest betweenness score, so we remove them from the network.
Here is the network that results:
Step 3. We re-compute the betweenness scores for the edges in the new network.
nodes 1 2 3 4 5 6
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 1 1
5 0 0 0 1 0 1
6 0 0 0 1 1 0
Edges 4-5, 4-6, 5-6 have the highest betweenness score, so we remove them from the network.
Here is the network that results:
The network has been completely subdivided and the algorithm is complete.
Activity 6.2. Consider the following network, called the "Line" network:
f. Calculate the modularity, M, of the community structure for the network in part e.
nc 2
Lc kc
Again, use A-LBarabási's formulas M = ∑ M c , where M c = − (1 ≤ c ≤ n c ).
c =1 L 2L
Answers
part a
part b
part c
As indicated below, there are 3 communities: red = {G, F, E}, blue = {C, B, A}, and green = {D}.
2 2
LRed k 2 5 23
M Red = − Re d = − = , and
L 2L 6 2 * 6 144
2 2
LBlue k 2 5 23
M Blue = − Blue = − = , and
L 2L 6 2*6 144
2
−4
2
L k 0 2
M Green = Green − Green = − =
L 2L 6 2*6 144
part d
46
part e
part f
As indicated below, there are 7 communities: {G}, {F}, {E}, {C}, {B},{ A}, and {D}.
Therefore, the modularity of this community structure is:
c =7 0 1 2 0 2 2 11
M = ∑ M c = 2 − + 5 − = − = −0.1528.
c =1 6 12 6 12 72
47
7. Technology Tutorial
Note: Netlytic does not access or store your personal email password. Netlytic simply asks
Gmail or Yahoo to verify your identity.
The nodes are tweets that were made. The edges represent when a tweet was retweeted. The
different colors represent different communities retweeting different messages. You might notice
a large community in the middle of the data with the center node being “rt_com”. This is a news
organization, RT, that had tweeted a story about nurses in the United States protesting about a
lack of protection in place to safeguard against ebola. Here is a link to the original tweet. Here is
a link to the news article on RT’s website.
4. Is public consensus in agreement on this issue? Give some evidence as to why or why
not.
5. Think about different groups, politicians, economists, the military, etc. How could these
groups use the information presented here?
Data Set 1: A set of tweets from participants of a Massively Online Open Course called CCK11.
This data set was posted to Netlytic by the pLASMA research team. The data can be downloaded
here.
Data Set 2: A set of tweets regarding a conference about social media called Social Media and
Society in 2015. The data can be downloaded here.
50
8. Suggested Projects
● Twitter posts regarding the discovery of a new particle with properties of the Higgs
boson. The data is from Twitter between July 1st and July 7th, 2012. Students can
analyze the spreading of the news of the discovery. Here is a link to the data set and some
metadata information. The data is hosted by Jure Leskovec at Stanford University and
collected by [1].
● A map of Twitter followership and friendship. This data is a directed graph showing
which Twitter users follow which users. This project can head in numerous different
directions regarding communities, such as where are people getting news via Twitter,
who is popular on Twitter, or which celebrities have the greatest following on Twitter.
Here is a link to the data set and some metadata information. The data is hosted by ASU
and collected by [5].
● We have presented two hierarchical clustering algorithms in this module, the
agglomerative and divisive algorithms. There are numerous other clustering algorithms.
Students could create a project where there independently learn about and present a
different clustering algorithm. This should include a discussion of the advantages and
disadvantages between the new algorithm and the ones presented in this module. The
project should also include worked out examples on some sample networks.
● The community detection algorithms presented in this module use a modularity method
for detecting communities in networks. There are other methods for detecting
communities in networks, for example a betweenness-based algorithm for community
detection. A student could independently explore a different method for detecting
communities. This should include a discussion of the advantages and disadvantages
between the new algorithm and the ones presented in this module. The project should
also include worked out examples on some sample networks.
● In the agglomerative hierarchical clustering method described in this module we have
defined one possible way of defining distance. There are other possible ways to define
distance. A student project could be exploring complete-linkage clustering and average
linkage clustering as different ways to define distance in this method. The student should
consider an example network and compare the communities found using the three
different distances. Included in this should be an analysis of the differences in the
communities found. There should also be an analysis of the advantages and
disadvantages of each of the different distances used.
UCI offers a repository of data sets of numerous different networks here. If you are looking for
more project ideas you can examine the different data sets in the repository.
51
9. Endnotes
i
https://www.kickstarter.com/about?ref=nav
ii
https://www.kickstarter.com/help/faq/kickstarter%20basics
iii
https://www.kickstarter.com/projects/1152620801/swale
iv
http://www.forbes.com/sites/jacobmorgan/2014/05/13/simple-explanation-internet-things-that-anyone-can-
understand/#5de624a86828
v
PowerPoint presentation, "Army Cyber Institute Overview", by MAJ Natalie Vanatta, Deputy Chief of Research,
Army Cybersecurity Institute, United States Military Academy, West Point. Delivered Thursday, June 16, 2016, at
the CCICADA Reconnect Workshop at West Point.
vi
Visualizing Social Networks II: Facebook.
http://bellm.org/blog/2011/06/08/visualizing-social-networks-ii-facebook/
vii
Ibid
viii
p.357. Networks: An Introduction. M .E. J. Newman. Oxford University Press (2010)
ix
Ibid
x
Ibid
xi
Ibid
xii
http://www.maths.tcd.ie/~mnl/store/AdamicGlance2004a.pdf
xiii
"The Political Blogosphere and the 2004 U. S. Presidential Election: Divided They Blog." Lada Adamic and
Natalie Glance. http://www.maths.tcd.ie/~mnl/store/AdamicGlance2004a.pdf
xiv
Figure 1, Ibid
xv
Chapter 9, Network Science. Albert-László Barabási. Cambridge University Press.(2016)
xvi
https://www.quantamagazine.org/wp-content/uploads/2015/01/HDN_Quanta.jpg
xvii
http://www.pnas.org/content/104/21/8685/F2.expansion.html
xviii
An Information Flow Model for Conflict and Fission in Small Groups. Wayne W. Zachary. Journal of
Anthropological Research, Vol. 33, No. 4 (Winter, 1977), pp. 452-473
xix
CCICADA module: Who Is Really in Charge. M. Bautista, D. Beers, M. Goodloe. http://ccicada.org/wp-
content/uploads/2015/06/WhoIsReallyInChargeModule_Bautista_Beers_Goodloe_final-October-29-2015.pdf
(2015)
xx
https://networkdata.ics.uci.edu/data.php?id=105
xxi
http://ifisc.uib-csic.es/~jramasco/fig/zach_layout3.jpg
xxii
http://stanford.edu/class/archive/cs/cs106b/cs106b.1156/images/graph-figure-2.png
xxiii
https://aleadeum.files.wordpress.com/2016/07/lesmis.png?w=700
This network is based on the data set from D. E. Knuth. (1993). The Stanford GraphBase: A Platform for
Combinatorial Computing, Addison-Wesley, Reading, MA
xxiv
Section 2.2, Network Science. 9, Network Science. Albert-László Barabási. Cambridge University Press.(2016).
A-L Barábasi speaks of the "wiring diagram" of a network in his book beginning in Section 2.2.
xxv
The application of social network theory to animal behaviour, Amelia Coleing, March 3, 2009
xxvi
http://biohorizons.oxfordjournals.org/content/2/1/32.full
xxvii
http://whatis.techtarget.com/definition/six-degrees-of-separation
xxviii
Slide 8. Betweenness Measures and Graph Partitioning.
https://www.ismll.uni-hildesheim.de/lehre/cmie-11w/script/lecture5.pdf
xxix
Ibid
xxx
https://en.wikipedia.org/wiki/Girvan%E2%80%93Newman_algorithm
xxxi
Slide 9. Betweenness Measures and Graph Partitioning.
https://www.ismll.uni-hildesheim.de/lehre/cmie-11w/script/lecture5.pdf
xxxii
Slide 10. Betweenness Measures and Graph Partitioning.
https://www.ismll.uni-hildesheim.de/lehre/cmie-11w/script/lecture5.pdf
xxxiii
This graph is from slide 33 of a presentation, Community Detection and Graph-based Clustering. Adapted from
Chapter 3 of the book, Community Detection and Mining in Social Media, by Lei Tang and Huan Liu. Slides
prepared by Qiang Yang. http://images.slideplayer.com/8/2309634/slides/slide_33.jpg
52
xxxiv
Ibid
xxxv
https://en.wikipedia.org/wiki/Community
xxxvi
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3088589/
xxxvii
http://www.cfinder.org/
xxxviii
Network Science. Albert-László Barabási. Cambridge University Press.(2016)
xxxix
http://mathworld.wolfram.com/CompleteGraph.html
xl
https://en.wikipedia.org/wiki/Dense_graph
xli
http://www.faculty.ucr.edu/~hanneman/nettext/C10_Centrality.html
xlii
p. 357. Networks: An Introduction. M .E. J. Newman. Oxford University Press (2010)
xliii
Characterizing the Community Structure of Complex Networks. Andrea Lancichinetti, Mikko Kivelä, Jari
Saramäki, Santo Fortunato. PLOS One.( August 12, 2010)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0011976
xliv
Community Detection in Graphs. Santo Fortunato. (2010) https://arxiv.org/pdf/0906.0612.pdf
xlv
Ibid
xlvi
Ibid
xlvii
Figure 1: Example network showing community structure. "Communities, modules and large-scale structure in
networks." M. E. J. Newman. Nature Physics 8, 25–31. (2012).
http://www.nature.com/nphys/journal/v8/n1/full/nphys2162.html
xlviii
Ibid
xlix
https://en.wikipedia.org/wiki/Modularity_(networks)
l
Ibid
li
p.339, Ibid
lii
p. 339, Ibid
liii
Ibid
liv
Aaron Clauset. Lecture 5. Network Analysis.
http://tuvalu.santafe.edu/~aaronc/courses/5352/csci5352_2016_L5.pdf
lv
Ibid
lvi
A.-L. Barabási, Chapter 9, Network Science. Cambridge University Press. (2016)
lvii
Ibid
lviii
https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-
clustering/
lix
Ibid
10. References
[5] R. Zafarani and H. Liu, (2009). Social Computing Data Repository at ASU
[http://socialcomputing.asu.edu]. Tempe, AZ: Arizona State University, School of Computing,
Informatics and Decision Systems Engineering.
[6] Albert-László Barabási. Network Science. Cambridge University Press.(2016)