Social Network Analysis
SYLLABUS:
UNIT I: Social Network Analysis: Preliminaries and definitions, Erdos Number Project, Centrality measures, Balance and Homophily.
UNIT II: Random graph models: Random graphs and alternative models, Models of network growth, Navigation in social networks, Cohesive subgroups, Multidimensional Scaling, Structural equivalence, roles and positions.
UNIT III: Network topology and diffusion, Contagion in networks, Complex contagion, Percolation and information, Navigation in networks revisited.
UNIT IV: Small world experiments, small world models, origins of small world, Heavy tails, Small diameter, Clustering of connectivity, The Erdos-Renyi model, Clustering models.
UNIT V: Network structure - important vertices and PageRank algorithm, towards rational dynamics in networks, basics of game theory, Coloring and consensus, biased voting, network formation games, network structure and equilibrium, behavioral experiments, Spatial and agent-based models.
UNIT-1
SYLLABUS:
Social Network Analysis: Preliminaries and definitions, Erdos Number Project, Centrality measures, Balance and Homophily.
1. Nodes: Nodes represent individual actors or entities within the social network.
In a social context, nodes can be people, organizations, or any other relevant unit
of analysis.
2. Edges or Ties: Edges, also known as ties or links, represent the connections or
relationships between nodes. These connections can be various types, such as
friendship, communication, collaboration, or any other form of interaction.
4. Betweenness Centrality: Betweenness centrality identifies nodes that act as
bridges or intermediaries between different parts of the network. Nodes with high
betweenness centrality have significant control over the flow of information or
interactions in the network.
7. Small World Phenomenon: Social networks often exhibit the small world
phenomenon, where the average distance between nodes is relatively short, and
individuals can reach each other through a few intermediaries.
3. Disease Spread and Epidemiology: SNA helps understand how diseases spread
through networks, identifying critical nodes for intervention.
4. Influence and Opinion Propagation: SNA studies how ideas, opinions, and
information spread within social networks, identifying influential individuals or
groups.
Social Network Analysis provides valuable insights into the underlying structure
and dynamics of social networks, enabling a better understanding of social
behavior, information flow, and the impact of relationships on individuals and
communities.
2. Edges (Ties, Links): Edges, ties, or links refer to the connections between nodes
in the social network. They represent relationships, interactions, or associations
between individuals or entities.
edges. If there is only one connected component, the entire network is
connected.
10. Clustering Coefficient: The clustering coefficient measures the extent to which
nodes in a network tend to cluster together, forming cohesive subgroups or
communities.
11. Small World Phenomenon: The small world phenomenon describes the
property of social networks where the average path length between nodes is
relatively small, indicating that people are connected by a few intermediaries.
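The clustering coefficient described above can be computed directly from an adjacency structure. A minimal pure-Python sketch on an invented toy graph:

```python
# Local clustering coefficient: the fraction of a node's neighbor pairs
# that are themselves connected. Toy undirected graph as an adjacency dict
# (node names are made up for the example).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def clustering(g, node):
    neighbors = list(g[node])
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for degree < 2; report 0 by convention
    # Count neighbor pairs that are directly connected.
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in g[neighbors[i]]
    )
    return 2 * links / (k * (k - 1))

print(clustering(graph, "A"))  # only B-C of A's 3 neighbor pairs is linked -> 1/3
```

For node A, only one of the three possible neighbor pairs (B and C) is connected, so its local clustering coefficient is 1/3.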
The Erdős number is calculated as follows: Paul Erdős himself is assigned an Erdős number of 0; anyone who co-authored a paper with Erdős has an Erdős number of 1; anyone who co-authored with someone holding Erdős number 1 (but not with Erdős himself) has an Erdős number of 2, and so on. A researcher's Erdős number is thus the length of the shortest co-authorship chain connecting them to Erdős.
The Erdős number project aims to determine the Erdős number of as many
mathematicians and researchers as possible, creating a "genealogy" of academic
collaboration in mathematics. Researchers who have an Erdős number of 1 are
highly esteemed in the mathematical community, and having a small Erdős
number is considered a prestigious achievement.
The project has led to interesting insights into the structure of academic
collaboration networks and the interconnectedness of researchers in mathematics
and other fields. Many researchers take pride in having a low Erdős number and
may collaborate with others to reduce their Erdős number or improve their
academic connections.
The Erdős number project is ongoing, and with the growth of academic research
and collaborations, new connections are continually being discovered. Online
databases and tools are available for researchers to calculate and find their Erdős
number based on their academic publications and co-authors. The project has
become a fun and collaborative endeavor that brings mathematicians and
researchers together, fostering academic connections and promoting scientific
collaboration.
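Computing Erdős numbers amounts to a breadth-first search from the Erdős node in the co-authorship graph. A minimal sketch on an invented toy graph (the names and collaborations are made up):

```python
from collections import deque

# Erdos numbers are BFS distances from "Erdos" in the co-authorship graph.
# The graph below is a fabricated toy example, not real publication data.
coauthors = {
    "Erdos": {"Alice", "Bob"},
    "Alice": {"Erdos", "Carol"},
    "Bob": {"Erdos"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Carol"},
}

def erdos_numbers(graph, source="Erdos"):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for co in graph[person]:
            if co not in dist:          # first visit = shortest chain
                dist[co] = dist[person] + 1
                queue.append(co)
    return dist

print(sorted(erdos_numbers(coauthors).items()))
# [('Alice', 1), ('Bob', 1), ('Carol', 2), ('Dave', 3), ('Erdos', 0)]
```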
Centrality measures:
Centrality measures in Social Network Analysis (SNA) quantify the importance or
prominence of nodes (individuals, organizations, or entities) within a social
network. These measures help identify key actors who play critical roles in the
network's structure, communication, and information flow. There are several
centrality measures, each capturing different aspects of a node's importance in
the network. Some common centrality measures include:
1. Degree Centrality: Degree centrality is the simplest and most basic centrality
measure. It calculates the number of direct connections (edges) a node has with
other nodes in the network. Nodes with high degree centrality are more
connected to others and often have a greater influence on the overall network.
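Degree centrality is straightforward to compute. A small sketch that normalizes each degree by the maximum possible degree n - 1 (the graph and names are invented for illustration):

```python
# Degree centrality: number of direct ties, normalized by the maximum
# possible degree (n - 1). Toy undirected friendship graph.
friends = {
    "Ann": {"Ben", "Cal", "Dee"},
    "Ben": {"Ann"},
    "Cal": {"Ann", "Dee"},
    "Dee": {"Ann", "Cal"},
}

def degree_centrality(g):
    n = len(g)
    return {node: len(neigh) / (n - 1) for node, neigh in g.items()}

print(degree_centrality(friends))
# Ann is tied to all 3 others -> 1.0; Ben to only 1 of 3 -> ~0.33
```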
5. PageRank: PageRank, originally developed by Google for ranking web pages, is a
variant of eigenvector centrality. It assigns importance to nodes based on the
number and quality of incoming links. Nodes with higher PageRank are considered
more important.
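A minimal sketch of PageRank as power iteration on a toy directed graph; the damping factor 0.85 is the conventional choice, and real implementations also handle dangling nodes (nodes with no outgoing links):

```python
# Minimal PageRank via power iteration on an invented link graph.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def pagerank(graph, damping=0.85, iters=100):
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        # Every node receives a baseline share plus damped shares
        # from the nodes that link to it.
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            share = rank[v] / len(outs)
            for u in outs:
                new[u] += damping * share
        rank = new
    return rank

ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "C" collects incoming links from both A and B
```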
Balance and Homophily:
Balance and homophily are two important concepts in social network analysis that
influence the structure and dynamics of social networks.
1. Balance:
Balance theory, also known as structural balance theory, is a psychological theory
that examines the balance or imbalance in relationships within a social network. It
suggests that individuals prefer to maintain balanced relationships, where the
patterns of positive and negative ties are consistent. In a balanced triad (a set of
three nodes connected by signed edges), either all three relationships are positive,
or exactly one relationship is positive and the other two are negative (two friends
sharing a common adversary). A triad with exactly one negative tie, or with all
three ties negative, is unbalanced.
For example, in a friendship network, if A and B are friends, and B and C are
friends, it is likely that A and C will also be friends to maintain balance in the triad.
If A and C are not friends, the triad would be unbalanced, leading to potential
tension or discomfort among the nodes involved.
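Under balance theory, a triad is balanced exactly when the product of its three tie signs is positive. A small sketch with invented signed ties:

```python
# Structural balance: a triad is balanced iff the product of its three
# tie signs is positive (+ + + or + - -). Signed ties on a toy network.
signs = {
    frozenset({"A", "B"}): +1,
    frozenset({"B", "C"}): +1,
    frozenset({"A", "C"}): -1,   # A and C dislike each other
}

def triad_balanced(x, y, z):
    product = (signs[frozenset({x, y})]
               * signs[frozenset({y, z})]
               * signs[frozenset({x, z})])
    return product > 0

print(triad_balanced("A", "B", "C"))  # False: two friendships plus one enmity
```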
2. Homophily:
Homophily has a significant impact on the structure of social networks, as it leads
to the formation of clusters or communities of individuals with similar attributes.
It can reinforce existing social divisions and create echo chambers where like-
minded individuals reinforce each other's beliefs and opinions.
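One simple way to quantify homophily is the share of edges joining nodes with the same attribute; a sketch with invented attributes and ties:

```python
# A simple homophily check: the fraction of edges that connect nodes
# sharing an attribute. Attributes and edges are invented for illustration.
attribute = {"A": "red", "B": "red", "C": "blue", "D": "blue", "E": "red"}
edges = [("A", "B"), ("A", "E"), ("C", "D"), ("B", "C")]

same = sum(1 for u, v in edges if attribute[u] == attribute[v])
homophily = same / len(edges)
print(homophily)  # 3 of 4 ties connect like with like -> 0.75
```

A value well above the share expected from mixing at random suggests homophilous tie formation.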
Both balance and homophily play critical roles in shaping the structure and
functioning of social networks. While balance theory focuses on the tendency to
maintain balanced relationships, homophily examines the tendency to form
connections with similar others. Together, these concepts contribute to a deeper
understanding of how social networks evolve and influence individual behaviors
and social interactions.
UNIT-2
SYLLABUS:
Random graph models: Random graphs and alternative models, Models of
network growth, Navigation in social networks, Cohesive subgroups,
Multidimensional Scaling, Structural equivalence, roles and positions.
Random graph models are mathematical structures used to describe and analyze
random graphs. A random graph is a graph in which the presence or absence of
each edge is determined randomly based on certain probability distributions or
rules. These models are essential in studying the properties of real-world
networks and understanding various phenomena in fields like social networks,
computer networks, biological networks, and more.
There are several popular random graph models, each with its own characteristics
and applications. Here are some of the most well-known ones:
3. Watts-Strogatz (WS) Model:
The Watts-Strogatz model generates small-world networks, which have both a
high clustering coefficient (like regular graphs) and a short average path length
(like random graphs). It starts with a regular lattice structure and then rewires
some edges randomly with a probability parameter. This model helps in
understanding the emergence of small-world phenomena in real-world networks.
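The rewiring procedure can be sketched as follows; this is a simplified version (one endpoint rewired per edge, duplicates skipped), not the exact published algorithm:

```python
import random

# Watts-Strogatz sketch: start from a ring lattice where each node links
# to its k nearest neighbors, then rewire each edge with probability p.
def watts_strogatz(n, k, p, seed=0):
    rng = random.Random(seed)
    edges = set()
    for i in range(n):                       # build the ring lattice
        for j in range(1, k // 2 + 1):
            edges.add(frozenset({i, (i + j) % n}))
    rewired = set()
    for edge in edges:
        u, v = tuple(edge)
        if rng.random() < p:                 # attempt to rewire one endpoint
            w = rng.randrange(n)
            if w != u and frozenset({u, w}) not in edges | rewired:
                rewired.add(frozenset({u, w}))
                continue
        rewired.add(edge)                    # otherwise keep the original edge
    return rewired

net = watts_strogatz(n=20, k=4, p=0.1)
print(len(net))  # edge count stays at n * k / 2 = 40
```

Small p leaves most local clustering intact while the few random shortcuts sharply reduce average path length.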
4. Configuration Model:
The configuration model is a more general random graph model that allows for
the specification of a desired degree sequence for a graph. The model takes a
given sequence of vertex degrees and creates random graphs that have exactly
that degree sequence. However, not all degree sequences correspond to valid
graphs, so this model may require some post-processing to ensure consistency.
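The stub-matching idea behind the configuration model can be sketched in a few lines; as noted, the raw result may contain self-loops or multi-edges that need post-processing:

```python
import random

# Configuration-model sketch: give each node as many "stubs" as its
# target degree, shuffle the stubs, and pair them up.
def configuration_model(degrees, seed=0):
    assert sum(degrees) % 2 == 0, "degree sum must be even"
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    random.Random(seed).shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

edges = configuration_model([3, 2, 2, 1])
print(edges)  # 4 edges; node 0 appears as an endpoint exactly 3 times
```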
These models and their variants have been extensively studied in network science
and graph theory. They provide valuable insights into the structure and behavior
of real-world networks and have applications in various fields, including social
sciences, computer science, biology, and physics.
These graphs are essential in understanding real-world networks, such as social
networks, communication networks, biological networks, and more. In addition to
the models mentioned earlier, there are other alternative random graph models
and variations that researchers use to capture specific characteristics or
phenomena. Here are a few examples:
network. It is commonly used to test the significance of network properties or
compare them with those of random networks.
Models of network growth:
Models of network growth aim to simulate and explain the evolution of networks
over time, reflecting how new nodes and edges are added to the network. These
models are essential in understanding the underlying mechanisms that shape the
structure of various real-world networks. Here are some notable models of
network growth:
3. Copying Model:
The Copying Model, proposed by Kleinberg et al., is a model of network growth
that imitates human behavior when linking to existing content on the internet. It
begins with a small seed graph, and at each time step, a new node is introduced.
The new node either connects to an existing node with a probability proportional
to its degree or copies an existing node's connections with a certain probability.
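A simplified sketch of this copying dynamic (one link per new node, and a made-up copy probability of 0.5; the published model is more general):

```python
import random

# Copying-model sketch: each new node picks an existing "prototype" node
# uniformly, then either copies one of the prototype's links or links to
# the prototype itself. Parameters are invented for illustration.
def copying_model(n, p_copy=0.5, seed=0):
    rng = random.Random(seed)
    edges = [(1, 0)]                       # seed graph: node 1 -> node 0
    targets = {0: [], 1: [0]}              # out-links per node
    for new in range(2, n):
        prototype = rng.randrange(new)
        if targets[prototype] and rng.random() < p_copy:
            link = rng.choice(targets[prototype])   # copy one of its links
        else:
            link = prototype                        # link to it directly
        edges.append((new, link))
        targets[new] = [link]
    return edges

g = copying_model(50)
print(len(g))  # every node after the first adds one edge -> 49
```

Copying concentrates links on already-popular targets, which is one route to heavy-tailed degree distributions.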
6. Fitness Model:
The Fitness Model assigns a fitness value to each node, representing its
attractiveness or "fitness." New nodes are introduced over time, and they connect
to existing nodes with a probability that depends on both the node's fitness and
its degree. This model combines preferential attachment and node fitness to
explain network growth.
These models of network growth help researchers gain insights into the
underlying mechanisms that govern the evolution of various networks, including
social networks, citation networks, biological networks, and more. By comparing
the properties of simulated networks with real-world data, researchers can better
understand the processes that shape complex network structures over time.
Navigation in social Networks:
Navigation in social networks refers to the process by which individuals or users
traverse through the network, exploring connections and reaching desired
destinations, such as specific profiles, content, or communities. Social networks
are characterized by their interconnected nature, where users can establish
relationships with others, form groups, and share information. Effective navigation
within social networks is crucial for users to find relevant content, connect with
others, and engage with the platform's features. Here are some aspects and
strategies related to navigation in social networks:
1. User Profiles and Timelines: Social networks typically have user profiles and
timelines that display posts, updates, or activities of the users. Navigation involves
scrolling through timelines to see recent content, exploring user profiles to learn
more about individuals, and interacting with posts through likes, comments, or
shares.
4. Notifications and Alerts: Users receive notifications and alerts about activities
related to their network, such as mentions, new followers, friend requests, or
group invitations. Navigation includes managing and responding to these
notifications.
5. Hashtags and Topics: Many social networks use hashtags or topic labels to
categorize content and make it easily discoverable. Users can navigate by clicking
on hashtags or topics to explore related content.
8. Privacy and Security Settings: Social networks offer privacy and security settings
that allow users to control who can see their content and interact with them.
Navigation includes managing these settings to ensure a desired level of privacy
and security.
9. Navigation Design: The layout and user interface of a social network play a
significant role in navigation. An intuitive and user-friendly design can make it
easier for users to explore the network and find what they are looking for.
Overall, effective navigation in social networks enhances user experience, fosters
engagement, and facilitates meaningful interactions among users, leading to a
thriving and vibrant online community.
Cohesive subgroups:
Cohesive subgroups, also known as cohesive groups or communities, are subsets
of nodes within a network that have stronger internal connections compared to
their connections with nodes outside the subgroup. In other words, cohesive
subgroups exhibit higher density of edges within the subgroup, leading to a
tightly-knit and interconnected structure. These subgroups are characterized by
the presence of many mutual connections among their members, fostering a
sense of closeness and shared interests among the nodes within the group.
3. Girvan-Newman Algorithm: This algorithm works by iteratively removing edges
with high betweenness centrality, which are likely to connect different
communities. The process leads to the identification of cohesive subgroups as the
network structure breaks into distinct communities.
Multidimensional Scaling:
Multidimensional Scaling (MDS) is a statistical technique used to analyze and
visualize the similarities or dissimilarities between objects or items in a dataset. It
is particularly useful when dealing with data where the relationships between
items are measured based on multiple attributes or dimensions. MDS aims to
represent the data in a lower-dimensional space, typically two or three
dimensions, while preserving the pairwise distances or dissimilarities between the
objects as accurately as possible.
1. Input Data: The first step is to have a matrix or table of dissimilarity measures
between each pair of objects. These dissimilarities can be calculated based on
various distance metrics, such as Euclidean distance, Manhattan distance, or
correlation coefficients, depending on the type of data.
3. Constructing the Distance Matrix: Using the input data, a distance matrix is
constructed that represents the pairwise distances between objects in the higher-
dimensional space.
4. Stress Function Minimization: MDS aims to minimize the stress function, which
is a measure of how well the lower-dimensional distances preserve the original
pairwise distances. Various optimization algorithms, such as gradient descent or
iterative methods, are used to find the optimal configuration of points in the
lower-dimensional space.
5. Visualizing the Results: Once the optimal configuration is obtained, the objects'
positions in the lower-dimensional space are plotted as points on a scatter plot or
map. The proximity of the points reflects the similarity or dissimilarity between
the corresponding objects in the higher-dimensional space.
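Steps 3-5 can be sketched as plain gradient descent on the raw stress function; the 3-object dissimilarity matrix, learning rate, and iteration count here are invented for illustration:

```python
import math
import random

# Minimal metric-MDS sketch: place points in 2-D by gradient descent on
# the raw stress  sum over i<j of (d_ij - delta_ij)^2 , where d_ij is the
# current embedded distance and delta_ij the target dissimilarity.
delta = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]

def stress(pos):
    s = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = math.dist(pos[i], pos[j])
            s += (d - delta[i][j]) ** 2
    return s

rng = random.Random(0)
pos = [[rng.random(), rng.random()] for _ in range(3)]
initial = stress(pos)
lr = 0.05
for _ in range(500):
    grad = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            d = math.dist(pos[i], pos[j])
            coef = 2 * (d - delta[i][j]) / d
            for a in range(2):
                grad[i][a] += coef * (pos[i][a] - pos[j][a])
    for i in range(len(pos)):                # take a gradient step
        for a in range(2):
            pos[i][a] -= lr * grad[i][a]
print(initial, "->", stress(pos))  # stress decreases as the layout improves
```

Production MDS implementations use more robust optimizers (e.g. SMACOF majorization) rather than this raw gradient step.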
Overall, multidimensional scaling is a versatile tool for analyzing complex data and
providing meaningful visualizations to aid in data exploration and pattern
recognition.
Structural equivalence:
Structural equivalence is a concept in network analysis that refers to a type of
similarity between nodes (individuals or entities) within a network. Nodes are
considered structurally equivalent if they have similar patterns of connections to
other nodes in the network. In other words, structural equivalence captures the
idea that nodes are equivalent in terms of their position and role within the
network, regardless of their individual attributes.
1. Structural Equivalence (strict form): Two nodes are structurally equivalent in
the strictest sense if they have identical connections to all other nodes in the
network; that is, they have the same neighbors and are connected to the same
nodes in the same way, implying a complete correspondence in the nodes'
relations within the network. (Regular equivalence is a weaker notion: regularly
equivalent nodes are connected in the same way to nodes that are themselves
equivalent, without necessarily sharing the exact same neighbors.)
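Checking strict structural equivalence reduces to comparing neighbor sets (conventionally ignoring any tie between the two nodes themselves); a sketch on an invented graph:

```python
# Structural-equivalence check: two nodes are structurally equivalent
# when they have the same neighbors (each other excluded). Toy graph.
graph = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
}

def structurally_equivalent(g, u, v):
    return g[u] - {v} == g[v] - {u}

print(structurally_equivalent(graph, "A", "B"))  # True: identical neighborhoods
print(structurally_equivalent(graph, "A", "C"))  # False
```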
3. Information Diffusion: Nodes that are structurally equivalent may have similar
abilities to spread information efficiently through the network.
4. Group Formation: In social networks, nodes that are structurally equivalent are
likely to form tight-knit groups or cliques.
Roles and positions:
In the context of social network analysis, "roles" and "positions" refer to the
different patterns and positions that individuals or nodes may have within a
network. These concepts help us understand the various roles that nodes play and
how their positions influence their interactions and influence within the network.
Let's explore each term in more detail:
1. Roles:
Roles in social network analysis refer to the specific functions, behaviors, or
positions that individuals or nodes adopt within a network. Different nodes can
have different roles based on their relationships and interactions with other
nodes. Some common roles in social networks include:
a. Hubs: Nodes that have a high degree (many connections) and act as central
points in the network.
c. Isolates: Nodes that have no connections with other nodes in the network.
Understanding the roles of nodes in a social network is crucial for identifying key
players, understanding information flow, and analyzing the network's overall
structure and dynamics.
2. Positions:
Positions in social network analysis refer to the specific locations or relative
standings of individuals or nodes within the network. Positions are often
characterized by the nodes' connections, centrality, and relationships with others.
Some common positions include:
a. Centrality: Nodes with high centrality are located at the center of the network
and have a significant influence over other nodes.
b. Periphery: Nodes located at the outskirts of the network with relatively fewer
connections.
d. Cliques: Nodes that are part of tightly-knit groups within the network.
Understanding the positions of nodes within a social network helps identify the
power dynamics, flow of information, and potential vulnerabilities within the
network.
Both roles and positions provide valuable insights into the structure and dynamics
of social networks. Social network analysis techniques help researchers uncover
these patterns and understand how they impact information dissemination,
influence, and decision-making within the network.
UNIT-3
SYLLABUS:
Network topology and diffusion, Contagion in Networks, Complex contagion,
Percolation and information, Navigation in Networks Revisited.
1. Network Topology:
Network topology is a key aspect of understanding the relationships and
interactions among nodes within a network. It describes the pattern of
connections and how nodes are linked to each other. Different types of network
topologies include:
c. Small-World Network: A network with a high level of local clustering (nodes
tending to form groups) and short average path lengths between any two nodes.
Small-world networks are characterized by a balance between local and global
connectivity, facilitating efficient information flow.
2. Diffusion in Networks:
Diffusion is the process by which information, ideas, behaviors, or influence
spreads through a network over time. It can be observed in various contexts, such
as the spread of news, innovations, opinions, rumors, or even diseases. The key
factors that influence diffusion in networks include:
a. Network Structure: The topology of the network affects the speed and extent
of diffusion. Certain network structures can facilitate rapid and widespread
diffusion, while others may limit it.
c. Homophily: Homophily is the tendency for nodes with similar attributes or
characteristics to be connected. Homophilous connections can accelerate the
diffusion of information within specific subgroups or communities.
Contagion in Networks:
Contagion in networks refers to the process of spreading a particular state,
behavior, or influence from one node (individual or entity) to its connected
neighbors within a network. This concept is widely studied in fields such as
epidemiology, social network analysis, economics, and information diffusion.
Contagion models help us understand how behaviors, ideas, innovations, diseases,
and other phenomena can propagate through interconnected networks.
1. Diffusion Process: Contagion involves a diffusion process, where an initial
"seed" node adopts a particular state or behavior and subsequently influences its
connected neighbors to do the same. The influence spreads iteratively through
the network, leading to a chain reaction of adoption.
5. Network Structure: The topology of the network greatly influences the speed
and extent of contagion. Certain network structures, like scale-free networks with
hubs, can lead to faster and more extensive spread compared to regular or
random networks.
Applications of contagion in networks are diverse:
- Social Influence: Contagion models help analyze how behaviors, opinions, and
trends spread through social media networks.
Complex contagion:
Complex contagion is a type of contagion process that occurs in networks where
the adoption of a behavior, idea, or innovation is influenced by multiple sources or
requires multiple exposures before adoption can occur. In contrast to simple
contagion, where a single exposure is enough to trigger adoption, complex
contagion requires a more intricate set of conditions or social reinforcement to
influence a node's decision to adopt the behavior.
Key characteristics of complex contagion include:
5. Critical Mass: The adoption of the behavior in complex contagion often depends
on reaching a critical mass or a certain number of adopters in the network. Once
the critical mass is achieved, adoption can spread more rapidly.
1. Percolation Theory:
Percolation theory originates from statistical physics and is used to study the
behavior of connected clusters in random networks. The main idea is to analyze
how connectivity emerges in networks as a function of the probability of edges
being present between nodes.
The percolation threshold is a critical probability value above which a giant
connected component (the largest cluster) emerges and spans a significant
portion of the network. Below the percolation threshold, only small isolated
clusters exist.
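The emergence of the giant component can be observed with a small simulation: keep each possible edge of an n-node graph with probability p and measure the largest connected component (union-find tracks components; the parameters are invented for illustration):

```python
import random

# Percolation sketch on an Erdos-Renyi-style graph. For n nodes the giant
# component appears around p = 1/(n - 1), i.e. mean degree ~ 1.
def giant_fraction(n, p, seed=0):
    rng = random.Random(seed)
    parent = list(range(n))                  # union-find over nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:             # keep this edge -> merge components
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n

print(giant_fraction(300, 0.001))  # well below threshold: only tiny clusters
print(giant_fraction(300, 0.02))   # well above: one giant component dominates
```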
2. Information Percolation:
Information percolation is the application of percolation theory to the process of
information diffusion or spread of influence through a network. Instead of
considering the presence or absence of edges, the focus is on how information
spreads from node to node.
Navigation in Networks Revisited:
Navigation in networks refers to the process by which individuals or entities move
through a network to find specific information, resources, or destinations. It
involves the traversal of nodes and edges within the network to reach a desired
target. The concept of navigation in networks is essential in various real-world
applications, including online social networks, transportation systems, the
internet, and biological networks.
2. Shortest Path Algorithms: In many scenarios, finding the shortest path between
two nodes is crucial for efficient navigation. Breadth-first search finds shortest
paths in unweighted networks, Dijkstra's algorithm handles weighted networks
with non-negative edge weights, and the Bellman-Ford algorithm additionally
accommodates negative edge weights.
4. Routing in Communication Networks: In communication networks, navigation
involves routing data packets from a source node to a destination node. Various
routing algorithms, such as OSPF (Open Shortest Path First) and BGP (Border
Gateway Protocol), are used to ensure efficient data transfer.
UNIT-4
SYLLABUS:
Small world experiments, small world models, origins of small world, Heavy tails,
Small Diameter, Clustering of connectivity, The Erdos-Renyi Model, Clustering Models.
The main goal of Milgram's experiments was to investigate the degree of social
connectivity between individuals who are geographically distant from each other.
The experiments were designed to test the "six degrees of separation" hypothesis,
which suggests that any two people in the world are connected by, on average, six
intermediate acquaintances.
1. Experimental Design:
Milgram recruited participants from various cities in the United States, starting
with an initial group of volunteers who were randomly assigned to be "starters."
Each starter was given a target person (a specific individual) located in another
city, along with some information about the target person, such as their name,
occupation, and general location.
2. Chain of Letters:
The participants were instructed to forward a letter to someone they knew
personally and thought might be closer to the target person. The recipient of the
letter would then do the same, forwarding the letter to someone they knew, and
so on, until the letter reached the target person. Participants were not allowed to
use electronic communication (e.g., email, social media) and had to rely solely on
personal acquaintances.
3. Results:
Milgram found that, on average, the letters reached the target person in about six
steps, supporting the notion of six degrees of separation. This result was
surprising because it suggested that even in a vast country like the United States,
people were socially connected through relatively short chains of acquaintances.
4. Follow-Up Studies:
Milgram conducted variations of the experiment to explore factors influencing the
success of the chains, such as the size of the population, the familiarity of the
participants with the target person, and the nature of the relationship between
the participants and their acquaintances.
The small world experiments by Stanley Milgram significantly influenced the fields
of social psychology and network science. They provided empirical evidence of the
remarkable social connectedness among individuals and inspired further research
on social networks, information diffusion, and the dynamics of human
interactions.
While there have been some criticisms and controversies surrounding the
methodology and interpretation of the experiments, the concept of "six degrees
of separation" has become a widely recognized idea in popular culture and
continues to be a subject of study and fascination in social sciences.
small world models:
Small world models are mathematical and computational models used to study
the characteristics and properties of small world networks. These models aim to
replicate the small world phenomenon observed in real-world networks, where
individuals in a large population are connected through surprisingly short chains
of social acquaintances.
1. Short Average Path Length: In small world networks, the average distance or
path length between any two nodes (individuals) is relatively small, even in large
networks. This means that it takes only a few steps or acquaintances to connect
one node to another.
2. High Clustering Coefficient: Small world networks also exhibit a high clustering
coefficient, which means that nodes tend to form tightly-knit clusters or groups.
This clustering indicates that nodes' neighbors are often connected to each other
as well.
Two of the most well-known small world models are the Watts-Strogatz model
and the Barabási-Albert model:
1. Watts-Strogatz Model:
The Watts-Strogatz model, proposed by Duncan J. Watts and Steven H. Strogatz in
1998, is a model that can transform a regular network into a small world network.
The model starts with a ring lattice, where each node is connected to its k nearest
neighbors in a circular manner. Then, with a certain probability p, each edge is
rewired or randomly replaced with a connection to a randomly chosen node. The
rewiring process introduces random shortcuts in the network, reducing the
average path length while maintaining a significant level of clustering.
2. Barabási-Albert Model:
The Barabási-Albert model, proposed by Albert-László Barabási and Réka Albert in
1999, is a model that generates scale-free networks with small world properties.
In this model, nodes are added to the network one by one, and each new node
forms m connections (edges) to existing nodes. The probability of connecting to
an existing node is proportional to the node's degree (preferential attachment),
leading to the formation of hubs with a high number of connections. The presence
of hubs reduces the average path length and enhances the small world effect.
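Preferential attachment can be sketched with the classic "repeated nodes" trick: choosing uniformly from a pool in which each node appears once per unit of degree implements degree-proportional selection (parameters invented for illustration):

```python
import random

# Preferential-attachment sketch (Barabasi-Albert flavor): each new node
# attaches to m existing nodes chosen proportionally to degree.
def barabasi_albert(n, m=2, seed=0):
    rng = random.Random(seed)
    edges = []
    repeated = list(range(m))              # degree-weighted pool (seed nodes)
    for new in range(m, n):
        targets = set()
        while len(targets) < m:            # m distinct degree-weighted picks
            targets.add(rng.choice(repeated))
        for t in targets:
            edges.append((new, t))
            repeated += [new, t]           # both endpoints gain degree weight
    return edges

net = barabasi_albert(100)
degrees = {}
for u, v in net:
    degrees[u] = degrees.get(u, 0) + 1
    degrees[v] = degrees.get(v, 0) + 1
print(max(degrees.values()))  # hubs: far above the average degree of ~4
```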
Small world models are widely used to study various real-world networks, such as
social networks, biological networks, transportation networks, and the internet.
They help researchers understand the underlying mechanisms that contribute to
the small world phenomenon and provide insights into network dynamics,
information diffusion, and the structure of complex systems.
1. Mathematical Origins:
The term "small world" was initially introduced in mathematics and social network
analysis. In 1929, the Hungarian author Frigyes Karinthy published a short story
titled "Chains" ("Láncszemek" in Hungarian), where he proposed the idea of six
degrees of separation. Karinthy suggested that any two people in the world are
connected by, on average, a chain of six intermediate acquaintances. Though his
work was fictional, it laid the foundation for the concept of a small world in social
networks.
3. Watts-Strogatz Model:
In 1998, mathematicians Duncan J. Watts and Steven H. Strogatz proposed the
Watts-Strogatz model, a random graph model that demonstrated how a regular
network can be transformed into a small world network by introducing random
rewiring. This model provided a mathematical framework for understanding the
emergence of small world properties in networks and helped to further explore
the concept of a small world in various scientific fields.
4. Barabási-Albert Model:
In 1999, Albert-László Barabási and Réka Albert introduced the Barabási-Albert
model, which generates scale-free networks with small world properties. The
model showed that the preferential attachment mechanism, where new nodes
preferentially link to well-connected nodes, can lead to the formation of hubs and
short average path lengths in evolving networks.
phenomenon in social networks, communication networks, biological networks,
and other complex systems. The concept of a small world has become a
fundamental concept in network science and has been widely studied in various
fields of research.
Heavy tails:
In statistics and probability theory, heavy tails refer to the property of a
probability distribution having a higher probability of extreme or rare events than
would be expected in a normal or exponential distribution. The presence of heavy
tails means that the distribution exhibits a slower decay rate in its tail region,
leading to a greater likelihood of observing extreme values compared to a
distribution with lighter tails.
The term "tail" in this context refers to the extreme values of a distribution, those
that lie further from the center (mean or median) of the distribution. A heavy-
tailed distribution has more data points in its tail compared to lighter-tailed
distributions like the normal (Gaussian) distribution.
1. Financial Markets: Asset prices in financial markets are known to exhibit heavy
tails, leading to the occurrence of extreme events or market crashes that are not
adequately predicted by traditional models assuming normal distributions.
2. Social Networks: In some social networks, the distribution of the number of
connections (degree distribution) among nodes follows a heavy-tailed pattern,
with a few nodes having an exceptionally large number of connections.
It is essential to recognize the presence of heavy tails in data because they can
significantly impact risk management, decision-making, and the performance of
models based on assumptions of normality. Traditional statistical methods
designed for normal distributions may underestimate the probabilities of extreme
events when applied to data with heavy tails. Researchers and practitioners often
use specialized techniques, such as extreme value theory and heavy-tailed
modeling, to analyze and account for the behavior of heavy-tailed distributions.
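A small standard-library simulation makes the contrast concrete. Here the Pareto distribution (shape alpha = 1.5) stands in for a heavy-tailed law, and the threshold of 10 is an arbitrary choice for this illustration:

```python
import random

rng = random.Random(0)
N = 10_000

# light-tailed sample: standard normal
normal_draws = [rng.gauss(0, 1) for _ in range(N)]
# heavy-tailed sample: Pareto with shape alpha = 1.5 (support starts at 1)
pareto_draws = [rng.paretovariate(1.5) for _ in range(N)]

# Exceedance counts: how often does a draw land far out in the tail?
normal_extremes = sum(1 for x in normal_draws if x > 10)
pareto_extremes = sum(1 for x in pareto_draws if x > 10)
```

With ten thousand draws, the Gaussian sample essentially never exceeds 10 (the probability is astronomically small), while roughly 3% of the Pareto draws do, illustrating why models that assume normality underestimate extreme events.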
Small Diameter:
In the context of network theory, the "small diameter" refers to the property of a
network having short average path lengths between pairs of nodes. The diameter
of a network is defined as the maximum distance (number of edges) between any
two nodes in the network. A small diameter means that the average shortest path
between nodes in the network is relatively short.
Networks with small diameters are advantageous because they allow for efficient
communication and information transfer between nodes. In social networks, a
small diameter means that individuals can reach each other through relatively few
intermediaries, supporting the concept of "six degrees of separation."
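Path lengths and the diameter can be computed directly with breadth-first search. The sketch below uses plain Python and illustrative function names; it measures the longest shortest path in a small cycle graph:

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop-count distances from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Maximum shortest-path distance over all node pairs (connected graph)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# A 6-node cycle: the farthest pair of nodes is 3 hops apart.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(diameter(cycle))   # → 3
```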
1. Small World Networks: Small world networks are characterized by short average
path lengths, high clustering coefficients (indicating the presence of tightly-knit
groups), and a tendency to have a few well-connected nodes (hubs). Examples of
small world networks include social networks, where individuals are connected to
others through relatively short chains of acquaintances.
Network designers and engineers strive to create networks with small diameters
to improve efficiency, resilience, and robustness. Achieving a small diameter often
involves considering factors such as network topology, routing algorithms, and the
placement of key nodes or hubs. However, in some cases, achieving a small
diameter may come at the expense of increased network construction or
maintenance costs.
Clustering of connectivity:
Clustering of connectivity, also known as network clustering or clustering
coefficient, is a measure that quantifies the extent to which nodes in a network
tend to form clusters or tightly-knit groups. In other words, it measures the degree
to which the neighbors of a node are connected to each other. Clustering of
connectivity is an important property in the study of social networks, biological
networks, communication networks, and many other types of networks.
There are two main types of clustering coefficients commonly used in network
analysis:
1. Local Clustering Coefficient: For a given node, the fraction of pairs of its
neighbors that are themselves connected. Averaging this value over all nodes
gives the network's average clustering coefficient.
2. Global Clustering Coefficient: The ratio of the number of closed triplets
(triangles, counted once per corner) to the total number of connected triplets
of nodes in the network.
A high global clustering coefficient indicates that the network has many closed
triangles and is highly clustered, implying that nodes tend to form groups with
strong connections among their neighbors.
Clustering of connectivity plays a crucial role in network analysis and has various
implications:
- Robustness: Networks with high clustering coefficients tend to be more robust
against random node failures, as the existence of clusters provides redundancy in
connectivity.
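The local clustering coefficient can be computed directly from an adjacency structure; the helper names below are illustrative:

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of node's neighbors that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0   # undefined for degree < 2; report 0 by convention
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# A triangle (0-1-2) plus one pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

Node 1's neighbors (0 and 2) are connected, so its local coefficient is 1.0; node 0 has three neighbors of which only one pair is connected, giving 1/3.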
The Erdős-Rényi Model:
The Erdős-Rényi (ER) random graph model, often written G(n, p), is defined by
two parameters:
1. Number of Nodes "n": The parameter "n" specifies the number of nodes in the
graph; the model considers all n(n-1)/2 possible edges among them.
2. Edge Probability "p": The parameter "p" determines the probability of including
an edge between any two nodes. It is typically a constant value between 0 and 1.
Higher values of "p" result in a higher likelihood of edges being present, leading to
denser graphs.
3. Expected Degree: The expected degree of each node in the graph can be
calculated as "E(d) = (n-1) * p." This represents the average number of edges
incident to each node in the random graph.
4. Sparse and Dense Graphs: The ER model can generate both sparse graphs (with
relatively few edges) and dense graphs (with many edges) depending on the value
of "p." For small values of "p," the graph tends to be sparse, while for larger values
of "p," the graph becomes denser.
The Erdős-Rényi model has been widely used in the study of random networks
and as a benchmark for comparing real-world networks. It provides insights into
the properties and behaviors of random graphs and serves as a foundation for
more complex network models. However, it should be noted that the ER model
may not capture all the characteristics observed in real-world networks, such as
the presence of high clustering or degree distributions with heavy tails, which are
common in many real networks.
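A minimal G(n, p) generator shows how the expected degree E(d) = (n-1) * p plays out in practice. This is a pure-Python sketch; the parameter values are arbitrary examples:

```python
import random

def erdos_renyi(n, p, seed=1):
    """G(n, p): include each of the n(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

n, p = 200, 0.05
edges = erdos_renyi(n, p)
mean_degree = 2 * len(edges) / n   # each edge contributes degree to two nodes
# Expected degree: E(d) = (n - 1) * p = 199 * 0.05 = 9.95,
# so the empirical mean degree should land close to 9.95.
```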
Clustering Models:
Clustering models are algorithms and techniques used to group data points or
objects into clusters based on their similarities. The goal of clustering is to identify
groups of data points that are more similar to each other within the same cluster
than to data points in other clusters. Clustering is a fundamental task in
unsupervised machine learning and has various applications in data analysis,
pattern recognition, image segmentation, and more. Several popular clustering
models include:
1. K-Means Clustering:
K-means is one of the most widely used and straightforward clustering algorithms.
It aims to partition data into "k" clusters, where "k" is a user-defined parameter.
The algorithm iteratively assigns data points to the nearest cluster centroid and
updates the centroids based on the mean of the data points in each cluster. The
process continues until convergence, and the clusters' centers represent the final
clustering solution.
2. Hierarchical Clustering:
Hierarchical clustering builds a tree-like structure of clusters, known as a
dendrogram, by iteratively merging or splitting clusters based on similarity. There
are two main approaches to hierarchical clustering: agglomerative, which starts
with individual data points as clusters and merges them, and divisive, which starts
with all data points as a single cluster and recursively splits them into smaller
clusters.
3. Gaussian Mixture Model (GMM):
GMM is a probabilistic model used for clustering. It assumes that the data is
generated by a mixture of multiple Gaussian distributions. The model identifies
clusters by estimating the parameters (mean and covariance) of the Gaussian
distributions. GMM can capture clusters with different shapes and can be used for
soft clustering, where data points can belong to multiple clusters with different
probabilities.
4. Spectral Clustering:
Spectral clustering is based on a graph-theoretic approach and works by
transforming the data into a lower-dimensional space using eigenvectors of a
similarity matrix. The data points are then clustered using standard clustering
algorithms (e.g., K-means) in the reduced space. Spectral clustering is effective for
capturing complex structures and non-convex clusters.
5. Affinity Propagation:
Affinity Propagation is a message-passing-based clustering algorithm that does not
require the number of clusters as an input. It uses a similarity matrix to determine
exemplars (representative data points) that best represent the clusters. Data
points are assigned to exemplars based on message-passing and affinity values.
These are just a few examples of clustering models, and there are many other
clustering techniques and variations available. The choice of clustering model
depends on the specific data and the problem at hand. It is important to consider
the nature of the data, the desired number of clusters, the presence of noise, and
the interpretability of the clustering results when selecting a suitable clustering
model.
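As an illustration of the first model above, the K-means loop can be sketched in plain Python for 2-D points. This is a toy version, not a production implementation; initialization and tie-breaking are deliberately simplified:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points: assign each point to the nearest
    centroid, then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assignment step
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2 +
                                  (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for c, members in enumerate(clusters): # update step
            if members:                        # keep old centroid if empty
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, clusters

# Two well-separated blobs around (0, 0) and (10, 10).
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, clusters = kmeans(pts, 2)
```

On well-separated data like this, the algorithm converges quickly to one centroid per blob; on harder data, results depend on initialization, which is why K-means is often run several times.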
UNIT-5
SYLLABUS:
Network structure - Important vertices and page rank algorithm, towards rational
dynamics in networks, basics of game theory, Coloring and consensus, biased
voting, network formation games, network structure and equilibrium, behavioral
experiments, Spatial and agent-based models.
1. Important Vertices:
Important vertices are nodes that play an outsized role in a network's structure
and dynamics. They are commonly identified using centrality measures:
- Degree Centrality: The number of edges (connections) incident to a node. Nodes
with high degree centrality are well-connected and considered influential in the
network.
- Closeness Centrality: The inverse of the sum of the shortest path distances from
a node to all other nodes in the network. Nodes with high closeness centrality are
close to many other nodes in the network.
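Both measures can be computed directly from an adjacency list. The sketch below (pure Python, illustrative names) uses a star graph, where the hub is maximally central by either measure:

```python
from collections import deque

def degree_centrality(adj):
    """Degree of each node: the number of incident edges."""
    return {n: len(nbrs) for n, nbrs in adj.items()}

def closeness_centrality(adj):
    """Inverse of the sum of shortest-path distances to all other nodes."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                       # BFS from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[s] = 1 / sum(d for n, d in dist.items() if n != s)
    return result

# A star graph: hub 0 is one hop from everyone; leaves are two hops apart.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
deg = degree_centrality(star)
cc = closeness_centrality(star)
```

The hub's closeness is 1/(1+1+1) = 1/3, while each leaf's is 1/(1+2+2) = 1/5, matching the intuition that the hub is "closest" to the rest of the network.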
2. PageRank Algorithm:
PageRank is an algorithm developed by Larry Page and Sergey Brin, the
co-founders of Google, to rank web pages in search engine results based on their
importance and relevance. The PageRank algorithm assigns a numerical score to
each web page, representing its importance in the web graph. The underlying idea
is that a page is important if important pages link to it: the scores correspond to
the stationary distribution of a "random surfer" who follows outgoing links and,
with probability 1 - d (where d is the damping factor, typically 0.85), jumps to a
random page.
PageRank has various applications, such as ranking websites in web search,
identifying influential users in social networks, recommending relevant content,
and analyzing the structure of complex networks.
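A power-iteration sketch captures the core of PageRank. This is a simplified illustration of the idea, not Google's production algorithm; the damping value 0.85 is the commonly cited default:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank. links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}           # start from a uniform score
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}   # teleportation share
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:                 # each page splits its rank
                    new[q] += share            # evenly over its out-links
            else:                              # dangling page: spread its
                for q in pages:                # rank over all pages
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Page C is linked to by both A and B, so it should rank highest.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
```

The scores always sum to 1, and C ends up ranked above A, which in turn beats B (B receives no incoming links at all, only the teleportation share).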
Basics of game theory:
Game theory is a branch of mathematics and economics that deals with the study
of strategic decision-making in situations where the outcome of one player's
actions depends on the actions of others. It provides a formal framework to model
interactions between rational decision-makers, called players, and analyze their
strategies and outcomes.
1. Players: Players are the individuals or entities involved in the game, each with
their own set of strategies and preferences.
2. Strategies: Strategies represent the choices available to each player. Players
select strategies to maximize their utility or payoff.
3. Payoff and Utility: In game theory, each player's objective is described by a
utility function, also known as a payoff function. The utility function represents
the player's preferences and quantifies the benefits or costs associated with
different outcomes.
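These concepts can be made concrete with the classic 2x2 Prisoner's Dilemma, checking which strategy profiles are Nash equilibria (the payoff values below are the standard textbook ones):

```python
# Prisoner's Dilemma payoffs as (row player, column player);
# strategies are "C" (cooperate) and "D" (defect).
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def best_response(opponent_move, player):
    """Strategy maximizing this player's payoff, given the opponent's move."""
    moves = ["C", "D"]
    if player == "row":
        return max(moves, key=lambda m: PAYOFF[(m, opponent_move)][0])
    return max(moves, key=lambda m: PAYOFF[(opponent_move, m)][1])

def is_nash(row_move, col_move):
    """A profile is a Nash equilibrium if each move is a best response."""
    return (best_response(col_move, "row") == row_move and
            best_response(row_move, "col") == col_move)
```

Mutual defection (D, D) is the unique Nash equilibrium: neither player can improve by unilaterally switching, even though (C, C) would make both better off.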
Coloring:
Graph coloring is a fundamental concept in graph theory, where the objective is to
assign colors to the vertices (nodes) of a graph in such a way that no two adjacent
vertices share the same color. The minimum number of colors required to color
the graph without any adjacent vertices having the same color is known as the
chromatic number of the graph.
In graph coloring, the goal is to label the vertices of the graph with colors, subject
to the constraint that adjacent vertices must have different colors. Graph coloring
has various applications, including scheduling problems, register allocation in
computer programming, and frequency assignment in wireless communication.
- Different algorithms and heuristics are used to find a valid coloring of a graph
with the minimum number of colors.
- Planar graphs, which can be drawn on a plane without any edge crossings, have a
special property known as the Four-Color Theorem, which states that they can be
colored with at most four colors.
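A simple greedy heuristic illustrates the idea: visit the vertices in some order and give each the smallest color not used by its neighbors. Greedy coloring is not guaranteed to use the chromatic number of colors, but it always produces a valid coloring:

```python
def greedy_coloring(adj, order=None):
    """Assign each vertex the smallest color index unused by its neighbors."""
    colors = {}
    for v in order or adj:
        taken = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in taken:   # smallest color not taken by a colored neighbor
            c += 1
        colors[v] = c
    return colors

# An odd cycle (5-cycle) has chromatic number 3.
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring = greedy_coloring(c5)
```

On this 5-cycle the greedy pass happens to use exactly 3 colors; on other graphs a poorly chosen vertex order can force it to use more than the chromatic number.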
Consensus:
In distributed systems and computer networks, consensus refers to the process by
which a group of nodes or agents collectively agree on a common value or
decision. In a distributed setting, each node has its own information and makes
local decisions based on its data and the data received from other nodes.
The goal of the consensus problem is to ensure that all nodes eventually converge
to the same decision or value, even in the presence of faulty or malicious nodes
and communication delays. Achieving consensus in a distributed system is
essential for coordinated decision-making and the integrity of the system.
Consensus algorithms, such as the Paxos algorithm and the Raft algorithm, are
widely used to solve the consensus problem in distributed systems. These
algorithms provide protocols and rules that allow the nodes to communicate and
reach an agreement on a shared value or decision.
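As a toy illustration of convergence to a shared value, consider a simple averaging dynamic on a network. This is deliberately not Paxos or Raft (which additionally handle faulty nodes, leader election, and message loss); it only shows nodes iterating on local information until they agree:

```python
def averaging_consensus(values, adj, rounds=500):
    """Each node repeatedly replaces its value with the average of its own
    value and its neighbors' values; all nodes converge to a common value."""
    vals = dict(values)
    for _ in range(rounds):
        vals = {n: (vals[n] + sum(vals[m] for m in adj[n])) / (1 + len(adj[n]))
                for n in adj}
    return vals

# A 4-node path graph with very different starting values.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final = averaging_consensus({0: 0.0, 1: 10.0, 2: 20.0, 3: 30.0}, adj)
```

After enough rounds every node holds (essentially) the same number, even though each one only ever talked to its direct neighbors.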
In summary, coloring in graph theory focuses on assigning colors to graph vertices
while ensuring certain constraints, while consensus in distributed systems deals
with achieving agreement among distributed nodes on a common value or
decision. Both concepts are important in their respective domains and have
applications in a wide range of fields.
Biased voting:
Biased voting refers to decision processes in which some participants' votes carry
more weight than others. Biased voting systems can be found in various contexts,
including political systems, corporate governance, and decision-making in
organizations. For example:
- In some political systems, certain individuals or groups may have more voting
power based on factors such as wealth, education, or social status.
The introduction of biases in voting can have significant implications for the
outcomes of the decision-making process. It may lead to the concentration of
power in the hands of a few individuals or groups, potentially affecting the
fairness and representativeness of the decisions made.
Network formation games:
Network formation games model how self-interested agents decide which links to
form with one another. Their main components are:
- Players (Nodes): The agents or players in the game represent nodes in the
network.
- Strategies: Each player has a set of strategies that represent different possible
links or connections they can form with other players.
- Payoff Function: Each player has a payoff function that quantifies the utility or
benefit the player gains based on its chosen strategies and the strategies of other
players.
- Rationality: Players are assumed to be rational and aim to maximize their
individual payoffs.
Network formation games are widely used to study social and economic
interactions in networks, such as the formation of social networks, collaboration
networks, transportation networks, and communication networks. They provide
insights into the emergent properties of network structures resulting from the
decisions made by individual agents.
Overall, biased voting and network formation games are two important concepts
in decision-making and network analysis that explore how individual behavior and
strategic interactions can shape the structure and dynamics of networks.
Network structure and equilibrium:
1. Network Structure:
The network structure refers to the arrangement of nodes and edges in a network.
It describes the pattern of connections and interactions between individual
elements within the network. The structure of a network plays a crucial role in
determining how information, resources, or influence flow within the system.
2. Equilibrium in Networks:
Equilibrium in networks, particularly in the context of game theory, refers to a
state where the strategic choices made by individual agents (nodes) lead to a
stable configuration of connections (edges) in the network. In equilibrium, no
agent has an incentive to unilaterally change its strategy or connection, given the
strategies of other agents.
In game theory, various equilibrium concepts are used to analyze the stability of
network structures resulting from the interactions between nodes. The most
common equilibrium concept is the Nash equilibrium, where each node's strategy
is optimal given the strategies of the other nodes.
3. Relation Between Network Structure and Equilibrium:
The relationship between network structure and equilibrium is intricate and
bidirectional. The equilibrium strategies chosen by nodes affect the network
structure, while the network structure influences the stability of the equilibrium.
Certain network structures can lead to multiple equilibria, while others may have
unique equilibria. The presence of hubs (nodes with many connections) in a
network, for example, can lead to multiple equilibria, as agents may choose to
connect to the hub or to each other.
Behavioral experiments:
Behavioral experiments, also known as psychological experiments, are research
studies conducted to observe and analyze human or animal behavior in controlled
settings. These experiments are designed to test hypotheses, investigate causal
relationships, and gain insights into the cognitive, emotional, and social processes
that influence behavior.
Key features of behavioral experiments include:
- Controlled Conditions: Variables are manipulated and measured in a controlled
setting so that causal relationships can be isolated.
- Quantitative Data Analysis: Behavioral experiments typically generate
quantitative data, which is analyzed using statistical methods to determine the
statistical significance of the results and draw conclusions.
Spatial and agent-based models:
Spatial and agent-based models are two types of computational modeling
techniques used to study complex systems, such as ecological systems, social
networks, traffic patterns, and the spread of diseases. Both types of models
incorporate spatial considerations, but they differ in how they represent and
simulate the behavior of individual entities within a system.
1. Spatial Models:
Spatial models focus on the spatial distribution and interactions of entities within
a system. These models represent the geographical or physical location of objects
and how they relate to each other based on their positions in space. Spatial
models are widely used in geography, ecology, urban planning, and other fields to
understand the spatial patterns and processes in real-world systems.
- Cellular Automata: A grid-based model where each cell can be in different states,
and the states of neighboring cells influence each other.
- Landscape Ecology Models: Models that study the ecological processes and
patterns in landscapes.
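A one-dimensional cellular automaton is easy to sketch. The example below implements one step of the well-known elementary Rule 110 on a ring of cells (the ring boundary is a choice made here for simplicity):

```python
def step_rule_110(cells):
    """One update of the elementary cellular automaton Rule 110 on a ring:
    each cell's next state depends on its left neighbor, itself, and its
    right neighbor."""
    n = len(cells)
    rule = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}
    return [rule[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

row = [0, 0, 0, 0, 1, 0, 0, 0]   # a single live cell
next_row = step_rule_110(row)
```

Iterating this tiny local rule produces surprisingly complex global patterns, which is exactly the point such models illustrate: neighboring cell states influence each other, and structure emerges from purely local interactions.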
2. Agent-Based Models:
Agent-based models (ABMs) represent a system as a collection of autonomous
agents, each with its own attributes, behaviors, and decision-making processes.
The agents move and interact in the environment based on their individual
attributes and responses to the local conditions and interactions with other
agents.
- Epidemic Spread: Models that simulate the spread of infectious diseases through
interactions between individuals.
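A minimal agent-based SIR (susceptible-infected-recovered) sketch shows the idea; the state labels, contact graph, and infection probability beta are illustrative assumptions, not a calibrated model:

```python
import random

def sir_step(states, adj, beta, rng):
    """One step of an SIR epidemic: each infected node infects each
    susceptible neighbor with probability beta, then recovers."""
    new = dict(states)
    for node, s in states.items():
        if s == "I":
            for m in adj[node]:
                if states[m] == "S" and rng.random() < beta:
                    new[m] = "I"           # infection spreads along contacts
            new[node] = "R"                # infected agents recover
    return new

rng = random.Random(3)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}    # a triangle of contacts
states = {0: "I", 1: "S", 2: "S"}          # one initially infected agent
while "I" in states.values():              # run until the outbreak dies out
    states = sir_step(states, adj, 0.9, rng)
```

Because every infected agent eventually recovers, the simulation always terminates; how far the outbreak spreads depends on beta and on the contact network's structure.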