GBUS515 – Business Intelligence and Information Systems
Chapter 19 – Social Network Analytics
Instructor – Dr. Sunita Goel
Adapted from Shmueli, Bruce & Patel, Data Mining for Business Analytics, 3e
© Galit Shmueli and Peter Bruce 2010
2000s: The advent of businesses based on social network data
• Early movers: Friendster, Myspace
• Later: Facebook, LinkedIn, Twitter, Tumblr,
Instagram, Yelp, TripAdvisor, …
• Common element: Data on their network of
users is the main business asset
An undirected network (e.g. who is friends with whom on
Facebook)
Edge (link, connection)
Node (vertex)
A directed network (e.g. who follows whom on Twitter)
Edge weight (thickness) reflects strength of relationship (number of
communications, value or number of transactions, etc.)
x-y coordinates are not meaningful: two different layouts of the same
nodes and edges convey the same information.
Principles of graph layout:
• Every node should be visible
• For every node, you should be able to count its degree
• For every link, you should be able to follow it from source to destination
• Clusters and outliers should be identifiable
Adjacency list and matrix
Adjacency list, first few rows, Twitter graph:
Origin  Destination
Dave    Peter
Dave    Jenny
Dave    John
Peter   Jenny
Peter   Sam
Adjacency matrix, Twitter graph: [figure not reproduced]
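As a minimal sketch, the adjacency list rows shown on this slide can be converted into an adjacency matrix. Only the five listed edges are used, so the result is a partial matrix, not the full Twitter graph; the alphabetical node order and the variable names are arbitrary choices made for this illustration.

```python
# First few rows of the directed (Twitter-style) adjacency list from the slide.
edges = [
    ("Dave", "Peter"),
    ("Dave", "Jenny"),
    ("Dave", "John"),
    ("Peter", "Jenny"),
    ("Peter", "Sam"),
]

# Node order is an arbitrary choice (alphabetical here).
nodes = sorted({name for edge in edges for name in edge})
index = {name: i for i, name in enumerate(nodes)}

# matrix[i][j] == 1 means nodes[i] follows nodes[j]
# (row = origin, column = destination).
matrix = [[0] * len(nodes) for _ in nodes]
for origin, destination in edges:
    matrix[index[origin]][index[destination]] = 1

# Print the matrix with row and column labels.
print("      " + " ".join(f"{n:>5}" for n in nodes))
for name, row in zip(nodes, matrix):
    print(f"{name:>5} " + " ".join(f"{v:>5}" for v in row))
```

Because the network is directed, the matrix is not symmetric: Dave's row has three 1s, but Dave's column has none. For an undirected network, each edge would be entered in both directions.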
Terms
Edge weight: strength of relationship
Path (and path length): route from one node to another
Connected network: each node has a path to all others
Clique: each node is directly connected by a single edge to
every other node
Singleton: unconnected node
Degree: number of connections to a node
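The terms above can be illustrated on a small example. The sketch below treats the five slide edges as undirected and adds "Kim" as a hypothetical unconnected node; both choices are assumptions for illustration only. Path length is computed with breadth-first search, which is one common way to find shortest paths in an unweighted network.

```python
from collections import deque
from itertools import combinations

# Slide edges, treated as undirected for this illustration.
edges = [("Dave", "Peter"), ("Dave", "Jenny"), ("Dave", "John"),
         ("Peter", "Jenny"), ("Peter", "Sam")]
# "Kim" is a hypothetical node with no connections (a singleton).
nodes = {"Dave", "Peter", "Jenny", "John", "Sam", "Kim"}

neighbors = {n: set() for n in nodes}
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def path_lengths(source):
    """Shortest path length (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def is_clique(group):
    """True if every pair of nodes in the group shares an edge."""
    return all(b in neighbors[a] for a, b in combinations(group, 2))

# Degree: number of connections to each node.
degree = {n: len(neighbors[n]) for n in nodes}
# Singletons: nodes with no connections.
singletons = sorted(n for n in nodes if degree[n] == 0)
# Connected network: every node is reachable from any starting node.
is_connected = len(path_lengths("Dave")) == len(nodes)

print(degree["Dave"], path_lengths("John")["Sam"], singletons, is_connected)
```

Here Dave, Peter, and Jenny form a clique, the path from John to Sam has length 3 (John, Dave, Peter, Sam), and the network is not connected because of the singleton.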
Node Metrics: How important is a
node?
Centrality (closeness): based on the average shortest-path length to
other nodes; a shorter average means a more central node
Betweenness: extent to which a
node is on the shortest path between
other nodes
Eigenvector centrality: links to other highly connected nodes are
weighted more
Egocentric network: The network around a single node
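A rough sketch of three of these ideas follows: closeness (taken here as the reciprocal of the average shortest-path length), eigenvector centrality (approximated by simple power iteration), and the egocentric network of a node. The example network is hypothetical and treats the slide's names as an undirected graph; the function names and the choice of 100 iterations are assumptions, and production libraries use more careful definitions and convergence checks.

```python
from collections import deque

# Hypothetical undirected network reusing names from the slides.
edges = [("Dave", "Peter"), ("Dave", "Jenny"), ("Dave", "John"),
         ("Peter", "Jenny"), ("Peter", "Sam")]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def path_lengths(source):
    """Shortest path length from source to each reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def closeness(node):
    """Reciprocal of the average path length to all other nodes.

    Assumes a connected network, so every other node is reachable.
    """
    dist = path_lengths(node)
    return (len(dist) - 1) / sum(dist.values())

def eigenvector_centrality(iterations=100):
    """Power iteration: each score becomes the sum of its neighbors' scores,
    rescaled so the largest score is 1."""
    score = {n: 1.0 for n in neighbors}
    for _ in range(iterations):
        new = {n: sum(score[m] for m in neighbors[n]) for n in neighbors}
        largest = max(new.values())
        score = {n: v / largest for n, v in new.items()}
    return score

def ego_network(node):
    """Edges among the node and its direct neighbors."""
    members = neighbors[node] | {node}
    return [(a, b) for a, b in edges if a in members and b in members]

print(closeness("Dave"), closeness("John"))
print(eigenvector_centrality())
print(ego_network("Jenny"))
```

In this example John and Jenny illustrate the "weighted more" idea: Jenny is linked to the two best-connected nodes and scores higher than John, who is linked to only one of them.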
Network metrics: Describing the
network as a whole
Degree distribution: Distribution of # of connections per node
Density: Ratio of # of edges to maximum possible # of edges
[Figure: dense network vs. sparse network]
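Both network-level metrics can be computed directly from an edge list. The sketch below assumes a small hypothetical undirected network, where the maximum possible number of edges among n nodes is n(n-1)/2; for a directed network the maximum would be n(n-1).

```python
from collections import Counter

# Hypothetical undirected network reusing names from the slides.
edges = [("Dave", "Peter"), ("Dave", "Jenny"), ("Dave", "John"),
         ("Peter", "Jenny"), ("Peter", "Sam")]
nodes = sorted({name for edge in edges for name in edge})

# Degree of each node: count how many edges touch it.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Degree distribution: how many nodes have each degree value.
degree_distribution = Counter(degree[n] for n in nodes)

# Density: actual edges divided by the maximum possible number of edges.
n = len(nodes)
max_edges = n * (n - 1) // 2
density = len(edges) / max_edges

print(sorted(degree_distribution.items()))  # [(1, 2), (2, 1), (3, 2)]
print(density)                              # 0.5
```

With 5 nodes there are at most 10 undirected edges, and 5 of them are present, so the density is 0.5.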
Link Prediction: Predicting the next link to
form in a network
1. For each node, score similarity to all other nodes
2. Traditional predictive model variables (e.g. demographic info)
could be used to calculate similarity [see nearest-neighbor methods]
3. Network metrics (shortest path, etc.) can also be used
4. The unlinked pair with the highest similarity score is the predicted
next link
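The steps above can be sketched with one simple network-based similarity score. The Jaccard coefficient of two nodes' neighbor sets is used here as an assumed stand-in for the similarity measure; the slides do not prescribe a particular score, and the example network is hypothetical.

```python
from itertools import combinations

# Hypothetical undirected network reusing names from the slides.
edges = [("Dave", "Peter"), ("Dave", "Jenny"), ("Dave", "John"),
         ("Peter", "Jenny"), ("Peter", "Sam")]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def jaccard(a, b):
    """Shared neighbors divided by all neighbors of either node."""
    union = neighbors[a] | neighbors[b]
    return len(neighbors[a] & neighbors[b]) / len(union) if union else 0.0

# Steps 1-3: score every pair of nodes that is not already linked.
candidates = [
    (pair, jaccard(*pair))
    for pair in combinations(sorted(neighbors), 2)
    if pair[1] not in neighbors[pair[0]]
]

# Step 4: the highest-scoring unlinked pair is the predicted next link.
# Ties are broken alphabetically, which is an arbitrary choice.
ranked = sorted(candidates, key=lambda item: (-item[1], item[0]))
predicted_pair, predicted_score = ranked[0]

for pair, score in ranked:
    print(pair, round(score, 3))
```

In this small network two pairs tie for the top score, so a real application would need more data, or additional variables such as demographic information, to separate them.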
Similar methods can be used in entity
resolution (is “x” the same person as “y”?)
A new customer orders online: is the person already in the customer
database?
A cell phone number is linked to known terrorists: is its owner the
same person as a suspect known to a partner intelligence service?
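The online-order example can be sketched as a similarity search over existing records. Everything in the block below is hypothetical: the customer records, the field names, the equal weighting of fields, and the 0.85 cutoff are assumptions for illustration. Text similarity comes from Python's standard `difflib.SequenceMatcher`, which is only one of many possible measures.

```python
from difflib import SequenceMatcher

# Hypothetical customer database and incoming order (made-up values).
customers = [
    {"id": 1, "name": "John Smith", "email": "john.smith@example.com"},
    {"id": 2, "name": "Jane Doe", "email": "jane.doe@example.com"},
]
new_order = {"name": "Jon Smith", "email": "john.smith@example.com"}

def text_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and padding."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def record_similarity(a, b):
    """Average similarity across the compared fields (equal weights assumed)."""
    fields = ("name", "email")
    return sum(text_similarity(a[f], b[f]) for f in fields) / len(fields)

THRESHOLD = 0.85  # assumed cutoff; in practice it would be tuned on labeled data

# Score the new order against every existing customer and keep the best match.
scores = {c["id"]: record_similarity(new_order, c) for c in customers}
best_id = max(scores, key=scores.get)
is_existing_customer = scores[best_id] >= THRESHOLD

print(best_id, round(scores[best_id], 3), is_existing_customer)
```

As with link prediction, the pair with the highest similarity is the candidate match; the threshold decides whether that candidate is treated as the same person or as a new customer.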
Chapter Exercises
(Updated in Canvas)