UNIT-6
GRAPH ANALYTICS AND DATA
VISUALIZATION
Prepared By:
Aayushi Chaudhari,
Assistant Professor, CE, CSPIT,
CHARUSAT
Wednesday, April 12, 2023| U & P U. Patel Department of Computer Engineering 1
Agenda
• What is data visualization?
• Benefits of using data visualization
• Why is it required?
• Apache Spark GraphX: Property Graph
• Graph Operator
• SubGraph, Triplet
• Neo4j: Modeling data with Neo4j
• Cypher Query Language: General clauses
• Read and Write clauses
• Big Data Visualization with Power BI
• Apache Superset
What is data visualization?
• Data visualization is the practice of translating information into a visual
context, such as a map or graph, to make data easier for the human brain
to understand and pull insights from.
• The main goal of data visualization is to make it easier to identify patterns,
trends and outliers in large data sets.
• The term is often used interchangeably with others, including information
graphics, information visualization and statistical graphics.
• Data visualization is one of the steps of the data science process, which states
that after data has been collected, processed and modeled, it must be
visualized for conclusions to be made.
What is data visualization? (cont.)
• Data visualization is important for almost every career.
• It can be used by teachers to display student test results, by computer
scientists exploring advancements in artificial intelligence (AI) or by
executives looking to share information with stakeholders.
• It also plays an important role in big data projects.
• As businesses accumulated massive collections of data during the early years
of the big data trend, they needed a way to quickly and easily get an
overview of their data.
• Visualization tools were a natural fit.
Need for Data Visualization
• When a data scientist is writing advanced predictive analytics or machine
learning (ML) algorithms, it becomes important to visualize the outputs
to monitor results and ensure that models are performing as intended.
• This is because visualizations of complex algorithms are generally easier
to interpret than numerical outputs.
Example
Importance of Data Visualization
• Data visualization provides a quick and effective way to communicate information in a
universal manner using visual information.
• The practice can also help businesses identify which factors affect customer behavior;
pinpoint areas that need to be improved or need more attention; make data more
memorable for stakeholders; understand when and where to place specific products; and
predict sales volumes.
• It gives viewers the ability to absorb information quickly, gain better insights and make faster decisions.
• It provides an increased understanding of the next steps that must be taken to improve the
organization.
• Provides an improved ability to maintain the audience's interest with information they can
understand.
Importance of Data Visualization (cont.)
• Provides an easy distribution of information that increases the
opportunity to share insights with everyone involved.
• It reduces reliance on data scientists for routine questions, since data
becomes more accessible and understandable.
• Provides an increased ability to act on findings quickly and,
therefore, achieve success with greater speed and fewer mistakes.
Data Visualization for Big data
• Data analysis projects have made visualization more important than ever.
• Companies are increasingly using machine learning to gather massive amounts of data that
can be difficult and slow to sort through, comprehend and explain.
• Visualization offers a means to speed this up and present information to business owners
and stakeholders in ways they can understand.
• Big data visualization often goes beyond the typical techniques used in normal
visualization, such as pie charts, histograms and corporate graphs.
• It instead uses more complex representations, such as heat maps and fever charts.
• Big data visualization requires powerful computer systems to collect raw data, process it
and turn it into graphical representations that humans can use to quickly draw insights.
What Organizations Need to Use Data Visualization
An organization needs a visualization specialist who can choose appropriate data sets and
visual styles, so that the organization makes the best use of its data.
IT specialists must be involved, because the organization will need powerful computer
hardware, efficient storage systems and possibly a move to the cloud.
The data used must be accurate and should be under the control of a person responsible for data governance.
Example of Various Visualization Styles
In the early days of visualization, the most common visualization technique was using a
Microsoft Excel spreadsheet to transform the information into a table, bar graph or pie
chart. While these visualization methods are still commonly used, more intricate
techniques are now available, including the following:
infographics
bubble clouds
bullet graphs
heat maps
fever charts
Example of Infographics
Example of bubble clouds
Example of Bullet chart
Example of heat map
Fever chart example
Apache Spark GraphX
• GraphX is the graph processing library built into Apache Spark.
• It makes use of the property graph and the Spark RDD.
• GraphX is a hybrid technology that combines two kinds of systems.
• Data-parallel systems, such as Hadoop and Spark, focus on distributing
data across multiple nodes.
• Graph-parallel systems, such as Pregel, GraphLab and Giraph, efficiently
execute graph algorithms through partitioning and distribution
techniques.
• GraphX unifies the data-parallel and graph-parallel approaches.
Table View v/s Graph view
Data parallel v/s Graph parallel
GraphX
• GraphX adds a graph abstraction on top of the Spark
RDD (Resilient Distributed Dataset), which is an
immutable distributed collection of objects.
• Two types of graph are distinguished here:
• Directed graph: each edge has a direction associated with it.
• Regular graph: a graph in which every vertex has the same number of
edges.
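The two definitions above can be checked mechanically. The following is a minimal plain-Python sketch, not GraphX code; the edge list and the helper names `degrees` and `is_regular` are made up for illustration. It treats each edge as a directed (source, destination) pair and calls a graph regular when every vertex touches the same number of edges.

```python
from collections import Counter

# Hypothetical directed graph: each edge is a (source, destination) pair.
edges = [(1, 2), (2, 3), (3, 1)]

def degrees(edge_list):
    """Count the edges touching each vertex (incoming plus outgoing)."""
    counts = Counter()
    for src, dst in edge_list:
        counts[src] += 1
        counts[dst] += 1
    return dict(counts)

def is_regular(edge_list):
    """True when every vertex that appears in an edge has the same degree.

    Isolated vertices are not represented in an edge list, so this
    sketch ignores them.
    """
    return len(set(degrees(edge_list).values())) <= 1

print(degrees(edges))                 # {1: 2, 2: 2, 3: 2}
print(is_regular(edges))              # True
print(is_regular(edges + [(1, 3)]))   # False: vertices 1 and 3 now have degree 3
```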
GraphX property graph
• It is a directed multigraph: a directed graph that may contain
multiple edges in parallel.
• Every edge and vertex has user defined properties
associated with it.
• The parallel edges allow multiple relationships between
the same vertices.
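As a rough illustration of these three points, the sketch below models a property graph in plain Python rather than with the GraphX API. The vertex names, roles and relationship labels are hypothetical, as are the `Edge` class and the `relationships` helper. Two edges share the same source and destination, which is what makes the structure a multigraph.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: int    # id of the source vertex
    dst: int    # id of the destination vertex
    attr: str   # user-defined edge property

# Vertex id -> user-defined vertex property (hypothetical people and roles).
vertices = {
    1: ("alice", "student"),
    2: ("bob", "professor"),
}

# Two parallel edges from vertex 2 to vertex 1, plus one edge the other way.
edges = [
    Edge(2, 1, "advisor"),
    Edge(2, 1, "colleague"),
    Edge(1, 2, "coauthor"),
]

def relationships(edge_list):
    """Group edge properties by their (source, destination) pair."""
    grouped = defaultdict(list)
    for edge in edge_list:
        grouped[(edge.src, edge.dst)].append(edge.attr)
    return dict(grouped)

for (src, dst), labels in relationships(edges).items():
    print(vertices[src][0], "->", vertices[dst][0], labels)
```

A plain directed graph would have to collapse the two edges from vertex 2 to vertex 1 into one; keeping both is how a multigraph records more than one relationship between the same pair of vertices.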
Example of Property Graph
Example
In this scenario, we will analyze three flight routes, described in the tables below:
• Airports act as vertices.
• Routes act as edges.
• Each vertex has an ID and an Airport Name as its property.

Vertex Table for Airports (ID: Long, Airport Name: String)
ID    Airport Name
1     Ahmedabad
2     Surat
3     Mumbai

Edges Table for Routes and Distances (SrcID and DestID: Long; Distance: Double, since the values are fractional)
SrcID    DestID    Distance
1        2         263.3
2        3         279.4
3        1         524.2
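To make the airport example concrete, here is a small plain-Python sketch that uses the vertex and edge values from the tables above. It is not GraphX code: the `triplets` helper is a hypothetical stand-in that imitates the idea of a triplet view, in which each edge is joined with the properties of its source and destination vertices.

```python
# Vertex table: airport id -> airport name (values from the example).
airports = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}

# Edge table: (source id, destination id, distance) for each route.
routes = [(1, 2, 263.3), (2, 3, 279.4), (3, 1, 524.2)]

def triplets(vertices, edges):
    """Join every edge with the properties of its two endpoint vertices."""
    return [(vertices[src], dist, vertices[dst]) for src, dst, dist in edges]

# Print each route with airport names instead of ids.
for origin, dist, dest in triplets(airports, routes):
    print(f"{origin} -> {dest}: {dist}")

# Find the route with the largest distance.
longest = max(triplets(airports, routes), key=lambda t: t[1])
print("Longest route:", longest)   # ('Mumbai', 524.2, 'Ahmedabad')
```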
Graph Operator
• Big data comes in different shapes and sizes. It can be batch data that is
processed offline: a large set of records is processed and the results and insights
are generated at a later time.
• Or the data can be real-time streams that need to be processed on the fly, producing
insights almost instantaneously.
• Apache Spark can be used for processing batch data (Spark Core) as well as real-time data
(Spark Streaming).
Graph Operator
GraphX makes it easier to run analytics on graph data with the built-in operators and
algorithms.
It also allows us to cache and uncache the graph data to avoid recomputation when we
need to use a graph multiple times.
Basically, there are four types of graph operators:
1. Basic
2. Property
3. Structural
4. Join
Types of Graph Operators
• Basic: numEdges, numVertices, inDegrees, outDegrees, degrees
• Property: mapVertices, mapEdges, mapTriplets
• Structural: reverse, subgraph, mask, groupEdges
• Join: joinVertices, outerJoinVertices
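The sketch below imitates one or two operators from each of the four categories in plain Python, to show what they compute. It is an illustration under stated assumptions, not the GraphX API: the function names are Python analogues of the operator names, the graph is a dictionary plus a list of tuples instead of distributed RDDs, and the edge predicate here receives (src, dst, attr) rather than a full triplet. The airport codes used in the join are included only as sample data.

```python
from collections import Counter

# Toy graph: vertex id -> property, edges as (src, dst, property).
vertices = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}
edges = [(1, 2, 263.3), (2, 3, 279.4), (3, 1, 524.2)]

# Basic operators: simple facts about the graph.
def num_vertices(vs):
    return len(vs)

def num_edges(es):
    return len(es)

def in_degrees(es):
    """Incoming edge count per vertex; vertices with none are omitted."""
    return dict(Counter(dst for _, dst, _ in es))

# Property operators: change properties, keep the structure.
def map_vertices(vs, f):
    return {vid: f(vid, attr) for vid, attr in vs.items()}

def map_edges(es, f):
    return [(src, dst, f(attr)) for src, dst, attr in es]

# Structural operators: change the structure, keep the properties.
def reverse(es):
    return [(dst, src, attr) for src, dst, attr in es]

def subgraph(vs, es, vpred=lambda vid, attr: True,
             epred=lambda src, dst, attr: True):
    """Keep vertices passing vpred, and edges passing epred whose
    endpoints both survive."""
    kept_vs = {vid: attr for vid, attr in vs.items() if vpred(vid, attr)}
    kept_es = [(s, d, a) for s, d, a in es
               if s in kept_vs and d in kept_vs and epred(s, d, a)]
    return kept_vs, kept_es

# Join operator: merge external data into vertex properties.
def join_vertices(vs, table, f):
    """Vertices without a match in table keep their original property."""
    return {vid: f(vid, attr, table[vid]) if vid in table else attr
            for vid, attr in vs.items()}

print(num_vertices(vertices), num_edges(edges))            # 3 3
print(in_degrees(edges))                                   # {2: 1, 3: 1, 1: 1}
print(map_vertices(vertices, lambda vid, name: name.upper()))
print(reverse(edges)[0])                                   # (2, 1, 263.3)

# Keep only the routes whose distance is above 270.
long_vs, long_es = subgraph(vertices, edges,
                            epred=lambda s, d, dist: dist > 270)
print(long_es)                                # [(2, 3, 279.4), (3, 1, 524.2)]

# Attach a code to the airports that have one in the lookup table.
codes = {1: "AMD", 3: "BOM"}
print(join_vertices(vertices, codes, lambda vid, name, code: f"{name} ({code})"))
```

Every function returns a new collection instead of modifying its input. That choice mirrors the immutability of the RDDs that GraphX builds on.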
Thank You.