Property Graph Databases
2016-11-08 Dr. Asja Fischer, Prof. Jens Lehmann
Motivation
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 2
Trends in Data
Trend 1: Big Data
The amount of information that we generate and transfer has increased
dramatically in the past few years.
1Source: IDCs Digital Universe Study, sponsored by EMC, December 2012
http://www.emc.com/collateral/analyst- reports/idc- the- digital- universe- in- 2020.pdf
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 3
Trends in Data
Trend 2: Connectedness
Data is evolving to be more interlinked and connected. Hypertext has links, blogs
have pingback, tagging groups all related data.
2 Figure taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 4
Why Relational Databases are not Enough
(Provocative) claim: Relational databases are not good at handling relationships!
3 Table taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 5
Why Relational Databases are not Enough
(Provocative) claim: Relational databases are not good at handling relationships!
In this social network database, it is easy to find people Bob considers his friends:
SELECT p1 . Person
FROM Person p1 JOIN PersonFriend
ON PersonFriend . FriendID = p1 . ID
JOIN Person p2
ON PersonFriend . PersonID = p2 . ID
WHERE p2 . Person = Bob
Result:
3 Table taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 5
Why Relational Databases are not Enough
(Provocative) claim: Relational databases are not good at handling relationships!
In this social network database, it is easy to find people Bob considers his friends:
SELECT p1 . Person
FROM Person p1 JOIN PersonFriend
ON PersonFriend . FriendID = p1 . ID
JOIN Person p2
ON PersonFriend . PersonID = p2 . ID
WHERE p2 . Person = Bob
Result: Alice and Zach
3 Table taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 5
Why Relational Databases are not Enough
B Sad but true: friendship is not always symmetric!
B What if we want to know who is friends with Bob?
SELECT p1 . Person
FROM Person p1 JOIN PersonFriend
ON PersonFriend . PersonID = p1 . ID
JOIN Person p2
ON PersonFriend . FriendID = p2 . ID
WHERE p2 . Person = Bob
B User side: still easy to implement
B Database side: Database has to consider all the rows in the PersonFriend
table more expensive!
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 6
Why Relational Databases are not Enough?
What if we query deeper into the social network?
SELECT p1 . Person AS PERSON , p2 . Person AS F RI E N D _ O F _ F R I E N D
FROM PersonFriend pf1 JOIN Person p1
ON pf1 . PersonID = p1 . ID
JOIN PersonFriend pf2
ON pf2 . PersonID = pf1 . FriendID
JOIN Person p2
ON pf2 . FriendID = p2 . ID
WHERE p1 . Person = Alice AND pf2 . FriendID <> p1 . ID
B Uses recursive joins increase in syntactic, computational, and space
complexity of the query
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 7
Why Relational Databases are not Enough
Experimental results: Given any two persons chosen at random, is there a path
that connects them that is at most 5 relationships long?
Depth RDBMS execution time (s) Neo4j execution time (s) Records returned
2 0.016 0.01 2500
3 30.267 0.168 110,000
4 1543.505 1.359 600,000
5 Unfinished 2.132 800,000
B Comparing relational store and a popular graph database Neo4j
B Social network consisting of 1,000,000 people
B Each person with approximately 50 friends
Conclusion: Graph databases outperform relational databases when we are
dealing with connected data!
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 8
Relational Databases: What exactly is the Problem?
B Relationships exist only between tables problematic for highly connected
domains
B Relationship traversal can become very expensive
B Flat, disconnected data structures - data processing and relationship
construction happens in the application layer
B Bound by a previously defined schema
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 9
Graph Databases using the Labeled Property
Graph Model
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 10
Graph Databases: Advantages
B Explicit graph structure: semantic dependencies between entities are made
explicit
B New nodes and new relationships can be easily added into the database
without having to migrate data or restructure the existing network
B Relationships correspond to paths; querying the database = traversing the
graph this makes certain types of queries simpler
B Schema-free
B Index-free adjacency
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 11
Index-Free Adjacency
B Each node knows (i.e. directly points to) its adjacent nodes - index-free
adjacency
B No explicit index each node acts as micro-index of nearby nodes much
cheaper than global indices
B Because of this, query times are less/not dependent of the size of the graph
they depend only on the part of the graph that has been searched
B Precomputes and stores bidirectional joins as relationships (i.e. no need to
compute them)
B Enables bidirectional traversal of the database: we can traverse both
incoming and outgoing relationships
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 12
Graph Databases: When to Use
When to use:
B Complex, highly-connected data
B Dynamic systems with topology that is difficult to predict
B Dynamic requirements that change over time
B Cases where relationships in data are themselves meaningful
When not to use:
B Large offline batch analysis tasks of static data
B Simple, unconnected data structures
B For certain types of queries:
Graph fishing (no small set of start nodes)
Bulk Scans (generic queries not applying to any indexed subset)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 13
Labeled Property Graph Model
Building blocks of a Labeled Property Graph:
B Nodes
Can be tagged with one or more labels
Can contain properties
B Relationships:
Connect nodes and structure the graph
Directed
Always have a single name, a start node and an end node no
dangling relationships
Can have properties
B Properties (= key-value pairs)
B Labels:
Group nodes together
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 14
Properties
Properties are named values:
B Key is the name of the property
Always a string
B Values can be:
Numeric values
String values
Boolean values
Lists of any other type of value
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 15
Labels
B Used to assign types to nodes
B Groups nodes into sets: all nodes with the
same label belong to the same set
B Database queries can be constrained to these
sets instead of the whole graph more effi-
cient queries that are easier to write
B Each node has any number of labels (includ-
ing none)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 16
Nodes
B Together with relationships, fundamen-
tal unit of property graph model
the simplest possible graph is a single
node
B Are often used to represent entities
B Can have zero or more properties
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 17
Relationships
B Organize the nodes by connect-
ing them
B Always connects a start node to
the end node
B A key part of a graph database:
allow us to find related data
B Always has a direction
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 18
Paths
B One or more nodes with connecting
relationships
B Typically is a result of a query or a
traversal
B Length of a path = number of rela-
tionships on that path
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 19
Labeled Property Graph Model: Example
3 Illustration taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 20
Labeled Property Graph Model: Example
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 21
Labeled Property Graph Model: Example
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 22
Labeled Property Graph Model: Example
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 23
Labeled Property Graph Model: Example
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 24
Creating Graph Databases vs. Creating
Relational Databases
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 25
Creating a Relational Database
Step 1: Acquire and improve understanding of data: a whiteboard sketch step
3 Diagram taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 26
Creating a Relational Database
Step 2: Construct an entity-relations (E-R) diagram
Note the complexity of the model before any data has even been added
3 E-R diagram taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 27
Creating a Relational Database
Step 3: Map the E-R diagram into tables and relations and normalize the data
3 Diagram taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 28
Creating a Graph Database
B What you sketch on the whiteboard = what you store in the database
B No normalization, denormalization or conversion to tables-relations structure
necessary
B No conceptual-relational dissonance: the physical layout of the data is same
as it is conceptualized
B Domain modeling = further graph modeling:
Makes sure that every node has the appropriate properties
Ensures that every node is in correct semantic context
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 29
Creating a Graph Database
3 Image taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 30
Graph DBs: Good Design Principles
B Important to review and evaluate the domain model to make sure it is
suitable for answering most likely queries.
B Ensures that any later changes are driven by changes in application
requirements and not by the need to mitigate bad design decisions
B There are two techniques to do this:
Check that the graph reads well
B Pick a node
B Follow relationships to other nodes, reading each nodes role and
each relationships name as you go
B This should form sensible sentences
Design for queryability
B Understand end-users goals: understand the use cases
B Try to craft sample queries for the use cases
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 31
Design Principles: Reading the Graph
App 1 runs on VM 10, which is hosted by Server 1 in Rack 1.
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 32
NEO4J & Other Graph Databases
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 33
Existing Graph Databases
Source: OReilly Graph Databases [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 34
Neo4j
B Probably the most popular graph database
today
B Based on a schema-free labeled property
graph model
https://neo4j.com/ B Scales to billions of nodes, relationships and
properties
Consists of:
B Nodes, Relationships, Properties, Labels, Paths, Traversal, and Schema
(index and constraints)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 35
Neo4j: Learn more!
B Open-source
B Extensive support and learning material
B Check out https://neo4j.com/developer/ for further resources (including a
sandbox for testing!)
B International events and meetup groups
(Neo4j)-[:LOVES]-(Developers)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 36
Cypher
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 37
Cypher: Pattern Matching
A pattern matching query language for graphs
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 38
Cypher: Pattern Matching
A pattern matching query language for graphs
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 39
Cypher: Pattern Matching
A pattern matching query language for graphs
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 40
Cypher: ASCII Art
Uses ASCII art representation
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 41
Cypher: ASCII Art
Directed relationship
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 42
Cypher: ASCII Art
Undirected relationship
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 43
Cypher: ASCII Art
Specific relationships
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 44
Cypher: ASCII Art
Joined paths
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 45
Cypher: ASCII Art
Multiple paths
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 46
Cypher: Start & Return clauses
B START specifies starting point(s) in the graph
Note : these can be both nodes or relationships
B START is done via index lookup or directly using relationship IDs
B RETURN specifies nodes, relationships, and properties that should be
returned
START - RETURN example
START n = node(0) RETURN n
reads lookup node with id 0; return that node
START n=node:Person(name=Andreas) RETURN n
reads lookup node in index Person with a name property whose value is
Andreas; return that node
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 47
Cypher: MATCH Clause
B Specification by example: draw data we are looking for
B Used to define a pattern of nodes and relationships that we want to find
MATCH - example
START a=node:user(name = Michael)
MATCH (a)-[:KNOWS] >(b)-[:KNOWS] >(c), (a)-[:KNOWS] >(c)
RETURN b, c
Reads: Find user with name Michael and starting from that node in the
graph, find nodes b and c that fit the pattern specified by MATCH example.
Return those two nodes.
3 Example from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 48
Cypher: Return Options
B Cypher also provides various options to enrich the returned results
B They include options to aggregate, order, filter, and limit the returned data
B Example: count allows us to return only the number of matched instances
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 49
RETURN Options - Example
RETURN Options - example
MATCH (theater:Venue {name:Theatre Royal}),
(bard:Author lastname:Shakespeare),
(theater)< [:VENUE]-()-[p:PERFORMANCE OF] >()
-[:PRODUCTION OF] >(play)< [:WROTE PLAY]-(bard)
RETURN play.title AS play, count(p) AS performance count
ORDER BY performance count DESC
This query counts the number of PERFORMANCE OF relationships using the
identifier p (which is bound to the PERFORMANCE OF relationships in the
MATCH clause). It saves the result as performance count. The results are then
ordered (using the clause ORDER BY) by the count in decreasing order. The
query then returns this result.
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 50
RETURN Options - Example
RETURN Options - example
MATCH (theater:Venue {name:Theatre Royal}),
(bard:Author lastname:Shakespeare),
(theater)< [:VENUE]-()-[p:PERFORMANCE OF] >()
-[:PRODUCTION OF] >(play)< [:WROTE PLAY]-(bard)
RETURN play.title AS play, count(p) AS performance count
ORDER BY performance count DESC
LIMIT 1
Analogously, adding LIMIT clause tells this query to limit the results to the first
instance
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 51
Cypher: WHERE Clause
B Constrains graph matches by eliminating matched subgraphs by requiring
one or more of the following:
That certain paths must be present (or absent) in the matched
subgraphs
That nodes must have certain labels or relationships with certain names
That specific properties on matched nodes and relationships must be
present (or absent)
That certain properties on matched nodes and relationships must have
specific values
That other predicates must be satisfied (e.g., that performances must
have occurred on or before a certain date)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 52
WHERE - Example
For example, we can query specifically for Shakespeare plays that were written
after 1608
WHERE - example
MATCH (bard:Author {lastname:Shakespeare}),
(play)< [w:WROTE PLAY]-(bard)
WHERE w.year > 1608
RETURN DISTINCT play.title AS play
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 53
Other Clauses
Cypher supports a variety of clauses:
B CREATE and CREATE UNIQUE
B DELETE
B SET
B FOREACH
B UNION
B WITH
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 54
Cypher - Design for Queryability Example
Remember the design for queryability design goal?
Goal: Design a query to find the cause behind an unresponsive application or
service in our example graph.
Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 55
Cypher - Design for Queryability Example
The sample query we would need to define is:
Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 56
Cypher - Design for Queryability Example
The sample query we would need to define is:
Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset
B Describes a variable length path between one and five relationships long
B There is no colon or relationship name between the square brackets the
relationships are unnamed
B There are no arrow-tips relationships are undirected
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 56
Cypher - Design for Queryability Example
The sample query we would need to define is:
Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 57
Cypher - Design for Queryability Example
The sample query we would need to define is:
Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset
B We start with the user who reported a problem
B We add asset nodes that have a status property with a value of down
B Nodes which do not have a status property will not be added to the results
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 57
Cypher - Design for Queryability Example
The sample query we would need to define is:
Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 58
Cypher - Design for Queryability Example
The sample query we would need to define is:
Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset
B Ensures that unique assets are returned in the results, no matter how many
times they are matched
The sample query we would need to define is:
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 58
Sample Queries
Example 1
MATCH (p:Product)
RETURN p.productName, p.unitPrice
ORDER BY p.unitPrice DESC
LIMIT 10;
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 59
Sample Queries
Example 1
MATCH (p:Product)
RETURN p.productName, p.unitPrice
ORDER BY p.unitPrice DESC
LIMIT 10;
B Returns only a subset of attributes, in this case: ProductName and UnitPrice
B Orders by price
B Returns 10 most expensive items
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 59
Sample Queries
Example 2
MATCH (p:Product {productName:Chocolade})
RETURN p.productName, p.unitPrice;
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 60
Sample Queries
Example 2
MATCH (p:Product {productName:Chocolade})
RETURN p.productName, p.unitPrice;
B A shortcut for MATCH (p:Product) WHERE p.productName =
Chocolade
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 60
Sample Queries
Example 3
MATCH (p:Product {productName:Chocolade})< [:PRODUCT]-
(:Order)< [:PURCHASED]-(c:Customer)
RETURN distinct c.companyName;
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 61
Sample Queries
Example 3
MATCH (p:Product {productName:Chocolade})< [:PRODUCT]-
(:Order)< [:PURCHASED]-(c:Customer)
RETURN distinct c.companyName;
B Names of everyone who bought Chocolade
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 61
Sample Queries
Example 4
MATCH (bob:User {username:Bob})-[:SENT] >(email)-[:CC] >(alias),
(alias)-[:ALIAS OF] >(bob)
RETURN email.id
3 Example and figure taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 62
Sample Queries
Example 4
MATCH (bob:User {username:Bob})-[:SENT] >(email)-[:CC] >(alias),
(alias)-[:ALIAS OF] >(bob)
RETURN email.id
B Returns all emails that Bob has sent where hes CCd one of his own aliases
B Returns 1 result: id:1, content:email content
3 Example and figure taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 63
Sample Queries
Example 5
MATCH (email:Email {id:11})< [f:FORWARD OF*]-(:Forward)
RETURN count(f)
3 Example and figure taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 64
Sample Queries
Example 5
MATCH (email:Email {id:11})< [f:FORWARD OF*]-(:Forward)
RETURN count(f)
B This returns the number of times a particular email is forwarded
B The answer is 2 (count the number of FORWARD OF relationships bound
to f)
3 Example and figure taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 64
Serialisation
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 65
Cypher as Serialisation Format
Cypher - Example
CREATE (:Person {name:Ian})-[:EMPLOYMENT] >
(employment:Job {start date:2011-01-05})
-[:EMPLOYER] >(:Company {name:Neo}),
(employment)-[:ROLE] >(:Role {name:engineer})
3 Image taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 66
Geoff
B Based on Cypher
B A Geoff document consists of a one or more subgraphs, each of which
contains one or more paths
B Properties in JSON syntax
( alice {" name ":" Alice "})
( bob {" name ":" Bob "})
( carol {" name ":" Carol "})
( alice ) < -[: KNOWS ] - >( bob ) < -[: KNOWS ] - >( carol ) < -[: KNOWS ] - >( alice )
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 67
GraphML
B XML-based file format for graphs
B Common format for exchanging graph structure data
B Generic (not limited to graph databases)
<? xml version = " 1.0 " encoding = " UTF -8 " ? >
< graphml xmlns = " http: // graphml . graphdrawing . org / xmlns "
[...]
< graph id = " G " edgedefault = " undirected " >
< node id = " n0 " >
< data key = " d0 " > green </ data >
</ node >
< node id = " n1 " / >
< edge id = " e0 " source = " n0 " target = " n1 " >
< data key = " d1 " > 1.0 </ data >
</ edge >
</ graph >
</ graphml >
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 68
Serialisation
B Essentially no standardisation in property graph community
B Serialisation less important than in semantic technologies:
Not much emphasis on data exchange
Data publishing not commonly considered (in contrast to Linked Data)
Embedding in other formats usually not considered (in contrast to
RDFa)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 69
Hypergraphs
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 70
Graph Shortcomings
Graphs pairwise relationships.
Pairwise relationships are often insufficient - the world is more complex than that.
Example Problem
Given: A collection of articles together with their authors.
Goal: Group articles into different topics.
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 71
Graph Shortcomings
Example Problem
Given: a collection of articles and their authors.
Want: Group articles into different topics.
Possible solution: undirected graph
B Nodes = articles
B If two articles have at least one author in common
add an edge between them
B Add weights to edges equal to the number of
authors the two corresponding articles have in
common
Group articles by topic using clustering approaches
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 72
Example Problem
What is wrong with this solution?
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 73
Example Problem
What is wrong with this solution?
B Loss of information: No way to tell whether the same person joined in
writing three or more articles.
B This is a problem because the articles from the same person are more likely
to belong to the same topic.
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 73
Hypergraphs
Natural solution: a hypergraph!
Definition: Hypergraph
Let V = (v1 , v2 , . . . , vn ) be a set of vertices. A hypergraph on V is a family of
H = (E1 , E2 , . . . , Em ) of subsets of V s.t.:
B For i = 1, 2, . . . , m, Ei 6=
Sm
B Ei = V
i=1
Intuitively: A hypergraph is a graph in which an edge (called hyperedge) can
connect more than two vertices.
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 74
Hypergraphs
Models the example problem:
B Each article matches to a vertex
B Each hyperedge contains set of articles written
by the its corresponding author
Note that hyperedges are represented by
dotted lines around the set of vertices they
include
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 75
Hypergraphs: Summary
B Each hyperedge can connect to an arbitrary number of nodes
B Hypergraphs can be useful for data with a large number of many-to-many
relationships
B Hypergraphs are particularly good at capturing meta-intent
Disadvantage:
B Can be more confusing than the more explicit data modeling techniques
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 76
Converting and Comparing the Graph Models
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 77
RDF to PGM Transformation
Distinguish between two types of RDF triples:
B attribute triples
= triples whose object is a literal
B relationship triples
= triples whose object is an IRI or a blank node
The transformation:
B Every relationship triple an edge
B Every attribute triple a property of the vertex for the subject of that triple
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 78
RDF to PGM Transformation
Simple transformation example:
ex : alice foaf : name " Alice " .
ex : bob foaf : name " Bob " .
<< ex : alice foaf : knows ex : bob > > ex : certainty 0.5 .
<< ex : bob foaf : age 23 > > ex : certainty 0.9 .
Note: << and >> enclose metadata tripes in Turtle syntax. The last line cannot
be converted using this transformation approach. See
https://arxiv.org/pdf/1409.3288v2.pdf for details.
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 79
PGM to RDF Transformation
B Edges (incl. their labels) an ordinary RDF triple
B Vertices (incl. their labels) an ordinary RDF triple
B Edge properties metadata triple whose subject is the triple for the
corresponding edge
B Patterns for generating IRIs that denote edge labels and properties can be
chosen freely
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 80
Hypergraphs to Property Graphs
Labeled hypergraphs can be powerful when modelling many-to-many
relationships:
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 81
Hypergraphs to Property Graphs
All relations need to be made explicit when converting to property graphs:
Why can it still make sense to use property graphs?
B More familiar and explicit modelling technique
B Model can be fine-tuned with properties such as primary driver (for
insurance purposes)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 82
Knowledge Graph Formats
Characteristics Relational RDF PGM
Standardised yes yes no
Traversal performance - o +
Large analytical queries + o -
Query language SQL SPARQL Cypher & more
Data Publication & Differentiability - + -
Global Identifiers & Cross dataset fusion - + -
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 83
Summary
Covered in the lecture:
B Motivation for Model Graph Databases (Big Data, Connectivity)
B Comparing Relational and Property Graph Databases
B Labeled Property Graph Model
B Cypher Query Language
B Unifying RDF, Property Graphs and Hypergraphs
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 84
References
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 85