[go: up one dir, main page]

Skip to content
This repository has been archived by the owner on Feb 15, 2023. It is now read-only.

sim51/neo4j-fdw

Repository files navigation

Neo4j Postgres integration

GitHub Actions Status

Purpose

Neo4jPg is a foreign data wrapper for Postgresql. It can be used to access data stored into Neo4j from Postgresql. Moreover, you can directly write a cypher query in your select.

Compatibility matrice

Dependencies

The project has two main dependencies Multicorn and Neo4j Python driver.

Multicorn is used for the Postgres FDW layer, so the compatibility with postgres versions depends on it :

Multicorn Postgres Python

0.9.1

9.1

>= 2.6 & < 3.0

1.x

>= 9.2

2.6 or >= 3.3

1.3.4

>= 9.2 and =<10

2.7 or >= 3.6

1.4.0

>= 9.2 and ⇐12

2.7 or >= 3.6

The Neo4j driver is used for the communication with Neo4j, so the compatibility with Neo4j versions depends on it :

Neo4j Driver Neo4j Server Python

1.x

3.X

2.7, 3.4, 3.5, 3.6, and 3.7

1.7

3.X, 4.0 and 4.1

2.7, 3.4, 3.5, 3.6, and 3.7

4.x

3.5 and >= 4.0

3.5, 3.6, and 3.7

Note
Each Neo4j driver release (from 4.0 upwards) is built specifically to work with a corresponding Neo4j release, i.e. that with the same major.minor version number.

Version of neo4j-fdw

The version 1.1.x of the project are based on MultiCorn 1.3.4 and the Neo4j driver ⇐ 1.7 Upper versions are based on MultiCorn 1.4.0

To simplify the understanding of the compatibilty, the project now follows the Neo4j driver version (that’s why There is no 2.X & 3.X). The major & minor version of the project will be linked to the Neo4j driver version, so 4.0.Z means that’s the project is using the version 4.0 of the driver.

Important
The current version (4.0.0) requires python >= 3.3 and is compatible with Neo4j 3.5 & 4.0 and Pg 9.2 to 12.

How to install

With PGXN

Just type those commands :

$> sudo pgxn install multicorn
$> sudo pgxn install neo4j-fdw

Easy, no ?

Note
If you have some troubles, please check that all the project dependencies are installed by PGXN (specially the neo4j driver : pip install neo4j).

From the sources

Requirements

There is some Postgresql & Python requirements to use this project :

  • Postgresql >= 9.1

  • postgresql-plpython

  • Postgresql development packages

  • Python development packages

  • python 3.3 or >= as your default python with pip

On a debian system, you can directly type this command :

$> sudo apt-get install build-essential postgresql-server-dev-10 python3-dev python3-setuptools python3-pip postgresql-plpython3-10

Installing multicorn

This project is based on Multicorn, so you have to install it.

$> cd /tmp
$> git clone git://github.com/Kozea/Multicorn.git
$> cd Multicorn
$> git checkout tags/v1.4.0
$> make && make install

When it’s done, you have to enable the extension in your PostgreSQL database.

$> sudo su - postgres
$> psql
mydatabase=# CREATE EXTENSION multicorn;
CREATE EXTENSION
mydatabase=# \q

Neo4j FDW

You can find The Neo4j foreign data wrapper here : https://github.com/sim51/neo4j-fwd

Clone the repository

$> git clone https://github.com/sim51/neo4j-fwd

Install the project

$> cd neo4j-fdw
$> python setup.py install

At this point, everything is done to use Neo4j in Postgresql !

How to use the Foreign Data Wrapper

Create a neo4j server

First step, you have to create a foreign server in Postgres :

mydatabase=#
CREATE SERVER multicorn_neo4j FOREIGN DATA WRAPPER multicorn
  OPTIONS (
      wrapper  'neo4jPg.neo4jfdw.Neo4jForeignDataWrapper',
      url      'neo4j://172.17.0.1:7687',
      user     'neo4j',
      password 'admin'
  );

Connection options are

  • url The neo4j url for Neo4j (default is neo4j://localhost:7687)

  • user User for Neo4j

  • password password of the Neo4j user

Create a foreign table

Now you can create a foreign table that match a cypher query.

Important
Your cypher query must return a collection, and you have to give an alias on each return variable.
Note
You can specify the neo4j database name. If itis omitted, the default value is 'neo4j'
mydatabase=#
CREATE FOREIGN TABLE neo4j_movie (
    movie varchar
  ) SERVER multicorn_neo4j OPTIONS (
    cypher 'MATCH (n:Movie) RETURN n.title as movie',
    database 'neo4j'
  );

Filtering the data

quals are pushed to the remote database when it’s possible. This include simple operators like :

  • equality, inequality (=, <>, >, <, ⇐, >=)

  • like, ilike and their negations

If you have defined your foreign table with this query MATCH (n:Movie) RETURN n.title as movie, this FDW will push all your WHERE clause directly to Neo4j by generating a cypher query that looks like to this : MATCH (n:Movie) WITH n.title as movie WHERE …​ RETURN movie;

In fact it replaces the RETURN part of your query by a WITH …​ WHERE …​ RETURN. It works, but it’s not optimised …​

To optimise the WHERE clause in the generated cypher query, you can define a WHERE placeholder in the cypher definition of your foreign table

Example :

CREATE FOREIGN TABLE actedIn (
    actor varchar NOT NULL,
    born smallint,
    movie varchar NOT NULL
  ) SERVER multicorn_neo4j OPTIONS (
    cypher 'MATCH (p:Person) /*WHERE{"actor":"p.name", "born":"p.born"}*/  WITH p MATCH (p)-[:ACTED_IN]->(m:Movie) /*WHERE{"movie":"m.title"}*/ RETURN p.name AS actor, p.born AS born, m.title AS movie'
  );

In this example you can see two where placeholders : /*WHERE{"actor":"p.name", "born":"p.born"}*/ & /*WHERE{"movie":"m.title"}*/

A placeholder is defined by /*WHERE{ …​ }*/ (please respect the cast, it’s a strict match). Then inside, you have to define the cypher field name of the SQL field.

With those information, the plugin know how to put the where clause in your cypher query.

So this SQL query :

SELECT * FROM actedIn WHERE born > 1980 AND movie = "The Matrix"

Will generate this cypher query :

MATCH (p:Person) WHERE p.born > 1980
WITH p
MATCH (p)-[:ACTED_IN]->(m:Movie)
WHERE m.title = "The Matrix"
RETURN p.name AS actor, p.born AS born, m.title AS movie

Make cypher query into a sql select

This project also define a cool postgres function cypher, that allow you to write a cypher query into a select. Example : SELECT * FROM cypher('MATCH (n)-[r]→(m) RETURN n,r,m LIMIT 10')

The cypher function returns a postgres JSON type.

Create the functions into your database

You have to declare those functions into your database, before to use it.

mydatabase=#
CREATE EXTENSION plpythonu3;

mydatabase=#
CREATE OR REPLACE FUNCTION cypher(query text) RETURNS SETOF json
LANGUAGE plpythonu3
AS $$
from neo4jPg import neo4jPGFunction
for result in neo4jPGFunction.cypher_default_server(plpy, query, '{}'):
    yield result
$$;
CREATE OR REPLACE FUNCTION cypher(query text, params text) RETURNS SETOF json
LANGUAGE plpythonu3
AS $$
from neo4jPg import neo4jPGFunction
for result in neo4jPGFunction.cypher_default_server(plpy, query, params):
    yield result
$$;
CREATE OR REPLACE FUNCTION cypher(query text, params text, server text) RETURNS SETOF json
LANGUAGE plpythonu3
AS $$
from neo4jPg import neo4jPGFunction
for result in neo4jPGFunction.cypher_with_server(plpy, query, params, server):
    yield result
$$;
CREATE OR REPLACE FUNCTION cypher(query text, params text, server text, dbname text) RETURNS SETOF json
LANGUAGE plpythonu3
AS $$
from neo4jPg import neo4jPGFunction
for result in neo4jPGFunction.cypher_with_server(plpy, query, params, server, dbname):
    yield result
$$;

This define three functions :

  • cypher(query, params, server, dbname) : make a cypher query on the database dbname of the foreign server specify (server is the name of the foreign server. Example multicorn_neo4j) : SELECT * FROM cypher('MATCH (n)-[r]→(m) RETURN n,r,m LIMIT 10', '{}', 'multicorn_neo4j', 'myDb')

  • cypher(query, params, server) : make a cypher query on the foreign server specify (server is the name of the foreign server. Example multicorn_neo4j) : SELECT * FROM cypher('MATCH (n)-[r]→(m) RETURN n,r,m LIMIT 10', '{}', 'multicorn_neo4j')

  • cypher(query, params) : make a cypher query on the first foreign server defined, with neo4j query parameter : SELECT * FROM cypher('MATCH (n:Movie) WHERE n.title CONTAINS $name RETURN n.title AS title LIMIT 10', '{"name":"Matrix"}');

  • cypher(query) : make a cypher query on the first foreign server defined : SELECT * FROM cypher('MATCH (n)-[r]→(m) RETURN n,r,m LIMIT 10')

How to use it

The JSON produced follow your cypher return statement : the key of the first json level correspond to you the name of yours returns, and the value to json serialisation fo the object.

If the return object is a Node, it’s serialize as a JSON object like this : { id:X, labels : [], properties: { object } }

Example :

mydatabase=#
SELECT cypher  FROM cypher('MATCH (n:Location) RETURN n LIMIT 10');

                                                            cypher
 {"n":{"labels": ["Location"],"properties": {"y": 1906520.0, "x": 1158953.0, "name": "025XX W AUGUSTA BLVD"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1842294.0, "x": 1175702.0, "name": "094XX S HARVARD AVE"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1931163.0, "x": 1152905.0, "name": "047XX N KIMBALL AVE"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1887355.0, "x": 1149049.0, "name": "041XX W 24TH PL"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1869892.0, "x": 1176061.0, "name": "001XX W 53RD ST"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1862782.0, "x": 1180056.0, "name": "063XX S DR MARTIN LUTHER KING JR DR"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1908312.0, "x": 1175281.0, "name": "001XX W DIVISION ST"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1899998.0, "x": 1139456.0, "name": "0000X N PINE AVE"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1908407.0, "x": 1176113.0, "name": "012XX N STATE PKWY"}}}
 {"n":{"labels": ["Location"],"properties": {"y": 1888098.0, "x": 1148713.0, "name": "023XX S KEELER AVE"}}}
(10 lignes)

If the return object is a relation, it’s serialize as a JSON object like this :` { type : "MY_TYPE", properties: { object } }`

Example :

mydatabase=#
SELECT cypher  FROM cypher('MATCH (n)-[r]->(m) RETURN r AS relation LIMIT 10');

                          cypher
 {"relation":{"type": "IS_TYPE_OF","properties": {}}}
 {"relation":{"type": "IS_TYPE_OF","properties": {}}}
 {"relation":{"type": "IS_LOCALIZED_AT","properties": {}}}
 {"relation":{"type": "HAS_ARREST","properties": {}}}
 {"relation":{"type": "IS_DOMESTIC","properties": {}}}
 {"relation":{"type": "IN_YEAR","properties": {}}}
 {"relation":{"type": "IS_IN_CATEGORY","properties": {}}}
 {"relation":{"type": "IS_TYPE_OF","properties": {}}}
 {"relation":{"type": "IS_TYPE_OF","properties": {}}}
 {"relation":{"type": "IS_TYPE_OF","properties": {}}}
(10 lignes)

Of course, for primitive type are also supported, and you can mix all of this : SELECT cypher FROM cypher('MATCH (y:Year)-[r]→(m) RETURN y.value AS year, r, m LIMIT 10');

mydatabase=#
SELECT cypher  FROM cypher('MATCH (y:Year)-[r]->(m) RETURN y.value AS year, r, m LIMIT 10');
                                                      cypher
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10016718"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017521"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10018383"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10087834"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017190"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017379"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017246"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017248"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017208"}}}
 {"year":2015,"r":{"type": "IN_YEAR","properties": {}},"m":{"labels": ["Crime"],"properties": {"id": "10017211"}}}
(10 lignes)

The power of PG & JSON

PG 9.4 have a function name json_to_record, that convert our json into a collection of typed tuple !

mydatabase=#
SELECT year, id  FROM cypher('MATCH (y:Year)<-[r]-(m) RETURN y.value AS year, m.id AS id LIMIT 10') , json_to_record(cypher) as x(year int, id varchar)
 year |    id
------+----------
 2015 | 10016718
 2015 | 10017521
 2015 | 10018383
 2015 | 10087834
 2015 | 10017190
 2015 | 10017379
 2015 | 10017246
 2015 | 10017248
 2015 | 10017208
 2015 | 10017211
(10 lignes)

Run test

You need to have docker compose installed. Then you just have to run the ./scripts/tests.sh script.

More Examples

If you want to see more examples, just take a look in folder test/sql

kb

  • To enable log in postgres : SET client_min_messages = DEBUG

  • To enable query log in Neo4j : CALL dbms.setConfigValue("dbms.logs.query.enabled", "true")

  • To open an psql session on the database neo4j with debug messages : env PGOPTIONS='-c client_min_messages=DEBUG' psql neo4j

  • Alter an option of foreign table (replace ADD by SET or DROP): ALTER FOREIGN TABLE actedin OPTIONS ( ADD estimated_rows '-1');

  • Display the detail of a table : \d+ person