Incremental View Maintenance
for openCypher Queries
Gábor Szárnyas, József Marton
4th openCypher Implementers Meeting
MODEL-DRIVEN ENGINEERING
• Primarily for designing critical systems
• Models are first-class citizens during development
  o SysML / requirements, statecharts, etc.
  o Validation and code generation techniques for correctness
Technology: Eclipse Modeling Framework (EMF)
• Originally started at IBM as an implementation of the Object Management Group’s (OMG) Meta Object Facility (MOF)
• i.e. an object-oriented model
• i.e. a property graph-like structure with a metamodel
MODEL VALIDATION
• Implemented with model queries
• Models are typed, attributed graphs
• Typical queries
  o Get two components connected by a particular edge
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s)
  o Check if two objects are reachable
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s)
  o Property checks
    MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y)
Complex graph queries
RAILWAY NETWORK MODEL
[Figure: railway network with routes 1 and 2, segments, a switch with «straight» and «diverging» switch positions, and sensors A, B and C]
[Figure: graph pattern of the query below, with edges :FOLLOWS, :TARGET, :MONITORED_BY and :REQUIRES]
MATCH (route:Route)
-[:FOLLOWS]->(swP:SwitchPosition)
-[:TARGET]->(sw:Switch)
-[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
G. Szárnyas, B. Izsó, I. Ráth, D. Varró:
The Train Benchmark: cross-technology performance
evaluation of continuous model queries.
Software and Systems Modeling, 2017
INCREMENTAL VIEW MAINTENANCE (IVM)
In many use cases…
• queries are static
• data changes slowly
-> views can be maintained incrementally
Graph applications
• model validation
• simulation
• recommendation systems
• fraud detection
INGRAPH: IVM ON PROPERTY GRAPHS
Idea: map to relational algebra and use standard IVM techniques
• Challenging aspects
  o Property graph data model
  o Cypher language
• Formalise the language in relational algebra
• Use nested relational algebra -> closed under its operations
Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks)
Gábor Szárnyas, József Marton, Dániel Varró:
Formalising openCypher Graph Queries in Relational Algebra.
ADBIS 2017
INGRAPH / GRAPH TO NESTED RELATIONS
INGRAPH / NESTED RELATIONAL ALGEBRA OPS
INGRAPH
• ingraph uses a procedural IVM approach: the Rete algorithm.
  o Build caches for each operator
  o Maintain caches upon changes
  o Supports 15+ out of 25 LDBC BI queries
  o Details to be published in a conference paper
  o Extensible, but very heavy on memory
• The rest of the talk focuses on the algebraic approach.
Gábor Szárnyas, József Marton et al.:
Incremental View Maintenance on Property Graphs.
arXiv preprint will be available in the first week of June
Delta Queries for openCypher
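DELTA QUERIES AT A GLANCE
1. Evaluate query $Q$ on the initial graph $G$ to obtain $Q(G)$.
2. For each change $\Delta G_1, \Delta G_2, \ldots$, evaluate the delta query $\Delta Q$ to obtain $\Delta Q(\Delta G_1), \Delta Q(\Delta G_2), \ldots$
$\Rightarrow$ Together, $Q(G)$ and the delta results give $Q(G + \Delta G_1 + \Delta G_2 + \cdots)$.
$Q$ and $\Delta Q$ are calculated by the same engine.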
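The evaluate-once, then-apply-deltas loop can be sketched in Python. This is a minimal sketch under stated assumptions: the graph is reduced to a set of edge tuples, the query is a toy two-hop path query, only insertions are handled, and the names `query`, `delta_query` and `maintain_view` are illustrative rather than part of ingraph or Neo4j.

```python
def query(edges):
    """Q(G): all two-hop paths x -> y -> z, recomputed from scratch."""
    return {(x, z) for (x, y) in edges for (y2, z) in edges if y == y2}

def delta_query(edges_new, delta):
    """Delta Q: two-hop paths that use at least one inserted edge.
    edges_new is the graph state after the insertion (G + delta)."""
    first = {(x, z) for (x, y) in delta for (y2, z) in edges_new if y == y2}
    second = {(x, z) for (x, y) in edges_new for (y2, z) in delta if y == y2}
    return first | second

def maintain_view(edges, changes):
    """Evaluate Q once, then apply the delta query for each change set."""
    view = query(edges)
    for delta in changes:
        edges = edges | delta
        view |= delta_query(edges, delta)
    return edges, view

g = {(1, 2), (2, 3)}
changes = [{(3, 4)}, {(4, 1), (0, 1)}]
g_final, view = maintain_view(g, changes)

# The maintained view agrees with recomputing Q on the final graph.
assert view == query(g_final)
```

Both `query` and `delta_query` are ordinary queries over the same data, which mirrors the point that $Q$ and $\Delta Q$ can be run by the same engine.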
IMPLEMENTATION: TRIGGERS IN NEO4J
• Event-driven programming in databases
• Neo4j: TransactionEventHandler interface
  o afterCommit(TransactionData data, T state)
  o beforeCommit(TransactionData data)
  o TransactionData contains Δ𝐺: createdNodes, deletedNodes, …
• Only the updated state of the graph is accessible.
• GraphAware framework: ImprovedTransactionData API
  o Get properties and labels/types of deleted elements
Max de Marzi:
Triggers in Neo4j.
2015
Michal Bachman:
Neo4j Improved Transaction Event API.
2014
DERIVING DELTA QUERIES
Idea: given query $Q$, derive delta queries $\Delta Q$ and $\nabla Q$, which define positive and negative changes, respectively.
But: most IVM techniques are defined for relational algebra.
Notation:
• $R$: relation
• $\Delta R$: positive changes
• $\nabla R$: negative changes
• $R^m$: maintained relation of $R$ $\Rightarrow$ $R^m = R - \nabla R + \Delta R$
• “−” denotes set minus ($\setminus$), “+” denotes set union ($\cup$)
Example with attributes (a, b):
$R = \{(1, 2), (3, 4), (5, 6)\}$, $\nabla R = \{(3, 4)\}$, $\Delta R = \{(7, 8)\}$
$R^m = \{(1, 2), (5, 6), (7, 8)\}$
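The notation maps directly onto set operations. A minimal Python sketch, using the tuples from the slide's example tables; the function name `apply_changes` is an illustrative choice:

```python
def apply_changes(r, nabla_r, delta_r):
    """R^m = R - nabla R + Delta R, with set minus and set union."""
    return (r - nabla_r) | delta_r

R = {(1, 2), (3, 4), (5, 6)}
nabla_R = {(3, 4)}   # negative changes
delta_R = {(7, 8)}   # positive changes

R_m = apply_changes(R, nabla_R, delta_R)
print(sorted(R_m))   # [(1, 2), (5, 6), (7, 8)]

# The old state can be recovered from the maintained one:
# R = R^m - Delta R + nabla R
assert (R_m - delta_R) | nabla_R == R
```

The inverse direction assumes that deleted tuples were present in $R$ and inserted tuples were not.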
RELATIONAL ALGEBRA FOR CYPHER
• Query plans in Neo4j ≅ relational algebra + Expand/VarExpand.
• Expand is essentially a natural join.
• Natural join: $r \bowtie s$
• Semijoin: $r \ltimes s = \pi_R(r \bowtie s)$
• Antijoin: $r \overline{\ltimes} s = r \setminus (r \ltimes s)$
• Left outer join: $r$ ⟕ $s \cong (r \bowtie s) \cup (r \overline{\ltimes} s)$ // plus nulls
Andrés Taylor:
Neo4j Cypher implementation.
First openCypher Implementers Meeting, 2017
RELATIONAL ALGEBRA FOR CYPHER
Example graph: (1)-[:r]->(2), (4)-[:r]->(5), (2)-[:s]->(3), (2)-[:s]->(6)

Natural join: $r \bowtie s$
MATCH (v1)-[:r]->(v2)-[:s]->(v3)
RETURN *
v1 v2 v3
1  2  3
1  2  6

Semijoin: $r \ltimes s$
MATCH (v1)-[:r]->(v2)
WHERE (v2)-[:s]->()
RETURN *
v1 v2
1  2

Antijoin: $r \overline{\ltimes} s$
MATCH (v1)-[:r]->(v2)
WHERE NOT (v2)-[:s]->()
RETURN *
v1 v2
4  5

Left outer join: $r$ ⟕ $s$
MATCH (v1)-[:r]->(v2)
OPTIONAL MATCH (v2)-[:s]->(v3)
RETURN *
v1 v2 v3
1  2  3
1  2  6
4  5  null
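The four operators can be reproduced outside the database. The following is a minimal Python sketch, assuming rows are dicts keyed by variable name and joins use the shared attribute names; the relations `r` and `s` encode the example graph's :r and :s edges, and all function names are illustrative.

```python
def natural_join(r, s):
    """r joined with s on all shared attribute names."""
    out = []
    for x in r:
        for y in s:
            shared = x.keys() & y.keys()
            if all(x[k] == y[k] for k in shared):
                out.append({**x, **y})
    return out

def semijoin(r, s):
    """Rows of r that have at least one join partner in s."""
    return [x for x in r if natural_join([x], s)]

def antijoin(r, s):
    """Rows of r that have no join partner in s."""
    return [x for x in r if not natural_join([x], s)]

def left_outer_join(r, s):
    """Natural join plus the unmatched rows of r, padded with None."""
    extra = {k for y in s for k in y} - {k for x in r for k in x}
    padded = [{**x, **{k: None for k in extra}} for x in antijoin(r, s)]
    return natural_join(r, s) + padded

r = [{"v1": 1, "v2": 2}, {"v1": 4, "v2": 5}]   # :r edges
s = [{"v2": 2, "v3": 3}, {"v2": 2, "v3": 6}]   # :s edges

print(natural_join(r, s))     # rows (1, 2, 3) and (1, 2, 6)
print(semijoin(r, s))         # row (1, 2)
print(antijoin(r, s))         # row (4, 5)
print(left_outer_join(r, s))  # join rows plus (4, 5, None)
```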
DERIVING DELTA QUERIES
T. Griffin, B. Kumar:
Algebraic Change Propagation for Semijoin and Outerjoin Queries.
SIGMOD Record 1998
T. Griffin, L. Libkin:
Incremental Maintenance of Views with Duplicates.
SIGMOD 1995
T. Griffin, L. Libkin, H. Trickey:
An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions.
TKDE 1997
X. Qian, G. Wiederhold:
Incremental Recomputation of Active Relational Expressions.
TKDE 1991
DELTA QUERIES
T. Griffin, L. Libkin:
Incremental Maintenance of Views with Duplicates.
SIGMOD 1995
• The seminal paper
• Δ/𝛻 delta queries for joins, selections, projections, etc.
• Bag semantics
DELTA QUERIES
• Semijoins, antijoins, outer joins
• Set semantics
• Later publications, e.g. Zhou-Larson’s ICDE’07 paper, improved these
T. Griffin, B. Kumar:
Algebraic Change Propagation for Semijoin and Outerjoin Queries.
SIGMOD Record 1998
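EXAMPLE QUERY #1
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
Relations: $a(v_1, v_2)$, $b(v_2, v_3)$, $c(v_3, v_4)$
Relational algebra expression: $a \bowtie b \bowtie c$
$\Delta(a \bowtie b \bowtie c)$
$= (a \bowtie b)^m \bowtie \Delta c + \Delta(a \bowtie b) \bowtie c^m$
$= (a \bowtie b)^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
$= a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
$\nabla(a \bowtie b \bowtie c)$ is derived similarly.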
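The three-term result can be checked against recomputation. A minimal Python sketch, assuming set semantics, insert-only changes and edges given as (source, target) tuples; `join3` and `delta_join3` are illustrative names and the data is made up for the check.

```python
def join3(a, b, c):
    """a, b, c joined into paths (v1, v2, v3, v4)."""
    return {(v1, v2, v3, v4)
            for (v1, v2) in a
            for (w2, v3) in b if w2 == v2
            for (w3, v4) in c if w3 == v3}

def delta_join3(a_m, b_m, c_m, da, db, dc):
    """a^m b^m Dc  +  a^m Db c^m  +  Da b^m c^m  (each term is a 3-way join)."""
    return join3(a_m, b_m, dc) | join3(a_m, db, c_m) | join3(da, b_m, c_m)

# Old state and inserted edges (illustrative data).
a, b, c = {(1, 2)}, {(2, 3)}, {(3, 4)}
da, db, dc = {(9, 2)}, {(2, 7)}, {(7, 8), (3, 5)}

# Maintained relations after the insertions.
a_m, b_m, c_m = a | da, b | db, c | dc

delta = delta_join3(a_m, b_m, c_m, da, db, dc)

# Old result plus the delta equals the result recomputed from scratch.
assert join3(a, b, c) | delta == join3(a_m, b_m, c_m)
```

The three terms may overlap (a path using two new edges appears in two terms), which is harmless under set semantics but would need care under bag semantics.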
EXAMPLE QUERY #1
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
$\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
The first term, $a^m \bowtie b^m \bowtie \Delta c$, as a query:
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
$pcs -> pass lists of nodes/edges as parameters
// This only works in embedded mode, see neo4j/issues/10239
POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐
// r1 = a⋈b⋈Δc
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r2 = a⋈Δb⋈c
UNWIND $pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r3 = Δa⋈b⋈c
UNWIND $pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐
Long WITH chains are cumbersome -> patterns+list comprehensions.
WITH [pc IN $pcs | // r1 = a⋈b⋈Δc
[(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) |
[v1, v2, v3, v4]]] +
[pb IN $pbs | // r2 = a⋈Δb⋈c
[(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) |
[v1, v2, v3, v4]]] +
[pa IN $pas | // r3 = Δa⋈b⋈c
[(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) |
[v1, v2, v3, v4]]] AS rss
// each comprehension yields one list of matches per changed edge: flatten twice
UNWIND rss AS rs
UNWIND rs AS r
RETURN
r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4
EXAMPLE QUERY #2
MATCH (route:Route)
-[:FOLLOWS]->(swP:SwitchPosition)
-[:TARGET]->(sw:Switch)
-[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
MATCH (v1)
-[:a]->(v2)
-[:b]->(v3)
-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
[Figure: the two patterns side by side, with v1 = route, v2 = swP, v3 = sw, v4 = sensor and a = :FOLLOWS, b = :TARGET, c = :MONITORED_BY, d = :REQUIRES]
NEGATIVE CONDITIONS
MATCH (v1)
-[:a]->(v2)
-[:b]->(v3)
-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
Relations: $a(v_1, v_2)$, $b(v_2, v_3)$, $c(v_3, v_4)$, $d(v_1, v_4)$
$\Rightarrow (a \bowtie b \bowtie c) \overline{\ltimes} d$
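DELTA QUERIES FOR JOINS AND ANTIJOINS
Natural join
• $\Delta(S \bowtie T) = (\Delta S \bowtie T^m) + (S^m \bowtie \Delta T)$
• $\nabla(S \bowtie T) = (\nabla S \bowtie T) + (S \bowtie \nabla T)$
Antijoin
• $\Delta(S \overline{\ltimes} T) = ((S - \nabla S) \ltimes (\nabla T \overline{\ltimes} T^m)) + (\Delta S \overline{\ltimes} T^m)$
• $\nabla(S \overline{\ltimes} T) = ((S - \nabla S) \ltimes (\Delta T \overline{\ltimes} T)) + (\nabla S \overline{\ltimes} T)$
Expression 1: the antijoin of a delta with its relation, e.g. $\Delta T \overline{\ltimes} T$
Expression 2: $S - \nabla S$
Only $S^m$ and $T^m$ are available.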
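SUBEXPRESSIONS
1. $\Delta T \overline{\ltimes} T = {?}$
• $R_1 \overline{\ltimes} R_2$, where $R_1$ and $R_2$ both have schema $R$.
• $R_1 \overline{\ltimes}_\theta R_2 = R_1 - \pi_R(R_1 \bowtie_\theta R_2) = R_1 - (R_1 \bowtie_\theta R_2)$
• If $\theta$ defines equality on all attributes of $R$, the theta join ($\bowtie_\theta$) becomes a natural join, which is an intersection for relations with the same schema.
• $R_1 \bowtie_\theta R_2 = R_1 \bowtie R_2 = R_1 \cap R_2$
• $R_1 - (R_1 \bowtie_\theta R_2) = R_1 - (R_1 \cap R_2) = R_1 - R_2$
$\Rightarrow (*)\ R_1 \overline{\ltimes}_\theta R_2 = R_1 - R_2$
2. $R^m = R - \nabla R + \Delta R \Rightarrow (**)\ R = R^m - \Delta R + \nabla R$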
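DELTAS FOR ANTIJOINS
Based on Griffin-Kumar’s ’98 paper.
$(*)\ R_1 \overline{\ltimes}_\theta R_2 = R_1 - R_2$
$(**)\ R = R^m - \Delta R + \nabla R$
Deleted tuples are not in $T^m$ and inserted tuples were not in $T$, so by $(*)$: $\nabla T \overline{\ltimes} T^m = \nabla T$ and $\Delta T \overline{\ltimes} T = \Delta T$.
• $\Delta(S \overline{\ltimes} T) = ((S - \nabla S) \ltimes (\nabla T \overline{\ltimes} T^m)) + (\Delta S \overline{\ltimes} T^m)$
  $\overset{*}{=} ((S - \nabla S) \ltimes \nabla T) + (\Delta S \overline{\ltimes} T^m)$
  $\overset{**}{=} ((S^m - \Delta S) \ltimes \nabla T) + (\Delta S \overline{\ltimes} T^m)$
• $\nabla(S \overline{\ltimes} T) = ((S - \nabla S) \ltimes (\Delta T \overline{\ltimes} T)) + (\nabla S \overline{\ltimes} T)$
  $\overset{*}{=} ((S - \nabla S) \ltimes \Delta T) + (\nabla S \overline{\ltimes} T)$
  $\overset{**}{=} ((S^m - \Delta S) \ltimes \Delta T) + (\nabla S \overline{\ltimes} (T^m - \Delta T + \nabla T))$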
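The simplified antijoin deltas can be checked on a small example. This is a minimal Python sketch under stated assumptions: set semantics, $S$ holds 4-tuples $(v_1, v_2, v_3, v_4)$, $T$ holds pairs $(v_1, v_4)$ so that the join covers all attributes of $T$, and the old state of $T$ is recovered via $(**)$ as $T^m - \Delta T + \nabla T$. All names and data are illustrative.

```python
def key(s):
    """Join attributes that a 4-tuple of S shares with T(v1, v4)."""
    return (s[0], s[3])

def semijoin(S, T):
    return {s for s in S if key(s) in T}

def antijoin(S, T):
    return {s for s in S if key(s) not in T}

def delta_antijoin(S_m, dS, T_m, nT):
    """Positive changes: (S^m - dS) semijoin nT, plus dS antijoin T^m."""
    return semijoin(S_m - dS, nT) | antijoin(dS, T_m)

def nabla_antijoin(S_m, dS, nS, T_m, dT, nT):
    """Negative changes: (S^m - dS) semijoin dT, plus nS antijoin old T."""
    T_old = (T_m - dT) | nT          # (**) applied to T
    return semijoin(S_m - dS, dT) | antijoin(nS, T_old)

# Old state.
S = {(1, 2, 3, 4), (5, 6, 7, 8), (9, 9, 9, 9)}
T = {(1, 4)}
# Changes: dX are insertions, nX are deletions.
dS, nS = {(0, 0, 0, 0)}, {(9, 9, 9, 9)}
dT, nT = {(5, 8)}, {(1, 4)}
# Maintained state.
S_m = (S - nS) | dS
T_m = (T - nT) | dT

old = antijoin(S, T)
new = antijoin(S_m, T_m)
pos = delta_antijoin(S_m, dS, T_m, nT)
neg = nabla_antijoin(S_m, dS, nS, T_m, dT, nT)

# Applying the deltas to the old result gives the recomputed result.
assert (old - neg) | pos == new
```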
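NEGATIVE CONDITIONS
Relations: $a(v_1, v_2)$, $b(v_2, v_3)$, $c(v_3, v_4)$, $d(v_1, v_4)$
$\Delta((a \bowtie b \bowtie c) \overline{\ltimes} d) = (((a \bowtie b \bowtie c)^m - \Delta(a \bowtie b \bowtie c)) \ltimes \nabla d) + (\Delta(a \bowtie b \bowtie c) \overline{\ltimes} d^m)$
The first term distributes over the difference:
$((a \bowtie b \bowtie c)^m \ltimes \nabla d) - (\Delta(a \bowtie b \bowtie c) \ltimes \nabla d)$
where
$(a \bowtie b \bowtie c)^m \ltimes \nabla d = (a^m \bowtie b^m \bowtie c^m) \ltimes \nabla d$
$\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
Pushdown $\nabla d$:
$(a^m \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)$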
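NEGATIVE CONDITIONS
Expanding both terms and pushing down $\nabla d$ gives seven subqueries:
R1: $(a^m \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)$
R2: $-(a^m \ltimes \nabla d) \bowtie b^m \bowtie (\Delta c \ltimes \nabla d)$
R3: $-(a^m \ltimes \nabla d) \bowtie \Delta b \bowtie (c^m \ltimes \nabla d)$
R4: $-(\Delta a \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)$
R5: $(a^m \bowtie b^m \bowtie \Delta c) \overline{\ltimes} d^m$
R6: $(a^m \bowtie \Delta b \bowtie c^m) \overline{\ltimes} d^m$
R7: $(\Delta a \bowtie b^m \bowtie c^m) \overline{\ltimes} d^m$
The semijoins with $\nabla d$ are shared subexpressions:
S1: $a^m \ltimes \nabla d$
S2: $\Delta a \ltimes \nabla d$
S3: $c^m \ltimes \nabla d$
S4: $\Delta c \ltimes \nabla d$
Each of these reduces to a membership test on a single vertex, because both operands represent edges and share exactly one vertex ($v_1$ for $a$ and $d$, $v_4$ for $c$ and $d$).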
// pes, nes: lists of created and deleted edges (empty placeholders here)
WITH
[] AS pes, [] AS nes
WITH
[pe IN pes WHERE type(pe) = 'a'|pe] AS pas,
[pe IN pes WHERE type(pe) = 'b'|pe] AS pbs,
[pe IN pes WHERE type(pe) = 'c'|pe] AS pcs,
[pe IN pes WHERE type(pe) = 'd'|pe] AS pds,
[ne IN nes WHERE type(ne) = 'a'|ne] AS nas,
[ne IN nes WHERE type(ne) = 'b'|ne] AS nbs,
[ne IN nes WHERE type(ne) = 'c'|ne] AS ncs,
[ne IN nes WHERE type(ne) = 'd'|ne] AS nds
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
[nd IN nds | startNode(nd)] AS nd_v1s,
[nd IN nds | endNode(nd)] AS nd_v4s
// calculating s1s...s4s
// s1s: (𝑎⋉∇𝑑)
UNWIND nd_v1s AS v1
MATCH (v1)-[:a]->(v2)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
collect({v1: v1, v2: v2}) AS s1s
// s2s: (Δ𝑎⋉∇𝑑)
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)
WHERE v1 IN nd_v1s
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
s1s,
collect({v1: v1, v2: v2}) AS s2s
// s3s: (𝑐⋉∇𝑑)
UNWIND nd_v4s AS v4
MATCH (v3)-[:c]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
s1s, s2s,
collect({v3: v3, v4: v4}) AS s3s
// s4s: (Δ𝑐⋉∇𝑑)
UNWIND pcs AS pc
MATCH (v3)-[pc]->(v4)
WHERE v4 IN nd_v4s
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
s1s, s2s, s3s,
collect({v3: v3, v4: v4}) AS s4s
// calculating r1...r7
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s
// r1: (𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4
AS v4
WHERE (v2)-[:b]->(v3)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
collect([v1, v2, v3, v4]) AS r1
// r2: -(𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (Δ𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s4s AS s4
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1,
s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4
AS v4
WHERE (v2)-[:b]->(v3)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1,
collect([v1, v2, v3, v4]) AS r2
// r3: -(𝑎⋉∇𝑑) ⋈ Δ𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2,
s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4
AS v4
MATCH (v2)-[b:b]->(v3)
WHERE b IN pbs
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2,
collect([v1, v2, v3, v4]) AS r3
// r4: -(Δ𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s2s AS s2
UNWIND s3s AS s3
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3,
s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4
AS v4
WHERE (v2)-[:b]->(v3)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3,
collect([v1, v2, v3, v4]) AS r4
// r5: 𝑎 ⋈ 𝑏 ⋈ Δ𝑐 ̅⋉ 𝑑
UNWIND pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3, r4,
collect([v1, v2, v3, v4]) AS r5
// r6: 𝑎 ⋈ Δ𝑏 ⋈ 𝑐 ̅⋉ 𝑑
UNWIND pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3, r4, r5,
collect([v1, v2, v3, v4]) AS r6
// r7: Δ𝑎 ⋈ 𝑏 ⋈ 𝑐 ̅⋉ 𝑑
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3, r4, r5, r6,
collect([v1, v2, v3, v4]) AS r7
WITH
r1 + r5 + r6 + r7 AS rp,
r2 + r3 + r4 AS rn
RETURN
[r IN rp WHERE NOT r IN rn] AS results
POSITIVE DELTA QUERY FOR $(a \bowtie b \bowtie c) \overline{\ltimes} d$
• Workaround: knowing the change workload helps
  o Only consider changes in Δ𝑑 and 𝛻𝑑
  o Query is cleaner and much more efficient
UNWIND $nds AS nd
WITH
startNode(nd) AS v1,
endNode(nd) AS v4
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
• This can outperform recomputing the query from scratch.
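When only d edges change, the deltas collapse to semijoins of the unchanged paths with the changed d edges. A minimal Python sketch of this special case, assuming set semantics (at most one d edge per vertex pair) and edges given as (source, target) tuples; function names and data are illustrative.

```python
def paths(a, b, c):
    """a, b, c joined into paths (v1, v2, v3, v4)."""
    return {(v1, v2, v3, v4)
            for (v1, v2) in a
            for (w2, v3) in b if w2 == v2
            for (w3, v4) in c if w3 == v3}

def matches(a, b, c, d):
    """Paths with no d edge from v1 to v4 (the antijoin with d)."""
    return {p for p in paths(a, b, c) if (p[0], p[3]) not in d}

def delta_when_only_d_changes(a, b, c, nabla_d):
    """New matches: paths whose (v1, v4) pair lost its d edge."""
    return {p for p in paths(a, b, c) if (p[0], p[3]) in nabla_d}

def nabla_when_only_d_changes(a, b, c, delta_d):
    """Lost matches: paths whose (v1, v4) pair gained a d edge."""
    return {p for p in paths(a, b, c) if (p[0], p[3]) in delta_d}

a, b, c = {(1, 2), (5, 6)}, {(2, 3), (6, 7)}, {(3, 4), (7, 8)}
d = {(1, 4)}
nabla_d, delta_d = {(1, 4)}, {(5, 8)}
d_m = (d - nabla_d) | delta_d

before = matches(a, b, c, d)
after = matches(a, b, c, d_m)
pos = delta_when_only_d_changes(a, b, c, nabla_d)
neg = nabla_when_only_d_changes(a, b, c, delta_d)

assert (before - neg) | pos == after
```

The sketch filters all paths for brevity; the Cypher query instead starts the match from the endpoints of the deleted d edges, which is where the efficiency gain comes from.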
LIST OPERATIONS IN CYPHER
Instead of subqueries, use chained queries and combine lists.
Get unique list with UNWIND and collect:
WITH [1, 1, 2, 2, 3] AS xs
UNWIND xs AS x
RETURN
collect(DISTINCT x) AS unique
Append, subtraction and intersection:
WITH [1, 2, 3] AS xs, [2] AS ys
RETURN
xs + ys AS append,
[x IN xs WHERE NOT x IN ys] AS subtraction,
[x IN xs WHERE x IN ys] AS intersection
Get unique list in openCypher with reduce:
WITH [1, 1, 2, 2, 3] AS xs
RETURN
reduce(acc = [], x IN xs |
acc + CASE x IN acc
WHEN false THEN [x]
ELSE []
END) AS unique
DELTA QUERIES IN CYPHER
Delta queries are complex. Features that would be nice:
• Subqueries // pattern comprehensions go some way
• Named subqueries // help reusability
• Subtracting lists // related: CIR-2017-180
• Use collection elements or function results for matching
These are probably too much to ask.
-> recommended approach: compile directly to query plans.
// not supported: a list element cannot be used directly in a pattern
MATCH (n)
WITH collect(n) AS ns
MATCH (ns[0])
RETURN *
// workaround: bind the element to a variable first
MATCH (n)
WITH collect(n) AS ns
WITH ns[0] AS x
MATCH (x)
RETURN *

CHALLENGES FOR PROPERTY GRAPH QUERIES
Data model
• NF2 (Non-First Normal Form): maps, lists
• No schema (schema-optionality)
• Graph structure
Queries
• Nulls, antijoins and left outer joins
• Updates on property values
• Aggregates on aggregates, non-distributive functions
• Ordering + skip/limit
• Reachability queries
Decades of research -> 2 long surveys
A. Gupta, I. S. Mumick:
Materialized Views.
MIT Press, 1999
R. Chirkova, J. Yang:
Materialized Views.
Foundations and Trends in Databases, 2012
OUR SURVEY OF RELATED IVM TECHNIQUES
DBTOASTER
• Shows the scale of the problem
• Relational data model and SQL queries
• R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.
• Approach
  o Queries over an algebraic ring
  o Higher-order recursive IVM
• Compiler in OCaml
• Backend with code generation for C++, Scala/Spark
Christoph Koch et al.:
DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views.
VLDB Journal 2014
FUTURE DIRECTIONS
• Work out derivation rules for Expand/VarExpand, …
• Automate delta query derivation
• Integrate with Neo4j
• Run performance experiments
  o Train Benchmark (set semantics)
  o LDBC Social Network Benchmark’s BI workload (bag semantics)
Short news
LDBC BENCHMARKS
• Social Network Benchmark
  o Business Intelligence workload published
  o openCypher reference implementation
  o Next goal: full conference paper
• Graphalytics
  o Competition is online at graphalytics.org
  o Neo4j implementation (using the Graph Algorithms library) WIP: graphalytics-platforms-neo4j/pull/6
Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.:
An early look at the LDBC Social Network Benchmark’s BI Workload.
GRADES-NDA at SIGMOD, 2018
ldbc/ldbc_snb_implementations
GRAPH ANALYTICS ON THE PANAMA PAPERS
• Network science approach: multidimensional graph metrics from social network analysis, biology, physics, etc.
• Our work originally targeted software and system models.
• Progress in 2018
  o Q1: implemented adapters for Neo4j and CSV
  o Q2 goal: analyse the Panama Papers using these metrics
Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró:
Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics.
MODELS 2016
ftsrg/model-analyzer
MAPPING CYPHER TO SQL
• Evaluate graph queries in an RDB - similar to ORM
• Approaches
  o Cytosm: Cypher to SQL Mapper / gTop: graph topology
  o GraphGen: extracting graphs from RDBs
  o Ongoing work to map TCK to SQLite
B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne:
Cytosm: Declarative Property Graph Queries Without Data Migration.
GRADES 2017
K. Xirogiannopoulos, V. Srinivas, A. Deshpande:
GraphGen: Adaptive Graph Processing using Relational Databases.
GRADES 2017
cytosm/cytosm
KonstantinosX/graphgen-project
NEO4J APOC LIBRARY
• CSV loader that follows the schema of the neo4j-import tool
• Goal
  o Use headers to generate LOAD CSV commands.
  o 1st pass: CALL apoc.import.csv.node(file, labels, …)
  o 2nd pass: CALL apoc.import.csv.relationship(file, type, …)
• Result
  o Many corner cases -> ~700 LOC + tests
  o Covers most use cases, but is very slow
  o APOC PR pending
neo4j-apoc-procedures/pull/581 neo4j-documentation/pull/121
SCOPING FOR OPENCYPHER
• Xtext grammar for the Slizaa software analysis workbench
• Progress in 2018
  o Q1: scope analyser implemented for Cypher grammar of M05
  o Q2 goal: update to M10
slizaa-opencypher-xtext/issues/7
Incremental View Maintenance for openCypher Queries

More Related Content

PPTX
Mapping Graph Queries to PostgreSQL
PDF
What Makes Graph Queries Difficult?
PDF
L3
PDF
RDataMining slides-r-programming
PDF
Introduction to R Programming
PDF
R programming & Machine Learning
PPTX
R programming Fundamentals
PDF
R programming groundup-basic-section-i
Mapping Graph Queries to PostgreSQL
What Makes Graph Queries Difficult?
L3
RDataMining slides-r-programming
Introduction to R Programming
R programming & Machine Learning
R programming Fundamentals
R programming groundup-basic-section-i

What's hot (19)

ODP
Introduction to the language R
PDF
RDataMining slides-network-analysis-with-r
PDF
Bekas for cognitive_speaker_series
PPT
DATA STRUCTURES
PDF
Introduction to data analysis using R
PPTX
R Programming Language
PPTX
Workshop presentation hands on r programming
PDF
RDataMining slides-text-mining-with-r
PDF
PDF
R - the language
PDF
Introduction to the R Statistical Computing Environment
PPTX
R language
PDF
Why functional programming and category theory strongly matters
PPTX
PDF
R programming language: conceptual overview
KEY
Presentation R basic teaching module
PDF
THoSP: an Algorithm for Nesting Property Graphs
PDF
Working with Complex Types in DataFrames: Optics to the Rescue
PDF
Federation and Navigation in SPARQL 1.1
Introduction to the language R
RDataMining slides-network-analysis-with-r
Bekas for cognitive_speaker_series
DATA STRUCTURES
Introduction to data analysis using R
R Programming Language
Workshop presentation hands on r programming
RDataMining slides-text-mining-with-r
R - the language
Introduction to the R Statistical Computing Environment
R language
Why functional programming and category theory strongly matters
R programming language: conceptual overview
Presentation R basic teaching module
THoSP: an Algorithm for Nesting Property Graphs
Working with Complex Types in DataFrames: Optics to the Rescue
Federation and Navigation in SPARQL 1.1
Ad

Similar to Incremental View Maintenance for openCypher Queries (20)

PDF
Realtime Analytics
PDF
Incremental Graph Queries for Cypher
PDF
The inGraph project and incremental evaluation of Cypher queries
PDF
Cs501 rel algebra
PDF
Compiling openCypher graph queries with Spark Catalyst
PDF
A tutorial on EMF-IncQuery
ODP
Graph databases
PPTX
Relational Algebra in Database Systems.pptx
PDF
Formal methods 4 - Z notation
PPT
Introduction to Domain Calculus Notes.ppt
PPT
Temporal PPT details about the platform and its uses
PPT
[ABDO] Data Integration
PDF
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
PPT
lefg sdfg ssdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg d...
PPT
lecture8Alg.ppt
PPT
Relational Algebra and Calculus.ppt
PDF
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
PPTX
The openCypher Project - An Open Graph Query Language
PPT
Unit-2 relational algebra ikgtu DBMS.ppt
PPTX
3-Data_Chjgjjghgjhgjhgjhgjhontrol.pptxghgjg
Realtime Analytics
Incremental Graph Queries for Cypher
The inGraph project and incremental evaluation of Cypher queries
Cs501 rel algebra
Compiling openCypher graph queries with Spark Catalyst
A tutorial on EMF-IncQuery
Graph databases
Relational Algebra in Database Systems.pptx
Formal methods 4 - Z notation
Introduction to Domain Calculus Notes.ppt
Temporal PPT details about the platform and its uses
[ABDO] Data Integration
From Cypher 9 to GQL: Conceptual overview of multiple named graphs and compos...
lefg sdfg ssdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg sdfg d...
lecture8Alg.ppt
Relational Algebra and Calculus.ppt
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
The openCypher Project - An Open Graph Query Language
Unit-2 relational algebra ikgtu DBMS.ppt
3-Data_Chjgjjghgjhgjhgjhgjhontrol.pptxghgjg
Ad

More from Gábor Szárnyas (11)

PDF
GraphBLAS: A linear algebraic approach for high-performance graph queries
PDF
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
PDF
Writing a Cypher Engine in Clojure
PDF
Learning Timed Automata with Cypher
PDF
Időzített automatatanulás Cypherrel
PDF
Parsing process
PDF
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
PDF
Sharded Joins for Scalable Incremental Graph Queries
PDF
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
PPTX
IncQuery-D: Distributed Incremental Graph Queries
PPTX
IncQuery-D: Incremental Queries in the Cloud
GraphBLAS: A linear algebraic approach for high-performance graph queries
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
Writing a Cypher Engine in Clojure
Learning Timed Automata with Cypher
Időzített automatatanulás Cypherrel
Parsing process
Towards the Characterization of Realistic Models: Evaluation of Multidiscipli...
Sharded Joins for Scalable Incremental Graph Queries
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
IncQuery-D: Distributed Incremental Graph Queries
IncQuery-D: Incremental Queries in the Cloud

Recently uploaded (20)

PPT
Mechanical Engineering MATERIALS Selection
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
composite construction of structures.pdf
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PDF
Digital Logic Computer Design lecture notes
PDF
PPT on Performance Review to get promotions
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPT
Project quality management in manufacturing
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
web development for engineering and engineering
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
Lecture Notes Electrical Wiring System Components
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
Geodesy 1.pptx...............................................
PPTX
CH1 Production IntroductoryConcepts.pptx
Mechanical Engineering MATERIALS Selection
R24 SURVEYING LAB MANUAL for civil enggi
composite construction of structures.pdf
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
Digital Logic Computer Design lecture notes
PPT on Performance Review to get promotions
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
CYBER-CRIMES AND SECURITY A guide to understanding
Project quality management in manufacturing
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
Embodied AI: Ushering in the Next Era of Intelligent Systems
web development for engineering and engineering
Automation-in-Manufacturing-Chapter-Introduction.pdf
Lecture Notes Electrical Wiring System Components
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Geodesy 1.pptx...............................................
CH1 Production IntroductoryConcepts.pptx

Incremental View Maintenance for openCypher Queries

  • 1. Incremental View Maintenance for openCypher Queries Gábor Szárnyas, József Marton 4th openCypher Implementers Meeting
  • 2. MODEL-DRIVEN ENGINEERING  Primarily for designing critical systems  Models are first class citizens during development o SysML / requirements, statecharts, etc. o Validation and code generation techniques for correctness Technology: Eclipse Modeling Framework (EMF)  Originally started at IBM as an implementation of the Object Management Group’s (OMG) Meta Object Facility (MOF).  i.e. an object-oriented model  i.e. a property graph-like structure with a metamodel
  • 3. MODEL VALIDATION  Implemented with model queries  Models are typed, attributed graphs  Typical queries o Get two components connected by a particular edge MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s) o Check if two objects are reachable MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s) o Property checks MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y) Complex graph queries
  • 4. 1 switch sensor C sensor B 2 sensor A segment route RAILWAY NETWORK MODEL
  • 6. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight» RAILWAY NETWORK MODEL
  • 7. :FOLLOWS :MONITORED_BY :TARGET:REQUIRES MATCH (route:Route) -[:FOLLOWS]->(swP:SwitchPosition) -[:TARGET]->(sw:Switch) -[:MONITORED_BY]->(sensor:Sensor) WHERE NOT (route)-[:REQUIRES]->(sensor) RETURN route, sensor, swP, sw sw: Switchsensor: Sensor route: Route swP: SwitchPosition G. Szárnyas, B. Izsó, I. Ráth, D. Varró: The Train Benchmark: cross-technology performance evaluation of continuous model queries. Software and Systems Modeling, 2017
  • 8. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 9. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 10. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 11. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 12. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 13. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 14. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 15. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 16. INCREMENTAL VIEW MAINTENANCE (IVM) In many use cases…  queries are static  data changes slowly -> views can be maintained incrementally Graph applications  model validation  simulation  recommendation systems  fraud detection
  • 17. INGRAPH: IVM ON PROPERTY GRAPHS Idea: map to relational algebra and use standard IVM techniques  Challenging aspects o Property graph data model o Cypher language  Formalise the language in relational algebra  Use nested relational algebra -> closed on operations Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks) Gábor Szárnyas, József Marton, Dániel Varró: Formalising openCypher Graph Queries in Relational Algebra. ADBIS 2017
  • 18. INGRAPH / GRAPH TO NESTED RELATIONS
  • 19. INGRAPH / NESTED RELATIONAL ALGEBRA OPS
  • 20. INGRAPH  ingraph uses a procedural IVM approach: the Rete algorithm. o Build caches for each operator o Maintain caches upon changes o Supports 15+ out of 25 LDBC BI queries o Details to be published in a conference paper o Extensible, but very heavy on memory  The rest of the talk focuses on the algebraic approach. Gábor Szárnyas, József Marton et al.: Incremental View Maintenance on Property Graphs. arXiv preprint will be available on the 1st week of June
  • 21. Delta Queries for openCypher
  • 22. DELTA QUERIES AT A GLANCE 𝐺 evaluate query 𝑄 for each Δ𝐺 evaluate Δ𝑄 changes Δ𝐺1, Δ𝐺2, … 𝑄(𝐺) Δ𝑄(Δ𝐺1) Δ𝑄(Δ𝐺2) ⇒ 𝑄(𝐺 + Δ𝐺1 + Δ𝐺2 + ⋯ ) 𝑄 and Δ𝑄 are calculated by the same engine.
  • 23. IMPLEMENTATION: TRIGGERS IN NEO4J  Event-driven programming in databases  Neo4j: TransactionEventHandler interface o afterCommit(TransactionData data, T state) o beforeCommit(TransactionData data) o TransactionData contains Δ𝐺: createdNodes, deletedNodes, …  Only the updated state of the graph is accessible.  GraphAware framework: ImprovedTransactionData API o Get properties and labels/types of deleted elements Max de Marzi: Triggers in Neo4j. 2015 Michal Bachman: Neo4j Improved Transaction Event API. 2014
  • 24. DERIVING DELTA QUERIES a b 1 2 3 4 5 6 7 8 a b 1 2 5 6 7 8 𝑅 𝑚 Idea: given query 𝑄, derive delta queries Δ𝑄 and 𝛻𝑄, which define positive and negative changes, respectively. But: most IVM techniques are defined for relational algebra. Notation:  𝑅 relation  Δ𝑅 positive changes  𝛻𝑅 negative changes  𝑅 𝑚 : maintained relation of 𝑅 ⇒ 𝑅 𝑚 = 𝑅 − 𝛻𝑅 + Δ𝑅  “−” denotes set minus (∖), “+” denotes set union (∪) a b 1 2 3 4 5 6 𝑅 𝛻𝑅 Δ𝑅
  • 25. RELATIONAL ALGEBRA FOR CYPHER  Query plans in Neo4j ≅ relational algebra + Expand/VarExpand.  Expand is essentially a natural join.  Natural join 𝑟 ⋈ 𝑠  Semijoin 𝑟 ⋉ 𝑠 = 𝜋 𝑅 𝑟 ⋈ 𝑠  Antijoin 𝑟 ഥ⋉ 𝑠 = 𝑟 ∖ 𝑟 ⋉ 𝑠  Left outer join 𝑟 ⟕ 𝑠 ≅ 𝑟 ⋈ 𝑠 ∪ 𝑟 ഥ⋉ 𝑠 //plus nulls Andrés Taylor: Neo4j Cypher implementation. First openCypher Implementers Meeting, 2017
  • 26. RELATIONAL ALGEBRA FOR CYPHER
    [Figure: example graph with edges (1)-[:r]->(2), (4)-[:r]->(5), (2)-[:s]->(3), (2)-[:s]->(6)]
     Natural join: \(r \bowtie s\)
      MATCH (v1)-[:r]->(v2)-[:s]->(v3) RETURN *
      Result (v1, v2, v3): (1, 2, 3), (1, 2, 6)
     Semijoin: \(r \ltimes s\)
      MATCH (v1)-[:r]->(v2) WHERE (v2)-[:s]->() RETURN v1, v2
      Result (v1, v2): (1, 2)
     Antijoin: \(r \overline{\ltimes} s\)
      MATCH (v1)-[:r]->(v2) WHERE NOT (v2)-[:s]->() RETURN v1, v2
      Result (v1, v2): (4, 5)
     Left outer join: \(r \mathbin{⟕} s\)
      MATCH (v1)-[:r]->(v2) OPTIONAL MATCH (v2)-[:s]->(v3) RETURN v1, v2, v3
      Result (v1, v2, v3): (1, 2, 3), (1, 2, 6), (4, 5, null)
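The four operators can also be written down directly over sets of rows. The sketch below is an illustration in Python, not the engine used in the talk: rows are frozensets of (attribute, value) pairs, all function names are hypothetical, and the sample data assumes the edges r = {(1, 2), (4, 5)} and s = {(2, 3), (2, 6)}.

```python
def row(**attrs):
    """Build a hashable row from attribute=value pairs."""
    return frozenset(attrs.items())


def matches(x, y):
    """True if rows x and y agree on every attribute they share."""
    dx = dict(x)
    return all(dx.get(k, v) == v for k, v in y)


def natural_join(r, s):
    return {x | y for x in r for y in s if matches(x, y)}


def semijoin(r, s):
    return {x for x in r if any(matches(x, y) for y in s)}


def antijoin(r, s):
    return r - semijoin(r, s)


def left_outer_join(r, s, s_only_attrs):
    """Natural join plus the unmatched rows of r, padded with None ("plus nulls")."""
    padding = frozenset((k, None) for k in s_only_attrs)
    return natural_join(r, s) | {x | padding for x in antijoin(r, s)}


# Assumed example data: :r edges as (v1, v2), :s edges as (v2, v3).
r = {row(v1=1, v2=2), row(v1=4, v2=5)}
s = {row(v2=2, v3=3), row(v2=2, v3=6)}

for name, result in [
    ("join", natural_join(r, s)),
    ("semijoin", semijoin(r, s)),
    ("antijoin", antijoin(r, s)),
    ("left outer join", left_outer_join(r, s, ["v3"])),
]:
    print(name, sorted(sorted(x) for x in result))
```

The left outer join needs the list of attributes that only `s` contributes, because an empty `s` carries no schema information in this representation.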
  • 27. DERIVING DELTA QUERIES
    T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998
    T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995
    T. Griffin, L. Libkin, H. Trickey: An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions. TKDE 1997
    X. Qian, G. Wiederhold: Incremental Recomputation of Active Relational Expressions. TKDE 1991
  • 28. DELTA QUERIES
    T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995
     The seminal paper
     \(\Delta\)/\(\nabla\) delta queries for joins, selections, projections, etc.
     Bag semantics
  • 29. DELTA QUERIES
    T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998
     Semijoins, antijoins, outer joins
     Set semantics
     Later publications, e.g. Zhou and Larson’s ICDE’07 paper, improved on these
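  • 30. EXAMPLE QUERY #1
    MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
    RETURN v1, v2, v3, v4
    Edge relations: \(a(v_1, v_2)\), \(b(v_2, v_3)\), \(c(v_3, v_4)\)
    Relational algebra expression: \(a \bowtie b \bowtie c\)
    \(\Delta(a \bowtie b \bowtie c)\)
    \(= (a \bowtie b)^m \bowtie \Delta c + \Delta(a \bowtie b) \bowtie c^m\)
    \(= (a \bowtie b)^m \bowtie \Delta c + (a^m \bowtie \Delta b) \bowtie c^m + (\Delta a \bowtie b^m) \bowtie c^m\)
    \(= a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m\)
    \(\nabla(a \bowtie b \bowtie c)\) is derived similarly.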
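The three-term positive delta can be checked against recomputation on a small example. This is a sketch in Python under set semantics with insert-only changes; the edge data and the helper names (`rel`, `join`, `delta_join3`) are assumptions made for the illustration.

```python
def rel(src, dst, pairs):
    """Edge relation: one row per (source, target) pair, keyed by attribute name."""
    return {frozenset({(src, s), (dst, d)}) for s, d in pairs}


def matches(x, y):
    dx = dict(x)
    return all(dx.get(k, v) == v for k, v in y)


def join(*relations):
    """Natural join of any number of relations, folded left to right."""
    result = {frozenset()}
    for relation in relations:
        result = {x | y for x in result for y in relation if matches(x, y)}
    return result


def delta_join3(a_m, b_m, c_m, da, db, dc):
    """Positive delta of a join b join c, using only maintained relations and deltas."""
    return join(a_m, b_m, dc) | join(a_m, db, c_m) | join(da, b_m, c_m)


# Assumed old state and insert-only changes.
a, b, c = rel("v1", "v2", [(1, 2)]), rel("v2", "v3", [(2, 3)]), rel("v3", "v4", [(3, 4)])
da, db, dc = rel("v1", "v2", [(5, 2)]), set(), rel("v3", "v4", [(3, 6)])
a_m, b_m, c_m = a | da, b | db, c | dc

old_view = join(a, b, c)
delta = delta_join3(a_m, b_m, c_m, da, db, dc)
print(len(old_view), len(delta), len(join(a_m, b_m, c_m)))
```

In this example the old view plus the delta equals the view recomputed on the maintained relations, and the row that combines the new `a` edge with the new `c` edge is produced by two of the three terms, which is harmless under set semantics.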
  • 31. EXAMPLE QUERY #1
    MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
    RETURN v1, v2, v3, v4
    Edge relations: \(a(v_1, v_2)\), \(b(v_2, v_3)\), \(c(v_3, v_4)\)
    \(\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m\)
    UNWIND $pcs AS pc
    MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
    RETURN v1, v2, v3, v4
    $pcs -> pass lists of nodes/edges as parameters
    // This only works in embedded mode, see neo4j/issues/10239
  • 32. POSITIVE DELTA QUERY FOR \(a \bowtie b \bowtie c\)
    // r1 = a⋈b⋈Δc
    UNWIND $pcs AS pc
    MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
    RETURN v1, v2, v3, v4
    UNION ALL
    // r2 = a⋈Δb⋈c
    UNWIND $pbs AS pb
    MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
    RETURN v1, v2, v3, v4
    UNION ALL
    // r3 = Δa⋈b⋈c
    UNWIND $pas AS pa
    MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
    RETURN v1, v2, v3, v4
  • 33. POSITIVE DELTA QUERY FOR \(a \bowtie b \bowtie c\)
    Long WITH chains are cumbersome -> patterns + list comprehensions.
    WITH
      [pc IN $pcs | // r1 = a⋈b⋈Δc
        [(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) | [v1, v2, v3, v4]]] +
      [pb IN $pbs | // r2 = a⋈Δb⋈c
        [(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] +
      [pa IN $pas | // r3 = Δa⋈b⋈c
        [(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] AS rss
    UNWIND rss AS rs // one list of rows per changed edge
    UNWIND rs AS r // one row [v1, v2, v3, v4]
    RETURN r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4
  • 34. EXAMPLE QUERY #2
    MATCH (route:Route)
      -[:FOLLOWS]->(swP:SwitchPosition)
      -[:TARGET]->(sw:Switch)
      -[:MONITORED_BY]->(sensor:Sensor)
    WHERE NOT (route)-[:REQUIRES]->(sensor)
    RETURN route, sensor, swP, sw
    MATCH (v1)
      -[:a]->(v2)
      -[:b]->(v3)
      -[:c]->(v4)
    WHERE NOT (v1)-[:d]->(v4)
    RETURN v1, v2, v3, v4
    [Figure: the two graph patterns; route, swP, sw, sensor correspond to v1, v2, v3, v4, and :FOLLOWS, :TARGET, :MONITORED_BY, :REQUIRES to :a, :b, :c, :d]
  • 35. NEGATIVE CONDITIONS
    MATCH (v1)
      -[:a]->(v2)
      -[:b]->(v3)
      -[:c]->(v4)
    WHERE NOT (v1)-[:d]->(v4)
    RETURN v1, v2, v3, v4
    Edge relations: \(a(v_1, v_2)\), \(b(v_2, v_3)\), \(c(v_3, v_4)\), \(d(v_1, v_4)\)
    \(\Rightarrow (a \bowtie b \bowtie c) \overline{\ltimes} d\)
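  • 36. DELTA QUERIES FOR JOINS AND ANTIJOINS
    Natural join
     \(\Delta(S \bowtie T) = \Delta S \bowtie T^m + S^m \bowtie \Delta T\)
     \(\nabla(S \bowtie T) = \nabla S \bowtie T + S \bowtie \nabla T\)
    Antijoin
     \(\Delta(S \overline{\ltimes} T) = (S - \nabla S) \ltimes (\nabla T \overline{\ltimes} T^m) + \Delta S \overline{\ltimes} T^m\)
     \(\nabla(S \overline{\ltimes} T) = (S - \nabla S) \ltimes (\Delta T \overline{\ltimes} T) + \nabla S \overline{\ltimes} T\)
    [Callouts on the antijoin rules: Expression 1, Expression 2]
    Only \(S^m\) and \(T^m\) are available.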
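  • 37. SUBEXPRESSIONS
    1. \(\Delta T \overline{\ltimes} T = {?}\)
     \(R_1 \overline{\ltimes} R_2\), where \(R_1\) and \(R_2\) both have schema \(R\).
     \(R_1 \overline{\ltimes}_\theta R_2 = R_1 - \pi_R(R_1 \bowtie_\theta R_2) = R_1 - (R_1 \bowtie_\theta R_2)\)
     If \(\theta\) defines equality on all attributes of \(R\), the theta join (\(\bowtie_\theta\)) becomes a natural join, which is an intersection for relations with the same schema.
     \(R_1 \bowtie_\theta R_2 = R_1 \bowtie R_2 = R_1 \cap R_2\)
     \(R_1 - (R_1 \bowtie_\theta R_2) = R_1 - (R_1 \cap R_2) = R_1 - R_2\)
    \(\Rightarrow\) (∗) \(R_1 \overline{\ltimes}_\theta R_2 = R_1 - R_2\)
    2. \(R^m = R - \nabla R + \Delta R\) \(\Rightarrow\) (∗∗) \(R = R^m - \Delta R + \nabla R\)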
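  • 38. DELTAS FOR ANTIJOINS
    Based on Griffin-Kumar’s ’98 paper.
    (∗) \(R_1 \overline{\ltimes}_\theta R_2 = R_1 - R_2\)
    (∗∗) \(R = R^m - \Delta R + \nabla R\)
    Positive changes:
     \(\Delta(S \overline{\ltimes} T) = (S - \nabla S) \ltimes (\nabla T \overline{\ltimes} T^m) + \Delta S \overline{\ltimes} T^m\)
     By (∗), \(\nabla T \overline{\ltimes} T^m = \nabla T - T^m = \nabla T\), because deleted tuples are not in \(T^m\): \(\Delta(S \overline{\ltimes} T) = (S - \nabla S) \ltimes \nabla T + \Delta S \overline{\ltimes} T^m\)
     By (∗∗): \(\Delta(S \overline{\ltimes} T) = (S^m - \Delta S) \ltimes \nabla T + \Delta S \overline{\ltimes} T^m\)
    Negative changes:
     \(\nabla(S \overline{\ltimes} T) = (S - \nabla S) \ltimes (\Delta T \overline{\ltimes} T) + \nabla S \overline{\ltimes} T\)
     By (∗), \(\Delta T \overline{\ltimes} T = \Delta T - T = \Delta T\), because inserted tuples are not in \(T\): \(\nabla(S \overline{\ltimes} T) = (S - \nabla S) \ltimes \Delta T + \nabla S \overline{\ltimes} T\)
     By (∗∗): \(\nabla(S \overline{\ltimes} T) = (S^m - \Delta S) \ltimes \Delta T + \nabla S \overline{\ltimes} (T^m - \Delta T + \nabla T)\)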
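The simplified antijoin rules can be sanity-checked by brute force on random changes. The Python sketch below makes several assumptions that are not spelled out on the slide: S rows are (v1, v4, payload) triples, T rows are (v1, v4) pairs so that every attribute of T is a join attribute, deleted tuples come from the old relation, and inserted tuples are new. In the negative rule the old relation T is written as T^m − ΔT + ∇T, following identity (∗∗). All function names are hypothetical.

```python
import itertools
import random


def semijoin(s, t):
    """Rows of s whose (v1, v4) key appears in t."""
    return {x for x in s if x[:2] in t}


def antijoin(s, t):
    """Rows of s whose (v1, v4) key does not appear in t."""
    return {x for x in s if x[:2] not in t}


def delta_antijoin(s_m, t_m, ds, nt):
    """Positive changes: (S^m - dS) semijoin nabla T, plus dS antijoin T^m."""
    return semijoin(s_m - ds, nt) | antijoin(ds, t_m)


def nabla_antijoin(s_m, t_m, ds, ns, dt, nt):
    """Negative changes: (S^m - dS) semijoin dT, plus nabla S antijoin old T."""
    t_old = (t_m - dt) | nt  # identity (**) applied to T
    return semijoin(s_m - ds, dt) | antijoin(ns, t_old)


def random_change(universe, rng):
    """Pick an old relation plus disjoint deletions (from old) and insertions (new)."""
    old, neg, pos = set(), set(), set()
    for item in universe:
        state = rng.choice(["absent", "kept", "deleted", "inserted"])
        if state in ("kept", "deleted"):
            old.add(item)
        if state == "deleted":
            neg.add(item)
        if state == "inserted":
            pos.add(item)
    return old, neg, pos


def check(trials=200, seed=0):
    """Compare incremental maintenance with recomputation on random inputs."""
    rng = random.Random(seed)
    keys = list(itertools.product(range(3), repeat=2))
    s_universe = [k + (p,) for k in keys for p in range(2)]
    for _ in range(trials):
        s, ns, ds = random_change(s_universe, rng)
        t, nt, dt = random_change(keys, rng)
        s_m, t_m = (s - ns) | ds, (t - nt) | dt
        pos = delta_antijoin(s_m, t_m, ds, nt)
        neg = nabla_antijoin(s_m, t_m, ds, ns, dt, nt)
        if (antijoin(s, t) - neg) | pos != antijoin(s_m, t_m):
            return False
    return True


print(check())
```

The check applies the negative changes before the positive ones, matching the order in the definition of the maintained relation.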
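  • 39. NEGATIVE CONDITIONS
    Edge relations: \(a(v_1, v_2)\), \(b(v_2, v_3)\), \(c(v_3, v_4)\), \(d(v_1, v_4)\)
    \(\Delta((a \bowtie b \bowtie c) \overline{\ltimes} d) = ((a \bowtie b \bowtie c)^m - \Delta(a \bowtie b \bowtie c)) \ltimes \nabla d + \Delta(a \bowtie b \bowtie c) \overline{\ltimes} d^m\)
    First term: \((a \bowtie b \bowtie c)^m \ltimes \nabla d - \Delta(a \bowtie b \bowtie c) \ltimes \nabla d\)
     \((a \bowtie b \bowtie c)^m \ltimes \nabla d = (a^m \bowtie b^m \bowtie c^m) \ltimes \nabla d\)
     Pushdown \(\nabla d\): \((a^m \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)\)
     \(\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m\)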
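  • 40. NEGATIVE CONDITIONS
    Edge relations: \(a(v_1, v_2)\), \(b(v_2, v_3)\), \(c(v_3, v_4)\), \(d(v_1, v_4)\)
    \(\Delta((a \bowtie b \bowtie c) \overline{\ltimes} d)\)
    \(= (a \ltimes \nabla d) \bowtie b \bowtie (c \ltimes \nabla d)\)
    \(\quad - (a \bowtie b \bowtie \Delta c + a \bowtie \Delta b \bowtie c + \Delta a \bowtie b \bowtie c) \ltimes \nabla d\)
    \(\quad + (a \bowtie b \bowtie \Delta c + a \bowtie \Delta b \bowtie c + \Delta a \bowtie b \bowtie c) \overline{\ltimes} d\)
    \(= (a \ltimes \nabla d) \bowtie b \bowtie (c \ltimes \nabla d)\)
    \(\quad - (a \ltimes \nabla d) \bowtie b \bowtie (\Delta c \ltimes \nabla d)\)
    \(\quad - (a \ltimes \nabla d) \bowtie \Delta b \bowtie (c \ltimes \nabla d)\)
    \(\quad - (\Delta a \ltimes \nabla d) \bowtie b \bowtie (c \ltimes \nabla d)\)
    \(\quad + (a \bowtie b \bowtie \Delta c) \overline{\ltimes} d\)
    \(\quad + (a \bowtie \Delta b \bowtie c) \overline{\ltimes} d\)
    \(\quad + (\Delta a \bowtie b \bowtie c) \overline{\ltimes} d\)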
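  • 41. NEGATIVE CONDITIONS
    Edge relations: \(a(v_1, v_2)\), \(b(v_2, v_3)\), \(c(v_3, v_4)\), \(d(v_1, v_4)\)
    \(x \ltimes \nabla d\): \(v \in \nabla d.v\), where \(x \cap d\) is a single vertex \(v\), because \(x\) and \(d\) represent edges.
    Semijoin subexpressions:
     S1: \(a \ltimes \nabla d\), i.e. \(v_1 \in \nabla d.v_1\)
     S2: \(\Delta a \ltimes \nabla d\), i.e. \(v_1 \in \nabla d.v_1\)
     S3: \(c \ltimes \nabla d\), i.e. \(v_4 \in \nabla d.v_4\)
     S4: \(\Delta c \ltimes \nabla d\), i.e. \(v_4 \in \nabla d.v_4\)
    Terms of \(\Delta((a \bowtie b \bowtie c) \overline{\ltimes} d)\):
     R1: \(+ (a \ltimes \nabla d) \bowtie b \bowtie (c \ltimes \nabla d)\)
     R2: \(- (a \ltimes \nabla d) \bowtie b \bowtie (\Delta c \ltimes \nabla d)\)
     R3: \(- (a \ltimes \nabla d) \bowtie \Delta b \bowtie (c \ltimes \nabla d)\)
     R4: \(- (\Delta a \ltimes \nabla d) \bowtie b \bowtie (c \ltimes \nabla d)\)
     R5: \(+ (a \bowtie b \bowtie \Delta c) \overline{\ltimes} d\)
     R6: \(+ (a \bowtie \Delta b \bowtie c) \overline{\ltimes} d\)
     R7: \(+ (\Delta a \bowtie b \bowtie c) \overline{\ltimes} d\)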
  • 42. POSITIVE DELTA QUERY FOR \((a \bowtie b \bowtie c) \overline{\ltimes} d\)
    WITH [] AS pes, [] AS nes
    WITH
      [pe IN pes WHERE type(pe) = 'a' | pe] AS pas,
      [pe IN pes WHERE type(pe) = 'b' | pe] AS pbs,
      [pe IN pes WHERE type(pe) = 'c' | pe] AS pcs,
      [pe IN pes WHERE type(pe) = 'd' | pe] AS pds,
      [ne IN nes WHERE type(ne) = 'a' | ne] AS nas,
      [ne IN nes WHERE type(ne) = 'b' | ne] AS nbs,
      [ne IN nes WHERE type(ne) = 'c' | ne] AS ncs,
      [ne IN nes WHERE type(ne) = 'd' | ne] AS nds
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds,
      [nd IN nds | startNode(nd)] AS nd_v1s,
      [nd IN nds | endNode(nd)] AS nd_v4s
    // calculating s1s...s4s
    // s1s: (𝑎⋉∇𝑑)
    UNWIND nd_v1s AS v1
    MATCH (v1)-[:a]->(v2)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s,
      collect({v1: v1, v2: v2}) AS s1s
    // s2s: (Δ𝑎⋉∇𝑑)
    UNWIND pas AS pa
    MATCH (v1)-[pa]->(v2)
    WHERE v1 IN nd_v1s
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s,
      collect({v1: v1, v2: v2}) AS s2s
    // s3s: (𝑐⋉∇𝑑)
    UNWIND nd_v4s AS v4
    MATCH (v3)-[:c]->(v4)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s,
      collect({v3: v3, v4: v4}) AS s3s
    // s4s: (Δ𝑐⋉∇𝑑)
    UNWIND pcs AS pc
    MATCH (v3)-[pc]->(v4)
    WHERE v4 IN nd_v4s
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s,
      collect({v3: v3, v4: v4}) AS s4s
    // calculating r1...r7
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s
    // r1: (𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
    UNWIND s1s AS s1
    UNWIND s3s AS s3
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
      s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
    WHERE (v2)-[:b]->(v3)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
      collect([v1, v2, v3, v4]) AS r1
    // r2: -(𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (Δ𝑐⋉∇𝑑)
    UNWIND s1s AS s1
    UNWIND s4s AS s4
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1,
      s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4 AS v4
    WHERE (v2)-[:b]->(v3)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1,
      collect([v1, v2, v3, v4]) AS r2
    // r3: -(𝑎⋉∇𝑑) ⋈ Δ𝑏 ⋈ (𝑐⋉∇𝑑)
    UNWIND s1s AS s1
    UNWIND s3s AS s3
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2,
      s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
    MATCH (v2)-[b:b]->(v3)
    WHERE b IN pbs
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2,
      collect([v1, v2, v3, v4]) AS r3
    // r4: -(Δ𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
    UNWIND s2s AS s2
    UNWIND s3s AS s3
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3,
      s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
    WHERE (v2)-[:b]->(v3)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3,
      collect([v1, v2, v3, v4]) AS r4
    // r5: 𝑎 ⋈ 𝑏 ⋈ Δ𝑐 ̅⋉ 𝑑
    UNWIND pcs AS pc
    MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
    WHERE NOT (v1)-[:d]->(v4)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4,
      collect([v1, v2, v3, v4]) AS r5
    // r6: 𝑎 ⋈ Δ𝑏 ⋈ 𝑐 ̅⋉ 𝑑
    UNWIND pbs AS pb
    MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
    WHERE NOT (v1)-[:d]->(v4)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5,
      collect([v1, v2, v3, v4]) AS r6
    // r7: Δ𝑎 ⋈ 𝑏 ⋈ 𝑐 ̅⋉ 𝑑
    UNWIND pas AS pa
    MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
    WHERE NOT (v1)-[:d]->(v4)
    WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5, r6,
      collect([v1, v2, v3, v4]) AS r7
    WITH r1 + r5 + r6 + r7 AS rp, r2 + r3 + r4 AS rn
    RETURN [r IN rp WHERE NOT r IN rn] AS results
  • 43. POSITIVE DELTA QUERY FOR \((a \bowtie b \bowtie c) \overline{\ltimes} d\)
     Workaround: knowing the change workload helps
      o Only consider changes in \(\Delta d\) and \(\nabla d\)
      o Query is cleaner and much more efficient
    UNWIND $nds AS nd
    WITH startNode(nd) AS v1, endNode(nd) AS v4
    MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
    RETURN v1, v2, v3, v4
     This can outperform recomputing the query from scratch.
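The same workaround can be sketched outside Cypher. The Python example below assumes that only :d edges change, that each edge type is a set of (source, target) pairs with at most one edge of a given type between two nodes, and that `positive_delta` is a hypothetical helper name. It starts from the end nodes of each deleted :d edge, mirroring the UNWIND over the deleted edges.

```python
from collections import defaultdict


def positive_delta(a, b, c, nabla_d):
    """New matches of (a join b join c) antijoin d caused by deleted :d edges."""
    a_out = defaultdict(set)  # v1 -> set of v2 reachable via an :a edge
    for v1, v2 in a:
        a_out[v1].add(v2)
    c_in = defaultdict(set)   # v4 -> set of v3 that reach it via a :c edge
    for v3, v4 in c:
        c_in[v4].add(v3)

    result = set()
    for v1, v4 in nabla_d:                 # one deleted :d edge at a time
        for v2 in a_out.get(v1, ()):
            for v3 in c_in.get(v4, ()):
                if (v2, v3) in b:          # close the path with a :b edge
                    result.add((v1, v2, v3, v4))
    return result


# Assumed example: one a-b-c path from node 1 to node 4, blocked by a :d edge.
a, b, c = {(1, 2)}, {(2, 3)}, {(3, 4)}
nabla_d = {(1, 4)}                         # the blocking :d edge is deleted

print(positive_delta(a, b, c, nabla_d))    # prints {(1, 2, 3, 4)}
```

Only paths whose end points carry a deleted :d edge are visited, so the work is proportional to the neighbourhood of the change rather than to the size of the graph.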
  • 44. LIST OPERATIONS IN CYPHER
    Instead of subqueries, use chained queries and combine lists.
    WITH [1, 1, 2, 2, 3] AS xs
    UNWIND xs AS x
    RETURN collect(DISTINCT x) AS unique
    WITH [1, 2, 3] AS xs, [2] AS ys
    RETURN
      xs + ys AS append,
      [x IN xs WHERE NOT x IN ys] AS subtraction,
      [x IN xs WHERE x IN ys] AS intersection
    Get unique list in openCypher:
    WITH [1, 1, 2, 2, 3] AS xs
    RETURN reduce(acc = [], x IN xs |
      acc + CASE x IN acc WHEN false THEN [x] ELSE [] END) AS unique
  • 45. DELTA QUERIES IN CYPHER
    Delta queries are complex. Features that would be nice:
     Subqueries // pattern comprehensions go some way
     Named subqueries // help reusability
     Subtracting lists // related: CIR-2017-180
     Use collection elements or function results for matching
    These are probably too much to ask. -> Recommended approach: compile directly to query plans.
    Matching on a collection element directly is not valid:
    MATCH (n) WITH collect(n) AS ns MATCH (ns[0]) RETURN *
    Binding the element to a variable first works:
    MATCH (n) WITH collect(n) AS ns WITH ns[0] AS x MATCH (x) RETURN *
  • 46. CHALLENGES FOR PROPERTY GRAPH QUERIES
    Data model
     NF2 (Non-First Normal Form): maps, lists
     No schema (schema-optionality)
     Graph structure
    Queries
     Nulls, antijoins and left outer joins
     Updates on property values
     Aggregates on aggregates, non-distributive functions
     Ordering + skip/limit
     Reachability queries
  • 47. CHALLENGES FOR PROPERTY GRAPH QUERIES
    Data model
     NF2
     No schema
     Graph
    Queries
     Nulls
     Updates
     Aggregates
     Ordering
     Reachability
    Decades of research -> 2 long surveys
    A. Gupta, I. S. Mumick: Materialized Views. MIT Press, 1999
    R. Chirkova, J. Yang: Materialized Views. Foundations and Trends in Databases, 2012
  • 48. OUR SURVEY OF RELATED IVM TECHNIQUES
  • 49. DBTOASTER
     Shows the scale of the problem
     Relational data model and SQL queries
     R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.
     Approach
      o Queries over an algebraic ring
      o Higher-order recursive IVM
     Compiler in OCaml
     Backend with code generation for C++, Scala/Spark
    Christoph Koch et al.: DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. VLDB Journal 2014
  • 50. FUTURE DIRECTIONS
     Work out derivation rules for Expand/VarExpand, …
     Automate delta query derivation
     Integrate into Neo4j
     Run performance experiments
      o Train Benchmark (set semantics)
      o LDBC Social Network Benchmark’s BI workload (bag semantics)
  • 52. LDBC BENCHMARKS
     Social Network Benchmark
      o Business Intelligence workload published
      o openCypher reference implementation: ldbc/ldbc_snb_implementations
      o Next goal: full conference paper
     Graphalytics
      o Competition is online at graphalytics.org
      o Neo4j implementation (using the Graph Algorithms library) WIP: graphalytics-platforms-neo4j/pull/6
    Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.: An early look at the LDBC Social Network Benchmark’s BI Workload. GRADES-NDA at SIGMOD, 2018
  • 53. GRAPH ANALYTICS ON THE PANAMA PAPERS
     Network science approach: multidimensional graph metrics from social network analysis, biology, physics, etc.
     Our work originally targeted software and system models.
     Progress in 2018
      o Q1: implemented adapters for Neo4j and CSV
      o Q2 goal: analyse the Panama Papers using these metrics
    Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró: Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics, MODELS 2016
    ftsrg/model-analyzer
  • 54. MAPPING CYPHER TO SQL
     Evaluate graph queries in an RDB, similar to ORM
     Approaches
      o Cytosm: Cypher to SQL Mapper / gTop: graph topology
      o GraphGen: extracting graphs from RDBs
      o Ongoing work to map TCK to SQLite
    B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne: Cytosm: Declarative Property Graph Queries Without Data Migration. GRADES 2017
    cytosm/cytosm
    K. Xirogiannopoulos, V. Srinivas, A. Deshpande: GraphGen: Adaptive Graph Processing using Relational Databases. GRADES 2017
    KonstantinosX/graphgen-project
  • 55. NEO4J APOC LIBRARY
     CSV loader that follows the schema of the neo4j-import tool
     Goal
      o Use headers to generate LOAD CSV commands.
      o 1st pass: CALL apoc.import.csv.node(file, labels, …)
      o 2nd pass: CALL apoc.import.csv.relationship(file, type, …)
     Result
      o Many corner cases -> ~700 LOC + tests
      o Covers most use cases, but is very slow
      o APOC PR pending
    neo4j-apoc-procedures/pull/581
    neo4j-documentation/pull/121
  • 56. SCOPING FOR OPENCYPHER
     Xtext grammar for the Slizaa software analysis workbench
     Progress in 2018
      o Q1: scope analyser implemented for Cypher grammar of M05
      o Q2 goal: update to M10
    slizaa-opencypher-xtext/issues/7