Cypher-based Graph Pattern Matching in Apache Flink
About us
Martin Junghanns
Software Engineer @ Neo4j
PhD Student @ University of Leipzig
Max Kiessling
Software Engineer @ Neo4j
Motivation
Graph = Set of Vertices + Set of Edges
A Real World Example - OrcBook
Sauron’s Data Analyst: “Who are the closest enemies of each Orc?”
What can you do with it?
Cypher
Flink Gelly
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Graph Pattern Matching in Apache Flink
Cypher
MATCH (c1:Clan)<-[:LEADER_OF]-(o1:Orc),
(o1)-[:HATES]->(o2:Orc),
(o2)-[:LEADER_OF]->(c2:Clan),
(o2)-[:KNOWS*1..10]->(h:Hobbit)
WHERE NOT (c1 = c2 AND o1 = o2)
AND h.name = "Frodo"
RETURN o1.name, o2.name
Flink Gelly or any other non-declarative graph processing system
Cypher
Graph Fundamentals - Property Graphs
:Orc
name: Helmut
weapon: Axe
weight: 199
:Hobbit
name: Bilbo
job: Burglar
:Clan
name: Saruman
members: 1337
:HATES
:LEADER_OF
since: 3204
:Hobbit
name: Sam
age: 42
:KNOWS
What is Cypher?
Neo4j’s declarative graph query language
● Used to insert, update and retrieve data from Neo4j
● Designed to be easily understood by people with SQL background
● Support for Pattern Matching, Filtering, Aggregation, Projection
● Results are (multidimensional) Tables
Specified, maintained and extended in the openCypher project
by several academic and industry contributors.
http://www.opencypher.org/
Cypher Query Syntax
MATCH (c1:Clan)<-[:LEADER_OF]-(o1:Orc),
(o1)-[:HATES]->(o2:Orc),
(o2)-[:LEADER_OF]->(c2:Clan),
(o2)-[:KNOWS*1..10]->(h:Hobbit)
WHERE NOT (c1 = c2 AND o1 = o2)
AND h.name = "Frodo"
RETURN o1.name, o2.name
MATCH: Describes the pattern that should be matched
(c1:Clan): Find all vertices with label Clan and assign them to c1
<-[:LEADER_OF]-: Traverse incoming edges of type LEADER_OF
WHERE: Filters the match results
RETURN: Specifies which fields to return
Cypher Operator in Gradoop
Apache Flink (DataSet API)
Apache Flink Operator Implementation
Extended Property Graph Model (EPGM)
Graph Dataflow Operators
I/O
Cypher
Gradoop: "An open-source graph dataflow framework for declarative analytics of heterogeneous graph data."
Extended Property Graph Model (EPGM)
Logical graphs (graph heads):
1 | Area | title: Mordor
2 | Area | title: Shire
[Figure: vertices 1-5 and their membership in the two logical graphs]
:Hobbit
name : Samwise
:Orc
name : Azog
:Clan
name : Tribes of Moria
founded : 1981
:Orc
name : Bolg
:Hobbit
name : Frodo
yob : 2968
:LEADER_OF
since : 2790
:MEMBER_OF
since : 2013
:HATES
since : 2301
:HATES
:KNOWS
since : 2990
= Property Graph Model
+ Logical Graphs
+ Graph Transformations
○ Subgraph
○ Aggregation
○ Transformation
○ Grouping
○ Cypher
○ ...
Cypher Operator
LogicalGraph graph1 = new CSVDataSource("hdfs:///path/to/graph", conf).getLogicalGraph();
String pattern = "MATCH (a:Green)-[:orange]->(b:Orange)";
GraphCollection collection = graph1.cypher(pattern);
Pattern: a -> b
Graph Collection: 2|map:{a:1, b:2}, 3|map:{a:3, b:4}
[Figure: input graph 1 (vertices 1-5) and the resulting graphs 2 (vertices 1, 2) and 3 (vertices 3, 4)]
Internal Representation
Input graph 1:
DataSet<Vertex>
id label
1 Green
2 Orange
3 Green
4 Orange
5 Orange
DataSet<Edge>
id src trgt label
10 1 2 ORANGE
20 3 2 BLUE
30 3 4 ORANGE
40 3 5 BLUE
Pattern: (a)-[e]->(b)
Result graph collection: 2|map:{a:1,e:10,b:2}, 3|map:{a:3,e:30,b:4}
DataSet<Vertex>
id label graphs
1 Green {2}
2 Orange {2}
3 Green {3}
4 Orange {3}
DataSet<Edge>
id src trgt label graphs
10 1 2 ORANGE {2}
30 3 4 ORANGE {3}
DataSet<GraphHead>
id label properties
2 Green map:{a:1, e:10, b:2}
3 Orange map:{a:3, e:30, b:4}
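The mapping from the input tables to the result bindings can be sketched in plain Java, without Flink or Gradoop. The class and method names below are hypothetical, and the edge label is written as ORANGE as in the edge table (the pattern string on the slide uses lowercase). This is only an illustration of what the operator computes, not how Gradoop computes it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the vertex and edge data sets; not Gradoop API.
final class TinyPatternMatch {

    // Vertex table from the slide: id -> label.
    static Map<Integer, String> vertices() {
        Map<Integer, String> v = new LinkedHashMap<>();
        v.put(1, "Green");
        v.put(2, "Orange");
        v.put(3, "Green");
        v.put(4, "Orange");
        v.put(5, "Orange");
        return v;
    }

    // Edge table from the slide: rows of {id, source, target} plus one label per row.
    static final int[][] EDGES = {{10, 1, 2}, {20, 3, 2}, {30, 3, 4}, {40, 3, 5}};
    static final String[] EDGE_LABELS = {"ORANGE", "BLUE", "ORANGE", "BLUE"};

    // Returns one binding {a, e, b} per match of (a:Green)-[e:ORANGE]->(b:Orange).
    static List<Map<String, Integer>> match() {
        Map<Integer, String> v = vertices();
        List<Map<String, Integer>> result = new ArrayList<>();
        for (int i = 0; i < EDGES.length; i++) {
            int id = EDGES[i][0];
            int src = EDGES[i][1];
            int trg = EDGES[i][2];
            if (EDGE_LABELS[i].equals("ORANGE")
                    && "Green".equals(v.get(src))
                    && "Orange".equals(v.get(trg))) {
                Map<String, Integer> binding = new LinkedHashMap<>();
                binding.put("a", src);
                binding.put("e", id);
                binding.put("b", trg);
                result.add(binding);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [{a=1, e=10, b=2}, {a=3, e=30, b=4}]
        System.out.println(match());
    }
}
```

Each binding corresponds to one graph head in the result collection, which stores the same variable-to-id map as a property.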
Cypher Engine
Query Overview: Parsing, Planning, Execution
Predicates: ((c1 != c2) AND (o1 != o2)) AND (h.name = Frodo Baggins)
[Figure: query graph over c1, o1, o2, c2, h; candidate plan trees with estimated cardinalities 23, 42, 84, 123, 456, 789]
MATCH (c1:Clan)<-[:LEADER_OF]-(o1:Orc),
(o1)-[:HATES]->(o2:Orc),
(o2)-[:LEADER_OF]->(c2:Clan),
(o2)-[:KNOWS*1..10]->(h:Hobbit)
WHERE NOT (c1 = c2 AND o1 = o2)
AND h.name = "Frodo"
RETURN o1.name, o2.name
Parsing and Query Rewriting
MATCH (o1:Orc)-[:KNOWS]->(h:Hobbit)
WHERE (o1.weapon = "Axe" AND o1.weight > h.weight)
OR o1.weapon = "Sword"
ANTLR
Query Graph: (o1)-[__edge01]->(h)
Conjunctive-Normal-Form Transformation
Before: (o1.weapon = "Axe" AND o1.weight > h.weight) OR o1.weapon = "Sword"
After: (o1.weapon = "Axe" OR o1.weapon = "Sword") AND (o1.weight > h.weight OR o1.weapon = "Sword")
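The CNF rewrite of the WHERE clause can be reproduced with a small expression tree. The sketch below is a plain-Java illustration with hypothetical class names (it is not the Gradoop predicate code) and handles only negation-free AND/OR expressions by distributing OR over AND.

```java
// Hypothetical mini expression tree; not the Gradoop predicate classes.
final class CnfSketch {

    static abstract class Expr { }

    static final class Atom extends Expr {
        final String text;
        Atom(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class And extends Expr {
        final Expr left;
        final Expr right;
        And(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " AND " + right + ")"; }
    }

    static final class Or extends Expr {
        final Expr left;
        final Expr right;
        Or(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " OR " + right + ")"; }
    }

    // Converts a negation-free expression to CNF.
    static Expr toCnf(Expr e) {
        if (e instanceof And) {
            And a = (And) e;
            return new And(toCnf(a.left), toCnf(a.right));
        }
        if (e instanceof Or) {
            Or o = (Or) e;
            return distribute(toCnf(o.left), toCnf(o.right));
        }
        return e; // atoms are already in CNF
    }

    // Builds the CNF of (l OR r), assuming l and r are already in CNF.
    static Expr distribute(Expr l, Expr r) {
        if (l instanceof And) {
            And a = (And) l;
            return new And(distribute(a.left, r), distribute(a.right, r));
        }
        if (r instanceof And) {
            And a = (And) r;
            return new And(distribute(l, a.left), distribute(l, a.right));
        }
        return new Or(l, r);
    }

    public static void main(String[] args) {
        Expr where = new Or(
            new And(new Atom("o1.weapon = \"Axe\""), new Atom("o1.weight > h.weight")),
            new Atom("o1.weapon = \"Sword\""));
        System.out.println(toCnf(where));
    }
}
```

A predicate in CNF is a conjunction of clauses, so each clause can be checked independently. A clause that mentions only one variable, such as the first one here, can then be evaluated as soon as that variable is bound.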
Intermediate Result Representation - Embedding
Embedding = Mapping between Query graph and Input (Sub-)Graph
Query graph: (a)-[e]->(b)
Input (sub-)graph: (1)-[10]->(2)
Embedding f
f(a) = 1
f(e) = 10
f(b) = 2
Query operators - Filter and Project
FilterAndProject: DataSet<Vertex> -> DataSet<Embedding>
vertices.flatMap(FilterAndProject)
Input (DataSet<Vertex>):
id label properties
1 Orc {...}
2 Clan {...}
3 Hobbit {...}
... ... ...
Filter (WHERE h.name = 'Frodo'):
h.id h.name h.height ...
31 Frodo 1.22 ...
Project (DataSet<Embedding>):
h.id
31
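The filter-then-project step can be sketched in plain Java as a function that emits zero or one embedding per vertex, which is the role a flatMap function plays. The Vertex class, the method signature, and the property values of vertices 1 to 3 are hypothetical; the slide elides those properties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a filter-and-project operator; not Gradoop API.
final class FilterAndProjectSketch {

    static final class Vertex {
        final long id;
        final String label;
        final Map<String, Object> properties = new HashMap<>();

        // keyValues is a flat list: key1, value1, key2, value2, ...
        Vertex(long id, String label, Object... keyValues) {
            this.id = id;
            this.label = label;
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                properties.put((String) keyValues[i], keyValues[i + 1]);
            }
        }
    }

    // Keeps vertices with the given label and property value, and emits
    // [id, projected property values...] for each of them.
    static List<List<Object>> filterAndProject(List<Vertex> vertices, String label,
            String key, Object value, List<String> projectionKeys) {
        List<List<Object>> embeddings = new ArrayList<>();
        for (Vertex v : vertices) {
            if (!v.label.equals(label) || !value.equals(v.properties.get(key))) {
                continue; // filter: the vertex does not satisfy the predicates
            }
            List<Object> embedding = new ArrayList<>();
            embedding.add(v.id); // the id is always part of the embedding
            for (String k : projectionKeys) {
                embedding.add(v.properties.get(k));
            }
            embeddings.add(embedding);
        }
        return embeddings;
    }

    // Illustrative sample data; only vertex 31 is taken from the slide.
    static List<Vertex> sample() {
        return Arrays.asList(
            new Vertex(1, "Orc", "name", "Azog"),
            new Vertex(2, "Clan", "name", "Tribes of Moria"),
            new Vertex(3, "Hobbit", "name", "Samwise"),
            new Vertex(31, "Hobbit", "name", "Frodo", "height", 1.22));
    }

    public static void main(String[] args) {
        // No projection keys: only the id survives, as on the slide.
        System.out.println(filterAndProject(sample(), "Hobbit", "name", "Frodo",
            Collections.<String>emptyList()));
        // With projection keys, the requested property values are kept too.
        System.out.println(filterAndProject(sample(), "Hobbit", "name", "Frodo",
            Arrays.asList("name", "height")));
    }
}
```

Projecting early keeps embeddings small: properties that no later predicate or RETURN clause needs are dropped before any join.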
Query operators - Join Embeddings
JoinEmbeddings: DataSet<Embedding> lhs, DataSet<Embedding> rhs -> DataSet<Embedding>
lhs.flatJoin(rhs).with(Combine)
Left: (c1:Clan)<-[:LEADER_OF]-(o1:Orc)
c1.id _e1.id o1.id
51 11 2
52 12 3
... ... ...
Right: (o1:Orc)-[:HATES]->(o2:Orc)
o1.id _e2.id o2.id
2 13 5
3 14 3
... ... ...
Combine (result):
c1.id _e1.id o1.id _e2.id o2.id
51 11 2 13 5
52 12 3 14 3
... ... ... ... ...
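The join of two embedding sets on a shared variable can be sketched as a hash join over id arrays. This plain-Java version is an assumption-laden illustration (hypothetical class name, embeddings as long[] rows, no morphism checks), not the Flink-based operator itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a join over embeddings; not Gradoop API.
final class JoinEmbeddingsSketch {

    // Hash join on one column per side. The right join column is dropped
    // from the output because it repeats the left join column.
    static List<long[]> join(List<long[]> lhs, int lhsCol, List<long[]> rhs, int rhsCol) {
        Map<Long, List<long[]>> index = new HashMap<>();
        for (long[] r : rhs) {
            index.computeIfAbsent(r[rhsCol], k -> new ArrayList<>()).add(r);
        }
        List<long[]> out = new ArrayList<>();
        for (long[] l : lhs) {
            for (long[] r : index.getOrDefault(l[lhsCol], new ArrayList<>())) {
                long[] combined = Arrays.copyOf(l, l.length + r.length - 1);
                int pos = l.length;
                for (int i = 0; i < r.length; i++) {
                    if (i != rhsCol) {
                        combined[pos++] = r[i];
                    }
                }
                out.add(combined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Columns: c1.id, _e1.id, o1.id
        List<long[]> lhs = Arrays.asList(new long[]{51, 11, 2}, new long[]{52, 12, 3});
        // Columns: o1.id, _e2.id, o2.id
        List<long[]> rhs = Arrays.asList(new long[]{2, 13, 5}, new long[]{3, 14, 3});
        for (long[] row : join(lhs, 2, rhs, 0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The second result row binds o1 and o2 to the same vertex. A combine step that enforces vertex isomorphism would reject it, while one that allows homomorphism keeps it and leaves the decision to a later filter.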
Query operators - Expand Embeddings
ExpandEmbeddings: DataSet<Embedding> lhs, DataSet<Embedding> rhs -> DataSet<Embedding>
Left: (o2:Orc)
o2.id
5
Edges: (o2)-[:KNOWS*1..10]->(h:Hobbit)
_e3.sid _e3.id _e3.tid
5 26 31
31 27 32
32 28 33
BulkIteration:
ws = lhs.join(rhs)
filteredPaths = ws.filter(filterByLength)
newPaths = filteredPaths.flatJoin(rhs, combine)
nextWs = ws.union(newPaths)
Combine: Check for vertex/edge isomorphism
Result:
o2.id _e3.id h.id
5 [26] 31
5 [26,31,27] 32
5 [26,31,27,32,28] 33
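Variable-length expansion can be sketched as a loop that extends paths by one hop per round. The plain-Java version below uses hypothetical names and keeps only the latest frontier instead of the growing working set of the bulk iteration above. It starts from the left input (o2.id = 5), uses the three edges from the slide, and rejects paths that would reuse an edge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a variable-length path expansion; not Gradoop API.
final class ExpandEmbeddingsSketch {

    // A path embedding: start vertex, alternating edge/vertex ids, end vertex.
    static final class Path {
        final long start;
        final List<Long> via;
        final long end;

        Path(long start, List<Long> via, long end) {
            this.start = start;
            this.via = via;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + " " + via + " " + end;
        }
    }

    // Edge isomorphism check: edge ids sit at the even positions of "via".
    static boolean usesEdge(Path p, long edgeId) {
        for (int i = 0; i < p.via.size(); i += 2) {
            if (p.via.get(i) == edgeId) {
                return true;
            }
        }
        return false;
    }

    // edges: rows of {sourceId, edgeId, targetId}; lb and ub bound the hop count.
    static List<Path> expand(List<Long> starts, long[][] edges, int lb, int ub) {
        // First hop: join start vertices with edges on the edge source.
        List<Path> frontier = new ArrayList<>();
        for (long s : starts) {
            for (long[] e : edges) {
                if (e[0] == s) {
                    List<Long> via = new ArrayList<>();
                    via.add(e[1]);
                    frontier.add(new Path(s, via, e[2]));
                }
            }
        }
        List<Path> result = new ArrayList<>();
        for (int hops = 1; hops <= ub && !frontier.isEmpty(); hops++) {
            if (hops >= lb) {
                result.addAll(frontier); // paths within the length bounds
            }
            // Next hop: join path ends with edge sources.
            List<Path> next = new ArrayList<>();
            for (Path p : frontier) {
                for (long[] e : edges) {
                    if (e[0] == p.end && !usesEdge(p, e[1])) {
                        List<Long> via = new ArrayList<>(p.via);
                        via.add(p.end);
                        via.add(e[1]);
                        next.add(new Path(p.start, via, e[2]));
                    }
                }
            }
            frontier = next;
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] edges = {{5, 26, 31}, {31, 27, 32}, {32, 28, 33}};
        for (Path p : expand(Arrays.asList(5L), edges, 1, 10)) {
            System.out.println(p);
        }
    }
}
```

The via lists match the path column of the result table: edge ids alternate with the intermediate vertex ids. A final filter on the end vertex (label Hobbit, name Frodo) is left out of the sketch.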
Query operators - Filter embeddings
FilterEmbeddings: DataSet<Embedding> -> DataSet<Embedding>
embeddings.filter(ByPredicate)
Input:
o1.sid _e2.id o2.tid
2 13 5
3 14 3
... ... ...
Result:
o1.sid _e2.id o2.tid
2 13 5
... ... ...
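The slide does not state which predicate is applied here. The row that disappears binds both orcs to vertex 3, so the minimal plain-Java sketch below assumes the predicate o1 != o2. The class name and the long[] row layout are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a predicate filter over embeddings; not Gradoop API.
final class FilterEmbeddingsSketch {

    // Keeps only the embeddings that satisfy the predicate.
    static List<long[]> filter(List<long[]> embeddings, Predicate<long[]> predicate) {
        return embeddings.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Columns: o1 id, _e2 id, o2 id
        List<long[]> embeddings = Arrays.asList(new long[]{2, 13, 5}, new long[]{3, 14, 3});
        // Assumed predicate: o1 != o2
        for (long[] row : filter(embeddings, row -> row[0] != row[2])) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```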
Cost-based greedy query planning
● Problem: Query can be computed in a factorial number of ways
○ Goal: Find a way (plan) with minimal / low computation costs
● Use statistics about the input graph
○ Vertex-/Edge counts by label, i.e., label distributions
○ Distinct value counts (source, target) by edge label
○ Property value distributions
● Cost calculation for computing intermediate results
○ Primarily based on join result estimation
○ Filters and projections are evaluated as early as possible
● Planner iteratively builds a physical query plan
○ Greedy: picks the plan with minimum cost in each iteration
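The greedy loop can be sketched as follows. This is a plain-Java illustration under strong assumptions: the class names are hypothetical, the leaf cardinalities are invented, and a constant divisor stands in for the statistics-based join estimation described above. It only shows the control flow: in each round, join the pair of connected partial plans with the smallest estimated result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical greedy join-order planner; not the Gradoop planner.
final class GreedyPlannerSketch {

    static final class Plan {
        final String description;
        final Set<String> vars;
        final long estimatedCardinality;

        Plan(String description, Set<String> vars, long estimatedCardinality) {
            this.description = description;
            this.vars = vars;
            this.estimatedCardinality = estimatedCardinality;
        }
    }

    static Plan leaf(String name, long cardinality, String... vars) {
        return new Plan(name, new HashSet<>(Arrays.asList(vars)), cardinality);
    }

    // Assumed estimate: |l| * |r| / 10. A real cost model would use
    // distinct-value counts of the join variable instead of a constant.
    static long estimateJoin(Plan l, Plan r) {
        return l.estimatedCardinality * r.estimatedCardinality / 10;
    }

    static Plan plan(List<Plan> leaves) {
        List<Plan> table = new ArrayList<>(leaves);
        while (table.size() > 1) {
            Plan best = null;
            int bestI = -1;
            int bestJ = -1;
            for (int i = 0; i < table.size(); i++) {
                for (int j = i + 1; j < table.size(); j++) {
                    Plan l = table.get(i);
                    Plan r = table.get(j);
                    if (Collections.disjoint(l.vars, r.vars)) {
                        continue; // no shared variable: skip cartesian products
                    }
                    long est = estimateJoin(l, r);
                    if (best == null || est < best.estimatedCardinality) {
                        Set<String> vars = new HashSet<>(l.vars);
                        vars.addAll(r.vars);
                        best = new Plan("(" + l.description + " JOIN " + r.description + ")",
                            vars, est);
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            if (best == null) {
                throw new IllegalStateException("query graph is not connected");
            }
            table.remove(bestJ); // remove the higher index first
            table.remove(bestI);
            table.add(best);
        }
        return table.get(0);
    }

    public static void main(String[] args) {
        // Invented cardinalities for the four query edges.
        Plan result = plan(Arrays.asList(
            leaf("LEADER_OF(o1,c1)", 10, "o1", "c1"),
            leaf("HATES(o1,o2)", 1000, "o1", "o2"),
            leaf("LEADER_OF(o2,c2)", 10, "o2", "c2"),
            leaf("KNOWS*(o2,h)", 5, "o2", "h")));
        System.out.println(result.description + " est-card: " + result.estimatedCardinality);
    }
}
```

Greedy selection examines only a quadratic number of pairs per round, which avoids enumerating the factorial space of join orders at the price of possibly missing the cheapest plan.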
MATCH (c1:Clan)<-[:LEADER_OF]-(o1:Orc),
(o1)-[:HATES]->(o2:Orc),
(o2)-[:LEADER_OF]->(c2:Clan),
(o2)-[:KNOWS*1..10]->(h:Hobbit)
WHERE NOT (c1 = c2 AND o1 = o2)
AND h.name = "Frodo"
RETURN o1.name, o2.name
PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan :
|-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))}
|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]}
|.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]}
|.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]}
|.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}
Demo
Future work
● Optimizations
○ DP-Planner
○ Improve cost model (more statistics, Flink optimizer hints)
○ Reuse of intermediate results
● Support more Cypher features
○ e.g. Aggregation and Functions
● Introduce new Cypher features
○ e.g. regular path queries
Further reading / Contributing
Gradoop: http://www.gradoop.com
Demo: https://github.com/dbs-leipzig/gradoop_demo
Paper: https://event.cwi.nl/grades/2017/03-Junghanns.pdf
Neo4j: https://neo4j.com/
openCypher: http://www.openCypher.org
Q & A
