www.arangodb.com
Handling Billions of Edges in a Graph Database
Michael Hackstein
@mchacki
New Technology
Michael Hackstein
‣ ArangoDB Core Team
‣ Web Frontend
‣ Graph visualisation
‣ Graph features
‣ SmartGraphs
‣ Host of cologne.js
‣ Master’s Degree (spec. Databases and Information Systems)
What are Graph Databases
‣ Schema-free Objects (Vertices)
‣ Relations between them (Edges)
‣ Edges have a direction
‣ Edges can be queried in both directions
‣ Easily query a range of edges (2 to 5)
‣ Undefined number of edges (1 to *)
‣ Shortest Path between two vertices
Example graph (figure):
‣ Vertices:
  { name: "alice", age: 32 }
  { name: "bob", age: 35, size: "1.73m" }
  { name: "fishing" }
  { name: "reading" }
  { name: "dancing" }
‣ Edges: one "married" edge and four "hobby" edges
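The model on this slide can be sketched in a few lines of Python. This is only an illustrative in-memory structure, not ArangoDB's storage format. The `Edge` type and the `neighbors` helper are hypothetical names, and the endpoints of the `married` and `hobby` edges are assumptions, because the slide's figure is not preserved in the text.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str  # edges are directed: source -> target
    target: str
    label: str


# Schema-free vertices: plain dicts with arbitrary attributes.
vertices = {
    "alice": {"name": "alice", "age": 32},
    "bob": {"name": "bob", "age": 35, "size": "1.73m"},
    "fishing": {"name": "fishing"},
    "reading": {"name": "reading"},
    "dancing": {"name": "dancing"},
}

# Assumed endpoints (the original figure is not available).
edges = [
    Edge("alice", "bob", "married"),
    Edge("alice", "reading", "hobby"),
    Edge("alice", "dancing", "hobby"),
    Edge("bob", "fishing", "hobby"),
    Edge("bob", "dancing", "hobby"),
]


def neighbors(vertex, direction="outbound", label=None):
    """Follow edges from `vertex`; a directed edge can be read both ways."""
    found = []
    for e in edges:
        if label is not None and e.label != label:
            continue
        if direction in ("outbound", "any") and e.source == vertex:
            found.append(e.target)
        if direction in ("inbound", "any") and e.target == vertex:
            found.append(e.source)
    return found
```

Calling `neighbors("dancing", "inbound")` reads the `hobby` edges against their direction, which is the "queried in both directions" point of the slide.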
Typical Graph Queries
‣ Give me all friends of Alice
‣ Give me all friends-of-friends of Alice
‣ What is the linking path between Alice and Bob?
‣ Which train stations can I reach if my ticket allows me to travel a distance of 6 stations?
‣ Pattern Matching:
  ‣ Give me all users that share two hobbies with Alice
  ‣ Give me all products that at least one of my friends has bought together with the products I already own, ordered by how many friends have bought them and by the product's rating, but only 20 of them.
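Two of these queries, friends-of-friends and the linking path, can be sketched over a small in-memory graph. The `FRIENDS` data and both function names are made up for illustration; a graph database would answer the same questions with its own traversal engine.

```python
from collections import deque

# Hypothetical undirected friendship graph.
FRIENDS = {
    "alice": ["carol", "dave"],
    "carol": ["alice", "bob"],
    "dave": ["alice"],
    "bob": ["carol"],
}


def friends_of_friends(person):
    """People two friendship edges away, excluding direct friends."""
    direct = set(FRIENDS.get(person, []))
    second = {fof for friend in direct for fof in FRIENDS.get(friend, [])}
    return second - direct - {person}


def shortest_path(start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in FRIENDS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```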
Non-Typical Graph Queries
‣ Give me all users whose age attribute is between 21 and 35.
‣ Give me the age distribution of all users
‣ Group all users by their name
Traversal
Iterate down two edges with some filters
‣ We first pick a start vertex (S)
‣ We collect all edges on S
‣ We apply filters on edges
‣ We iterate down one of the new vertices (A)
‣ We apply filters on edges
‣ The next vertex (E) is in desired depth. Return the path S -> A -> E
‣ Go back to the next unfinished vertex (B)
‣ We iterate down on (B)
‣ We apply filters on edges
‣ The next vertex (F) is in desired depth. Return the path S -> B -> F
(Figure: start vertex S with neighbors A, B and C; D and E are reached via A, F via B.)
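The steps above can be sketched as a small depth-first traversal. The graph layout and the `keep` edge attribute are assumptions chosen so that the run reproduces the two paths named on the slide; `traverse` is a hypothetical helper, not ArangoDB's traverser.

```python
from typing import Callable, Iterator

# Outgoing edges per vertex: (target, edge attributes).
# Layout and the "keep" flag are assumed for illustration.
GRAPH = {
    "S": [("A", {"keep": True}), ("B", {"keep": True}), ("C", {"keep": False})],
    "A": [("D", {"keep": False}), ("E", {"keep": True})],
    "B": [("F", {"keep": True})],
}


def traverse(start: str, depth: int,
             edge_filter: Callable[[dict], bool]) -> Iterator[list]:
    """Depth-first traversal yielding every path of exactly `depth` edges."""

    def visit(path: list) -> Iterator[list]:
        if len(path) - 1 == depth:
            # Desired depth reached: return the path.
            yield path
            return
        # Collect the edges of the current vertex and apply the filter.
        for target, attrs in GRAPH.get(path[-1], []):
            if edge_filter(attrs):
                yield from visit(path + [target])

    yield from visit([start])


paths = list(traverse("S", 2, lambda attrs: attrs["keep"]))
```

With this data, `paths` holds S -> A -> E followed by S -> B -> F, in the order the slide walks through them.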
Traversal - Complexity
‣ Once:
  ‣ Find the start vertex (depends on indexes; hash: O(1))
‣ For every depth:
  ‣ Find all connected edges (Edge-Index or index-free: O(1))
  ‣ Filter non-matching edges (linear in edges: O(n))
  ‣ Find connected vertices (depends on indexes; hash: O(n * 1))
  ‣ Filter non-matching vertices (linear in vertices: O(n))
‣ Only one pass: O(3n)
Traversal - Complexity
‣ Linear sounds evil?
  ‣ NOT linear in all edges: O(E)
  ‣ Only linear in relevant edges: n < E
‣ Traversals solely scale with their result size.
‣ They are not affected at all by the total amount of data
‣ BUT: Every depth increases the exponent: O(3 * n^d)
‣ "7 degrees of separation": 3*n^6 < E < 3*n^7
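The growth of the bound can be made concrete with a short calculation. The function names are hypothetical, and the figures used below (20 relevant edges per vertex, one billion edges in total) are assumed purely for illustration.

```python
def traversal_cost(n: int, depth: int) -> int:
    """The slide's bound 3 * n**depth, with n relevant edges per vertex."""
    return 3 * n ** depth


def min_depth_covering(total_edges: int, n: int) -> int:
    """Smallest depth d for which 3 * n**d reaches total_edges."""
    if n < 2:
        raise ValueError("n must be at least 2 for the bound to grow")
    depth = 0
    while traversal_cost(n, depth) < total_edges:
        depth += 1
    return depth


# Assumed example: 20 relevant edges per vertex, one billion edges in total.
depth_needed = min_depth_covering(1_000_000_000, 20)
```

For these assumed numbers the bound at depth 6 is still below one billion and the bound at depth 7 exceeds it, which is the situation the "7 degrees of separation" line describes.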
‣ MULTI-MODEL database
‣ Stores Documents and Graphs
‣ Query language AQL
‣ Document Queries
‣ Graph Queries
‣ Joins
‣ All can be combined in the same statement
‣ ACID support including Multi Collection Transactions
AQL
// Return every document in the users collection
FOR user IN users
  RETURN user
AQL
// Document query: only the user named "alice"
FOR user IN users
  FILTER user.name == "alice"
  RETURN user
AQL
// Graph query: follow has_bought edges one step outbound from alice
FOR user IN users
  FILTER user.name == "alice"
  FOR product IN OUTBOUND user has_bought
    RETURN product
(Figure: Alice -[has_bought]-> TV)
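AQL
FOR user IN users
  FILTER user.name == "alice"
  // Three steps in any direction: user -> product -> other buyer -> recommendation
  FOR recommendation, action, path IN 3 ANY user has_bought
    // path.vertices[2] is the other buyer; keep buyers within 5 years of the user's age
    FILTER path.vertices[2].age <= user.age + 5
       AND path.vertices[2].age >= user.age - 5
    FILTER recommendation.price < 25
    LIMIT 10
    RETURN recommendation
(Figure: Alice -[has_bought]-> TV <-[has_bought]- Bob -[has_bought]-> Playstation, annotated with alice.age - 5 <= bob.age && bob.age <= alice.age + 5 on Bob and playstation.price < 25 on the Playstation.)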
Demo Time
Querying basics
First Boost - Vertex Centric Indices
‣ Remember Complexity? O(3 * n^d)
‣ Filtering of non-matching edges is linear for every depth
‣ Index all edges based on their vertices and arbitrary other attributes
‣ Find initial set of edges in identical time
‣ Less / No post-filtering required
‣ This decreases the n
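The idea can be sketched with two dictionaries: a plain edge index keyed by vertex, and a vertex-centric index keyed by vertex plus one edge attribute. The edge data, the `type` attribute and the function names are all assumptions for illustration and do not reflect ArangoDB's index implementation.

```python
from collections import defaultdict

# Hypothetical edges: (source vertex, target vertex, attributes).
EDGES = [
    ("alice", "tv", {"type": "has_bought"}),
    ("alice", "bob", {"type": "friend"}),
    ("alice", "radio", {"type": "has_bought"}),
    ("bob", "playstation", {"type": "has_bought"}),
]

# Plain edge index: source vertex -> all of its edges.
edge_index = defaultdict(list)
# Vertex-centric index: (source vertex, attribute value) -> matching edges.
vertex_centric_index = defaultdict(list)
for source, target, attrs in EDGES:
    edge_index[source].append((target, attrs))
    vertex_centric_index[(source, attrs["type"])].append((target, attrs))


def outbound_with_post_filter(vertex, edge_type):
    """Fetch all edges of `vertex`, then filter: linear in its edge count."""
    return [t for t, a in edge_index.get(vertex, []) if a["type"] == edge_type]


def outbound_with_vertex_centric_index(vertex, edge_type):
    """One hash lookup returns only matching edges; no post-filtering."""
    return [t for t, _ in vertex_centric_index.get((vertex, edge_type), [])]
```

Both functions return the same targets; the second one never touches the non-matching `friend` edge, which is how the index reduces n for vertices with many edges.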
Demo Time
Vertex-Centric Indices
Scaling
‣ Vertex-Centric Indexes help with super-nodes
‣ But what if the graph is too large for one machine?
‣ Distribute graph on several machines (sharding)
‣ How to query it now?
‣ No global view of the graph possible any more
‣ What about edges between servers?
First let's do the cluster thingy
Marathon
Demo Time
DC/OS
Is Mesosphere required?
‣ ArangoDB can run clusters without it
‣ Setup requires manual effort (can be scripted):
‣ Configure IP addresses
‣ Correct startup ordering
‣ This works:
‣ Automatic Failover (Follower takes over if leader dies)
‣ Rebalancing of shards
‣ Everything inside of ArangoDB
‣ This is based on Mesos:
‣ Complete self healing
‣ Automatic restart of ArangoDBs (on new machines)
➡ We suggest you have someone on call
Now distribute the graph
Dangers of Sharding
‣ Only parts of the graph on every machine
‣ Neighboring vertices may be on different machines
‣ Even edges could be on other machines than their vertices
‣ Queries need to be executed in a distributed way
‣ Result needs to be merged locally
Random Distribution
‣ Disadvantages:
  ‣ Neighbors on different machines
  ‣ Probably edges on other machines than their vertices
  ‣ A lot of network overhead is required for querying
‣ Advantages:
  ‣ every server takes an equal portion of data
  ‣ easy to realize
  ‣ no knowledge about data required
  ‣ always works
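A minimal sketch of this trade-off, assuming hash-based placement as the random scheme: the shard is derived from the vertex key alone, so no knowledge of the data is needed, but neighboring vertices usually end up on different shards. The function names and the ring-shaped test graph are hypothetical.

```python
import hashlib


def shard_of(key: str, num_shards: int) -> int:
    """Hash-based placement: needs no knowledge about the data."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def count_cross_shard_edges(edges, num_shards):
    """Count edges whose two endpoints land on different shards."""
    return sum(
        1
        for source, target in edges
        if shard_of(source, num_shards) != shard_of(target, num_shards)
    )


# Hypothetical ring graph with 1000 vertices, each linked to its successor.
ring = [(f"v{i}", f"v{(i + 1) % 1000}") for i in range(1000)]
cross_edges_on_four_shards = count_cross_shard_edges(ring, 4)
```

Every edge counted here would require a network hop during a traversal, which is the overhead listed under the disadvantages.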
Index-Free Adjacency
‣ Used by most other graph databases
‣ Every vertex maintains two lists of its edges (IN and OUT)
‣ Do not use an index to find edges
‣ How to shard this?
‣ ArangoDB uses a hash-based EdgeIndex (O(1) lookup)
‣ The vertex is independent of its edges
‣ It can be stored on a different machine
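The contrast between the two layouts can be sketched as follows. Both structures are simplified illustrations; the `EdgeIndex` class is a hypothetical stand-in and does not mirror ArangoDB's actual edge index.

```python
from collections import defaultdict

# Index-free adjacency: each vertex record embeds its own edge lists,
# so a vertex and its edges have to be stored together.
vertex_with_adjacency = {
    "alice": {"name": "alice", "out": ["bob"], "in": []},
    "bob": {"name": "bob", "out": [], "in": ["alice"]},
}


class EdgeIndex:
    """Hash-based index over separately stored edges (illustrative only)."""

    def __init__(self):
        self._by_from = defaultdict(list)
        self._by_to = defaultdict(list)

    def insert(self, source: str, target: str) -> None:
        self._by_from[source].append(target)
        self._by_to[target].append(source)

    def outbound(self, vertex: str) -> list:
        # Average O(1) hash lookup, independent of the total number of edges.
        return list(self._by_from.get(vertex, []))

    def inbound(self, vertex: str) -> list:
        return list(self._by_to.get(vertex, []))


# Vertices carry no edge lists, so they may live on another machine.
vertices = {"alice": {"name": "alice"}, "bob": {"name": "bob"}}
index = EdgeIndex()
index.insert("alice", "bob")
```

In the second layout, moving a vertex to another server does not require moving or rewriting its edges, because only the index refers to them.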
Domain Based Distribution
‣ Many Graphs have a natural distribution
‣ By country/region for People
‣ By tags for Blogs
‣ By category for Products
‣ Most edges in same group
‣ Rare edges between groups
uses Domain Knowledge for short-cuts
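A minimal sketch of domain-based placement, assuming people are sharded by a `region` attribute: edges inside a region stay on one shard, and only the rare edges between regions cross servers. All names and data are hypothetical, and this is not a description of how SmartGraphs are implemented.

```python
def shard_by_domain(vertex: dict, domain_to_shard: dict) -> int:
    """Place a vertex on the shard that owns its domain (here: region)."""
    return domain_to_shard[vertex["region"]]


# Hypothetical data: two regions, one friendship between them.
people = {
    "alice": {"region": "EU"},
    "bob": {"region": "EU"},
    "carol": {"region": "US"},
    "dave": {"region": "US"},
}
friendships = [("alice", "bob"), ("carol", "dave"), ("bob", "carol")]
domain_to_shard = {"EU": 0, "US": 1}


def cross_shard_edges(edges, vertices, mapping):
    """Edges whose endpoints are stored on different shards."""
    return [
        (source, target)
        for source, target in edges
        if shard_by_domain(vertices[source], mapping)
        != shard_by_domain(vertices[target], mapping)
    ]


remote_edges = cross_shard_edges(friendships, people, domain_to_shard)
```

Only the friendship between bob and carol needs a network hop here; the two edges within a region are resolved locally.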
Sneak Preview
SmartGraphs
Benchmark Comparison
Source: https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/
Thank you
‣ Further questions?
‣ Follow us on twitter: @arangodb
‣ Join our slack: slack.arangodb.com
‣ Follow me on twitter/github: @mchacki
‣ Write me a mail: michael@arangodb.com