www.Objectivity.com
Nick Quinn
Lead Developer - InfiniteGraph
What are we talking about today?
Not that Bacon. This Bacon!
• Intro to the Six Degrees Problem
• What is a Graph Database?
• Why Bacon in a Graph Database?
• How we solved the problem
Images Courtesy of IMDB (www.imdb.com)
Six Degrees of Bacon
“…any individual involved in the Hollywood, California film industry
can be linked through his or her film roles to actor Kevin Bacon
within six steps”
[http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon]
Gina Menza
Images Courtesy of IMDB (www.imdb.com)
A Tale of Two Kevins
Why Six Degrees of Bacon?
Actor               Age   # of Projects
Kevin Bacon         54    76
Harrison Ford       70    70
Tom Cruise          50    40
Julia Roberts       45    50
Tom Hanks           56    73
Denzel Washington   58    53
Michael Caine       80    157
Kiefer Sutherland   46    82
Kevin Bacon
Images Courtesy of IMDB (www.imdb.com)
Bacon Numbers in Google
In the summer of 2012, Google started to allow users to find the
bacon number of any actor simply by following his or her name
with “bacon number”.
Graphical representation (www.google.com):
Morgan Freeman -[appeared in]-> The Dark Knight Rises <-[appeared in]- Gary Oldman
Gary Oldman -[appeared in]-> Criminal Law <-[appeared in]- Kevin Bacon
What is a Graph Database?
The Physical Data Model
• Difference between relational & graph databases
Rows/Columns/Tables:

Meetings
P1       P2        Place    Time
Alice    Bob       Denver   5-27-10

Calls
From     To        Time     Duration
Bob      Carlos    13:20    25
Bob      Charlie   17:10    15

Payments
From     To        Date     Amount
Carlos   Charlie   5-12-10  100000

Relationship/Graph Optimized:

Alice   -[Met 5-27-10]->    Bob
Bob     -[Called 13:20]->   Carlos
Bob     -[Called 17:10]->   Charlie
Carlos  -[Paid 100000]->    Charlie
Connecting Data
Person -[?]-> Building
Possible relationships: Work, Live, RR, Visit, Eat, Shop
Who is Gina Menza?
• How do we get meaning from highly
connected data?
Gina Menza
Jury Forewoman Miss Jeffries
Images Courtesy of IMDB (www.imdb.com)
Strength of Connections Matters!
• Why 6 degrees of separation and not 3.74?
• We need analysis tools in order to
– identify and filter out “unimportant” data and
– infer what needs to be filtered as we investigate it.
“When considering another
person in the world, a friend
of your friend knows a
friend of their friend”
- facebook
Why Bacon in a Graph Database?
Graph Analysis
• Why use Graph Databases for graph analysis?
– Dynamic on Live Data
– Feedback/Inference
– Optimized for concurrent user access
– Handles big data problems
– Native Graph Traversal API
– Manage memory efficiently
Paths to Bacon
Using the IMDB (www.imdb.com) data set, we can study how many paths
can be found by degrees of separation from Kevin Bacon. Out of 5,067,124
nodes and 11,505,797 edges, we get the following:

Bacon Number (Degree of Separation / 2)   # of People
1                                         2823
2                                         323677
3                                         1088560
4                                         272905
5                                         22533
6                                         2300

[Bar chart: # of People (axis 0 to 1,200,000) by Bacon number, 1 through 6]
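The "Degree of Separation / 2" in the table header follows from the graph being bipartite: every hop goes from a person to a project or back, so people sit at even hop distances from Kevin Bacon. A minimal sketch of that counting in plain Java, using an in-memory adjacency map rather than the InfiniteGraph API (the class and method names `BaconNumbers`, `link`, `hops` and `histogram` are made up for this illustration):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BaconNumbers {

    /** Adds an undirected "appeared in" connection between a person and a movie. */
    static void link(Map<String, List<String>> adjacency, String person, String movie) {
        adjacency.computeIfAbsent(person, k -> new ArrayList<>()).add(movie);
        adjacency.computeIfAbsent(movie, k -> new ArrayList<>()).add(person);
    }

    /** Breadth-first search: hop distance from start to every reachable vertex. */
    static Map<String, Integer> hops(Map<String, List<String>> adjacency, String start) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : adjacency.getOrDefault(current, List.of())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return distance;
    }

    /** Counts people per Bacon number; people are the vertices at even, non-zero hop distances. */
    static Map<Integer, Integer> histogram(Map<String, Integer> distances) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int d : distances.values()) {
            if (d > 0 && d % 2 == 0) {
                counts.merge(d / 2, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<>();
        link(g, "Kevin Bacon", "Criminal Law");
        link(g, "Gary Oldman", "Criminal Law");
        link(g, "Gary Oldman", "The Dark Knight Rises");
        link(g, "Morgan Freeman", "The Dark Knight Rises");
        System.out.println(histogram(hops(g, "Kevin Bacon"))); // prints {1=1, 2=1}
    }
}
```

On the full IMDB graph the same traversal would have to run against the database rather than a heap-resident map, which is the point of the slides that follow.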
Big Data + Graph = Big Graph Data
4 Degrees of Kevin Bacon
(Breadth First up to 20K connections)
Images generated using the IG Visualizer
Analyzing Bacon
• To be able to perform meaningful analysis,
these are things that you will need:
– Ingest IMDB Dataset – About 50 Formatted
compressed files (Largest > 200 MB)
– Custom algorithm support to perform meaningful
analysis
• Optimize queries to get results back in reasonable time
– Visualization tool to test and view the results of
the navigation (optional)
How IG Sizzles Your Bacon
Ingest
Update
Navigate
Massive graph data requires efficient and intelligent tools
to analyze and understand it.
Super Simple Java API
// Add the actor and the movie as vertices
Actor bacon = new Actor("Kevin Bacon");
imdbGraphDB.addVertex(bacon);
Movie apollo = new Movie("Apollo 13", 1995);
imdbGraphDB.addVertex(apollo);
// Connect them with an edge that carries the character name
ActedIn bacon2apollo = new ActedIn("Jack Swigert");
imdbGraphDB.addEdge(bacon2apollo, bacon, apollo,
        EdgeKind.BIDIRECTIONAL, 1 /* weight */);
Ingest
Scaling Writes
• Big/Fast data demands write performance
• Most NoSQL solutions allow you to scale
writes by…
– Partitioning the data
– Understanding your consistency requirements
– Allowing you to defer conflicts
Ingest
Scaling Graph Writes
[Diagram: App-1 (Ingest V1), App-2 (Ingest V2) and App-3 (Ingest V3) write vertices V1, V2 and V3; App-1 adds edge E12 {V1, V2} and App-2 adds edge E23 {V2, V3}. All writes are ACID transactions through InfiniteGraph on the Objectivity/DB persistence layer.]
Ingest
High Performance Edge Ingest
[Diagram: the IG Core/API writes incoming edges such as E(1->2), E(3->1), E(2->3), E(2->1) and E(3->2) into pipeline containers; a pipeline agent applies them to the target containers C1, C2 and C3, producing edges such as E12 and E23.]
Ingest
Trade-offs
• Excellent for efficient use of page cache
• Able to maintain full database consistency
• Achieves highest ingest rate in distributed
environments
• Almost always has highest “perceived” rate
• Trading off:
– Eventual consistency in graph (connections)
– Updates are still atomic, isolated and durable but phased
– External agent performs graph building
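The phased behavior described above can be pictured with a toy model: an edge write is accepted immediately into a pending queue, and a separate agent step later connects the vertices. This is only an illustrative, single-threaded, in-memory sketch; `PipelinedIngest`, `runAgent` and `connected` are hypothetical names, not part of InfiniteGraph, and nothing here reflects how the product actually implements its pipeline.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class PipelinedIngest {
    // Stand-in for the target containers: vertex -> connected vertices
    private final Map<String, Set<String>> graph = new HashMap<>();
    // Stand-in for the pipeline containers: edges accepted but not yet applied
    private final Deque<String[]> pipeline = new ArrayDeque<>();

    void addVertex(String vertex) {
        graph.putIfAbsent(vertex, new LinkedHashSet<>());
    }

    /** Accepts the edge right away; the graph itself is not updated yet. */
    void addEdge(String from, String to) {
        pipeline.add(new String[] {from, to});
    }

    /** The "external agent": drains pending edges into the graph and returns how many it applied. */
    int runAgent() {
        int applied = 0;
        while (!pipeline.isEmpty()) {
            String[] edge = pipeline.remove();
            graph.computeIfAbsent(edge[0], k -> new LinkedHashSet<>()).add(edge[1]);
            graph.computeIfAbsent(edge[1], k -> new LinkedHashSet<>()).add(edge[0]);
            applied++;
        }
        return applied;
    }

    boolean connected(String a, String b) {
        return graph.getOrDefault(a, Set.of()).contains(b);
    }

    public static void main(String[] args) {
        PipelinedIngest ingest = new PipelinedIngest();
        ingest.addVertex("V1");
        ingest.addVertex("V2");
        ingest.addEdge("V1", "V2");
        System.out.println(ingest.connected("V1", "V2")); // prints false
        ingest.runAgent();
        System.out.println(ingest.connected("V1", "V2")); // prints true
    }
}
```

The window between `addEdge` and `runAgent` is the eventual consistency being traded for a higher perceived ingest rate.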
Ingest
Result…
[Chart: nodes and edges ingested per second (axis 0 to 500,000), with series for 1, 2, 4 and 8 clients]
Ingest
Scaling Reads and Query
[Diagram: Application(s) call the Distributed API, which fans out to one Processor per partition (Partition 1, Partition 2, Partition 3, ... Partition n)]
Partitioning and read replicas… easy, right?
Why are Graphs Different?
[Diagram: Application(s) call the Distributed API, which fans out to one Processor per partition (Partition 1, Partition 2, Partition 3, ... Partition n)]
Navigate
Distributed Navigation
• Detect local hops and perform in-memory
traversal
• Send the partial path to the distributed
processing to continue the navigation
• Intelligently cache remote data when accessed
frequently
• Route tasks to other hosts when it is optimal
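A toy model of the first two bullets, as a sketch only: each vertex is assigned to a partition, hops inside a partition are followed in memory, and a hop that crosses partitions hands the partial path over to the owning partition. The class `PartitionedNavigation`, its counters and the vertex placement are assumptions for illustration; caching and task routing are not modeled, and the real distributed navigation server is not shown in the slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedNavigation {
    final Map<String, List<String>> adjacency = new HashMap<>();
    final Map<String, Integer> partitionOf = new HashMap<>();
    int localHops = 0;   // hops followed in memory within one partition
    int handOffs = 0;    // partial paths sent to another partition

    /** Every vertex must be placed before it is used in an edge. */
    void place(String vertex, int partition) {
        partitionOf.put(vertex, partition);
    }

    void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Finds all simple paths from start to target, counting local hops and hand-offs. */
    List<List<String>> navigate(String start, String target) {
        List<List<String>> found = new ArrayList<>();
        Deque<List<String>> handedOff = new ArrayDeque<>(); // partial paths in transit
        handedOff.add(List.of(start));
        while (!handedOff.isEmpty()) {
            List<String> partial = handedOff.remove();
            int partition = partitionOf.get(partial.get(partial.size() - 1));
            Deque<List<String>> local = new ArrayDeque<>(); // work done by this partition
            local.push(partial);
            while (!local.isEmpty()) {
                List<String> path = local.pop();
                String last = path.get(path.size() - 1);
                if (last.equals(target)) {
                    found.add(path);
                    continue;
                }
                for (String next : adjacency.getOrDefault(last, List.of())) {
                    if (path.contains(next)) {
                        continue; // keep paths simple
                    }
                    List<String> extended = new ArrayList<>(path);
                    extended.add(next);
                    int nextPartition = partitionOf.get(next);
                    if (nextPartition == partition) {
                        localHops++;
                        local.push(extended);
                    } else {
                        handOffs++;
                        handedOff.add(extended);
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        PartitionedNavigation nav = new PartitionedNavigation();
        for (String v : List.of("A", "B", "C")) nav.place(v, 1);
        for (String v : List.of("D", "E")) nav.place(v, 2);
        nav.addEdge("A", "B");
        nav.addEdge("B", "C");
        nav.addEdge("C", "D");
        nav.addEdge("D", "E");
        System.out.println(nav.navigate("A", "E")); // prints [[A, B, C, D, E]]
        System.out.println(nav.localHops + " local, " + nav.handOffs + " handed off"); // prints 3 local, 1 handed off
    }
}
```

In this placement only the C to D hop leaves partition 1, so the path (A, B, C, D) is the one that travels between processors.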
Navigate
Distributed Navigation Server
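[Diagram: an Application uses the Distributed API to navigate across Partition 1 and Partition 2, each served by its own Processor. Vertices A, X, Y, B, C, D, E, F and G are spread over the partitions, and the partial path P(A,B,C,D) is passed along to continue the navigation.]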
Navigate
GraphViews
Leveraging Schema in the Graph
[Schema diagram: vertex types Patient, Prescription, Drug, Ingredient, Outcome, Complaint, Visit, Allergy and Physician]
Navigate
Schema Enables Views
• GraphViews are extremely powerful
• Allow Big Data to appear small!
• Connection inference can lead to exponential
gains in query performance
• Views are reusable between queries
• Views can be persisted
• Built into the native kernel
Navigate
Problem of Supernodes
In graph theory, a “supernode” is a vertex with a
disproportionately high number of connected edges.
Supernodes make real-time navigational queries difficult
because of the effort spent pursuing paths through them
that may prove unfruitful.
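One simple way to make "disproportionately high" concrete is a degree threshold. The sketch below flags every vertex whose edge count exceeds a cutoff; the class name `SupernodeFinder`, the cutoff value and the sample vertices are assumptions chosen for illustration, and the slides do not say how InfiniteGraph itself identifies supernodes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SupernodeFinder {

    /** Returns the vertices with more than degreeThreshold edges, highest degree first. */
    static List<String> findSupernodes(Map<String, List<String>> adjacency, int degreeThreshold) {
        List<String> supernodes = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : adjacency.entrySet()) {
            if (entry.getValue().size() > degreeThreshold) {
                supernodes.add(entry.getKey());
            }
        }
        supernodes.sort(
                Comparator.comparingInt((String v) -> adjacency.get(v).size()).reversed());
        return supernodes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adjacency = new HashMap<>();
        // Hypothetical data: an awards show links to many guests, a film to a few cast members
        adjacency.put("Awards Show", List.of("Guest 1", "Guest 2", "Guest 3", "Guest 4", "Guest 5"));
        adjacency.put("Apollo 13", List.of("Kevin Bacon", "Tom Hanks"));
        System.out.println(findSupernodes(adjacency, 3)); // prints [Awards Show]
    }
}
```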
Navigate
Images generated using the IG Visualizer
Supernodes in Bacon
Navigate
In the IMDB data set, some examples of supernodes may be talk
shows, awards shows, compilations or variety shows.
Images generated using the IG Visualizer
How to avoid supernodes
1. Policies such as NoRevisitPolicy, MaximumResultCountPolicy and
MaximumPathDepthPolicy customize the overall behavior of the
navigation.
PolicyChain policies = new PolicyChain();
// Traverse each vertex at most once
policies.addPolicy(new NoRevisitPolicy());
// Limit the number of paths returned to 10K
policies.addPolicy(new MaximumResultCountPolicy(10000));
// Limit the path depth to 6
policies.addPolicy(new MaximumPathDepthPolicy(6));
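To see what these three limits do to a traversal, here is a plain-Java breadth-first navigation with the same kinds of bounds. It is a sketch of the idea only, on an in-memory adjacency map; `BoundedNavigation` and its parameters are hypothetical names and it does not reproduce how the InfiniteGraph policies are implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class BoundedNavigation {

    /** Adds an undirected connection between two vertices. */
    static void link(Map<String, List<String>> adjacency, String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Breadth-first navigation returning paths that end at a vertex accepted by isResult. */
    static List<List<String>> navigate(Map<String, List<String>> adjacency, String start,
            Predicate<String> isResult, int maxDepth, int maxResults) {
        List<List<String>> results = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<List<String>> frontier = new ArrayDeque<>();
        visited.add(start);
        frontier.add(List.of(start));
        // Stop as soon as the result-count limit is reached
        while (!frontier.isEmpty() && results.size() < maxResults) {
            List<String> path = frontier.remove();
            String last = path.get(path.size() - 1);
            int depth = path.size() - 1;
            if (depth > 0 && isResult.test(last)) {
                results.add(path);
            }
            if (depth == maxDepth) {
                continue; // path-depth limit: do not extend this path further
            }
            for (String next : adjacency.getOrDefault(last, List.of())) {
                if (visited.add(next)) { // no-revisit: each vertex is expanded at most once
                    List<String> extended = new ArrayList<>(path);
                    extended.add(next);
                    frontier.add(extended);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<>();
        link(g, "Kevin Bacon", "Criminal Law");
        link(g, "Gary Oldman", "Criminal Law");
        link(g, "Gary Oldman", "The Dark Knight Rises");
        link(g, "Morgan Freeman", "The Dark Knight Rises");
        Set<String> people = Set.of("Kevin Bacon", "Gary Oldman", "Morgan Freeman");
        // With a depth limit of 2, only Gary Oldman is reachable
        System.out.println(navigate(g, "Kevin Bacon", people::contains, 2, 10));
        // prints [[Kevin Bacon, Criminal Law, Gary Oldman]]
    }
}
```

A depth of 6 hops in this bipartite model corresponds to a Bacon number of 3, so the depth limit has to be chosen with the "/ 2" in mind.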
Navigate
How to avoid supernodes
2. Use a GraphView to exclude or limit types
GraphView view = new GraphView();
// Exclude all instances of TvShow from navigation
view.excludeClass(myDb.getTypeId(TvShow.class.getName()));
// Exclude all movies made for TV/video
view.excludeClass(myDb.getTypeId(Movie.class.getName()),
        "details.madeForTv || details.madeForVideo");
// Exclude WorkedOn edges, then include only ActedIn edges
// whose characterName does not contain "Himself"
view.excludeClass(myDb.getTypeId(WorkedOn.class.getName()));
view.includeClass(myDb.getTypeId(ActedIn.class.getName()),
        "!CONTAINS(characterName, \"Himself\")");
Navigate
[Diagram: Kevin Bacon (Actor) connected by ActedIn edges to:
The Following (TV Show) as Ryan Hardy
Behind the Scenes (Movie) as Himself
Apollo 13 (Movie) as Jack Swigert]
How to avoid supernodes
3. Using these policies and the graph view, we can
limit the size of the result set in our navigation:
Navigator navigator = bacon.navigate(view,
Guide.SIMPLE_BREADTH_FIRST, Qualifier.ANY,
new VertexPredicate(Person.class, ""),
policies, myResultHandler);
navigator.start();
Navigate
Filtered Views in Bacon
The results of this navigation would look something like this…
Navigate
Images generated using the IG Visualizer
Why InfiniteGraph™?
• Objectivity/DB is a proven foundation
– Building distributed databases since 1993
– A complete database management system
• Concurrency, transactions, cache, schema, query, indexing
• It’s a Graph Specialist!
– Simple but powerful API tailored for data navigation.
– Easy to configure distribution model
Advanced Configured Placement
• Physically co-locate “closely related” data
• Driven through a declarative placement model
• Dramatically speeds “local” reads
[Diagram: a patient data page holds Mr Citizen and his two Visit objects (Has); one facility data page holds the San Jose facility with Dr Jones and Dr Smith (Located), another holds the Sunnyvale facility with Dr Blake and Dr Quinn (Located). Each Visit is With a doctor and At a facility; Mr Citizen also has a Primary Physician edge.]
Fully Distributed Data Model
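[Diagram: an AddVertex() call enters the IG Core/API, passes through Customizable Placement and the Distributed Object and Relationship Persistence Layer, and lands on one of HostA, HostB, HostC ... HostX, grouped into Zone 1 and Zone 2]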
Polyglot NoSQL Architectures
[Diagram: Users, Business and Applications sit on top of a Distributed Data Processing Platform that combines a Document store, a Graph Database, an RDBMS and a Partitioned Distributed DB (often Document / KV); External/Legacy Data enters through Transformation and MDM]
What else!
• Distributed update.
Update
… we are working on it.
Conclusion
I hope that you enjoyed the bacon.
My apologies to my kosher friends for any offense.
Look out for new features coming soon!
QUESTIONS?

Everything Goes Better With Bacon: Revisiting the Six Degrees Problem with a Graph Database
