Neo Technology, Inc Confidential
An Overview Of The
Emerging Graph Landscape
DataWeek Oct 2, 2013
Emil Eifrem
emil@neotechnology.com
@emileifrem
#neo4j
1Wednesday, October 2, 13
Agenda
1. Why Graphs, Why Now?
2. What Is A Graph, Anyway?
3. Graphs In The Real World
4. The Graph Landscape
i) Popular Graph Models
ii) Graph Databases
iii) Graph Compute Engines
Why Graphs,
Why Now?
Graph Buzz
“Graph analysis is the true killer app for Big Data.”
- Forrester Research, Dec 2011
http://blogs.forrester.com/james_kobielus/11-12-19-the_year_ahead_in_big_data_big_cool_new_stuff_looms_large
Graph Buzz
“[I]t is arguable that graph databases will have a
bigger impact on the database landscape than
Hadoop or its competitors.”
- Bloor Research, May 2012
http://www.bloorresearch.com/blog/IM-Blog/2012/5/graph-databases-nosql.html
Graph Buzz
Graph Buzz
Ref: http://www.gartner.com/id=2081316
[Copy of Gartner slide: image not reproduced]
Graph Buzz
Evolution of Web Search
Survival of the Fittest
• Pre-1999: WWW indexing (discrete data)
• 1999-2012: Google invents PageRank (connected data, simple)
• 2012-?: Google Knowledge Graph, Facebook Graph Search (connected data, rich)
Evolution of Online Recruiting
Survival of the Fittest
• 1999: Keyword search (discrete data)
• 2011-12: Social discovery (connected data)
What Is A Graph,
Anyway?
MATCH (philip:Person)-[:IS_FRIEND_OF]->(friend),
(friend)-[:LIKES]->(restaurant),
(restaurant)-[:LOCATED_IN]->(newyork:Location),
(restaurant)-[:SERVES]->(sushi:Cuisine)
WHERE philip.name = 'Philip' AND newyork.location='New York' AND
sushi.cuisine='Sushi'
RETURN restaurant.name
* Cypher query language example: http://maxdemarzi.com/?s=facebook
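To make the pattern concrete, here is a minimal Python sketch that evaluates the same match step by step over a tiny in-memory graph. The people, restaurants, and the helper names `neighbors` and `sushi_in_new_york` are hypothetical, nodes are identified by plain name strings instead of labels and properties, and this is not how Neo4j executes Cypher; it only mirrors the logic of the query.

```python
# Hypothetical sample data: directed, typed relationships as (start, type, end).
RELS = [
    ("Philip", "IS_FRIEND_OF", "Anna"),
    ("Philip", "IS_FRIEND_OF", "Ben"),
    ("Anna", "LIKES", "Sushi Zen"),
    ("Ben", "LIKES", "Sushi Zen"),
    ("Ben", "LIKES", "Pizza Place"),
    ("Anna", "LIKES", "Tokyo Bar"),
    ("Sushi Zen", "LOCATED_IN", "New York"),
    ("Sushi Zen", "SERVES", "Sushi"),
    ("Pizza Place", "LOCATED_IN", "New York"),
    ("Pizza Place", "SERVES", "Pizza"),
    ("Tokyo Bar", "LOCATED_IN", "Boston"),
    ("Tokyo Bar", "SERVES", "Sushi"),
]


def neighbors(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [end for start, t, end in RELS if start == node and t == rel_type]


def sushi_in_new_york(person):
    """Restaurants liked by a friend of `person` that are in New York and serve sushi."""
    results = []
    for friend in neighbors(person, "IS_FRIEND_OF"):
        for restaurant in neighbors(friend, "LIKES"):
            if ("New York" in neighbors(restaurant, "LOCATED_IN")
                    and "Sushi" in neighbors(restaurant, "SERVES")):
                results.append(restaurant)
    return results


print(sushi_in_new_york("Philip"))  # ['Sushi Zen', 'Sushi Zen']
```

Like the Cypher query, which has no DISTINCT, the sketch returns one row per matching path, so a restaurant liked by two friends appears twice.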
What drugs will bind to protein X and not interact with drug Y?
Of course... a graph is a graph is a graph.
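The drug question combines a positive pattern (binds to protein X) with a negative one (no interaction with drug Y). A minimal Python sketch, assuming hypothetical `BINDS_TO` and `INTERACTS_WITH` relationship sets and treating interactions as symmetric, which is an assumption rather than something the slide states:

```python
# Hypothetical sample data: (drug, protein) and (drug, drug) pairs.
BINDS_TO = {("drug A", "protein X"), ("drug B", "protein X"), ("drug C", "protein Z")}
INTERACTS_WITH = {("drug A", "drug Y")}


def interacts(d1, d2):
    """Assumption: an interaction recorded in either direction counts."""
    return (d1, d2) in INTERACTS_WITH or (d2, d1) in INTERACTS_WITH


def candidates(protein, other_drug):
    """Drugs that bind to `protein` and do not interact with `other_drug`."""
    binders = {drug for drug, target in BINDS_TO if target == protein}  # positive pattern
    return sorted(d for d in binders if not interacts(d, other_drug))  # negative pattern


print(candidates("protein X", "drug Y"))  # ['drug B']
```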
Network Management Example
Network Management - Create
CREATE
  (crm {name:"CRM"}),
  (dbvm {name:"Database VM"}),
  (www {name:"Public Website"}),
  (wwwvm {name:"Webserver VM"}),
  (srv1 {name:"Server 1"}),
  (san {name:"SAN"}),
  (srv2 {name:"Server 2"}),
  (crm)-[:DEPENDS_ON]->(dbvm),
  (dbvm)-[:DEPENDS_ON]->(srv2),
  (srv2)-[:DEPENDS_ON]->(san),
  (www)-[:DEPENDS_ON]->(dbvm),
  (www)-[:DEPENDS_ON]->(wwwvm),
  (wwwvm)-[:DEPENDS_ON]->(srv1),
  (srv1)-[:DEPENDS_ON]->(san)
Practical Cypher
Network Management - Impact Analysis
// Server 1 Outage
MATCH (n)<-[:DEPENDS_ON*]-(upstream)
WHERE n.name = "Server 1"
RETURN upstream
Practical Cypher
upstream
{name:"Webserver VM"}
{name:"Public Website"}
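The impact analysis is a reachability search that follows DEPENDS_ON relationships backwards from the failed component. A minimal breadth-first sketch in Python over the same seven components created above; the function name `upstream` is illustrative only:

```python
from collections import deque

# (dependent, dependency) pairs, copied from the CREATE statement.
DEPENDS_ON = [
    ("CRM", "Database VM"),
    ("Database VM", "Server 2"),
    ("Server 2", "SAN"),
    ("Public Website", "Database VM"),
    ("Public Website", "Webserver VM"),
    ("Webserver VM", "Server 1"),
    ("Server 1", "SAN"),
]


def upstream(component):
    """Every component that depends on `component`, directly or transitively."""
    reverse = {}
    for dependent, dependency in DEPENDS_ON:
        reverse.setdefault(dependency, []).append(dependent)
    seen, queue = set(), deque([component])
    while queue:
        for dependent in reverse.get(queue.popleft(), []):
            if dependent not in seen:  # visit each component once
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(upstream("Server 1")))  # ['Public Website', 'Webserver VM']
```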
Network Management - Dependency Analysis
// Public website dependencies
MATCH (n)-[:DEPENDS_ON*]->(downstream)
WHERE n.name = "Public Website"
RETURN DISTINCT downstream
Practical Cypher
downstream
{name:"Database VM"}
{name:"Server 2"}
{name:"SAN"}
{name:"Webserver VM"}
{name:"Server 1"}
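A variable-length match yields one row per path, and SAN is reachable from the Public Website along two paths (via Server 2 and via Server 1), which is why the query needs DISTINCT to produce the five-row result shown. The Python sketch below, with a hypothetical helper name and assuming the dependency graph is acyclic, enumerates the rows path by path and then removes duplicates:

```python
# (dependent, dependency) pairs, copied from the CREATE statement.
DEPENDS_ON = [
    ("CRM", "Database VM"),
    ("Database VM", "Server 2"),
    ("Server 2", "SAN"),
    ("Public Website", "Database VM"),
    ("Public Website", "Webserver VM"),
    ("Webserver VM", "Server 1"),
    ("Server 1", "SAN"),
]


def downstream_rows(component):
    """One row per DEPENDS_ON path starting at `component` (assumes no cycles)."""
    forward = {}
    for dependent, dependency in DEPENDS_ON:
        forward.setdefault(dependent, []).append(dependency)
    rows = []

    def walk(node):
        for nxt in forward.get(node, []):
            rows.append(nxt)  # the end node of one more path
            walk(nxt)

    walk(component)
    return rows


rows = downstream_rows("Public Website")
print(rows)  # ['Database VM', 'Server 2', 'SAN', 'Webserver VM', 'Server 1', 'SAN']
print(list(dict.fromkeys(rows)))  # order-preserving de-duplication, like DISTINCT
```

Row order here follows the sketch's depth-first walk; in Cypher, row order is not guaranteed without ORDER BY.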
Network Management - Statistics
// Most depended-on component
MATCH (n)<-[:DEPENDS_ON*]-(dependent)
RETURN n,
count(DISTINCT dependent)
AS dependents
ORDER BY dependents DESC
LIMIT 1
Practical Cypher
n dependents
{name:"SAN"} 6
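The statistics query computes, for every component, the number of distinct components that transitively depend on it, then keeps the top one. A Python sketch of the same computation, with an illustrative `dependents_count` helper:

```python
from collections import deque

# (dependent, dependency) pairs, copied from the CREATE statement.
DEPENDS_ON = [
    ("CRM", "Database VM"),
    ("Database VM", "Server 2"),
    ("Server 2", "SAN"),
    ("Public Website", "Database VM"),
    ("Public Website", "Webserver VM"),
    ("Webserver VM", "Server 1"),
    ("Server 1", "SAN"),
]


def dependents_count():
    """Map each depended-on component to its number of distinct transitive dependents."""
    reverse = {}
    for dependent, dependency in DEPENDS_ON:
        reverse.setdefault(dependency, []).append(dependent)
    counts = {}
    for component in reverse:  # like the MATCH, only components with a dependent
        seen, queue = set(), deque([component])
        while queue:
            for dep in reverse.get(queue.popleft(), []):
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
        counts[component] = len(seen)  # count(DISTINCT dependent)
    return counts


counts = dependents_count()
top = max(counts, key=counts.get)  # ORDER BY dependents DESC LIMIT 1
print(top, counts[top])  # SAN 6
```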
Graphs In The
Real World
Early Adopter Segments
(What we expected to happen - view from several years ago)
Core Industries & Use Cases:
• Web / ISV
• Finance & Insurance
• Telecommunications
• Network & Data Center Management
• MDM
• Social
• Geo
Neo4j Adoption Snapshot
Select Commercial Customers* Across Anticipated Segments
Core Industries:
• Software
• Financial Services
• Telecommunications
• Health Care & Life Sciences
• Web Social, HR & Recruiting
• Media & Publishing
• Energy, Services, Automotive, Gov’t, Logistics, Education, Gaming, Other
Use Cases:
• Network & Data Center Management
• MDM / System of Record
• Social
• Geo
• Recommendations
• Identity & Access Mgmt
• Content Management
• BI, CRM, Impact Analysis, Fraud Detection, Resource Optimization, etc.
[Customer logos not reproduced; surviving captions: Accenture, Finance, Energy, Aerospace]
5 Graphs of Telco
• Network Graph
(e.g. Network Dependency Analysis, Network Inventory, etc.)
• Social Graph
(mobile apps, social recommendations, collaboration)
• Call Graph
(creating inferred social graph, churn reduction, etc.)
• Master Data Graph
(org & product hierarchy, data governance, IAM)
• Help Desk Graph
(enterprise collaboration)
5 Graphs of Finance
• Payment Graph
(e.g. Fraud Detection, Credit Risk Analysis, Chargebacks...)
• Customer Graph
(org drill-through, product recommendations, mobile payments, etc.)
• Entitlement Graph
(identity & access management, authorization)
• Portfolio Graph
(portfolio analytics, risk analysis, trading, compliance)
• Master Data Graph
(enterprise collaboration, corporate hierarchy, data governance)
5 Graphs of Health Care
• Provider Graph
(e.g. referrals, patient management, research)
• Patient Graph
(support communities, doctor recommendations, clinical trials)
• Bioinformatic Graph
(drug research, genetic screening, plant engineering, etc.)
• Master Data Graph
(biological master data, evolutionary taxonomy, etc.)
• Treatment Graph
(collaborative medicine, clinical trials, etc.)
Industry: Logistics
Use case: Parcel Routing
Accenture
Background
• One of the world’s largest logistics carriers
• Projected to outgrow capacity of old system
• New parcel routing system
• Single source of truth for entire network
• B2C & B2B parcel tracking
• Real-time routing: up to 5M parcels per day
Business problem
• 24x7 availability, year round
• Peak loads of 2500+ parcels per second
• Complex and diverse software stack
• Need for predictable performance & linear scalability
• Daily changes to logistics network: route from any point to any point
Solution & Benefits
• Neo4j provides the ideal domain fit: a logistics network is a graph
• Extreme availability & performance with Neo4j clustering
• Hugely simplified queries for complex routing, vs. relational
• Flexible data model can reflect real-world data variance much better than relational
• “Whiteboard friendly” model easy to understand
Industry: Online Job Search
Use case: Social / Recommendations
Background
Sausalito, CA
• Online jobs and career community, providing anonymized inside information to job seekers
Business problem
• Wanted to leverage the known fact that most jobs are found through personal & professional connections
• Needed to rely on an existing source of social network data; Facebook was the ideal choice
• End users needed to get instant gratification
• Aiming to have the best job search service, in a very competitive market
Solution & Benefits
• First to market with a product that let users find jobs through their network of Facebook friends
• Job recommendations served in real time from Neo4j
• Individual Facebook graphs imported into Neo4j in real time
• Glassdoor now stores > 50% of the entire Facebook social graph
• Neo4j cluster has grown seamlessly, with new instances being brought online as graph size and load have increased
[Figure: data model with Person nodes linked to each other by KNOWS relationships and to Company nodes by WORKS_AT relationships]
The Graph Landscape
Overview of Popular Graph Data Models
• Property Graph
  • Description: A “directed, labeled, attributed, multigraph”[1] which exposes three building blocks: nodes, typed relationships, and key-value properties on both nodes and relationships
  • Vendors: Neo4j, OrientDB, InfiniteGraph, Dex
• RDF Triples
  • Description: URI-centered subject-predicate-object triples as pioneered by the semantic web movement[2]
  • Vendors: AllegroGraph, Sesame
• HyperGraph
  • Description: A generalized graph where a relationship can connect an arbitrary number of nodes (compared to the more common binary graph models)[3]
  • Vendors: HyperGraphDB, TrinityDB
[1] Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” 2010, http://arxiv.org/abs/1006.2361
[2] W3C, “The Resource Description Framework (RDF),” 2004, http://www.w3.org/RDF/
[3] Wikipedia, http://en.wikipedia.org/wiki/Hypergraph
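To contrast the three models, here is a hedged Python sketch that records the same kind of fact in each. The class names, the example.org URIs, and the sample data are hypothetical placeholders, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int  # id of the start node (relationships are directed)
    type: str
    end: int
    properties: dict = field(default_factory=dict)


# Property graph: nodes, typed relationships, key-value properties on both.
alice = Node(1, {"Person"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme"})
works_at = Relationship(alice.id, "WORKS_AT", acme.id, {"since": 2010})

# RDF: every statement is a (subject, predicate, object) triple.
EX = "http://example.org/"  # placeholder namespace
triples = {
    (EX + "alice", EX + "worksAt", EX + "acme"),
    (EX + "alice", EX + "name", "Alice"),  # a literal object
}

# Hypergraph: one relationship (hyperedge) may connect any number of nodes.
hyperedges = [
    {"type": "MEETING", "members": frozenset({"Alice", "Bob", "Carol"})},
]
```

The `since` property sits directly on the WORKS_AT relationship in the property graph, whereas a single RDF triple has no slot for it; plain RDF needs reification or an intermediate node to say the same thing.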
Graph Ecosystem @ 10k Feet
1. Graph Databases
2. Graph Compute Engines
1. What is a Graph Database?
A graph database is an online (“real-time”) database management system with CRUD methods that expose a graph data model[1]
• Two important properties:
  • Native graph processing, including index-free adjacency to facilitate traversals
  • Native graph storage engine, i.e. written from the ground up to be optimized for managing graph data
[1] Robinson, Webber, Eifrem. Graph Databases. O’Reilly, 2013. p. 5. ISBN-10: 1449356265
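Index-free adjacency means each node holds direct references to its neighbours, so one traversal hop costs time proportional to that node's degree rather than an index lookup whose cost grows with the total number of relationships. The following Python sketch contrasts the two approaches on a toy graph; it illustrates the idea only and says nothing about how any particular database lays out its storage. Names such as `AdjNode` are hypothetical.

```python
import bisect


class AdjNode:
    """A node that keeps direct references to its outgoing neighbours."""

    def __init__(self, name):
        self.name = name
        self.out = []  # neighbour objects: following one needs no lookup

    def connect(self, other):
        self.out.append(other)


def traverse_index_free(start, hops):
    """Follow outgoing references `hops` times; each hop touches only local lists."""
    frontier = [start]
    for _ in range(hops):
        frontier = [nbr for node in frontier for nbr in node.out]
    return [node.name for node in frontier]


def traverse_with_index(edges_sorted, start, hops):
    """Same traversal, but every hop binary-searches one global sorted edge list."""
    frontier = [start]
    for _ in range(hops):
        nxt = []
        for name in frontier:
            i = bisect.bisect_left(edges_sorted, (name, ""))  # O(log E) per lookup
            while i < len(edges_sorted) and edges_sorted[i][0] == name:
                nxt.append(edges_sorted[i][1])
                i += 1
        frontier = nxt
    return frontier


# Toy graph a -> b -> c, stored both ways.
a, b, c = AdjNode("a"), AdjNode("b"), AdjNode("c")
a.connect(b)
b.connect(c)
EDGES = sorted([("a", "b"), ("b", "c")])

print(traverse_index_free(a, 2))  # ['c']
print(traverse_with_index(EDGES, "a", 2))  # ['c']
```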
Graph Local Queries
e.g. Recommendations, Friend-of-Friend, Shortest Path
Sweet Spot for Graph Databases
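Graph local queries start from one or a few nodes and explore only their neighbourhood. A minimal Python sketch of two of the queries named above, friend-of-friend and shortest path (breadth-first search), over a hypothetical friendship graph:

```python
from collections import deque

# Hypothetical undirected friendship graph as adjacency lists.
FRIENDS = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Ann", "Dan"],
    "Cat": ["Ann", "Dan"],
    "Dan": ["Bob", "Cat", "Eve"],
    "Eve": ["Dan"],
}


def friends_of_friends(person):
    """People two hops away who are not already direct friends."""
    direct = set(FRIENDS[person])
    return {fof for f in direct for fof in FRIENDS[f]} - direct - {person}


def shortest_path(start, goal):
    """Breadth-first search; returns one shortest path as a list of names, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in FRIENDS[node]:
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None


print(friends_of_friends("Ann"))  # {'Dan'}
print(shortest_path("Ann", "Eve"))  # ['Ann', 'Bob', 'Dan', 'Eve']
```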
The Emerging Graph Database Space
[Figure: the graph database space plotted by graph storage (non-native vs. native) against graph processing (non-native vs. native); products shown include FlockDB and AllegroGraph]
2. What is a Graph Compute Engine?
Processing platforms that enable graph global computational algorithms to be run against large data sets
[Figure: System(s) of Record feeding a Graph Compute Engine (in-memory processing, working storage) via data extraction, transformation, and load]
Graph Global Queries
e.g. How many restaurants, on average, has each person liked?
Sweet Spot for Graph Compute Engines
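By contrast, a graph global query must touch every person and every LIKES relationship. A minimal Python sketch over hypothetical data; whether people who have liked nothing count toward the average is a definitional choice, and here they are included:

```python
# Hypothetical (person, restaurant) LIKES relationships.
LIKES = [
    ("Ann", "Sushi Zen"), ("Ann", "Pizza Place"),
    ("Bob", "Sushi Zen"),
    ("Cat", "Tokyo Bar"), ("Cat", "Sushi Zen"), ("Cat", "Pizza Place"),
]
PEOPLE = ["Ann", "Bob", "Cat", "Dan"]  # Dan has liked nothing


def average_likes(people, likes):
    """Average number of LIKES per person; scans every person and every relationship."""
    counts = {person: 0 for person in people}
    for person, _restaurant in likes:
        counts[person] += 1
    return sum(counts.values()) / len(counts)


print(average_likes(PEOPLE, LIKES))  # 1.5
```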
Graph Compute Engines
Largely fall into one of two patterns:
• In-Memory / Single Machine
• Distributed - the most common of which is the “Bulk Synchronous Parallel” model (aka Pregel clone)
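In the Bulk Synchronous Parallel model, popularized for graph processing by Google's Pregel, computation proceeds in supersteps: each vertex reads the messages sent to it in the previous superstep, updates its value, and sends messages to its neighbours, with a global barrier between supersteps. The sequential Python sketch below simulates that loop to label connected components with the smallest vertex id; the graph and function name are hypothetical, and a real engine would run each superstep's vertices in parallel across machines.

```python
# Hypothetical undirected graph as adjacency lists: components {1, 2, 3} and {4, 5}.
GRAPH = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}


def connected_components_bsp(graph):
    """Label every vertex with the smallest vertex id in its component."""
    value = {v: v for v in graph}
    # Superstep 0: every vertex sends its initial label to its neighbours.
    inbox = {v: [] for v in graph}
    for v in graph:
        for nbr in graph[v]:
            inbox[nbr].append(value[v])
    while any(inbox.values()):  # stop when no messages are in flight
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():  # conceptually parallel, one call per vertex
            if messages and min(messages) < value[v]:
                value[v] = min(messages)
                for nbr in graph[v]:
                    outbox[nbr].append(value[v])
            # Vertices with nothing new to report send no messages (akin to voting to halt).
        inbox = outbox  # barrier: messages become visible only in the next superstep
    return value


print(connected_components_bsp(GRAPH))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```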
Distributed Computing Architecture - Examples
Graph Compute Engine
[Project logos not reproduced]
• Apache project based on Hadoop
• Bulk Synchronous Parallel model (Pregel clone)
• Released in 2012

• OSS project developed out of CMU
• Based on Hadoop & MapReduce
• Includes key algorithms for graph global pattern matching & visualization

• OSS project
• Distributes relationships rather than nodes
• Developed at CMU with funding from DARPA, Intel, et al. & VC
In-Memory Single-Machine Examples
Graph Compute Engine
Cassovary
• OSS project led by Twitter
• Used by Twitter for large-scale graph mining (uses daily export from FlockDB system of record)
• “Not designed for persistence or database functionality”
YarcData uRiKA
• Graph compute appliance launched by Cray in Feb 2012
• Built to discover unforeseen relationships in the graph
GraphChi
• GraphLab spinoff
• Performance of the same order of magnitude as GraphLab, running on a single Mac Mini
Example Graph Database Deployment
[Figure: deployment architecture with the following components]
• Application
• Other Databases, connected via ETL
• Graph Database Cluster: data storage & business rules execution
• Reporting: graph dashboards & ad-hoc analysis
• Graph Visualization: end-user ad-hoc visual navigation & discovery
• Bulk Analytic Infrastructure (e.g. Graph Compute Engine), connected via ETL: graph mining & aggregation
• Data Scientist: ad-hoc analysis
DEAR DATA SCIENTIST: TAKE THE RED PILL
JOIN THE GRAPH. WE ARE HIRING.
teh end (sic)
stay connected