Using Graph Databases For Insights
Into Connected Data

Gagan Agrawal

Xebia India

SOFTWARE DEVELOPMENT DONE RIGHT
Netherlands | USA | India | France | UK
Agenda








- High level view of Graph Space
- Comparison with RDBMS and other NoSQL stores
- Data Modeling
- Cypher: Graph Query Language
- Graph Database Internals
- Graphs in the Real World

What is a Graph?







- A graph is a collection of vertices and edges: a set of nodes and the relationships that connect them.
- A graph represents:
  - entities as NODES
  - the ways those entities relate to the world as RELATIONSHIPS
- Graphs can model all kinds of scenarios:
  - road systems
  - medical histories
  - supply chain management
  - data centers

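The node-and-relationship idea can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular graph database stores data; the class names (Node, Relationship, Graph), the ROAD_TO relationship type, and the city names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, e.g. a city (hypothetical example type)."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes."""
    start: Node
    kind: str
    end: Node


class Graph:
    """Minimal in-memory graph: a dict of nodes plus a list of relationships."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, name, **properties):
        node = Node(name, properties)
        self.nodes[name] = node
        return node

    def relate(self, start, kind, end):
        rel = Relationship(self.nodes[start], kind, self.nodes[end])
        self.relationships.append(rel)
        return rel

    def neighbours(self, name, kind):
        """Names of nodes reachable from `name` over one `kind` relationship."""
        return [r.end.name for r in self.relationships
                if r.start.name == name and r.kind == kind]


# A tiny road system: cities are nodes, roads are relationships.
roads = Graph()
for city in ("Delhi", "Jaipur", "Agra"):
    roads.add_node(city)
roads.relate("Delhi", "ROAD_TO", "Jaipur")
roads.relate("Delhi", "ROAD_TO", "Agra")
roads.relate("Jaipur", "ROAD_TO", "Agra")

print(roads.neighbours("Delhi", "ROAD_TO"))
```

The same shape fits the other scenarios listed above: only the node names and relationship types change.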
High Level view of Graph Space




- Graph Databases: technologies used primarily for transactional online graph persistence (OLTP).
- Graph Compute Engines: technologies used primarily for offline graph analytics (OLAP).

Graph Databases


- An online database management system with Create, Read, Update, and Delete (CRUD) methods that expose a graph data model.
- Built for use with transactional (OLTP) systems.
- Used for richly connected data.
- Querying is performed through traversals.
- Can perform millions of traversal steps per second.
- A traversal step resembles a join in an RDBMS.

Graph Database Properties


- The underlying storage: native or non-native
- The processing engine: native or non-native

Graph DB – The Underlying Storage




- Native graph storage: optimized and designed for storing and managing graphs.
- Non-native graph storage: serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.

Graph DB – The Processing Engine

- Index-free adjacency: connected nodes physically point to each other in the database.

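To make index-free adjacency concrete, here is a minimal sketch that contrasts the two ways of taking a hop. It is an analogy in plain Python objects, not a description of any engine's storage format; NodeRecord, hop_with_index, hop_with_pointers, and the sample names are hypothetical.

```python
class NodeRecord:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbouring NodeRecord objects


def hop_with_index(index, name):
    """Non-native style: every hop goes back through a global lookup structure."""
    return index.get(name, [])


def hop_with_pointers(node):
    """Index-free adjacency: a hop just follows references stored on the node."""
    return node.out


# Hypothetical data: alice -> bob -> carol
alice, bob, carol = NodeRecord("alice"), NodeRecord("bob"), NodeRecord("carol")
alice.out.append(bob)
bob.out.append(carol)

# The same edges expressed as a global index keyed by node name.
global_index = {"alice": ["bob"], "bob": ["carol"]}

# Two hops via the index: each step is a lookup in the global structure.
via_index = [n2 for n1 in hop_with_index(global_index, "alice")
             for n2 in hop_with_index(global_index, n1)]

# Two hops via pointers: no global structure is consulted after the start node.
via_pointers = [n2.name for n1 in hop_with_pointers(alice)
                for n2 in hop_with_pointers(n1)]

print(via_index, via_pointers)
```

Both routes reach the same node; the difference is that the pointer version never touches a structure whose size grows with the whole dataset.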
Power of Graph Databases


- Performance
- Flexibility
- Agility

Comparison


- Relational Databases
- NoSQL Databases
- Graph Databases

Relational Databases Lack Relationships

- Initially designed to codify paper forms and tabular structures.
- Deal poorly with relationships.
- A rise in connectedness translates into more joins.
- More joins mean lower performance.
- Difficult to adapt to changing business needs.

NoSQL Databases Also Lack Relationships

- NoSQL databases (key-value, document, or column-oriented) store sets of disconnected values, documents, or columns.
- This makes them difficult to use for connected data and graphs.
- One solution is to embed an aggregate's identifier inside a field belonging to another aggregate:
  - this effectively introduces foreign keys
  - it requires joining aggregates at the application level

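The application-level join mentioned above might look like the following sketch. The store is a plain Python dict standing in for a document database; the field name friend_ids, the helper friends_of, and the sample users are assumptions for illustration only.

```python
# Hypothetical document store: each aggregate is a dict keyed by its id.
users = {
    "u1": {"name": "Alice", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Bob", "friend_ids": []},
    "u3": {"name": "Carol", "friend_ids": ["u2"]},
}


def friends_of(store, user_id):
    """Resolve embedded identifiers by hand: the 'join' lives in application code."""
    user = store[user_id]
    # One extra lookup per embedded id; the store itself knows nothing about the link.
    return [store[fid]["name"] for fid in user["friend_ids"] if fid in store]


print(friends_of(users, "u1"))
```

Nothing in the store guarantees that an embedded id still refers to an existing aggregate, which is why the sketch has to guard each lookup.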
NoSQL DB








- Relationships between aggregates aren't first-class citizens in the data model.
- Foreign aggregate "links" are not reflexive: a link can be followed from the aggregate that holds it, but not back from the aggregate it points to.
- Such processing requires external compute infrastructure, e.g. Hadoop.
- Do not maintain consistency of connected data.
- Do not support index-free adjacency.

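The point about non-reflexive links can be illustrated with a small sketch. Following a link forward is a single lookup, while the reverse question has no link to follow and falls back to scanning every aggregate. The dict-based store, the friend_ids field, and both helper functions are hypothetical.

```python
# Hypothetical document store with embedded identifiers.
users = {
    "u1": {"name": "Alice", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Bob", "friend_ids": []},
    "u3": {"name": "Carol", "friend_ids": ["u2"]},
}


def friends_listed_by(store, user_id):
    """Forward direction: one lookup, then read the embedded ids."""
    return list(store[user_id]["friend_ids"])


def who_lists(store, target_id):
    """Reverse direction: no link points back, so every aggregate is scanned."""
    return sorted(uid for uid, doc in store.items()
                  if target_id in doc["friend_ids"])


print(friends_listed_by(users, "u1"))  # cheap: read one aggregate
print(who_lists(users, "u2"))          # expensive: touch all aggregates
```

On a large store the reverse scan is the kind of work that tends to be pushed to a batch system rather than answered online.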
Graph DB


- Find friends-of-friends in a social network, to a maximum depth of 5.
  - Total records: 1,000,000
  - Each with approximately 50 friends

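A depth-limited friends-of-friends search can be sketched as a breadth-first traversal. With roughly 50 friends per person, a naive expansion to depth 5 could follow up to 50^5 = 312,500,000 paths, so the sketch tracks visited people to avoid repeating work. The function name friends_within and the toy network are assumptions; this is not the benchmark behind the figures above.

```python
from collections import deque


def friends_within(graph, start, max_depth):
    """Return {person: depth} for everyone reachable within max_depth hops."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if depth[person] == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in depth:
                depth[friend] = depth[person] + 1
                queue.append(friend)
    del depth[start]  # the start person is not their own friend
    return depth


# Hypothetical toy network (adjacency lists), far smaller than 1,000,000 records.
network = {
    "ann": ["bob", "cid"],
    "bob": ["dee"],
    "cid": ["dee"],
    "dee": ["eve"],
    "eve": ["fay"],
}

print(friends_within(network, "ann", 2))
```

Raising max_depth to 5 only visits the people actually reachable from the start node, regardless of how many other records exist.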
Data Modeling with Graph

Data Modeling






- “Whiteboard” friendly.
- The typical whiteboard view of a problem is a GRAPH.
- What we sketch in our creative and analytical modes maps closely to the data model inside the database.

Cypher : Graph Query Language









- Pattern-matching query language
- Humane language
- Expressive
- Declarative: say what you want, not how
- Borrows from well-known query languages
- Aggregation, ordering, limit
- Updates the graph

Cypher


Cypher representation (two equivalent ways to write the same pattern):
(c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
(c)-[:KNOWS]->(b)-[:KNOWS]->(a)<-[:KNOWS]-(c)

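What the pattern asks for can be spelled out imperatively. The sketch below finds every (a, b) such that c KNOWS b, b KNOWS a, and c KNOWS a, over a set of directed KNOWS edges. It only illustrates the meaning of the pattern, not how a Cypher engine evaluates it; match_triangle and the sample names are hypothetical.

```python
# Hypothetical KNOWS relationships as (start, end) pairs.
knows = {
    ("Michael", "Bob"),
    ("Bob", "Alice"),
    ("Michael", "Alice"),
    ("Michael", "Dan"),
}


def match_triangle(edges, c):
    """Find (a, b) such that c KNOWS b, b KNOWS a, and c KNOWS a."""
    results = []
    for start, b in sorted(edges):
        if start != c:
            continue  # first hop must leave c
        for middle, a in sorted(edges):
            # second hop must leave b, and c must also know a directly
            if middle == b and (c, a) in edges:
                results.append((a, b))
    return results


print(match_triangle(knows, "Michael"))
```

The declarative pattern states the shape once; the imperative version has to decide the loop order and the membership check by hand.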
Cypher
START c=node:user(name='Michael')
MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
RETURN a, b

Other Cypher Clauses


- WHERE: provides criteria for filtering pattern-matching results.
- CREATE and CREATE UNIQUE: create nodes and relationships.
- DELETE: removes nodes, relationships and properties.
- SET: sets property values.

Other Cypher Clauses


- FOREACH: performs an updating action for each element in a list.
- UNION: merges results from two or more queries.
- WITH: chains subsequent query parts and forwards results from one to the next, similar to piping commands in UNIX.

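The piping analogy for WITH can be sketched as chained Python stages, where each stage consumes the rows produced by the previous one. This is only an analogy for the data flow, not Cypher semantics; the stage functions and the sample edges are hypothetical.

```python
from collections import Counter

# Hypothetical KNOWS relationships as (person, friend) pairs.
knows = [("ann", "bob"), ("ann", "cid"), ("bob", "cid"), ("dee", "cid")]


def match_knows(edges):
    """Stage 1 (plays the role of MATCH): one row per relationship."""
    for person, friend in edges:
        yield {"person": person, "friend": friend}


def count_by_friend(rows):
    """Stage 2 (plays the role of WITH plus an aggregate): forward counts."""
    counts = Counter(row["friend"] for row in rows)
    for friend, known_by in counts.items():
        yield {"friend": friend, "known_by": known_by}


def at_least(rows, minimum):
    """Stage 3 (plays the role of WHERE on the forwarded rows)."""
    return [row for row in rows if row["known_by"] >= minimum]


# Each stage only sees what the previous stage forwarded, like a UNIX pipe.
popular = at_least(count_by_friend(match_knows(knows)), 2)
print(popular)
```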
Comparison of Relational and Graph Modeling

Graph Database Internals

Non Functional Characteristics


- Transactions
  - Fully ACID
- Recoverability
- Availability
- Scalability

Scalability


- Capacity (graph size)
- Latency (response time)
- Read and write throughput

Capacity




- The 1.9 release of Neo4j can support single graphs with tens of billions of nodes, relationships and properties.
- The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph.

Latency











- RDBMS: more data in tables and indexes results in longer join operations.
- A graph DB doesn't suffer the same latency problem.
- An index is used to find the starting node.
- Traversal uses a combination of pointer chasing and pattern matching to search the data.
- Performance does not depend on the total size of the dataset; it depends only on the data being queried.

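The claim that cost follows the data being queried, not the dataset size, can be illustrated by counting how many nodes a traversal expands. The sketch runs the same depth-limited traversal on a small graph and on a copy padded with 100,000 unrelated nodes. The dict lookup stands in for the index that finds the starting node; traverse and the sample graphs are hypothetical, and this is not a benchmark.

```python
def traverse(graph, start, max_depth):
    """Depth-limited traversal; returns (reached nodes, number of nodes expanded)."""
    seen = {start}
    frontier = [start]
    expanded = 0
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            expanded += 1
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen - {start}, expanded


small = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}

# Same neighbourhood, plus 100,000 unrelated nodes that the query never touches.
large = dict(small)
large.update({f"x{i}": [f"x{i + 1}"] for i in range(100_000)})

reached_small, work_small = traverse(small, "a", 2)
reached_large, work_large = traverse(large, "a", 2)

print(work_small, work_large)
```

The number of expanded nodes is identical in both runs because the traversal never leaves the neighbourhood of the start node.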
Throughput


- Constant performance irrespective of graph size.

Graphs in the Real World

Common Use Cases





- Social
- Recommendations
- Geo
- Logistics networks: package routing, finding the shortest path
- Financial transaction graphs: fraud detection
- Master data management
- Bioinformatics: Era7 relates a complex web of information that includes genes, proteins and enzymes
- Authorization and access control: Adobe Creative Cloud, Telenor

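For the logistics use case, finding the shortest path can be sketched with Dijkstra's algorithm over a weighted graph. This is a textbook technique chosen for illustration, not a description of what any of the named companies run; shortest_route, the depot names, and the edge costs are hypothetical.

```python
import heapq


def shortest_route(graph, source, target):
    """Dijkstra's algorithm; returns (total cost, path) or (None, []) if unreachable."""
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if target not in best:
        return None, []
    # Walk back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical depot network: edge weights are transport costs.
depots = {
    "hub": {"north": 4, "south": 1},
    "south": {"north": 2, "east": 7},
    "north": {"east": 3},
    "east": {},
}

print(shortest_route(depots, "hub", "east"))
```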
Thank You

BigData & Real Time Analytics

Services
- Visualization (Tableau)
- Analytics Framework (Mahout)
- Integration (Sqoop, Flume, Storm)
- Hadoop Powered Solutions (Pig, Hive, Oozie, HBase, Impala) (Solr, Elastic Search)
- Core Hadoop (HDFS, MapReduce, Zookeeper, Cloudera

Trainings
- Cloudera Data Analyst / Developer / Admin Training

Products
- Divolte
- Wearable Sensors

Solutions
- Big data warehousing
- Scalable big data ETL
- High volume web analytics

Contact us @

Websites
- www.xebia.in
- www.xebia.com
- www.xebia.fr

Xebia India
- infoindia@xebia.com

Thought Leadership
- http://xebee.xebia.in
- http://blog.xebia.com
- http://podcast.xebia.com

Editor's Notes

  • #55: Services should include hadoop consulting rather