03   introduction to graph databases
● Components of a Property Graph
● Introducing Game of Thrones
● Introducing Cypher
● Introducing APOC
● Cypher strikes back
● Property Graph Modeling 101
Yes indeed. While Graph Theory has been around since 1736, there
are quite a few ways to implement it. Currently two of them
matter in the context of databases:
● RDF Graph
● Property Graph
A discussion of these is not within the scope of this training. Neo4j
implements the Property Graph (and can be considered the inventor of
that type of graph). If you want more information or a comparison, your
instructor will be happy to provide that offline.
Node - vertex, entity, object, thing
Label - Class(ification) of a Node, for example Alien or Person
Relationship - edge, physical link between two nodes
Type - Class(ification) of a Relationship, for example COMMANDS
Property - Key Value Pair, for example stardate: 41054.3
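To make the five building blocks concrete, here is a minimal plain-Python sketch of a property graph as data structures. It illustrates the concepts only and is not how Neo4j stores anything; the class names, the example nodes and the choice to put the stardate property on the relationship are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: zero or more labels plus a map of key/value properties."""
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship: a directed link between two nodes with exactly one type."""
    start: Node
    end: Node
    type: str
    properties: dict[str, object] = field(default_factory=dict)


# Hypothetical example data in the spirit of the slides (names are made up).
officer = Node({"Person"}, {"name": "officer"})
crew_member = Node({"Alien"}, {"name": "crew member"})
# Assumption: the stardate property is placed on the relationship here.
order = Relationship(officer, crew_member, "COMMANDS", {"stardate": 41054.3})

print(order.start.labels, order.type, order.end.labels, order.properties)
```

Note the asymmetry the slides point out: a node holds a *set* of labels (possibly empty), while a relationship holds exactly one type.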
Relational      Graph
Rows            Nodes
Joins           Relationships
Table names     Labels
Columns         Properties
Relational vs Graph
● Relational: each column must have a value (or null).
Graph: nodes with the same label are not required to have the same properties.
● Relational: joins are calculated at read time.
Graph: relationships are created at write time.
● Relational: a row belongs to only one table.
Graph: a node can have many (or no) labels.
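The read-time versus write-time difference can be sketched in a few lines of Python. This is a toy model under stated assumptions (made-up tables and names, a hash join standing in for whatever join strategy a relational engine actually picks), not a description of either engine's internals.

```python
# Relational style: the link between a person and a house is just a column value;
# the join is worked out every time you query.
persons = [
    {"id": 1, "name": "person A", "house_id": 10},
    {"id": 2, "name": "person B", "house_id": 20},
]
houses = [{"id": 10, "name": "house X"}, {"id": 20, "name": "house Y"}]


def join_at_read_time(persons, houses):
    """Rebuild a lookup and match keys on every read (a simple hash join)."""
    by_id = {h["id"]: h for h in houses}
    return [(p["name"], by_id[p["house_id"]]["name"]) for p in persons]


# Graph style: the link is stored once, when the data is written.
class GraphNode:
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # list of (relationship type, target node) pairs

    def connect(self, rel_type, target):
        """Write time: store a direct reference to the target node."""
        self.outgoing.append((rel_type, target))


def traverse(start, rel_type):
    """Read time: follow the stored references; no key matching needed."""
    return [target.name for (t, target) in start.outgoing if t == rel_type]


a, x = GraphNode("person A"), GraphNode("house X")
a.connect("BELONGS_TO", x)
print(join_at_read_time(persons, houses))
print(traverse(a, "BELONGS_TO"))
```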
A television series with the usual mix of ...
Drugs, Sex, Violence ... and Dragons
Based on the book series A Song of Ice and Fire by George R. R.
Martin, what is special about it is the sheer number of characters
that have a big storyline (and - importantly - all interact with each
other). In the TV series it's dozens (in the books it runs into the
hundreds).
Rather than put off audiences, this turned it into a hit. And it also
made this one hell of a graph database use case …
One of the two databases we'll be using in this training (and that
you should have ready to go) is a Game Of Thrones interaction
database.
So this is finally where you start doing stuff. Let's start by
looking at what model that database has.
As explained in the preparation email for this training, you should
now have
● a Neo4j instance running
● a gameofthrones database loaded in that instance
● a Neo4j Browser session connected to that instance
I won't be flipping between slides and code execution. Put me
in a window now and follow along, either in the browser guide or by
cutting and pasting from a website ...
So it's either … inside the Neo4j Browser you run
:play http://neo4jguides.tomgeudens.io/gdsintro.html
(note that this requires a neo4j.conf setting to whitelist the host)
or you open a regular browser session too and go to
https://bit.ly/neo4j-gds-intro
and cut and paste the commands from there.
The syntax to show the model of a database is ...
// Make sure you're in the correct database
:use gameofthrones
// Show the model
CALL db.schema.visualization();
That wasn't exactly what you got, was it?
For me, as a human, labels like Knight, Dead and King are obviously
extra labels subclassing the Person label, so I manually
removed them for clarity. The database makes no such distinction
and does not subclass. All labels are separate. And that's why you
got what you got.
Note that Rome (allegedly) almost had a horse as consul at some point.
Neo4j could have handled that for real: a node can simply carry both labels!
This is Cypher
This is Le Chiffre
● A declarative query language
● Focus on the what, not the how
● Uses ASCII art to represent nodes and relationships
("`-''-/").___..--''"`-._
`6_ 6 ) `-. ( ).`-.__.`)
(_Y_.)' ._ ) `._ `. ``-..-'
_..`--'_..-_/ /--'_.'
((((.-'' ((((.' (((.-'
It won't get this complex, I promise ...
Diving straight in, I'm afraid I have to start by taking away your
precioussss SELECT. A Cypher query always starts with a pattern
match, and thus the logical keyword is MATCH.
// Find and return all nodes in the database
MATCH (n)
RETURN n;
● What you just did is actually a pretty bad idea. While the
database has no issue with you asking for all the nodes, simple
visualization tools (such as the Neo4j Browser) typically can't
handle what comes back.
● If you want to think along the lines of what you know about
SQL queries, you could see the MATCH as a preprocessing step
and the RETURN as the select/projection.
The trick to a good graph query (which I'll repeat ad nauseam) is to
efficiently find your starting points. Using labels is a great way to do
that.
// Find and return all House nodes in the database
MATCH (n:House)
RETURN n;
Since a node can have multiple labels, using multiple labels in the
base syntax implies AND: it returns only nodes that have all the
specified labels.
// Find and return all Person nodes in the database that also have the King and the Dead label
MATCH (n:Person:King:Dead)
RETURN n;
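Label matching behaves like a subset test on a flat set of labels, with no subclassing involved. A minimal plain-Python sketch of that semantics, using made-up nodes (only their label sets matter):

```python
# Hypothetical nodes; only the label sets matter here.
nodes = [
    {"name": "n1", "labels": {"Person", "King", "Dead"}},
    {"name": "n2", "labels": {"Person", "King"}},
    {"name": "n3", "labels": {"Person", "Knight", "Dead"}},
    {"name": "n4", "labels": {"House"}},
]


def match_labels(nodes, *required):
    """Mimic MATCH (n:A:B:C): keep nodes whose label set contains every required label."""
    wanted = set(required)
    return [n["name"] for n in nodes if wanted <= n["labels"]]


print(match_labels(nodes, "Person"))
print(match_labels(nodes, "Person", "King", "Dead"))
```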
● How many (:Person:King:Dead) nodes did you get?
● Did you notice (for all queries so far) something strange in the
results?
Answers:
● 9
● Did you ask for relationships in your query? So why then are
there relationships on screen? (By default the Neo4j Browser also
draws the relationships that exist between the nodes it displays.)
The whole point of a graph database is to be able to traverse
relationships (which, in the case of graph-native Neo4j, means
hopping pointers). So here we go ...
// Which houses were present in which battles?
MATCH (h:House)-[]->(b:Battle)
RETURN h.name, b.name;
The more specific you can make a query, the more efficient it -
generally speaking - becomes.
// Which houses attacked in which battles?
MATCH (h:House)-[a:ATTACKER]->(b:Battle)
RETURN h,a,b LIMIT 30;
As a relationship has one and only one type, using multiple
relationship types in the base syntax implies OR.
// Which houses attacked or defended in which battles?
MATCH (h:House)-[ad:ATTACKER|DEFENDER]->(b:Battle)
RETURN h,ad,b LIMIT 30;
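Relationship types work the other way round from labels: since each relationship carries exactly one type, `|` can only mean "any of these". A small plain-Python sketch with hypothetical (start, type, end) triples; note that the match is on the exact type name, so ATTACKER does not match ATTACKER_COMMANDER:

```python
# Hypothetical (start, type, end) triples; each relationship has exactly one type.
relationships = [
    ("house 1", "ATTACKER", "battle 1"),
    ("house 2", "DEFENDER", "battle 1"),
    ("house 1", "DEFENDER", "battle 2"),
    ("person 1", "ATTACKER_COMMANDER", "battle 2"),
]


def match_types(relationships, *types):
    """Mimic -[r:A|B]->: a relationship matches if its single type is any of the listed ones."""
    allowed = set(types)
    return [r for r in relationships if r[1] in allowed]


print(match_types(relationships, "ATTACKER"))
print(match_types(relationships, "ATTACKER", "DEFENDER"))
```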
● The LIMIT is there to make sure the browser doesn't blow
up; the database is fine, thank you very much.
● A Cypher query can RETURN quite a few different things and it is
your application's job to handle them ...
I bet you saw that one coming ...
● Awesome Procedures On Cypher
● A library of user defined procedures (UDP) and functions
(UDF) that extends the Cypher query language.
● In this particular case it is a supported library that provides
tons of convenience tools for Cypher.
It's funny, because in The Matrix, Cypher brutally kills Apoc. In reality
you could argue that Apoc totally saves Cypher!
Stored procedures are pieces of slightly enhanced/wrapped query
syntax (PL/SQL comes to mind) and are - as the name implies -
stored inside the database itself.
APOC is a library of Java code that is deployed - as a jar file - to a
folder together with the database software. Yes, of course you can
also create such libraries of your own (many customers do), but
forget what you know about stored procedures. UDPs and
UDFs are neither stored nor created in the same way!
A user defined function (UDF)
● Returns one value
● If it does stuff on the database itself, this is always read-only
● Is used inline
Try it …
// Generate a UUID and return it and the versions of APOC and GDS
RETURN apoc.create.uuid() as uuid, apoc.version() as apocversion,
gds.version() as gdsversion;
A user defined procedure (UDP)
● YIELDs a stream of results using a predefined signature
● If it does stuff on the database itself, this can be read or write
● Is CALLed
Try it
// Find out about the signature of a procedure
CALL apoc.help("help") YIELD signature
RETURN signature;
Try it some more
// Show which available procedures can load data into the database
CALL apoc.help("load") YIELD type, name, text, signature, core
RETURN type, name, text, signature, core;
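The function/procedure split can be mimicked in plain Python: a function returns one value you can use inline, while a procedure is a generator of records from which `YIELD` picks columns. Everything here (the catalogue, the `demo.*` names, the `call` helper) is hypothetical and only models the calling convention, not APOC itself.

```python
import uuid
from typing import Iterator

# Hypothetical catalogue; these names are made up for the sketch, not real APOC entries.
CATALOGUE = [
    {"type": "procedure", "name": "demo.load.csv", "text": "load rows from a file"},
    {"type": "procedure", "name": "demo.load.json", "text": "load a document"},
    {"type": "function", "name": "demo.create.uuid", "text": "make an identifier"},
]


def create_uuid() -> str:
    """Function style: returns exactly one value, so it can sit inline in an expression."""
    return str(uuid.uuid4())


def help_procedure(term: str) -> Iterator[dict]:
    """Procedure style: streams zero or more records that share one fixed signature."""
    for entry in CATALOGUE:
        if term in entry["name"]:
            yield entry


def call(procedure, *args, yield_columns):
    """Mimic CALL proc(args) YIELD a, b: keep only the yielded columns of each record."""
    return [{col: record[col] for col in yield_columns} for record in procedure(*args)]


row = {"uuid": create_uuid()}  # used inline, like RETURN apoc.create.uuid() AS uuid
load_rows = call(help_procedure, "load", yield_columns=["type", "name"])
print(row, load_rows)
```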
So in order to start doing Graph Data Science on the Neo4j
database you're going to need a bit more advanced Cypher than
we've seen so far ...
● WHERE clauses
● Aggregation stuff … COLLECT/UNWIND, COUNT, ...
● Intermediate projection with … WITH
● Result deduplication with … DISTINCT
All of this was a prerequisite though … ;-)
I find it always calms audiences down when - after brutally ripping
away their SELECT - I can give them (back) the WHERE clause …
// So these are braces ...
MATCH (h:House {name:"Darry"})-[d:DEFENDER]->(b:Battle)
RETURN h,d,b;
// And they are the equivalent of checking for equality
MATCH (h:House)-[d:DEFENDER]->(b:Battle)
WHERE h.name = "Darry"
RETURN h,d,b;
Between the braces you can only do equality, but the WHERE clause
comes with the full range of options ...
// A range check written out with AND
MATCH (p:Person)-[:BELONGS_TO]->(h:House)
WHERE p.death_year >= 300 AND p.death_year <= 1200
RETURN p.name, h.name;
// The same range check as a chained comparison
MATCH (p:Person)-[:BELONGS_TO]->(h:House)
WHERE 300 <= p.death_year <= 1200
RETURN p.name, h.name;
It came as a bit of a surprise (to me at least) that there is indeed
such a debate. One that has in fact been raging ever since
aggregation became a thing in database query languages. Neo4j
does IMPLICIT aggregation; there is NO GROUP BY.
Try it
// How many persons does each house have?
MATCH (a:Person)-[:BELONGS_TO]->(h:House)
RETURN h.name as Housename, count(*) as Household;
An aggregation in a projection (RETURN or WITH) implicitly groups
by everything that is not aggregated ...
Read that again … and again. Now (and I'm not giving you the syntax)
tell me how big the household of Brotherhood Without Banners
is ...
6
Check again if you didn't get that ...
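Implicit grouping is easiest to see by writing it out. In this hypothetical plain-Python sketch, every non-aggregated column in the projection becomes part of the grouping key, which is why adding a column changes what gets counted. The rows are made up; they are not taken from the gameofthrones database.

```python
# Hypothetical rows as MATCH (a:Person)-[:BELONGS_TO]->(h:House) might produce them.
rows = [
    {"a.name": "person 1", "h.name": "house A"},
    {"a.name": "person 2", "h.name": "house A"},
    {"a.name": "person 3", "h.name": "house B"},
]


def project_with_count(rows, keys):
    """Mimic RETURN k1, k2, count(*): every non-aggregated column is part of the grouping key."""
    counts = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        counts[group] = counts.get(group, 0) + 1
    return counts


# RETURN h.name, count(*)          -> one row per house
print(project_with_count(rows, ["h.name"]))
# RETURN h.name, a.name, count(*)  -> one row per (house, person), each counting 1
print(project_with_count(rows, ["h.name", "a.name"]))
```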
There are quite a few aggregation functions, but the two you'll use
most in this training are count and collect.
Try it
// Which commanders didn't learn the first time?
MATCH (ac:Person)-[:ATTACKER_COMMANDER]->(b:Battle)<-[:DEFENDER_COMMANDER]-(dc:Person)
RETURN ac.name, dc.name, count(b) AS commonBattles,
       collect(b.name) AS battlenames
ORDER BY commonBattles DESC LIMIT 5;
● I guess explaining what a count does is a bit pointless (wait
until after the deduplication to start laughing though).
● A collect aggregates detail into a list.
● ORDER BY … SKIP … LIMIT
work exactly as you're used to … all three are needed
if you want to do paging.
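Paging boils down to sort, skip, cut, in that order. A plain-Python sketch on made-up rows (the pair names and counts are hypothetical):

```python
# Hypothetical (commander pair, common battles) rows.
rows = [("pair a", 1), ("pair b", 5), ("pair c", 3), ("pair d", 4), ("pair e", 2)]


def page(rows, page_number, page_size):
    """Mimic ORDER BY count DESC SKIP n LIMIT m: sort, then skip, then cut."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)  # ORDER BY ... DESC
    skip = page_number * page_size                            # SKIP
    return ordered[skip:skip + page_size]                     # LIMIT


for n in range(3):
    print(n, page(rows, n, 2))
```

The ORDER BY is what makes this safe: without a deterministic ordering, Cypher gives no guarantee about row order, so consecutive SKIP/LIMIT pages could overlap or miss rows. Ties in the sort key carry the same risk unless you add a tie-breaker.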
I can explain this to you, but I can't understand it for you. So try the
following and answer the - relatively - simple question … How
many times is Jon Snow on screen?
// Just Jon, right?
MATCH (p:Person {name: "Jon Snow"})-[:INTERACTS]->()
RETURN p;
I would say your eyes deceived you … allow the
machine to count.
// Counting Jon
MATCH (p:Person {name: "Jon Snow"})-[:INTERACTS]->()
RETURN count(p);
89
// I am the only Jon
MATCH (p:Person {name: "Jon Snow"})-[:INTERACTS]->()
RETURN DISTINCT p;
This is one of the biggest sources of confusion with Cypher queries.
Cypher does pattern matching. Every match returns a pattern that is
unique as a whole, not in its parts.
Jon Snow shows up in 89 unique interaction patterns.
I made several attempts to add something to the previous line, but the
point is made best if you understand exactly what that line (no more,
no less) says. Please take a moment to do that, I'll wait ...
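The difference between counting patterns and counting distinct nodes can be reproduced with a toy list of relationships. This plain-Python sketch assumes, purely for illustration, that two INTERACTS relationships can connect the same pair of people; all data apart from the name Jon Snow is invented.

```python
# Hypothetical INTERACTS relationships: (relationship id, start, end).
interacts = [
    ("r1", "Jon Snow", "someone 1"),
    ("r2", "Jon Snow", "someone 1"),  # parallel relationship: a separate pattern
    ("r3", "Jon Snow", "someone 2"),
    ("r4", "someone 3", "Jon Snow"),  # incoming, so -[:INTERACTS]-> does not match it
]


def match_outgoing(person):
    """Each result is one whole pattern (p)-[r:INTERACTS]->(other), told apart by its relationship."""
    return [(start, rel_id, end) for (rel_id, start, end) in interacts if start == person]


patterns = match_outgoing("Jon Snow")
count_p = len([p for (p, _, _) in patterns])                    # RETURN count(p)
distinct_p = list(dict.fromkeys(p for (p, _, _) in patterns))  # RETURN DISTINCT p
print(count_p, distinct_p)
```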
WITH
● provides an intermediate projection with exactly the same
functionality as a RETURN (but without ending the query)
● controls the visibility of variables beyond the WITH … what isn't
projected is gone
● comes with an additional WHERE clause that is the equivalent
of the HAVING clause in SQL
● is the main building block for building Cypher pipeline queries
It's only the second slide … remember that they kept that ruse
going for almost 8 seasons … and you kept watching!
// The point of the query is moot, but it shows what WITH can do ...
MATCH (p:Person)-[:ATTACKER_COMMANDER]->(b:Battle)
WITH p.name AS name, count(*) AS numBattles, collect(b.name) AS battles
WHERE numBattles = 2
RETURN name, numBattles, battles;
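The WITH query above can be spelled out step by step in plain Python: aggregate, project, then filter the aggregated rows, which is the HAVING-like part. The commander and battle names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (commander, battle) rows from MATCH (p:Person)-[:ATTACKER_COMMANDER]->(b:Battle).
rows = [("c1", "b1"), ("c1", "b2"), ("c2", "b1"), ("c3", "b1"), ("c3", "b2"), ("c3", "b3")]


def with_then_where(rows, wanted):
    # WITH p.name AS name, count(*) AS numBattles, collect(b.name) AS battles
    grouped = defaultdict(list)
    for name, battle in rows:
        grouped[name].append(battle)
    projected = [
        {"name": name, "numBattles": len(battles), "battles": battles}
        for name, battles in grouped.items()
    ]
    # From here on only name, numBattles and battles exist; p and b are out of scope.
    # WHERE numBattles = 2 filters the aggregated rows, like HAVING in SQL.
    return [row for row in projected if row["numBattles"] == wanted]


print(with_then_where(rows, 2))
```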
UNWIND takes a collection and turns it into rows … nice … but what
exactly does that mean?
// How many results come out?
WITH range(1,5) as acollection
UNWIND acollection as arow
WITH arow, acollection, apoc.coll.shuffle(acollection) as ashuffle
UNWIND ashuffle as anotherrow
RETURN arow, anotherrow, acollection,ashuffle;
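Spelled out in plain Python, the two UNWINDs behave like nested loops, so the answer to "how many results come out?" is 5 × 5 = 25. `random.sample` stands in for `apoc.coll.shuffle` here; that substitution is an assumption of the sketch.

```python
import random


def unwind_demo():
    acollection = list(range(1, 6))  # Cypher's range(1,5) includes both ends: [1, 2, 3, 4, 5]
    rows = []
    for arow in acollection:  # UNWIND acollection AS arow -> 5 rows
        # A fresh shuffled copy per row, standing in for apoc.coll.shuffle(acollection).
        ashuffle = random.sample(acollection, len(acollection))
        for anotherrow in ashuffle:  # UNWIND ashuffle AS anotherrow -> 5 rows per incoming row
            rows.append((arow, anotherrow, acollection, ashuffle))
    return rows


rows = unwind_demo()
print(len(rows))
```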
So a collect aggregates anything into a list (= collection) and an
UNWIND turns it back into rows … aaaaarrrgghhhh make it stop !!!
// aggressive lot, these
MATCH (a:House)-[:ATTACKER]->()
WITH collect(DISTINCT a) AS attackers
MATCH (d:House)-[:DEFENDER]->()
WITH attackers, collect(DISTINCT d) AS defenders
UNWIND apoc.coll.removeAll(attackers,defenders) AS houses
RETURN houses.name as Names;
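The last step of that query is a list difference. Here is a plain-Python sketch of what we take `apoc.coll.removeAll(first, second)` to do in this use case: keep the elements of the first list that are absent from the second. Check the APOC documentation for its exact handling of duplicates; the house names are made up.

```python
def remove_all(first, second):
    """Keep the elements of `first` that do not occur in `second`, preserving order.
    Intended to mirror apoc.coll.removeAll(first, second) for this use case."""
    excluded = set(second)
    return [item for item in first if item not in excluded]


# Hypothetical house names.
attackers = ["house A", "house B", "house C"]
defenders = ["house B", "house D"]

for name in remove_all(attackers, defenders):  # one row per element, like UNWIND ... AS houses
    print(name)
```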
To lead you into this list of modeling steps …
1. Define high-level domain requirements
2. Create sample data for modeling purposes
3. Define the questions for the domain
4. Identify entities
5. Identify connections between entities
6. Test the model against the questions
7. Test scalability
And I'm sure that is all very interesting and … yada yada …
So what we are going to do instead
is take a good look at that Game Of
Thrones model again and see what
does / does not make sense!
● Are you capturing all the information? For example … who is
actually fighting in these battles? And why do we only care about
commanders and kings?
● Instance modeling helps you to ask the questions … for
example (and I haven't seen a single episode of the whole thing
myself) … are you really telling me that all the battles are so
pitched/clear-cut that no house and not a single commander ever
changes sides during one of them?
● ...
● Why? Why is this not just a label (like Knight and King and Dead
… and a word on those too in the next bullet point)? Is this
adding anything to the questions we want to answer?
● Does status never change over time? In fact, that goes for
Knight and King and definitely Dead too. The model seems to
support a current situation for Person nodes only … but when
is this exactly? Having a time dimension for some things but
not for all can put you on the path to false conclusions!
● ...
I could keep this up for several hours (and then we'd call it a
Modeling training) … but here are the top tips …
● Work in a concrete, question-driven way. As the model is very flexible,
you can easily iterate and adapt as your understanding grows.
● Leverage instance modeling to show whether you've missed
information. Your instance model should show all cases.
● Be expressive in your relationship types. The only
touchstone is whether the business users understand them
and see their business reflected in them!
03   introduction to graph databases

More Related Content

PDF
05 neo4j gds graph catalog
PDF
Mateusz herych content search problem on android
PPT
Playing with d3.js
PDF
Better Web Clients with Mantle and AFNetworking
PDF
Gremlin 101.3 On Your FM Dial
PPT
Introduction to couch_db
PDF
University of arizona mobile matters - technology, a means to an end
PDF
Workers of the web - BrazilJS 2013
05 neo4j gds graph catalog
Mateusz herych content search problem on android
Playing with d3.js
Better Web Clients with Mantle and AFNetworking
Gremlin 101.3 On Your FM Dial
Introduction to couch_db
University of arizona mobile matters - technology, a means to an end
Workers of the web - BrazilJS 2013

What's hot (17)

PDF
Understanding our code with tests, schemas, and types
PPTX
Neo4j 20 minutes introduction
PPT
Schema design short
PDF
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/i
PDF
Neo4j for Ruby and Rails
PDF
Advanced Regular Expressions in .NET
PDF
Automatically Spotting Cross-language Relations
PDF
Scala bad practices, scala.io 2019
PPTX
ETL for Pros: Getting Data Into MongoDB
KEY
The Ruby/mongoDB ecosystem
PDF
The State of NoSQL
PDF
Open Problems in the Universal Graph Theory
PDF
Simple fuzzy name matching in elasticsearch paris meetup
PDF
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
PDF
The Dynamic Language is not Enough
PPTX
Data Governance with JSON Schema
PDF
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
Understanding our code with tests, schemas, and types
Neo4j 20 minutes introduction
Schema design short
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/i
Neo4j for Ruby and Rails
Advanced Regular Expressions in .NET
Automatically Spotting Cross-language Relations
Scala bad practices, scala.io 2019
ETL for Pros: Getting Data Into MongoDB
The Ruby/mongoDB ecosystem
The State of NoSQL
Open Problems in the Universal Graph Theory
Simple fuzzy name matching in elasticsearch paris meetup
Simple fuzzy Name Matching in Elasticsearch - Graham Morehead
The Dynamic Language is not Enough
Data Governance with JSON Schema
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
Ad

Similar to 03 introduction to graph databases (20)

PDF
Neo4j Introduction (Basics, Cypher, RDBMS to GRAPH)
PDF
Introduction to Graphs with Neo4j
PPTX
The Inside Scoop on Neo4j: Meet the Builders
PDF
Neo4j: Graph-like power
PDF
Neo4j Graph Data Science Training - June 9 & 10 - Slides #3
PDF
Getting started with Graph Databases & Neo4j
PDF
Training Week: Introduction to Neo4j
PDF
Neo4j Introduction - Game of Thrones
PDF
Neo4j Data Science Presentation
PPTX
Neo4j Training Introduction
PDF
Neo4J
PPTX
Intro to Cypher
PDF
Neo4j (Part 1)
PDF
Training Week: Introduction to Neo4j 2022
PPT
Hands on Training – Graph Database with Neo4j
PDF
Workshop Introduction to Neo4j
PDF
Intro to Cypher
PDF
Graph Database Using Neo4J
PPTX
Introduction to graph databases in term of neo4j
Neo4j Introduction (Basics, Cypher, RDBMS to GRAPH)
Introduction to Graphs with Neo4j
The Inside Scoop on Neo4j: Meet the Builders
Neo4j: Graph-like power
Neo4j Graph Data Science Training - June 9 & 10 - Slides #3
Getting started with Graph Databases & Neo4j
Training Week: Introduction to Neo4j
Neo4j Introduction - Game of Thrones
Neo4j Data Science Presentation
Neo4j Training Introduction
Neo4J
Intro to Cypher
Neo4j (Part 1)
Training Week: Introduction to Neo4j 2022
Hands on Training – Graph Database with Neo4j
Workshop Introduction to Neo4j
Intro to Cypher
Graph Database Using Neo4J
Introduction to graph databases in term of neo4j
Ad

More from Neo4j (20)

PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
PDF
Jin Foo - Prospa GraphSummit Sydney Presentation.pdf
PDF
GraphSummit Singapore Master Deck - May 20, 2025
PPTX
Graphs & GraphRAG - Essential Ingredients for GenAI
PPTX
Neo4j Knowledge for Customer Experience.pptx
PPTX
GraphTalk New Zealand - The Art of The Possible.pptx
PDF
Neo4j: The Art of the Possible with Graph
PDF
Smarter Knowledge Graphs For Public Sector
PDF
GraphRAG and Knowledge Graphs Exploring AI's Future
PDF
Matinée GenAI & GraphRAG Paris - Décembre 24
PDF
ANZ Presentation: GraphSummit Melbourne 2024
PDF
Google Cloud Presentation GraphSummit Melbourne 2024: Building Generative AI ...
PDF
Telstra Presentation GraphSummit Melbourne: Optimising Business Outcomes with...
PDF
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
PDF
Démonstration Digital Twin Building Wire Management
PDF
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
PDF
Démonstration Supply Chain - GraphTalk Paris
PDF
The Art of Possible - GraphTalk Paris Opening Session
PPTX
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
PDF
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Jin Foo - Prospa GraphSummit Sydney Presentation.pdf
GraphSummit Singapore Master Deck - May 20, 2025
Graphs & GraphRAG - Essential Ingredients for GenAI
Neo4j Knowledge for Customer Experience.pptx
GraphTalk New Zealand - The Art of The Possible.pptx
Neo4j: The Art of the Possible with Graph
Smarter Knowledge Graphs For Public Sector
GraphRAG and Knowledge Graphs Exploring AI's Future
Matinée GenAI & GraphRAG Paris - Décembre 24
ANZ Presentation: GraphSummit Melbourne 2024
Google Cloud Presentation GraphSummit Melbourne 2024: Building Generative AI ...
Telstra Presentation GraphSummit Melbourne: Optimising Business Outcomes with...
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Démonstration Digital Twin Building Wire Management
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Démonstration Supply Chain - GraphTalk Paris
The Art of Possible - GraphTalk Paris Opening Session
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...

Recently uploaded (20)

PPTX
Data_Analytics_and_PowerBI_Presentation.pptx
PPTX
Database Infoormation System (DBIS).pptx
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPTX
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPTX
1_Introduction to advance data techniques.pptx
PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PDF
.pdf is not working space design for the following data for the following dat...
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PPTX
Supervised vs unsupervised machine learning algorithms
PPT
Reliability_Chapter_ presentation 1221.5784
PDF
Business Analytics and business intelligence.pdf
Data_Analytics_and_PowerBI_Presentation.pptx
Database Infoormation System (DBIS).pptx
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
1_Introduction to advance data techniques.pptx
oil_refinery_comprehensive_20250804084928 (1).pptx
Introduction-to-Cloud-ComputingFinal.pptx
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
.pdf is not working space design for the following data for the following dat...
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
IB Computer Science - Internal Assessment.pptx
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
Supervised vs unsupervised machine learning algorithms
Reliability_Chapter_ presentation 1221.5784
Business Analytics and business intelligence.pdf

03 introduction to graph databases

  • 2. ● Components of a Property Graph ● Introducing Game of Thrones ● Introducing Cypher ● Introducing APOC ● Cypher strikes back ● Property Graph Modeling 101
  • 4. Yes indeed. While Graph Theory has been around since 1736 there are quite a few ways to implement it. Currently there are two that matter in the context of databases ● RDF Graph ● Property Graph A discussion on these is not within the scope of this training, Neo4j implements the Property Graph (and can be considered the inventor of said type of graph). If you want more information/a comparison, your instructor will be happy to provide that offline.
  • 5. Node - vertex, entity, object, thing
  • 6. Node - vertex, entity, object, thing Label - Class(ification) of a Node Alien Person
  • 7. Node - vertex, entity, object, thing Label - Class(ification) of a Node Relationship - edge, physical link between two nodes
  • 8. Node - vertex, entity, object, thing Label - Class(ification) of a Node Relationship - edge, physical link between two nodes Type - Class(ification) of a Relationship COMMANDS
  • 9. Node - vertex, entity, object, thing Label - Class(ification) of a Node Relationship - edge, physical link between two nodes Type - Class(ification) of a Relationship Property - Key Value Pair COMMANDS stardate: 41054.3
  • 10. Relational Graph Rows Nodes Joins Relationships Table names Labels Columns Properties
  • 11. Relational Graph Each column must have value (or null) Nodes with the same label are not required to have the same properties Joins are calculated at read time Relationships are created at write time A row belongs only to one table A node can have many (or no) labels Relational Graph
  • 13. A television series with the usual mix of ... Drugs Sex Violence Dragons
  • 14. Based on the book series A Song of Ice and Fire by George R. R. Martin, what is special about it is the sheer number of characters that have a big storyline (and - importantly - all interact with each other). In the movies its dozens (in the books it runs in the hundreds). Rather than put of audiences, this turned it into a hit. And it also made this one hell of a graph database usecase …
  • 15. One of the two databases we'll be using in this training (and that you should have ready to go) is a Game Of Thrones interaction database. So this is finally where you start doing stuff. And lets start with looking what model that database has.
  • 16. As explained in the preparation email for this training, you should now have ● a Neo4j instance running ● a gameofthrones database loaded in that instance ● a Neo4j Browser session connected to that instance I'll not be flipping between slides and code execution. You put me in a window now and follow along either in the browser guide or by cutting and pasting from a website ...
  • 17. So it's either … inside the Neo4j Browser you run :play http://neo4jguides.tomgeudens.io/gdsintro.html (note that this requires a neo4j.conf setting to whitelist the host) or you open a regular browser session too and go to https://bit.ly/neo4j-gds-intro and cut-and-paste the commands from there
  • 19. The syntax to show the model of a database is ... // Make sure you're in the correct database :use gameofthrones // Show the model CALL db.schema.visualization();
  • 21. That wasn't exactly what you got, was it? For me - as human - labels like Knight, Dead, King are obviously all extra labels, subclassing the Person label and thus I manually removed them for clarity. The database makes no such distinctions and does not subclass. All labels are separate. And that's why you got what you got. Note that Rome almost had a horse as consul at some point. Allegedly, true, but Neo4j could have handled that for real!
  • 24. ● A declarative query language ● Focus on the what, not the how ● Uses ASCII art to represent nodes and relationships ("`-''-/").___..--''"`-._ `6_ 6 ) `-. ( ).`-.__.`) (_Y_.)' ._ ) `._ `. ``-..-' _..`--'_..-_/ /--'_.' ((((.-'' ((((.' (((.-' It won't get this complex, I promise ...
  • 25. Diving straight in I'm afraid I have to start by taking away your precioussss SELECT. A Cypher query always starts with a pattern match and thus the logical keyword is MATCH // Find and return all nodes in the database MATCH (n) RETURN n;
  • 26. ● What you just did is actually a pretty bad idea. While the database has no issue with you asking for all the nodes, simple visualization tools (such as the Neo4j Browser) typically can't handle what comes back. ● If you want to think along the lines of what you know about SQL queries, you could see the MATCH as a preprocessing step and the RETURN as the select/projection.
  • 27. The trick to a good graph query (which I'll repeat ad nauseam) is to efficiently find your starting points. Using labels is a great way to do that. // Find and return all House nodes in the database MATCH (n:House) RETURN n;
  • 28. Since a node can have multiple labels, using multiple labels in the base syntax implies AND and returns nodes that have all the specified labels // Find and return all Person nodes in the database that also have the King and the Dead label MATCH (n:Person:King:Dead) RETURN n;
  • 29. ● How many (:Person:King:Dead) nodes did you get? ● Did you notice (for all queries so far) something strange in the results? ● 9 ● Did you ask for relationships in your query? So why then are there relationships on screen?
  • 30. The whole point of a graph database is to be able to traverse relationships (which in the case of graph native Neo4j means hopping pointers). So here we go ... // Which houses where present in which battles? MATCH (h:House)-[]->(b:Battle) RETURN h.name, b.name;
  • 31. The more specific you can make a query, the more efficient it - generally speaking - becomes. // Which houses attacked in which battles? MATCH (h:House)-[a:ATTACKER]->(b:Battle) RETURN h,a,b LIMIT 30;
  • 32. As a relationship has one and only one type, using multiple relationships in the base syntax implies OR. // Which houses attacked or defended in which battles? MATCH (h:House)-[ad:ATTACKER|DEFENDER]->(b:Battle) RETURN h,ad,b LIMIT 30;
  • 33. ● The LIMIT is imposed to make sure the browser doesn't blow up, the database is fine, thank you very much. ● A Cypher query can RETURN quite a few different things and it is your application's job to handle them ...
  • 35. I bet you saw that one coming ...
  • 36. ● Awesome Procedures On Cypher ● A library of user defined procedures (UDP) and functions (UDF) that extends the Cypher query language. ● In this particular case it is a supported library, that provides tons of convenience tools to Cypher. It's funny, because in The Matrix, Cypher brutally kills Apoc. In reality you could argue that Apoc totally saves Cypher!
  • 37. Stored procedures, are pieces of slightly enhanced/wrapped query syntax (PL/SQL comes to mind) and are - as the name implies - stored inside the database itself. APOC is a library of Java code that is deployed - as a jar file - to a folder together with the database software. Yes, of course you can also create such libraries of your own (many customers do) but forget about what you know about stored procedures. UDPs and UDFs are neither stored or created similarly!
  • 38. ● Returns one value ● If it does stuff on the database itself, this is always read-only ● Is used inline Try it … // Generate a UUID and return it and the versions of APOC and GDS RETURN apoc.create.uuid() as uuid, apoc.version() as apocversion, gds.version() as gdsversion;
  • 39. ● YIELDs a stream of results using a predefined signature ● If it does stuff on the database itself, this can be read or write ● Is CALLed Try it // Find out about the signature of a procedure CALL apoc.help("help") YIELD signature RETURN signature;
  • 40. Try it some more // Show which available procedures can load data into the database CALL apoc.help("load") YIELD type, name, text, signature, core RETURN type, name, text, signature, core;
  • 42. So in order to start doing Graph Data Science on the Neo4j database you're going to need a bit more advanced Cypher than we've seen so far ... ● WHERE clauses ● Aggregation stuff … COLLECT/UNWIND, COUNT, ... ● Intermediate projection with … WITH ● Result deduplication with … DISTINCT All of this was a prerequiste though … ;-)
  • 43. I find it always calms audiences down when - after brutally ripping away their SELECT - I can give (back) the WHERE clause … // So these are braces ... MATCH (h:House {name:"Darry"})-[d:DEFENDER]->(b:Battle) RETURN h,d,b; // And they are the equivalent of checking for equality MATCH (h:House)-[d:DEFENDER]->(b:Battle) WHERE h.name = "Darry" RETURN h,d,b;
  • 44. Between the braces you can only test for equality, but the WHERE clause comes with the full range of options ... MATCH (p:Person)-[:BELONGS_TO]->(h:House) WHERE p.death_year >= 300 AND p.death_year <= 1200 RETURN p.name, h.name; // The same range as a chained comparison MATCH (p:Person)-[:BELONGS_TO]->(h:House) WHERE 300 <= p.death_year <= 1200 RETURN p.name, h.name;
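A small plain-Python model of the two filtering styles, assuming made-up rows (names, houses and death years are invented, not taken from the dataset). It checks that a property map in the pattern behaves like an equality test, and that the `AND` form and the chained comparison select the same rows. A missing property is modelled as `None`, standing in for Cypher's `null`: a comparison with `null` is never true, so such a row is filtered out.

```python
# Made-up rows standing in for MATCH (p:Person)-[:BELONGS_TO]->(h:House);
# these are NOT values from the Game of Thrones dataset.
rows = [
    {"name": "A", "house": "Darry", "death_year": 299},
    {"name": "B", "house": "Darry", "death_year": 300},
    {"name": "C", "house": "Stark", "death_year": 1200},
    {"name": "D", "house": "Stark", "death_year": 1201},
    {"name": "E", "house": "Stark", "death_year": None},  # property missing (null)
]

def braces(rows, props):
    """Property map {key: value} in a pattern: only equality checks."""
    return [r for r in rows if all(r.get(k) == v for k, v in props.items())]

def where_and(rows, lo, hi):
    """WHERE p.death_year >= lo AND p.death_year <= hi."""
    # Comparing with null never yields true in Cypher, so those rows drop out
    return [r for r in rows
            if r["death_year"] is not None
            and r["death_year"] >= lo and r["death_year"] <= hi]

def where_chained(rows, lo, hi):
    """WHERE lo <= p.death_year <= hi (a chained comparison)."""
    return [r for r in rows
            if r["death_year"] is not None and lo <= r["death_year"] <= hi]

print([r["name"] for r in braces(rows, {"house": "Darry"})])
print([r["name"] for r in where_chained(rows, 300, 1200)])
```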
  • 45. It came as a bit of a surprise (to me at least) that there is indeed such a debate, one that has in fact been raging ever since aggregation became a thing in database query languages. Neo4j does IMPLICIT aggregation: there is NO GROUP BY. Try it // How many persons does each house have? MATCH (a:Person)-[:BELONGS_TO]->(h:House) RETURN h.name as Housename, count(*) as Household;
  • 46. An aggregation in a projection (RETURN or WITH) is implicitly grouped by everything in that projection that is not aggregated ... Read that again … and again. Now (and I'm not giving you syntax) tell me how big the household of Brotherhood Without Banners is ... 6. Check again if you didn't get that ...
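The grouping rule can be modelled in a few lines of Python. This is a sketch with invented (person, house) rows, not Neo4j's implementation: the non-aggregated expressions in the projection form the grouping key, so adding `a.name` next to `h.name` splits every household into groups of one, and leaving out all keys gives a single global count.

```python
from collections import Counter

# Made-up matches standing in for MATCH (a:Person)-[:BELONGS_TO]->(h:House);
# NOT the real dataset.
matches = [
    ("p1", "Stark"), ("p2", "Stark"), ("p3", "Stark"),
    ("p4", "Lannister"), ("p5", "Lannister"),
]
COLUMNS = ("a.name", "h.name")

def return_with_count(rows, *keys):
    """RETURN <keys>, count(*): the non-aggregated keys are the grouping key."""
    idx = [COLUMNS.index(k) for k in keys]
    return dict(Counter(tuple(row[i] for i in idx) for row in rows))

# RETURN h.name, count(*)         -> one group per house
print(return_with_count(matches, "h.name"))
# RETURN a.name, h.name, count(*) -> one group per (person, house)
print(return_with_count(matches, "a.name", "h.name"))
# RETURN count(*)                 -> no keys, one global group
print(return_with_count(matches))
```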
  • 47. There are quite a few aggregation functions, but the two you'll use most in this training are count and collect. Try it // Which commanders didn't learn the first time? MATCH (ac:Person)-[:ATTACKER_COMMANDER]->(b:Battle)<-[:DEFENDER_COMMANDER]-(dc:Person) RETURN ac.name, dc.name, count(b) AS commonBattles, collect(b.name) AS battlenames ORDER BY commonBattles DESC LIMIT 5;
  • 48. ● I guess explaining what a count does is a bit pointless (wait until after the deduplication to start laughing though). ● A collect aggregates detail into a list. ● ORDER BY … SKIP … LIMIT work exactly the same as you're used to … all three are needed if you want to do paging.
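Here is a plain-Python sketch of the commander query plus paging, using invented commander and battle names (not the real dataset). `count` and `collect` run per `(ac, dc)` group, and paging is a sort followed by a slice; the page size of 2 is an arbitrary choice. Without the `ORDER BY` step the slice boundaries would depend on whatever order the rows happen to arrive in, which is why you want all three clauses.

```python
from collections import defaultdict

# Made-up (attacker commander, defender commander, battle) matches; NOT real data.
matches = [
    ("ac1", "dc1", "battle A"), ("ac1", "dc1", "battle B"),
    ("ac2", "dc2", "battle C"),
    ("ac3", "dc1", "battle D"), ("ac3", "dc1", "battle E"), ("ac3", "dc1", "battle F"),
]

def count_and_collect(rows):
    """RETURN ac, dc, count(b) AS commonBattles, collect(b) AS battlenames."""
    groups = defaultdict(list)  # grouping key: the non-aggregated (ac, dc) pair
    for ac, dc, battle in rows:
        groups[(ac, dc)].append(battle)
    return [
        {"ac": ac, "dc": dc, "commonBattles": len(battles), "battlenames": battles}
        for (ac, dc), battles in groups.items()
    ]

def page(rows, skip, limit):
    """ORDER BY commonBattles DESC SKIP skip LIMIT limit."""
    ordered = sorted(rows, key=lambda r: r["commonBattles"], reverse=True)
    return ordered[skip:skip + limit]

result = count_and_collect(matches)
print([r["ac"] for r in page(result, skip=0, limit=2)])  # first page
print([r["ac"] for r in page(result, skip=2, limit=2)])  # second page
```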
  • 49. I can explain this to you, but I can't understand it for you. So try the following and answer the - relatively - simple question … How many times is Jon Snow on screen? // Just Jon, right? MATCH (p:Person {name: "Jon Snow"})-[:INTERACTS]->() RETURN p;
  • 50. I would say your eyes deceived you … allow the machine to count. // Counting Jon MATCH (p:Person {name: "Jon Snow"})-[:INTERACTS]->() RETURN count(p); // returns 89 // I am the only Jon MATCH (p:Person {name: "Jon Snow"})-[:INTERACTS]->() RETURN DISTINCT p;
  • 51. This is one of the biggest issues people have with Cypher queries. Cypher does pattern matching: every match returns a pattern that is unique as a whole, not in its parts. Jon Snow shows up in 89 unique interaction patterns. I made several attempts to add something to the previous line, but the point is made best if you understand exactly what that line (no more, no less) says. Please take a moment to do that, I'll wait ...
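A toy Python model of the Jon Snow result, with invented relationships: a match produces one row per distinct path, and two different `INTERACTS` relationships to the same person are two different paths. So `count(p)` counts rows (here 3), even though every row carries the same `p`, while `DISTINCT p` collapses them to one.

```python
# Made-up INTERACTS relationships as (relationship id, from, to); NOT real data.
# Two different relationships happen to connect Jon Snow to the same person.
interacts = [
    (1, "Jon Snow", "Friend A"),
    (2, "Jon Snow", "Friend B"),
    (3, "Jon Snow", "Friend A"),
    (4, "Friend B", "Friend C"),
]

def match_jon(rels):
    """MATCH (p {name: "Jon Snow"})-[r:INTERACTS]->(o): one row per distinct path."""
    return [{"p": src, "r": rid, "o": dst}
            for rid, src, dst in rels if src == "Jon Snow"]

rows = match_jon(interacts)

# count(p): counts the non-null p values, i.e. one per row
count_p = sum(1 for row in rows if row["p"] is not None)

# RETURN DISTINCT p: duplicates removed, first-seen order kept
distinct_p = list(dict.fromkeys(row["p"] for row in rows))

print(count_p, distinct_p)
```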
  • 52. WITH ● provides an intermediate projection with exactly the same functionality as a RETURN (but without ending the query) ● controls the visibility of variables beyond the WITH … what isn't projected is gone ● comes with an additional WHERE clause that is the equivalent of the HAVING clause in SQL ● is the main building block of Cypher pipeline queries
  • 53. It's only the second slide … remember that they kept that ruse going for almost 8 seasons … and you kept watching! // The point of the query is moot, but it shows what WITH can do ... MATCH (p:Person)-[:ATTACKER_COMMANDER]->(b:Battle) WITH p.name as name, count(*) AS numBattles, collect(b.name) as battles WHERE numBattles = 2 RETURN name, numBattles, battles;
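The same pipeline as a Python sketch, with invented commander and battle names. The first function plays the role of the `WITH` projection (only the projected names survive it), the second the `WHERE` that follows it, which filters on an aggregate the way `HAVING` does in SQL.

```python
from collections import defaultdict

# Made-up (commander, battle) matches for
# MATCH (p:Person)-[:ATTACKER_COMMANDER]->(b:Battle); NOT real data.
matches = [("c1", "b1"), ("c1", "b2"), ("c2", "b3"),
           ("c3", "b4"), ("c3", "b5"), ("c3", "b6")]

def with_projection(rows):
    """WITH p.name AS name, count(*) AS numBattles, collect(b.name) AS battles."""
    groups = defaultdict(list)
    for name, battle in rows:
        groups[name].append(battle)
    # Only the projected names survive: p and b are no longer visible after this
    return [{"name": n, "numBattles": len(bs), "battles": bs}
            for n, bs in groups.items()]

def where_after_with(rows, wanted):
    """WHERE numBattles = wanted: filters on an aggregate, like SQL's HAVING."""
    return [row for row in rows if row["numBattles"] == wanted]

print(where_after_with(with_projection(matches), 2))
```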
  • 56. UNWIND takes a collection and turns it into rows … nice … but what exactly does that mean? // How many results come out? WITH range(1,5) as acollection UNWIND acollection as arow WITH arow, acollection, apoc.coll.shuffle(acollection) as ashuffle UNWIND ashuffle as anotherrow RETURN arow, anotherrow, acollection,ashuffle;
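If you want to check your answer, this Python model replays the query row by row. `random.sample` stands in for `apoc.coll.shuffle` (assuming only that it returns the same elements in a random order). Each of the 5 rows from the first `UNWIND` is expanded into 5 rows by the second, so 25 rows come out, whatever the shuffle does.

```python
import random

def unwind_example():
    """Model of: WITH range(1,5) AS acollection UNWIND acollection AS arow
    WITH arow, acollection, apoc.coll.shuffle(acollection) AS ashuffle
    UNWIND ashuffle AS anotherrow RETURN ..."""
    acollection = list(range(1, 6))  # Cypher's range(1,5) includes both ends
    rows = []
    for arow in acollection:  # first UNWIND: one row per element (5 rows)
        # Stand-in for apoc.coll.shuffle: a fresh shuffle for every row
        ashuffle = random.sample(acollection, len(acollection))
        for anotherrow in ashuffle:  # second UNWIND: 5 rows per incoming row
            rows.append({"arow": arow, "anotherrow": anotherrow,
                         "acollection": acollection, "ashuffle": ashuffle})
    return rows

rows = unwind_example()
print(len(rows))
```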
  • 57. So a collect aggregates anything into a list (= collection) and an UNWIND turns it back into rows … aaaaarrrgghhhh make it stop !!! // aggressive lot, these MATCH (a:House)-[:ATTACKER]->() WITH collect(DISTINCT a) AS attackers MATCH (d:House)-[:DEFENDER]->() WITH attackers, collect(DISTINCT d) AS defenders UNWIND apoc.coll.removeAll(attackers,defenders) AS houses RETURN houses.name as Names;
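A plain-Python walk through the query, using invented house and battle names rather than the dataset. `collect_distinct` models `collect(DISTINCT ...)`, and `remove_all` follows the documented behaviour of `apoc.coll.removeAll` (return the first list with every element of the second list removed); the final list comprehension is the `UNWIND`. The model keeps first-seen order, whereas Cypher does not promise any particular order for these rows.

```python
# Made-up (house, battle) edges; NOT the real dataset.
attacker_edges = [("Lannister", "b1"), ("Lannister", "b2"),
                  ("Greyjoy", "b3"), ("Stark", "b4")]
defender_edges = [("Stark", "b1"), ("Tully", "b2"), ("Lannister", "b3")]

def collect_distinct(values):
    """collect(DISTINCT x): one list, duplicates removed, first-seen order."""
    return list(dict.fromkeys(values))

def remove_all(first, second):
    """Modelled on apoc.coll.removeAll: drop every element of second from first."""
    return [x for x in first if x not in second]

attackers = collect_distinct(h for h, _ in attacker_edges)
defenders = collect_distinct(h for h, _ in defender_edges)
houses = remove_all(attackers, defenders)

# UNWIND ... AS houses RETURN houses.name AS Names: one row per remaining house
rows = [{"Names": h} for h in houses]
print(rows)
```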
  • 59. To lead you into this list of modeling steps … 1. Define high-level domain requirements 2. Create sample data for modeling purposes 3. Define the questions for the domain 4. Identify entities 5. Identify connections between entities 6. Test the model against the questions 7. Test scalability And I'm sure that is all very interesting and … yada yada …
  • 60. So what we are going to do instead is take a good look at that Game of Thrones model again and see what does / does not make sense!
  • 63. ● Are you capturing all the information? For example … who is actually fighting in these battles? And why do we only care about commanders and kings? ● Instance modeling helps you to ask the questions … for example (and I haven't seen a single episode of the whole thing myself) … are you really telling me that all the battles are so pitched/clear-cut that no house and not a single commander ever changes side during one of them? ● ...
  • 65. ● Why? Why is this not just a label (like Knight and King and Dead … and a word on those too in the next bullet point)? Is this adding anything to the questions we want to answer? ● Does status never change over time? In fact, that goes for Knight and King and definitely Dead too. The model only seems to capture the current situation for Person nodes … but current as of when, exactly? Having a time dimension for some things but not for all can put you on the path to false conclusions! ● ...
  • 66. I can keep this up for several hours (and then we call that a Modeling training) … but here are the top tips … ● Work in a concrete, question-driven way. As the model is very flexible, you can easily iterate and adapt as your understanding grows. ● Leverage instance modeling to show whether you've missed information. Your instance model should show all cases. ● Be expressive in your relationship types. The only touchstone is whether the business users understand them and see their business reflected in them!