Find your way in Graph labyrinths
with SQL, SPARQL, and Gremlin
Who are we?
Daniel Camarda
daniel.camarda@gmail.com
https://github.com/mdread
Alfredo Serafini
seralf@gmail.com
https://github.com/seralf
It’s all about relations
for example: northwind DB ...on graph
SEE: http://sql2gremlin.com/
schema?
properties or
relations?
joins or edges?
SQL 1. - ER: tables for Entity and Relations
A table is really similar in practice to a flat CSV. But:
● It introduces types.
● Can be used to materialize important relations, not only entities, normalizing data (=avoiding
duplications)
● Can be fast to access using Indexes
● A logical Entity can be physically split into many different Tables after normalization.
● Relations are not explicit; they are:
○ materialized as properties/tables
○ expressed by constraints
○ retrieved by joins
ROW -> TUPLE!
SEE: Northwind schema
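The point that relations are materialized as columns and retrieved by joins can be sketched with Python's built-in sqlite3 module. The two tables below are an assumed, cut-down stand-in for the Northwind schema (only the columns used here), and the sample rows are for illustration only.

```python
import sqlite3

# In-memory database with an assumed, cut-down subset of the Northwind schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (
        CategoryID   INTEGER PRIMARY KEY,
        CategoryName TEXT NOT NULL
    );
    CREATE TABLE Products (
        ProductID   INTEGER PRIMARY KEY,
        ProductName TEXT NOT NULL,
        -- The relation is only a constrained column, not a first-class object.
        CategoryID  INTEGER REFERENCES Categories(CategoryID)
    );
    INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Condiments');
    INSERT INTO Products VALUES
        (1, 'Chai', 1), (2, 'Chang', 1), (3, 'Aniseed Syrup', 2);
""")

# The relation becomes visible only when a join reassembles it.
rows = conn.execute("""
    SELECT P.ProductName, C.CategoryName
    FROM Products AS P
    INNER JOIN Categories AS C ON C.CategoryID = P.CategoryID
    ORDER BY P.ProductID
""").fetchall()

for row in rows:
    print(row)  # every result row is a plain tuple
```

Each result row is a tuple, which is the ROW -> TUPLE idea in practice.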
RDF 1. - modeling
But tuples can be more “atomic”, if we think differently.
RDF (Resource Description Framework) introduces a conceptual data modeling approach inspired by
several best practices, including the well-known Dublin Core.
It plays a role similar to ER schemas (mostly used in relational DBs) or class diagrams (mostly used in
software design).
RDF is based upon describing resources, by making statements about them: both data and metadata
can be described this way (self-described).
Then we have TUPLEs -> TRIPLEs! (actually QUADs, at least)
subject -> predicate -> object (+ context!)
Thus it is a labeled, directed multigraph: a natural architecture for managing ontologies, and it
can also be managed more or less as a property graph.
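The step from tuples to triples (or quads) can be sketched in plain Python. This is only an illustration, not a real triplestore: the `nw:` identifiers, the `Quad` type, and the `match` helper are all hypothetical names chosen for this sketch.

```python
from typing import NamedTuple

class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str  # the named graph the statement belongs to

# One relational row becomes one statement per column.
# All identifiers below are hypothetical, in an assumed "nw:" namespace.
quads = [
    Quad("nw:product/1", "rdf:type", "nw:Product", "nw:graph"),
    Quad("nw:product/1", "nw:productName", "Chai", "nw:graph"),
    Quad("nw:product/1", "nw:has_category", "nw:category/1", "nw:graph"),
    Quad("nw:category/1", "nw:categoryName", "Beverages", "nw:graph"),
]

def match(s=None, p=None, o=None, c=None):
    """Return the quads matching a pattern; None acts as a wildcard."""
    pattern = (s, p, o, c)
    return [
        q for q in quads
        if all(want is None or want == got for want, got in zip(pattern, q))
    ]

# Everything that is stated about product 1.
for quad in match(s="nw:product/1"):
    print(quad)
```

Because every statement has the same shape, a single pattern-matching function is enough to query any of them.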
RDF 2. - schema
Did you say schema?
What is a Schema?
● A schema describes your model
● A schema can define constraints and data types on your model
● A schema provides a good abstraction on the raw data (to be handled manually)
What is the best language to describe schemas?
● XML: DTD is not XML, XSD is XML
● DDL is SQL, but dialects, data dictionaries, and schema changes differ between engines
● RDF can describe both data and metadata (schema)
○ Are we afraid of standards? Why? Are they too complex?
○ A schema must be maintained!
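That RDF can describe both data and metadata can be sketched as follows: schema statements and data statements sit in the same set and are read with the same lookup. The `rdf:`/`rdfs:` terms are standard RDF Schema vocabulary; the `nw:` names and the `subjects` helper are assumptions made for this illustration.

```python
triples = {
    # Schema (metadata): statements about classes and properties.
    ("nw:Product", "rdf:type", "rdfs:Class"),
    ("nw:productName", "rdf:type", "rdf:Property"),
    ("nw:productName", "rdfs:domain", "nw:Product"),
    # Data: statements about individual resources.
    ("nw:product/1", "rdf:type", "nw:Product"),
    ("nw:product/1", "nw:productName", "Chai"),
}

def subjects(predicate, obj):
    """All subjects that have the given predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}

# The same lookup works at both levels.
classes = subjects("rdf:type", "rdfs:Class")     # which classes does the model define?
instances = subjects("rdf:type", "nw:Product")   # which resources are products?

print(classes, instances)
```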
RDF 3. - a shared language for schemas
A standardized framework for the
description of models is only a shared
language!
1) No one is forced to adopt a specific
vocabulary: only a basic syntax is
shared among different domains.
2) However different domains can be
modeled sharing both schema and data
linking, creating a wider knowledge
graph.
examples: all kinds of linked data;
vocabularies such as GoodRelations,
schema.org, and so on
http://www.google.com/insidesearch/features/search/knowledge.html
https://www.freebase.com/
http://dbpedia.org/
RDF 4. - looking at an RDF vocabulary (schema)
What can one of those RDF vocabularies
look like?
For example FOAF (Friend Of A Friend)
vocabulary,
using the VOWL toolkit
http://vowl.visualdataweb.org/
SQL & gremlin - 1
SQL
SELECT CategoryName
FROM Categories
Gremlin
g.V('type','category').categoryName
SPARQL
SELECT ?categoryName
WHERE {
?category a nw:Category .
?category nw:categoryName ?categoryName .
}
SQL & gremlin - 2
SQL
SELECT *
FROM Products AS P
INNER JOIN Categories AS C
ON (C.CategoryID = P.CategoryID)
WHERE (C.CategoryName = 'Beverages')
SPARQL
SELECT *
FROM <http://northwind/graph>
WHERE {
?uri a nw:Product .
?uri nw:has_category ?category .
?category a nw:Category .
?category nw:categoryName 'Beverages' .
}
SELECT *
FROM <http://northwind/graph>
WHERE {
?uri a nw:Product .
?uri nw:has_category / nw:categoryName 'Beverages' .
}
Gremlin
g.V('categoryName','Beverages').in('inCategory').map()
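What the Gremlin traversal does instead of a join can be sketched with a tiny in-memory property graph. This is an illustration only, not how any graph engine is implemented: the vertex ids, the sample data, and the helpers `V` and `in_` are hypothetical and merely mimic the Gremlin steps of the same name.

```python
# Vertices: id -> properties (ids and sample data assumed for illustration).
vertices = {
    "c1": {"type": "category", "categoryName": "Beverages"},
    "c2": {"type": "category", "categoryName": "Condiments"},
    "p1": {"type": "product", "productName": "Chai"},
    "p2": {"type": "product", "productName": "Chang"},
    "p3": {"type": "product", "productName": "Aniseed Syrup"},
}

# Edges: (tail, label, head), i.e. product -inCategory-> category.
edges = [
    ("p1", "inCategory", "c1"),
    ("p2", "inCategory", "c1"),
    ("p3", "inCategory", "c2"),
]

def V(key, value):
    """Select vertices by property, like g.V(key, value)."""
    return [vid for vid, props in vertices.items() if props.get(key) == value]

def in_(vertex_ids, label):
    """Follow edges with the given label backwards, from head to tail."""
    return [
        tail
        for vid in vertex_ids
        for (tail, lbl, head) in edges
        if lbl == label and head == vid
    ]

# g.V('categoryName','Beverages').in('inCategory').map()
beverages = [vertices[v] for v in in_(V("categoryName", "Beverages"), "inCategory")]
print(beverages)
```

No key comparison is written in the query: the relation is an explicit edge, so the traversal just follows it.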
From table to graph: two strategies
1. RDF mapping, with the W3C standards R2RML (RDB to RDF Mapping Language) and DM (Direct
Mapping)
a. builds an RDF graph, and the mapping itself is also RDF (turtle)
b. triples can be mapped live from the relational engine, or materialized into a triplestore
2. Build your own graph model.
a. no need to learn a new language
b. no need to introduce external tools as dependencies
In both cases, a projection of the graph can be used to produce either a different graph or a table
schema
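The first strategy can be sketched as a function that turns one relational row into triples. It only loosely follows the idea behind Direct Mapping (one subject per row, one predicate per column, foreign keys as links to other subjects); the identifier scheme is simplified and is not the exact W3C one, and `row_to_triples` is a hypothetical helper.

```python
def row_to_triples(table, key_column, row, foreign_keys=None):
    """Map one row (a dict) to a list of (subject, predicate, object) triples.

    foreign_keys maps a column name to the table it references.
    """
    foreign_keys = foreign_keys or {}
    subject = f"{table}/{row[key_column]}"
    triples = [(subject, "rdf:type", table)]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject; NULLs yield no triple
        if column in foreign_keys:
            # A foreign key becomes an edge to another resource.
            target = f"{foreign_keys[column]}/{value}"
            triples.append((subject, f"{table}#ref-{column}", target))
        else:
            # Any other column becomes a literal-valued property.
            triples.append((subject, f"{table}#{column}", value))
    return triples

row = {"ProductID": 1, "ProductName": "Chai", "CategoryID": 1}
product_triples = row_to_triples(
    "Products", "ProductID", row, foreign_keys={"CategoryID": "Categories"}
)
for triple in product_triples:
    print(triple)
```

The same function could either feed a triplestore (materialized) or be called on demand over query results (live mapping).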
Example: Github graph
The idea
Search for repositories on GitHub and collect information about those repos, along with their
collaborators and library dependencies
Why?
GitHub has lots of interesting data; analyzing it can give us insight into how the open source
community is evolving. A graph is a natural way to represent this kind of deeply interconnected
community
How does it work?
TinkerPop is used on top of OrientDB, which is the backend graph engine. The data is retrieved by a
small Scala application
github schema
Graph visualized
generated with gephi https://gephi.org/
● an interactive tool for exploration
and analysis of graphs
● connect with external data sources
with the Stream plugin
● useful when thinking about your
queries
Node types shown in the figure: repository, dependency, user
Github data collected on Orient Graph:
https://github.com/randomknot/graph-labyrinth-demo
Gremlin is a query language built specifically for graph traversal
● easy to navigate relationships (edges)
● easy to filter
● start thinking about Paths, not Records
● Turing-complete language
● default implementation as a Groovy DSL
examples 1
All contributors of a repository
g.v("#11:192").in("contributes").login
Projects to which the contributors of this project also contribute
g.v("#11:192").in("contributes").out("contributes").dedup.name
Repositories with more than ten contributors
g.V("node_type", "Repository").filter{it.inE("contributes").count() > 10}.name
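The three traversals above can be mimicked over a plain edge list to make each step explicit. This is only a sketch: the logins, repository names, and helper functions are invented for illustration, and the threshold is lowered to fit the tiny sample.

```python
# One (contributor login, repository name) pair per "contributes" edge.
contributes = [
    ("alice", "repo-a"), ("bob", "repo-a"), ("carol", "repo-a"),
    ("alice", "repo-b"), ("dave", "repo-b"),
    ("bob", "repo-c"),
]

def contributors_of(repo):
    """in("contributes"): walk the edges backwards from a repository."""
    return [user for user, r in contributes if r == repo]

def related_projects(repo):
    """in("contributes").out("contributes").dedup: repos reached via shared users."""
    seen = []
    for user in contributors_of(repo):
        for u, r in contributes:
            if u == user and r not in seen:  # dedup while keeping order
                seen.append(r)
    return seen

def repos_with_more_than(n):
    """filter{it.inE("contributes").count() > n}: keep well-populated repos."""
    repos = dict.fromkeys(r for _, r in contributes)  # unique, in first-seen order
    return [r for r in repos if len(contributors_of(r)) > n]

print(contributors_of("repo-a"))
print(related_projects("repo-a"))
print(repos_with_more_than(2))
```

As in the Gremlin version, the second traversal also returns the starting repository, since its contributors obviously contribute to it.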
examples 2
common contributors of two projects
g.v('#11:47').in("contributes").as("x").out.retain([g.v('#11:57')]).back("x").login
users who work on projects, using a specific library
g.V("node_type", "Contributor").as("usr")
.out("contributes")
.out("depends")
.filter{it.artifact_id == "spring-social-web"}
.back("usr")
.login
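Both queries follow the same pattern: remember a starting point (`as`), walk further to test a condition, then go back (`back`) to what was remembered. In plain Python the same results can be sketched with sets; all names and sample data below are assumptions made for illustration.

```python
# repository -> set of contributor logins ("contributes" edges)
contributes = {
    "repo-a": {"alice", "bob", "carol"},
    "repo-b": {"alice", "bob", "dave"},
    "repo-c": {"erin"},
}

# repository -> set of artifact ids it depends on ("depends" edges)
depends = {
    "repo-a": {"spring-social-web", "junit"},
    "repo-b": {"junit"},
    "repo-c": {"spring-social-web"},
}

def common_contributors(repo_x, repo_y):
    """Contributors of repo_x who are retained because they also reach repo_y."""
    return contributes[repo_x] & contributes[repo_y]

def users_of_library(artifact_id):
    """Users who contribute to at least one repo depending on the artifact."""
    return {
        user
        for repo, users in contributes.items()
        if artifact_id in depends.get(repo, set())
        for user in users
    }

print(sorted(common_contributors("repo-a", "repo-b")))
print(sorted(users_of_library("spring-social-web")))
```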
How does Gremlin select nodes?
examples 3
five most used libraries
g.V("node_type", "Dependency").inE("depends").inV.groupCount{it.artifact_id}.cap.orderMap(T.decr)[0..4]
contributors of projects with more than ten contributors
g.V("node_type", "Repository").filter{it.inE("contributes").count() > 10}.in("contributes").login
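The "most used libraries" query is a group-count followed by a descending sort and a slice. A minimal sketch of that idea with Python's `collections.Counter`, over invented sample edges (and taking the top two rather than five to fit the sample):

```python
from collections import Counter

# One (repository, artifact id) pair per "depends" edge; data assumed for illustration.
depends = [
    ("repo-a", "junit"), ("repo-b", "junit"), ("repo-c", "junit"),
    ("repo-a", "spring-social-web"), ("repo-c", "spring-social-web"),
    ("repo-b", "guava"),
]

# groupCount{it.artifact_id}: one counter entry per library, one increment per edge.
usage = Counter(artifact for _, artifact in depends)

# orderMap(T.decr)[0..1]: sort by count, descending, and keep the first two.
top_libraries = usage.most_common(2)
print(top_libraries)
```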
The end
references
● Freebase knowledge base
https://www.freebase.com/
● Google Knowledge Graph
http://www.google.com/insidesearch/features/search/knowledge.html
● RDF
○ RDF primer
http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
○ VOWL
http://vowl.visualdataweb.org/
○ FOAF - Friend Of A Friend
http://www.foaf-project.org/
● dbeaver
http://dbeaver.jkiss.org/
references
● gremlin documentation
https://github.com/tinkerpop/gremlin/wiki
http://gremlindocs.com/
● sql2gremlin
http://sql2gremlin.com/
○ visualization: http://sql2gremlin.com/graph/
○ joins: http://sql2gremlin.com/#joining/inner-join
● gremlin examples
http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html
● SPARQL + gremlin
https://github.com/tinkerpop/gremlin/wiki/SPARQL-vs.-Gremlin
● using SPARQL with gephi to visualize co-authorship
http://data.linkededucation.org/linkedup/devtalk/?p=31
● mining github followers in tinkerpop (with R, github, neo4j)
http://patrick.wagstrom.net/weblog/2012/05/13/mining-github-followers-in-tinkerpop/
