Skolemising Blank Nodes while Preserving Isomorphism
Aidan Hogan – DCC, Universidad de Chile
WHY? BLANK NODES ARE GREAT!
When life gives you blank nodes …
Blank Nodes are glue!
Blank node names aren’t important …
(Isomorphic)
Blank nodes are common in real-world data …
Aidan Hogan, Marcelo Arenas, Alejandro Mallea and Axel Polleres
"Everything You Always Wanted to Know About Blank Nodes".
Journal of Web Semantics 27: pp. 42–69, 2014
BLANK NODES ENABLE SYNTAX SHORTCUTS
They represent implicit nodes in the graph
They help specify order, higher-arity relations, reification, etc., succinctly
They are common in real-world data
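To illustrate the syntax-shortcut point: a Turtle collection such as `ex:s ex:p ( "1" "2" "3" )` is shorthand for a chain of blank nodes linked by `rdf:first`/`rdf:rest` and terminated by `rdf:nil`. The sketch below shows that expansion, assuming triples are Python tuples of strings and blank nodes are strings prefixed with `_:`. The helper name `expand_list` and the `_:l0, _:l1, …` naming are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def expand_list(subject, predicate, items):
    """Expand `subject predicate ( items... )` into plain triples, using one
    blank node per list cell (hypothetical naming scheme _:l0, _:l1, ...)."""
    if not items:
        # The empty collection is just rdf:nil, no blank nodes needed.
        return [(subject, predicate, RDF + "nil")]
    cells = [f"_:l{i}" for i in range(len(items))]
    triples = [(subject, predicate, cells[0])]
    for i, (cell, item) in enumerate(zip(cells, items)):
        triples.append((cell, RDF + "first", item))
        # Each cell points to the next one; the last points to rdf:nil.
        rest = cells[i + 1] if i + 1 < len(cells) else RDF + "nil"
        triples.append((cell, RDF + "rest", rest))
    return triples

for triple in expand_list("ex:s", "ex:p", ['"1"', '"2"', '"3"']):
    print(triple)
```

Three list items become seven triples and three blank nodes, none of which the author had to name.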
BLANK NODES:
WHAT’S THE PROBLEM?
Are two RDF graphs isomorphic?
RDF ISOMORPHISM IS GI-COMPLETE
A general algorithm to see if two RDF graphs are the “same” will
(probably) not be tractable
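To make the cost concrete, here is a deliberately naive isomorphism test: a minimal sketch assuming a graph is a set of (subject, predicate, object) string tuples with blank nodes written `_:…`. It tries every bijection between the two graphs' blank nodes, so it does up to n! work for n blank nodes. It is not the approach proposed in this talk; the function names are hypothetical.

```python
from itertools import permutations

def bnodes(graph):
    """Sorted list of blank nodes (terms starting with '_:') in the graph."""
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})

def relabel(graph, mapping):
    """Rename blank nodes via mapping; IRIs and literals are left unchanged."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}

def isomorphic(g, h):
    """Naive RDF isomorphism: try every bijection between blank nodes."""
    bg, bh = bnodes(g), bnodes(h)
    if len(g) != len(h) or len(bg) != len(bh):
        return False
    target = set(h)
    for perm in permutations(bh):
        if relabel(g, dict(zip(bg, perm))) == target:
            return True
    return False

g = {("_:a", "ex:knows", "_:b"), ("_:b", "ex:name", '"Bob"')}
h = {("_:x", "ex:knows", "_:y"), ("_:y", "ex:name", '"Bob"')}
print(isomorphic(g, h))
```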
BLANK NODES ADD COMPLEXITY?
WHAT TO DO?
RDF 1.1 proposes Skolemisation
But fresh IRIs every time is not ideal
Would prefer a “consistent” labelling
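A minimal sketch of Skolemisation with fresh IRIs shows why this is not ideal. It mints a random IRI per blank node under the `/.well-known/genid/` path that RDF 1.1 suggests for Skolem IRIs; the authority `http://example.org` and the function name are hypothetical. Skolemising the same graph twice yields two different graphs, which is exactly what a consistent labelling would avoid.

```python
import uuid

def skolemise_fresh(graph, authority="http://example.org"):
    """Replace every blank node by a fresh random IRI (hypothetical authority).
    Within one call, the same blank node always gets the same IRI."""
    mapping = {}

    def sk(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
            return mapping[term]
        return term

    return {tuple(sk(t) for t in triple) for triple in graph}

g = {("_:a", "ex:p", "_:b"), ("_:b", "ex:q", "_:a")}
print(skolemise_fresh(g) == skolemise_fresh(g))  # different IRIs on each run
```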
Compute isomorphically-unique graph hash
Finding duplicate documents from a crawler
CANONICAL LABELLING USEFUL FOR:
1. Mapping blank nodes to IRIs
2. Computing unique hashes for RDF graphs
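Once blank nodes carry canonical labels, both uses become simple. The sketch below assumes the input graph is already canonically labelled (blank nodes as `_:…` strings, other terms already serialised N-Triples-style) and shows (1) turning those labels into IRIs under a hypothetical base, and (2) hashing the sorted triples with SHA-256. Without canonical labels, isomorphic graphs would hash differently, as the last line shows.

```python
import hashlib

def skolemise(graph, base="http://example.org/.well-known/genid/"):
    """Map canonically labelled blank nodes _:x to IRIs base + x (hypothetical base)."""
    return {tuple(base + t[2:] if t.startswith("_:") else t for t in tr) for tr in graph}

def graph_hash(graph):
    """SHA-256 over the sorted triples, so triple order does not matter.
    Only isomorphism-invariant if blank-node labels are already canonical."""
    lines = sorted(" ".join(tr) + " ." for tr in graph)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

t1 = ("_:c1", "ex:p", '"x"')
t2 = ("ex:s", "ex:q", "_:c1")
print(graph_hash([t1, t2]) == graph_hash([t2, t1]))                      # order-independent
print(graph_hash([("_:a", "ex:p", "ex:o")]) == graph_hash([("_:b", "ex:p", "ex:o")]))  # label-dependent
```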
OLD BUT RECURRING QUESTION
An old question that won’t go away …
Jeremy J. Carroll. “Signing RDF Graphs.” ISWC 2003.
Edzard Höfig, Ina Schieferdecker. “Hashing of RDF Graphs
and a Solution to the Blank Node Problem.” URSW 2014.
NO EXISTING APPROACH IS GENERAL
• Hard cases seem unlikely in practice
• Let’s build a general (and thus worst-case exponential) algorithm that’s efficient for practical cases
NAÏVE CANONICAL LABELLING SCHEME
(Naïve) Canonical labels for blank nodes
But wait … what happens if ... ?
Or another case …
Fixpoint does not distinguish all blank nodes!
NAÏVE: COLOUR BLANK NODES RECURSIVELY
UNTIL FIXPOINT
• Efficient
• Incomplete
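The naïve scheme can be sketched as iterative colour refinement. This is a minimal Python version loosely modelled on the slides, not the paper's exact hashing rules: every blank node starts with the same colour; each round hashes a node's colour together with the sorted bag of (direction, predicate, neighbour colour) for its edges; refinement stops when the number of colour classes no longer grows. Graph representation and function names are assumptions.

```python
import hashlib

def h(*parts):
    """Deterministic hash of a sequence of strings (SHA-256 hex digest)."""
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def is_bnode(term):
    return term.startswith("_:")

def refine(graph, colour):
    """One round: new colour = hash(old colour, sorted edge signatures)."""
    def c(term):
        return colour[term] if is_bnode(term) else term
    signature = {b: [] for b in colour}
    for s, p, o in graph:
        if is_bnode(s):
            signature[s].append(h("+", p, c(o)))  # outgoing edge
        if is_bnode(o):
            signature[o].append(h("-", p, c(s)))  # incoming edge
    return {b: h(colour[b], *sorted(signature[b])) for b in colour}

def colour_to_fixpoint(graph):
    """Start with one shared colour; refine until the partition is stable."""
    bnodes = {t for s, _, o in graph for t in (s, o) if is_bnode(t)}
    colour = {b: h("") for b in bnodes}
    while True:
        new = refine(graph, colour)
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

sym = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:a")}
col = colour_to_fixpoint(sym)
print(col["_:a"] == col["_:b"])  # symmetric nodes are never told apart
```

The refinement is cheap, but blank nodes that are symmetric (or merely look alike to refinement) keep the same colour, which is the incompleteness noted above.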
CANONICAL LABELLING SCHEME:
ALWAYS DISTINGUISH ALL BLANK NODES
Brendan D. McKay. "Practical graph isomorphism". Congressus Numerantium 30: pp. 45–87, 1981.
Start with a (non-distinguished) colouring …
Let’s distinguish a node …
Colouring is no longer a fixpoint!
Rerun colouring to fixpoint
Fixpoint reached: still not finished!
So again let’s distinguish another …
… and rerun colouring to fixpoint
Now all blank nodes are distinguished!
Blank node labels computed from colour
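The distinguish-and-rerun loop can be sketched as follows, reusing a hash-based colour refinement like the one sketched for the naïve scheme. While some colour is shared by several blank nodes, one of them is given a fresh colour and refinement is rerun to a fixpoint; labels are then taken from (a prefix of) the final colours. The choice of which node to distinguish here is arbitrary (first by name), so on its own this does not yet give canonical labels. All names are hypothetical.

```python
import hashlib

def h(*parts):
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def is_bnode(term):
    return term.startswith("_:")

def refine(graph, colour):
    def c(term):
        return colour[term] if is_bnode(term) else term
    signature = {b: [] for b in colour}
    for s, p, o in graph:
        if is_bnode(s):
            signature[s].append(h("+", p, c(o)))
        if is_bnode(o):
            signature[o].append(h("-", p, c(s)))
    return {b: h(colour[b], *sorted(signature[b])) for b in colour}

def to_fixpoint(graph, colour):
    """Refine until the number of colour classes stops growing."""
    while True:
        new = refine(graph, colour)
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def label_greedily(graph):
    """Distinguish one node at a time until all blank nodes have distinct colours."""
    bnodes = sorted({t for s, _, o in graph for t in (s, o) if is_bnode(t)})
    colour = to_fixpoint(graph, {b: h("") for b in bnodes})
    while len(set(colour.values())) < len(colour):
        classes = {}
        for b in bnodes:
            classes.setdefault(colour[b], []).append(b)
        shared = min(c for c, members in classes.items() if len(members) > 1)
        chosen = classes[shared][0]  # ARBITRARY choice among equally coloured nodes
        colour = dict(colour)
        colour[chosen] = h(colour[chosen], "@")  # fresh colour for the chosen node
        colour = to_fixpoint(graph, colour)
    return {b: "_:" + colour[b][:16] for b in bnodes}  # labels from colours

cycle = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:c"), ("_:c", "ex:p", "_:a")}
print(label_greedily(cycle))
```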
Let’s go back: first, why pick _:a and _:c?
Okay so: why _:a …
Adapt ideas from the Nauty algorithm
(for standard graph isomorphism)
Check all leaves for the minimum graph
What happened?
Automorphisms cause repetitions
CORE ALGORITHM: FIND MINIMAL GRAPH
FOLLOWING FIXED COLOURING RULES
• Complete
• Efficient for many cases?
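The core algorithm can be sketched as a search tree: at each node, pick a colour class with several members (chosen by colour alone, so the choice is canonical), branch on every member, rerun refinement, and at each leaf relabel the graph from the colours; the canonical form is the minimum leaf graph. This is a simplified Python sketch in the spirit of the Nauty-style approach, not the paper's implementation; its cell-selection rule is an assumption. On a symmetric 3-cycle, all three leaves give the same graph, illustrating how automorphisms cause repetitions.

```python
import hashlib

def h(*parts):
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def is_bnode(term):
    return term.startswith("_:")

def refine(graph, colour):
    def c(term):
        return colour[term] if is_bnode(term) else term
    signature = {b: [] for b in colour}
    for s, p, o in graph:
        if is_bnode(s):
            signature[s].append(h("+", p, c(o)))
        if is_bnode(o):
            signature[o].append(h("-", p, c(s)))
    return {b: h(colour[b], *sorted(signature[b])) for b in colour}

def to_fixpoint(graph, colour):
    while True:
        new = refine(graph, colour)
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def leaves(graph, colour):
    """Yield the fully distinguished colouring at each leaf of the search tree."""
    classes = {}
    for b, c in colour.items():
        classes.setdefault(c, []).append(b)
    shared = [c for c, members in classes.items() if len(members) > 1]
    if not shared:
        yield colour
        return
    target = min(shared)  # picked by colour alone, so branching is canonical
    for b in classes[target]:
        child = dict(colour)
        child[b] = h(child[b], "@")
        yield from leaves(graph, to_fixpoint(graph, child))

def canonical_form(graph):
    """Minimum relabelled graph over all leaves (labels derived from colours)."""
    bnodes = {t for s, _, o in graph for t in (s, o) if is_bnode(t)}
    start = to_fixpoint(graph, {b: h("") for b in bnodes})
    best = None
    for leaf in leaves(graph, start):
        labelled = sorted(tuple("_:" + leaf[t][:16] if is_bnode(t) else t
                                for t in triple) for triple in graph)
        if best is None or labelled < best:
            best = labelled
    return best

g1 = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:c"), ("_:c", "ex:p", "_:a")}
g2 = {("_:x", "ex:p", "_:z"), ("_:z", "ex:p", "_:y"), ("_:y", "ex:p", "_:x")}
print(canonical_form(g1) == canonical_form(g2))
```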
OKAY … SO WHAT HASHING TO USE?
What about hash collisions?
128-bit: MD5, Murmur3_128
160-bit: SHA-1
HASHING MAY LEAD TO COLLISIONS
• The scheme does not depend on which hash function you use
• 128 bits is the shortest hash length with an acceptable collision probability
• For cryptographic use-cases, SHA-256 or better might be needed
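The collision argument can be checked with the standard birthday approximation: for n uniformly random b-bit hashes, the chance of at least one collision is about 1 − exp(−n(n−1)/2^(b+1)). The sketch below assumes a hypothetical collection of one billion graphs; `math.expm1` keeps the tiny 128-bit result accurate.

```python
import math

def collision_probability(n, bits):
    """Birthday-bound approximation: P(at least one collision among n hashes)."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

n = 10**9  # hypothetical number of graphs
print(collision_probability(n, 64))   # a few percent
print(collision_probability(n, 128))  # around 10^-21
```

At 64 bits a billion graphs already give a noticeable collision risk, while 128 bits push it to negligible levels; this is consistent with 128 bits being the shortest acceptable length.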
EVALUATION
Evaluation: Real-world Graphs
Evaluation: Nasty Synthetic Graphs
CONCLUSIONS
In loving memory of
Linked Data
2007–2012
Survived by its research community
_:b
1999–2015
Conclusions
Aside: Why GI-Hard?
(Can Encode Graph Isomorphism as RDF Isomorphism)
if and only if
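One way to encode graph isomorphism as RDF isomorphism, sketched under assumptions (the predicate and class IRIs are hypothetical and may differ from the slide's figure): each vertex becomes a blank node typed as a vertex (so isolated vertices survive), and each undirected edge {u, v} becomes two triples, one per direction. Two graphs are then isomorphic if and only if their RDF encodings are.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def graph_to_rdf(vertices, edges,
                 pred="http://example.org/adj",      # hypothetical predicate
                 vtype="http://example.org/Vertex"): # hypothetical class
    """Encode an undirected graph as an RDF graph whose only blank nodes are vertices."""
    triples = {(f"_:v{v}", RDF_TYPE, vtype) for v in vertices}
    for u, v in edges:
        # Both directions, so the RDF graph carries no edge orientation.
        triples.add((f"_:v{u}", pred, f"_:v{v}"))
        triples.add((f"_:v{v}", pred, f"_:v{u}"))
    return triples

for t in sorted(graph_to_rdf([1, 2, 3], [(1, 2), (2, 3)])):
    print(t)
```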
Aside: Why GI-Complete?
(Can we encode RDF isomorphism as graph isomorphism?)
if and only if
?
Aside: Why GI-Complete?
(Yes: We can encode RDF isomorphism as graph isomorphism)
if and only if
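For the other direction, one possible encoding (not necessarily the construction on the slides) maps an RDF graph to a vertex-coloured undirected graph, a setting that is known to reduce to plain graph isomorphism. Each term gets a vertex (all blank nodes share the colour `bnode`; IRIs and literals are coloured by their own value), each triple gets a vertex, and three intermediate vertices coloured `s`, `p`, `o` record each term's role. Vertex ids and colour strings are hypothetical.

```python
def rdf_to_coloured_graph(triples):
    """Return (colour, edges): colour maps vertex id -> colour string,
    edges is a set of undirected edges (frozensets of two vertex ids)."""
    colour, edges = {}, set()
    for i, (s, p, o) in enumerate(sorted(triples)):
        t = ("triple", i)
        colour[t] = "triple"
        for role, term in (("s", s), ("p", p), ("o", o)):
            r = ("role", i, role)       # gadget vertex recording the position
            colour[r] = role
            node = ("term", term)
            # Blank node names are erased; constants keep their identity.
            colour[node] = "bnode" if term.startswith("_:") else "const:" + term
            edges.add(frozenset((t, r)))
            edges.add(frozenset((r, node)))
    return colour, edges

colour, edges = rdf_to_coloured_graph({("_:a", "ex:p", "ex:o")})
print(len(colour), len(edges))
```

Colour-preserving isomorphisms of the result can only permute the `bnode` vertices, which corresponds to renaming blank nodes in the RDF graph.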
COMPLETE CANONICAL LABELLING SCHEME
A complete canonical labelling?
Find a canonical labelling for H
Choose the lowest possible graph
COMPLETE: FIND MINIMUM POSSIBLE
GRAPH USING FIXED BLANK NODE LABELS
• Complete
• Inefficient
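The complete but inefficient scheme can be sketched directly: try every assignment of the fixed labels `_:b0, _:b1, …` to the blank nodes and keep the lexicographically lowest sorted graph. Since the minimum is taken over all labellings, isomorphic graphs get the same result; the cost is n! relabellings. The label format and function name are assumptions.

```python
from itertools import permutations

def brute_force_canonical(graph):
    """Lowest sorted graph over all assignments of fixed labels to blank nodes."""
    bnodes = sorted({t for tr in graph for t in tr if t.startswith("_:")})
    labels = [f"_:b{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(labels):  # n! candidate labellings
        mapping = dict(zip(bnodes, perm))
        candidate = sorted(tuple(mapping.get(t, t) for t in tr) for tr in graph)
        if best is None or candidate < best:
            best = candidate
    return best

g1 = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:c"), ("_:c", "ex:p", "_:a")}
g2 = {("_:x", "ex:p", "_:z"), ("_:z", "ex:p", "_:y"), ("_:y", "ex:p", "_:x")}
print(brute_force_canonical(g1) == brute_force_canonical(g2))
```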
The need for a graph-level hash
OPTIMISATION: PRUNE THE TREE USING
AUTOMORPHISMS
Trim the search tree
using “found” automorphisms
Found Automorphisms …
PRUNING PER AUTOMORPHISMS AVOIDS
SYMMETRIC REPETITIONS
• Automorphisms are found naturally
• Makes very “regular” structures (like cliques) a lot easier
• Need to be careful how to manage the automorphism group
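A simplified sketch of automorphism pruning, under stated assumptions: it works on the plain labelling tree of the complete scheme (not the colour-refinement tree), and it uses each found automorphism individually rather than maintaining the whole automorphism group as tools like nauty do, so it prunes less. When two leaves produce the same graph, σ = λ₂⁻¹ ∘ λ₁ is an automorphism; a sibling x is skipped if some found σ fixes the current prefix and maps an already explored sibling onto x, since its subtree can only repeat graphs already seen. Names are hypothetical.

```python
def canonical_with_pruning(graph):
    """Minimum labelled graph, pruning branches that found automorphisms make redundant.
    Returns (minimum graph, number of leaves visited, automorphisms found)."""
    bnodes = sorted({t for s, _, o in graph for t in (s, o) if t.startswith("_:")})
    seen = {}    # labelled graph -> first labelling that produced it
    autos = []   # found automorphisms, each a dict blank node -> blank node
    best, visited = None, 0

    def apply(labelling):
        return tuple(sorted(tuple(labelling.get(t, t) for t in tr) for tr in graph))

    def search(prefix):
        nonlocal best, visited
        if len(prefix) == len(bnodes):
            labelling = {b: f"_:b{i}" for i, b in enumerate(prefix)}
            g = apply(labelling)
            visited += 1
            if g in seen:
                # Same graph from two labellings: sigma = labelling^-1 . first.
                inverse = {v: k for k, v in labelling.items()}
                first = seen[g]
                autos.append({b: inverse[first[b]] for b in bnodes})
            else:
                seen[g] = labelling
            if best is None or g < best:
                best = g
            return
        explored = []
        for x in bnodes:
            if x in prefix:
                continue
            # Skip x if a found automorphism fixes the prefix pointwise and
            # maps an explored sibling onto x: its subtree mirrors that sibling's.
            if any(all(a[c] == c for c in prefix) and any(a[y] == x for y in explored)
                   for a in autos):
                continue
            explored.append(x)
            search(prefix + [x])

    search([])
    return list(best), visited, autos

nodes = ("_:a", "_:b", "_:c")
clique = {(x, "ex:p", y) for x in nodes for y in nodes if x != y}
form, n_leaves, autos = canonical_with_pruning(clique)
print(n_leaves, len(autos))
```

On this 3-clique all six unpruned leaves give the same graph; the sketch visits four of them, and keeping the full automorphism group could prune further.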
