Skolemising Blank Nodes while Preserving Isomorphism

Aidan Hogan – DCC, Universidad de Chile

When life gives you blank nodes …

Blank node names aren’t important …
(Isomorphic)

Blank nodes are common in real-world data …
Aidan Hogan, Marcelo Arenas, Alejandro Mallea and Axel Polleres
"Everything You Always Wanted to Know About Blank Nodes".
Journal of Web Semantics 27: pp. 42–69, 2014

BLANK NODES ENABLE SYNTAX SHORTCUTS
They represent implicit nodes in the graph
They help specify order, higher-arity relations, reification, etc., succinctly
They are common in real-world data

BLANK NODES:
WHAT’S THE PROBLEM?

Are two RDF graphs isomorphic?

RDF ISOMORPHISM IS GI-COMPLETE
A general algorithm to see if two RDF graphs are the “same” will
(probably) not be tractable

BLANK NODES ADD COMPLEXITY?
WHAT TO DO?

RDF 1.1 proposes Skolemisation

But minting fresh IRIs every time is not ideal
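
To make the issue concrete, here is a minimal sketch of Skolemisation with fresh IRIs. It assumes triples are plain Python tuples of strings and that blank nodes are strings starting with `_:`; the `example.org` base and the helper names are illustrative only (the `/.well-known/genid/` path is the one suggested by RDF 1.1).

```python
import uuid

def is_blank(term):
    # Assumption: blank nodes are written as strings beginning with "_:".
    return term.startswith("_:")

def skolemise(triples, base="http://example.org/.well-known/genid/"):
    """Replace each blank node with a freshly minted IRI (hypothetical helper)."""
    mapping = {}

    def sk(term):
        if not is_blank(term):
            return term
        if term not in mapping:
            # A new random IRI per blank node, per call.
            mapping[term] = base + uuid.uuid4().hex
        return mapping[term]

    return {(sk(s), p, sk(o)) for s, p, o in triples}

graph = {("_:a", "http://example.org/knows", "_:b")}
first = skolemise(graph)
second = skolemise(graph)
print(first == second)  # the same input graph yields different output graphs
```

Because the IRIs are random, Skolemising the same (or an isomorphic) graph twice gives two graphs that no longer look alike, which is what a consistent labelling would avoid.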

Would prefer a “consistent” labelling

Compute isomorphically-unique graph hash

Finding duplicate documents from a crawler

CANONICAL LABELLING USEFUL FOR:
1. Mapping blank nodes to IRIs
2. Computing unique hashes for RDF graphs
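
A small sketch of why use case 2 depends on use case 1. It assumes triples are tuples of strings and uses a hypothetical `graph_hash` helper that hashes the sorted triples with SHA-256 (an arbitrary choice). The hash is independent of triple order, but it still changes when blank nodes are renamed.

```python
import hashlib

def graph_hash(triples):
    """Order-independent hash of a set of triples (illustrative helper)."""
    lines = sorted(" ".join(t) + " ." for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

P = "http://example.org/p"
g1 = [("_:a", P, "_:b"), ("_:b", P, "http://example.org/x")]
g1_shuffled = list(reversed(g1))
# Same graph up to isomorphism, but with different blank node names.
g2 = [("_:y", P, "_:z"), ("_:z", P, "http://example.org/x")]

print(graph_hash(g1) == graph_hash(g1_shuffled))  # triple order does not matter
print(graph_hash(g1) == graph_hash(g2))           # blank node names do matter
```

Hashing only gives the same value for isomorphic graphs once the blank nodes have been given canonical labels first.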

An old question that won’t go away …
Jeremy J. Carroll. “Signing RDF Graphs.” ISWC 2003.
Edzard Höfig, Ina Schieferdecker. “Hashing of RDF Graphs
and a Solution to the Blank Node Problem.” URSW 2014.

NO EXISTING APPROACH IS GENERAL
• Hard cases seem unlikely in practice
• Let’s build a general (and thus worst-case exponential) algorithm
that’s efficient for practical cases

NAÏVE CANONICAL LABELLING SCHEME

(Naïve) Canonical labels for blank nodes

But wait … what happens if ... ?

Fixpoint does not distinguish all blank nodes!

NAÏVE: COLOUR BLANK NODES RECURSIVELY
UNTIL FIXPOINT
• Efficient
• Incomplete
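
The naïve scheme can be sketched as follows. This is an illustration under assumptions, not the exact procedure from the slides: triples are tuples of strings, blank nodes start with `_:`, colours are MD5 hex strings, and `h` and `refine` are hypothetical helper names.

```python
import hashlib

def h(*parts):
    """Combine strings into one hex colour (MD5 chosen arbitrarily)."""
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()

def is_blank(term):
    return term.startswith("_:")

def refine(triples, colour=None):
    """Recolour blank nodes from their neighbourhoods until the partition is stable."""
    blanks = {t for s, _, o in triples for t in (s, o) if is_blank(t)}
    colour = dict(colour) if colour else {b: h("blank") for b in blanks}
    while True:
        def col(t):
            # Ground terms keep a fixed colour; blank nodes use the current one.
            return colour[t] if is_blank(t) else h("ground", t)
        new = {}
        for b in blanks:
            parts = []
            for s, p, o in triples:
                if s == b:
                    parts.append(h("out", p, col(o)))
                if o == b:
                    parts.append(h("in", p, col(s)))
            new[b] = h(colour[b], *sorted(parts))
        # Each round can only split colour classes, so an equal count means a fixpoint.
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

P = "http://example.org/p"
path = {("_:a", P, "_:b"), ("_:b", P, "_:c")}
cycle = {("_:a", P, "_:b"), ("_:b", P, "_:c"), ("_:c", P, "_:a")}
print(len(set(refine(path).values())))   # number of distinct colours on the path
print(len(set(refine(cycle).values())))  # number of distinct colours on the cycle
```

On the path every blank node ends up with its own colour, while on the directed cycle all three blank nodes look alike and share one colour: the fixpoint is cheap to compute but does not always distinguish all blank nodes.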

CANONICAL LABELLING SCHEME:
ALWAYS DISTINGUISH ALL BLANK NODES
Brendan D. McKay. "Practical graph isomorphism". Congressus Numerantium 30: pp. 45–87, 1981.

Start with a (non-distinguished) colouring …

Let’s distinguish a node …

Colouring is no longer a fixpoint!

Fixpoint reached: still not finished!

So again let’s distinguish another …

… and rerun colouring to fixpoint

Now all blank nodes are distinguished!

Blank node labels computed from colour

Let’s go back: first, why pick _:a and _:c?

Adapt ideas from the Nauty algorithm
(for standard graph isomorphism)

Check all leaves for minimum graph

Automorphisms cause repetitions

CORE ALGORITHM: FIND MINIMAL GRAPH
FOLLOWING FIXED COLOURING RULES
• Complete
• Efficient for many cases?
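
The distinguish-and-refine search can be sketched as below. This is a simplified illustration under assumptions (tuples of strings as triples, MD5 colours, hypothetical helper names, the smallest tied colour class chosen at each step) and it omits automorphism pruning, so it explores every leaf.

```python
import hashlib

def h(*parts):
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()

def is_blank(term):
    return term.startswith("_:")

def refine(triples, colour):
    """Recolour blank nodes from their neighbourhoods until the partition is stable."""
    while True:
        def col(t):
            return colour[t] if is_blank(t) else h("ground", t)
        new = {}
        for b in colour:
            parts = []
            for s, p, o in triples:
                if s == b:
                    parts.append(h("out", p, col(o)))
                if o == b:
                    parts.append(h("in", p, col(s)))
            new[b] = h(colour[b], *sorted(parts))
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def canonical(triples):
    """Return the minimum relabelled graph over all distinguishing choices."""
    triples = set(triples)
    blanks = {t for s, _, o in triples for t in (s, o) if is_blank(t)}
    best = None

    def search(colour):
        nonlocal best
        classes = {}
        for b, c in colour.items():
            classes.setdefault(c, []).append(b)
        ties = sorted((len(bs), c) for c, bs in classes.items() if len(bs) > 1)
        if not ties:
            # Leaf: every blank node has a unique colour, so use it as the label.
            def lab(t):
                return "_:" + colour[t] if is_blank(t) else t
            g = sorted((lab(s), p, lab(o)) for s, p, o in triples)
            if best is None or g < best:
                best = g
            return
        _, c = ties[0]  # fixed rule: smallest tied class, then lowest colour
        for b in classes[c]:
            marked = dict(colour)
            marked[b] = h(c, "distinguished")
            search(refine(triples, marked))

    search(refine(triples, {b: h("blank") for b in blanks}))
    return best

P = "http://example.org/p"
g1 = {("_:a", P, "_:b"), ("_:b", P, "_:c"), ("_:c", P, "_:a")}
g2 = {("_:x", P, "_:z"), ("_:z", P, "_:y"), ("_:y", P, "_:x")}
print(canonical(g1) == canonical(g2))  # isomorphic inputs share one canonical form
```

Because every choice within the tied class is tried and the minimum leaf is kept, the result does not depend on the input labels; the price is a search tree that can grow exponentially on highly symmetric graphs.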

OKAY … SO WHAT HASHING TO USE?

What about hash collisions?
128 bit: MD5, Murmur3_128
160 bit: SHA1

HASHING MAY LEAD TO COLLISIONS
• The algorithm does not depend on which hash function you use
• A 128-bit hash is the shortest with an acceptable collision probability
• For cryptographic use-cases, SHA-256 or better might be needed
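
As a rough check on these sizes, the standard birthday approximation gives the chance of at least one collision among n uniformly random hashes of a given bit length. The function name and the item counts below are illustrative assumptions, not figures from the slides.

```python
import math

def collision_probability(n_items, bits):
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)

for bits in (32, 64, 128, 160):
    print(bits, collision_probability(10**9, bits))
```

With a billion hashed items, a 32-bit hash collides almost surely, while at 128 bits the estimate is far below anything observable in practice. This covers accidental collisions only; deliberate attacks are a separate concern, which is why the cryptographic case calls for stronger functions.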

Evaluation: Nasty Synthetic Graphs

In loving memory of
Linked Data
2007–2012
Survived by its research
community
_:b
1999–2015

Aside: Why GI-Hard?
(Can Encode Graph Isomorphism as RDF Isomorphism)
if and only if
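
One way to see the encoding, sketched under assumptions: each vertex of an undirected graph becomes a blank node and each edge becomes a pair of triples with a single fixed predicate (the `example.org` IRI is made up, and isolated vertices are ignored). The brute-force `rdf_isomorphic` check is only there to test small examples.

```python
from itertools import permutations

EDGE = "http://example.org/edge"  # hypothetical predicate IRI

def encode(edges):
    """Encode an undirected graph (list of vertex pairs) as RDF triples."""
    triples = set()
    for u, v in edges:
        triples.add((f"_:{u}", EDGE, f"_:{v}"))
        triples.add((f"_:{v}", EDGE, f"_:{u}"))
    return triples

def blank_nodes(triples):
    return sorted({t for s, _, o in triples for t in (s, o) if t.startswith("_:")})

def rdf_isomorphic(g1, g2):
    """Try every blank node bijection (factorial time, small graphs only)."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(b1) != len(b2) or len(g1) != len(g2):
        return False
    for perm in permutations(b2):
        m = dict(zip(b1, perm))
        if {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1} == g2:
            return True
    return False

path_a = encode([(1, 2), (2, 3), (3, 4)])
path_b = encode([("w", "y"), ("y", "x"), ("x", "z")])
star = encode([(1, 2), (1, 3), (1, 4)])
print(rdf_isomorphic(path_a, path_b))  # two paths on four vertices
print(rdf_isomorphic(path_a, star))    # a path versus a star
```

The two paths are isomorphic as graphs and their encodings are isomorphic as RDF graphs; the path and the star are not, in either setting.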

Aside: Why GI-Complete?
(Can we encode RDF isomorphism as graph isomorphism?)
if and only if
?
?

(Yes: We can encode RDF isomorphism as graph isomorphism)
if and only if

COMPLETE CANONICAL LABELLING SCHEME

A complete canonical labelling?

Find a canonical labelling for H

Choose the lowest possible graph

COMPLETE: FIND MINIMUM POSSIBLE
GRAPH USING FIXED BLANK NODE LABELS
• Complete
• Inefficient
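
The complete but inefficient scheme can be sketched as a brute-force search. Assumptions: triples are tuples of strings, the fixed labels are `_:b0`, `_:b1`, … (names chosen here for illustration), and "lowest graph" means the lexicographically smallest sorted list of triples.

```python
from itertools import permutations

def is_blank(term):
    return term.startswith("_:")

def min_graph(triples):
    """Try every assignment of fixed labels to blank nodes; keep the lowest graph."""
    blanks = sorted({t for s, _, o in triples for t in (s, o) if is_blank(t)})
    labels = [f"_:b{i}" for i in range(len(blanks))]
    best = None
    for perm in permutations(labels):  # n! candidate labellings
        m = dict(zip(blanks, perm))
        g = sorted((m.get(s, s), p, m.get(o, o)) for s, p, o in triples)
        if best is None or g < best:
            best = g
    return best

P = "http://example.org/p"
g1 = {("_:a", P, "_:b"), ("_:b", P, "http://example.org/x")}
g2 = {("_:q", P, "_:r"), ("_:r", P, "http://example.org/x")}
print(min_graph(g1) == min_graph(g2))  # isomorphic graphs get the same labelling
```

Every isomorphic input reaches the same minimum because the same set of candidate graphs is enumerated, but the factorial number of candidates makes this unusable beyond a handful of blank nodes.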

The need for a graph-level hash

OPTIMISATION: PRUNE THE TREE USING
AUTOMORPHISMS

Trim the search tree
using “found” automorphisms
Found Automorphisms …

PRUNING PER AUTOMORPHISMS AVOIDS
SYMMETRIC REPETITIONS
• Automorphisms are found naturally
• Makes very “regular” structures (like cliques) a lot easier
• Need to be careful how to manage the automorphism group
