Experiments with
evolving RDF
Sławek Staworko
(joint work with Peter Buneman)
University of Edinburgh
Preservation of evolving data
Version 1: (Tom, has, cat), (cat, eats, tuna)
Version 2: (Tom, has, cat), (cat, dies, Apr 1)
Version 3: (Tom, has, dog), (dog, eats, dog food)
…
Archive
• Version retrieval
• Timeline queries
• Storage space efficiency
Approaches to data
preservation
• Store all versions
• Store the original database and log the changes
• Hybrid approach of the above two
  • store the initial and every 10th version
  • store change logs for the intermediate versions
• Annotation-based approach
  • never delete data but annotate its validity with
    time intervals
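The hybrid approach can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `retrieve` function, the `checkpoints` and `deltas` dictionaries, and the representation of triples as Python tuples are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Delta = Tuple[Set[Triple], Set[Triple]]  # (inserted, deleted)


def retrieve(version: int,
             checkpoints: Dict[int, Set[Triple]],
             deltas: Dict[int, Delta]) -> Set[Triple]:
    """Rebuild `version` from the nearest stored version at or before it.

    deltas[v] holds the changes that turn version v-1 into version v.
    """
    base = max(v for v in checkpoints if v <= version)
    graph = set(checkpoints[base])
    for v in range(base + 1, version + 1):
        inserted, deleted = deltas[v]
        graph = (graph - deleted) | inserted
    return graph


# Hypothetical data: only version 1 is stored in full, the rest is logged.
checkpoints = {1: {("Tom", "has", "cat"), ("cat", "eats", "tuna")}}
deltas = {
    2: ({("cat", "dies", "Apr 1")}, {("cat", "eats", "tuna")}),
    3: ({("Tom", "has", "dog"), ("dog", "eats", "dog food")},
        {("Tom", "has", "cat"), ("cat", "dies", "Apr 1")}),
}
version_3 = retrieve(3, checkpoints, deltas)
```

Storing a full version more often shortens the replay loop at the cost of space; storing it less often does the opposite.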
Annotation of RDF
Version 1: (Tom, has, cat), (cat, eats, tuna)
Version 2: (Tom, has, cat), (cat, dies, Apr 1)
Version 3: (Tom, has, dog), (dog, eats, dog food)
Archive:
(Tom, has, cat) [1–2]
(cat, eats, tuna) [1–1]
(cat, dies, Apr 1) [2–2]
(Tom, has, dog) [3–]
(dog, eats, dog food) [3–]
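One way to picture the annotated archive is as a map from triples to validity intervals. The sketch below is only an illustration under assumed names (`archive`, `snapshot`, `timeline` are hypothetical, and `None` stands for an interval that is still open); it shows how version retrieval and timeline queries could be answered from the annotations alone.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Interval = Tuple[int, Optional[int]]  # (first version, last version or None if open)

# The archive from the Tom example, one interval list per triple.
archive: Dict[Triple, List[Interval]] = {
    ("Tom", "has", "cat"): [(1, 2)],
    ("cat", "eats", "tuna"): [(1, 1)],
    ("cat", "dies", "Apr 1"): [(2, 2)],
    ("Tom", "has", "dog"): [(3, None)],
    ("dog", "eats", "dog food"): [(3, None)],
}


def valid_at(intervals: List[Interval], version: int) -> bool:
    """True if some interval covers the given version."""
    return any(start <= version and (end is None or version <= end)
               for start, end in intervals)


def snapshot(arch: Dict[Triple, List[Interval]], version: int) -> Set[Triple]:
    """Version retrieval: all triples valid in the given version."""
    return {t for t, ivs in arch.items() if valid_at(ivs, version)}


def timeline(arch: Dict[Triple, List[Interval]], subject: str) -> Dict[Triple, List[Interval]]:
    """Timeline query: the history of every triple about one subject."""
    return {t: ivs for t, ivs in arch.items() if t[0] == subject}
```

For example, `snapshot(archive, 2)` returns the two triples of Version 2, and `timeline(archive, "Tom")` shows when Tom had a cat and when he had a dog.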
What exactly is the input?
Delta = difference between two databases expressed with
two atomic operations: inserting a triple and deleting a triple
Snapshots = complete database instances
Version 1: (Tom, has, cat), (cat, eats, tuna)
Version 2: (Tom, has, cat), (cat, dies, Apr 1)
Version 3: (Tom, has, dog), (dog, eats, dog food)
Deltas
Version 1 → Version 2:
delete (cat, eats, tuna)
insert (cat, dies, Apr 1)
Version 2 → Version 3:
delete (Tom, has, cat)
insert (Tom, has, dog)
insert (dog, eats, dog food)
delete (cat, dies, Apr 1)
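When every node carries a stable identifier, a delta between two snapshots is just a pair of set differences. The sketch below assumes triples are plain tuples and that equal strings denote the same object across versions; the function name `delta` is hypothetical. That assumption is exactly what fails once blank nodes are involved.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def delta(old: Set[Triple], new: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Express the difference between two snapshots as inserts and deletes."""
    return {"insert": new - old, "delete": old - new}


v1 = {("Tom", "has", "cat"), ("cat", "eats", "tuna")}
v2 = {("Tom", "has", "cat"), ("cat", "dies", "Apr 1")}
v3 = {("Tom", "has", "dog"), ("dog", "eats", "dog food")}

d12 = delta(v1, v2)
d23 = delta(v2, v3)
```

Here `d12` contains one insert and one delete, and `d23` contains two of each, matching the deltas listed above.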
Challenges in preserving
evolving data with annotations
1. The task is relatively simple if deltas are known:
• deleting a triple closes its interval
• adding a triple opens a new interval
2. It gets complicated when only snapshots are given:
• it boils down to computing deltas
• main challenge: identify objects that are the same across
versions of the database
Entity resolution problem:
which data objects represent the same entity across different versions?
A well-studied database problem in various settings
(from duplicate elimination to record matching)
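The simple case, where deltas are known, can be sketched in a few lines. The `apply_delta` function and the archive layout (a list of `(start, end)` intervals per triple, with `None` for an open end) are assumptions for illustration, and the code assumes each deleted triple currently has an open interval.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Interval = Tuple[int, Optional[int]]


def apply_delta(archive: Dict[Triple, List[Interval]], version: int,
                inserted: Set[Triple], deleted: Set[Triple]) -> None:
    """Update the archive in place with the delta that produces `version`."""
    for t in deleted:
        start, _ = archive[t][-1]
        archive[t][-1] = (start, version - 1)  # deleting closes the open interval
    for t in inserted:
        archive.setdefault(t, []).append((version, None))  # inserting opens a new one


# Start from Version 1: every triple is valid from version 1 onwards.
archive: Dict[Triple, List[Interval]] = {
    ("Tom", "has", "cat"): [(1, None)],
    ("cat", "eats", "tuna"): [(1, None)],
}
apply_delta(archive, 2,
            inserted={("cat", "dies", "Apr 1")},
            deleted={("cat", "eats", "tuna")})
apply_delta(archive, 3,
            inserted={("Tom", "has", "dog"), ("dog", "eats", "dog food")},
            deleted={("Tom", "has", "cat"), ("cat", "dies", "Apr 1")})
```

After both deltas, the archive holds the same intervals as the annotated Tom example: [1–2], [1–1], [2–2] and two open intervals starting at 3.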
Entity resolution and RDF
URI (Uniform resource identifier)
URIs are supposed to make things easy but…
• RDF also has blank nodes
• URIs don’t exactly solve the problem in the
context of evolving/merged ontologies…
Two different RDF nodes need not represent different objects
Blank nodes
• LOD initiative frowns upon them
• Blank nodes are commonplace (and misused?)
Reification: Peter believes that Tom has a cat
(_b, subject, Tom)
(_b, pred, has)
(_b, object, cat)
(Peter, believes, _b)
Complex number: a blank node _b connected to the values 2.4 and -0.4
Blank nodes (cont.)
1. Reification (Peter believes that Tom has a cat)
2. Data structures (complex types)
3. Anonymization (Tom has a pet)
Assumptions on reasonable use of blank nodes:
1. Represent concrete objects
2. The objects can be identified from the context
Deblanking
LISP-style encoding of the list of numbers [5,3,7]:
(_b3, head, 5), (_b3, tail, _b2)
(_b2, head, 3), (_b2, tail, _b1)
(_b1, head, 7), (_b1, tail, end)
Round 1: _b1 is replaced by #(7,end)
Round 2: _b2 is replaced by #(3,7,end)
Round 3: _b3 is replaced by #(5,3,7,end)
Assumption: graph has no cycles consisting of blanks only
Assumption: identity of a blank node is determined by its contents
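The round-by-round procedure can be sketched as below. This is an illustrative reading of the slide, not the authors' code: blank nodes are assumed to be strings starting with `_:`, a node's "contents" are taken to be its outgoing edges, and the generated names such as `#(head=7,tail=end)` are a hypothetical labelling scheme that differs from the `#(7,end)` notation of the figure.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(node: str) -> bool:
    return node.startswith("_:")


def deblank_round(graph: Set[Triple]) -> Set[Triple]:
    """Rename every blank node whose outgoing edges reach no blank node."""
    contents: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in graph:
        if is_blank(s):
            contents.setdefault(s, []).append((p, o))
    rename = {
        b: "#(" + ",".join(f"{p}={o}" for p, o in sorted(edges)) + ")"
        for b, edges in contents.items()
        if not any(is_blank(o) for _, o in edges)
    }
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in graph}


def deblank(graph: Set[Triple]) -> Tuple[Set[Triple], int]:
    """Repeat rounds until no blank node is left or no progress is made."""
    rounds = 0
    while any(is_blank(s) or is_blank(o) for s, _, o in graph):
        new_graph = deblank_round(graph)
        if new_graph == graph:  # e.g. a cycle consisting of blanks only
            break
        graph = new_graph
        rounds += 1
    return graph, rounds


# The list [5,3,7] in LISP-style encoding.
lst = {
    ("_:b3", "head", "5"), ("_:b3", "tail", "_:b2"),
    ("_:b2", "head", "3"), ("_:b2", "tail", "_:b1"),
    ("_:b1", "head", "7"), ("_:b1", "tail", "end"),
}
result, rounds = deblank(lst)
```

Because the new names depend only on contents, two versions that encode the same list with different blank node labels end up as identical graphs, which is what makes their triples comparable across versions.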
Experiments
• 10 versions of Experimental Factor Ontology (EFO)
data expressed in OWL
• 200k triples in the 1st version, 290k in the last
• On average 20k blank nodes in each version
• 920k triples overall (blank nodes are independent)
• many triples do not last more than 1 version
Experiment
Deblanking and life expectancy of an object
Round   Triples   Blanks   Life expect.
0       921896    165935   2.55
1       358857    33253    6.39
2       348356    28150    6.57
3       339695    23502    6.88
4       330564    18862    7.10
5       318761    14763    7.24
6       311562    11021    7.39
7       304628    7299     7.54
8       297744    3622     7.83
9       285484    58       7.83
10      285334    2        7.83
11      285334    1        7.83
12      285334    0        7.83
Improving space efficiency
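Lift common intervals to subject
Before: (Peter, lives, Edinburgh) [1–10], (Peter, phone, +44 712 4567) [1–10]
After: Peter [1–10]: (Peter, lives, Edinburgh), (Peter, phone, +44 712 4567)
[Figure: a second panel shows the node "dog" with the edge "has [1–5]", before and after]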
• Intervals moved from all but 33.7k triples (of total 285k)
• Number of subjects with histories is 34.3k
• Total number of intervals is reduced from 285k to 60k
• The size of the index is reduced by almost 80%
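Lifting can be sketched as follows. The slides do not spell out the exact rule, so this example makes an assumption: for each subject, the interval shared by most of its triples is moved to the subject, and only triples with a different interval keep their own annotation. The function name `lift_intervals` and the `email` triple are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]
Interval = Tuple[int, Optional[int]]


def lift_intervals(archive: Dict[Triple, Interval]
                   ) -> Tuple[Dict[str, Interval], Dict[Triple, Interval]]:
    """Return (interval per subject, triples that still need their own interval)."""
    by_subject: Dict[str, Dict[Triple, Interval]] = defaultdict(dict)
    for triple, interval in archive.items():
        by_subject[triple[0]][triple] = interval

    subject_interval: Dict[str, Interval] = {}
    exceptions: Dict[Triple, Interval] = {}
    for subject, triples in by_subject.items():
        # Assumed rule: lift the most frequent interval to the subject.
        common, _ = Counter(triples.values()).most_common(1)[0]
        subject_interval[subject] = common
        for triple, interval in triples.items():
            if interval != common:
                exceptions[triple] = interval
    return subject_interval, exceptions


archive = {
    ("Peter", "lives", "Edinburgh"): (1, 10),
    ("Peter", "phone", "+44 712 4567"): (1, 10),
    ("Peter", "email", "peter@example.org"): (4, 10),  # hypothetical triple
}
subject_interval, exceptions = lift_intervals(archive)
```

In this small example three stored intervals shrink to two: one on the subject `Peter` and one on the triple whose history differs.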
Future:
• Bisimulation
• Nested RDF
Conclusions
• Annotation offers an attractive way of representing
an evolving RDF dataset (need for nested RDF?)
• Evolution of data may require more complex atomic
operations. For instance, vocabulary evolution:
adding, splitting, merging classes. (can
bisimulation help here?)
