Herbert Van de Sompel & Miel Vander Sande
CNI Spring Meeting, San Antonio, TX, April 5 2016
Access to DBpedia Versions using Memento and Triple Pattern Fragments
Herbert Van de Sompel (@hvdsomp), Los Alamos National Laboratory
Miel Vander Sande (@Miel_vds), Ghent University
Acknowledgments: Lyudmila Balakireva, Harihar Shankar, Ruben Verborgh
Outline
• Prelude: Memento and Linked Data
• First Generation DBpedia Archive
• Devising Affordable/Useful Linked Data Archives
• Intermezzo: Triple Pattern Fragments (TPF)
• Intermezzo: Binary RDF Representation (HDT)
• Devising Affordable/Useful Linked Data Archives
• Second Generation DBpedia Archive
• Try this At Home
Memento Framework
Memento LDOW 2010 Submission
Herbert Van de Sompel et al. (2010) An HTTP-Based Versioning Mechanism for Linked Data
http://arxiv.org/abs/1003.3661
Memento and Linked Data
Time-Series Analysis across DBpedia Versions
Data collected through “follow your nose” HTTP Navigation
First Generation DBpedia Archive: Storage
Characteristics
upload software:    custom
upload time:        ~ 24 hours per version
storage software:   MongoDB
storage space:      383 GB for 10 versions
DBpedia versions:   10 versions: 2.0 through 3.9
number of triples:  ~ 3 billion
First Generation DBpedia Archive: Subject-URI Access
http://dbpedia.mementodepot.org/memento/2009052/http://dbpedia.org/page/Oaxaca
Characteristics
TimeGate software:     custom
access type:           Subject-URI & datetime
external integration:  current DBpedia
clients:
• all clients: direct access to Memento Subject-URI
• Memento clients: datetime negotiation with Subject-URI
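Datetime negotiation for a Subject-URI can be sketched as follows. This is a minimal illustration, not the archive's own client code: the Memento protocol (RFC 7089) conveys the desired datetime in an `Accept-Datetime` request header, and the TimeGate prefix used here is an assumption based on the archive's TimeGate URI pattern. The requested date is an arbitrary example, and the request is only constructed, never sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed TimeGate prefix; adjust to the archive actually being queried.
TIMEGATE_PREFIX = "http://dbpedia.mementodepot.org/timegate/"


def timegate_request(subject_uri: str, when: datetime) -> tuple[str, dict[str, str]]:
    """Return the TimeGate URL and headers for a datetime-negotiation request."""
    # Accept-Datetime carries an HTTP date expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE_PREFIX + subject_uri, {"Accept-Datetime": http_date}


url, headers = timegate_request(
    "http://dbpedia.org/page/Oaxaca",
    datetime(2010, 1, 1, tzinfo=timezone.utc),  # arbitrary example datetime
)
print(url)
print(headers["Accept-Datetime"])
```

A Memento-aware client would send this request and follow the TimeGate's redirect to the Memento closest to the requested datetime; a client without Memento support can still dereference a Memento URI directly.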
DBpedia Archive @ LANL Since 2010
• Access based on Subject-URI (DBpedia Topic URI) only
• MongoDB storage
• A blob per Subject-URI per version
• Dynamically transformed to other RDF serializations
• No updates since version 3.9 (2013) of DBpedia as a result of
scalability problems
Affordable & Useful Linked Data Archives
• A Linked Data Archive consists of temporal snapshots of one or
more Linked Data sets, whereby each temporal snapshot reflects
the state of a Linked Data set at a specific moment or interval in
time.
• How to make Linked Data Archives accessible in a manner that is
• affordable/sustainable for the publisher
• useful for the consumer
Linked Data Archive: Characteristics
General Characteristics (assessed for Publisher and Consumer)
• Availability
• Bandwidth
• Cost
Functionality
• Interface Expressiveness
• LOD Integration
• Memento Support
• Cross Time/Data
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Linked Data Publishing
• The typical ways of publishing Linked Data on the Web:
• Subject URI access
• Data dump
• SPARQL endpoint
Let’s consider these from the perspective of Linked Data Archives,
i.e. archival storage and access
Linked Data Archive with Subject-URI Access
• For each temporal snapshot of a Linked Data set, and for each
Subject in that snapshot, publish an RDF description (of the Subject)
at a URI that is specific per snapshot/subject
Linked Data Archive with Subject-URI Access: Characteristics
General Characteristics    Publisher       Consumer
Availability               rather high     rather high
Bandwidth                  ~ description   ~ description
Cost                       rather low      rather high

Functionality
Interface Expressiveness   rather low
LOD Integration            yes
Memento Support            possible
Cross Time/Data            follow your nose
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Linked Data Archive Using Dumps
• Renders each temporal snapshot of a Linked Data set as a data
dump that places all temporal dataset triples (as they were at a
specific moment in time) into one or more files
Linked Data Archive Using Dumps: Characteristics
General Characteristics    Publisher   Consumer
Availability               high        high
Bandwidth                  high        high
Cost                       low         high

Functionality
Interface Expressiveness   download dataset
LOD Integration            no
Memento Support            not possible
Cross Time/Data            download various datasets
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Linked Data Archive with SPARQL Endpoint(s)
• For each temporal snapshot of a Linked Data set, supports arbitrary
SPARQL queries.
• Different architectural set-ups possible; no standard approach
Linked Data Archive Using SPARQL Endpoint(s): Characteristics
General Characteristics    Publisher     Consumer
Availability               problematic   problematic
Bandwidth                  ~ query       ~ query
Cost                       high          low

Functionality
Interface Expressiveness   highly expressive
LOD Integration            no
Memento Support            hard
Cross Time/Data            custom distributed queries
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Affordable & Useful Linked Data Archives
Linked Data Archive Type   Publishing   Consuming
Data Dump                  $$$$         ++++
SPARQL Endpoint(s)         $$$$         ++++
Subject URI Access         $$$$         ++++
Linked Data Fragments (Ghent U)
• Every Linked Data interface offers specific fragments of a Linked
Data set
• A fragment is described by
• Selector: what questions can I ask?
• Controls: how do I get more fragments?
• Metadata: helpful information for consumption?
• Each interface type comes with tradeoffs
• cf. the analysis thus far
http://linkeddatafragments.org
Verborgh, R. et al. (2014) Querying datasets on the web with high availability. ISWC 2014
http://ruben.verborgh.org/publications/verborgh_iswc_2014/
Triple Pattern Fragments (Ghent U)
• Triple Pattern Fragments is a new interface with a different set of
tradeoffs that are attractive from an archival perspective
http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/
• Allows querying a Linked Data set according to
?Subject ?Predicate ?Object
patterns
Controls: Responses provide navigational help for clients
• Based on emerging Hydra vocabulary for self-describing
Hypermedia-Driven Web APIs
Metadata: dataset info, estimated count (to aid client applications)
http://www.hydra-cg.com/spec/latest/core/
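The three parts of a Triple Pattern Fragment (data selected by a ?s ?p ?o pattern, metadata, and controls) can be illustrated with a toy in-memory version. This is a sketch under stated assumptions, not the TPF server's implementation: the dictionary keys `count` and `next_page`, the `ex:` sample data, and the page size are all invented for illustration, and the count here is exact, whereas a real interface may only provide an estimate.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def fragment(triples, subject=None, predicate=None, obj=None, page=1, page_size=2):
    """Return one page of a toy triple pattern fragment."""
    pattern = (subject, predicate, obj)
    # A position left as None behaves like a variable and matches anything.
    matches = [
        t for t in triples
        if all(want is None or want == have for want, have in zip(pattern, t))
    ]
    start = (page - 1) * page_size
    has_next = start + page_size < len(matches)
    return {
        "data": matches[start:start + page_size],               # the selected triples
        "metadata": {"count": len(matches)},                    # helps clients plan queries
        "controls": {"next_page": page + 1 if has_next else None},  # how to get more
    }


# Invented sample data, for illustration only.
DATA = [
    Triple("ex:Alice", "ex:birthPlace", "ex:Ghent"),
    Triple("ex:Bob", "ex:birthPlace", "ex:Ghent"),
    Triple("ex:Carol", "ex:birthPlace", "ex:Ghent"),
    Triple("ex:Alice", "ex:knows", "ex:Bob"),
]

first_page = fragment(DATA, predicate="ex:birthPlace", obj="ex:Ghent")
second_page = fragment(DATA, predicate="ex:birthPlace", obj="ex:Ghent", page=2)
print(first_page["metadata"], first_page["controls"])
```

Because the server only ever evaluates a single pattern and returns one page at a time, the expensive part of query evaluation moves to the client, which is what keeps the publisher's cost low.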
Binary RDF Representation for Publication and Exchange (HDT)
http://www.w3.org/Submission/HDT/
• Header-Dictionary-Triple (HDT) is a compact, binary representation
of RDF datasets.
• Able to represent massive data sets
• Dictionary/Triples structure achieves
• rapid search for ?subject ?predicate ?object pattern
• high compression rates
• Header provides metadata about the dataset
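The idea behind the Dictionary/Triples split can be shown with a toy encoding. This is only a conceptual sketch, assuming a single shared dictionary and a sorted list of integer triples; the actual HDT format uses a more elaborate compressed layout, and the `ex:` terms are invented sample data.

```python
def encode(triples):
    """Map each distinct term to an integer ID and store triples as ID tuples."""
    terms = sorted({term for triple in triples for term in triple})
    ids = {term: i for i, term in enumerate(terms, start=1)}
    id_triples = sorted((ids[s], ids[p], ids[o]) for s, p, o in triples)
    return terms, ids, id_triples


def search(terms, ids, id_triples, s=None, p=None, o=None):
    """Resolve a ?subject ?predicate ?object pattern against the encoded triples."""
    wanted = []
    for term in (s, p, o):
        if term is None:
            wanted.append(None)      # variable position
        elif term in ids:
            wanted.append(ids[term])
        else:
            return []                # a term absent from the dictionary cannot match
    return [
        tuple(terms[i - 1] for i in triple)  # decode IDs back to terms
        for triple in id_triples
        if all(w is None or w == have for w, have in zip(wanted, triple))
    ]


# Invented sample data, for illustration only.
SAMPLE = [
    ("ex:Alice", "ex:birthPlace", "ex:Ghent"),
    ("ex:Bob", "ex:birthPlace", "ex:Ghent"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
]

terms, ids, id_triples = encode(SAMPLE)
born_in_ghent = search(terms, ids, id_triples, o="ex:Ghent")
print(born_in_ghent)
```

Each long term string is stored once in the dictionary while the triples hold only small integers, which hints at why this kind of layout compresses well and why pattern lookups can be made fast.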
HDT Linked Data Archive with TPF Support
• For each temporal snapshot of a Linked Data set, generate an HDT
serialization that provides access according to
?subject ?predicate ?object
patterns
Linked Data Archive with ?s?p?o Access: Characteristics
General Characteristics    Publisher   Consumer
Availability               high        high
Bandwidth                  ~ query     ~ query
Cost                       low         medium

Functionality
Interface Expressiveness   better than subject-URI only
LOD Integration            yes
Memento Support            possible
Cross Time/Data            follow your nose
Verdict:
• Publication perspective: $$$$
• Access perspective: ++++
Affordable & Useful Linked Data Archives
Linked Data Archive Type   Publishing   Consuming
Data Dump                  $$$$         ++++
SPARQL Endpoint(s)         $$$$         ++++
Subject URI Access         $$$$         ++++
HDT & TPF                  $$$$         ++++
Second Generation DBpedia Archive: Storage
Characteristics
upload software:    HDT-CPP
upload time:        ~ 4 hours per version
storage software:   HDT binary files
storage space:      70 GB for 12 versions
DBpedia versions:   12 versions: 2.0 through 2015
number of triples:  ~ 5 billion
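A quick back-of-the-envelope comparison of the two generations, using only the figures reported on the storage slides. Two assumptions are made here and are not stated in the slides: the storage figures are read as gigabytes of 10^9 bytes, and the triple counts are treated as totals across all versions.

```python
# Figures taken from the first- and second-generation storage slides.
first_gen = {"storage_gb": 383, "versions": 10, "triples": 3e9, "upload_hours": 24}
second_gen = {"storage_gb": 70, "versions": 12, "triples": 5e9, "upload_hours": 4}


def gb_per_version(archive):
    """Average storage per DBpedia version, in GB."""
    return archive["storage_gb"] / archive["versions"]


def bytes_per_triple(archive):
    """Average storage per triple, assuming 1 GB = 10**9 bytes."""
    return archive["storage_gb"] * 1e9 / archive["triples"]


for name, archive in (("first", first_gen), ("second", second_gen)):
    print(name, round(gb_per_version(archive), 1), "GB/version,",
          round(bytes_per_triple(archive)), "bytes/triple")

upload_speedup = first_gen["upload_hours"] / second_gen["upload_hours"]
print("upload speedup:", upload_speedup)
```

Under these assumptions the second generation needs roughly a sixth of the storage per version and about a ninth of the bytes per triple, while loading a version about six times faster.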
Second Generation DBpedia Archive: ?s?p?o Query-URI Access
http://fragments.mementodepot.org/dbpedia_3_8?subject=&predicate=http://dbpedia.org/ontology/birthPlace&object=http://dbpedia.org/resource/Ghent
?s?p?o Query-URI Access
TimeGate URI: http://fragments.mementodepot.org/timegate/dbpedia?subject={DBpediaURI}&predicate={DBpediaURI}&object={DBpediaURI}
  Example: http://fragments.mementodepot.org/timegate/dbpedia?subject=&predicate=&object=http://dbpedia.org/resource/Ghent
TimeMap URI: not supported
Memento URI: http://fragments.mementodepot.org/{DBpediaVersion}?subject={DBpediaURI}&predicate={DBpediaURI}&object={DBpediaURI}
  Example: http://fragments.mementodepot.org/dbpedia_3_0?subject=&predicate=&object=http://dbpedia.org/resource/Ghent
Further info: http://mementoweb.org/depot/native/fragments/
Try it with Memento for Chrome – http://bit.ly/memento-for-chrome
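The Query-URI templates above can be filled in programmatically. The following is a minimal sketch, assuming the `fragments.mementodepot.org` host from the slides and a hypothetical helper name `tpf_url`; it leaves `:` and `/` unescaped so the output resembles the URIs shown in the slides, although a stricter client would percent-encode the parameter values.

```python
from urllib.parse import urlencode

BASE = "http://fragments.mementodepot.org/"


def tpf_url(dataset: str, subject: str = "", predicate: str = "", obj: str = "") -> str:
    """Build a ?s?p?o Query-URI; an empty string leaves that position unbound."""
    params = {"subject": subject, "predicate": predicate, "object": obj}
    # safe=":/" keeps the embedded DBpedia URIs readable, as in the slides.
    return BASE + dataset + "?" + urlencode(params, safe=":/")


# Memento URI: query a specific DBpedia version directly.
memento = tpf_url(
    "dbpedia_3_8",
    predicate="http://dbpedia.org/ontology/birthPlace",
    obj="http://dbpedia.org/resource/Ghent",
)

# TimeGate URI: the same kind of query, to be combined with datetime negotiation.
timegate = tpf_url("timegate/dbpedia", obj="http://dbpedia.org/resource/Ghent")

print(memento)
print(timegate)
```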
Second Generation DBpedia Archive: Subject-URI Access
Subject-URI Access
TimeGate URI: http://dbpedia.mementodepot.org/timegate/{DBpediaURI}
  Example: http://dbpedia.mementodepot.org/timegate/http://dbpedia.org/data/Ghent
TimeMap URI: http://dbpedia.mementodepot.org/timemap/link/{DBpediaURI}
  Example: http://dbpedia.mementodepot.org/timemap/link/http://dbpedia.org/data/Ghent
Memento URI: http://dbpedia.mementodepot.org/{yyyymmdd}/{DBpediaURI}
  Example: http://dbpedia.mementodepot.org/20080103/http://dbpedia.org/data/Ghent
Further info: http://mementoweb.org/depot/native/dbpedia/
Try it with Memento for Chrome – http://bit.ly/memento-for-chrome
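The three Subject-URI templates can be expanded with a few helper functions. This is a small sketch that only mirrors the URI patterns listed above; the function names are hypothetical and nothing is fetched.

```python
from datetime import date

DEPOT = "http://dbpedia.mementodepot.org"


def timegate_uri(dbpedia_uri: str) -> str:
    """TimeGate: entry point for datetime negotiation on a Subject-URI."""
    return f"{DEPOT}/timegate/{dbpedia_uri}"


def timemap_uri(dbpedia_uri: str) -> str:
    """TimeMap: list of the archived versions of a Subject-URI (link format)."""
    return f"{DEPOT}/timemap/link/{dbpedia_uri}"


def memento_uri(dbpedia_uri: str, day: date) -> str:
    """Memento: a specific archived version, addressed by a yyyymmdd datetime."""
    return f"{DEPOT}/{day:%Y%m%d}/{dbpedia_uri}"


subject = "http://dbpedia.org/data/Ghent"
print(timegate_uri(subject))
print(timemap_uri(subject))
print(memento_uri(subject, date(2008, 1, 3)))
```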
Second Generation DBpedia Archive: Access
Characteristics
TimeGate software:     ① node.js LDF server 2.0.0; ② LDF js client
access type:           ① ?s?p?o Query-URI & datetime; ② Subject-URI & datetime
external integration:  ① DBpedia LDF server; ② current DBpedia
clients:
• all clients: direct access to Mementos of Subject-URI and ?s?p?o Query-URI
• Memento clients: datetime negotiation with Subject-URI and ?s?p?o Query-URI
Building a Linked Data Archive
• Convert the archival data set(s) to HDT using HDT-CPP
HDT Software (C++)
https://github.com/rdfhdt/hdt-cpp
• input data requires cleaning before processing, especially regarding URI characters
• DBpedia data not clean
• DBpedia v3.5 was not successfully processed
• No meaningful error messages to help locate problems
• memory intensive
• Kyoto Cabinet was used to optimize storage requirement and speed during processing
• Java version exists but has memory problems
• Download the Triple Fragment Server code
Linked Data Fragment Server (Node.js)
https://github.com/LinkedDataFragments/Server.js
• provides ?s?p?o access to local and/or remote Linked Data sets
• supports HDT, Turtle files, N-Triples files, JSON-LD files, SPARQL endpoints, in-memory store, and BlazeGraph Linked Data sets
• version 2.0.0 (released March 31 2016) has built-in Memento support
• Create the JSON config file for Memento
Linked Data Fragment Server, Memento Configuration
https://github.com/LinkedDataFragments/Server.js/wiki/Configuring-Memento
• declare archival data set(s)
• add datetime ranges for the archival data set(s)
• add a TimeGate
• list the archival data set(s) for which the TimeGate should support datetime negotiation
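What the datetime ranges make possible can be sketched as a version-selection step: given the datetime a client asks for, the TimeGate picks the archival data set whose range covers it. The code below is an illustration of that idea, not the server's implementation or its configuration format; the snapshot names and start dates are hypothetical placeholders.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# HYPOTHETICAL snapshots, sorted by the datetime at which each one starts to
# apply. Each range is assumed to run until the next snapshot begins.
SNAPSHOTS = [
    ("snapshot_a", utc(2008, 1, 1)),
    ("snapshot_b", utc(2010, 1, 1)),
    ("snapshot_c", utc(2013, 1, 1)),
]


def select_snapshot(snapshots, requested: datetime) -> str:
    """Pick the snapshot whose datetime range contains the requested datetime."""
    starts = [start for _, start in snapshots]
    # Index of the last snapshot that starts at or before the requested datetime.
    index = bisect_right(starts, requested) - 1
    # Requests earlier than the first snapshot fall back to the first one.
    return snapshots[max(index, 0)][0]


print(select_snapshot(SNAPSHOTS, utc(2009, 6, 1)))
```

Requests later than the last start datetime resolve to the most recent snapshot, which matches the intuition that the newest archived version stays valid until a newer one is added.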
• Run the server
Herbert Van de Sompel & Miel Vander Sande
CNI Spring Meeting, San Antonio, TX, April 5 2016
Herbert Van de Sompel
@hvdsomp
Los Alamos National Laboratory
Acknowledgments: Lyudmila Balakireva, Harihar Shankar, Ruben Verborgh
Access to DBpedia Versions using
Memento and Triple Pattern Fragments
Miel Vander Sande
@Miel_vds
Ghent University

More Related Content

PPTX
PPTX
Hiberlink: Investigating Reference Rot, December 2013
PPTX
A Perspective on Archiving the Scholarly Record
PPTX
Reminiscing about interoperability
PPTX
Creating Pockets of Persistence
PDF
OAC Presentation at CNI 09 Fall Forum
PPT
Profiling Web Archives
PPTX
PID Signposting Pattern
Hiberlink: Investigating Reference Rot, December 2013
A Perspective on Archiving the Scholarly Record
Reminiscing about interoperability
Creating Pockets of Persistence
OAC Presentation at CNI 09 Fall Forum
Profiling Web Archives
PID Signposting Pattern

What's hot (20)

PPT
Linked Open Data for Libraries
PPTX
SWIB14 Weaving repository contents into the Semantic Web
PPT
towards interoperable archives: the Universal Preprint Service initiative
ZIP
Intro to Linked Open Data in Libraries, Archives & Museums
PPTX
ResourceSync Quick Overview
PPTX
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
PDF
A Clean Slate?
PPTX
FAIR Signposting: A KISS Approach to a Burning Issue
PPTX
Interoperability for web based scholarship
ZIP
Intro to Linked Open Data in Libraries Archives & Museums.
PDF
Web Data Management with RDF
PPTX
Linked Data at ISAW: How and Why
PPT
The SFX Framework for Context-Sensitive Reference Linking
PDF
The aDORe Federation Architecture
PDF
DBpedia InsideOut
PPTX
Linked Data in Libraries
PDF
Linked open data and libraries
PPTX
What is #LODLAM?! (revised January 2015)
PDF
Web Data Management in the RDF Age
PPTX
Why do they call it Linked Data when they want to say...?
Linked Open Data for Libraries
SWIB14 Weaving repository contents into the Semantic Web
towards interoperable archives: the Universal Preprint Service initiative
Intro to Linked Open Data in Libraries, Archives & Museums
ResourceSync Quick Overview
What is #LODLAM?! Understanding linked open data in libraries, archives [and ...
A Clean Slate?
FAIR Signposting: A KISS Approach to a Burning Issue
Interoperability for web based scholarship
Intro to Linked Open Data in Libraries Archives & Museums.
Web Data Management with RDF
Linked Data at ISAW: How and Why
The SFX Framework for Context-Sensitive Reference Linking
The aDORe Federation Architecture
DBpedia InsideOut
Linked Data in Libraries
Linked open data and libraries
What is #LODLAM?! (revised January 2015)
Web Data Management in the RDF Age
Why do they call it Linked Data when they want to say...?
Ad

Viewers also liked (19)

PDF
The bX project: Federating and Mining Usage Logs from Linking Servers
PPT
The Roof is on Fire
PDF
Open Archives Initiative Object Re-Use & Exchange
PDF
The djatoka Image Server
PDF
Augmenting interoperability across scholarly repositories
PDF
Attempts at innovation in scholarly communication
PDF
An Overview of the OAI Object Reuse and Exchange Interoperability Framework
PDF
the UPS protoproto project
PPT
MESUR: Making sense and use of usage data
PDF
An HTTP-Based Versioning Mechanism for Linked Data
PDF
The Web as infrastructure for scholarly research and communication
PDF
Motivation, inspiration and innovation from frustration
PDF
Memento: Time Travel for the Web
PDF
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
PDF
Towards a Machine-Actionable Scholarly Communication System
PDF
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
PDF
Untitled I: Challenges ahead
PDF
Time travelling through DBpedia
PPTX
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
The bX project: Federating and Mining Usage Logs from Linking Servers
The Roof is on Fire
Open Archives Initiative Object Re-Use & Exchange
The djatoka Image Server
Augmenting interoperability across scholarly repositories
Attempts at innovation in scholarly communication
An Overview of the OAI Object Reuse and Exchange Interoperability Framework
the UPS protoproto project
MESUR: Making sense and use of usage data
An HTTP-Based Versioning Mechanism for Linked Data
The Web as infrastructure for scholarly research and communication
Motivation, inspiration and innovation from frustration
Memento: Time Travel for the Web
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
Towards a Machine-Actionable Scholarly Communication System
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
Untitled I: Challenges ahead
Time travelling through DBpedia
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Ad

Similar to DBpedia Archive using Memento, Triple Pattern Fragments, and HDT (20)

PDF
A sweet affordable combo for Linked Data Archives
PDF
Reproducibility with 
the 99 cents Linked Data archive
PDF
Versioned Triple Pattern Fragments
PDF
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
PDF
CLARIAH Toogdag 2018: A distributed network of digital heritage information
PDF
Sustainable queryable access to Linked Data
PPTX
Collaborating web archives - Herbert van de Sompel
PDF
A distributed network of digital heritage information - Unesco/NDL India
PPTX
"Web Archive services framework for tighter integration between the past and ...
PDF
Session 1.4 a distributed network of heritage information
PDF
A distributed network of digital heritage information - Semantics Amsterdam
PPTX
Linked Data Implementations—Who, What and Why?
PDF
MementoMap: An Archive Profile Dissemination Framework
PDF
Linked Data Generation for the University Data From Legacy Database
ODP
Charper.lawdi.20130531
PDF
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
PDF
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
PDF
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...
PPTX
Perseverance on Persistence
PPT
Linked Data - the Future for Open Repositories?
A sweet affordable combo for Linked Data Archives
Reproducibility with 
the 99 cents Linked Data archive
Versioned Triple Pattern Fragments
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
CLARIAH Toogdag 2018: A distributed network of digital heritage information
Sustainable queryable access to Linked Data
Collaborating web archives - Herbert van de Sompel
A distributed network of digital heritage information - Unesco/NDL India
"Web Archive services framework for tighter integration between the past and ...
Session 1.4 a distributed network of heritage information
A distributed network of digital heritage information - Semantics Amsterdam
Linked Data Implementations—Who, What and Why?
MementoMap: An Archive Profile Dissemination Framework
Linked Data Generation for the University Data From Legacy Database
Charper.lawdi.20130531
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...
Perseverance on Persistence
Linked Data - the Future for Open Repositories?

More from Herbert Van de Sompel (16)

PPTX
The web is rotting and what to do about it
PPTX
Researcher Pod: Scholarly Communication Using the Decentralized Web
PPTX
Persistent Identification: Easier Said than Done
PPTX
Registration / Certification Interoperability Architecture (overlay peer-review)
PPTX
Collecting the organizational scholarly record
PPTX
To the Rescue of Scholarly Orphans
PPTX
Almost two decades at LANL
PPTX
Paul Evan Peters Lecture
PPT
Achieving Link Integrity for Managed Collections
PPTX
Signposting Overview (Version November 2017)
PPTX
Signposting Overview
PPTX
ResourceSync Overview
PPTX
ResourceSync tutorial OAI8
PDF
Paint-Yourself-In-The-Corner Infrastructure
PDF
ResourceSync: Web-Based Resource Synchronization
PDF
ResourceSync: Conceptual and Technical Problem Perspective
The web is rotting and what to do about it
Researcher Pod: Scholarly Communication Using the Decentralized Web
Persistent Identification: Easier Said than Done
Registration / Certification Interoperability Architecture (overlay peer-review)
Collecting the organizational scholarly record
To the Rescue of Scholarly Orphans
Almost two decades at LANL
Paul Evan Peters Lecture
Achieving Link Integrity for Managed Collections
Signposting Overview (Version November 2017)
Signposting Overview
ResourceSync Overview
ResourceSync tutorial OAI8
Paint-Yourself-In-The-Corner Infrastructure
ResourceSync: Web-Based Resource Synchronization
ResourceSync: Conceptual and Technical Problem Perspective

Recently uploaded (20)

PPTX
Reading as a good Form of Recreation
PPTX
1402_iCSC_-_RESTful_Web_APIs_--_Josef_Hammer.pptx
PPTX
Artificial_Intelligence_Basics use in our daily life
PPTX
module 1-Part 1.pptxdddddddddddddddddddddddddddddddddddd
PDF
Exploring The Internet Of Things(IOT).ppt
PDF
Computer Networking, Internet, Casting in Network
PDF
BIOCHEM CH2 OVERVIEW OF MICROBIOLOGY.pdf
PDF
The Evolution of Traditional to New Media .pdf
PDF
SlidesGDGoCxRAIS about Google Dialogflow and NotebookLM.pdf
DOCX
Memecoinist Update: Best Meme Coins 2025, Trump Meme Coin Predictions, and th...
PDF
Course Overview and Agenda cloud security
PDF
Buy Cash App Verified Accounts Instantly – Secure Crypto Deal.pdf
PPTX
IPCNA VIRTUAL CLASSES INTERMEDIATE 6 PROJECT.pptx
PDF
Understand the Gitlab_presentation_task.pdf
PDF
Alethe Consulting Corporate Profile and Solution Aproach
PPTX
Basic understanding of cloud computing one need
PDF
Virtual Guard Technology Provider_ Remote Security Service Solutions.pdf
PDF
Top 8 Trusted Sources to Buy Verified Cash App Accounts.pdf
PDF
Paper The World Game (s) Great Redesign.pdf
PDF
Lean-Manufacturing-Tools-Techniques-and-How-To-Use-Them.pdf
Reading as a good Form of Recreation
1402_iCSC_-_RESTful_Web_APIs_--_Josef_Hammer.pptx
Artificial_Intelligence_Basics use in our daily life
module 1-Part 1.pptxdddddddddddddddddddddddddddddddddddd
Exploring The Internet Of Things(IOT).ppt
Computer Networking, Internet, Casting in Network
BIOCHEM CH2 OVERVIEW OF MICROBIOLOGY.pdf
The Evolution of Traditional to New Media .pdf
SlidesGDGoCxRAIS about Google Dialogflow and NotebookLM.pdf
Memecoinist Update: Best Meme Coins 2025, Trump Meme Coin Predictions, and th...
Course Overview and Agenda cloud security
Buy Cash App Verified Accounts Instantly – Secure Crypto Deal.pdf
IPCNA VIRTUAL CLASSES INTERMEDIATE 6 PROJECT.pptx
Understand the Gitlab_presentation_task.pdf
Alethe Consulting Corporate Profile and Solution Aproach
Basic understanding of cloud computing one need
Virtual Guard Technology Provider_ Remote Security Service Solutions.pdf
Top 8 Trusted Sources to Buy Verified Cash App Accounts.pdf
Paper The World Game (s) Great Redesign.pdf
Lean-Manufacturing-Tools-Techniques-and-How-To-Use-Them.pdf

DBpedia Archive using Memento, Triple Pattern Fragments, and HDT

  • 1. Herbert Van de Sompel & Miel Vander Sande CNI Spring Meeting, San Antonio, TX, April 5 2016 Herbert Van de Sompel @hvdsomp Los Alamos National Laboratory Acknowledgments: Lyudmila Balakireva, Harihar Shankar, Ruben Verborgh Access to DBpedia Versions using Memento and Triple Pattern Fragments Miel Vander Sande @Miel_vds Ghent University
  • 2. Herbert Van de Sompel & Miel Vander Sande CNI Spring Meeting, San Antonio, TX, April 5 2016 Outline • Prelude: Memento and Linked Data • First Generation DBpedia Archive • Devising Affordable/Useful Linked Data Archives • Intermezzo: Triple Pattern Fragments (TPF) • Intermezzo: Binary RDF Representation (HDT) • Devising Affordable/Useful Linked Data Archives • Second Generation DBpedia Archive • Try this At Home
  • 3. Herbert Van de Sompel & Miel Vander Sande CNI Spring Meeting, San Antonio, TX, April 5 2016 Outline • Prelude: Memento and Linked Data • First Generation DBpedia Archive • Devising Affordable/Useful Linked Data Archives • Intermezzo: Triple Pattern Fragments (TPF) • Intermezzo: Binary RDF Representation (HDT) • Devising Affordable/Useful Linked Data Archives • Second Generation DBpedia Archive • Try this At Home
  • 4. Memento Framework
  • 5. Memento LDOW 2010 Submission: Herbert Van de Sompel et al. (2010) An HTTP-Based Versioning Mechanism for Linked Data. http://arxiv.org/abs/1003.3661
  • 6. Memento and Linked Data
  • 7. Memento and Linked Data
  • 8. Time-Series Analysis across DBpedia Versions: data collected through “follow your nose” HTTP navigation
  • 10. First Generation DBpedia Archive: Storage
  • 11. First Generation DBpedia Archive: Storage Characteristics
      • upload software: custom
      • upload time: ~ 24 hours per version
      • storage software: MongoDB
      • storage space: 383 GB for 10 versions
      • DBpedia versions: 10 versions, 2.0 through 3.9
      • number of triples: ~ 3 billion
  • 12. First Generation DBpedia Archive: Subject-URI Access
  • 13. First Generation DBpedia Archive: Subject-URI Access. http://dbpedia.mementodepot.org/memento/2009052/http://dbpedia.org/page/Oaxaca
  • 14. First Generation DBpedia Archive: Subject-URI Access Characteristics
      • TimeGate software: custom
      • access type: Subject URI & datetime
      • external integration: current DBpedia
      • clients:
          • all clients: direct access to Memento Subject-URI
          • Memento clients: datetime negotiation with Subject-URI
  • 15. DBpedia Archive @ LANL Since 2010
      • Access based on Subject-URI (DBpedia Topic URI) only
      • MongoDB storage
      • A blob per Subject-URI per version
      • Dynamically transformed to other RDF serializations
      • No updates since version 3.9 (2013) of DBpedia as a result of scalability problems
  • 17. Affordable & Useful Linked Data Archives
      • A Linked Data Archive consists of temporal snapshots of one or more Linked Data sets, whereby each temporal snapshot reflects the state of a Linked Data set at a specific moment or interval in time.
      • How to make Linked Data Archives accessible in a manner that is
          • affordable/sustainable for the publisher
          • useful for the consumer
  • 18. Linked Data Archive: Characteristics
      • General Characteristics (Publisher / Consumer): Availability, Bandwidth, Cost
      • Functionality: Interface Expressiveness, LOD Integration, Memento Support, Cross Time/Data
      • Verdict
          • Publication perspective: $$$$
          • Access perspective: ++++
  • 19. Linked Data Publishing
      • The typical ways of publishing Linked Data on the Web:
          • Subject URI access
          • Data dump
          • SPARQL endpoint
      • Let’s consider these from the perspective of Linked Data Archives, i.e. archival storage and access
  • 20. Linked Data Archive with Subject-URI Access
      • For each temporal snapshot of a Linked Data set, and for each Subject in that snapshot, publish an RDF description (of the Subject) at a URI that is specific per snapshot/subject
  • 21. Linked Data Archive with Subject-URI Access: Characteristics
      • General Characteristics (Publisher / Consumer)
          • Availability: rather high / rather high
          • Bandwidth: ~ description / ~ description
          • Cost: rather low / rather high
      • Functionality
          • Interface Expressiveness: rather low
          • LOD Integration: yes
          • Memento Support: possible
          • Cross Time/Data: follow your nose
      • Verdict
          • Publication perspective: $$$$
          • Access perspective: ++++
  • 22. Linked Data Archive Using Dumps
      • Renders each temporal snapshot of a Linked Data set as a data dump that places all temporal dataset triples (as they were at a specific moment in time) into one or more files
  • 23. Linked Data Archive Using Dumps: Characteristics
      • General Characteristics (Publisher / Consumer)
          • Availability: high / high
          • Bandwidth: high / high
          • Cost: low / high
      • Functionality
          • Interface Expressiveness: download dataset
          • LOD Integration: no
          • Memento Support: not possible
          • Cross Time/Data: download various datasets
      • Verdict
          • Publication perspective: $$$$
          • Access perspective: ++++
  • 24. Linked Data Archive with SPARQL Endpoint(s)
      • For each temporal snapshot of a Linked Data set, supports arbitrary SPARQL queries.
      • Different architectural set-ups possible; no standard approach
  • 25. Linked Data Archive Using SPARQL Endpoint(s): Characteristics
      • General Characteristics (Publisher / Consumer)
          • Availability: problematic / problematic
          • Bandwidth: ~ query / ~ query
          • Cost: high / low
      • Functionality
          • Interface Expressiveness: highly expressive
          • LOD Integration: no
          • Memento Support: hard
          • Cross Time/Data: custom distributed queries
      • Verdict
          • Publication perspective: $$$$
          • Access perspective: ++++
  • 26. Affordable & Useful Linked Data Archives (Linked Data Archive Type: Publishing / Consuming)
      • Data Dump: $$$$ / ++++
      • SPARQL Endpoint(s): $$$$ / ++++
      • Subject URI Access: $$$$ / ++++
  • 28. Linked Data Fragments (Ghent U)
      • Every Linked Data interface offers specific fragments of a Linked Data set
      • A fragment is described by
          • Selector: what questions can I ask?
          • Controls: how do I get more fragments?
          • Metadata: helpful information for consumption?
      • Each interface type comes with tradeoffs
      • cf. the analysis thus far
      • http://linkeddatafragments.org
      • Verborgh, R. et al. (2014) Querying datasets on the web with high availability. ISWC 2014. http://ruben.verborgh.org/publications/verborgh_iswc_2014/
  • 29. Triple Pattern Fragments (Ghent U)
      • Triple Pattern Fragments is a new interface with a different set of tradeoffs that are attractive from an archival perspective
      • http://www.hydra-cg.com/spec/latest/triple-pattern-fragments/
  • 30. Triple Pattern Fragments (Ghent U)
      • Allows querying a Linked Data set according to ?Subject ?Predicate ?Object patterns
  • 31. Triple Pattern Fragments (Ghent U)
      • Controls: responses provide navigational help for clients
      • Based on emerging Hydra vocabulary for self-describing Hypermedia-Driven Web APIs
      • Metadata: dataset info, estimated count (to aid client applications)
      • http://www.hydra-cg.com/spec/latest/core/
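To make the ?Subject ?Predicate ?Object selector concrete, here is a minimal in-memory sketch of what a triple pattern fragment contains: one page of matching triples, a count, and a hint about whether another page exists. It is not the Linked Data Fragments server code. The function name `match_pattern`, the page size, and the `ex:` sample triples are all invented for illustration, and where a real interface returns an estimated count, this sketch simply counts exactly.

```python
from typing import NamedTuple, Optional, Sequence


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Invented sample data, not taken from DBpedia.
SAMPLE = [
    Triple("ex:Alice", "ex:birthPlace", "ex:Ghent"),
    Triple("ex:Bob", "ex:birthPlace", "ex:Ghent"),
    Triple("ex:Alice", "ex:name", "Alice"),
]


def match_pattern(
    triples: Sequence[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
    page: int = 1,
    page_size: int = 100,
) -> dict:
    """Return one page of triples matching the pattern; None acts as a wildcard."""
    matches = [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]
    start = (page - 1) * page_size
    return {
        "triples": matches[start:start + page_size],  # the data for this page
        "total": len(matches),                        # metadata: match count
        "has_next": start + page_size < len(matches), # control: is there more?
    }


# Who was born in Ghent, according to the sample data?
fragment = match_pattern(SAMPLE, predicate="ex:birthPlace", obj="ex:Ghent")
for triple in fragment["triples"]:
    print(triple.subject)
```

Keeping the server-side operation this simple is what moves query cost toward the client: anything more expressive has to be assembled from several such fragment requests.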
  • 33. Binary RDF Representation for Publication and Exchange (HDT): http://www.w3.org/Submission/HDT/
  • 34. Binary RDF Representation for Publication and Exchange (HDT)
      • Header-Dictionary-Triple (HDT) is a compact, binary representation of RDF datasets.
  • 35. Binary RDF Representation for Publication and Exchange (HDT)
      • Able to represent massive data sets
      • Dictionary/Triples structure achieves
          • rapid search for ?subject ?predicate ?object pattern
          • high compression rates
      • Header provides metadata about the dataset
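The Dictionary/Triples split can be illustrated with a toy encoder: each distinct term is stored once in a dictionary and triples become tuples of small integer IDs. This is only a sketch of the general idea, assuming a single shared dictionary and plain Python tuples. The actual HDT format is considerably more elaborate, and the helper names `encode` and `decode` and the `ex:` terms are invented here.

```python
from typing import Dict, List, Tuple

TermTriple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[TermTriple]) -> Tuple[Dict[str, int], List[IdTriple]]:
    """Assign each distinct term an integer ID and rewrite triples as ID tuples."""
    dictionary: Dict[str, int] = {}

    def term_id(term: str) -> int:
        # Reuse the existing ID, or hand out the next free one (IDs start at 1).
        return dictionary.setdefault(term, len(dictionary) + 1)

    encoded = [(term_id(s), term_id(p), term_id(o)) for s, p, o in triples]
    return dictionary, encoded


def decode(dictionary: Dict[str, int], encoded: List[IdTriple]) -> List[TermTriple]:
    """Invert the dictionary and turn ID tuples back into term triples."""
    terms = {term_id: term for term, term_id in dictionary.items()}
    return [(terms[s], terms[p], terms[o]) for s, p, o in encoded]


# Invented sample data: the long predicate string is stored once, not twice.
data = [
    ("ex:Alice", "ex:birthPlace", "ex:Ghent"),
    ("ex:Bob", "ex:birthPlace", "ex:Ghent"),
]
dictionary, encoded = encode(data)
print(len(dictionary), "dictionary entries for", len(data), "triples")
print(encoded)
```

Repeated URIs dominate most RDF dumps, so storing each term once is a plausible intuition for the compression the slide mentions. Matching a pattern then means comparing integers rather than strings.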
  • 37. HDT Linked Data Archive with TPF Support
      • For each temporal snapshot of a Linked Data set, generate an HDT serialization that provides access according to ?subject ?predicate ?object patterns
  • 38. Linked Data Archive with ?s?p?o Access: Characteristics
      • General Characteristics (Publisher / Consumer)
          • Availability: high / high
          • Bandwidth: ~ query / ~ query
          • Cost: low / medium
      • Functionality
          • Interface Expressiveness: better than subject-URI only
          • LOD Integration: yes
          • Memento Support: possible
          • Cross Time/Data: follow your nose
      • Verdict
          • Publication perspective: $$$$
          • Access perspective: ++++
  • 39. Affordable & Useful Linked Data Archives (Linked Data Archive Type: Publishing / Consuming)
      • Data Dump: $$$$ / ++++
      • SPARQL Endpoint(s): $$$$ / ++++
      • Subject URI Access: $$$$ / ++++
      • HDT & TPF: $$$$ / ++++
  • 41. Second Generation DBpedia Archive: Storage
  • 42. Second Generation DBpedia Archive: Storage Characteristics
      • upload software: HDT-CPP
      • upload time: ~ 4 hours per version
      • storage software: HDT binary files
      • storage space: 70 GB for 12 versions
      • DBpedia versions: 12 versions, 2.0 through 2015
      • number of triples: ~ 5 billion
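The storage figures of the two archive generations can be compared with some quick arithmetic. The sketch below uses only the numbers quoted on the two storage slides and assumes that the storage unit means gigabytes of 10^9 bytes. The triple counts are approximate on the slides, so the results are rough estimates.

```python
GB = 10**9  # assumption: the slides' storage unit is gigabytes of 10^9 bytes

# Figures as quoted on the first and second generation storage slides.
archives = {
    "first generation (MongoDB)": {"storage_gb": 383, "versions": 10, "triples": 3e9},
    "second generation (HDT)": {"storage_gb": 70, "versions": 12, "triples": 5e9},
}


def summarize(archive: dict) -> tuple:
    """Return (GB per DBpedia version, bytes per triple) for one archive."""
    gb_per_version = archive["storage_gb"] / archive["versions"]
    bytes_per_triple = archive["storage_gb"] * GB / archive["triples"]
    return gb_per_version, bytes_per_triple


for name, archive in archives.items():
    gb_per_version, bytes_per_triple = summarize(archive)
    print(f"{name}: {gb_per_version:.1f} GB per version, "
          f"{bytes_per_triple:.0f} bytes per triple")
```

Under these assumptions the HDT archive needs roughly a ninth of the bytes per triple while holding two more versions.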
  • 43. Second Generation DBpedia Archive: ?s?p?o Query-URI Access
  • 44. Second Generation DBpedia Archive: ?s?p?o Query-URI Access. http://fragments.mementodepot.org/dbpedia_3_8?subject=&predicate=http://dbpedia.org/ontology/birthPlace&object=http://dbpedia.org/resource/Ghent
  • 45. Second Generation DBpedia Archive: ?s?p?o Query-URI Access
      • TimeGate URI
          • http://fragments.mementodepot.org/timegate/dbpedia?subject={DBpediaURI}&predicate={DBpediaURI}&object={DBpediaURI}
          • http://fragments.mementodepot.org/timegate/dbpedia?subject=&predicate=&object=http://dbpedia.org/resource/Ghent
      • TimeMap URI: not supported
      • Memento URI
          • http://fragments.mementodepot.org/{DBpediaVersion}?subject={DBpediaURI}&predicate={DBpediaURI}&object={DBpediaURI}
          • http://fragments.mementodepot.org/dbpedia_3_0?subject=&predicate=&object=http://dbpedia.org/resource/Ghent
      • Further info: http://mementoweb.org/depot/native/fragments/
      • Try it with Memento for Chrome: http://bit.ly/memento-for-chrome
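The ?s?p?o Query-URI templates above can be filled in programmatically. The following sketch builds such a URI with the Python standard library. The helper name `tpf_url` is invented, the base URLs are the ones on the slide with the link-wrapper prefix dropped (an assumption about the real host), and the sketch percent-encodes the parameter values, which the slide's examples do not show.

```python
from urllib.parse import urlencode


def tpf_url(base: str, subject: str = "", predicate: str = "", obj: str = "") -> str:
    """Build a ?s?p?o query URI; an empty string leaves that position unbound."""
    query = urlencode({"subject": subject, "predicate": predicate, "object": obj})
    return f"{base}?{query}"


# Assumed base URLs, read off the slide (host shown without the wrapper prefix).
MEMENTO_BASE = "http://fragments.mementodepot.org/dbpedia_3_8"
TIMEGATE_BASE = "http://fragments.mementodepot.org/timegate/dbpedia"

# Everything with birthPlace Ghent in one specific DBpedia version.
print(tpf_url(
    MEMENTO_BASE,
    predicate="http://dbpedia.org/ontology/birthPlace",
    obj="http://dbpedia.org/resource/Ghent",
))

# The same kind of pattern addressed to the TimeGate for datetime negotiation.
print(tpf_url(TIMEGATE_BASE, obj="http://dbpedia.org/resource/Ghent"))
```

The Memento URI names a version in its path, while the TimeGate URI leaves the version open and lets the client's preferred datetime decide.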
  • 46. Second Generation DBpedia Archive: Subject-URI Access
  • 47. Second Generation DBpedia Archive: Subject-URI Access
      • TimeGate URI
          • http://dbpedia.mementodepot.org/timegate/{DBpediaURI}
          • http://dbpedia.mementodepot.org/timegate/http://dbpedia.org/data/Ghent
      • TimeMap URI
          • http://dbpedia.mementodepot.org/timemap/link/{DBpediaURI}
          • http://dbpedia.mementodepot.org/timemap/link/http://dbpedia.org/data/Ghent
      • Memento URI
          • http://dbpedia.mementodepot.org/{yyyymmdd}/{DBpediaURI}
          • http://dbpedia.mementodepot.org/20080103/http://dbpedia.org/data/Ghent
      • Further info: http://mementoweb.org/depot/native/dbpedia/
      • Try it with Memento for Chrome: http://bit.ly/memento-for-chrome
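For Subject-URI access, a Memento client sends its preferred datetime to the TimeGate in an Accept-Datetime request header, as defined by the Memento protocol (RFC 7089). The sketch below only prepares the request pieces and makes no network call. The helper names are invented, and the depot host is the one on the slide with the link-wrapper prefix dropped, which is an assumption.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed depot host, read off the slide (shown without the wrapper prefix).
DEPOT = "http://dbpedia.mementodepot.org"


def timegate_request(resource_uri: str, when: datetime) -> tuple:
    """Return (TimeGate URI, headers) for datetime negotiation on resource_uri."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Accept-Datetime carries an HTTP-date, which is always expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return f"{DEPOT}/timegate/{resource_uri}", {"Accept-Datetime": http_date}


def memento_uri(resource_uri: str, when: datetime) -> str:
    """Fill in the {yyyymmdd}/{DBpediaURI} Memento URI template."""
    return f"{DEPOT}/{when:%Y%m%d}/{resource_uri}"


moment = datetime(2008, 1, 3, tzinfo=timezone.utc)
uri, headers = timegate_request("http://dbpedia.org/data/Ghent", moment)
print(uri)
print(headers)
print(memento_uri("http://dbpedia.org/data/Ghent", moment))
```

A client would issue a GET on the TimeGate URI with these headers and follow the redirect to the selected Memento. The direct Memento URI skips negotiation when the snapshot date is already known.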
  • 48. Second Generation DBpedia Archive: Access Characteristics
      • TimeGate software: ① node.js LDF server 2.0.0; ② LDF js client
      • access type: ① ?s?p?o Query-URI & datetime; ② Subject-URI & datetime
      • external integration: ① DBpedia LDF server; ② current DBpedia
      • clients:
          • all clients: direct access to Mementos of Subject-URI and ?s?p?o Query-URI
          • Memento clients: datetime negotiation with Subject-URI and ?s?p?o Query-URI
  • 50. Building a Linked Data Archive
      • Convert the archival data set(s) to HDT using HDT-CPP
  • 51. HDT Software (C++): https://github.com/rdfhdt/hdt-cpp
      • input data requires cleaning before processing, especially regarding URI characters
      • DBpedia data not clean
      • DBpedia v3.5 was not successfully processed
      • No meaningful error messages to help locate problems
      • memory intensive
      • Kyoto Cabinet was used to optimize storage requirement and speed during processing
      • Java version exists but has memory problems
  • 52. Building a Linked Data Archive
      • Convert the archival data set(s) to HDT using HDT-CPP
      • Download the Triple Fragment Server code
  • 53. Linked Data Fragment Server (Node.js): https://github.com/LinkedDataFragments/Server.js
      • provides ?s?p?o access to local and/or remote Linked Data sets
      • supports HDT, Turtle files, N-Triples files, JSON-LD files, SPARQL endpoints, in-memory store, and BlazeGraph Linked Data sets
      • version 2.0.0 (released March 31 2016) has built-in Memento support
  • 54. Building a Linked Data Archive
      • Convert the archival data set(s) to HDT using HDT-CPP
      • Download the Triple Fragment Server code
      • Create the JSON config file for Memento
  • 55. Linked Data Fragment Server, Memento Configuration: https://github.com/LinkedDataFragments/Server.js/wiki/Configuring-Memento
      • declare archival data set(s)
      • add datetime ranges for the archival data set(s)
      • add a TimeGate
      • list the archival data set(s) for which the TimeGate should support datetime negotiation
  • 56. Building a Linked Data Archive
      • Convert the archival data set(s) to HDT using HDT-CPP
      • Download the Triple Fragment Server code
      • Create the JSON config file for Memento
      • Run the server
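The datetime ranges declared in the configuration are what allow a TimeGate to map a requested datetime onto one archived data set. The sketch below shows one plausible selection rule and is not the LDF server's implementation or its configuration format. The version names, the intervals, and the fallback for datetimes outside every range are all hypothetical.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple

UTC = timezone.utc


class Version(NamedTuple):
    name: str
    start: datetime  # inclusive start of the interval the snapshot represents
    end: datetime    # exclusive end of that interval


# Hypothetical data sets and datetime ranges, for illustration only.
VERSIONS = [
    Version("v1", datetime(2010, 1, 1, tzinfo=UTC), datetime(2011, 1, 1, tzinfo=UTC)),
    Version("v2", datetime(2011, 1, 1, tzinfo=UTC), datetime(2012, 1, 1, tzinfo=UTC)),
    Version("v3", datetime(2012, 1, 1, tzinfo=UTC), datetime(2013, 1, 1, tzinfo=UTC)),
]


def select_version(versions: List[Version], requested: datetime) -> str:
    """Pick the data set whose datetime range covers the requested datetime."""
    if not versions:
        raise ValueError("no archival data sets declared")
    ordered = sorted(versions, key=lambda v: v.start)
    for version in ordered:
        if version.start <= requested < version.end:
            return version.name
    # Assumed fallback: outside every range, use the version that starts closest.
    return min(ordered, key=lambda v: abs(v.start - requested)).name


print(select_version(VERSIONS, datetime(2011, 6, 1, tzinfo=UTC)))
```

Because each snapshot is a separate data set with its own range, adding a new DBpedia version amounts to converting it to HDT and declaring one more entry.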