State of the Art
SUMMARY OF D3.1 STATE OF THE ART
D GIARETTA
Outline
Preservation – State of the Art
Challenges for Linked Data
Options
Conclusions
EC policy – a brief
history – a personal view
EC support for
• DP research
• for creating digital objects
• Data
• Digitisation
• e-Infrastructure
• to
• Digital Agenda
National funding
• Significantly more than EC funding
• What is the EC role?
DP research: approx
100M€ from EC
From Research on Digital Preservation within projects co-funded by the European
Union in the ICT programme, 2011, Stephan Strodl et al
http://cordis.europa.eu/fp7/ict/creativity/report-research-digital-preservation_en.pdf
Situation now
The digital preservation community has failed to persuade the EC
that there is a need for more funding for DP research
◦We do not have a consistent story about:
◦ Costs
◦ Rights
◦ Methods etc
◦ “Emulate or Migrate” inadequate!
◦ Who is doing it right
Luxembourg unit which previously funded DP research – name
changed to “Creativity” - now shows no funding for digital
preservation research
EC expects results from the previous 100 M € research by deploying
solutions
Digital Preservation –
some quotes:
Head of unit funding the Digital Preservation
projects asked repeatedly:
◦“Who pays and why?”
NSF colleague:
◦“Digital preservation is like VAT – people don’t
like it”
Value pyramid
From Riding the Wave
“The Digital Agenda for Europe outlines policies and actions
to maximise the benefit of the digital revolution for all.
Supporting research and innovation is a key priority of the
Agenda, essential if we want to establish a flourishing digital
economy.”
Neelie Kroes,
Vice-President of the EC, responsible for the Digital Agenda
Data is the new gold.
“We have a huge goldmine… Let’s start mining it.”
Neelie Kroes
That is the magic to find value amid the mass of data. The right infrastructure, the
right networks, the right computing capacity and, last but not least, the right
analysis methods and algorithms help us break through the mountains of rock to
find the gold within.
……but
Gold is precious because
◦it is rare
◦it does not combine with other elements
◦it does not perish
……..but……….
Data is valuable because
◦there is so much of it
◦it is more valuable when it is combined together
◦BUT it is far from imperishable
Role for
Linked Data
OR
Preservation – State of the Art
Problems when
preserving data
Preserve?
Preserve what?
For how long?
How to test?
Which people?
Which organisations?
How well?
• Metadata? – What kind? How much?
Difficulties in digital
preservation
Many different terminologies
Many different views of preservation
Many different kinds of digital objects
◦ Documents
◦ Data
◦ …… and new types of objects
Tools and Services
◦ Which ones work for which digital objects?
◦ Which tools/techniques fit together?
◦ How to integrate new tools
Consistent training needed
Risks vs Cost
Who can you trust?
Need a consistent, coherent approach to digital preservation - APARSEN.
Need an Audit and Certification
system – ISO 16363
OAIS – ISO 14721
Preservation techniques
For each technique
look for evidence – what
evidence?
must at least make sure we
consider different types of data
◦rendered vs non-rendered
◦composite vs simple
◦dynamic vs static
◦active vs passive
must look at all types of threats
Basic preservation
activities
Libraries say:
“Emulate or migrate”
◦ Works well with data only in special cases
◦ Can repeat what was done before instead of new things
◦ Does not help with building cross-disciplinary communities
• Can repeat what has been
done before
BUT
• Cannot use new applications
• Convert to format which
new software can use
BUT
• What if there are many
software systems?
Contains numbers – need meaning
...to be combined and processed to get this
[Figure: data levels Level 0, Level 1 and Level 2, with steps labelled "Processing" and "Processing/combining"]
...or this
OAIS Information model:
Representation Information
The Information Model is key
Recursion ends at
KNOWLEDGEBASE of
the DESIGNATED
COMMUNITY
(this knowledge will
change over time
and region)
Does not demand that
ALL Representation
Information be
collected at once.
A process which can
be tested
Rep Info Network
[Figure nodes: FITS FILE, FITS DICTIONARY, FITS STANDARD, PDF SOFTWARE, JAVA VM, PDF STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION]
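The idea that recursion ends at the Knowledge Base of the Designated Community can be sketched as a graph walk. This is a minimal Python illustration, not an OAIS or APARSEN implementation. The node names come from the Rep Info Network slide, but the edges between them are assumed (the extracted slide text does not preserve the arrows), and `needed_rep_info` and the `astronomers` knowledge base are hypothetical.

```python
# Assumed edges: each item maps to the Representation Information
# needed to understand it. Only the node names come from the slide.
REP_INFO_NETWORK = {
    "FITS FILE": ["FITS STANDARD", "FITS DICTIONARY", "FITS JAVA SOFTWARE"],
    "FITS STANDARD": ["PDF STANDARD"],
    "PDF STANDARD": ["PDF SOFTWARE"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
    "XML SPECIFICATION": ["UNICODE SPECIFICATION"],
    "FITS JAVA SOFTWARE": ["JAVA VM"],
}


def needed_rep_info(item, network, knowledge_base):
    """List the Rep Info still to be collected for `item`.

    The walk stops at anything already in the knowledge base of the
    Designated Community, which is where the recursion ends.
    """
    needed, seen = [], set()

    def visit(node):
        for dep in network.get(node, []):
            if dep in knowledge_base or dep in seen:
                continue
            seen.add(dep)
            needed.append(dep)
            visit(dep)

    visit(item)
    return needed


# Hypothetical community that already understands these items.
astronomers = {"FITS STANDARD", "FITS DICTIONARY", "JAVA VM"}
print(needed_rep_info("FITS FILE", REP_INFO_NETWORK, astronomers))
```

Because the knowledge base is an input, the same walk can be re-run when the community's knowledge changes, which matches the point that not all Representation Information has to be collected at once.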
Additional technique:
add Representation Information
Descriptions of the digitally encoded
object
Ideal description allows a machine to
extract information
Migration
OAIS defines various types of Migration:
◦Do not change the bits
◦Refresh
◦Replicate
◦Change the packaging but not the content
◦Repackage
◦Change the content
◦Transform (usually non-reversible)
◦Need to consider “Transformational Information Properties” – important for
AUTHENTICITY
◦Related to “Significant properties”
◦Add appropriate Representation Information for the new format
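The grouping by "what changes" can be sketched with checksums. This is a rough illustration only: splitting an object into separate `content` and `packaging` byte strings is an assumption, and the names `Migration` and `classify` are hypothetical. Refresh and Replicate are lumped together because neither changes the bits, so they cannot be told apart from digests alone.

```python
import hashlib
from enum import Enum


class Migration(Enum):
    REFRESH_OR_REPLICATE = "bits unchanged"
    REPACKAGE = "packaging changed, content unchanged"
    TRANSFORM = "content changed"


def sha256(data: bytes) -> str:
    """Hex digest used as a simple fixity check."""
    return hashlib.sha256(data).hexdigest()


def classify(old_content: bytes, old_packaging: bytes,
             new_content: bytes, new_packaging: bytes) -> Migration:
    """Classify a migration by comparing digests before and after."""
    if sha256(old_content) != sha256(new_content):
        return Migration.TRANSFORM
    if sha256(old_packaging) != sha256(new_packaging):
        return Migration.REPACKAGE
    return Migration.REFRESH_OR_REPLICATE


# Same numbers, different container: a repackaging.
print(classify(b"1 2 3", b"tar", b"1 2 3", b"zip").name)
```

A Transform result is the case where Transformational Information Properties and new Representation Information need attention, since a digest can no longer demonstrate that nothing was lost.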
AND – be prepared to
Hand-over
Preservation requires funding
Funding for a dataset (or a repository) may stop
Need to be ready to hand over everything needed
for preservation
◦OAIS (ISO 14721) defines the “Archival Information Package” (AIP).
◦Issues:
◦ Storage naming conventions
◦ Representation Information
◦ Provenance
◦ ….
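A hand-over readiness check could look roughly like the sketch below. It is not the OAIS AIP structure: the class `HandOverPackage` and its three fields are hypothetical and simply mirror the issues listed on this slide, and the file name is made up.

```python
from dataclasses import dataclass, field


@dataclass
class HandOverPackage:
    # Hypothetical fields, one per issue named on the slide.
    content_files: list = field(default_factory=list)        # storage names of the data
    representation_info: list = field(default_factory=list)
    provenance: list = field(default_factory=list)

    def missing(self):
        """Return the names of the parts that are still empty."""
        return [name for name, value in vars(self).items() if not value]


pkg = HandOverPackage(content_files=["obs_001.fits"])
print(pkg.missing())
```

Running such a check long before funding stops is the point of "be prepared": the gaps are cheaper to fill while the people who created the data are still available.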
Preserving digitally
encoded information
Ensure that digitally encoded information
is understandable and usable over the long
term
• Long term could start at just a few years
• Chain of preservation
Need to do something because things
become “unfamiliar” over time
But the same techniques enable use of data
which is “unfamiliar” right now
When things change
We need to:
◦Know something has changed
◦Identify the implications of that change
◦Decide on the best course of action for preservation
◦Work out what RepInfo we need to fill the gaps
◦ Created by someone else, or newly created
◦If transformed: how to maintain data authenticity
◦Alternatively: hand it over to another repository
◦Make sure data continues to be usable
[Diagram: Orchestration Service, Gap Identification Service, Preservation Strategy Toolkit, RepInfo Registry Service, Authenticity Toolkit, Packaging Toolkit, Data Virtualisation Toolkit, Process Virtualisation Toolkit, RepInfo Toolkit]
SCIDIP-ES
[Diagram of services and toolkits]
Services (run on remote servers): Storage Service, Gap Identification Service, Orchestration Service, RepInfo Registry Service, Persistent ID i/f Service
Toolkits (run on local machines): Preservation Strategy Toolkit, Data Virtualisation Toolkit, Process Virtualisation Toolkit, Authenticity Toolkit, Packaging Toolkit, RepInfo Toolkit, Finding Aid Toolkit, Certification Toolkit
External: Cloud Storage, External Access/Use Services, External PI services, ISO Certification Organisation
• These SUPPLEMENT what repositories do (customised for repositories)
• Make it easier for repositories to do preservation – share the effort
D.3.1: State of the Art - Linked Data and Digital Preservation
Preservation objectives
The same digital object may be
preserved with different aims in mind
by different repositories:
For a digital document
Re-print the pages?
To understand the numbers printed in the page to
do further research
For a piece of performance art
Replay a recording of a particular performance?
Re-perform the work?
For a scientific data file
Understand the numbers?
Understand the numbers in the context of a
particular theory?
Preservation, Value and
Re-use
(re-)usability the essential test for success of preservation
◦ Usability usually essential for justifying cost of preservation
Impossible to insist on common formats, semantics or software
◦ How to avoid the N² problem?
Impossible to know what formats, semantics or software will be used in future
Needs appropriate Representation Information
◦ for preservation (use in the future when things have become unfamiliar)
◦ for use now (use of unfamiliar data i.e. most of it!)
◦ automated (re-)use as far as possible
APARSEN is bringing together a coherent, consistent, evidence-based approach to
digital preservation involving tools, services, consultancy and training.
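The N² problem mentioned above is a matter of simple counting. In this sketch the function names are hypothetical, and the comparison assumes one reading of the slide: that a single piece of Representation Information per format can stand in for a dedicated converter between every pair of formats.

```python
def pairwise_converters(n: int) -> int:
    """Directed converters needed if every format converts to every other."""
    return n * (n - 1)


def per_format_descriptions(n: int) -> int:
    """Descriptions needed if each format is described once (assumed model)."""
    return n


for n in (5, 50, 500):
    print(n, pairwise_converters(n), per_format_descriptions(n))
```

The first count grows quadratically and the second linearly, which is why insisting on converters between all formats does not scale as new formats keep appearing.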
Classification of objects
must at least make sure we
consider different types of data
◦rendered vs non-rendered
◦composite vs simple
◦dynamic vs static
◦Active vs passive
RDF Triple: dynamic/complex/non-rendered/passive
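The four axes can be written down as a small record. This is only a sketch: the class `ObjectClass` is hypothetical, and it treats the word "complex" used for the RDF triple as the "composite" end of the composite vs simple axis, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    rendered: bool    # rendered vs non-rendered
    composite: bool   # composite vs simple
    dynamic: bool     # dynamic vs static
    active: bool      # active vs passive

    def label(self) -> str:
        """Spell out the position on each of the four axes."""
        return "/".join([
            "rendered" if self.rendered else "non-rendered",
            "composite" if self.composite else "simple",
            "dynamic" if self.dynamic else "static",
            "active" if self.active else "passive",
        ])


# Classification of an RDF triple as given on the slide.
rdf_triple = ObjectClass(rendered=False, composite=True, dynamic=True, active=False)
print(rdf_triple.label())
```

Recording the classification explicitly makes it easier to check that a preservation technique has been tried against each kind of object rather than only against rendered, static documents.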
Key questions about
what is to be preserved
What is the object to be preserved?
The specific piece of RDF?
The specific RDF plus data pointed to?
The underlying database (if any)?
The whole linked “world”?
What are the preservation objectives?
The RDF and whole inference system?
Just the RDF?
Just the underlying database (if any)?
Key questions about
RDF
What Representation information is needed for the LD?
Schema?
Additional semantics?
Evolution of links (e.g. replace this host by a new one)?
Snapshots?
What Transformation?
One version of RDF to another?
Move to replacement for RDF?
Change of underlying database?
Authenticity??
Who to hand over to
What to do with the URIs? – maintain or change?
What to do with the underlying database (if any)?
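The "replace this host by a new one" case can be sketched with the standard library alone. This is an illustration under stated assumptions: triples are held as plain tuples of strings rather than parsed with an RDF library, the hosts `old.example.org` and `archive.example.org` are placeholders, and the helper names are hypothetical. Whether URIs should be rewritten at all, or kept and redirected, is exactly the open question on this slide.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_host(term: str, old_host: str, new_host: str) -> str:
    """Rewrite the host of a URI; leave literals and other hosts untouched."""
    parts = urlsplit(term)
    if parts.netloc != old_host:
        return term
    return urlunsplit(parts._replace(netloc=new_host))


def migrate_triples(triples, old_host, new_host):
    """Apply the host replacement to subject, predicate and object."""
    return [
        tuple(replace_host(term, old_host, new_host) for term in triple)
        for triple in triples
    ]


triples = [
    ("http://old.example.org/dataset/1",
     "http://purl.org/dc/terms/title",
     "Census 1899"),
]
print(migrate_triples(triples, "old.example.org", "archive.example.org"))
```

Keeping the original triples alongside the rewritten ones gives a simple record of the change, which bears on the authenticity question raised above.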
Key questions about the
things the RDF points to
Will they be preserved?
How to find the Representation
Information?
Will the Persistent Identifiers change?
Joint Key Questions
Who will pay, and why?
For which things?
Are some things more valuable – and therefore
more likely to be preserved?
What happens when some things disappear?
Options
Be clear about what is meant
Understand what is possible
Start with what is agreed as valuable
Don’t promise too much
Input to standards
See http://www.iso16363.org
Audit and Certification of Trustworthy
repositories
Forum: OAIS Futures
Conclusions
A great deal of funding (€100M) has been
invested in digital preservation research by the EU
EC is not putting further funding into digital
preservation research
There are technical challenges
The biggest challenge is to be clear about what
the preservation aims are for Linked Data


Editor's Notes

  • #15: Image, document: Rendered/Static/Simple. Dynamic database with stored procedures: Non-rendered/Dynamic/Complex. Scientific dataset: Non-rendered/Static/Complex.
  • #16: Data is migrated – big job but is done sometimes. Emulation is sometimes used but mainly for repeating processing for some specific reason. More generally users do not want to simply repeat what has been done before.
  • #20: Just to be clear – I am focussing on the OAIS Information Model
  • #23: Divide Migration into 3 groups depending on what changes. Refresh – replace media like for like; Replicate – maybe new media; Repackage – e.g. copy from tape to disk; Transform – e.g. change from Word to PDF. The “migrate” in “emulate or migrate” is the third group – Transform.
  • #32: Image, document: Rendered/Static/Simple. Dynamic database with stored procedures: Non-rendered/Dynamic/Complex. Scientific dataset: Non-rendered/Static/Complex.