A brief introduction to digital preservation
Michael Day, Research and Development Team Leader
UKOLN, University of Bath
MSc Lecture, UWE, Bristol, 10 March 2010

Presentation outline
- Digital preservation basics
- Digital preservation challenges
- The OAIS Reference Model
- Digital preservation principles and strategies
- Digital preservation tools
- Case studies (if time):
  - E-mail
  - Websites
- Exercise

Digital preservation challenges (1)
- Technical challenges
  - Digital media
    - Currently magnetic or optical tape and disks, some devices (e.g., memory sticks)
    - Uncertain lifetimes
  - Hardware and software dependence
    - Most digital objects are dependent on particular configurations of hardware and software
    - Relatively short obsolescence cycles

Digital preservation challenges (2)
- Conceptual challenges: three levels of information required:
  - Physical layer – usually a bitstream
  - Logical layer – defines how to interpret the bitstream (through software) to generate meaningful information (e.g. ASCII, XML, file formats)
  - Conceptual layer – real-world objects
    - Some are analogues of traditional objects, e.g. meeting minutes, research papers
    - Others are not, e.g. Web pages, GIS, 3D models of chemical structures
    - Complex and dynamic

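The gap between the physical and logical layers can be illustrated in a few lines of Python. This is a minimal sketch, not part of the original lecture: the byte values and the two encodings (`latin-1` and `cp437`) are assumptions chosen purely for illustration.

```python
# The same physical bitstream read under two different logical interpretations.
bitstream = bytes([0x63, 0x61, 0x66, 0xE9])  # four bytes on the physical layer

# Logical layer, interpretation 1: ISO 8859-1 (Latin-1) text.
as_latin1 = bitstream.decode("latin-1")

# Logical layer, interpretation 2: the older DOS code page 437.
as_cp437 = bitstream.decode("cp437")

# The bits are identical, but the conceptual object (the word a reader sees)
# differs because the last byte maps to a different character.
print(as_latin1)
print(as_cp437)
print(as_latin1 == as_cp437)
```

Without a record of which interpretation applies (in OAIS terms, Representation Information), the bitstream alone does not say which rendering is the intended one.
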
Digital preservation challenges (3)
- On which of the three layers should preservation activities focus?
  - We need to preserve the ability to reproduce the objects, not just the bits
  - In fact, we can change the bits and logical representation and still reproduce an ‘authentic’ conceptual object (e.g. by converting a text file into PDF or TIFF)
- Authenticity and integrity
  - How can we trust that an object is what it claims to be?
  - Digital information can easily be changed by accident or design

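One common way to approach the integrity question is to record a checksum when an object is received and to recompute it later. The sketch below assumes SHA-256 and uses hypothetical helper names (`fixity`, `verify`). It detects that the bits have changed, but it does not by itself establish authenticity, and a deliberate format migration would change the checksum too.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 digest of the byte-stream (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Check that the byte-stream still matches the digest recorded earlier."""
    return fixity(data) == expected


original = "Minutes of the meeting".encode("utf-8")
recorded = fixity(original)  # stored alongside the object at ingest

# A single changed character, whether by accident or by design.
altered = original.replace(b"meeting", b"meetinG")

print(verify(original, recorded))
print(verify(altered, recorded))
```
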
Digital preservation basics
- An ongoing approach to managing digital content based on:
  - The identification and adoption of appropriate preservation strategies
    - Creation or Ingest stages are normally the best time to ensure that data are fit-for-purpose and “preservable”
  - The collection and management of appropriate metadata
    - Capture of explicit and implicit knowledge, contexts
  - The ongoing monitoring of technical contexts and the application of preservation planning techniques
  - Continual monitoring of the organisation (audit)

OAIS Reference Model (1)
- Reference Model for an Open Archival Information System (OAIS)
  - ISO 14721:2003, Space data and information transfer systems -- Open archival information system -- Reference model
- Defines:
  - Common vocabulary (definitions of key concepts)
  - Information model (information packages, metadata, etc.)
  - Functional model (six functional entities)
  - Mandatory responsibilities

OAIS Reference Model (2)
- OAIS Mandatory Responsibilities:
  - Negotiating and accepting information
  - Obtaining sufficient control of the information to ensure long-term preservation
  - Determining the "designated community"
  - Ensuring that information is independently understandable, i.e. can be (re)used without the assistance of those who produced it
  - Following documented policies and procedures
  - Making the preserved information available

OAIS Reference Model (3)
[Figure: OAIS Functional Entities (Figure 4-1). Functional entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access. External entities: Producer, Management, Consumer. Flows shown: SIP, AIP, DIP, descriptive info., queries, result sets, orders.]

OAIS Reference Model (4)
- OAIS Information Model:
  - Defines the “Information Packages” required
    - Ingest (Submission Information Package)
    - Storage (Archival Information Package)
    - Access (Dissemination Information Package)
  - General principle of Information Packages:
    - All objects are wrapped in multiple layers of metadata (Representation Information, Descriptive Information, Packaging, etc.)

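The idea of content wrapped in layers of metadata can be sketched as a small data structure. Everything below is illustrative: OAIS is a conceptual model and prescribes no class names or fields, so `InformationPackage`, `ingest` and `disseminate` are hypothetical, and adding a SHA-256 value at ingest is just one assumed example of Preservation Description Information.

```python
import hashlib
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class InformationPackage:
    """Hypothetical wrapper: content plus layers of metadata."""
    kind: str  # "SIP", "AIP" or "DIP"
    content: bytes
    representation_info: dict = field(default_factory=dict)
    descriptive_info: dict = field(default_factory=dict)
    preservation_description_info: dict = field(default_factory=dict)


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a submission package into an archival package, adding fixity."""
    pdi = dict(sip.preservation_description_info)
    pdi["fixity_sha256"] = hashlib.sha256(sip.content).hexdigest()
    return replace(sip, kind="AIP", preservation_description_info=pdi)


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Derive a dissemination package for a consumer from an archival package."""
    return replace(aip, kind="DIP")


sip = InformationPackage(
    kind="SIP",
    content=b"Minutes of the meeting",
    representation_info={"format": "text/plain", "encoding": "utf-8"},
    descriptive_info={"title": "Meeting minutes"},
)
aip = ingest(sip)
dip = disseminate(aip)
print(sip.kind, aip.kind, dip.kind)
```
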
OAIS Reference Model (5)
- Implementation fundamentals:
  - OAIS is a reference model (a conceptual framework), NOT a blueprint for system design
  - It informs the design of system architectures, the development of systems and components
  - It provides common definitions of terms … a common language, a means of making comparison
  - But it does NOT ensure consistency or interoperability between implementations
  - Conformance only relates to mandatory responsibilities and following the information model

The DCC Lifecycle Model
- Digital Curation:
  - “… The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and re-use” (Lord & MacDonald, 2003)
- DCC Digital Curation Lifecycle Model:
  - Focused on the entire lifecycle of objects (influenced by records management and archives thinking) from creation, through appraisal, ingest, storage, to access and reuse
  - Preservation activities at core of model …

 
Digital preservation principles (1)
- Most of the technical problems associated with long-term digital preservation can be solved if a life-cycle management approach is adopted
  - i.e. a continual programme of active management
  - Ideally, combines both managerial and technical processes, e.g., as in the OAIS Reference Model
  - Many current preservation systems are attempting to support this approach
- Digital preservation strategies need to be seen in this wider context
- Wherever possible, also retain the original byte-stream

Digital preservation principles (2)
- Preservation needs to be considered at a very early stage in an object's life-cycle
- There is a need to identify 'significant properties'
  - Recognises that preservation is context dependent, even user specific (concept of 'designated community')
  - “Performance” model (National Archives of Australia)
  - Helps with choosing an acceptable preservation strategy
- Encapsulation
  - Surrounding the digital object - at least in theory - with all of the information needed to decode and understand it (including software)

Digital preservation principles (3)
- Metadata and documentation are vitally important
  - Relates to OAIS Information Model concepts like Representation Information and Preservation Description Information
  - Functions:
    - Records meaning
    - Records the context
    - Enables the development of finding aids
- Specific standards are being developed that support digital preservation activities (e.g., the PREMIS Data Dictionary)

Digital preservation strategies
- Technology preservation
  - Maintaining technology
  - Computer museums, digital archaeology
- Emulation
  - Running original bit-streams and application software on emulator programs that mimic the behaviour of obsolete hardware and operating systems
- Migration
  - Periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one

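A very small migration can be sketched as follows, combined with the earlier principle of retaining the original byte-stream. The scenario (re-encoding Latin-1 text as UTF-8) and the names `migrate_text` and the `event` fields are assumptions made for illustration; real migrations between file formats are considerably more involved.

```python
import hashlib
from datetime import datetime, timezone


def migrate_text(original: bytes, source_encoding: str = "latin-1") -> dict:
    """Re-encode text as UTF-8, keeping the original bytes and an event record."""
    migrated = original.decode(source_encoding).encode("utf-8")
    return {
        "original": original,  # the original byte-stream is retained untouched
        "migrated": migrated,
        "event": {
            "type": "migration",
            "source_encoding": source_encoding,
            "target_encoding": "utf-8",
            "original_sha256": hashlib.sha256(original).hexdigest(),
            "migrated_sha256": hashlib.sha256(migrated).hexdigest(),
            "when": datetime.now(timezone.utc).isoformat(),
        },
    }


record = migrate_text(b"caf\xe9")

# The bits differ, but the text a reader sees is the same.
same_text = record["original"].decode("latin-1") == record["migrated"].decode("utf-8")
print(record["original"] != record["migrated"], same_text)
```
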
Choosing a strategy (1)
- Preservation strategies are not in competition
  - Different strategies can work together; there may be value in diversification
  - Migration strategies mean difficult choices need to be made about target formats
- But the strategy chosen has implications for:
  - The technical infrastructure required (and metadata)
  - Collection management priorities
  - Rights management
    - Owning the rights to re-engineer software
  - Costs

Choosing a strategy (2)
- Plato preservation planning tool (EU Planets project)
  - A decision support tool that helps users evaluate potential preservation solutions against specific requirements and build a plan for preserving a given set of objects
  - Integrates file format identification (using DROID); some migration services; XML-based generic format characterisation using XCL (eXtensible Characterisation Languages)
  - http://www.ifs.tuwien.ac.at/dp/plato/intro.html

Preservation support on ingest
- Formats can be identified and validated on ingest or deposit into a repository
  - JHOVE (JSTOR/Harvard Object Validation Environment)
  - PRONOM, DROID (The National Archives)
- Metadata
  - Some tools exist for the automatic capture of metadata
- Standardisation on ingest
  - Received wisdom suggests the adoption of open or non-proprietary standards, e.g. databases structured in XML, uncompressed images, 'preservation friendly' standards like PDF/A

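The basic idea of identifying a format from its leading bytes can be sketched as below. This is a toy illustration only: it does not call DROID, PRONOM or JHOVE, the `SIGNATURES` table and `identify` function are hypothetical, and a match on the first few bytes says nothing about whether the file is valid.

```python
from typing import Optional

# A handful of well-known leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> Optional[str]:
    """Return a format label if the data starts with a known signature."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None  # unknown: flag for manual inspection on ingest


print(identify(b"%PDF-1.4\n..."))
print(identify(b"II*\x00\x08\x00\x00\x00"))
print(identify(b"plain text with no signature"))
```
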
Repository audit frameworks
- Repository audit frameworks first developed out of the OAIS Reference Model
  - OAIS Mandatory Responsibilities (only six of them):
    - The main focus was on technical and organisational aspects, e.g.:
      - That repositories ensure that preserved information (content) can be understood (independently understandable)
      - That documented policies and procedures are being followed
  - No clear concept of OAIS compliance (although this is often claimed by system developers)

TRAC Criteria and Checklist (1)
- Trusted Repositories Audit and Certification (TRAC): Criteria and Checklist
- Background:
  - Checklist developed by the RLG-NARA Digital Repository Certification Task Force
  - Revised (following pilot audits) by the Center for Research Libraries and OCLC
  - Based upon OAIS concepts

TRAC Criteria and Checklist (2)
- TRAC criteria cover three main aspects:
  - Organisational Infrastructure
    - Governance and viability, structure and staffing, financial sustainability, contracts, etc.
  - Digital Object Management
    - Ingest, preservation planning, archival storage, etc.
  - Technologies, Technical Infrastructure, & Security
    - Systems and infrastructure, etc.

TRAC Checklist example page
DRAMBORA
- DRAMBORA (Digital Repository Audit Method Based on Risk Assessment)
  - Digital Curation Centre / Digital Preservation Europe
  - “Presents a methodology for self-assessment, encouraging organisations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks implicit within their organisation”
  - Identifying risks and scoring each one on likelihood and impact
  - Covers: organisational context, policies, assets, risks, etc.
  - Online tool (http://www.repositoryaudit.eu/about/)

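Scoring risks on likelihood and impact can be sketched as a simple risk register. The example below is an assumption-laden illustration rather than the DRAMBORA tool itself: the risks, the 1 to 5 scales and the use of likelihood multiplied by impact as a ranking score are all chosen here for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumed scoring rule for this sketch: likelihood times impact.
        return self.likelihood * self.impact


risks = [
    Risk("File format obsolescence", likelihood=2, impact=4),
    Risk("Storage media failure", likelihood=3, impact=5),
    Risk("Loss of key staff", likelihood=4, impact=3),
]

# Highest-scoring risks first, so they can be reviewed and managed first.
ranked = sorted(risks, key=lambda r: r.score, reverse=True)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.name}")
```
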
Repository audit frameworks
- A means of "asking the right questions" about your repository and documenting appropriate procedures and risks
- Both TRAC and DRAMBORA are under consideration by (different) ISO technical committees
- External badge of quality (a "certified preservation repository") vs. management tool for self-assessment

Case study 1: E-mail preservation
- Electronic Mail
  - Now ubiquitous in many business contexts
  - A mixture of records and other stuff
- High-risk if not managed properly:
  - Loss of accountability, efficiency, public credibility, organisational memory, etc.
  - There also may be legal and financial consequences
- An obvious candidate for the records management approach

Some specific challenges of E-mail
- Inappropriate content
  - For example: spam, personal messages, illegal content
- Wide range of attachment types – some will provide preservation challenges of their own
- Unclear responsibilities:
  - Users can be reluctant to ‘manage’ incoming mail
  - E-mail seen as personal domain, not as organisational property
  - ... this can have consequences …

 
"All staff will be reminded of the appropriate use of Number 10 resources" – Downing Street spokesperson
 
“The unfortunate incident that has taken place through the illegal hacking of the private communications of individual scientists …” (Rajendra Pachauri, Chairman of the UN Intergovernmental Panel on Climate Change, statement, 4 Dec 2009, http://www.ipcc.ch/)

“Since emails are normally intended to be private, people writing them are, shall we say, somewhat freer in expressing themselves than they would in a public statement” (RealClimate Web pages, http://www.realclimate.org/)

Approaches to managing e-mail
- Developing specific policies for managing e-mail within an organisation
  - Produce guidance for creators (and others)
  - Identify the chain of custody through the lifecycle
  - Need to involve everyone concerned, e.g. creators, managers, records managers, IT staff, etc.
- Developing a preservation approach
  - Appraisal – the identification of key e-mail content or records
  - Preservation strategies – the adoption of suitable strategies to deal with the content that needs to be retained

E-mail policies (1)
- Policies need to cover:
  - Creation practices
  - Using business e-mail accounts for private use & vice versa
  - Levels of organisational monitoring
  - Legal issues
  - Integrated records retention and preservation
  - Disposal

E-mail policies (2)
From: http://www.hm-treasury.gov.uk/about_record_mngmnt_pol.htm

E-mail preservation
- Appraisal
  - Determining what content needs to be preserved
  - Destruction of transient/unnecessary e-mails
- Saving e-mail records independently of the e-mail client
  - Check that content is complete - comprising message body, headers & attachments
  - Consider authenticity requirements
- Ingest into an organisational EDRMS or repository
- Make decisions on appropriate preservation strategies for content and attachments
  - Selecting a standard format?
  - Significant properties?

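The completeness check mentioned above (message body, headers and attachments) can be sketched with Python's standard `email` package. The function name `completeness_report`, the choice of "required" headers and the sample message are assumptions made for illustration, not an established appraisal procedure.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

REQUIRED_HEADERS = ("From", "To", "Date", "Subject")  # assumed minimum set


def completeness_report(raw: bytes) -> dict:
    """Report whether a saved message has a body, key headers and attachments."""
    msg = message_from_bytes(raw, policy=default)
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "has_body": body is not None,
        "missing_headers": [h for h in REQUIRED_HEADERS if msg[h] is None],
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# Build a small sample message; the Date header is deliberately left out.
sample = EmailMessage()
sample["From"] = "sender@example.org"
sample["To"] = "recipient@example.org"
sample["Subject"] = "Minutes"
sample.set_content("Minutes attached.")
sample.add_attachment(b"%PDF-1.4", maintype="application",
                      subtype="pdf", filename="minutes.pdf")

report = completeness_report(sample.as_bytes())
print(report)
```

Working from the raw message bytes rather than a client's internal store is in keeping with the point about saving records independently of the e-mail client.
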
Lost e-mails from the past
- The world’s very first network email
  - Sent by Ray Tomlinson (BBN Technologies), late 1971
  - A test message, probably something like “QWERTYUIOP” (documented, but not preserved – the contents were “entirely forgettable, and I have, therefore, forgotten them”)
  - First ‘real’ message explained to colleagues how to send messages over the network (exact text now unknown)
- Probably no significant records management implications, but a key step in the historical development of the Internet was not recorded

Case study 2: Preserving Websites
- Websites are ubiquitous:
  - “The Web has become the platform and interface of choice for virtually every kind of information system” (JISC-PoWR Handbook)
- Typically run by IT staff (e.g., Web managers), whose main responsibilities relate to keeping systems online, stable, secure and up-to-date
  - … content is constantly evolving
- Potential role for records managers to identify which parts of institutional Websites need to be incorporated within RM guidelines

Preserving Websites (2)
- Things to consider:
  - The identification / appraisal of Web records
  - Change frequency
  - Ownership and rights
  - Databases and the “deep Web”
  - The use of Content Management Systems (CMS)
  - Streamed content
  - The use of third-party sites
  - Personalisation / Web 2.0 / social networking

Preserving Websites (3)
- Collection approaches:
  - Various harvesting tools exist (e.g. Heritrix)
  - Domain harvesting, selective capture, periodic capture
  - Working with third parties – e.g.:
    - European Archive (http://www.europarchive.org/)
    - Internet Archive (http://www.archive.org/)
- Some examples of existing initiatives:
  - UK Government Web Archive (TNA): http://www.nationalarchives.gov.uk/webarchive/
  - UK Web Archive (BL, JISC, Wellcome Library, NLW): http://www.webarchive.org.uk/ukwa/

Preserving Websites (4)
- Aspects of Websites that could be preserved:
  - Information Content
  - Information Appearance
  - Information Behaviour
  - Information Relationships (e.g. links, embedded or linked metadata)
  - Change history
  - Use history
- From: Kevin Ashley (ULCC), “The JISC-PoWR Handbook - Explaining Web Preservation,” via SlideShare: http://bit.ly/7GyJbd

Questions?
Further reading (1)
General:
- Abby Smith, "The Research Library in the 21st Century: Collecting, Preserving, and Making Accessible Resources for Scholarship." In: No Brief Candle: Reconceiving Research Libraries for the 21st Century (CLIR, 2008), pp. 13-20. http://www.clir.org/pubs/abstract/pub142abst.html
- Priscilla Caplan, Understanding PREMIS (Library of Congress, 2009): http://www.loc.gov/standards/premis/understanding-premis.pdf
- Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Sustainable economics for a digital planet (2010): http://brtf.sdsc.edu/
- Paradigm Project Workbook: http://www.paradigm.ac.uk/workbook/
- Plato Preservation Planning tool: http://www.ifs.tuwien.ac.at/dp/plato/intro.html
- DRAMBORA: http://www.repositoryaudit.eu/about/

Further reading (2)
Preserving Emails:
- Maureen Pennock, “Curating E-mails,” In: DCC Curation Manual (2006): http://www.dcc.ac.uk/resource/curation-manual/chapters/curating-e-mails/
- The National Archives, Developing a policy for managing e-mail (2004): http://www.nationalarchives.gov.uk/documents/managing_emails.pdf
- Collaborative Electronic Records Project, Email records guidance (Smithsonian Institution Archives & Rockefeller Archives Center, 2007): http://siarchives.si.edu/pdf/CERP_Email_guidance_supp_0307.pdf

Further reading (3)
Preserving Websites:
- JISC-PoWR Handbook (Nov 2008): http://jiscpowr.jiscinvolve.org/handbook/
- JISC-PoWR blog: http://jiscpowr.jiscinvolve.org/
- The National Archives - Web Continuity project: http://www.nationalarchives.gov.uk/webcontinuity/
- Adrian Brown, Archiving Websites: a practical guide for information management professionals (London: Facet Publishing, 2006)
- Julien Masanès (ed.), Web Archiving (Berlin: Springer-Verlag, 2006)

Acknowledgments
UKOLN is funded by the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, the Museums, Libraries and Archives Council (MLA), as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based. More information: http://www.ukoln.ac.uk/

Thank You!

Brief Introduction to Digital Preservation

  • 1. UKOLN is supported by: A brief introduction to digital preservation Michael Day Research and Development Team Leader UKOLN, University of Bath MSc Lecture, UWE, Bristol, 10 March 2010
  • 2. Presentation outline Digital preservation basics Digital preservation challenges The OAIS Reference Model Digital preservation principles and strategies Digital preservation tools: Case studies (if time): E-mail Websites Exercise
  • 3. Digital preservation challenges (1) Technical challenges Digital media Currently magnetic or optical tape and disks, some devices (e.g., memory sticks) Uncertain lifetimes Hardware and software dependence Most digital objects are dependent on particular configurations of hardware and software Relatively short obsolescence cycles
  • 4. Digital preservation challenges (2) Conceptual challenges: Three levels of information required: Physical layer – unusually a bitstream Logical layer – defines how to interpret the bitstream (through software) to generate meaningful information (e.g. ASCII, XML, file formats) Conceptual layer – real world objects Some are analogues of traditional objects, e.g. meeting minutes, research papers Others are not, e.g. Web pages, GIS, 3D models of chemical structures Complex and dynamic
  • 5. Digital preservation challenges (3) On which of the three layers should preservation activities focus? We need to preserve the ability to reproduce the objects, not just the bits In fact, we can change the bits and logical representation and still reproduce an ‘authentic’ conceptual object (e.g. by converting a text file into PDF or TIFF) Authenticity and integrity How can we trust that an object is what it claims to be? Digital information can easily be changed by accident or design
  • 6. Digital preservation basics An ongoing approach to managing digital content based on: The identification and adoption of appropriate preservation strategies Creation or Ingest stages are normally the best time to ensure that data are fit-for-purpose and “preservable” The collection and management of appropriate metadata Capture of explicit and implicit knowledge, contexts The ongoing monitoring of technical contexts and the application of preservation planning techniques Continual monitoring of the organisation (audit)
  • 7. OAIS Reference Model (1) Reference Model for an Open Archival Information System (OAIS) ISO 14721:2003 Space data and information transfer systems -- Open archival information system -- Reference model Defines: Common vocabulary (definitions of key concepts) Information model (information packages, metadata, etc.) Functional model (six functional entities) Mandatory responsibilities
  • 8. OAIS Reference Model (2) OAIS Mandatory Responsibilities: Negotiating and accepting information Obtaining sufficient control of the information to ensure long-term preservation Determining the "designated community" Ensuring that information is independently understandable, i.e. can be (re)used without the assistance of those who produced it Following documented policies and procedures Making the preserved information available
  • 9. OAIS Reference Model (3) Administration Ingest Archival Storage Access Data Management Descriptive info. PRODUCER CONSUMER MANAGEMENT queries result sets Descriptive info. Preservation Planning orders OAIS Functional Entities (Figure 4-1) SIP SIP SIP DIP DIP AIP AIP
  • 10. OAIS Reference Model (4) OAIS Information Model: Defines the “Information Packages” required Ingest (Submission Information Package) Storage (Archival Information Package) Access (Dissemination Information Package) General principle of Information Packages: All objects are wrapped in multiple layers of metadata (Representation Information, Descriptive Information, Packaging, etc.)
  • 11. OAIS Reference Model (5): Implementation fundamentals: OAIS is a reference model (a conceptual framework), NOT a blueprint for system design. It informs the design of system architectures and the development of systems and components. It provides common definitions of terms: a common language and a means of making comparisons. But it does NOT ensure consistency or interoperability between implementations. Conformance relates only to the mandatory responsibilities and to following the information model.
  • 12. The DCC Lifecycle Model: Digital Curation: “… The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and re-use” (Lord & MacDonald, 2003). DCC Digital Curation Lifecycle Model: focused on the entire lifecycle of objects (influenced by records management and archives thinking), from creation, through appraisal, ingest and storage, to access and reuse. Preservation activities are at the core of the model …
  • 13.  
  • 14. Digital preservation principles (1): Most of the technical problems associated with long-term digital preservation can be solved if a life-cycle management approach is adopted, i.e. a continual programme of active management. Ideally, this combines both managerial and technical processes, e.g. as in the OAIS Reference Model. Many current preservation systems are attempting to support this approach. Digital preservation strategies need to be seen in this wider context. Wherever possible, also retain the original byte-stream.
  • 15. Digital preservation principles (2): Preservation needs to be considered at a very early stage in an object's life-cycle. There is a need to identify 'significant properties': this recognises that preservation is context dependent, even user specific (the concept of 'designated community'); the “Performance” model (National Archives of Australia); helps with choosing an acceptable preservation strategy. Encapsulation: surrounding the digital object, at least in theory, with all of the information needed to decode and understand it (including software).
  • 16. Digital preservation principles (3): Metadata and documentation are vitally important. This relates to OAIS Information Model concepts like Representation Information and Preservation Description Information. Functions: records meaning; records the context; enables the development of finding aids. Specific standards are being developed that support digital preservation activities (e.g., the PREMIS Data Dictionary).
  • 17. Digital preservation strategies: Technology preservation: maintaining technology (computer museums, digital archaeology). Emulation: running original bit-streams and application software on emulator programs that mimic the behaviour of obsolete hardware and operating systems. Migration: periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one.
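Migration can be illustrated at a very small scale with a change of text encoding. The sketch below is an assumption-laden toy rather than a production workflow: the function name `migrate_text` and the layout of the returned record are invented for illustration. It re-encodes a byte-stream, keeps the original alongside the migrated copy, and records a checksum for each so that later changes can be detected.

```python
import hashlib


def migrate_text(original: bytes, source_encoding: str,
                 target_encoding: str = "utf-8") -> dict:
    """Re-encode a text byte-stream and keep the original alongside the result."""
    text = original.decode(source_encoding)   # logical layer: interpret the bits
    migrated = text.encode(target_encoding)   # new physical representation
    return {
        "original": original,
        "original_sha256": hashlib.sha256(original).hexdigest(),
        "migrated": migrated,
        "migrated_sha256": hashlib.sha256(migrated).hexdigest(),
        "source_encoding": source_encoding,
        "target_encoding": target_encoding,
    }


record = migrate_text("Café minutes".encode("latin-1"), "latin-1")

# The bits differ, but the decoded text (the conceptual content) is unchanged.
print(record["original"] != record["migrated"])
print(record["original"].decode("latin-1") == record["migrated"].decode("utf-8"))
```

Real format migrations (for example, word-processor files to PDF/A) are rarely this lossless, which is why the choice of target format and the definition of significant properties matter.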
  • 18. Choosing a strategy (1): Preservation strategies are not in competition: different strategies will work together, and there may be value in diversification. Migration strategies mean difficult choices need to be made about target formats. But the strategy chosen has implications for: the technical infrastructure required (and metadata); collection management priorities; rights management (e.g. owning the rights to re-engineer software); costs.
  • 19. Choosing a strategy (2): Plato preservation planning tool (EU Planets project): a decision support tool that helps users evaluate potential preservation solutions against specific requirements and build a plan for preserving a given set of objects. Integrates: file format identification (using DROID); some migration services; XML-based generic format characterisation using XCL (eXtensible Characterisation Languages). http://www.ifs.tuwien.ac.at/dp/plato/intro.html
  • 20. Preservation support on ingest: Formats can be identified and validated on ingest or deposit into a repository: JHOVE (JSTOR/Harvard Object Validation Environment); PRONOM, DROID (The National Archives). Metadata: some tools exist for the automatic capture of metadata. Standardisation on ingest: received wisdom suggests the adoption of open or non-proprietary standards, e.g. databases structured in XML, uncompressed images, 'preservation friendly' standards like PDF/A.
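Format identification on ingest typically starts from a file's leading bytes (its "magic number"). The sketch below shows the idea only and is not how DROID or JHOVE are invoked: the `SIGNATURES` table and the `identify_format` function are hypothetical, and the table covers just a handful of well-known signatures. Identification is also not validation; a file that starts with `%PDF-` may still be malformed.

```python
# A tiny, hand-written signature table (illustrative only; real registries
# such as PRONOM hold far more detailed signature information).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify_format(data: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


print(identify_format(b"%PDF-1.4\n..."))
print(identify_format(b"plain text with no signature"))
```

An ingest workflow might record the result as Representation Information and route files reported as "unknown" to manual review.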
  • 21. Repository audit frameworks: Repository audit frameworks first developed out of the OAIS Reference Model. OAIS Mandatory Responsibilities (only six of them): the main focus was on technical and organisational aspects, e.g. that repositories ensure that preserved information (content) can be understood (independently understandable), and that documented policies and procedures are being followed. No clear concept of OAIS compliance (although this is often claimed by system developers).
  • 22. TRAC Criteria and Checklist (1): Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist. Background: checklist developed by the RLG-NARA Digital Repository Certification Task Force; revised (following pilot audits) by the Center for Research Libraries and OCLC; based upon OAIS concepts.
  • 23. TRAC Criteria and Checklist (2): TRAC criteria cover three main aspects: Organisational Infrastructure (governance and viability, structure and staffing, financial sustainability, contracts, etc.); Digital Object Management (ingest, preservation planning, archival storage, etc.); Technologies, Technical Infrastructure, & Security (systems and infrastructure, etc.).
  • 25. DRAMBORA: DRAMBORA (Digital Repository Audit Method Based on Risk Assessment), from the Digital Curation Centre / Digital Preservation Europe. “Presents a methodology for self-assessment, encouraging organisations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks implicit within their organisation”. Identifying risks and scoring each one on likelihood and impact. Covers: organisational context, policies, assets, risks, etc. Online tool (http://www.repositoryaudit.eu/about/).
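Scoring risks on likelihood and impact can be sketched as a simple risk register. The 1 to 5 scales, the multiplication of the two scores, and the example risks below are all assumptions chosen for illustration; they are not taken from the DRAMBORA toolkit, which defines its own scales and worksheets.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # One common convention: severity = likelihood x impact.
        return self.likelihood * self.impact


def rank_risks(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


register = [
    Risk("Loss of key staff", likelihood=4, impact=2),
    Risk("Storage media failure", likelihood=3, impact=4),
    Risk("File format obsolescence", likelihood=2, impact=5),
]

for risk in rank_risks(register):
    print(f"{risk.score:>2}  {risk.name}")
```

Ranking in this way only prioritises attention; each risk still needs an owner and a documented mitigation, which is where the self-assessment methodology does most of its work.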
  • 26. Repository audit frameworks: A means of "asking the right questions" about your repository and documenting appropriate procedures and risks. Both TRAC and DRAMBORA are under consideration by (different) ISO technical committees. External badge of quality (a "certified preservation repository") vs. management tool for self-assessment.
  • 27. Case study 1: E-mail preservation: Electronic mail: now ubiquitous in many business contexts; a mixture of records and other stuff. High-risk if not managed properly: loss of accountability, efficiency, public credibility, organisational memory, etc. There may also be legal and financial consequences. An obvious candidate for the records management approach.
  • 28. Some specific challenges of e-mail: Inappropriate content, for example spam, personal messages, illegal content. Wide range of attachment types: some will provide preservation challenges of their own. Unclear responsibilities: users can be reluctant to ‘manage’ incoming mail; e-mail is seen as a personal domain, not as organisational property ... this can have consequences …
  • 29.  
  • 30. "All staff will be reminded of the appropriate use of Number 10 resources" – Downing Street spokesperson
  • 31.  
  • 32. “The unfortunate incident that has taken place through the illegal hacking of the private communications of individual scientists …” (Rajendra Pachauri, Chairman of the UN Intergovernmental Panel on Climate Change, statement, 4 Dec 2009, http://www.ipcc.ch/). “Since emails are normally intended to be private, people writing them are, shall we say, somewhat freer in expressing themselves than they would in a public statement” (RealClimate Web pages, http://www.realclimate.org/).
  • 33. Approaches to managing e-mail: Developing specific policies for managing e-mail within an organisation: produce guidance for creators (and others); identify the chain of custody through the lifecycle; involve everyone concerned, e.g. creators, managers, records managers, IT staff, etc. Developing a preservation approach: appraisal (the identification of key e-mail content or records); preservation strategies (the adoption of suitable strategies to deal with the content that needs to be retained).
  • 34. E-mail policies (1): Policies need to cover: creation practices; using business e-mail accounts for private use and vice versa; levels of organisational monitoring; legal issues; integrated records retention and preservation; disposal.
  • 35. E-mail policies (2): From: http://www.hm-treasury.gov.uk/about_record_mngmnt_pol.htm
  • 36. E-mail preservation: Appraisal: determining what content needs to be preserved; destruction of transient/unnecessary e-mails. Saving e-mail records independently of the e-mail client: check that content is complete (comprising message body, headers and attachments); consider authenticity requirements; ingest into an organisational EDRMS or repository. Make decisions on appropriate preservation strategies for content and attachments: selecting a standard format? Significant properties?
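Checking that a saved message is complete (headers, body and attachments) can be sketched with Python's standard `email` package. The sample message, the choice of "required" headers and the `completeness_report` function are assumptions for illustration; an organisation's own policy would determine which headers matter.

```python
from email import message_from_bytes, policy

# A small hand-written message with one CSV attachment (sample data only).
RAW = (
    b"From: alice@example.org\r\n"
    b"To: bob@example.org\r\n"
    b"Subject: Budget figures\r\n"
    b"Date: Wed, 10 Mar 2010 09:30:00 +0000\r\n"
    b"Message-ID: <1234@example.org>\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"Figures attached.\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/csv\r\n"
    b'Content-Disposition: attachment; filename="budget.csv"\r\n'
    b"\r\n"
    b"item,cost\r\nscanner,500\r\n"
    b"--XYZ--\r\n"
)

# Assumed policy choice: headers a record must carry to be considered complete.
REQUIRED_HEADERS = ("From", "To", "Date", "Subject", "Message-ID")


def completeness_report(raw: bytes) -> dict:
    """Summarise whether headers, body and attachments are present."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "missing_headers": [h for h in REQUIRED_HEADERS if msg[h] is None],
        "has_body": body is not None,
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


print(completeness_report(RAW))
```

Because the check works on the raw RFC 5322 message rather than on a mail client's internal store, it fits the recommendation to save records independently of the e-mail client.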
  • 37. Lost e-mails from the past: The world’s very first network e-mail: sent by Ray Tomlinson (BBN Technologies), late 1971. A test message, probably something like “QWERTYUIOP” (documented, but not preserved: the contents were “entirely forgettable, and I have, therefore, forgotten them”). The first ‘real’ message explained to colleagues how to send messages over the network (exact text now unknown). Probably no significant records management implications, but a key step in the historical development of the Internet was not recorded.
  • 38. Case study 2: Preserving Websites: Websites are ubiquitous: “The Web has become the platform and interface of choice for virtually every kind of information system” (JISC-PoWR Handbook). Typically run by IT staff (e.g., Web managers), whose main responsibilities relate to keeping systems online, stable, secure and up-to-date … content is constantly evolving. Potential role for records managers to identify which parts of institutional Websites need to be incorporated within RM guidelines.
  • 39. Preserving Websites (2): Things to consider: the identification / appraisal of Web records; change frequency; ownership and rights; databases and the “deep Web”; the use of Content Management Systems (CMS); streamed content; the use of third-party sites; personalisation / Web 2.0 / social networking.
  • 40. Preserving Websites (3): Collection approaches: various harvesting tools exist (e.g. Heritrix); domain harvesting, selective capture, periodic capture. Working with third parties, e.g.: European Archive (http://www.europarchive.org/); Internet Archive (http://www.archive.org/). Some examples of existing initiatives: UK Government Web Archive (TNA) (http://www.nationalarchives.gov.uk/webarchive/); UK Web Archive (BL, JISC, Wellcome Library, NLW) (http://www.webarchive.org.uk/ukwa/).
  • 41. Preserving Websites (4): Aspects of Websites that could be preserved: information content; information appearance; information behaviour; information relationships (e.g. links, embedded or linked metadata); change history; use history. From: Kevin Ashley (ULCC), “The JISC-PoWR Handbook - Explaining Web Preservation,” via SlideShare: http://bit.ly/7GyJbd
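Of the aspects listed above, relationships are the easiest to show in code. The sketch below uses only Python's standard library to pull the outgoing links from a captured page and resolve them against the page's address; the class and function names and the example URLs are hypothetical, and a harvester such as Heritrix does far more (scheduling, politeness rules, storing captures).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of <a href="..."> elements as absolute URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    """Return the absolute link targets found in one captured page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<p><a href="/about">About</a> and <a href="http://example.org/x">X</a></p>'
print(extract_links(page, "http://example.com/index.html"))
```

Recording these links with each capture is one way to document which relationships were intact when the page was collected.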
  • 43. Further reading (1): General: Abby Smith, "The Research Library in the 21st Century: Collecting, Preserving, and Making Accessible Resources for Scholarship." In: No Brief Candle: Reconceiving Research Libraries for the 21st Century (CLIR, 2008), pp. 13-20. http://www.clir.org/pubs/abstract/pub142abst.html Priscilla Caplan, Understanding PREMIS (Library of Congress, 2009): http://www.loc.gov/standards/premis/understanding-premis.pdf Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Sustainable economics for a digital planet (2010): http://brtf.sdsc.edu/ Paradigm Project Workbook: http://www.paradigm.ac.uk/workbook/ Plato Preservation Planning tool: http://www.ifs.tuwien.ac.at/dp/plato/intro.html DRAMBORA: http://www.repositoryaudit.eu/about/
  • 44. Further reading (2): Preserving Emails: Maureen Pennock, “Curating E-mails,” In: DCC Curation Manual (2006): http://www.dcc.ac.uk/resource/curation-manual/chapters/curating-e-mails/ The National Archives, Developing a policy for managing e-mail (2004): http://www.nationalarchives.gov.uk/documents/managing_emails.pdf Collaborative Electronic Records Project, Email records guidance (Smithsonian Institution Archives & Rockefeller Archive Center, 2007): http://siarchives.si.edu/pdf/CERP_Email_guidance_supp_0307.pdf
  • 45. Further reading (3): Preserving Websites: JISC-PoWR Handbook (Nov 2008): http://jiscpowr.jiscinvolve.org/handbook/ JISC-PoWR blog: http://jiscpowr.jiscinvolve.org/ The National Archives - Web Continuity project: http://www.nationalarchives.gov.uk/webcontinuity/ Adrian Brown, Archiving Websites: a practical guide for information management professionals (London: Facet Publishing, 2006) Julien Masanès (ed.), Web Archiving (Berlin: Springer-Verlag, 2006)
  • 46. Acknowledgments: UKOLN is funded by the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, the Museums, Libraries and Archives Council (MLA), as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based. More information: http://www.ukoln.ac.uk/

Editor's Notes

  • #5: Reference: Thibodeau, K. (2002). "Overview of technological approaches to digital preservation and challenges in coming years." In: The state of digital preservation: an international perspective. Washington, D.C.: Council on Library and Information Resources. Available: http://www.clir.org/pubs/abstract/pub107abst.html
  • #8: References: CCSDS 650.0-B-1. (2002). Reference model for an Open Archival Information System (OAIS): http://www.ccsds.org/documents/650x0b1.pdf ISO 14721:2003. Space data and information transfer systems -- Open archival information system -- Reference model. Geneva: International Organization for Standardization.
  • #16: References: Nelson, M.L. (2001). "Buckets: a new digital library technology for preserving NASA research." Journal of Government Information, 28(4), 369-394. http://www.cs.odu.edu/~mln/pubs/jgi/jgi-eprint.pdf Universal Preservation Format: http://info.wgbh.org/upf/