An Introduction to
Digital Preservation
at the Library of Congress
Leslie Johnston
Library of Congress
2
NDIIPP
National Digital Information Infrastructure
and Preservation Program
MISSION: Ensure access over time to a rich
body of digital content through establishment of a
national network of partners committed to
selecting, collecting and preserving at-risk digital
information.
http://www.digitalpreservation.gov/
3
NDIIPP
Learn By Doing
Catalyze Activity
Support Collaboration
4
NDIIPP Focus Areas
Digital Content
Partnerships:
Government, Industry,
Academia
Technical Infrastructure
Education
5
Access Drives Preservation
6
There are Important Non-Technical Issues
Legal: intellectual property, copyright, privacy, national
security classification
Collaboration: new models needed for institutions,
communities to work together
Institutional culture: staff need new skills, new policies
need to be made, leaders need to integrate analog and
digital
Cost: many cost variables; economic sustainability is an
issue
7
Digital Content can be Copyrighted,
Private, Confidential
Societal norms and expectations for privacy are
shifting
• Especially on the Internet
Data mining and other techniques allow for new
kinds of access and new policies
• Email – public and personal
• Personal digital archives in special
collections
8
Economic Issues
http://ncdd.nl/en/document/EnglishSummary.pdf
http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf
Hard to know the ongoing costs for digital
preservation, lots of variables
Institutions often need to support the
preservation of analog and digital
collections with tight budgets
Demonstrate value of preserved digital
content through use and reuse
9
Organizational Issues
http://ncdd.nl/en/document/EnglishSummary.pdf
Digital preservation is a big challenge
New models are needed for institutions,
communities to work together
Preservationists need to be involved
much earlier in the lifecycle of a digital
object
A variety of new skills and training
opportunities are needed.
10
Examples of Digital Preservation Initiatives
Open Planets Foundation
• European project using a solution adopted by national
heritage organizations and others
National Archives and Records Administration
• Developing Electronic Records Archives system to meet
federal records management and archival needs
National Library of New Zealand
• Developing National Digital Heritage Archive for digital
collections
International Internet Preservation Consortium
• Group of national libraries and other organizations
collaborating in web content preservation and developing
common tools
11
What are examples
of some of the
collecting and
preservation
challenges at the
Library of
Congress?
12
National Digital
Newspaper Program
chroniclingamerica.loc.gov/
Some researchers want to search for stories in historic
newspapers.
Some researchers want to mine newspaper OCR for trends
across time periods and geographic areas.
Requests have come in to analyze all 6 million pages.
The site gets approximately 5 million views per day.
The program has:
• Multiple producers (25 now, ultimately 54)
• Free and open public access
• APIs for machine access and automated processes
Files
• TIFFs, JPEGs, JPEG 2000s, and XML.
• Over 6 million newspaper pages ingested to date
• Over 250 TB of data
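The OCR mining use case mentioned on this slide can be illustrated with a minimal sketch. It assumes page-level OCR text has already been downloaded and paired with a publication year; the sample pages, the function name `term_counts_by_year`, and the data shape are all hypothetical and are not part of the program's APIs.

```python
import re
from collections import Counter

# Hypothetical sample: (publication year, OCR text) pairs standing in for
# page-level OCR text obtained from a newspaper collection.
SAMPLE_PAGES = [
    (1898, "War news from Cuba. The war department reports a quiet week."),
    (1898, "Local harvest fair opens on Saturday."),
    (1905, "Railroad expansion continues; the war in the East ends."),
]


def term_counts_by_year(pages, term):
    """Count case-insensitive whole-word occurrences of `term` per year."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, text in pages:
        counts[year] += len(pattern.findall(text))
    # Return a plain dict ordered by year so trends read chronologically.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(term_counts_by_year(SAMPLE_PAGES, "war"))  # prints {1898: 2, 1905: 1}
```

At the scale of millions of pages the same counting would need to run in batches, but the per-page logic would look much the same.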
13
eDeposit for eSerials
eDeposit for eSerials is a collaborative effort between
the U.S. Copyright Office and the Library of
Congress.
Copyright Mandatory Deposit represents the largest
acquisitions channel for the Library. In general, all
U.S. publishers are legally required to submit for
deposit two copies of each of their publications to
the Copyright Office. This mechanism has allowed
the Library to build the collection and to preserve the
publications.
eSerials became subject to mandatory deposit in
January 2010, with the publication of a new interim
regulation. Demands began in June 2010 and files
began to arrive in October 2010.
The files must come to the Library “as published” – in
whatever their original formats are. This means a
wide variety of XML content and metadata, HTML,
and PDFs. We have received 49 different file
extensions…so far.
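One way to keep track of the variety of "as published" formats is a simple tally of file extensions in a deposit. The sketch below is illustrative only: the file listing is invented, and `tally_extensions` is a hypothetical helper, not a description of the Library's tooling.

```python
from collections import Counter
from pathlib import PurePosixPath


def tally_extensions(filenames):
    """Return a Counter of lower-cased file extensions ('' when there is none)."""
    return Counter(PurePosixPath(name).suffix.lower() for name in filenames)


# Hypothetical listing of files received in a deposit.
RECEIVED = [
    "issue1/article.XML",
    "issue1/article.pdf",
    "issue1/cover.jpg",
    "issue2/toc.html",
    "issue2/article.xml",
    "README",
]

if __name__ == "__main__":
    tally = tally_extensions(RECEIVED)
    for extension, count in tally.most_common():
        print(repr(extension), count)
    print("distinct extensions:", len(tally))
```

An extension is only a hint about a file's format, so a tally like this is a first pass rather than a reliable identification.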
14
Packard Campus National
Audio-Visual Center
Preserving Film, Broadcast Television, and
Audio
The Packard Campus supports a variety of preservation
workflows, including those for obsolete physical
formats such as wire recordings, wax cylinders, and
2" videotape. The Campus is fully equipped to play
back and preserve all antique film, video and sound
formats, and to maintain that capability far into the
future.
The facility also handles born-digital video and audio
received directly from producers.
The formats include MPEG-4, MP3, BWF, AVI, and a
wide variety of specialized commercial formats.
Over 3.5 PB of files.
15
Web Archiving
http://www.loc.gov/webarchiving/
lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html
The Library has been archiving the web since 2000. Subject area
specialists curate the collections, and Library catalogers create collection-
level metadata records.
Websites are complex objects
multiple formats
interrelated elements
distributed authors
ownership is not transparent
The concept of publishing on the Web doesn’t match the legal definition
The volume of content is immense
Website publishing technology is constantly changing
When we began archiving election web sites, we imagined users browsing
through the web pages. But when our first researchers came to the Library,
they wanted to mine the collections
Files
• Every format possible on the web
• Approximately 7 billion files
• Over 400 TB
16
The Twitter Archive
Every public tweet since Twitter’s launch in March
2006.
The Library’s researcher services will not recreate
Twitter, and the archive cannot be openly accessible.
Research requests have included users looking for
their own Twitter history, the study of the
geographic spread of news, the study of the
spread of epidemics, and the study of the
transmission of new uses of language.
The collection comprises only a few TB, but
tens of billions of tweets.
A White Paper is available at
http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-arcstatus
[Diagram labels: privacy, commercial, personal, events, social media, visualization, social science]
17
Personal Digital Archiving
Libraries/archives/museums have reasons to engage
with individuals about personal digital preservation
May bring in personal digital collections
Raise institutional visibility
Answer patron questions
Guidance for the general public on saving their
digital stuff: documents, photos, music, video,
email, websites, etc.
Public Events
Further How-to’s and tutorials
“Personal Archiving Day”
http://www.digitalpreservation.gov/personalarchiving/
18
What are some of the
technological challenges of
managing and preserving
large digital collections in
many formats, and making
them available for re-use?
19
Sheer amount.
Huge variation in file formats.
Unclear and undocumented rights.
Security.
Missing metadata.
Data citation and identifier issues.
Discovery expectations: discovery across collections and
institutions together.
Cost.
20
I will mention infrastructure only in passing
There are scale issues related to:
Bandwidth
Storage
Backup and tape archiving
Software development
Staffing for processing
21
Preservation Architecture
There is no national preservation architecture, system, or storage
backend.
Highly variable institution by institution, but commonalities in backend
repository systems, ingest models, and discovery models.
Community- and discipline-based repositories, often with an unclear
relationship to libraries or archives.
Multiple methods for certifying the trust level for a repository.
Agreed upon protocols and mechanisms for the transfer of files, but no
single standard for the interchange of files and metadata between
environments.
Synchronization and versioning are not just a technical challenge; they
complicate management, preservation, and access.
22
And at the Library of Congress?
The Library has an active digital reformatting program across all formats.
The Library is currently modifying its preservation and collection security policies
around digital collections.
The Library has repository services that inventory its file assets and maintain
multiple copies of files on servers and on tape, in geographically distributed
locations.
The Library developed the BagIt transfer specification for the movement of files
between and within organizations.
• http://www.digitalpreservation.gov/documents/bagitspec.pdf
The Library has documented sustainability factors for file formats.
• http://www.digitalpreservation.gov/formats/
For cases where we do have control over what comes in, we have a “Best Edition”
Preferred Formats statement, which is currently being updated.
http://www.copyright.gov/circs/circ07b.pdf
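The BagIt transfer specification mentioned on this slide packages files in a `data/` directory alongside a `bagit.txt` declaration and a checksum manifest. The following is a minimal sketch of that layout, assuming the BagIt 1.0 declaration and a SHA-256 manifest; the function names (`make_bag`, `verify_bag`, `sha256_of`) and the in-memory payload shape are hypothetical, and this is not the Library's own tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large payload files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir, payload):
    """Write a minimal bag: bagit.txt, data/ payload, manifest-sha256.txt.

    `payload` maps relative file names to bytes (a hypothetical input shape).
    """
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    lines = []
    for name in sorted(payload):
        target = data / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload[name])
        # Each manifest line is "<checksum> <path relative to the bag root>".
        lines.append(f"{sha256_of(target)}  data/{name}")
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag


def verify_bag(bag_dir):
    """Return the payload paths whose current checksum differs from the manifest."""
    bag = Path(bag_dir)
    mismatched = []
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        if sha256_of(bag / rel_path) != expected:
            mismatched.append(rel_path)
    return mismatched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example-bag", {"page-0001.txt": b"sample OCR text"})
        print("mismatched files:", verify_bag(bag))
```

The receiving side can run the same verification after transfer, which is what makes the manifest useful for moving files between and within organizations.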
23
What are the Library’s strategies
for formats?
The Library has documented sustainability factors for file
formats.
http://www.digitalpreservation.gov/formats/
For cases where we do have control over what comes in,
we have a “Best Edition” Preferred Formats statement,
which is currently being updated.
http://www.copyright.gov/circs/circ07b.pdf
The Library is ready to start developing Digital Format
Preservation Action Plans.
24
What are the Digital Preservation
Services?
We must develop sufficient infrastructure for distributed, replicated preservation
storage.
We will spend an increasing amount of time auditing our files and storage to
ensure that no issues have arisen.
We may need to process all files to create a variety of derivatives that are more
sustainable, and that might be required for various forms of use and analysis
before ingesting them and providing access.
We must develop sufficient infrastructure to support large scale discovery.
We are comfortable with self-service through the institutional repository model,
but can libraries ingest, manage and provide access to an increasing number
of digital collections without any mediation?
We are providing quite a bit of guidance to researchers on digital preservation
standards and personal digital preservation.
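Auditing replicated storage, as described on this slide, usually comes down to comparing each copy of a file against a recorded checksum. The sketch below assumes an inventory of expected SHA-256 values and uses in-memory bytes to stand in for files at each storage location; the names `audit_replicas` and `digest`, and the location labels, are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(inventory, replicas):
    """Compare every replica against the inventory of expected checksums.

    inventory: {file_id: expected_sha256}
    replicas:  {location: {file_id: file_bytes}}
    Returns a list of (location, file_id, problem) tuples.
    """
    problems = []
    for location, files in sorted(replicas.items()):
        for file_id, expected in sorted(inventory.items()):
            if file_id not in files:
                problems.append((location, file_id, "missing"))
            elif digest(files[file_id]) != expected:
                problems.append((location, file_id, "checksum mismatch"))
    return problems


if __name__ == "__main__":
    inventory = {"page-0001.tif": digest(b"scan")}
    replicas = {
        "server": {"page-0001.tif": b"scan"},
        "tape": {"page-0001.tif": b"scaN"},  # simulated corruption
        "offsite": {},                       # simulated missing copy
    }
    for problem in audit_replicas(inventory, replicas):
        print(problem)
```

A report like this identifies which copy to repair from a known-good replica, which is the practical benefit of keeping multiple copies in different locations.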
25
And where are the
digital preservation
innovations?
26
The Cloud is a
supplement – NOT a
replacement – for local
preservation storage
resources.
27
In content characterization
tools, such as JHOVE and
DROID and FITS, so we can
understand the risks inherent in
the files in our collections.
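Characterization tools of this kind typically begin by identifying a file's format from its leading bytes rather than its extension. The sketch below shows that general idea with a small hand-written signature table; it is not how JHOVE, DROID, or FITS are implemented, and the table and function names are assumptions made for illustration.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),
]


def identify(header: bytes) -> str:
    """Return a format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path) -> str:
    """Read only the first bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))
    print(identify(b"II*\x00\x08\x00\x00\x00"))
    print(identify(b"plain text"))
```

Real tools go further, validating internal structure and extracting technical metadata, but signature matching is enough to flag files whose extension and content disagree.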
28
In the adaptation and use of
forensics tools for the creation
of complete and authentic
copies of unique digital media.
29
In virtualization and emulation
technologies used to recreate
environments needed for digital
preservation and for access.
30
Preservation
Partnerships are a
Necessary Innovation
The Library cannot collect everything on
its own, so it works as part of:
The National Digital Stewardship Alliance
http://www.digitalpreservation.gov/ndsa/
The International Internet Preservation
Consortium http://netpreserve.org/about/index.php
among others…
31
What is Success for any Digital
Preservation Initiative?
• Success must be measured in
concrete goals and deliverables that
are widely and openly distributed.
• Success is also measured in
enthusiasm, participation, and in
adoption by the community.
32
Summary
Digital information presents tough issues in terms of
preservation and access
Libraries and archives must address these issues even
though there are no ideal solutions and some open
questions
Progress is evident through the application of shared
concepts
Initiatives are underway around the world testing different
approaches to preservation
There are a number of significant non-technical issues
Digital preservation is also relevant on the personal level
33
NDIIPP Digital Preservation Outreach
http://www.digitalpreservation.gov/formats/index.shtml
The Library of Congress “Sustainability of Digital
Formats” site, which analyzes the preservation merits
of a variety of digital file formats.
34
NDIIPP Digital Preservation Outreach
http://www.digitalpreservation.gov
The Library of Congress Digital Preservation web site
35
NDIIPP Digital Preservation Outreach
http://blogs.loc.gov/digitalpreservation
The NDIIPP blog “The Signal”: Where we post, and
discuss, the many issues, news items and project
updates about digital preservation and library
technology, both inside and outside of the Library of
Congress.
36
Leslie Johnston
Library of Congress
lesliej@loc.gov

An Introduction to digital preservation at the Library of Congress

  • 1. An Introduction toAn Introduction to Digital PreservationDigital Preservation at the Library of Congressat the Library of Congress Leslie Johnston Library of Congress
  • 2. 2 NDIIPPNDIIPP National Digital Information InfrastructureNational Digital Information Infrastructure and Preservation Programand Preservation Program MISSION: Ensure access over time to a rich body of digital content through establishment of a national network of partners committed to selecting, collecting and preserving at-risk digital information. http://www.digitalpreservation.gov/
  • 3. 3 NDIIPPNDIIPP Learn By Doing Catalyze Activity Support Collaboration
  • 4. 4 NDIIPP Focus AreasNDIIPP Focus Areas Digital Content Partnerships: Government, Industry, Academia Technical Infrastructure Education
  • 6. 6 There are Important Non-Technical IssuesThere are Important Non-Technical Issues Legal: intellectual property, copyright, privacy, nationalLegal: intellectual property, copyright, privacy, national security classificationsecurity classification Collaboration: new models needed for institutions,Collaboration: new models needed for institutions, communities to work togethercommunities to work together Institutional culture: staff need new skills, new policiesInstitutional culture: staff need new skills, new policies need to be made, leaders need to integrate analog andneed to be made, leaders need to integrate analog and digitaldigital Cost: many cost variables; economic sustainability is anCost: many cost variables; economic sustainability is an issueissue
  • 7. 7 Digital Content can be Copyrighted,Digital Content can be Copyrighted, Private, ConfidentialPrivate, Confidential Societal norms and expectations for privacy areSocietal norms and expectations for privacy are shiftingshifting  Especially on the InternetEspecially on the Internet Data mining and other techniques allow for newData mining and other techniques allow for new kinds of access and new policieskinds of access and new policies  Email – public and personalEmail – public and personal  Personal digital archives in specialPersonal digital archives in special collectionscollections
  • 8. 8 Economic IssuesEconomic Issues http://ncdd.nl/en/document/EnglishSummary.pdf http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf Hard to know the ongoing costs for digital preservation, lots of variables Institutions often need to support the preservation of analog and digital collections with tight budgets Demonstrate value of preserved digital content through use and reuse
  • 9. 9 Organizational IssuesOrganizational Issues http://ncdd.nl/en/document/EnglishSummary.pdf Digital preservation is a big challenge New models are needed for institutions, communities to work together Preservationists need to be involved much earlier in the lifecycle of a digital object A variety of new skills and training opportunities are needed.
  • 10. 10 Examples of Digital Preservation InitiativesExamples of Digital Preservation Initiatives Open Planets FoundationOpen Planets Foundation  European project using a solution adopted by national heritage organizations and others National Archives and Records AdministrationNational Archives and Records Administration  Developing Electronic Records Archives system to meet federal records management and archival needs National Library of New ZealandNational Library of New Zealand  Developing National Digital Heritage Archive for digital collections International Internet Preservation ConsortiumInternational Internet Preservation Consortium  Group of national libraries and other organizations collaborating in web content preservation and developing common tools
  • 11. 11 What are examplesWhat are examples of some of theof some of the collecting andcollecting and preservationpreservation challenges at thechallenges at the Library ofLibrary of Congress?Congress?
  • 12. 12 National DigitalNational Digital Newspaper ProgramNewspaper Program chroniclingamerica.loc.gov/chroniclingamerica.loc.gov/ Some researchers want to search for stories in historic newspapers. Some researchers want to mine newspaper OCR for trends across time periods and geographic areas. Requests have come in to analyze all 6 million pages. The site gets approximately 5 million views per day. The program has:  Multiple producers (25 now, ultimately 54)  Free and open public access  APIs for machine access and automated processes Files  TIFFs, JPEGs, JPEG 2000s, and XML.  Over 6 million newspaper pages ingested to date  Over 250 Tb of data
  • 13. 13 eDeposit for eSerialseDeposit for eSerials eDeposit for eSerials is a collaborative effort between the U.S. Copyright Office and the Library of Congress. Copyright Mandatory Deposit represents the largest acquisitions channel for the Library. In general, all U.S. publishers are legally required to submit for deposit two copies of each of their publications to the Copyright Office. This mechanism has allowed the Library to build the collection and to preserve the publications. eSerials became subject to mandatory deposit in January 2010, with the publication of a new interim regulation. Demands began in June 2010 and files began to arrive in October 2010. The files must come to the Library “as published” – in whatever their original formats are. This means a wide variety of XML content and metadata, HTML, and PDFs. We have received 49 different file extensions…so far.
  • 14. 14 Packard Campus NationalPackard Campus National Audio-Visual CenterAudio-Visual Center Preserving Film, Broadcast Television, and Audio The Packard Campus is a variety of preservation workflows, including those for obsolete physical formats such as wire recordings, wax cylinders, and 2“ videotape. The Campus is fully equipped to play back and preserve all antique film, video and sound formats, and to maintain that capability far into the future. The facility also handles born-digital video and audio received directly from producers. The formats include MPEG-4, MP3, BWF, AVI, and a wide variety of specialized commercial formats. Over 3.5 PB of files.
  • 15. 15 Web ArchivingWeb Archiving http://www.loc.gov/webarchiving/http://www.loc.gov/webarchiving/ lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.htmllcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html The Library has been archiving the web since 2000. Subject area specialists curate the collections, and Library catalogers create collection- level metadata records. Websites are complex objects multiple formats interrelated elements distributed authors ownership is not transparent The concept of publishing on the Web doesn’t match with legal definition The volume of content is immense Website publishing technology is constantly changing When we began archiving election web sites, we imagined users browsing through the web pages. But when our first researchers came to the Library, they wanted to mine the collections Files  Every format possible on the web  Approximately 7 billion files  Over 400 TB
  • 16. 16 The Twitter ArchiveThe Twitter Archive Every public tweet since Twitter’s launch in March 2006. The Library’s researcher services will not recreate twitter, and cannot be openly accessible. Research requests have included users looking for their own Twitter history, the study of the geographic spread of news, the study of the spread of epidemics, and the study of the transmission of new uses of language. The collection comprises only a few TB, but over 10s of billions of tweets. A White Paper is available at http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-arcstatus privacy commercial personal events social media visualization social science
  • 17. 17 Libraries/archives/museums have reasons to engage with individuals about personal digital preservation May bring in personal digital collections Raise institutional visibility Answer patron questions Guidance for the general public on saving their digital stuff: documents, photos, music, video, email, websites etc. Public Events Further How-to’s and tutorials “Personal Archiving Day” http://www.digitalpreservation.gov/personalarchiving/http://www.digitalpreservation.gov/personalarchiving/ Personal Digital ArchivingPersonal Digital Archiving
  • 18. 18 What are some of theWhat are some of the technological challenges oftechnological challenges of managing and preservingmanaging and preserving large digital collections inlarge digital collections in many formats, and makingmany formats, and making them available for re-use?them available for re-use?
  • 19. Sheer amount. Huge variation in file formats. Unclear and undocumented rights. Security. Missing metadata. Data citation and identifier issues. Discovery expectations: discovery across collections and institutions together. Cost.
  • 20. I will mention infrastructure only in passing. There are scale issues related to: bandwidth; storage; backup and tape archiving; software development; staffing for processing.
  • 21. Preservation Architecture There is no national preservation architecture, system, or storage backend. Highly variable institution by institution, but commonalities in backend repository systems, ingest models, and discovery models. Community- and discipline-based repositories, often with an unclear relationship to libraries or archives. Multiple methods for certifying the trust level for a repository. Agreed-upon protocols and mechanisms for the transfer of files, but no single standard for the interchange of files and metadata between environments. Synchronization and versioning are not just a technical challenge; they complicate management, preservation, and access.
  • 22. And at the Library of Congress? The Library has an active digital reformatting program across all formats. The Library is currently modifying its preservation and collection security policies around digital collections. The Library has repository services that inventory its file assets and maintain multiple copies of files on servers and on tape, in geographically distributed locations. The Library developed the BagIt transfer specification for the movement of files between and within organizations. http://www.digitalpreservation.gov/documents/bagitspec.pdf The Library has documented sustainability factors for file formats. http://www.digitalpreservation.gov/formats/ For cases where we do have control over what comes in, we have a “Best Edition” Preferred Formats statement, which is currently being updated. http://www.copyright.gov/circs/circ07b.pdf
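The BagIt transfer layout mentioned on this slide can be sketched in a few lines. This is an illustrative sketch rather than the Library's tooling: the function name `make_bag` and the sample payload are hypothetical, and the version string `1.0` follows the later RFC 8493 edition of the specification, not necessarily the draft linked from the slide. It writes payload files under `data/`, a `bagit.txt` declaration, and a `manifest-sha256.txt` that lists one checksum per payload file.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> Path:
    """Write payload files under data/ plus a bag declaration and manifest.

    `payload` maps relative file names (forward slashes) to file contents.
    Hypothetical helper for illustration only.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration: specification version and tag-file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example-bag", {"report.txt": b"hello"})
        print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

The receiving side can recompute each checksum and compare it with the manifest, which is what makes a bag useful for moving files between and within organizations.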
  • 23. What are the Library’s strategies for formats? The Library has documented sustainability factors for file formats. http://www.digitalpreservation.gov/formats/ For cases where we do have control over what comes in, we have a “Best Edition” Preferred Formats statement, which is currently being updated. http://www.copyright.gov/circs/circ07b.pdf The Library is ready to start developing Digital Format Preservation Action Plans.
  • 24. What are the Digital Preservation Services? We must develop sufficient infrastructure for distributed, replicated preservation storage. We will spend an increasing amount of time auditing our files and storage to ensure that no issues have arisen. We may need to process all files to create a variety of derivatives that are more sustainable, and that might be required for various forms of use and analysis before ingesting them and providing access. We must develop sufficient infrastructure to support large scale discovery. We are comfortable with self-service through the institutional repository model, but can libraries ingest, manage and provide access to an increasing number of digital collections without any mediation? We are providing quite a bit of guidance to researchers on digital preservation standards and personal digital preservation.
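Auditing files and storage, as described on this slide, is commonly done by recomputing checksums and comparing them with a stored manifest. The sketch below assumes a manifest in which each line holds a SHA-256 digest followed by a path relative to a root directory. The names `sha256_of` and `audit` are hypothetical, and this is not a description of the Library's own audit services.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never read into memory whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(root: Path, manifest: Path) -> dict[str, list[str]]:
    """Compare every manifest entry with the file currently on disk."""
    report: dict[str, list[str]] = {"ok": [], "changed": [], "missing": []}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<digest> <relative path>", separated by whitespace.
        expected, rel_path = line.split(maxsplit=1)
        target = root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(target) != expected.lower():
            report["changed"].append(rel_path)
        else:
            report["ok"].append(rel_path)
    return report
```

Running such an audit on every replica on a schedule, and repairing a changed or missing file from a copy that still verifies, is one way distributed, replicated storage and auditing work together.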
  • 25. And where are the digital preservation innovations?
  • 26. The Cloud is a supplement, NOT a replacement, for local preservation storage resources.
  • 27. In content characterization tools, such as JHOVE and DROID and FITS, so we can understand the risks inherent in the files in our collections.
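The core idea behind format identification can be illustrated with a toy signature check. This is not how JHOVE, DROID, or FITS are implemented: those tools rely on much richer signature registries and structural validation. The small signature table and the function names `identify` and `survey` below are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# A handful of well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header: bytes) -> str:
    """Return a format label based on the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def survey(root: Path) -> Counter:
    """Count identified formats for every file under a directory."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            with path.open("rb") as f:
                counts[identify(f.read(16))] += 1
    return counts
```

A survey like this shows how many files of each kind a collection holds, and a large "unknown" count is itself a signal of preservation risk. Matching a signature does not show that a file is well formed, which is why dedicated validation tools are still needed.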
  • 28. In the adaptation and use of forensics tools for the creation of complete and authentic copies of unique digital media.
  • 29. In virtualization and emulation technologies used to recreate environments needed for digital preservation and for access.
  • 30. Preservation Partnerships are a Necessary Innovation The Library cannot collect everything on its own, so it works as part of: The National Digital Stewardship Alliance http://www.digitalpreservation.gov/ndsa/ The International Internet Preservation Consortium http://netpreserve.org/about/index.php among others…
  • 31. What is Success for any Digital Preservation Initiative? Success must be measured in concrete goals and deliverables that are widely and openly distributed. Success is also measured in enthusiasm, participation, and in adoption by the community.
  • 32. Summary Digital information presents tough issues in terms of preservation and access. Libraries and archives must address these issues even though there are no ideal solutions and some open questions. Progress is evident through the application of shared concepts. Initiatives are underway around the world testing different approaches to preservation. There are a number of significant non-technical issues. Digital preservation is also relevant on the personal level.
  • 33. NDIIPP Digital Preservation Outreach http://www.digitalpreservation.gov/formats/index.shtml The Library of Congress “Sustainability of Digital Formats” site, which analyzes the preservation merits of a variety of digital file formats.
  • 34. NDIIPP Digital Preservation Outreach http://www.digitalpreservation.gov The Library of Congress Digital Preservation web site
  • 35. NDIIPP Digital Preservation Outreach http://blogs.loc.gov/digitalpreservation The NDIIPP blog “The Signal”: where we post, and discuss, the many issues, news items and project updates about digital preservation and library technology, both inside and outside of the Library of Congress.
  • 36. Leslie Johnston Library of Congress lesliej@loc.gov
