WEB ARCHIVING PROJECTS: END-USER PERSPECTIVE
Bogdan Trifunovic, M. A.
Digitization Center, Public Library Cacak
[email_address]
www.cacak-dis.rs
The purpose of the research
• Examine the usability and accessibility of publicly open web archiving projects
• Identify user-friendly features of the web sites of several web archiving projects, and create a basic structure and framework for comparative analysis
• Raise awareness about web archiving
INTERNET ARCHIVE
http://www.archive.org
• Established in 1996 as a non-profit organization (private funding)
• The oldest web archiving project, using the Alexa crawler (robot) to create snapshots of the entire WWW
• The sheer size of the Internet does not allow capturing everything online
Newer approaches
• Mostly deal with the "national" part of the WWW (e.g. capturing and archiving a national domain, digital preservation of "web heritage")
• Run by major national institutions (libraries, consortia)
• Selective approach: identifying quality Internet content that satisfies established standards
Web archiving projects
• PANDORA (National Library of Australia)
• EUROPEAN ARCHIVE (non-profit)
• MINERVA (Library of Congress)
• UK WEB ARCHIVE (British Library)
• WEBARCHIV (National Library of the Czech Republic)
*All projects were reviewed in November 2008
PANDORA
http://pandora.nla.gov.au/
• PANDAS (PANDORA Digital Archiving System)
• HTTrack crawler
• Excellent documentation; collections are easy to navigate and browse
• Basic and advanced search options
• Unlimited access to collections
EUROPEAN ARCHIVE
http://www.europarchive.org/
• New project, still in development
• Web 2.0 elements (tag cloud, my Desktop)
• Internet Archive harvesting services
• No search options for the web archive; multilingual interface
• Unlimited access
MINERVA
http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html
• Harvested by the Internet Archive
• Thematic collections (US elections, war in Iraq, etc.)
• Access to some collections is restricted (only from LOC)
UK WEB ARCHIVE
http://www.webarchive.org.uk
• Established in 2003 by six institutions as the UK Web Archive Consortium
• Between 2005 and 2007 the project used PANDAS technology
• In 2008 a new web archiving system based on the Web Curator Tool was introduced
• The British Library (BL) has maintained the project since 2008
WEBARCHIV
http://www.webarchiv.cz/
• Heritrix crawler
• Archives the Czech web domain
• Public access to a collection of websites (900+) with signed contracts; everything else is accessible only from NKP
• No search option except by URL; content is not indexed
Why archive the web
• The general idea: because the WWW keeps changing and information on the Internet is unstable, web content should be preserved in some way, since it is part of national (digital) culture
• Preservation of online documents (e.g., for citation accuracy)
• The huge growth of online material
Difficulties
• Three characteristics of the Web make crawling it very difficult: its large volume, its fast rate of change, and dynamic page generation
• Identifying the web content that should be preserved for the future is the role of librarians, curators, archivists…
The case of Serbia
• The change of the national domain from .yu to .rs started in 2008
• By October 2009 all .yu content (everything with a .yu address) will permanently disappear from the WWW
• Thousands of web pages will be lost
• There is no strategy for preserving them (and no time either)
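One way to find which sites on a candidate list are affected by the domain change is to check each URL's host name. The following is only a minimal sketch: the function names and the example URLs are hypothetical and are not taken from the presentation.

```python
from urllib.parse import urlparse


def is_in_tld(url, tld="yu"):
    """Return True if the URL's host name ends with the given top-level domain.

    URLs without a scheme (e.g. "www.example.yu") have no parsed host name
    and are therefore reported as False.
    """
    host = urlparse(url).hostname or ""
    return host.lower().rstrip(".").endswith("." + tld)


def at_risk(urls, tld="yu"):
    """Keep only the URLs whose host is under the given top-level domain."""
    return [u for u in urls if is_in_tld(u, tld)]


if __name__ == "__main__":
    # Hypothetical candidate list, for illustration only.
    candidates = [
        "http://www.example.yu/a",
        "http://www.example.rs/yu",
        "http://yu.example.com/",
    ]
    for url in at_risk(candidates):
        print(url)
```

Only the host name is checked, so a ".yu" that appears in the path or as a subdomain label does not count as a match.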
Planning on a small scale
• The Digitization Center of Public Library Cacak created a short list of about 50 web sites of interest to us
• We used the HTTrack web crawler (http://www.httrack.com/) to archive them locally
• All websites where the harvesting process succeeded can be navigated
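The harvesting step could be prepared with a short script along these lines. This is a minimal sketch under stated assumptions, not the procedure the Center actually followed: the site list and the "mirrors" output folder are hypothetical, and the only HTTrack option used is `-O` (output path). The script prints the commands instead of running them, so they can be reviewed first.

```python
import shlex
from urllib.parse import urlparse


def build_httrack_command(url, output_root):
    """Build an HTTrack command line that mirrors one site into its own folder.

    The folder is named after the site's host, e.g. mirrors/www.example.rs.
    """
    host = urlparse(url).netloc
    target = f"{output_root}/{host}"
    return ["httrack", url, "-O", target]


def build_commands(urls, output_root):
    """Build one HTTrack command per site on the list."""
    return [build_httrack_command(u, output_root) for u in urls]


if __name__ == "__main__":
    # Hypothetical short list; the real list had about 50 sites.
    sites = ["http://www.example.rs/", "http://www.example.org/"]
    for cmd in build_commands(sites, "mirrors"):
        # Print a shell-safe command line for manual review.
        print(shlex.join(cmd))
```

Keeping each site in a separate folder makes it easy to see afterwards which harvests succeeded and to repeat only the ones that failed.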
 
 
 
Future steps
• Improving the organizational framework for web archiving of local resources
• Defining the legal setting: how to download and archive authorized material
• Finding solutions for automatic archiving (partially solving the problem of staff shortages)
THANK YOU! QUESTIONS?
Bogdan Trifunovic, M. A.
Digitization Center, Public Library Cacak
[email_address]
www.cacak-dis.rs
