Designing Preservable Websites
Nicholas Taylor
@nullhandle

DC, VA & MD Search Engine Marketing Meetup
July 18, 2012

                                        “found glass” by Flickr user nuanc under CC BY-NC-ND 2.0
why preserve the web?




copy of the first webpage
web archivists aren’t visible
       stakeholders



     design
                      archiving
     usage
search engine crawler ≠
    archival crawler




      “GoogleBots” by Flickr user ares64 under CC BY 2.0
what is a “preservable”
           website?




“Fish Preserver” by Flickr user ecstaticist under CC BY-NC-SA 2.0
three priorities:
• capture: can resources be acquired by
  current web archiving technologies?
• replay: can the user’s experience of
  the original website be recreated from
  the archived resources?
• preservation: how can it be assured
  that the archived website remains
  coherent over time?
follow web standards and
   accessibility guidelines




“Web Standards Fortune Cookie” by Flickr user mherzber under CC BY-SA 2.0
be careful with robots.txt
       exclusions




     robots.txt for Last.fm
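One way to see what this slide warns about is to test which page assets a given robots.txt would hide from an archival crawler. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt rules, the `example-archive-bot` user agent, and the example.org URLs are all made up for illustration and are not taken from Last.fm or any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, written only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /admin/
"""

# Hypothetical resources that one page needs in order to replay faithfully.
PAGE_ASSETS = [
    "https://example.org/index.html",
    "https://example.org/css/site.css",
    "https://example.org/js/menu.js",
]


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that robots_txt forbids user_agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in blocked_urls(SAMPLE_ROBOTS_TXT, "example-archive-bot", PAGE_ASSETS):
        print("blocked:", url)
```

In this made-up case the HTML page is allowed but its stylesheet and script are not, so a crawler that honors the rules would capture the page without the assets needed to replay it.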
use a site map, transparent
links, and contiguous navigation




    “Card sorting” by Flickr user Manchester Library under CC BY-SA 2.0
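A rough illustration of what "transparent links" means in practice: the sketch below collects only the `href` values of `<a>` tags, as a very simple crawler might. The `LinkCollector` class, the sample markup, and the `goTo` handler are hypothetical. Real archival crawlers may extract links in more sophisticated ways, so treat this as a simplified model rather than a description of any particular tool.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, the way a simple crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def visible_links(html):
    """Return the links that are discoverable from plain markup alone."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical navigation: two plain links and one script-driven "link".
SAMPLE_PAGE = """
<nav>
  <a href="/about.html">About</a>
  <a href="/sitemap.html">Site map</a>
  <span onclick="goTo('reports')">Reports</span>
</nav>
"""

if __name__ == "__main__":
    print(visible_links(SAMPLE_PAGE))
```

The "Reports" item never appears in the result because its destination exists only inside a script handler, which is the kind of obscured link the slide advises against.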
maintain stable URLs and
 redirect when necessary




     “Improvised detour sign” by Flickr user Jason McHuff under CC BY-SA 2.0
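As a small sketch of the "redirect when necessary" idea, a site could keep a table of retired paths and the paths that replaced them, and answer requests for old paths with a permanent (HTTP 301) redirect. The paths and the `resolve` helper below are hypothetical, and the code only models the lookup, not an actual web server configuration.

```python
# Hypothetical mapping from retired paths to their replacements.
REDIRECTS = {
    "/news/2011/launch.html": "/archive/2011/launch/",
    "/archive/2011/launch/": "/history/launch/",
}


def resolve(path, redirects=REDIRECTS, max_hops=10):
    """Follow the redirect table and return (final_path, hops_taken).

    Raises ValueError when the table contains a loop or an overly long chain.
    """
    hops = []
    seen = {path}
    while path in redirects:
        path = redirects[path]
        if path in seen or len(hops) >= max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(path)
        hops.append(path)
    return path, hops


if __name__ == "__main__":
    final_path, hops = resolve("/news/2011/launch.html")
    print(final_path, "after", len(hops), "redirect(s)")
```

Checking the table for loops and long chains before deploying it is a design choice made here for illustration; the point is that every old URL should still lead somewhere meaningful.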
consider using a Creative
   Commons license




   “2500 Creative Commons Licenses” by Flickr user qthomasbower under CC BY-SA 2.0
use durable data formats




 “Lascaux cave painting” by Flickr user qoforchris under CC BY-ND 2.0
embed metadata, especially the
      character encoding




   source code of http://www.seo.com/
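To check whether a page actually declares its character encoding, something like the following sketch could be used. It looks for either `<meta charset="...">` or the older `<meta http-equiv="Content-Type" ...>` form. The `CharsetFinder` class and `declared_charset` function are illustrative names, and the sketch ignores HTTP headers and byte order marks, which can also convey the encoding.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the first character encoding declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attrs["charset"].strip().lower()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # Long form: content="text/html; charset=utf-8"
            content = attrs.get("content") or ""
            for part in content.split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html):
    """Return the declared encoding in lowercase, or None if absent."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


if __name__ == "__main__":
    print(declared_charset('<head><meta charset="UTF-8"></head>'))
```

A result of `None` suggests that anyone replaying the archived page later would have to guess the encoding.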
use archiving-friendly platform
     providers and CMSs




   robots.txt for Drupal 7
three tips
1. see how well your site
   validates on
   http://validator.w3.org/
2. see how your site looks
   on http://archive.org/
3. your favorite online
   sitemap generator is a
   good starting point




                              “Highlighters” by Flickr user KJGarbutt under CC BY-ND 2.0
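For the third tip, an online generator is the suggested starting point, but the output it produces is simple enough to sketch by hand. The example below builds a minimal XML sitemap using the sitemaps.org 0.9 namespace. The `build_sitemap` function and the example.org URLs are hypothetical, and optional elements such as last-modified dates are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages for illustration only.
    print(build_sitemap([
        "https://example.org/",
        "https://example.org/about.html",
    ]))
```

Listing every page in one place gives a crawler a path to content that the regular navigation might not expose.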
thank you!

Nicholas Taylor
 @nullhandle


Editor's Notes

  • #2: Design decisions have a major effect on website preservability.
  • #3: “Benign neglect” may have been sufficient for physical objects; more active interventions are needed for digital ones.
  • #4: Design and usage inform each other; where does web archiving fit?
  • #5: Because web archivists care about recreating the user experience, they care about all assets being crawled.
  • #8: Good also for usability and SEO. Web crawlers access sites like a text browser. Replay platform must accommodate coding idiosyncrasies.
  • #9: CSS and JavaScript directories matter for archiving but perhaps not for search engine indexing.
  • #10: Crawler can only capture links it sees. User of archived site can only navigate by following links. Avoid relying on Flash, JavaScript, or other technologies that obscure links. Use a site map.
  • #11: Link rot is common. Web archiving tools are URL-sensitive. Stable/redirect URLs make for seamless archive access.
  • #12: Copyright law lacks explicit provisions for digital preservation. Many libraries ask for permission to archive websites. Creative Commons provides affirmative permission to be crawled and preserved.
  • #13: Websites contain many different file types, each with distinct preservation risks. Favor open standards and file formats, except when they are poorly documented or allow vendor-specific extensions.
  • #14: Embedded metadata makes it easier to replay and preserve archived sites.
  • #15: Platform providers are more likely to accommodate commercial search indexers than archival crawlers. If you care about archiving, inquire about policies, examine robots.txt, or check how the website appears in Internet Archive’s Wayback Machine. If you’re using an open source CMS, be sure to review the bundled robots.txt.
  • #16: While following these recommendations won’t guarantee perfect archiving, not following them will ensure additional challenges.