Measure All the (Web Archiving) Things!
Nicholas Taylor
Web Archiving Service Manager
Stanford University Libraries
Archive-It Partner Meeting
August 18, 2015
how many more websites are we archiving?
“Library_01.jpg” by British Library
crawl report list
Archive-It: “Crawls for Account #198”
seeds for individual crawl
Archive-It: “Seeds for Crawl #99435”
download seed list
Archive-It: “Seeds for Crawl #99435”
downloaded seed list
whew, that was easy!
oh, wait a minute…
seed lists are per crawl
well, how many crawls are there?
• 6 accounts
• oldest active since 2007
• 30+ collections
• hundreds of crawls
count and average not enough
• seeds move in and out of crawls
• seeds have different frequencies
• new seeds w/ new URLs for old seeds
• “university website” is many seeds
plus
• non-Archive-It web archiving activity
“Dichotomic Maples” by francoismi under CC BY-NC-SA 2.0
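Because seed lists are per crawl, simply summing them across hundreds of crawls double counts seeds that recur. A minimal sketch, assuming each downloaded seed list is a CSV with a `url` column (the actual export columns may differ): it merges the lists, normalizes URLs, and groups seeds by host as a rough, assumed stand-in for "website". The normalization rules, the host-based grouping, and the function names are all hypothetical; a site that moved to an entirely new URL still needs a manual mapping.

```python
import csv
import io
from urllib.parse import urlsplit


def normalize_seed(url: str) -> str:
    """Collapse trivial URL variants (scheme, case, leading www., trailing slash)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def website_of(seed: str) -> str:
    """Treat all seeds on one host as one 'website' (a rough, assumed proxy)."""
    return seed.split("/", 1)[0]


def aggregate_seed_lists(seed_lists: dict[str, str]) -> dict[str, int]:
    """seed_lists maps a crawl ID to the CSV text of that crawl's seed list.

    Assumes each CSV has a 'url' column; the real export format may differ.
    """
    seeds_by_crawl = {}
    for crawl_id, text in seed_lists.items():
        rows = csv.DictReader(io.StringIO(text))
        seeds_by_crawl[crawl_id] = {normalize_seed(row["url"]) for row in rows}
    unique_seeds = set().union(*seeds_by_crawl.values())
    return {
        "crawls": len(seeds_by_crawl),
        # Naive total: a seed present in N crawls is counted N times.
        "seed_entries": sum(len(s) for s in seeds_by_crawl.values()),
        "unique_seeds": len(unique_seeds),
        "websites": len({website_of(s) for s in unique_seeds}),
    }


# Illustrative input, not real crawl data.
lists = {
    "crawl-a": "url\nhttp://www.example.edu/\nhttp://example.edu/news\n",
    "crawl-b": "url\nhttps://example.edu\nhttp://example.org/\n",
}
result = aggregate_seed_lists(lists)
print(result)
```

With this illustrative input the naive total is 4 seed entries, but only 3 unique seeds on 2 websites.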
“what gets measured, gets managed”
“Gudauri still life” by Carsten ten Brink under CC BY-NC-ND 2.0
why measure?
• advocacy/outreach
• service modeling
• program assessment
• policy making
• staffing assessment
• grant support
• prioritization
• risk assessment
“Measuring river depth” by epeirogenic under CC BY-NC 2.0
what to measure?
• How to handle the data volume?
• What is the usage of web archives?
• How much does web archiving cost?
• How to assure the quality of archived content?
• How to secure institutional buy-in?
• How much loss have resources suffered?
• What is the impact of policy requirements?
community-valued metrics
[Chart: percentage of organizations (y-axis, 0% to 60%) by metric (x-axis): Volume, Usage, Cost, Quality, Buy-in, Loss, Policy]
NDSA: “Web Archiving in the United States: a 2013 Survey”
volume
• websites
– captured
– preserved
– described
• data
– captured
– preserved
• objects
– captured
– preserved
“typography jumble” by Bill Dickinson under CC BY-NC 2.0
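The volume measures above split each unit (websites, data, objects) into captured versus preserved, plus described for websites. A minimal sketch, assuming hypothetical per-object capture records that carry a size and a "confirmed in preservation storage" flag, and a set of hosts that have a catalog record; none of these fields come from a particular tool.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Capture:
    url: str          # URL of one captured object
    size_bytes: int   # bytes stored for that object
    preserved: bool   # hypothetical flag: copy confirmed in preservation storage


def volume_metrics(captures: list[Capture], described_hosts: set[str]) -> dict[str, int]:
    """Count websites, data, and objects, each split by capture vs. preservation."""
    preserved = [c for c in captures if c.preserved]

    def hosts(items: list[Capture]) -> set:
        return {urlsplit(c.url).hostname for c in items}

    captured_hosts = hosts(captures)
    return {
        "websites_captured": len(captured_hosts),
        "websites_preserved": len(hosts(preserved)),
        # 'Described' = captured hosts that also have a catalog record (assumed input).
        "websites_described": len(captured_hosts & described_hosts),
        "bytes_captured": sum(c.size_bytes for c in captures),
        "bytes_preserved": sum(c.size_bytes for c in preserved),
        "objects_captured": len(captures),
        "objects_preserved": len(preserved),
    }


# Illustrative records, not real capture data.
captures = [
    Capture("http://example.edu/", 12_000, True),
    Capture("http://example.edu/logo.png", 3_000, True),
    Capture("http://example.org/", 8_000, False),
]
metrics = volume_metrics(captures, described_hosts={"example.edu"})
print(metrics)
```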
usage
• web analytics
– visitors
– visits
– referrers
• actual use cases
(who + how many?)
– research
– teaching
– institutional legacy
– compliance
“113/365 Days: A page from my heart” by LaughingRhoda under CC BY-NC-ND 2.0
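Visitors, visits, and referrers are only meaningful once a visit is defined. A minimal sketch, assuming a hypothetical access log already reduced to (visitor ID, timestamp, referrer) tuples and a 30-minute inactivity timeout (a common analytics convention, not a requirement). It covers only the web analytics half of the slide; counting actual research, teaching, legacy, and compliance use still has to be done by asking users.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)


def usage_metrics(hits):
    """hits: iterable of (visitor_id, timestamp, referrer_or_None); hypothetical schema."""
    times_by_visitor = {}
    referrers = Counter()
    for visitor, ts, referrer in hits:
        times_by_visitor.setdefault(visitor, []).append(ts)
        if referrer:
            referrers[referrer] += 1  # counts hits that carried a referrer
    visits = 0
    for times in times_by_visitor.values():
        times.sort()
        visits += 1  # a visitor's first hit always opens a visit
        for prev, cur in zip(times, times[1:]):
            if cur - prev > SESSION_TIMEOUT:
                visits += 1  # long gap: the next hit starts a new visit
    return {
        "visitors": len(times_by_visitor),
        "visits": visits,
        "top_referrers": referrers.most_common(3),
    }


# Illustrative hits, not real log data.
t0 = datetime(2015, 8, 18, 9, 0)
hits = [
    ("a", t0, "https://search.example.com/"),
    ("a", t0 + timedelta(minutes=10), None),
    ("a", t0 + timedelta(hours=2), "https://example.edu/"),
    ("b", t0, "https://search.example.com/"),
]
stats = usage_metrics(hits)
print(stats)
```

Here visitor "a" returns after a two-hour gap, so two visitors produce three visits.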
cost
• external
– out-payments for web archiving services
– quota utilization
• internal
– staff time, by activity
– storage
“Largest square from a dollar bill” by origami_madness under CC BY-NC 2.0
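Rolling the external and internal cost components into one summary is simple arithmetic once the inputs are gathered. A minimal sketch in which every input (invoice amounts, a quota in GB, hours per activity, a single hourly rate, a per-TB annual storage price) is a hypothetical placeholder; the numbers below are illustrative, not actual costs.

```python
def cost_summary(out_payments, quota_gb, used_gb, staff_hours, hourly_rate,
                 storage_tb, cost_per_tb_year):
    """All inputs are hypothetical; currency is whatever the inputs use."""
    external = sum(out_payments)                      # e.g. service invoices
    staff = {activity: hours * hourly_rate for activity, hours in staff_hours.items()}
    storage = storage_tb * cost_per_tb_year           # annual local storage cost
    return {
        "external": external,
        "quota_utilization": used_gb / quota_gb,      # fraction of purchased quota used
        "staff_by_activity": staff,
        "storage": storage,
        "total": external + sum(staff.values()) + storage,
    }


# Illustrative numbers only.
summary = cost_summary(
    out_payments=[5000],
    quota_gb=1000,
    used_gb=250,
    staff_hours={"selection": 10, "quality assurance": 20},
    hourly_rate=40,
    storage_tb=2,
    cost_per_tb_year=100,
)
print(summary)
```

Keeping staff time broken out by activity, rather than as one total, is what lets the cost figures feed staffing assessment and prioritization.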
performance
• accessioning throughput
• service request turnaround
• collections/websites w/ discovery records
• time to regenerate full-text index
“Lower rack” by Andy Melton under CC BY-SA 2.0
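Three of these performance measures reduce to small calculations over records most programs already keep. A minimal sketch, assuming hypothetical inputs: (received, fulfilled) dates per service request, a count of websites accessioned over some number of weeks, and collection counts with and without discovery records. It reports the median turnaround, an assumed choice because a few long-running requests would skew a mean. Index regeneration time is just a wall-clock measurement and is left out.

```python
from datetime import date
from statistics import median


def performance_metrics(requests, accessioned, weeks, collections, with_records):
    """requests: list of (received, fulfilled) dates; other inputs are plain counts."""
    turnaround_days = [(done - received).days for received, done in requests]
    return {
        "median_turnaround_days": median(turnaround_days),
        "accessions_per_week": accessioned / weeks,
        "discovery_record_coverage": with_records / collections,
    }


# Illustrative values only.
requests = [
    (date(2015, 8, 3), date(2015, 8, 5)),
    (date(2015, 8, 4), date(2015, 8, 11)),
    (date(2015, 8, 10), date(2015, 8, 13)),
]
perf = performance_metrics(requests, accessioned=24, weeks=4,
                           collections=30, with_records=24)
print(perf)
```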
community-valued…metrics?
[Chart: percentage of organizations (y-axis, 0% to 60%) by metric (x-axis): Volume, Usage, Cost, Quality, Buy-in, Loss, Policy]
NDSA: “Web Archiving in the United States: a 2013 Survey”
“not everything that counts can be counted”
“Ten Floods, Twenty-Five Trees, Nineteen Bubbles...” by Flood G. under CC BY-NC-ND 2.0
quality
• use case-specific?
• benchmark to ideal or to limits of tools?
• quantifiable metrics?
• existing metrics as proxies for quality?
• sampling approach?
• not just missing content but also collected junk
NYARC: “I. Introduction - NYARC Documentation”
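The slide leaves the sampling approach open. One possible answer, sketched here under stated assumptions: draw a simple random sample of captured URLs sized with Cochran's formula plus a finite population correction, then estimate the share of problems from manual review. The 5% margin of error, 95% confidence (z = 1.96), p = 0.5, and the `missing_content`/`junk` review fields are all assumptions; the last two reflect the point that collected junk counts as a quality problem too.

```python
import math
import random


def sample_size(population: int, margin: float = 0.05, z: float = 1.96,
                p: float = 0.5) -> int:
    """Cochran's formula with finite population correction.

    margin: desired margin of error; z: 1.96 for ~95% confidence;
    p=0.5 is the most conservative guess at the true problem rate.
    """
    n0 = z**2 * p * (1 - p) / margin**2
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_qa_sample(urls: list[str], seed: int = 0) -> list[str]:
    """Simple random sample (without replacement) of captured URLs to review."""
    return random.Random(seed).sample(urls, sample_size(len(urls)))


def problem_rate(reviews: list[dict]) -> float:
    """Share of reviewed URLs flagged for missing content OR collected junk."""
    flagged = [r for r in reviews if r["missing_content"] or r["junk"]]
    return len(flagged) / len(reviews)


urls = [f"http://example.edu/page/{i}" for i in range(10_000)]  # hypothetical crawl
qa_sample = draw_qa_sample(urls)
print(len(qa_sample))
```

For a 10,000-URL crawl these settings call for reviewing 370 URLs; a smaller crawl needs proportionally more of its URLs reviewed.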
buy-in
• unique nominators?
• projects w/ web archiving component?
• budgetary commitments?
• resource commitments?
• charge for service?
• testimonials?
“The Play” by Ryan Hyde under CC BY-SA 2.0
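Some of these buy-in questions have easily countable proxies, while others (testimonials, resource commitments) do not. A minimal sketch of the countable ones, assuming hypothetical nomination records with `nominator` and `unit` fields and project records with a `has_web_archiving` flag; the names and data are illustrative.

```python
def buy_in_metrics(nominations: list[dict], projects: list[dict]) -> dict[str, int]:
    """Countable proxies for buy-in; the field names are hypothetical."""
    return {
        # Distinct people and units proposing websites for capture
        "unique_nominators": len({n["nominator"] for n in nominations}),
        "nominating_units": len({n["unit"] for n in nominations}),
        # Projects that include a web archiving component
        "projects_with_web_archiving": sum(1 for p in projects if p["has_web_archiving"]),
        "projects_total": len(projects),
    }


# Illustrative records only.
nominations = [
    {"nominator": "alice", "unit": "History"},
    {"nominator": "bob", "unit": "History"},
    {"nominator": "alice", "unit": "History"},
    {"nominator": "carol", "unit": "Law Library"},
]
projects = [
    {"name": "oral histories", "has_web_archiving": True},
    {"name": "map digitization", "has_web_archiving": False},
]
bi = buy_in_metrics(nominations, projects)
print(bi)
```

Counting distinct nominators rather than nominations keeps one enthusiastic person from looking like broad institutional support.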
loss
UK Web Archive: “Ten years of the UK Web Archive: What have we saved?”
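One way to put a number on loss is to re-check a sample of archived URLs against the live web and classify each outcome. The sketch below covers only the classification and tally step. Its categories, status-code rules, and exact-hash comparison are assumptions and are not presented as the cited study's method. Byte-for-byte comparison is strict: dynamic pages (dates, session tokens) will register as "changed" even when the substance is intact.

```python
import hashlib
from collections import Counter


def classify(archived: bytes, live_status, live_body) -> str:
    """Compare one archived capture with a later check of the live URL.

    live_status is None when the host could not be reached at all.
    """
    if live_status is None or live_status in (404, 410):
        return "gone"
    if 300 <= live_status < 400:
        return "moved"
    if live_status == 200:
        same = hashlib.sha1(archived).digest() == hashlib.sha1(live_body).digest()
        return "unchanged" if same else "changed"
    return "other"


# Illustrative checks: (archived payload, live status, live payload).
checks = [
    (b"<p>a</p>", 200, b"<p>a</p>"),
    (b"<p>b</p>", 200, b"<p>b, edited</p>"),
    (b"<p>c</p>", 404, None),
    (b"<p>d</p>", None, None),
    (b"<p>e</p>", 301, None),
]
tally = Counter(classify(*c) for c in checks)
print(tally, tally["gone"] / len(checks))
```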
policy
• first capture under embargo
• opt-out requests
• takedown requests
• external environment
“We apologise for any convenience - Update” by Alan Stanton under CC BY-SA 2.0
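The first three policy items can be tracked as simple counts; the external environment cannot. A minimal sketch, assuming "first capture under embargo" means a policy that keeps a website's first capture dark for a fixed period (365 days here, a hypothetical value), and that opt-out and takedown requests are logged as (kind, date) pairs.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical embargo length; actual policies will differ.
EMBARGO = timedelta(days=365)


def policy_metrics(first_captures: dict[str, date],
                   requests: list[tuple[str, date]], today: date) -> dict[str, int]:
    """first_captures: website -> date of first capture; requests: (kind, date) pairs."""
    under_embargo = sum(1 for first in first_captures.values() if today < first + EMBARGO)
    kinds = Counter(kind for kind, _ in requests)
    return {
        "first_captures_under_embargo": under_embargo,
        "opt_out_requests": kinds["opt-out"],
        "takedown_requests": kinds["takedown"],
    }


# Illustrative records only.
pm = policy_metrics(
    first_captures={"example.edu": date(2015, 1, 10), "example.org": date(2014, 6, 1)},
    requests=[("opt-out", date(2015, 3, 2)), ("takedown", date(2015, 5, 9)),
              ("opt-out", date(2015, 7, 1))],
    today=date(2015, 8, 18),
)
print(pm)
```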
better measures, measuring better
“Line Art Project #2 VIS3 UCSD” by Mandy Jouan under CC BY-NC-ND 2.0
