Building Web Archiving Technology, Together
Nicholas Taylor
Web Archiving Service Manager
Stanford University Libraries
Web Archives 2015: Capture, Curate, Analyze
November 13, 2015
overview
• why build together?
• community for collaborative work
• APIs for collaborative work
“LAX on take off” by Doug under CC BY-NC-ND 2.0
not a programmer
“Bug” by Randall Munroe under CC BY-NC 2.5
aspiring OSS contributor
GitHub: “nullhandle (Nicholas Taylor)”
studying the landscape
“2010 Grand Canyon Celebration of Art 172” by Grand Canyon National Park under CC BY 2.0
a centralized enterprise
[Chart: External / Local / Both, 2011 vs. 2013. External: 60% (2011), 63% (2013); Local: 25% (2011), 20% (2013); Both: 14% (2011), 16% (2013)]
NDSA: “Web Archiving in the U.S.: A 2013 Survey”
a centralized enterprise
[Chart: Number of organizations by year, 1996 to 2013; legend: Archive-It Partner as of 2013]
NDSA: “Web Archiving in the U.S.: A 2013 Survey”
minimal local preservation
[Chart: Transferred / Haven't transferred, 2011 vs. 2013. Transferred: 19% (2011), 20% (2013); Haven't transferred: 81% (2011), 80% (2013)]
NDSA: “Web Archiving in the U.S.: A 2013 Survey”
evolving web
“Light Writing - Spider Web” by oz dean under CC BY-ND 2.0
opportunities for preservation
“standing out” by kenda bustami under CC BY 2.0
opportunities for research
“Exploring the Canadian Political Interest Group and Political Parties Web Sphere” by Ian Milligan under Standard YouTube License
not the only one
HUL: “Web Archiving Environmental Scan Home”
CDL: “Announcing a New Partnership”
a response
COMMUNITY
“Why we love Peckham, P1020468crop” by Eye magazine under CC BY-NC-SA 2.0
community analysis
[Diagram: SAA Web Archiving Roundtable; Archive-It Partners; IIPC]
NDSA: “Web Archiving in the U.S.: A 2013 Survey”
Archive-It
Archive-It: “Archive-It 5.0 Feature Requests”
IIPC
Open HUB: “Open Wayback”
models of software production
(irrespective of license)
• sole source
– single developer
• closed source
– team/corporate dev; no outside contributions
• club source
– pool resources for solo/team/corporate dev
• community source
– direct and distributed community participation
• open source
– grassroots, democratic, meritocratic participation
Tom Cramer: “Collaborative Open Source Software Production & APIs”
club source examples
• Archivematica, AtoM (Artefactual)
• ArchivesSpace (Lyrasis)
• BitCurator (Educopia)
• Fedora (DuraSpace)
• JHOVE (OPF)
• LOCKSS (Stanford University)
• Omeka (George Mason University)
community source examples
community architecture
• privileges community over code
• recognizes distribution of investment
• embraces community diversity
• models open processes and governance
• encourages varied contributions
• serves community needs
STANDARDS
“P1050827” by Rebecca Siegel under CC BY 2.0
success of a standard
• capture: DeDuplicator, Heritrix, python-heritrix, SiteStory, WAIL, WARCreate, WarcMITMProxy, WarcProxy, Webrecorder, wget, Wpull
• access: OpenWayback, pywb, warc-proxy, WarcManager, Wayback Machine, Web Archive Discovery, WebArchivePlayer
• utilities: JHOVE2, JWAT, Megawarc, pylibwarc, WARCAT, Warcbase, warctools, Web Archive Commons
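To illustrate why one shared format lets so many independent tools interoperate, here is a minimal sketch of reading a single record, assuming the standard in question is the WARC file format (the tool names suggest this; the slide text does not name it). The function names `parse_warc_record` and `build_example_record` are hypothetical and belong to none of the tools listed. The sketch handles only uncompressed records and treats header names as case-sensitive, which a real reader should not do.

```python
import io


def parse_warc_record(stream):
    """Read one uncompressed WARC record from a binary stream.

    Returns (version, headers, payload). Header names are kept as written;
    a production reader would match them case-insensitively.
    """
    version = stream.readline().strip().decode("utf-8")
    if not version.startswith("WARC/"):
        raise ValueError(f"not a WARC record: {version!r}")

    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):  # a blank line ends the header block
            break
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()

    payload = stream.read(int(headers["Content-Length"]))
    stream.read(4)  # consume the two CRLF pairs that terminate the record
    return version, headers, payload


def build_example_record():
    """Build a tiny, made-up 'resource' record for demonstration only."""
    payload = b"hello, archive"
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        "WARC-Target-URI: http://example.org/\r\n"
        "WARC-Date: 2015-11-13T00:00:00Z\r\n"
        "WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + payload + b"\r\n\r\n"


if __name__ == "__main__":
    version, headers, payload = parse_warc_record(io.BytesIO(build_example_record()))
    print(version, headers["WARC-Type"], payload)
```

Because every record carries its own length and terminator, a capture tool can append records and an access tool can read them back one at a time without any shared code.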
web archiving lifecycle
Internet Archive: “The Web Archiving Life Cycle Model”
missed opportunities?
• lifecycle stages: Appraisal and Selection; Scoping; Data Capture; Storage and Organization; QA and Analysis; Metadata / Description; Access / Use / Reuse; Preservation; Risk Management
• tools: ACT, Archive-It, AtN, BCWeb, CDL WAS, DigiBoard, Islandora WARC Solution Pack, Netarchive Suite, PageFreezer, UNT Nomination Tool, WCT
smaller, modular components
“Giant Rubik's Cube” by Francois Lamotte under CC BY 2.0
smaller projects do better
[Chart: small projects (<$1 million) vs. large projects (>$10 million); categories: on time/budget, challenged, failed]
Standish Group: “Chaos Manifesto 2013: Think Big, Act Small”
IIPC community interest in APIs
contribution type                      % of respondents   # of respondents
help define functional requirements    94%                15
contribute use cases                   81%                13
help define technical details          69%                11
help schedule and run meetings         19%                3
implement and test                     6%                 1
Andrea Goethals: “Results of the Web Archiving API Survey of IIPC Members”
API candidates
• capture tool/proxy interconnect
• capture tool management
• data import/export
• query + extraction
• integrity audit + repair
• descriptive metadata
• logs + analytics
• renderings/derivative formats
• federated data delivery
• federated replay
• federated full-text search
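As one way to picture what a small, modular API from this list might do, the sketch below takes the "integrity audit + repair" candidate and reduces it to two functions. Everything here is an assumption made for illustration: the names `audit` and `repair`, the use of SHA-256, and the in-memory dictionaries standing in for local storage, a replica, and a digest manifest. It does not describe any existing tool's interface.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit(store: dict, manifest: dict) -> list:
    """List names that are missing from store or whose digest differs from the manifest."""
    return sorted(
        name
        for name, digest in manifest.items()
        if name not in store or sha256_hex(store[name]) != digest
    )


def repair(store: dict, replica: dict, manifest: dict) -> list:
    """Replace damaged items with replica copies that verify; return the repaired names."""
    repaired = []
    for name in audit(store, manifest):
        candidate = replica.get(name)
        # Only trust the replica copy if it matches the manifest digest.
        if candidate is not None and sha256_hex(candidate) == manifest[name]:
            store[name] = candidate
            repaired.append(name)
    return repaired


if __name__ == "__main__":
    good = {"a.warc": b"alpha", "b.warc": b"beta"}  # hypothetical replica contents
    manifest = {name: sha256_hex(data) for name, data in good.items()}
    local = {"a.warc": b"alpha", "b.warc": b"bXta"}  # one corrupted local copy
    print(audit(local, manifest))
    print(repair(local, dict(good), manifest))
    print(audit(local, manifest))
```

Keeping the audit and the repair as separate calls means one institution could run the check while another supplies the replica, which is the kind of division of labor a shared API makes possible.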
let’s combine forces
“Stages of flow” by Peter Thoeny under CC BY-NC-SA 2.0
