Reference Rot and Link Decoration
Martin Klein
UCLA
martinklein0815@gmail.com
@mart1nkle1n
Reference Rot and Link Decoration
@mart1nkle1n
OAI9, Geneva, June 17th 2015
Hiberlink Team
• Los Alamos National Laboratory
  • Research Library: (Martin Klein), (Robert Sanderson), Harihar Shankar, Herbert Van de Sompel
• University of Edinburgh
  • Edina: Peter Burnhill, Neil Mayo, Muriel Mewissen, Christine Rees, Tim Strickland, Richard Wincewicz
  • Language Technology Group: Beatrix Alex, Claire Grover, Colin Matheson, Richard Tobin, (Ke “Adam” Zhou)
• Funding: Andrew W. Mellon Foundation
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253
Reference Rot
Link Rot
“Entertaining” Link Rot
Ubiquitous Link Rot
Content Drift
[Figure: http://dl00.org in 2000, 2004, 2005, and 2008]
NYT Coverage
Links in Supreme Court decisions:
• Link rot: 29%
• Reference rot: 49%
Scholarly Communication
[Figure labels, in order. Live web status: !Exist, !Exist, !Exist, Exist, Exist. Archival status: Archived, Archived, !Archived, Archived, Archived]
Entrance Hiberlink
• These resources:
  • Are not necessarily under the custodianship of parties that care about long-term integrity and access
  • Do not necessarily have the same sense of fixity as, e.g., journal articles
• Links to these resources are subject to Reference Rot:
  • Link Rot: the link stops working, e.g., HTTP 404
  • Content Drift: the linked content changes over time
Quantifying Reference Rot
Our Study
• Time frame of publications: Jan 1997 - Dec 2012
• Articles from arXiv, Elsevier, and PMC in XML and PDF format
• Convert PDF to XML
• Extract URIs to web-at-large resources
• Store article’s publication date
• URI live web test (a 200 OK response is trusted as a working link)
• URI archive lookup via Memento infrastructure
                               arXiv     Elsevier    PMC
total articles                 707,667   2,285,000   595,889
articles with HTTP references  142,134   94,645      156,160
number of HTTP references      346,177   232,712     480,853
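The last two steps of the study (live web test and archive lookup) can be sketched roughly as follows. This is only an illustration, not the study's actual tooling: the function names are made up for this example, the TimeGate address of the Memento Time Travel aggregator is an assumption (the slides do not name an endpoint), and treating anything other than 200 OK as "not live" is one reading of the slide.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.error
import urllib.request

# Assumed TimeGate of a Memento aggregator; the slides do not name one.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def http_status(uri, timeout=10):
    """Return the final HTTP status code for uri, or None if the request fails."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a rotten link
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, malformed URI, ...


def is_live(uri, fetch_status=http_status):
    """Live web test: only a 200 OK response counts as a working link."""
    return fetch_status(uri) == 200


def timegate_request(uri, when):
    """Build URL and headers for a Memento (RFC 7089) TimeGate lookup.

    The Accept-Datetime header asks for the archived copy closest to `when`.
    """
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE + uri, {"Accept-Datetime": stamp}
```

`fetch_status` is injectable so the check can be exercised without network access; passing the article's publication date as `when` asks the TimeGate for the copy closest to the time of citation.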
Our Corpora
[Figure: articles and URI references per publication year; one panel each for PMC, Elsevier, and arXiv]
Link Rot in arXiv
[Figure: number of HTTP references and link rot percentage per publication year]
[Figure: number of HTTP references and link rot percentage per publication year; one panel each for PMC, Elsevier, and arXiv]
Content Drift / Archival Status
arXiv, all links:
  Active 74.0%, Rotten 26.0%
  Archived 24.7%, Not Archived 75.3%
• Archival status used as proxy
• Availability of archived copy created within N days of article’s publication
• N = 14
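The proxy test can be written down in a few lines. A minimal sketch, assuming the window is symmetric (N days before or after publication), which the slide does not state; `archived_in_window` and its parameters are names invented for this example.

```python
from datetime import date, timedelta


def archived_in_window(published, memento_dates, n_days=14):
    """True if any archived copy was created within n_days of publication.

    Assumption: the window extends n_days on both sides of the date.
    """
    window = timedelta(days=n_days)
    return any(abs(captured - published) <= window for captured in memento_dates)


# Example: an article published on 2012-05-01 with two known captures.
captures = [date(2011, 1, 3), date(2012, 5, 9)]
print(archived_in_window(date(2012, 5, 1), captures))
```

A link with no capture in the window cannot be checked for content drift at all, which is why archival status stands in for it here.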
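[Figure: archival status of all links, one panel per corpus (panel labels: PMC, Elsevier, arXiv); values in slide order]
All Links: Active 74.0%, Rotten 26.0%; Archived 24.7%, Not Archived 75.3%
All Links: Active 67.3%, Rotten 32.7%; Archived 24.8%, Not Archived 75.2%
All Links: Active 80.0%, Rotten 20.0%; Archived 25.5%, Not Archived 74.5%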
Loss of Context
[Figure: all links, active links, links archived (14 days)]
STM Article Extrapolation
• Immune: article contains no URIs to web-at-large resources
• Healthy: none of the URIs to web-at-large resources suffers from link rot or content drift
• Infected: at least one URI to web-at-large resources suffers from link rot or content drift
Immune vs not Immune STM Articles
[Figure: Immune vs. not Immune STM articles per publication year, 1997 to 2012; y-axis 0 to 100]
STM Article Extrapolation
• Immune: article contains no URIs to web-at-large resources
• Healthy: none of the URIs to web-at-large resources suffers from reference rot
• Infected: at least one URI to web-at-large resources suffers from reference rot
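The three categories translate directly into a small classifier. A minimal sketch: `classify_article` and `summarize` are illustrative names, and each article is represented simply as a list of booleans, one per web-at-large URI, that is True when the URI suffers from reference rot.

```python
from collections import Counter


def classify_article(rot_flags):
    """Classify one article from per-URI reference rot flags.

    rot_flags: one boolean per web-at-large URI in the article,
    True if that URI suffers from reference rot.
    """
    if not rot_flags:
        return "immune"  # no web-at-large URIs at all
    return "infected" if any(rot_flags) else "healthy"


def summarize(corpus):
    """Count immune, healthy, and infected articles in a corpus."""
    return Counter(classify_article(flags) for flags in corpus)


# Four toy articles: no links, two good links, one bad of two, one bad link.
print(summarize([[], [False, False], [False, True], [True]]))
```

A single rotten reference is enough to mark an article as infected, so the infected share grows with the number of links per article.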
Immune, Healthy, Infected STM Articles
[Figure: Immune, Healthy, and Infected STM articles per publication year, 1997 to 2012; y-axis 0 to 100]
1/5 articles suffers from Reference Rot
An approach to solve Reference Rot
Robust Links
1. Create snapshot of linked resources in a web archive when:
  • drafting work
  • submitting article
  • publishing article
  • aggregating article
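Whichever of these moments is chosen, the snapshot step has the same shape: submit every linked URI to an archive and remember what came back. The sketch below assumes a hypothetical `archive` callable that wraps some web archive's submission interface and returns the snapshot URI; no particular archive or API is implied by the slides.

```python
from datetime import datetime, timezone


def snapshot_links(uris, archive, now=None):
    """Archive each distinct URI once.

    archive: hypothetical callable, uri -> URI of the new snapshot.
    Returns {uri: (snapshot_uri, snapshot_date)} with ISO 8601 dates.
    """
    stamp = (now or datetime.now(timezone.utc)).date().isoformat()
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return {uri: (archive(uri), stamp) for uri in dict.fromkeys(uris)}
```

The returned snapshot URI and date are the two values that later get attached to each link.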
Robust Links
1. Create snapshot of linked resources in a web archive
2. Convey creation date of your web page in machine-actionable manner
Page Creation Date
<!DOCTYPE html>
<html>
<head>
<title> … </title>
<meta itemprop="datePublished" content="2015-02-18" />
…
</head>
…
</html>
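"Machine-actionable" means a program can read the date back without guessing. A minimal sketch using only Python's standard `html.parser`; the class and function names are invented for this example, and it only looks for the `meta itemprop="datePublished"` pattern shown above.

```python
from html.parser import HTMLParser


class DatePublishedParser(HTMLParser):
    """Collect the content of the first <meta itemprop="datePublished">."""

    def __init__(self):
        super().__init__()
        self.date_published = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "meta"
                and attributes.get("itemprop") == "datePublished"
                and self.date_published is None):
            self.date_published = attributes.get("content")


def page_creation_date(html_text):
    """Return the page's datePublished value, or None if it is absent."""
    parser = DatePublishedParser()
    parser.feed(html_text)
    return parser.date_published


page = ('<!DOCTYPE html><html><head><title>Example</title>'
        '<meta itemprop="datePublished" content="2015-02-18" />'
        '</head><body></body></html>')
print(page_creation_date(page))
```

A client can use this date as the default linking datetime for any link on the page that carries no date of its own.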
Robust Links
1. Create snapshot of linked resources in a web archive
2. Convey creation date of your web page in machine-actionable manner
3. Decorate links with datetime of linking and URI of archived snapshot, in addition to resource’s original URI
http://robustlinks.mementoweb.org/spec/
Link Decoration
<a href="http://hiberlink.org/">http://hiberlink.org/</a>
<a href="http://hiberlink.org/"
   data-versionurl="http://archive.is/Bvq2v"
   data-versiondate="2014-11-01">
  http://hiberlink.org/</a>
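Producing such a decorated link is mechanical once the snapshot URI and linking date are known. The sketch below is an illustration only: `decorate_link` and `choose_target` are hypothetical helpers, and only the two `data-` attribute names are taken from the slide.

```python
from html import escape


def decorate_link(original_uri, version_uri, version_date, text=None):
    """Return an <a> element carrying the snapshot URI and linking date."""
    label = original_uri if text is None else text
    return '<a href="{}" data-versionurl="{}" data-versiondate="{}">{}</a>'.format(
        escape(original_uri), escape(version_uri),
        escape(version_date), escape(label))


def choose_target(original_uri, version_uri, original_is_live):
    """Fall back to the archived snapshot when the original stops working."""
    return original_uri if original_is_live else version_uri


print(decorate_link("http://hiberlink.org/",
                    "http://archive.is/Bvq2v", "2014-11-01"))
```

Because the original URI stays in `href`, the link keeps working for clients that ignore the extra attributes.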
http://robustlinks.mementoweb.org/demo/uri_references_js.html