The Web of Sites: Creating Effective
Web Archiving Appraisal and
Collection Development Policies
Jennifer Wright
Archives and Information Management Team Leader
SAA 2013
Session 408
The Mission of Smithsonian Archives
• Appraise, acquire, and preserve the records of the Smithsonian Institution
• Offer a range of research and reference services
• Establish policy and provide expert guidance on record keeping practices
• Create and promote products and services that broaden understanding of the Smithsonian
Websites as Records
• Smithsonian’s official definition of a record: “any official recorded information, regardless of medium or characteristics, created, received, and maintained by a Smithsonian museum, office, or employee”
Smithsonian Directive 950
Management of the Smithsonian Web
• Sets policies and procedures to ensure the integrity of content, reliability of infrastructure, and usability of websites while protecting privacy of visitors and Smithsonian’s reputation
• Requires Archives to provide dispositions for unit websites, web applications, and online exhibits
• Requires Archives to maintain historical snapshots of Smithsonian websites and related content
Smithsonian Directive 814
Social Media Policy
• Sets policy for opening and maintaining official Smithsonian social media accounts
• Requires that units notify Archives when opening and before closing a social media account
• Requires Archives to maintain registry of social media accounts and to archive information contained in the accounts according to current standards and retention policies
Why Save?
• Websites and social media profiles are Smithsonian’s public face
• Similar to a publication
• May incorporate many types of materials
• May replace other formats
Sounds straightforward.
How complicated could
appraisal possibly be?
Smithsonian’s Web Presence
• 257 websites + 10 mobile websites
• 89 blogs
• 26 apps for various platforms
• 578 social media accounts including:
  • 153 Facebook accounts
  • 105 Twitter accounts
  • 66 Flickr accounts
  • 66 YouTube accounts
http://www.si.edu/Connect
Why Not Save Everything?
• Some content already transferred to Archives in another format
• Some content is the responsibility of other units
• Some content is collections, not records
• Some content serves only as pointers to other Smithsonian and non-Smithsonian content
Other Issues Affecting Appraisal
• Certain types of files and coding don’t crawl well
  • Flash, JavaScript, some video
• Organization and coding of site may make it impossible to capture everything wanted and exclude everything unwanted
• Social media terms of service often do not allow crawling
• Users may consider social media interactions to be private
One policy doesn’t fit all
Our Policies: Public Websites
• Permanent records but may exclude:
  • Detailed collections information
  • Large sections duplicated in another format
• Crawl annually, before and after redesign, and on day of major event
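The crawl triggers on this slide (annually, before and after a redesign, and on the day of a major event) can be sketched as a small scheduling helper. This is only an illustration: the function name, the fixed annual date, and the one-day margin around a redesign launch are assumptions, not part of the Archives' stated policy.

```python
from datetime import date, timedelta


def crawl_dates(year, redesigns=(), major_events=(), annual_day=(1, 15), margin_days=1):
    """Return sorted (date, reason) pairs for one public website.

    Hypothetical helper: `annual_day` (month, day) and `margin_days`
    are illustrative defaults, not values taken from the slides.
    """
    # One routine capture per year.
    schedule = {date(year, *annual_day): "annual"}

    # Capture the old design shortly before launch and the new one shortly after.
    for launch in redesigns:
        schedule.setdefault(launch - timedelta(days=margin_days), "before redesign")
        schedule.setdefault(launch + timedelta(days=margin_days), "after redesign")

    # Capture on the day of each major event.
    for event in major_events:
        schedule.setdefault(event, "major event")

    return sorted(schedule.items())


plan = crawl_dates(
    2013,
    redesigns=[date(2013, 6, 10)],
    major_events=[date(2013, 8, 15)],
)
for day, reason in plan:
    print(day.isoformat(), reason)
```

When two triggers fall on the same date, the sketch keeps the first reason recorded and schedules a single crawl for that day.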
Our Policies: Intranets
• Individually appraised based upon content
• Generally block crawlers – permanent records must be transferred via FTP, server-to-server transfer, or external drive
• Will be restricted as appropriate
Our Policies: Social Media Accounts
• Will capture most accounts one time to show they existed and how they were used
• Will crawl, use export tool, take screenshots, or use a combination to best capture account
• Will not be made immediately available online to mitigate violations of terms of service
Our Policies: Social Media Accounts
• Must include or link to Smithsonian’s Terms of Use – no capture otherwise
http://www.si.edu/Termsofuse
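A pre-capture check for the Terms of Use link might look like the sketch below. It assumes the account page's HTML has already been saved by some other means, and it only detects a link; an account that reproduces the Terms of Use text inline would still need manual review. The helper names are hypothetical, and the default URL is the Terms of Use address shown on this slide.

```python
from html.parser import HTMLParser

# Terms of Use address shown on the slide.
TERMS_URL = "http://www.si.edu/Termsofuse"


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _normalize(url):
    """Lowercase and drop the scheme and trailing slash for a loose comparison."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url.rstrip("/")


def links_to_terms(page_html, terms_url=TERMS_URL):
    """Return True if any link on the page points at the Terms of Use."""
    collector = _LinkCollector()
    collector.feed(page_html)
    target = _normalize(terms_url)
    return any(_normalize(href) == target for href in collector.hrefs)


sample = '<p>See our <a href="https://www.si.edu/Termsofuse/">Terms of Use</a>.</p>'
print(links_to_terms(sample))
```

Shortened or redirected links would not match this comparison, so a negative result is a prompt for a closer look rather than a final answer.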
Our Policies: Social Media Accounts
• After first capture, account will be appraised annually – significant original content will be captured again
Our Policies: Blogs
• Permanent records
• Crawl annually unless there is no link to Smithsonian’s terms of use
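Taken together, the four policy slides amount to a small decision table. The sketch below is one way to express it; the class, field names, and returned wording are illustrative assumptions, and the slides say only "most" social media accounts are captured, so real appraisal involves judgment that this code does not model.

```python
from dataclasses import dataclass


@dataclass
class WebPresence:
    """Hypothetical record describing one web presence under appraisal."""
    kind: str  # "public_website", "intranet", "social_media", or "blog"
    links_to_terms: bool = True
    previously_captured: bool = False
    significant_original_content: bool = False


def capture_action(presence):
    """Map a web presence to the action suggested by the policy slides."""
    if presence.kind == "public_website":
        return "crawl annually, around redesigns, and on major event days"
    if presence.kind == "intranet":
        # Intranets generally block crawlers, so records are transferred directly.
        return "appraise individually; transfer permanent records directly"
    if presence.kind == "blog":
        return "crawl annually" if presence.links_to_terms else "no capture"
    if presence.kind == "social_media":
        if not presence.links_to_terms:
            return "no capture"
        if not presence.previously_captured:
            return "capture once"
        if presence.significant_original_content:
            return "capture again"
        return "reappraise next year"
    raise ValueError(f"unknown kind: {presence.kind!r}")


print(capture_action(WebPresence("social_media", previously_captured=True)))
```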
Questions?
Jennifer Wright
Archives and Information
Management Team Leader
wrightjm@si.edu
http://www.siarchives.si.edu/
SAA 2013 Session 408
Original Smithsonian Home
Page, launched May 8, 1995

Editor's Notes

  • #4: By this definition, any official web presence maintained by Smithsonian units is considered a record and subject to appraisal by the Archives.
  • #5: The Smithsonian also has two directives governing its web presence that give the Archives specific responsibilities.
  • #8: An organization’s web presence may be larger than you realize.
  • #9: Not to mention iTunes, Pinterest, UStream, FourSquare, Instagram, Tumblr, Google+, Wikis, Vine, Vimeo, and many others. That’s a lot of data to be captured, preserved, and stored over the long haul. We need to make sure we’re not capturing more than is necessary.
  • #11: There are also technical and legal issues affecting appraisal.
  • #12: We’ve found that one policy doesn’t fit every situation and we’ve developed general policies for different types of web presences.
  • #13: Annually is our goal, but we’re still working up to that frequency.
  • #17: On the left is my favorite example of original content. On April 30, 2012, the National Zoo live-tweeted from the artificial insemination of our giant panda. On the right is an excerpt from the Smithsonian Magazine’s Twitter feed. It simply tweets teasers and links to its blog posts and other web content. The account has immediate marketing value, but not long-term significance.