Social Feed Manager
Laura Wrubel
@liblaura @SocialFeedMgr
http://go.gwu.edu/sfm
Web Archives and Digital Libraries workshop, JCDL 2016
Social Feed Manager is supported by the National Historical Publications & Records Commission
Allows users to create collections of data
from social media platforms
Open source software, not a black box
Social Feed Manager, WADL/JCDL 2016
Research documentation (for researchers)
≈ provenance metadata (for archivists)
(and it’s really important for both)
Creation
Authoring of the social media
● Creation metadata is provided by Twitter as JSON via API.
● Social media user metadata:
○ Screen name
○ Date account created
○ Location
● Tweet metadata:
○ Date
○ Tweet text
○ Mentions
○ Hashtags
○ URLs
○ Source (how posted)
● SFM records it in WARC files.
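As an illustration of the fields listed above, here is a minimal sketch that pulls them out of a tweet as returned by Twitter's REST API v1.1. The sample tweet is invented, and `creation_metadata` is a hypothetical helper, not part of SFM.

```python
# Invented, abbreviated tweet using Twitter REST API v1.1 field names.
SAMPLE_TWEET = {
    "created_at": "Wed Jun 22 14:00:00 +0000 2016",
    "text": "Talking #provenance with @SocialFeedMgr https://t.co/example",
    "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    "user": {
        "screen_name": "example_user",
        "created_at": "Mon Mar 02 09:30:00 +0000 2009",
        "location": "Washington, DC",
    },
    "entities": {
        "hashtags": [{"text": "provenance"}],
        "user_mentions": [{"screen_name": "SocialFeedMgr"}],
        "urls": [{"url": "https://t.co/example",
                  "expanded_url": "http://go.gwu.edu/sfm"}],
    },
}


def creation_metadata(tweet):
    """Collect the creation metadata named on the slide into one flat dict.

    Hypothetical helper for illustration; missing fields come back as
    None or an empty list rather than raising.
    """
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return {
        # Social media user metadata
        "screen_name": user.get("screen_name"),
        "account_created": user.get("created_at"),
        "location": user.get("location"),
        # Tweet metadata
        "date": tweet.get("created_at"),
        "text": tweet.get("text"),
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "source": tweet.get("source"),
    }
```

Calling `creation_metadata(SAMPLE_TWEET)` returns the user and tweet fields side by side, which is roughly the view a researcher would want when documenting who authored a tweet and how it was posted.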
Selection
Decisions by the SFM user that lead SFM to harvest the tweet
Recorded in the SFM database
● Collection information
○ Harvest type
○ Harvest options (e.g., incremental, harvest web resources)
○ Credentials (API keys)
○ Description of collection
● Seeds for the collection (which vary by platform)
○ Screen name
○ UID
○ Keywords to filter on
● Change log
○ Change note
○ Fields changed
○ User who made change
○ Date of change
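One way to picture the change log is as a record appended whenever a collection's settings are edited. The sketch below is an assumption about the shape of such a record, not SFM's actual database schema; the class and field names (`Collection`, `ChangeLogEntry`, `update`) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeLogEntry:
    note: str             # change note
    fields_changed: dict  # field name -> (old value, new value)
    user: str             # user who made the change
    date: datetime        # date of change


@dataclass
class Collection:
    harvest_type: str
    description: str
    seeds: list
    change_log: list = field(default_factory=list)

    def update(self, user, note, **changes):
        """Apply changes and log only the fields whose value really changed."""
        changed = {}
        for name, new in changes.items():
            old = getattr(self, name)
            if old != new:
                changed[name] = (old, new)
                setattr(self, name, new)
        if changed:
            self.change_log.append(
                ChangeLogEntry(note, changed, user, datetime.now(timezone.utc))
            )
        return changed
```

Keeping the old and new values together means a later reader can reconstruct which seeds were in effect at the time of any given harvest.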
Collection
How SFM retrieved the tweet from Twitter’s API
● Collection metadata is received by SFM’s Twitter harvester & recorded
within WARCs.
● WARCs include the exact HTTP request/response
○ URL with params such as user account id or keywords
○ HTTP headers
● WARC record headers also include:
○ Date WARC record created
○ Server information
○ Fixity digests
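To make the WARC record headers concrete, the following sketch parses one header block using only the standard library and recovers the API parameters from the target URI. The header field names follow the WARC 1.0 format; the sample values (IP address, digest, user id) are invented, and `parse_warc_header` and `harvest_params` are hypothetical helpers. A real tool would more likely read records with a dedicated WARC library.

```python
from urllib.parse import urlsplit, parse_qs

# Invented header block for one response record; values are placeholders.
SAMPLE_HEADER = (
    "WARC/1.0\r\n"
    "WARC-Type: response\r\n"
    "WARC-Date: 2016-06-22T14:00:00Z\r\n"
    "WARC-Target-URI: https://api.twitter.com/1.1/statuses/"
    "user_timeline.json?user_id=12345&count=200\r\n"
    "WARC-IP-Address: 192.0.2.10\r\n"
    "WARC-Payload-Digest: sha1:PLACEHOLDERDIGEST\r\n"
    "Content-Type: application/http; msgtype=response\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_warc_header(block):
    """Split a WARC header block into (version line, dict of named fields).

    Simplified: stops at the first blank line and ignores folded
    (continuation) lines.
    """
    lines = block.split("\r\n")
    version = lines[0]
    fields = {}
    for line in lines[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, fields


def harvest_params(fields):
    """Return the query parameters SFM sent, taken from WARC-Target-URI."""
    query = urlsplit(fields["WARC-Target-URI"]).query
    return {key: values[0] for key, values in parse_qs(query).items()}
```

Here `harvest_params` exposes the user account id that was requested, while `WARC-Date`, `WARC-IP-Address`, and `WARC-Payload-Digest` carry the record date, server information, and fixity.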
Collection (cont.)
● WARC file metadata, recorded in the SFM database:
○ File location
○ File size
○ Fixity
○ Creation date
● Harvest metadata:
○ Date
○ Collection
○ Date harvest started
○ Date harvest ended
○ Messages (informational, warning, or error)
○ Token/seed updates
○ Basic stats on number of items collected
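The WARC file metadata above (location, size, fixity, date) can be gathered with a few standard-library calls. This is a sketch under stated assumptions: SHA-1 is chosen only as an example algorithm, `warc_file_metadata` is a hypothetical helper, and the file's modification time stands in for its creation date because a true creation time is not available on every operating system.

```python
import hashlib
import os
from datetime import datetime, timezone


def warc_file_metadata(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return location, size, fixity, and timestamp for one file on disk."""
    digest = hashlib.new(algorithm)
    # Read in chunks so large WARC files are not loaded into memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    stat = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "bytes": stat.st_size,
        "fixity": f"{algorithm}:{digest.hexdigest()}",
        # Modification time, used here as a stand-in for creation date.
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
```

Recomputing the digest later and comparing it with the stored value is the usual way to confirm that a file has not changed since the harvest.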
Working paper: http://bit.ly/tweet-prov
Comments welcome!
How is this useful? http://bit.ly/tweet-prov
● Which of this provenance metadata do you (researcher,
archivist, librarian, etc.) want access to?
● How do you want access to this metadata? In SFM’s UI? In
reports when exports are created? Exposed via SFM’s
software libraries? A REST API? Machine-readable?
Human-readable?
● What metadata have we missed?
● Do the answers to the previous questions vary by discipline
(e.g., humanities, social science, etc.)?
● Are there other relevant specifications or standards that we
should consider? Is there value in mapping to, or providing
output in accordance with, metadata standards such as
PREMIS or PROV?
