a transcript game to make historic
public broadcasting more discoverable
Casey E. Davis Kaufman
Associate Director, WGBH Media Library and Archives
Project Manager, American Archive of Public Broadcasting
WGBH Educational Foundation
Boston, Massachusetts
• We produce at least one third of the
content broadcast on PBS
• NOVA, FRONTLINE, Antiques Roadshow,
American Experience,
Masterpiece Theatre, Arthur,
Curious George
Library of Congress
National Audio-Visual Conservation Center
Culpeper, Virginia
FIX IT - A Transcript Game to Make Historic Public Broadcasting More Discoverable
americanarchive.org
fixit.americanarchive.org
@amarchivepub
facebook.com/amarchivepub
the situation
■ 68,000+ digitized television and radio programs
■ incomplete, inaccurate metadata records
■ limited staff resources
■ we need to know what we have in the collection
■ we have a responsibility to users to provide access to the collection
■ continued growth of the collection (content and sparse metadata)
AV crowdsourcing precedents
github.com/WGBH/fixit
github.com/popuparchive/american-archive-kaldi
marketing
• targeting outreach to several audiences, including
– senior citizens, senior centers, senior living facilities
– K-12 students
– volunteer opportunity seekers (idealist.com, volunteermatch.com, etc.)
– formerly incarcerated individuals
– speech-language pathologists
– literacy initiatives
– participating stations and public media enthusiasts
– archivists, librarians, history lovers
• future plan to organize edit-a-thons
once corrected…
• JSON transcripts will be stored on AAPB’s Amazon S3 account
• transcripts will be indexed for keyword searching on the
AAPB website
• transcripts will be made available alongside the media on the
record page
• transcripts can play as captions within the player
• transcripts can be harvested via an API and used as a dataset
for research such as a digital humanities project
• transcripts can be provided to the stations that contributed
the content
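
As a rough illustration of the "transcripts can play as captions" step, here is a minimal Python sketch that turns a corrected JSON transcript into a WebVTT caption file. The JSON layout used here (a `parts` list with `start_time`, `end_time`, and `text` fields) is an assumption made for the example, not the documented AAPB or FIX IT format, and the function names are hypothetical.

```python
import json


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcript_to_webvtt(transcript: dict) -> str:
    """Build WebVTT text from a transcript dict.

    Assumed (hypothetical) schema: {"parts": [{"start_time", "end_time", "text"}]}.
    """
    lines = ["WEBVTT", ""]
    for part in transcript["parts"]:
        start = to_timestamp(part["start_time"])
        end = to_timestamp(part["end_time"])
        lines.append(f"{start} --> {end}")
        lines.append(part["text"])
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


# Hypothetical sample transcript, standing in for a file fetched from storage.
sample = json.loads(
    '{"parts": ['
    '{"start_time": 0.0, "end_time": 4.5, "text": "Good evening."},'
    '{"start_time": 4.5, "end_time": 9.25, "text": "Welcome to the program."}'
    "]}"
)

if __name__ == "__main__":
    print(transcript_to_webvtt(sample))
```

The same time-coded segments could also feed a keyword index, since each segment pairs searchable text with the point in the recording where it is spoken.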
americanarchive.org
fixit.americanarchive.org
@amarchivepub
facebook.com/amarchivepub
Thank you!
Casey E. Davis Kaufman
Casey_Davis-Kaufman@wgbh.org
@caseyedavis1

Editor's Notes

  • #2: Hello everyone, my name is Casey Davis Kaufman and I’m here today to tell you about the American Archive of Public Broadcasting’s new transcript crowdsourcing game FIX IT.
  • #3: Who are we? The American Archive of Public Broadcasting is a collaboration between the Library of Congress and WGBH to preserve significant public broadcasting before its content is lost to posterity, and to provide a central web portal for access to the historic programming created by public media over the last 60+ years. WGBH is one of the primary producers of public broadcasting content distributed by PBS. You may be familiar with our programming – NOVA, FRONTLINE, American Experience, Masterpiece Theatre, Antiques Roadshow. Our archive dates back to 1947 for radio and 1951 for TV. And we all know the Library of Congress….
  • #4: We’ve digitized more than 50,000 hours of historic public broadcasting television and radio programming. The entire collection is accessible on location at WGBH and the Library of Congress, and more than 20,000 programs are available online in our Online Reading Room.
  • #5: Check out our website, our FIX IT game, and follow us on social media.
  • #7: Today I’m presenting on an AAPB grant project funded by the Institute of Museum and Library Services, a collaboration among WGBH, Pop Up Archive, and the University of Texas at Austin’s School of Information. The goal of this project is to develop methods of improving access to audiovisual collections through speech-to-text and soundwave analysis. During the project, we worked with Pop Up Archive to create speech-to-text transcripts of the entire AAPB collection. Understanding that speech-to-text produces transcripts that are not 100% accurate, WGBH’s role in the project was to develop a game to engage the public in the correction of these often error-prone transcripts.
  • #20: The speech-to-text tool is built on Kaldi, an existing open-source speech recognition toolkit. Pop Up Archive has just released the code on GitHub, but it is not yet tested and we are not sure about the documentation. It supports English only.