Crowdsourcing Digitization: Harnessing Workflows to Increase Output
Gretchen Gueguen, East Carolina University
Ann Hanlon, Marquette University
LITA National Forum, 2008, Cincinnati, Ohio
What is crowdsourcing?
  • Jeff Howe, Wired Magazine, 2006: “distributed labor networks are using the Internet to exploit the spare processing power of millions of human brains”
  • Best example: Wikipedia
  • Any end achieved by harnessing the wisdom and labor of crowds
  • Distributing the burden of a large endeavor
Howe, Jeff. “The Rise of Crowdsourcing.” Wired Magazine, Issue 14.06, June 2006.
Crowdsourcing Digitization
  • Crowd? Patrons and co-workers
      • Capturing digitization done for patron requests
      • Selection is driven by patron request
      • Centralized and decentralized staffing for digitization
  • Object: build robust digital collections
      • Online collections dense enough for systematic research (not just showcases)
Crowdsourcing Digitization
  • The Wisdom of Crowds: how the project was conceived and developed (the success story)
  • The Madness of Crowds: how the project failed, and why; bringing it back from the brink
  • Crowd Control: methods used and lessons learned
  • Attracting a Crowd: critical mass for the masses; why we digitize
The Wisdom of Crowds
The Wisdom of Crowds
Project background:
  • Archives and Special Collections
      • Digital image management for archives and special collections
      • Reducing redundancy: many items were requested for digitization more than once, so why not track them?
  • Digital Collections and Research (DCR)
      • New office established to coordinate digitization efforts
      • Establishing a digital repository
      • More ambitious than just image management
  • Image management = capturing the patron scanning workflow to populate the new repository
The Wisdom of Crowds
  • Coordination between Archives and Digital Collections:
      • New metadata schema
      • New best practice guidelines
  • Developing the repository
      • Fedora required development
  • Meanwhile, patron scanning continues to grow…
The Wisdom of Crowds
Answer: Scanning Database
  • Microsoft Access database: a “stop-gap measure” while the digital repository was being built
  • Corresponded to the newly created XML schema and metadata requirements for the repository
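The tracking idea behind the Scanning Database can be sketched in a few lines. The slides do not show the Access tables, so the `scans` table, its columns, and the `record_request` helper below are hypothetical, and SQLite stands in for Microsoft Access only so that the sketch runs anywhere. It illustrates one point: look the item up before scanning, and reuse the existing scan when the item has been requested before.

```python
import sqlite3


def open_db():
    """Create an in-memory stand-in for the scanning database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE scans (
               item_id TEXT PRIMARY KEY,            -- local identifier of the physical item
               title TEXT NOT NULL,
               master_file TEXT NOT NULL,           -- path to the scan kept on file
               request_count INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record_request(conn, item_id, title, new_scan_path):
    """Log a patron request and return (path_to_deliver, reused_existing_scan)."""
    row = conn.execute(
        "SELECT master_file FROM scans WHERE item_id = ?", (item_id,)
    ).fetchone()
    if row is not None:
        # The item was scanned for an earlier request: count the request, skip the scanner.
        conn.execute(
            "UPDATE scans SET request_count = request_count + 1 WHERE item_id = ?",
            (item_id,),
        )
        return row[0], True
    # First request for this item: register the new scan.
    conn.execute(
        "INSERT INTO scans (item_id, title, master_file, request_count) "
        "VALUES (?, ?, ?, 1)",
        (item_id, title, new_scan_path),
    )
    return new_scan_path, False
```

A side effect of counting requests is that `request_count` becomes a rough measure of demand, which fits the deck's point that selection is driven by patron request.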
The Wisdom of Crowds
Biggest beneficiary: University Archives
  • Receives the most scanning requests from patrons
  • Captures patron requests, as well as items scanned prior to implementation of the Scanning Database
  • University celebrating its 150th anniversary
      • Documentary
      • “Coffee table” book
      • Departmental histories
      • Nostalgic alumnae
The Wisdom of Crowds
Collections created by crowdsourcing digitization:
  • University AlbUM
  • National Trust for Historic Preservation Postcard Collection
The Madness of Crowds
The Madness of Crowds
  • Evolution
      • Evolving standards for both metadata and imaging
  • Training/Quality
  • (dis)Organization
  • Backlog
www.funnyfreepics.com
The Madness of Crowds
  • Evolution
      • Quality of legacy scans
          • File types
          • Spatial resolutions
          • Color profiles
          • Clipping, noise, and other “problems”
          • Flawed equipment
  • Training/Procedures
  • (dis)Organization
  • Backlog
The Madness of Crowds
  • Rotated 90º
  • Rotated 180º
  • 24-bit color, 300 dpi tif
  • 8-bit, 600 dpi tif
  • 48-bit color, 600 dpi tif
  • Bitonal EPS
  • 16-bit, 300 dpi JPEG
  • Indexed color, 72 dpi gif
  • PDF???
The Madness of Crowds
  • Evolution
      • Metadata quality
          • Lack of experience with controlled vocabularies and input standards
          • Changing metadata requirements
  • Training/Procedures
  • (dis)Organization
  • Backlog
It’s not quite wrong… but it’s not quite right
The Madness of Crowds
  • Evolution
  • Training/Procedures
      • No standard guidelines for scanning procedures
      • No quality control procedures for images or metadata
      • No one to set them up anyway…
  • (dis)Organization
  • Backlog
The Madness of Crowds
  • Evolution
  • Training/Procedures
  • (dis)Organization
      • Does everything fit in a “collection”?
  • Backlog
The Madness of Crowds
  • Evolution
  • Training/Procedures
  • (dis)Organization
  • Backlog
      • Robust metadata standard to enable repurposing and “sharability”
      • Could take 10x more time to do metadata than scanning
      • Volume of scanning didn’t leave much time for metadata
Crowd Control
Crowd Control
  • Create documentation
      • “Teachable” standard
  • Responsibility
  • Quality
  • Divide and conquer?!?
Crowd Control
  • Create documentation: TEACH it
  • Responsibility
  • Quality: Live it, Learn it, Love it
  • Divide and conquer
  1. Resolution
  2. Color
  3. Straightness and placement
  4. Reference points (targets)
  5. Noise
  6. File format
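One way to make a standard "teachable" is to write the six numbered checks on this slide down as an ordered checklist that anyone can run the same way. The sketch below is only an illustration: the `ScanReport` fields and every threshold (`MIN_DPI`, `ALLOWED_COLOR_MODES`, `MAX_SKEW_DEGREES`, `ALLOWED_FORMATS`) are hypothetical placeholders, not the presenters' actual standard. Several fields (skew, target, noise) are assumed to be recorded by the person reviewing the image rather than measured automatically.

```python
from dataclasses import dataclass


@dataclass
class ScanReport:
    """What a reviewer records about one scan (hypothetical fields)."""
    dpi: int
    color_mode: str        # e.g. "24-bit color", "8-bit grayscale", "indexed color"
    skew_degrees: float    # rotation away from straight, as judged by the reviewer
    has_target: bool       # a reference target is visible in the frame
    visible_noise: bool
    file_format: str       # e.g. "tif", "gif"


# Placeholder house standard; substitute the values from local documentation.
MIN_DPI = 300
ALLOWED_COLOR_MODES = {"24-bit color", "8-bit grayscale"}
MAX_SKEW_DEGREES = 1.0
ALLOWED_FORMATS = {"tif"}


def review_scan(scan):
    """Return the names of the failed checks, in the slide's order 1 to 6."""
    checks = [
        ("resolution", scan.dpi >= MIN_DPI),
        ("color", scan.color_mode in ALLOWED_COLOR_MODES),
        ("straightness and placement", abs(scan.skew_degrees) <= MAX_SKEW_DEGREES),
        ("reference points (targets)", scan.has_target),
        ("noise", not scan.visible_noise),
        ("file format", scan.file_format.lower() in ALLOWED_FORMATS),
    ]
    return [name for name, passed in checks if not passed]
```

An empty list means the scan passed every check. Keeping the checks in one fixed order means two reviewers looking at the same image should report the same failures in the same sequence.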
Crowd Control
Puglia, 2007
  • Imaging environment
  • Defined image state
      • RAW
      • Prepped for a specific output
      • Output referred: looks towards output
      • Input referred: looks towards sensor
      • Original referred: defined relationship between original and digital version
  • Current practice
  • Emerging practice
  • More technical metadata is needed
  • Should be able to get by with less technical metadata
Crowd Control
  • Create documentation: TEACH it!
  • Quality: Live it, Learn it, Love it
      • Have curatorial staff check for accuracy and completeness
      • DCR staff follow up with a check of a statistically significant portion for style and consistency
      • Eventually, give curatorial staff the ability to make corrections as they find them using the web-based administrative form
  • Responsibility
  • Divide and conquer?!?
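The slides do not say how large the "statistically significant portion" was or how it was chosen. As a hedged illustration only, the sketch below uses Cochran's sample-size formula with a finite population correction, which is one common textbook approach and not necessarily what DCR staff did. The function names, the 95% confidence level (`z=1.96`), and the 5% margin of error are all assumptions.

```python
import math
import random


def qc_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Records to review out of `population`, via Cochran's formula.

    z      -- z-score for the confidence level (1.96 is roughly 95%)
    margin -- acceptable margin of error
    p      -- expected error rate; 0.5 is the most conservative choice
    """
    if population <= 0:
        return 0
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)   # sample size for a very large batch
    n = n0 / (1 + (n0 - 1) / population)          # finite population correction
    return min(population, math.ceil(n))


def pick_records_for_review(record_ids, seed=None):
    """Draw a simple random sample of record identifiers for the follow-up check."""
    ids = list(record_ids)
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable for auditing
    return rng.sample(ids, qc_sample_size(len(ids)))
```

For small batches the formula asks for nearly every record, so sampling mostly saves effort once a batch reaches the thousands.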
Crowd Control
  • Documentation: “Teachable” standard
  • Quality: Live it, Learn it, Love it
  • Responsibility
      • Someone has to have some
      • But it doesn’t have to be an entire job
  • Divide and conquer
Crowd Control
  • Create documentation: TEACH it!
  • Quality: Live it, Learn it, Love it
  • Responsibility
  • Divide and conquer?!?
      • Stub record created at request time; cataloging enhances it
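The split between a stub record made at request time and later enhancement by cataloging can be sketched as two small steps over one record type. The field names, the `status` values, and the XML element names below are hypothetical; the project's real XML schema is not shown in the slides, so this is only a shape, not their format.

```python
from dataclasses import dataclass, replace
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    request_date: str
    subjects: tuple = ()
    description: str = ""
    status: str = "stub"


def make_stub(identifier, title, request_date):
    """Step 1: whoever takes the scanning request records the bare minimum."""
    return Record(identifier=identifier, title=title, request_date=request_date)


def enhance(stub, subjects, description):
    """Step 2: cataloging adds description later, without touching the stub's fields."""
    return replace(
        stub, subjects=tuple(subjects), description=description, status="cataloged"
    )


def to_xml(record):
    """Serialize a record with made-up element names (not the project's schema)."""
    root = ET.Element("record", status=record.status)
    ET.SubElement(root, "identifier").text = record.identifier
    ET.SubElement(root, "title").text = record.title
    ET.SubElement(root, "requestDate").text = record.request_date
    for subject in record.subjects:
        ET.SubElement(root, "subject").text = subject
    if record.description:
        ET.SubElement(root, "description").text = record.description
    return ET.tostring(root, encoding="unicode")
```

Because the stub is valid on its own, the scan can be tracked immediately, and the slower metadata work happens whenever cataloging gets to it.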
Crowd Control
  • Create documentation: TEACH it!
  • Quality: Live it, Learn it, Love it
  • Responsibility
  • Divide and conquer
  • Give up
      • Less control, more power
Crowd Control
Would you want to try this?
  • Give yourself room to evolve and change through the project
  • Don’t feel like every image is a precious snowflake
  • More than any single technique, it’s the philosophy of crowdsourcing that’s important
Crowd Control
Would you want to try this?
  • Don’t feel like every image is a precious snowflake
  • Access to a low-quality scan is still better than no access at all.
Would you want to try this?
  • More than any single technique, it’s the philosophy of crowdsourcing that’s important
Attracting a Crowd
Attracting a Crowd
Letting Go
  • “Letting go” creates efficiencies
  • Looking at expertise across the Libraries
  • Distribute the burden
  • Move away from “trophy” collections toward online Research Collections
Attracting a Crowd
Distributed Problem-solving
  • Ideas from Archives:
      • Organizing repository by subject rather than by collection
      • Dabbling in folder-level description (and digitization) rather than just item-level
  • Neutral collection-building
Erway, Ricky and Jennifer Schaffner. 2007. “Gearing Up to Get Into the Flow.” Report produced by OCLC Programs and Research (formerly RLG).
Attracting a Crowd
Distributed Problem-solving
  • Ideas from Archives:
      • Using “stub records” from patron request forms
      • Dabbling in folder-level description (and digitization) rather than just item-level
  • “Neutral” collection-building
      • Wikipedia-style collection-building
      • Building a collection with wide range
Attracting a Crowd
Mass digitization
  • Google projects:
      • Books
      • Newspapers
  • Mass decision-making instead of item-level decision-making
Attracting a Crowd
Making Digitization a Core Function of the Library
  • Mission statements come to life!
  • Organizing around digitization: very little has really been done yet
  • Why? For researchers
  • “Fringe activities” need to become core investments
      • Metadata creation
      • Digitization
Council on Library and Information Resources (CLIR). No Brief Candle: Reconceiving Research Libraries for the 21st Century, 2008.
Crowdsourcing Digitization
THANKS!
Access these slides at: http://www.personal.ecu.edu/presentations/Crowdsourcing.ppt
Or: http://www.slideshare.net
Gretchen Gueguen, [email_address], East Carolina University, Greenville, North Carolina
Ann Hanlon, [email_address], Marquette University, Milwaukee, Wisconsin

Editor's Notes

  • #2: Who we are, where we were….