Mechanical Turk for Social Science
Sean Munson, Eytan Bakshy
School of Information, UMich
28 October 2009
11:00 am - Problem: Need to classify thousands of blogs by category.
Lunch*
*not actual lunch
Mechanical Turk for Social Science Introduction
1:00 pm - 50 blogs classified, 5x each
Mechanical Turk for Social Science Awesome
Sean Munson, Eytan Bakshy
An API made of people!
Overview
• Who are the Turkers?
• Tasks suitable for Mechanical Turk, and workarounds for tasks that are semi-suitable
• Tasks from Turkers’ and requesters’ points of view
• Examples
    • Classifying links
    • Reacting to collections of links
• Practicalities
    • Tools
    • Paying Turkers at UMich
    • Human subjects
Slides will be available online.
Who are the Turkers?
Andy Baio, Faces of Mechanical Turk
Survey of 300 Turkers by Panos Ipeirotis
Limited by self-selection: respondents are people willing to do a task when only one is available, and at that pay.
By country: 76% US; 8% India; 3% UK; 2% Canada
Ideal types of tasks
• Short duration
• Repetitive: the Turker learns once, repeats many times
• No particular expertise required
• From the requester's perspective: human input is verifiable with less effort than it would take to do the task yourself or to pay an expert. E.g., for tasks that require people to write something, assess quality using multiple raters.
...but you can use it in other ways.
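For the multiple-raters idea above, one simple way to combine labels is a plain majority vote over the labels several Turkers give the same item (for example, a blog classified 5x). This is a minimal sketch, not the authors' procedure. The category names, item ids, and the 60% agreement cutoff are made up for illustration. The answer-quality references (Carpenter; Raykar et al.) describe models that instead weight each rater by estimated reliability.

```python
from collections import Counter

def majority_label(labels):
    """Return (most common label, share of raters who chose it).

    Counter.most_common orders equal counts by first occurrence, so ties
    are broken arbitrarily; the low agreement share is what flags them.
    """
    if not labels:
        raise ValueError("need at least one label")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)

def aggregate(ratings, min_agreement=0.6):
    """Split items into accepted labels and items needing another look.

    ratings maps an item id to the labels several Turkers gave it.
    min_agreement is an illustrative cutoff, not a recommended value.
    """
    accepted, needs_review = {}, []
    for item, labels in ratings.items():
        label, agreement = majority_label(labels)
        if agreement >= min_agreement:
            accepted[item] = label
        else:
            needs_review.append(item)
    return accepted, needs_review

# Hypothetical example: two blogs, each rated 5x.
ratings = {
    "blog-001": ["politics", "politics", "politics", "tech", "politics"],
    "blog-002": ["sports", "tech", "food", "tech", "sports"],
}
accepted, needs_review = aggregate(ratings)
print(accepted)      # {'blog-001': 'politics'}
print(needs_review)  # ['blog-002']
```

Items that fail the cutoff can be sent out for more ratings rather than discarded.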
Turker: Task listing – preview & select task → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
Requester: Create task type → Load task instances (prepay)
(Photo: Michelle Gibson, Flickr)
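"Load task instances (prepay)" means the requester funds the account before tasks go live. The following is a minimal sketch for estimating that prepayment. It assumes a flat commission charged as a fraction of worker pay and ignores any per-task minimum fee. The rate is a parameter, not a quoted Amazon fee. The function name and example numbers are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def prepay_estimate(num_tasks, assignments_per_task, reward, commission_rate):
    """Estimate what to prepay before loading task instances.

    reward: payment per assignment (one Turker doing one task), in dollars.
    commission_rate: assumed fee as a fraction of worker pay; a parameter
    because fee schedules change. Per-task minimum fees are ignored.
    Returns (worker_pay, total_to_prepay) as Decimals.
    """
    reward = Decimal(str(reward))
    rate = Decimal(str(commission_rate))
    worker_pay = num_tasks * assignments_per_task * reward
    total = (worker_pay * (1 + rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return worker_pay, total

# Illustrative numbers only: 1,000 blogs, 5 labels each, $0.05 per label,
# and an assumed 10% commission.
worker_pay, total = prepay_estimate(1000, 5, 0.05, 0.10)
print(worker_pay, total)  # 250.00 275.00
```

Decimal avoids floating-point drift when multiplying many small per-task rewards.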
Turker: Task listing – preview & select task → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
Requester: Create task type → Load task instances (prepay) → Approve or reject tasks
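Approving or rejecting work is where verifiability pays off. One common approach, which is an assumption here because the slides do not say how the authors decided, is to mix a few items with known answers into each task and score each assignment on those. The following is a minimal sketch. The question ids, labels, and 75% cutoff are hypothetical. Rejections also lower a Turker's approval rating, so borderline cases may deserve a manual look.

```python
def review_assignment(answers, gold, min_accuracy=0.75):
    """Suggest approve/reject for one assignment using known-answer items.

    answers: question id -> the Turker's answer
    gold:    question id -> known correct answer, for the few items in the
             task whose answer you already know
    Returns "approve", "reject", or "review" (no gold items answered).
    """
    scored = [q for q in gold if q in answers]
    if not scored:
        return "review"
    correct = sum(answers[q] == gold[q] for q in scored)
    return "approve" if correct / len(scored) >= min_accuracy else "reject"

gold = {"q3": "news", "q7": "personal"}
print(review_assignment({"q1": "tech", "q3": "news", "q7": "personal"}, gold))  # approve
print(review_assignment({"q3": "sports", "q7": "personal"}, gold))              # reject
print(review_assignment({"q1": "tech"}, gold))                                  # review
```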
Turkers as Classifiers
Large-scale study of diffusion and influence on Twitter
• How does the spread of a URL over the Twitter network depend on its content?
• What proportion of “influential” users are mass media vs. individuals?
• Requires thousands of labels of URLs and users. Needs to be fast and cheap.
Turkers as Subjects
Turkers as Subjects – Challenges
• Hard to check answer quality when you want opinions!
• Screening & treatment randomization
• mTurk is not optimized for one-time (1x) tasks
How to screen?
[Figure: political ideology (Liberal / Conservative) crossed with party (Democrat / Republican)]
Turker: Task listing – preview & select task → Take qualification → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
Requester: Create task type (require 95% task approval rating; require US location) → Load task instances (prepay) → Approve or reject tasks
Qualification: ask demographics and political preferences
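The screening above combines two built-in requirements (approval rating, location) with a custom qualification on ideology and party. Per the presenter notes, the study wanted only Conservative Republicans and Liberal Democrats. The following is a minimal local sketch of that decision. The `Applicant` record and the answer strings are hypothetical. On mTurk, Amazon checks the first two requirements when a worker tries to accept the task, so they are repeated here only to show the whole rule in one place.

```python
from dataclasses import dataclass

# Target cells for this study: ideologically consistent partisans only.
TARGET_CELLS = {("conservative", "republican"), ("liberal", "democrat")}

@dataclass
class Applicant:
    # Hypothetical record for illustration; not an mTurk data structure.
    approval_rate: float  # percent of past assignments approved (0-100)
    country: str          # worker's registered country, e.g. "US"
    ideology: str         # self-reported, from the qualification test
    party: str            # self-reported, from the qualification test

def screen(a, min_approval=95.0, country="US"):
    """Return (granted, reason) for one qualification response."""
    if a.approval_rate < min_approval:
        return False, "approval rate below threshold"
    if a.country != country:
        return False, "outside required location"
    cell = (a.ideology.strip().lower(), a.party.strip().lower())
    if cell not in TARGET_CELLS:
        return False, "not in a target ideology/party cell"
    return True, "granted"

print(screen(Applicant(98.5, "US", "Liberal", "Democrat")))
# (True, 'granted')
print(screen(Applicant(99.0, "US", "Liberal", "Republican")))
# (False, 'not in a target ideology/party cell')
print(screen(Applicant(90.0, "US", "Conservative", "Republican")))
# (False, 'approval rate below threshold')
```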
Turker: Task listing – preview & select task → Take qualification → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
Requester: Create or use existing qualification → Create task type → Load task instances (prepay) → Evaluate qualification: grant or reject → Approve or reject tasks
Checking for validity
Couldn’t ask verifiable questions (Kittur and Chi) about the collection without affecting how subjects look at the list.
Did have demographic info from the qualification. Randomly selected a question to repeat, and removed people for gender changes, aging backwards, or major changes in political preferences.
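The consistency check above re-asks one randomly chosen qualification question and compares the answers. The following is a minimal sketch. It assumes age is recorded in years and political preference on a numeric scale (for example 1-7). The scale, the 2-point threshold for a "major" change, and the question names are all hypothetical.

```python
import random

def pick_repeat_question(questions, rng):
    """Choose one qualification question to ask again inside the task."""
    return rng.choice(sorted(questions))

def inconsistencies(qual, repeat, max_political_shift=2):
    """Compare a repeated answer with the qualification answers.

    qual, repeat: question name -> answer. Age is in years; political
    preference is assumed to be a numeric scale. Returns a list of
    problems (empty if the answers are consistent).
    """
    problems = []
    for key, new in repeat.items():
        old = qual.get(key)
        if old is None:
            continue
        if key == "gender" and new != old:
            problems.append("gender changed")
        elif key == "age" and new < old:
            problems.append("aged backwards")
        elif key == "political_scale" and abs(new - old) > max_political_shift:
            problems.append("major political shift")
    return problems

qual = {"gender": "F", "age": 34, "political_scale": 2}
question = pick_repeat_question(qual, random.Random(42))  # one of the three keys
print(inconsistencies(qual, {"age": 31}))             # ['aged backwards']
print(inconsistencies(qual, {"political_scale": 6}))  # ['major political shift']
print(inconsistencies(qual, {"gender": "F"}))         # []
```

Passing a seeded `random.Random` keeps the choice of repeated question reproducible for later auditing.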
Total cost: $382 for 485 collection ratings
Had to pay more (~$12/hr) because only one task was available at a time, plus a required (unpaid) qualification.
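The figures above imply a per-rating cost, and, at the stated ~$12/hr, an approximate amount of paid time per rating. This back-of-the-envelope sketch assumes the whole $382 went to Turkers. If the total also covers Amazon's commission, the implied paid time per rating is somewhat lower.

```python
# Figures from the slide.
total_cost = 382.00   # dollars
ratings = 485         # collection ratings
hourly_rate = 12.00   # approximate effective pay, dollars per hour

cost_per_rating = total_cost / ratings
# Assumes the full budget went to Turkers at ~$12/hr.
minutes_per_rating = cost_per_rating / hourly_rate * 60

print(f"${cost_per_rating:.2f} per rating")           # $0.79 per rating
print(f"{minutes_per_rating:.1f} paid minutes each")  # 3.9 paid minutes each
```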
Practicalities
Tools
• Web interface: WYSIWYG editor, CSV upload of tasks. Many task templates to use as starting points. Very simple and fast to use, but limited in capability.
• Command line tools: Required to create custom qualifications or use multiple qualifications. Much more flexibility. Input format is XML. Documentation is adequate; the overall experience is clunky.
• Other libraries (e.g. http://developer.amazonwebservices.com/connect/entry.jspa?externalID=827&categoryID=85)
• 3rd-party tools: Almost as easy to use as Amazon’s web interface, and support nearly all features of the command line tools, but they take a cut.
    • CrowdFlower (from Dolores Labs): crowdflower.com
    • Smartsheet: smartsheet.com/product/smartsourcing
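For the web interface's CSV upload, each row of the uploaded file becomes one task instance. The following is a minimal sketch that builds such a file with Python's standard `csv` module. The column name `url` and the links are hypothetical. The column header has to match a placeholder in the task template. Recent versions of the requester interface write placeholders as `${column_name}`, but check the version you are using.

```python
import csv
import io

def tasks_to_csv(rows, fieldnames):
    """Build a batch file for CSV upload in the requester web interface.

    rows: one dict per task instance, keyed by column name. The csv module
    quotes values containing commas or quotes, which matters for URLs with
    query strings.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

links = ["http://example.com/a", "http://example.com/search?q=x,y"]
out = tasks_to_csv([{"url": u} for u in links], ["url"])
print(out, end="")
# url
# http://example.com/a
# "http://example.com/search?q=x,y"
```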
Human subjects?
• Human subjects status varies with design:
    • Categorizing content: not human subjects
    • Asking for reactions to content: human subjects
• Informed consent
    • My preference has been to argue for a waiver of informed consent. (Mechanical Turk’s terms of service prohibit collecting identifiable information.)
    • If you have a task where you feel informed consent is appropriate, have extended consent information, and have repetitive tasks, you can use a qualification (e.g., to present the consent form).
Subject payment
mTurk handles all payment, but associate your account with the University of Michigan employer ID number, in case any one person earns more than the IRS reporting limit across all Michigan mTurk studies.
Stacy Callahan or I have more information.
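Because the reporting limit applies to a person's total across all Michigan mTurk studies, it helps to total each worker's earnings across studies. The following is a minimal sketch. The worker ids, study names, and amounts are hypothetical. The limit is a parameter: the $600 in the example is illustrative only, so confirm the current figure and rule with the university. Amounts are in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict

def workers_over_limit(payments, limit_cents):
    """Total each worker's earnings across all studies under one account.

    payments: iterable of (worker_id, study_name, amount_in_cents).
    Flags totals at or above the limit, erring toward over-reporting.
    """
    totals = defaultdict(int)
    for worker_id, _study, cents in payments:
        totals[worker_id] += cents
    return {w: t for w, t in totals.items() if t >= limit_cents}

payments = [
    ("A1EXAMPLE", "blog-classification", 45_000),
    ("A1EXAMPLE", "collection-ratings", 20_000),
    ("A2EXAMPLE", "collection-ratings", 3_000),
]
flagged = workers_over_limit(payments, limit_cents=60_000)
print(flagged)  # {'A1EXAMPLE': 65000}
```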
Turker: Task listing – preview & select task → Take qualification → Complete task → Get paid → Automatically accept another task of this type, or go find a new task
Requester: Create or use existing qualification (must be hosted by Amazon) → Create task type → Load task instances (prepay) → Evaluate qualification: grant or reject → Approve or reject tasks
Scoring the qualification:
• Automatically score: instant grant/reject; requires right and wrong answers.
• Download & score: good for participant screening; fast turnaround (run every minute); allows random assignment.
• Can set limits on retaking. Too many rejects? Revoke the qualification.
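With "download & score," you poll for new qualification submissions (for example, every minute), apply your own rule, and grant or reject each one. That is also the natural place to cap retakes and randomly assign granted workers to conditions. The following is a minimal sketch of one polling pass. It does not cover fetching submissions from or posting decisions to mTurk. The function names, the two-question key, and the condition names are hypothetical.

```python
import random

def score_pass(submissions, passes, attempts, max_attempts, conditions, rng):
    """One polling pass of a "download & score" qualification.

    submissions: (worker_id, answers) pairs downloaded since the last pass.
    passes: function(answers) -> bool, your scoring or screening rule.
    attempts: worker_id -> attempts so far; updated in place so the limit
    on retaking holds across passes.
    Returns worker_id -> ("grant", condition) or ("reject", None). Each
    granted worker is assigned uniformly at random to one condition.
    """
    decisions = {}
    for worker_id, answers in submissions:
        attempts[worker_id] = attempts.get(worker_id, 0) + 1
        if attempts[worker_id] > max_attempts or not passes(answers):
            decisions[worker_id] = ("reject", None)
        else:
            decisions[worker_id] = ("grant", rng.choice(conditions))
    return decisions

def knows_task(answers):
    # Hypothetical answer key for a two-question qualification.
    return answers.get("q1") == "b" and answers.get("q2") == "d"

attempts = {}
rng = random.Random(0)
first = score_pass(
    [("W1", {"q1": "b", "q2": "d"}), ("W2", {"q1": "a", "q2": "d"})],
    knows_task, attempts, max_attempts=2,
    conditions=["control", "treatment"], rng=rng,
)
print(first["W1"][0], first["W2"])  # grant ('reject', None)
```

Uniform random choice can leave conditions unbalanced in small samples. Block randomization (shuffling a list that repeats each condition equally) is a common alternative.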
Built-in qualifications exist for location and reputation.
Requesters can assign people to dummy qualifications to let them take follow-up studies, and can email them through mTurk. You can also exclude people this way to maintain a naive sample.
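The dummy-qualification trick works in both directions. Holding the qualification can be required (for follow-ups) or disqualifying (to keep a naive sample). The following is a minimal local model of that logic. On mTurk, the qualification itself lives on Amazon's side and the task's requirements do the filtering. How you express "must not hold" depends on the tools you use. The qualification name and worker ids are hypothetical.

```python
def mark_participants(held, worker_ids, dummy_qual):
    """Assign a dummy qualification to everyone who completed a study."""
    for w in worker_ids:
        held.setdefault(w, set()).add(dummy_qual)

def eligible(worker_id, held, require=(), exclude=()):
    """held: worker_id -> set of qualification names that worker holds."""
    quals = held.get(worker_id, set())
    return all(q in quals for q in require) and not any(q in quals for q in exclude)

held = {}
mark_participants(held, ["W1", "W2"], "took-study-1")
print(eligible("W1", held, require=["took-study-1"]))  # True: follow-up study
print(eligible("W3", held, require=["took-study-1"]))  # False
print(eligible("W1", held, exclude=["took-study-1"]))  # False: kept out of a fresh sample
print(eligible("W3", held, exclude=["took-study-1"]))  # True
```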
Some references & resources
General
• Dolores Labs blog: http://blog.doloreslabs.com/
• Turker Nation forums: http://turkers.proboards.com
• 5 study how-tos from Markus Jakobsson (PARC): http://blogs.parc.com/blog/2009/07/experimenting-on-mechanical-turk-5-how-tos/
Turker demographics
• Survey by Panos Ipeirotis: http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html
• Turker demographics vs. Internet demographics: http://behind-the-enemy-lines.blogspot.com/2009/03/turker-demographics-vs-internet.html
• Why do people participate: http://behind-the-enemy-lines.blogspot.com/2008/03/why-people-participate-on-mechanical.html
• Why do people participate (more): http://www.floozyspeak.com/blog/archives/2008/08/valley_of_the_t.html

Improving answer quality
• Aniket Kittur, Ed H. Chi, and Bongwon Suh (2008). “Crowdsourcing user studies with Mechanical Turk.” CHI 2008.
Answer quality and dealing with bad answers
• Bob Carpenter (2008). Hierarchical Bayesian Models of Categorical Data.
• Raykar et al. (2009). “Supervised Learning from Multiple Experts: Whom to Trust when Everyone Lies a Bit.” ICML 2009.
• Worker quality & HIT difficulty: http://behind-the-enemy-lines.blogspot.com/2008/08/mechanical-turk-worker-quality-and-hit.html
• Also see the literature on scoring a test without an answer key.
Turker effort, skills, participation rate, and pay
• W. Mason and D. Watts (2009). “Financial Incentives and the Performance of Crowds.” KDD Workshop on Human Computation.
• Self-report on skills: http://behind-the-enemy-lines.blogspot.com/2009/01/how-good-are-you-turker.html
Human subjects
• Consent in qualification tests: http://behind-the-enemy-lines.blogspot.com/2009/08/get-consent-form-for-irb-on-mturk-using.html
• Discussion: http://behind-the-enemy-lines.blogspot.com/2009/01/mechanical-turk-human-subjects-and-irbs.html

Editor's Notes

  • #19: Tasks can be sorted by price or number of HITs available, among other things. To increase participation, you generally want to appear higher on at least one of these lists.
  • #38: For this study, we wanted Conservative Republicans and Liberal Democrats, not people with neutral views, Liberal Republicans, or Conservative Democrats.
  • #39: Restricting who can participate.
  • #40: If not automatically scored, the qualification introduces an even bigger delay in the process, and you’ll lose workers. But scoring it yourself allows a lot more control and lets you retain Turker answer data.