What Can Machine Learning & Crowdsourcing
Do for You?
Exploring New Tools for Scalable Data Processing
Matt Lease
School of Information @mattlease
University of Texas at Austin ml@utexas.edu
Slides:
slideshare.net/mattlease
What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at 65 universities around the world
www.ischools.org
Roadmap
• Machine Learning (AI) lets us automate many
useful tasks, e.g., natural language processing (NLP)
• Crowdsourcing enables new levels of efficiency &
scalability in data collection & processing
• Human Computation lets us build next-generation
applications today, with capabilities beyond AI
Motivation: Applications
Automatic/Hybrid Fact Checking
• http://fcweb.pythonanywhere.com
– Nguyen et al., AAAI 2018
MemeBrowser
• http://odyssey.ischool.utexas.edu/mb/
– Ryu et al., HyperText 2012
Dating Biographies without Time Mentions
• Kumar et al., CIKM 2011
Plato (428-348 B.C.) Lincoln (1809-1865)
Transcription & Copy-Editing
• Spontaneous speech is often disfluent, with repetitions,
corrections, and vocalized space-fillers
• Lease, Charniak, and Johnson, 2005
• Zhou, Baskov, and Lease, 2013 (& Zhou’s Thesis)
S1: Uh first um i need to know uh how do you feel about uh about
sending uh an elderly uh family member to a nursing home
S2: Well of course it's you know it's one of the last few things in the
world you'd ever want to do you know unless it's just you know really
you know uh for their uh you know for their own good
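To make the copy-editing task concrete, here is a deliberately naive sketch that strips a small, assumed list of fillers ("uh", "um", "you know") and collapses immediate word repetitions. It is not the method of the cited papers, which rely on learned models; the filler list and the function name `strip_disfluencies` are illustrative assumptions.

```python
import re

# Assumed filler inventory, chosen only to match the example dialogue above.
FILLERS = ("you know", "uh", "um")


def strip_disfluencies(utterance: str) -> str:
    """Remove filler phrases and immediate word repetitions from one utterance.

    The text is lowercased first so that "Uh" and "uh" are treated alike.
    """
    text = utterance.lower()
    for filler in FILLERS:
        # \b keeps us from deleting letters inside longer words (e.g. "umbrella").
        text = re.sub(rf"\b{re.escape(filler)}\b", " ", text)

    cleaned = []
    for word in text.split():
        # Drop a word that exactly repeats the previous kept word ("about about").
        if not cleaned or cleaned[-1] != word:
            cleaned.append(word)
    return " ".join(cleaned)


s1 = ("Uh first um i need to know uh how do you feel about uh about "
      "sending uh an elderly uh family member to a nursing home")
print(strip_disfluencies(s1))
```

A rule like this also deletes legitimate uses of "you know" (as in "do you know him") and misses multi-word corrections, which is one reason the task is usually approached with learned models or human editors.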
Two Problems
Machine Learning - Supervised
Slide courtesy of Byron Wallace (Northeastern)
AI effectiveness is often limited by training data size
Problem: creating labeled data is expensive!
Banko and Brill (2001)
What do we do when state-of-the-art AI
still isn’t good enough?
Crowdsourcing
Crowdsourcing
• Jeff Howe. Wired, June 2006.
• Take a job traditionally
performed by a known agent
(often an employee)
• Outsource it to an undefined,
generally large group of
people via an open call
Volunteer Crowd Success Stories
Zooniverse
Amazon Mechanical Turk (MTurk)
• Marketplace for paid crowd work (“micro-tasks”)
– Created in 2005 (remains in “beta” today)
• On-demand, scalable, 24/7 global workforce
• API lets human labor be integrated into software
– “You’ve heard of software-as-a-service. Now this is human-as-a-service.”
Collecting Data from Crowds
2008: MTurk sparks “gold rush” for ML training data
• Information Retrieval: Alonso et al., SIGIR Forum
• Human-Computer Interaction: Kittur et al., CHI
• Computer Vision: Sorokin & Forsyth, CVPR
• NLP: Snow et al., EMNLP
– Annotating human language
– 22,000 labels for only US $26
– Crowd’s consensus labels can
replace traditional expert labels
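A consensus label can be computed in several ways; the simplest is a per-item majority vote over redundant worker judgments. The sketch below shows only that baseline, with made-up item IDs, worker IDs, and labels, and is not necessarily the aggregation used in the cited work.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (item_id, worker_id, label) triples into one label per item.

    Ties are broken deterministically in favor of the label that sorts first.
    """
    votes = defaultdict(Counter)
    for item_id, _worker_id, label in judgments:
        votes[item_id][label] += 1

    consensus = {}
    for item_id, counts in votes.items():
        # max() returns the first maximal element, so iterating over the
        # sorted labels makes the tie-break reproducible.
        consensus[item_id] = max(sorted(counts), key=lambda lab: counts[lab])
    return consensus


# Illustrative data only: three workers label two items.
judgments = [
    ("t1", "w1", "pos"), ("t1", "w2", "pos"), ("t1", "w3", "neg"),
    ("t2", "w1", "neg"), ("t2", "w2", "neg"), ("t2", "w3", "pos"),
]
print(majority_vote(judgments))
```

Majority voting treats every worker as equally reliable; weighting votes by estimated worker accuracy is a common refinement when some workers are noisier than others.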
Human Computation
ACM Queue, May 2006
“Software developers with innovative ideas for
businesses and technologies are constrained by the
limits of artificial intelligence… If software developers
could programmatically access and incorporate human
intelligence into their applications, a whole new class
of innovative businesses and applications would be
possible. This is the goal of Amazon Mechanical Turk…
people are freer to innovate because they can now
imbue software with real human intelligence.”
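One common way to incorporate human intelligence into an application is to let a model handle the cases it is confident about and route the rest to people. The sketch below assumes two hypothetical callables, `model` (returning a label and a confidence) and `ask_human` (standing in for a call to a crowd platform); neither is a real MTurk API, and the threshold value is arbitrary.

```python
from typing import Callable, Tuple


def hybrid_label(
    item: str,
    model: Callable[[str], Tuple[str, float]],
    ask_human: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[str, str]:
    """Return (label, source), where source is "model" or "human".

    The model's answer is accepted only when its confidence reaches the
    threshold; otherwise the item is handed to the human fallback.
    """
    label, confidence = model(item)
    if confidence >= threshold:
        return label, "model"
    return ask_human(item), "human"


# Hypothetical stand-ins so the sketch runs without any external service.
def toy_model(text: str) -> Tuple[str, float]:
    return ("spam", 0.97) if "free money" in text else ("ham", 0.55)


def toy_human(text: str) -> str:
    return "ham"


print(hybrid_label("free money now", toy_model, toy_human))
print(hybrid_label("lunch tomorrow?", toy_model, toy_human))
```

Raising the threshold sends more items to people, trading cost and latency for accuracy.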
PlateMate: Counting Calories
Noronha et al., UIST’11
MonoTrans
Translation by Monolingual Speakers + AI
Bederson et al., 2010; Morita & Ishida, 2009
Zensors
Laput et al., CHI 2015
But Who Protects the Moderators?
Dang et al., HCOMP’18 & CI’18
What about ethics?
• Silberman, Irani, and Ross (2010)
– “How should we… conceptualize the role of these people
who we ask to power our computing?”
• Irani and Silberman (2013)
– “…by hiding workers behind web forms and APIs…
employers see themselves as builders of innovative
technologies, rather than… unconcerned with working
conditions… redirecting focus to the innovation of human
computation as a field of technological achievement.”
• Fort, Adda, and Cohen (2011)
– “…opportunities for our community to deliberately
value ethics above cost savings.”
Summary
• Machine Learning (AI) lets us automate many
useful tasks, e.g., natural language processing (NLP)
• Crowdsourcing enables new levels of efficiency &
scalability in data collection & processing
• Human Computation lets us build next-generation
applications today, with capabilities beyond AI
The Future of Crowd Work
Paper @ CSCW 2013 by
Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
Matt Lease - ml@utexas.edu - @mattlease
Thank You!
Slides: slideshare.net/mattlease
Lab: ir.ischool.utexas.edu