Amazon's Mechanical Turk is Not Anonymous
Matt Lease
School of Information @mattlease
University of Texas at Austin ml@ischool.utexas.edu
ssrn.com/abstract=2190946
Roadmap
• What is Mechanical Turk?
• Mechanical Turk & Anonymity
• The Vulnerability
• Potential Risks
• Closing Thoughts
What is Mechanical Turk?
Amazon Mechanical Turk (MTurk)
• Online marketplace for paid crowd work
• On-demand, scalable, 24/7 global workforce
• Can perform all interactions via programmer’s API
• Requesters & Workers are seemingly anonymous…
Use Case 1: Data Processing
J. Pontin. Artificial Intelligence, With Help From the Humans. New York Times (March 25, 2007)
Use Case 2: Data Collection
(e.g., surveys, demographics, …)
M. Buhrmester et al. (2011). Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?
Mechanical Turk & Anonymity
Worker Privacy
Each worker is assigned an alphanumeric ID
Requesters see only Worker IDs
Brief Digression: Identity Fraud
• Compromised & exploited worker accounts
• Sybil attacks: use of multiple worker identities
• Script bots masquerading as human workers
Robert Sim, MSR Faculty Summit’12
Safeguarding Personal Data
• “What are the characteristics of MTurk workers?... the MTurk
system is set up to strictly protect workers’ anonymity….”
The Vulnerability
Amazon profile page URLs use the same IDs used on MTurk!
Did Anyone Know?
Did Anyone Know About This?
• Researchers & Review Boards (IRBs)?
– CrowdCamp announcement at ACM CSCW 2013
– Reviewed prior published studies
– Contacted researchers around the world
– Contacted university IRBs
• Amazon?
– Reviewed website, technical & legal documents,
online forums, blog, & interviews
– Talked to Amazon’s VP in charge of MTurk
• Workers?
– Reviewed worker forums & conducted a survey
Broad Perception of Anonymity
ssrn.com/abstract=2190946
Fraudulent Abuse of Workers
“Do not do any HITs that involve: filling in
CAPTCHAs; secret shopping; test our web page;
test zip code; free trial; click my link; surveys or
quizzes (unless the requester is listed with a
smiley in the Hall of Fame/Shame); anything
that involves sending a text message; or
basically anything that asks for any personal
information at all—even your zip code. If you
feel in your gut it’s not on the level, IT’S NOT.
Why? Because they are scams...”
Workers’ Views: Survey & Forums
• “... my reviewer profile is linked to my Mturk number! I had
no idea...”
• “...Amazon needs to separate the Mturk numbers from
seller numbers to protect our privacy…”
• “I think this is outrageous though. Makes me concerned
about trusting privacy agreements.”
• “Mine pulled up my Amazon wish list which revealed my
identity. It seems to me that so called ‘anonymous’ tasks
on mTurk (like surveys) are not anonymous after all.”
Potential Risks
Risks to Workers
• Inadvertent disclosure of PII or private data
• Loss of blind hiring practices online
• Greater risk of exploitation, reputation damage,
loss of income, or even physical harm…
Risks to Researchers
• Exposing participants to undocumented risks
• Having disclosed WorkerIDs (e.g., online)
• Having not restricted access to the internally
– Potential harm to participants
– Lack of compliance with Federal/IRB governance
of human subjects research
– Being required to discard collected data
– Delays or inability to conduct future MTurk studies
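
One way researchers might reduce the risk of disclosing WorkerIDs when sharing data is to replace each ID with a keyed pseudonym before release. The sketch below is only an illustration under stated assumptions, not a method taken from the talk or paper: the function name `pseudonymize_worker_ids`, the `W-` prefix, and the example IDs are all hypothetical. It uses HMAC-SHA256 from the Python standard library so that the same worker always maps to the same pseudonym, while the original ID cannot be recomputed without the secret key.

```python
import hashlib
import hmac
import secrets


def pseudonymize_worker_ids(worker_ids, secret_key):
    """Map each WorkerID to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    pseudonyms = {}
    for worker_id in worker_ids:
        digest = hmac.new(secret_key, worker_id.encode("utf-8"), hashlib.sha256)
        # Truncate the hex digest for readability; the "W-" prefix is arbitrary.
        pseudonyms[worker_id] = "W-" + digest.hexdigest()[:16]
    return pseudonyms


# Hypothetical WorkerIDs, made up for illustration only.
example_ids = ["AEXAMPLE0000001", "AEXAMPLE0000002", "AEXAMPLE0000001"]

# The key must be kept private and stored separately from any released data.
secret_key = secrets.token_bytes(32)

mapping = pseudonymize_worker_ids(example_ids, secret_key)

# Only the pseudonyms, never the original IDs or the key, would be released.
released = [mapping[worker_id] for worker_id in example_ids]
```

A keyed hash is used here rather than a plain hash because WorkerIDs are short identifiers that could be enumerated or matched against known IDs. The mapping and the key would still need the same access restrictions as the original IDs.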
Risks to Amazon
• Workers/Requesters abandoning MTurk
• The Federal Trade Commission (FTC) has recently
begun to aggressively protect consumers from data
breaches by commercial entities, including the
release of supposedly “anonymous” data
– Inadequate protection of customer records: BJWC
– De-anonymized customer records: AOL, Netflix
– Did workers have a reasonable expectation of privacy
in their use of MTurk which has been violated?
Closing Thoughts
Human-centered Privacy Protection
• Vulnerabilities are not purely technological
• Focusing on software is not enough: human
factors play a significant role in security of today’s
socio-technical, online systems
– Insufficient attention to human factors design
can compromise information security, despite having
the best algorithmic security protocols
• Privacy protection should be explicitly valued in
relation to other competing goals & stakeholder
interests to prevent it from being ignored or sacrificed
Brief Digression: Information Schools
• At 30 universities in N. America, Europe, Asia
• Study human-centered aspects of information
technologies: design, implementation, policy, …
www.ischools.org
Wobbrock et al., 2009
The Future of Crowd Work
@ ACM CSCW 2013
Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
Matt Lease - ml@ischool.utexas.edu - @mattlease
Thank You!
Mechanical Turk is Not Anonymous
Matthew Lease, Jessica Hullman,
Jeffrey P. Bigham, Michael S. Bernstein,
Juho Kim, Walter S. Lasecki, Saeideh
Bakhshi, Tanushree Mitra, and
Robert C. Miller
Social Science Research Network
ssrn.com/abstract=2190946
ir.ischool.utexas.edu/crowd