Mechanical Cheat
Spamming Schemes and Adversarial Techniques on Crowdsourcing Platforms
Djellel Eddine Difallah, Gianluca Demartini, and Philippe Cudré-Mauroux
University of Fribourg, Switzerland
Popularity and Monetary Incentives
• Micro-task crowdsourcing is growing in popularity.
• ~500k registered workers on AMT (Amazon Mechanical Turk)
• ~200k HITs available (April 2012)
• ~$20k in rewards (April 2012)
Spam could be a threat to crowdsourcing
Some Experimental Results:
Entity Link Selection (ZenCrowd – WWW 2012)
• Evidence of participation by dishonest workers, who spend less time, complete more tasks, and achieve lower quality.
Dishonest Answers on Crowdsourcing Platforms
• We define a dishonest answer in a crowdsourcing context as an answer that has been either:
  • Randomly posted.
  • Artificially generated.
  • Duplicated from another source.
How can requesters perform quality control?
• Go over all the submissions?
• Blindly accept all submissions?
• Use selection and filtering algorithms.
Anti-adversarial techniques
• Pre-selection and dissuasion
  • Use built-in controls (e.g., acceptance rate)
  • Task design
  • Qualification test
• Post-processing
  • Task repetition and aggregation
  • Test questions
  • Machine learning (e.g., probabilistic network in ZenCrowd)
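Two of the post-processing techniques listed above, test questions and task repetition with aggregation, can be combined in a few lines. The following is only a minimal sketch under stated assumptions, not the method used in ZenCrowd: the function names, the `(worker, task, label)` tuple layout, and the `min_accuracy` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def worker_accuracy_on_gold(answers, gold):
    """Return each worker's accuracy on test (gold) questions.

    answers: iterable of (worker, task, label) tuples (assumed layout).
    gold: dict mapping test-question task ids to their known label.
    Workers who answered no test question do not appear in the result.
    """
    hits = defaultdict(int)
    seen = defaultdict(int)
    for worker, task, label in answers:
        if task in gold:
            seen[worker] += 1
            hits[worker] += int(label == gold[task])
    return {worker: hits[worker] / seen[worker] for worker in seen}


def aggregate(answers, gold, min_accuracy=0.7):
    """Majority vote over repeated tasks, counting only trusted workers.

    A worker is trusted if their accuracy on test questions is at least
    min_accuracy (a hypothetical threshold). Workers with no test
    answers are treated as untrusted in this sketch.
    """
    accuracy = worker_accuracy_on_gold(answers, gold)
    trusted = {w for w, acc in accuracy.items() if acc >= min_accuracy}
    votes = defaultdict(Counter)
    for worker, task, label in answers:
        if worker in trusted and task not in gold:
            votes[task][label] += 1
    # Ties are broken by whichever label was counted first.
    return {task: c.most_common(1)[0][0] for task, c in votes.items()}
```

Filtering before voting matters when spammers outnumber honest workers on a task: a plain majority vote would then return the spammers' label, while the filtered vote only counts workers who passed the test questions.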
Countering adversarial techniques
Organization
Countering adversarial techniques
Individual attacks
• Random answers
  • Target tasks designed with a monetary incentive
  • Countered with test questions
• Automated answers
  • Target tasks with a simple submission mechanism
  • Countered with test questions (especially CAPTCHAs)
• Semi-automated answers
  • Target easy HITs achievable with some AI
  • Can pass easy-to-answer test questions
  • Can detect CAPTCHAs and forward them to a human
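To see why test questions are listed above as a counter to random answers, it helps to compute how likely a random clicker is to pass them. The sketch below assumes multiple-choice test questions with equally likely options and a hypothetical pass rule of "at least `min_correct` out of `num_tests`"; none of these parameters come from the slides.

```python
from math import comb


def pass_probability(num_options, num_tests, min_correct):
    """Probability that a uniformly random clicker passes the test questions.

    Assumes each test question has num_options equally likely choices and
    the worker must get at least min_correct of num_tests right
    (a binomial tail probability).
    """
    p = 1.0 / num_options
    return sum(
        comb(num_tests, k) * p**k * (1 - p) ** (num_tests - k)
        for k in range(min_correct, num_tests + 1)
    )
```

For example, with 4 options per question and 5 test questions that must all be answered correctly, the chance is (1/4)^5 = 1/1024, roughly 0.1%. This model covers only the random-answer attack; a semi-automated attacker that forwards test questions to a human is not captured by it.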
Countering adversarial techniques
Group attacks
• Agreeing on answers
  • Targets naïve aggregation schemes such as majority vote
  • May cause valid answers to be discarded!
  • Countered by shuffling the options
• Answer sharing
  • Targets repeated tasks
  • Countered by creating multiple batches
• Artificial clones
  • Target repeated tasks
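The option-shuffling counter mentioned above can be sketched as follows. This is an illustrative assumption about how shuffling might be done, not the authors' implementation: each worker sees the options of a task in a different, reproducible order, so a group that agrees on a screen position (for example "always click the first option") no longer converges on the same underlying answer. The function names and the `worker_id:task_id` seed format are hypothetical.

```python
import random


def shuffled_order(num_options, worker_id, task_id):
    """Return a per-worker, per-task permutation of option indices.

    order[i] is the canonical option index shown at screen position i.
    Seeding with a string makes the permutation reproducible, so the
    requester can map a click back without storing the order.
    """
    rng = random.Random(f"{worker_id}:{task_id}")
    order = list(range(num_options))
    rng.shuffle(order)
    return order


def displayed_options(options, order):
    """Options in the order this worker sees them."""
    return [options[i] for i in order]


def canonical_choice(order, clicked_position):
    """Map the screen position a worker clicked back to the canonical option."""
    return order[clicked_position]
```

Shuffling only defeats agreement on positions. A group that agrees on the option text itself would presumably still collude successfully, which is one reason to combine several techniques.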
Conclusions and future work
• We claim that some quality control tools are insufficient to counter resourceful spammers.
• Combine multiple techniques for post-filtering.
• Crowdsourcing platforms should provide more tools.
• Evaluation of future filtering algorithms must be repeatable and generic.
• Crowdsourcing benchmark.
Conclusions and future work
Benchmark proposal
• A collection of tasks with multiple-choice options
• Each task is repeated multiple times
• Unpublished expert judgments for all the tasks
• Published answers, completed in a controlled environment by the following categories of workers:
  • Honest workers
  • Random clickers
  • Semi-automated programs
  • Organized groups
• Post-filtering methods are evaluated on their ability to achieve a high precision score.
• Other parameters could include the money spent, etc.
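The evaluation step of the proposed benchmark could look roughly like the sketch below. The slides do not define precision exactly, so this assumes one plausible reading: the fraction of answers accepted by a post-filtering method that agree with the expert judgment. The function names and data layout are hypothetical.

```python
def precision(accepted, expert):
    """Fraction of accepted answers that match the expert judgment.

    accepted: list of (task, label) pairs kept by a post-filtering method.
    expert: dict mapping task ids to the expert label.
    Returns 0.0 when nothing was accepted (a convention chosen here).
    """
    if not accepted:
        return 0.0
    correct = sum(1 for task, label in accepted if expert[task] == label)
    return correct / len(accepted)


def evaluate(filter_method, submissions, expert):
    """Run a post-filtering method on benchmark submissions and score it.

    filter_method: callable taking a list of (worker, task, label)
    submissions and returning the (task, label) pairs it accepts.
    """
    return precision(filter_method(submissions), expert)
```

Because the expert judgments stay unpublished, scoring along these lines would have to be run by whoever maintains the benchmark rather than by the authors of the filtering method.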
Discussion
Q&A

Editor's Notes

  • #7: If you are a task requester, you would prefer to “hire” honest workers rather than automated programs or dishonest workers. MTurk, for instance, does not offer any guarantee of that; furthermore, it encourages the requester to pay well, fairly, and quickly. Besides, if one has a large number of tasks, one will likely never go through all the submissions. How do task requesters ensure quality, then? - Go over all the submissions? - Blindly accept all? - Use a filtering algorithm?
  • #8: Many researchers have looked at this particular issue and proposed solutions. We can mainly distinguish two approaches: 1- cheater dissuasion and pre-selection; 2- post-processing.
  • #9: Note that there is no evidence of the existence of such groups.
  • #13: Conclusion and future work: We tried to review some quality control tools and look at them through a spammer's eyes. By claiming that the available quality control tools are insufficient, we are mainly stressing that spammers are resourceful. So what does it take to build a bulletproof crowdsourcing platform or filtering scheme? One solution does not fit all.