The Bayesian Crowd: scalable
information combination for Citizen
Science and Crowdsourcing
Stephen Roberts
Machine Learning Research Group, Oxford-Man Institute
University of Oxford

Alan Turing Institute
Joint work with Edwin Simpson, Steven Reece & Matteo Venanzi
Bayes Nets Meeting, January 2017
• Bayesian modelling allows for explicit incorporation of all desiderata
• Effort focused not only on theory development, but on algorithmic
implementations that are timely and practical for real-world, real-time
scenarios
• Single, under- and over-arching philosophy…
“one method to rule them all… and in the darkness bind them”
“The language is that of Bayesian inference, which I will not utter
here...”
p(a|b) = p(b|a) p(a) / p(b)
Core methodology – Bayesian inference
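Purely as a worked illustration of the formula above (the numbers and the worker-reliability framing are invented, not taken from the deck):

```python
# Minimal numerical sketch of Bayes' rule p(a|b) = p(b|a) p(a) / p(b),
# applied to a hypothetical question: a = "object is a supernova",
# b = "a volunteer says it is a supernova".
p_a = 0.1              # assumed prior base rate of true supernovae
p_b_given_a = 0.7      # assumed hit rate of the volunteer
p_b_given_not_a = 0.2  # assumed false-positive rate of the volunteer

# Evidence term p(b), marginalising over a.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior belief after a single positive report.
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ~0.28: one noisy report only partly shifts the prior
```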
• Uncertainty at all levels of inference is naturally taken into account
• Optimal fusion of information: subjective, objective
• Handling missing values
• Handling of noise
• Principled inference of confidence and risk
• Optimal decision making
What does this buy us?
The scaling issue...
Data growth: Moore's law
The scale of things
The big data we generate
The rise of the flop and the fall in price
Science: the 4th paradigm?
Decision Combination
How can we deal with unreliable worker responses and
very large datasets?
Big data: Square Kilometre Array,
10 petabytes of compressed images per day
Noisy reports: Twitter, Typhoon Haiyan
Aims: Reliability and Efficiency
● Challenge: volunteers have varying reliability
– Different knowledge, interests, skills
– Typically handled with redundancy → build a consensus
● Challenge: datasets are large – what should we prioritise?
● Aim: increase accuracy by learning reliability
● Aim: use our volunteers' time efficiently
– Reduce redundant decisions
– Deploy experts where needed
– Use additional data to scale up to larger datasets
Machine Learning: aggregate responses and assign
tasks intelligently
● Probabilistic models of people and data
● Handle uncertainty in the model
● Optimise and automate analysis to reduce costs
[Diagram: Data and Crowd Annotations from the Crowd feed into Machine Learning, which produces Results]
Zooniverse has 26 current applications across a
range of domains, with over 1 million volunteers
● Can we use ML to handle variations in ability?
● Or to match tasks to people's interests and skills?
How can we combine annotations from
different members of the crowd?
● Fewer annotations needed from more reliable labellers
● Confidence and trust → user weights
● But weighted majority is soft selection
– Blurred decision boundaries
● Need to combine different expertise + weak labellers
Bayesian Methods
● Optimal framework for combining evidence
● Quantify prior beliefs explicitly
– E.g. workers are mostly better than random
● Quantifies uncertainty at all levels
– Which agents are reliable?
– Do we need more evidence for an object's target class?
● Principled approach
– Move away from fine-tuning each project
– E.g. avoid trial-and-error thresholds to determine when
consensus is reached
How can we aggregate responses intelligently?
● Bayes' rule combines different pieces of information
● Weight workers' contributions through the likelihood of
their response given the class
● Optimal weighted majority decision
● Error guarantees
● Soft selection

p(t | c) ∝ p(t) ∏_{k ∈ K} p(c^(k) | t)

where c^(k) is the response of worker k and t is the target class.
Likelihood defined by a confusion matrix
● Likelihood π^(k) = p(c^(k) | t) of response c^(k) given class t:

  Target class t | Response A | B   | C
  1              | 0.7        | 0.1 | 0.2
  2              | 0.4        | 0.4 | 0.2

● Richer than user accuracy weights:
– Differing skill levels in each class
– Responses need not be votes
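The two slides above together specify the basic aggregation rule: multiply each worker's class-conditional likelihood into the posterior. Below is a minimal Python sketch of that computation, using the illustrative confusion matrix from the slide for worker 0; the second worker, the uniform prior and the variable names are assumptions for illustration, not the deck's PyIBCC implementation.

```python
import numpy as np

# Illustrative confusion matrices pi[k][t, c] = p(c^(k) = c | t), one per worker.
# Worker 0 uses the slide's example table (classes t = 1, 2; responses A, B, C);
# worker 1 is a made-up second annotator.
pi = {
    0: np.array([[0.7, 0.1, 0.2],
                 [0.4, 0.4, 0.2]]),
    1: np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3]]),
}
prior = np.array([0.5, 0.5])     # assumed uniform prior p(t)
responses = {0: 0, 1: 0}         # both workers respond "A" (column 0)

# p(t | c) ∝ p(t) * prod_k p(c^(k) | t)
posterior = prior.copy()
for k, c in responses.items():
    posterior *= pi[k][:, c]
posterior /= posterior.sum()
print(posterior)  # class 1 is favoured: "A" is more likely under t = 1 for both workers
```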
Independent Bayesian classifier combination
(IBCC) handles parameter uncertainty
[Graphical model components:
– Target labels (multinomial)
– Observed worker responses (multinomial)
– Worker-specific confusion matrix (Dirichlet prior)
– Proportions of each class (Dirichlet prior)]
● Deal rationally with limited or missing data
Hyperparameters encode prior beliefs about worker
behaviour, e.g. that a worker is better than random
● Optimise/marginalise to handle model uncertainty
● Share prior pseudo-counts between similar projects
● Ratio → relative probability of agent responses given class t
● Magnitude → strength of prior beliefs

Example prior pseudo-counts for one worker (classes t = 1, 2; responses A, B, C):

  t | A | B | C
  1 | 7 | 1 | 2
  2 | 4 | 4 | 2
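Because the Dirichlet rows are conjugate to the multinomial responses, combining these prior pseudo-counts with observed counts is just addition. The sketch below illustrates that update; the observed counts are hypothetical and only the prior table comes from the slide.

```python
import numpy as np

# Prior pseudo-counts alpha0 from the slide's example (rows: class t = 1, 2;
# columns: responses A, B, C). Row ratios encode expected response probabilities,
# row magnitudes encode how strongly we believe them.
alpha0 = np.array([[7.0, 1.0, 2.0],
                   [4.0, 4.0, 2.0]])

# Hypothetical counts of this worker's responses on objects whose class we
# currently believe we know.
counts = np.array([[30.0, 5.0, 5.0],
                   [2.0, 10.0, 8.0]])

# Conjugate update: posterior pseudo-counts are alpha0 + counts;
# E[pi] is each row normalised.
alpha = alpha0 + counts
E_pi = alpha / alpha.sum(axis=1, keepdims=True)
print(np.round(E_pi, 2))
```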
Inference
Joint distribution, conditioned on hyper-hyperparameters
Gibbs sampling – rather slow
Variational Bayes – offers fast inference, at the
expense of approximations
Variational Bayes
[Figure: the log evidence splits into the negative free energy plus the
Kullback-Leibler divergence from the approximate to the true posterior]
Variational Bayes: inflating the balloon
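For reference, the decomposition behind the "inflating the balloon" picture, written out explicitly (this is textbook variational Bayes rather than anything specific to these slides):

```latex
\ln p(D)
  = \underbrace{\int q(\theta)\,\ln\frac{p(D,\theta)}{q(\theta)}\,\mathrm{d}\theta}_{\text{negative free energy }\mathcal{F}(q)}
  \;+\; \underbrace{\int q(\theta)\,\ln\frac{q(\theta)}{p(\theta\mid D)}\,\mathrm{d}\theta}_{\mathrm{KL}\left(q(\theta)\,\|\,p(\theta\mid D)\right)}
```

Since the KL term is non-negative and ln p(D) is fixed, raising the free energy can only shrink the KL term: the approximation q "inflates" towards the true posterior.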
Users rate each presented object, giving a score of
-1 : very unlikely SN object
 1 : possible SN object
 3 : likely SN object
(“true” labels obtained retrospectively via Palomar Transient Factory
spectrographic analysis)
Zooniverse: Galaxy Zoo Supernovae
IBCC-VB outperforms alternatives
Galaxy Zoo Supernovae – area under the ROC curve (AUC); higher is better:
  IBCC-VB            0.90
  Mean               0.65
  Weighted Sum       0.64
  Weighted Majority  0.58
IBCC outperforms alternatives across domains
[Plot: accuracy vs. number of labels (25,000–150,000) on CrowdFlower tweet
sentiment, comparing IBCC, Dawid & Skene, majority vote (MV) and the vote
distribution; IBCC is the most accurate throughout]
Galaxy Zoo Supernovae AUC (as on the previous slide): IBCC-VB 0.90, Mean 0.65,
Weighted Sum 0.64, Weighted Majority 0.58
Community detection over E[π] matrices:
behaviour types among Zooniverse users
Sensible, Extreme, Random, Optimist, Pessimist
● vbIBCC provides insights into crowd behaviour using
Bayesian community analysis
● Design training to influence these types
● CommunityBCC builds these types into the model to
better predict new workers

CommunityBCC builds these distinct types into
the model to better understand new workers
● Priors constrain the worker model
● Fewer examples are needed to learn reliabilities
Dynamic IBCC: behaviour changes as people
learn, get bored, move...
● Detect a worker's current state: aggregate correctly,
select suitable tasks, influence behaviour

What about dynamics?
[Graphical model: the "true" decision label (multinomial) generates the set of
all observed decisions (multinomial) via an agent-specific "confusion" matrix,
with Dirichlet priors on the class proportions and the confusion matrix]

What about dynamics?
[As above, but the agent-specific "confusion" matrix now evolves over time]
Dynamic IBCC tracks changes to the confusion
matrix over time
● Bayes' filter estimates an
evolving Markov chain
● Assumption: unexpected
behaviour → state changes
[Figure: tracked confusion matrix over time for an example Galaxy Zoo Supernovae volunteer]

Dynamic IBCC tracks changes to the confusion
matrix over time
● Bayes' filter estimates an
evolving Markov chain
● Assumption: unexpected
behaviour → state changes
[Figure: tracked confusion matrix over time for Mechanical Turk document classification]
Mechanical Turk document classiCcaon
Modelling the data so we can deploy the
crowd more e9ciently...
Combining the crowd with features:
TREC Crowdsourcing Challenge
● IBCC + 2000 LDA features acng
as addional classiCers [11]
● Classify unlabelled documents
● Results:
– 0.81 AUC with only 16%
documents labelled at all
– 0.77 for next-best approach
– 1st place required mulple
labellings of all documents
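The idea in [11] is to treat topic features as extra weak agents feeding the Bayesian combiner alongside the human judgements. The sketch below only illustrates that framing using scikit-learn's LDA on a toy corpus; it is not the pipeline from [11], and the thresholding rule used to discretise topics is an assumption.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["storm damage reported downtown",    # toy corpus; the real task used TREC documents
        "lovely sunny weather today",
        "flooding after heavy rain"]

# Turn documents into topic mixtures.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topics = lda.fit_transform(counts)            # shape (n_docs, n_topics)

# Discretise each topic into a weak "response" (present / absent), so each topic
# column can be treated like one more agent for the combiner, alongside the crowd.
lda_responses = (topics > 1.0 / topics.shape[1]).astype(int)
print(lda_responses)
```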
BCCWords: an efficient way to learn language
in new contexts
[Plot: accuracy vs. number of labels (25,000–150,000) on CrowdFlower tweet
sentiment, comparing IBCC, CBCC, scalable BCCWords, MV + text classifier,
Dawid & Skene, majority vote (MV) and the vote distribution]
CrowdFlower Tweet Sentiment
Positive words about the weather learnt by BCCWords
BCCWords increases accuracy with limited labels
Unstructured data in social media: a rich
source of timely information
Real-time, local events – e.g. emergency reports after an
earthquake
Sentiment about products, health and social issues – e.g.
opinions about H1N1, product reviews
Butler 2013, Morrow et al. 2011
Understanding Textual Data Streams
● Turn unstructured data into reliable, machine-readable
information
● Automated classifiers struggle to understand diverse,
evolving language in new contexts
● Need new tools to resolve ambiguity and the lack of
training data
Ushahidi – from the 2010 Haiti earthquake
(Morrow et al. 2011)
Categories of earthquake reports
Nepal, 2015, Quakemap.org
Gender differences in language use – “Love” vs “Dude”
(Kivran-Swaine et al., 2013)
Interpreting Language through Crowdsourcing
● Biased and noisy interpretations
● Scalability: the workers cannot label everything multiple times
● New techniques needed to reduce the workload of labellers
using textual information
● How to learn a language model from unreliable judgements?
Repetitive Tasks, Time Costs
Scenario: Sentiment Analysis of Tweets and
Reviews

Dataset: 2013 CrowdScale shared task challenge
  Text: tweets about the weather
  Platform: CrowdFlower
  Sentiment classes: positive, negative, neutral, not related (X), unknown (?)
  Documents: 98,980   Judgements: 569,375   Workers: 461

Dataset: Rodrigues et al., 2013
  Text: Rotten Tomatoes movie reviews
  Platform: Amazon Mechanical Turk
  Sentiment classes: positive, negative
  Documents: 5,000   Judgements: 27,747   Workers: 203

Example tweets:
“Morning sunshine” – 09:18 PM June 7, 2011
“Is it rainy too? Totally hate it” – 10:05 PM June 7, 2011
“lovely sunny day” – 10:06 PM June 7, 2011
Bayesian Classifier Combination with Words
(BCCWords)
● Bayes' theorem provides a principled mathematical
framework for classifier combination
– Dawid & Skene, 1979; Kim & Ghahramani, 2012; Simpson et al., 2013;
Venanzi et al., 2014
– Outperforms weighted majority voting etc.
Bayesian Classifier Combination with Words
(BCCWords)
● Novel approach to combining weak signals from text
and crowd
– Model the reliability of members of the crowd
– Train a language model to reduce the number of
judgements needed
Reliability of judgements defined by a
confusion matrix for each worker
● Defines a likelihood p(label^(k) | true class) for worker k:

  True class | +ve | uncertain | -ve
  +ve        | 0.7 | 0.1       | 0.2
  -ve        | 0.4 | 0.4       | 0.2

● Aggregate support for class c using Bayes' rule:
  ∏_{k∈K} p(label^(k) | true class = c)
● Richer than weighting by overall accuracy:
– Accounts for bias and random noise
– Differing skill levels in each class
– Labels need not be votes for the true class
Likelihood of text features in each class: bag-of-words
ω_c = p(word_n | true class = c)
● Words have different likelihoods in each sentiment class,
e.g. "good" and "nice" are more likely in positive documents,
"terrible" is more likely in negative ones
● Prior distribution over the word likelihoods ω_c in each class
● Learning the posterior over ω_c: update pseudo-counts as we observe words
in documents of class c
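A toy sketch of the pseudo-count update described above; the vocabulary, counts and class assignments are invented for illustration and are not part of the BCCWords release.

```python
import numpy as np

vocab = ["good", "nice", "terrible", "rain", "sun"]
classes = ["positive", "negative"]

# Symmetric Dirichlet prior over word likelihoods omega_c in each class.
beta0 = np.ones((len(classes), len(vocab)))

# Hypothetical word counts observed in documents currently assigned to each class.
word_counts = np.array([[12, 8, 0, 3, 9],    # positive documents
                        [1, 0, 10, 7, 2]])   # negative documents

# Conjugate update: add counts to the prior, then normalise rows to get E[omega_c].
beta = beta0 + word_counts
E_omega = beta / beta.sum(axis=1, keepdims=True)
for c, name in enumerate(classes):
    print(name, dict(zip(vocab, np.round(E_omega[c], 2))))
```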
BCCWords: integrating this into one model...

BCCWords: judgements are conditioned on the
true class
[Model: True Class → Judgement Label, via the worker's Confusion Matrix]

BCCWords: judgements are conditioned on the
true class
[Model as above, replicated over N documents]

BCCWords: judgements and words are
conditioned on the true class
[Model: True Class → Judgement Label via the Confusion Matrix, and
True Class → Words via the Word Likelihoods ω_c, replicated over N documents]

BCCWords: judgements and words are
conditioned on the true class
Use Bayes' rule to infer the true class
from labels and words
… but we need to learn the likelihoods
from true class labels
[Model as on the previous slide]
Variational Bayes: learn confusion matrices, the language
model and the true classes with limited training data
● Computationally efficient: 20 mins for 500k judgements, 98k tweets
● Iteratively updates each variable in turn, learning from latent structure
and any prior knowledge or training data
● The algorithm can be distributed to constrain memory requirements
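To make "update each variable in turn" concrete, here is a minimal runnable coordinate-ascent loop for plain IBCC (confusion matrices and class proportions only, no word model). It follows the standard mean-field updates rather than the released PyIBCC or BCCWords code, shares one prior across workers for brevity, and uses an invented toy dataset.

```python
import numpy as np
from scipy.special import digamma

def ibcc_vb(C, J, L, alpha0, nu0, n_iter=50):
    """Minimal variational-Bayes loop for plain IBCC (no word model).

    C      : (N, K) responses, values in {0..L-1}, or -1 where a worker skipped
    J, L   : number of true classes / number of response values
    alpha0 : (J, L) prior pseudo-counts for every worker's confusion matrix
    nu0    : (J,) prior pseudo-counts for the class proportions
    Returns q(t): (N, J) posterior class probabilities.
    """
    N, K = C.shape
    q_t = np.full((N, J), 1.0 / J)                    # initial guess: uniform
    for _ in range(n_iter):
        # Update class-proportion and confusion-matrix pseudo-counts.
        nu = nu0 + q_t.sum(axis=0)
        alpha = np.tile(alpha0, (K, 1, 1))            # (K, J, L)
        for k in range(K):
            for l in range(L):
                alpha[k, :, l] += q_t[C[:, k] == l].sum(axis=0)
        # Expected log parameters under the Dirichlet posteriors.
        E_ln_kappa = digamma(nu) - digamma(nu.sum())
        E_ln_pi = digamma(alpha) - digamma(alpha.sum(axis=2, keepdims=True))
        # Update q(t) for every object from the workers that labelled it.
        ln_q = np.tile(E_ln_kappa, (N, 1))
        for k in range(K):
            seen = C[:, k] >= 0
            ln_q[seen] += E_ln_pi[k][:, C[seen, k]].T
        q_t = np.exp(ln_q - ln_q.max(axis=1, keepdims=True))
        q_t /= q_t.sum(axis=1, keepdims=True)
    return q_t

# Toy example: 3 workers, 4 objects, binary classes and responses.
C = np.array([[1, 1, 0],
              [0, 0, 0],
              [1, 0, 1],
              [1, 1, -1]])                            # -1 = worker 3 skipped object 4
print(np.round(ibcc_vb(C, J=2, L=2,
                       alpha0=np.array([[2.0, 1.0], [1.0, 2.0]]),
                       nu0=np.ones(2)), 2))
```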
Experiments: Sentiment Analysis of Tweets and
Reviews
(Same datasets and example tweets as on the Scenario slide above: the 2013
CrowdScale weather-tweet task from CrowdFlower and the Rodrigues et al., 2013
Rotten Tomatoes movie reviews from Amazon Mechanical Turk.)
Language Model for Weather Sentiment
[Word lists: most likely words and most discriminative words for the positive
and negative sentiment classes]
Distinct worker types show the importance of
learning reliability
[Bar charts of p(worker label | true class) on the 5-class CrowdFlower weather
task, contrasting a good worker with an inaccurate worker]
Summary: BCCWords fuses subjective
interpretations to learn models of language in
the wild
● Important to account for the skills and bias
of individuals in the crowd
● Learns worker reliability and the language
model in a single integrated inference
algorithm
● Uses textual information to reduce the
number of judgements required
● Bayesian inference
– Proven framework for fusing information
– Handles uncertainty in the true class labels
and the model itself
Moving towards efficient learning with the
Crowd in-the-Loop
● Turn masses of unstructured, heterogeneous data into
reliable, machine-readable information
● Use the model to choose who does what task
● Detect different interpretations of language between communities
in the crowd?
Intelligent agent-task assignment:
who should classify which object?
● Aim: direct the crowd's effort to learn quickly and cheaply
● Prioritise tasks by considering their features and the confidence
in their classification
● Task choice depends on the workers available
● Maximise expected utility
● DynIBCC confusion matrix
describes individual skills
Utility of response: information gain about
targets when DynIBCC is updated
● Naturally balances exploration and exploitation
● Explore an agent's behaviour from silver tasks
– Objects already labelled confidently by the crowd
– Increases the utility of past responses
● Exploit an agent's skills to learn uncertain targets t

E[U_τ(k, i)] = E[ I(t_i ; c_i^(k) | D_τ) ]

where i indexes the target object, k is the worker ID, D_τ is the crowdsourced
data collected so far, and τ is the time index.
Hiring and firing algorithm makes greedy
assignments to reduce computational cost
● Hire for the priority task that matches current skills
● Fire if new crowd members are likely to do better
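As an illustration of the two slides above, the sketch below scores each (worker, task) pair by expected entropy reduction under the worker's estimated confusion matrix and assigns greedily. This is a simplified stand-in for the DynIBCC-based utility E[U_τ(k, i)], and the worker matrices and task beliefs are invented.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_info_gain(p_t, pi_k):
    """Expected reduction in entropy of p(t) if a worker with confusion matrix
    pi_k labels this object: a simplified stand-in for E[U_tau(k, i)]."""
    gain = 0.0
    for c in range(pi_k.shape[1]):            # possible responses
        p_c = (p_t * pi_k[:, c]).sum()        # predictive probability of response c
        if p_c == 0:
            continue
        post = p_t * pi_k[:, c] / p_c         # posterior over t given response c
        gain += p_c * (entropy(p_t) - entropy(post))
    return gain

# Greedy "hiring": give each available worker the task with the highest expected gain.
p_tasks = [np.array([0.5, 0.5]), np.array([0.9, 0.1]), np.array([0.6, 0.4])]
workers = {"careful": np.array([[0.9, 0.1], [0.1, 0.9]]),
           "noisy":   np.array([[0.6, 0.4], [0.4, 0.6]])}
for name, pi_k in workers.items():
    gains = [expected_info_gain(p_t, pi_k) for p_t in p_tasks]
    print(name, "-> task", int(np.argmax(gains)), "gain", round(max(gains), 3))
```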
Loose crowds on the web and in organisations:
Disaster Response
● Extracting key information from a noisy background
– Text: Twitter, Ushahidi – 15,000 messages in a few weeks [8]
– Images: satellite, social media
– Team communications, other agencies
● Locations of emergencies:
– a continuous target function
Bayesian crowdsourced heatmaps visualise
likely emergencies and information gaps
● Neighbouring reports related by a spatial Gaussian
process (GP) classifier
[Model: a GP with kernel Κ describes the density of emergencies at (x, y); a
sigmoid function maps the GP to Dirichlet parameters over the emergency state
t_i at (x, y); crowd reports c_i^(k) are generated via worker confusion matrices
π^(k) with hyperparameters α_0^(k); the GP variance exposes information gaps]
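To give a feel for the heatmap idea, here is a much-simplified sketch that fits a GP classifier directly to report locations. It skips the confusion-matrix layer and the sigmoid-to-Dirichlet mapping from the slide's model, uses scikit-learn purely for illustration, and the report coordinates and labels are invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Toy report locations (x, y) and aggregated labels (1 = emergency reported,
# 0 = no emergency). In the full model the labels would come from the
# confusion-matrix layer rather than being taken at face value.
X = np.array([[0.1, 0.2], [0.15, 0.25], [0.8, 0.9], [0.7, 0.85], [0.5, 0.1]])
y = np.array([1, 1, 0, 0, 1])

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=0.3)).fit(X, y)

# Probability heatmap over a grid: high p = likely emergency; p near 0.5 far
# from any report marks an information gap worth tasking the crowd with.
xx, yy = np.meshgrid(np.linspace(0, 1, 25), np.linspace(0, 1, 25))
grid = np.column_stack([xx.ravel(), yy.ravel()])
heat = gpc.predict_proba(grid)[:, 1].reshape(xx.shape)
print(np.round(heat[::6, ::6], 2))   # coarse view of the heatmap
```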
Bayesian crowdsourced heatmaps visualise
likely emergencies and information gaps
Ushahidi crowd + trusted report from a first responder
Future Opportunities
Adaptive training and motivation to create diverse
skills and stimulate workers
● Model worker preferences and rewards
● Fast approximations to future utility
– Deduct the cost of rewards
– Add retention, work rate, reliability
– Target clusters of workers
● Selecting tasks/training: consider the person's
history
Apprenticeship/Peer Training
Infer improvements in confusion
matrices from the effect of a task on others
Models for combining new data types and target
functions
● Targets have multiple dimensions
– Shapes in PlanetFour
● Poisson processes, event rates
– Malaria rates

Actively switch types of tasks to optimise
learning from the crowd
● Select questions from a decision tree
● Labelling, comparing, marking features, grouping...
● Utility varies: accuracy of responses, current model of
features...
● Maximise information about t
[Illustrations: a numeric estimation task ("34.556") and a pairwise comparison
task ("...is like...")]
Learn how people make decisions by
actively adapting tasks
● Improve automation,
reduce work
● Select the interaction mode or
the questions in the micro-task
● Maximise information given the
current model
● Crowd-supervised feature
extraction, e.g. adapting
PCA to learn more useful
features (projections) from the crowd
Summary: Bayesian models enable accurate
and scalable crowdsourcing across domains
● Quantify uncertainty in the data model and worker behaviour
● Actively learn from crowds using a model of features
● Opportunities: optimisation and learning to automate
with humans-in-the-loop
[Diagram: Data and Crowd Annotations from the Crowd feed into Machine Learning, which produces Results]
25/04/15: magnitude 7.8 earthquake in the Gorkha District of Nepal

ORCHID and Zooniverse collaborators worked
with Rescue Global to identify and then refine
their critical information requirements.
• placement of life detectors and water
filters within a 50 mile radius of Kathmandu.
Crowd labelled 1,200 Planet Labs satellite images
using Zooniverse software.
• Recruited 25 image labellers from within
Oxford University and Rescue Global staff
(they worked hard over the bank holiday
weekend).
Folded in OpenStreetMap building density data
and inferred a population density map using
ORCHID data processing algorithms.
Delivered the map overlay to Rescue Global for
dissemination to their CaDRA partners (SARaid,
Team Rubicon, CADENA).
Timeline: 29/04/15 to 02/05/15; 02/05/15 to 20:13 GMT 05/05/15;
00:15 GMT 06/05/15; 05/05/15.
Software on GitHub
● http://www.robots.ox.ac.uk/~edwin/
– Please use and report bugs
● PyIBCC: IBCC-VB and DynIBCC-VB in Python 2
– Collaborating with Zooniverse
● MatlabIBCC: IBCC-VB and DynIBCC-VB in Matlab
Acknowledgements
● Uni of Southampton: Nick Jennings, Alex Rogers, Sarvapali
Ramchurn, Matteo Venanzi
● Oxford: Edwin Simpson, Steve Reece, Chris Lintott & the Zooniverse team
● EPSRC (UK research council), the ORCHID project, Rescue Global,
Microsoft, Zooniverse
References
[1] Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 20-28.
[2] Kim, H. C., & Ghahramani, Z. (2012). Bayesian classifier combination. In International Conference on Artificial Intelligence and Statistics (pp. 619-627).
[3] Simpson, E., Roberts, S., Psorakis, I., Smith, A., & Lintott, C. (2011). Bayesian combination of multiple, imperfect classifiers. In Proceedings of the NIPS 2011 workshop.
[4] Simpson, E., Roberts, S., Psorakis, I., & Smith, A. (2013). Dynamic Bayesian combination of multiple imperfect classifiers. In Decision Making and Imperfection (pp. 1-35). Springer.
[5] Psorakis, I., Roberts, S., Ebden, M., & Sheldon, B. (2011). Overlapping community detection using Bayesian non-negative matrix factorization. Physical Review E, 83.
[6] Venanzi, M., Guiver, J., Kazai, G., Kohli, P., & Shokouhi, M. (2014). Community-based Bayesian aggregation models for crowdsourcing. In Proceedings of the 23rd International Conference on World Wide Web (pp. 155-164). International World Wide Web Conferences Steering Committee.
[7] Simpson, E., & Roberts, S. (2015 – to appear). Bayesian methods for intelligent task assignment in crowdsourcing systems. In Scalable Decision Making: Uncertainty, Imperfection, Deliberation; Studies in Computational Intelligence, Springer.
[8] Morrow, N., Mock, N., Papendieck, A., & Kocmich, N. (2011). Independent evaluation of the Ushahidi Haiti project. Development Information Systems, 8:2011.
[9] MacKay, D. J. C. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4):590-604.
[10] Chen, X., Bennett, P. N., Collins-Thompson, K., & Horvitz, E. (2013). Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM.
[11] Simpson, E., Reece, S., Penta, A., Ramchurn, G., & Roberts, S. (2012). Using a Bayesian model to combine LDA features with crowdsourced responses. In The Twenty-First Text REtrieval Conference (TREC 2012), Crowdsourcing Track, NIST.
[12] Nitzan, S., & Paroush, J. (1982). Optimal decision rules in uncertain dichotomous choice situations. International Economic Review, 23(2):289-297.
[13] Berend, D., & Kontorovich, A. (2014). Consistency of weighted majority votes. NIPS.
[14] Zhang, Y., Chen, X., Zhou, D., & Jordan, M. (2014). Spectral methods meet EM: a provably optimal algorithm for crowdsourcing. NIPS.
Questions?
