Matt Lease
Associate Professor
School of Information
The University of Texas at Austin
Amazon Scholar
Human-in-the-loop Services
Amazon Web Services (AWS)
Automated Models for Quantifying
Centrality of Survey Responses
1
Lab: ir.ischool.utexas.edu
@mattlease
Slides: slideshare.net/mattlease
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark.
Human-in-the-loop Services
• Three team products: Mechanical Turk, SageMaker Ground Truth, and Augmented AI (A2I)
• https://www.amazon.science/research-awards
– Cash and/or AWS credits
• Summer, sabbatical, or longer engagements
– https://www.amazon.science/scholars
– https://www.amazon.science/visiting-academics
• https://www.amazon.science/tag/internships
https://www.humancomputation.com
3
What’s the capital of Texas?
Austin
Austin
Houston
4
What’s the capital of Texas?
Austin
Austin
Houston
Majority Vote
5
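The majority-vote rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name `majority_vote` is hypothetical, and ties go to the label seen first.

```python
from collections import Counter


def majority_vote(labels):
    """Return (label, votes) for the most frequent label.

    Counter.most_common lists equal counts in first-seen order,
    so ties go to the label that appeared first.
    """
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0]


print(majority_vote(["Austin", "Austin", "Houston"]))  # ('Austin', 2)
print(majority_vote(["A cat is eating", "The cat eats", "A beautiful picture"]))
```

On free-text captions every response gets exactly one vote, so the "winner" is simply the first response. That is the failure case that motivates distance-based aggregation.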
Simple annotation & aggregation
Classification
• sentiment analysis
• image categorization
Ordinal rating
• product & movie reviews
• search relevance
Aggregation
• Crowdsourcing: quality control
• Experts: wisdom of crowds
• Goal: select best label available
for each item (no label fusion)
6
Caption this image:
7
A cat is eating
The cat eats
A beautiful picture
Caption this image:
When majority voting falls short
Problem: with a large label space, exact-match voting doesn’t work!
8
A cat is eating
The cat eats
A beautiful picture
What about complex annotations?
Ranked lists
Parse trees
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
Image captions
Range sequences
9
10
Alexander Braylan (Dept. of Computer Science) and Matthew Lease (School of Information)
The University of Texas at Austin
Modeling and Aggregation of Complex Annotations via Annotation Distances
Code & Data: https://github.com/Praznat/annotationmodeling
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
11
Aggregating Simple Labels
• Hundreds of papers
• Multiple benchmarking studies
• Rich body of Bayesian modeling
• General-purpose aggregation models for simple labels don’t support complex labels
Models compared: Dawid-Skene, MACE, Hierarchical Dawid-Skene, Item Difficulty, Logistic Random Effects
Source: Paun et al. 2018, “Comparing Bayesian Models of Annotation”
12
Task-specific models
• Pros:
– Task specialization
maximizes accuracy
• Cons:
– Need new model for
every task
– Complicated, difficult
to formulate
Nguyen et al 2017 (Sequences)
Lin, Mausam, and Weld 2012 (Math)
13
Our goals
• We want aggregation for complex data types
– Build on ideas from simple label aggregation models
• We want to generalize across many labeling tasks
– Can we reduce problem to common simpler state space?
14
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
15
Key Insight
Partial credit matching via task-specific distance function
• Adopt or define a distance function for each annotation task
• Model annotation distances uniformly across tasks
• Distance functions already exist for many task types
– Free-text responses, e.g., answers to survey questions
16
Calculate distances
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
17
• Example task: free-text answer
• Example distance function: string edit distance
Calculate distances
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
Pairwise distances shown in figure: 0.05, 0.1, 0.1
18
• Example task: free-text answer
• Example distance function: string edit distance
Calculate distances
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
Pairwise distances shown in figure: 0.8, 0.82, 0.82, 0.05, 0.1, 0.1
19
• Example task: free-text answer
• Example distance function: string edit distance
Example Distance: Levenshtein
20
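A minimal sketch of a normalized Levenshtein distance, assuming normalization by the longer string's length so values fall in [0, 1]. The function names are hypothetical, and the slides do not specify a normalization, so these outputs need not reproduce the distances shown in the earlier figures.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions
    needed to turn a into b (classic dynamic program, one row at a time)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("a cat is eating", "cat is eating"))  # 2
print(round(normalized_levenshtein("a cat is eating", "cat is eating"), 3))  # 0.133
```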
Example Distance: Word embeddings
21
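An embedding-based distance can be sketched as the cosine distance between averaged word vectors. This is one common choice, assumed here for illustration; the tiny 3-dimensional vectors are made up, and a real system would load pretrained embeddings (e.g., word2vec or GloVe) instead.

```python
import math

# Hypothetical toy word vectors, for illustration only.
TOY_VECTORS = {
    "a": [0.1, 0.1, 0.1],
    "the": [0.1, 0.1, 0.1],
    "is": [0.1, 0.1, 0.2],
    "cat": [0.9, 0.1, 0.0],
    "eats": [0.1, 0.9, 0.0],
    "eating": [0.1, 0.8, 0.1],
    "beautiful": [0.0, 0.1, 0.9],
    "picture": [0.1, 0.0, 0.9],
}


def sentence_vector(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError("no known words in: " + text)
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine_distance(x, y):
    """1 - cos(angle between x and y); 0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def embedding_distance(s1, s2):
    return cosine_distance(sentence_vector(s1), sentence_vector(s2))


print(embedding_distance("a cat is eating", "the cat eats"))
print(embedding_distance("a cat is eating", "a beautiful picture"))
```

Unlike edit distance, this treats "eats" and "eating" as close even though their spellings differ after the shared stem.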
Distance function properties
22
Properties of distance functions: non-negativity, symmetry, triangle inequality
Data | Free text | Rankings
Example evaluation fn | BLEU(x, y) |
Example distance fn | |
Non-negativity | ✓ | ✓
Symmetry | ✓ | ✓
Triangle inequality | ✓ | ✓
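For rankings, one distance with all three properties is the Kendall tau distance (the number of item pairs two rankings order differently), which the bonus slides list as the ranking task's distance function. The sketch below computes it and brute-force checks the metric axioms over every ranking of four items; the function names are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Number of item pairs that the two rankings put in opposite order.

    r1 and r2 list the same items, best first.
    """
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    if pos1.keys() != pos2.keys():
        raise ValueError("rankings must contain the same items")
    return sum(
        1
        for x, y in combinations(r1, 2)
        if (pos1[x] - pos1[y]) * (pos2[x] - pos2[y]) < 0  # discordant pair
    )


def satisfies_metric_axioms(points, dist):
    """Brute-force check of identity, non-negativity, symmetry, and the
    triangle inequality d(x, z) <= d(x, y) + d(y, z) over all triples."""
    for x in points:
        if dist(x, x) != 0:
            return False
        for y in points:
            d_xy = dist(x, y)
            if d_xy < 0 or d_xy != dist(y, x):
                return False
            for z in points:
                if dist(x, z) > d_xy + dist(y, z):
                    return False
    return True


all_rankings = [list(p) for p in permutations("ABCD")]
print(kendall_tau_distance(list("ABCD"), list("DCBA")))  # 6
print(satisfies_metric_axioms(all_rankings, kendall_tau_distance))  # True
```

A brute-force check like this is only practical on small samples, but it is a quick sanity test before plugging a new distance function into an aggregation model.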
Calculate distances
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
Pairwise distances shown in figure: 0.8, 0.82, 0.82, 0.05, 0.1, 0.1
23
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
Pairwise distances shown in figure: 0.1, 0.6, 0.3
24
All tasks reduce to matrices of distances
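Once a distance function is chosen, each item's annotations become a symmetric distance matrix. The sketch below assumes a word-level Jaccard distance, swapped in purely because it is short; it is not one of the distance functions used in the slides, and all names are hypothetical.

```python
def jaccard_distance(a: str, b: str) -> float:
    """1 - |shared words| / |all words|, on lowercased whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def distance_matrix(annotations, dist):
    """Symmetric matrix M with M[u][v] = dist(annotations[u], annotations[v])."""
    n = len(annotations)
    matrix = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):  # fill the upper triangle, mirror to the lower
            matrix[u][v] = matrix[v][u] = dist(annotations[u], annotations[v])
    return matrix


captions = ["A cat is eating", "The cat eats", "A beautiful picture"]
for row in distance_matrix(captions, jaccard_distance):
    print([round(d, 2) for d in row])
# [0.0, 0.83, 0.83]
# [0.83, 0.0, 1.0]
# [0.83, 1.0, 0.0]
```

Everything downstream (SAD, BAU, MAS) consumes only such matrices, which is what lets one aggregation method serve many task types.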
How to aggregate given distances
• Local selection model
• Global selection model
• Combined
25
Current item
Other items
Local approach: Smallest Avg Distance (SAD)
• For each question, compute each response’s average distance to the other responses
• The response with the smallest average distance is locally most normative; this generalizes majority vote
• Treats items independently
• Does not model respondent agreement across items
26
Current item
Other items
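The SAD rule can be sketched for a single item as follows. The names are hypothetical; with an exact-match distance (0 if identical, else 1) the rule picks the majority label, which is the sense in which it generalizes majority vote.

```python
def smallest_average_distance(annotations, dist):
    """Local SAD selection for a single item.

    Returns (best_index, scores), where scores[k] is annotation k's mean
    distance to the item's other annotations; the smallest score wins.
    """
    n = len(annotations)
    if n == 0:
        raise ValueError("need at least one annotation")
    if n == 1:
        return 0, [0.0]
    scores = [
        sum(dist(annotations[k], annotations[j]) for j in range(n) if j != k) / (n - 1)
        for k in range(n)
    ]
    best = min(range(n), key=scores.__getitem__)  # ties go to the earliest index
    return best, scores


def exact_match_distance(a, b):
    """0 if identical, else 1: with this distance, SAD reduces to majority vote."""
    return 0.0 if a == b else 1.0


print(smallest_average_distance(["Austin", "Austin", "Houston"], exact_match_distance))
# (0, [0.5, 0.5, 1.0])
```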
Global approach: Best Available User (BAU)
• Score each participant by their average distance to all other participants across all questions
• The participant with the lowest score is globally most normative; treat their response on each item as most normative
• Global approach ignores the distances observed on the current item
27
Current item
Other items
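A sketch of the BAU rule, assuming annotations are stored as `{item_id: {worker_id: annotation}}`. Reading "available" as "the best-scoring worker who actually labeled this item" is an assumption, as are all names here.

```python
from collections import defaultdict


def best_available_user(data, dist):
    """Global BAU selection.

    data: {item_id: {worker_id: annotation}}
    Each worker's score is their mean distance to co-annotators over all items
    they labeled; each item takes the annotation of its lowest-scoring worker.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for labels in data.values():
        for u, a_u in labels.items():
            for v, a_v in labels.items():
                if u != v:
                    totals[u] += dist(a_u, a_v)
                    counts[u] += 1
    workers = {w for labels in data.values() for w in labels}
    # A worker with no co-annotators gives no evidence, so rank them last.
    scores = {w: totals[w] / counts[w] if counts[w] else float("inf") for w in workers}
    selection = {
        item: labels[min(labels, key=scores.__getitem__)]
        for item, labels in data.items()
    }
    return selection, scores


data = {
    "q1": {"w1": "Austin", "w2": "Austin", "w3": "Houston"},
    "q2": {"w1": "Paris", "w2": "Lyon", "w3": "Lyon"},
    "q3": {"w1": "Tokyo", "w2": "Tokyo", "w3": "Kyoto"},
}
selection, scores = best_available_user(data, lambda a, b: 0.0 if a == b else 1.0)
print(selection)  # {'q1': 'Austin', 'q2': 'Lyon', 'q3': 'Tokyo'}
```

Here w2 has the lowest score (0.5), so w2's answer is taken on every item, regardless of how the other answers on that item look.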
Can we get best of both worlds?
• Want a method that combines:
– Best available user (global)
– Smallest avg distance (local)
• Should build on rich history of work on Bayesian annotation modeling
• Need a principled framework for modeling annotation distance matrices
Figure labels: weights, votes, weighted voting
28
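One simple way to combine a global per-worker score with local distances is reliability-weighted voting: a weighted medoid in which each worker's annotation pulls harder when that worker is globally normative. The sketch below is an illustrative assumption, not MAS (MAS learns such effects inside a probabilistic model); the inverse-score weighting, `eps`, and all names are hypothetical.

```python
def weighted_medoid(labels, worker_scores, dist, eps=1e-6):
    """Pick the worker whose annotation minimizes a reliability-weighted sum of
    distances to all annotations of the item (its own contributes distance 0).

    labels: {worker_id: annotation} for one item
    worker_scores: {worker_id: global score}, lower = more normative
    """
    # Hypothetical weighting: inverse of the global score.
    weights = {w: 1.0 / (worker_scores[w] + eps) for w in labels}

    def weighted_total(u):
        return sum(weights[v] * dist(labels[u], labels[v]) for v in labels)

    return min(labels, key=weighted_total)


def abs_dist(a, b):
    return abs(a - b)


item = {"w1": 0.0, "w2": 10.0, "w3": 4.0}
print(weighted_medoid(item, {"w1": 1.0, "w2": 1.0, "w3": 1.0}, abs_dist))  # w3
print(weighted_medoid(item, {"w1": 5.0, "w2": 0.1, "w3": 1.0}, abs_dist))  # w2
```

With equal weights this reduces to the local SAD choice (w3); once w2 is known to be globally reliable, the global evidence overrides the local one.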
Multidimensional Annotation Scaling (MAS)
• Based on multidimensional scaling (Kruskal & Wish 1978)
• Probabilistic model of multi-item distance matrices
• “Hierarchical Bayesian”
– Additional learned parameters represent crowd effects such as worker reliability
A cat is eating
The cat eats
A beautiful picture
29
MAS Objective 1: Likelihood
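Multidimensional scaling objective:
$D_{iuv} \sim \mathcal{N}(\lVert \varepsilon_{iu} - \varepsilon_{iv} \rVert, \sigma)$
• $D_{iuv}$: observed distance between the annotations of workers $u$ and $v$ on item $i$
• $\varepsilon_{iu}$: embedding of worker $u$’s annotation of item $i$
• $\sigma$: error scale
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
Pairwise distances shown in figure: 0.8, 0.82, 0.82, 0.05, 0.1, 0.1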
30
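The likelihood term can be written as a short function. This is a minimal sketch assuming σ is a standard deviation and each item's observed distances are keyed by annotator pair; the names are hypothetical, and it omits the hierarchical prior and the fitting procedure, so it is not the authors' implementation. In MAS the embeddings are learned; here they are fixed by hand to show that embeddings whose geometry matches the observed distances score higher.

```python
import math


def gaussian_logpdf(x, mean, sigma):
    """log N(x | mean, sigma), treating sigma as the standard deviation."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mean) ** 2 / (2 * sigma**2)


def mds_log_likelihood(distances, embeddings, sigma):
    """Sum of log N(D_iuv | ||eps_iu - eps_iv||, sigma) over one item's observed pairs.

    distances: {(u, v): observed distance D_iuv}
    embeddings: {u: coordinates of annotation embedding eps_iu}
    """
    return sum(
        gaussian_logpdf(d, math.dist(embeddings[u], embeddings[v]), sigma)
        for (u, v), d in distances.items()
    )


# Embeddings placed so that their Euclidean distances are exactly 5, 4, and 3.
emb = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (0.0, 4.0)}
observed = {("a", "b"): 5.0, ("a", "c"): 4.0, ("b", "c"): 3.0}
print(mds_log_likelihood(observed, emb, sigma=1.0))
```

Fitting would then adjust the embeddings (and, in MAS, the other learned parameters) to raise this quantity.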
MAS Objective 2: Prior
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
Figure label: Pseudo-gold
32
MAS Objective 2: Prior
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
33
MAS Objective 2: Prior
Responses: “a cat is eating”, “cat is eating”, “a beautiful picture”, “the cat eats”
34
MAS Objective 2: Prior
35
MAS Objective 2: Prior
36
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
37
Example Output: father
38
Response | SAD | MAS
He always speaks ill about his father behind back. | 0.78 | 0.16
He always speaks ill of his father behind his back. | 0.71 | 0.30
He always talks about his father behind his back. | 0.74 | 0.50
He always speaks ill of his father | 0.78 | 0.55
He always speak ill of his father. | 0.79 | 0.62
He is always talking about his father behind his back. | 0.82 | 0.63
He always says behind his father. | 0.90 | 0.72
He always talks about his dad behind his back. | 0.83 | 0.73
Example Output: she says
39
Response | SAD | MAS
Please be sure to take a note of what she says. | 0.77 | 0.16
Please take a note of what she says. | 0.84 | 0.30
Be sure to take a warning notice what she says. | 0.86 | 0.46
Please be sure to take notes what she says. | 0.81 | 0.48
Please take a note what she say. | 0.92 | 0.73
Please be sure to take instructions for her saying. | 0.93 | 0.76
Make sure to insert disclaimer about what she says. | 0.93 | 0.80
Please make a memo whatever she says. | 0.99 | 0.82
Example Output: quiet
40
Response | SAD | MAS
As long as you keep quiet you may stay here | 0.83 | 0.26
You can stay here as long as you keep quiet. | 0.86 | 0.39
You may stay here if you keep quiet. | 0.81 | 0.39
You can stay here if you keep quiet. | 0.82 | 0.57
So long as you remain quiet you may stay here. | 0.92 | 0.57
If it is quiet you may stay here | 0.90 | 0.70
If you keep quiet you can stay here. | 0.92 | 0.81
You may be here if you keep quiet. | 0.91 | 0.84
Example Output: go ahead
41
Response | SAD | MAS
Please go ahead if i am late. | 0.83 | 0.16
Please go ahead if I'm late. | 0.79 | 0.28
Please go ahead if I delayed. | 0.82 | 0.51
Please go without me if I'm late. | 0.91 | 0.62
Please go ahead if I get late | 0.83 | 0.67
Please go ahead and leave if I'm late. | 0.88 | 0.74
If I am late you can go in first. | 1.00 | 0.79
If I should be late go without me. | 1.00 | 0.81
Example Output: married
42
Response | SAD | MAS
Actually they are not married | 0.91 | 0.18
To tell the truth they are not couple | 0.79 | 0.47
To tell the truth they are not a married couple | 0.84 | 0.62
To tell the truth they're not married | 0.89 | 0.63
In fact they are not couple | 0.94 | 0.69
to telling the truth we're not married | 0.97 | 0.71
Two people are not couples in truth | 1.00 | 0.79
Roadmap
• Prior work
• Approach
• Example outputs
• Conclusion
43
Conclusion
• Probabilistic model identifies normative vs. outlier responses by quantifying distances between responses
• Many choices for measuring distance between two texts (e.g., character-based edit distance or more semantic NLP measures)
• Three models: local (SAD), global (BAU), or combined (MAS)
• Open source: github.com/Praznat/annotationmodeling
44
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
Future work
45
A1: A cat is eating
A2: The cat eats
A3: A beautiful picture
• From objective labeling tasks to subjective responses
• Evaluation on survey data
– Collaboration with behavioral science researchers?
– Compare distance functions and model settings for utility
• Automatic detection of consistent biases in a participant’s responses relative to group norms
46
Matt Lease (University of Texas at Austin)
Lab: ir.ischool.utexas.edu
@mattlease
Slides: slideshare.net/mattlease
We thank our many talented crowd workers
for their contributions to our research!
Alexander Braylan and Matthew Lease. Aggregating Complex Annotations via Merging and Matching. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 86-94, 2021.
Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via Annotation Distances. In Proceedings of The Web Conference 2020, pages 1807-1818, 2020.
Bonus Material
47
MTurk: The Early Days
48
• Artificial Intelligence, With Help From the Humans.
– J. Pontin. NY Times, March 25, 2007
• Is Amazon's Mechanical Turk a Failure? April 9, 2007
– “As of this writing, there are [only] 128 HITs available on Mechanical Turk.”
• Su et al., WWW 2007: “a web-based human data collection system… ‘System M’ ”
2008: The “Gold” Rush Begins
Braylan and Lease 49
Snow et al., EMNLP (Natural Language Processing)
• Annotating human language for natural language processing (NLP)
• 22,000 labels for only $26 USD
• Crowd’s consensus labels can replace traditional expert labels
“Discovery” sparks rush for “gold” data across areas
• Alonso et al., SIGIR Forum (Information Retrieval)
• Kittur et al., CHI (Human-Computer Interaction)
• Sorokin and Forsythe, CVPR (Computer Vision)
2010-11: Social & Behavioral Sciences
50
• A Guide to Behavioral Experiments on Mechanical Turk
– W. Mason and S. Suri (2010). SSRN online.
• Crowdsourcing for Human Subjects Research
– L. Schmidt (CrowdConf 2010)
• Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk
– Conley & Tosti-Kharas (2010). Academy of Management
• Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?
– M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
– see also: Amazon Mechanical Turk Guide for Social Scientists
The Future of Crowd Work (ACM CSCW’13)
by Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
51
Example Output
53
Tasks & datasets
SYNTHETIC DATASETS
• Syntactic parse trees
– Distance function: evalb
• Ranked lists
– Distance function: Kendall’s tau
REAL DATASETS
• Biomedical text sequences
– Distance function: Span F1
• Urdu-English translations
– Distance function: GLEU
55
Nguyen et al 2017
Zaidan and Callison-Burch 2011
Methods
Baselines:
• Random User (RU): pick one label randomly
• ZenCrowd (ZC) (Demartini et al. 2012)
– Weighted voting based on exact match (rare!)
• Crowd Hidden Markov Model (CHMM) (Nguyen et al. 2017)
– Sequence annotation task only
Upper bound: Oracle (OR) (always picks best label)
• Even with 5 workers per item, the oracle can do no better than the best answer any of them gave
56
Results
Task | Metric | RU | ZC | CHMM | MAS | Oracle
Translations | GLEU | 0.185 | 0.188 | - | 0.217 | 0.246
Sequences | F1 | 0.561 | 0.569 | 0.702 | 0.709 | 0.827
Parses | EVALB | 0.812 | 0.819 | - | 0.932 | 0.939
Rankings | | 0.491 | 0.495 | - | 0.710 | 0.724
60
• Diverse complex label datasets
• With no model changes between datasets, MAS comes closest to the oracle upper bound of all compared methods on every task
62
Goal: Design a future of Artificial Intelligence (AI)
technologies to meet society’s needs and values.
http://goodsystems.utexas.edu
Good Systems: an 8-year, $10M
UT Austin Grand Challenge
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at over 100 universities around the world
63
What’s an Information School?
Task-specific workflows
• Pros:
– Empower workers
for complex tasks
• Cons:
– Need new workflow
for every task
– Complicated, difficult
to formulate
Noronha et al 2011
(image analysis)
Lasecki et al 2012
(transcription)
64

More Related Content

PPTX
Aggregating Complex Annotations via Merging and Matching
PDF
Modeling and Aggregation of Complex Annotations
PDF
Mixed Effects Models - Introduction
PDF
Learning from Noisy Label Distributions (ICANN2017)
PDF
Mixed Effects Models - Fixed Effect Interactions
PDF
Mixed Effects Models - Post-Hoc Comparisons
PDF
Mixed Effects Models - Fixed Effects
PDF
Word2vec and Friends
Aggregating Complex Annotations via Merging and Matching
Modeling and Aggregation of Complex Annotations
Mixed Effects Models - Introduction
Learning from Noisy Label Distributions (ICANN2017)
Mixed Effects Models - Fixed Effect Interactions
Mixed Effects Models - Post-Hoc Comparisons
Mixed Effects Models - Fixed Effects
Word2vec and Friends

Similar to Automated Models for Quantifying Centrality of Survey Responses (20)

PDF
한국어와 NLTK, Gensim의 만남
PPTX
Text Classification Using Machine Learning.pptx
PDF
Lda2vec text by the bay 2016 with notes
PDF
Word2vec and Friends
PDF
apidays Paris 2024 - Embeddings: Core Concepts for Developers, Jocelyn Matthe...
PPTX
Kaggle nlp approaches
PPTX
Neel Sundaresan - Teaching a machine to code
PDF
Word2vec in Theory Practice with TensorFlow
PPTX
Text Classification
PDF
機械学習モデルの判断根拠の説明
PDF
Introduction to Open Source RAG and RAG Evaluation
PDF
Automatic generation of domain models for call centers
PPTX
Lecture1.pptx
PDF
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
PDF
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
PDF
A pragmatic introduction to natural language processing models (October 2019)
PDF
Ai & ml
PDF
Magpie
PDF
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
PDF
From grep to BERT
한국어와 NLTK, Gensim의 만남
Text Classification Using Machine Learning.pptx
Lda2vec text by the bay 2016 with notes
Word2vec and Friends
apidays Paris 2024 - Embeddings: Core Concepts for Developers, Jocelyn Matthe...
Kaggle nlp approaches
Neel Sundaresan - Teaching a machine to code
Word2vec in Theory Practice with TensorFlow
Text Classification
機械学習モデルの判断根拠の説明
Introduction to Open Source RAG and RAG Evaluation
Automatic generation of domain models for call centers
Lecture1.pptx
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
A pragmatic introduction to natural language processing models (October 2019)
Ai & ml
Magpie
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
From grep to BERT
Ad

More from Matthew Lease (20)

PDF
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
PDF
Explainable Fact Checking with Humans in-the-loop
PDF
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
PDF
AI & Work, with Transparency & the Crowd
PDF
Designing Human-AI Partnerships to Combat Misinfomation
PDF
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
PDF
But Who Protects the Moderators?
PDF
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
PDF
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
PDF
Fact Checking & Information Retrieval
PDF
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
PDF
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
PDF
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
PDF
Systematic Review is e-Discovery in Doctor’s Clothing
PDF
The Rise of Crowd Computing (July 7, 2016)
PDF
The Rise of Crowd Computing - 2016
PDF
The Rise of Crowd Computing (December 2015)
PDF
Toward Better Crowdsourcing Science
PDF
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
PDF
The Search for Truth in Objective & Subject Crowdsourcing
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Explainable Fact Checking with Humans in-the-loop
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
AI & Work, with Transparency & the Crowd
Designing Human-AI Partnerships to Combat Misinfomation
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
But Who Protects the Moderators?
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Fact Checking & Information Retrieval
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Systematic Review is e-Discovery in Doctor’s Clothing
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing (December 2015)
Toward Better Crowdsourcing Science
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
The Search for Truth in Objective & Subject Crowdsourcing
Ad

Recently uploaded (20)

PPTX
MYSQL Presentation for SQL database connectivity
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Electronic commerce courselecture one. Pdf
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PPT
Teaching material agriculture food technology
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Empathic Computing: Creating Shared Understanding
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Encapsulation theory and applications.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
cuic standard and advanced reporting.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
MYSQL Presentation for SQL database connectivity
Mobile App Security Testing_ A Comprehensive Guide.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Electronic commerce courselecture one. Pdf
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Teaching material agriculture food technology
Dropbox Q2 2025 Financial Results & Investor Presentation
Advanced methodologies resolving dimensionality complications for autism neur...
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm
Review of recent advances in non-invasive hemoglobin estimation
Empathic Computing: Creating Shared Understanding
The Rise and Fall of 3GPP – Time for a Sabbatical?
Encapsulation theory and applications.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Programs and apps: productivity, graphics, security and other tools
cuic standard and advanced reporting.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf

Automated Models for Quantifying Centrality of Survey Responses

  • 1. Matt Lease Associate Professor School of Information The University of Texas at Austin Amazon Scholar Human-in-the-loop Services Amazon Web Services (AWS) Automated Models for Quantifying Centrality of Survey Responses 1 Lab: ir.ischool.utexas.edu @mattlease Slides: slideshare.net/mattlease
  • 2. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark. Human-in-the-loop Services • 3 Team Products: Mechanical Turk, Sagemaker Ground Truth, and Augmented AI (A2I) • https://www.amazon.science/research-awards – Cash and/or AWS credits • Summer, sabbatical, or longer engagements – https://www.amazon.science/scholars – https://www.amazon.science/visiting-academics • https://www.amazon.science/tag/internships
  • 4. What’s the capital of Texas? Austin Austin Houston 4
  • 5. What’s the capital of Texas? Austin Austin Houston Majority Vote 5
  • 6. Simple annotation & aggregation Classification • sentiment analysis • image categorization Ordinal rating • product & movie reviews • search relevance Aggregation • Crowdsourcing: quality control • Experts: wisdom of crowds • Goal: select best label available for each item (no label fusion) 6
  • 7. Caption this image: 7 A cat is eating The cat eats A beautiful picture
  • 8. Caption this image: When majority voting falls short Problem: large label space, exact match doesn’t work! 8 A cat is eating The cat eats A beautiful picture
  • 9. What about complex annotations? Ranked lists Parse trees A1: A cat is eating A2: The cat eats A3: A beautiful picture Image captions Range sequences 9
  • 10. 10 Alexander Braylan1 and Matthew Lease2 1 Dept. of Computer Science & 2 School of Information The University of Texas at Austin Modeling and Aggregation of Complex Annotations via Annotation Distance Code & Data: https://github.com/Praznat/annotationmodeling https://github.com/Praznat/annotationmodeling
  • 11. Roadmap • Prior work • Approach • Example outputs • Conclusion 11 https://github.com/Praznat/annotationmodeling
  • 12. Aggregating Simple Labels • Hundreds of papers • Multiple benchmarking studies • Rich body of Bayesian modeling • General-purpose aggregation models for simple labels don’t support complex labels Dawid-Skene MACE Hierarchical Dawid-Skene Item Difficulty Logistic Random Effects Source: Paun et al 2018 “Comparing bayesian models of annotation” 12 https://github.com/Praznat/annotationmodeling
  • 13. Task-specific models • Pros: – Task specialization maximizes accuracy • Cons: – Need new model for every task – Complicated, difficult to formulate Nguyen et al 2017 (Sequences) Lin, Mausam, and Weld 2012 (Math) 13 https://github.com/Praznat/annotationmodeling
  • 14. Our goals • We want aggregation for complex data types – Build on ideas from simple label aggregation models • We want to generalize across many labeling tasks – Can we reduce problem to common simpler state space? 14 https://github.com/Praznat/annotationmodeling
  • 15. Roadmap • Prior work • Approach • Example outputs • Conclusion 15 https://github.com/Praznat/annotationmodeling
  • 16. Key Insight Partial credit matching via task-specific distance function • Adopt or define a distance function for each annotation task • Model annotation distances uniformly across tasks • Distance functions already exist for many task types – Free-text responses, e.g., survey questions 16 https://github.com/Praznat/annotationmodeling
  • 17. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 17 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 18. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.05 0.1 0.1 18 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 19. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 19 0.82 • Example task: free text answer • Example distance function: string edit distance https://github.com/Praznat/annotationmodeling
  • 21. Example Distance: Word embeddings 21 https://github.com/Praznat/annotationmodeling
  • 22. Distance function properties 22 Properties of distance functions Non-negativity Symmetry Triangle inequality Data Free Text Rankings Example evaluation fn BLEU(x, y) Example distance fn Non-negativity ✓ ✓ Symmetry ✓ ✓ Triangle inequality ✓ ✓ https://github.com/Praznat/annotationmodeling
  • 23. Calculate distances “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 23 0.82 https://github.com/Praznat/annotationmodeling
  • 24. A1: A cat is eating A2: The cat eats A3: A beautiful picture 0.1 0.6 0.3 24 All tasks reduce to matrices of distances https://github.com/Praznat/annotationmodeling
  • 25. How to aggregate given distances • Local selection model • Global selection model • Combined 25 Current item Other items https://github.com/Praznat/annotationmodeling
  • 26. Local approach: Smallest Avg Distance (SAD) • For each question: compute average distance between responses • The response with smallest average distance is locally most normative, generalizing majority vote • Independence between items • Local approach does not model respondent agreement 26 Current item Other items https://github.com/Praznat/annotationmodeling
  • 27. Global approach: Best Available User (BAU) • Score each participant by their average distance to all other participants across all questions • The participant with lowest score is globally most normative; treat their response as most normative • Global approach ignores distance observed on the current item 27 Current item Other items https://github.com/Praznat/annotationmodeling
  • 28. Can we get best of both worlds? • Want a method that combines: – Best available user (global) – Smallest avg distance (local) • Should build on rich history of work on Bayesian annotation modeling • Need a principled framework for modeling annotation distance matrices weights votes weighted voting 28 https://github.com/Praznat/annotationmodeling
  • 29. Multidimensional Annotation Scaling (MAS) • Based on Multidimensional Scaling (Kruskal & Wish 1978) • Probabilistic model of multi- item distance matrices • “Hierarchical Bayesian” – Additional learned parameters represent crowd effects such as worker reliability A cat is eating The cat eats A beautiful picture 29 https://github.com/Praznat/annotationmodeling
  • 30. MAS Objective 1: Likelihood Multidimensional Scaling objective: Diuv ∼ N(∥εiu−εiv∥, σ) • Diuv : observed distance • εiu : annotation embedding • σ : error scale “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 0.82 30 https://github.com/Praznat/annotationmodeling
  • 31. MAS Objective 1: Likelihood Multidimensional Scaling objective: Diuv ∼ N(∥εiu−εiv∥, σ) • Diuv : observed distance • εiu : annotation embedding • σ : error scale “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 0.8 0.82 0.05 0.1 0.1 0.82 31 https://github.com/Praznat/annotationmodeling
  • 32. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” Pseudo-gold 32 https://github.com/Praznat/annotationmodeling
  • 33. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 33 https://github.com/Praznat/annotationmodeling
  • 34. MAS Objective 2: Prior “a cat is eating” “cat is eating” “a beautiful picture” “the cat eats” 34 https://github.com/Praznat/annotationmodeling
• 35. MAS Objective 2: Prior
• 37. Roadmap • Prior work • Approach • Example outputs • Conclusion
• 38. Example Output: father
Response | SAD | MAS
He always speaks ill about his father behind back. | 0.78 | 0.16
He always speaks ill of his father behind his back. | 0.71 | 0.30
He always talks about his father behind his back. | 0.74 | 0.50
He always speaks ill of his father | 0.78 | 0.55
He always speak ill of his father. | 0.79 | 0.62
He is always talking about his father behind his back. | 0.82 | 0.63
He always says behind his father. | 0.90 | 0.72
He always talks about his dad behind his back. | 0.83 | 0.73
• 39. Example Output: she says
Response | SAD | MAS
Please be sure to take a note of what she says. | 0.77 | 0.16
Please take a note of what she says. | 0.84 | 0.30
Be sure to take a warning notice what she says. | 0.86 | 0.46
Please be sure to take notes what she says. | 0.81 | 0.48
Please take a note what she say. | 0.92 | 0.73
Please be sure to take instructions for her saying. | 0.93 | 0.76
Make sure to insert disclaimer about what she says. | 0.93 | 0.80
Please make a memo whatever she says. | 0.99 | 0.82
• 40. Example Output: quiet
Response | SAD | MAS
As long as you keep quiet you may stay here | 0.83 | 0.26
You can stay here as long as you keep quiet. | 0.86 | 0.39
You may stay here if you keep quiet. | 0.81 | 0.39
You can stay here if you keep quiet. | 0.82 | 0.57
So long as you remain quiet you may stay here. | 0.92 | 0.57
If it is quiet you may stay here | 0.90 | 0.70
If you keep quiet you can stay here. | 0.92 | 0.81
You may be here if you keep quiet. | 0.91 | 0.84
• 41. Example Output: go ahead
Response | SAD | MAS
Please go ahead if i am late. | 0.83 | 0.16
Please go ahead if I'm late. | 0.79 | 0.28
Please go ahead if I delayed. | 0.82 | 0.51
Please go without me if I'm late. | 0.91 | 0.62
Please go ahead if I get late | 0.83 | 0.67
Please go ahead and leave if I'm late. | 0.88 | 0.74
If I am late you can go in first. | 1.00 | 0.79
If I should be late go without me. | 1.00 | 0.81
• 42. Example Output: married
Response | SAD | MAS
Actually they are not married | 0.91 | 0.18
To tell the truth they are not couple | 0.79 | 0.47
To tell the truth they are not a married couple | 0.84 | 0.62
To tell the truth they're not married | 0.89 | 0.63
In fact they are not couple | 0.94 | 0.69
to telling the truth we're not married | 0.97 | 0.71
Two people are not couples in truth | 1.00 | 0.79
• 43. Roadmap • Prior work • Approach • Example outputs • Conclusion
• 44. Conclusion • A probabilistic model identifies normative vs. outlier responses by quantifying the distance between responses • Many choices exist for measuring the distance between two texts (e.g., character-based or more semantic NLP measures) • 3 models: local (SAD), global (BAU), or combined (MAS) • Open source: github.com/Praznat/annotationmodeling
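As one example of a character-based distance between two texts, here is a normalized Levenshtein (edit) distance. This is an illustrative choice, not necessarily the distance used in the experiments; the function names are hypothetical, and semantic alternatives (e.g., comparing sentence embeddings) would plug into the same models.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def char_distance(a, b):
    """Levenshtein distance scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


print(char_distance("a cat is eating", "cat is eating"))
print(char_distance("a cat is eating", "a beautiful picture"))
```

The first pair differs only by the leading "a ", so its distance is 2/15 (about 0.13); the unrelated caption is much farther away.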
• 45. Future work • From objective labeling tasks to subjective responses • Evaluation on survey data – Collaboration with behavioral science researchers? – Compare distance functions and model settings for utility • Automatic detection of a participant's consistent biases relative to what is group-normative
• 46. Matt Lease (University of Texas at Austin) Lab: ir.ischool.utexas.edu @mattlease Slides: slideshare.net/mattlease • We thank our many talented crowd workers for their contributions to our research! • Code & data: https://github.com/Praznat/annotationmodeling • Alexander Braylan and Matthew Lease. Aggregating Complex Annotations via Merging and Matching. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 86–94, 2021. • Alexander Braylan and Matthew Lease. Modeling and Aggregation of Complex Annotations via Annotation Distances. In Proceedings of the Web Conference, pages 1807–1818, 2020.
• 48. MTurk: The Early Days • Artificial Intelligence, With Help From the Humans. – J. Pontin. NY Times, March 25, 2007 • Is Amazon's Mechanical Turk a Failure? April 9, 2007 – “As of this writing, there are [only] 128 HITs available on Mechanical Turk.” • Su et al., WWW 2007: “a web-based human data collection system… ‘System M’”
• 49. 2008: The “Gold” Rush Begins (Braylan and Lease) • Snow et al., EMNLP (Natural Language Processing) – Annotating human language for natural language processing (NLP) – 22,000 labels for only US$26 – The crowd's consensus labels can replace traditional expert labels • “Discovery” sparks a rush for “gold” data across areas – Alonso et al., SIGIR Forum (Information Retrieval) – Kittur et al., CHI (Human-Computer Interaction) – Sorokin and Forsyth, CVPR (Computer Vision)
• 50. 2010–11: Social & Behavioral Sciences • A Guide to Behavioral Experiments on Mechanical Turk – W. Mason and S. Suri (2010). SSRN online. • Crowdsourcing for Human Subjects Research – L. Schmidt (CrowdConf 2010) • Crowdsourcing Content Analysis for Behavioral Research: Insights from Mechanical Turk – Conley & Tosti-Kharas (2010). Academy of Management • Amazon's Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? – M. Buhrmester et al. (2011). Perspectives… 6(1):3-5. – See also: Amazon Mechanical Turk Guide for Social Scientists
• 51. The Future of Crowd Work (ACM CSCW’13) by Kittur, Nickerson, Bernstein, Gerber, Shaw, Zimmerman, Lease, and Horton
• 55. Tasks & datasets • SYNTHETIC DATASETS – Syntactic parse trees (distance function: evalb) – Ranked lists (distance function: Kendall's tau) • REAL DATASETS – Biomedical text sequences (distance function: span F1; Nguyen et al. 2017) – Urdu-English translations (distance function: GLEU; Zaidan and Callison-Burch 2011)
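Two of the distance functions listed above are easy to sketch. The versions below are assumptions for illustration: Kendall's tau is expressed as a normalized Kendall tau distance (fraction of discordant pairs, for two rankings of the same items), and span F1 uses exact (start, end) matches turned into a distance as 1 - F1. For translations, NLTK provides a sentence-level GLEU score (`nltk.translate.gleu_score.sentence_gleu`) that could be converted to a distance the same way.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs ordered differently by two rankings of the same items."""
    pos_b = {item: k for k, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))  # (x, y) with x ranked above y in rank_a
    if not pairs:
        return 0.0
    discordant = sum(1 for x, y in pairs if pos_b[x] > pos_b[y])
    return discordant / len(pairs)


def span_f1_distance(spans_a, spans_b):
    """1 - F1 between two sets of (start, end) spans, counting exact matches only."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 0.0
    overlap = len(a & b)
    if overlap == 0:
        return 1.0
    precision, recall = overlap / len(a), overlap / len(b)
    return 1.0 - 2 * precision * recall / (precision + recall)


print(kendall_tau_distance(["x", "y", "z"], ["z", "y", "x"]))  # 1.0
print(span_f1_distance({(0, 2), (5, 7)}, {(0, 2)}))  # about 0.333
```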
• 56. Methods • Baselines: – Random User (RU): pick one label at random – ZenCrowd (ZC) (Demartini et al. 2012): weighted voting based on exact match (rare for complex labels!) – Crowd Hidden Markov Model (CHMM) (Nguyen et al. 2017): sequence annotation task only • Upper bound: Oracle (OR), which always picks the best available label – Even with 5 workers per item, it is limited by the best answer any of them gave
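The Random User baseline and the Oracle upper bound are simple enough to state in code. In this sketch, the function names, the `score` parameter, and the toy `token_overlap` metric are hypothetical; in the experiments the task metric (GLEU, span F1, EVALB) would play the scoring role. The Oracle can only choose among the labels workers actually gave, which is why it is bounded by the best available answer.

```python
import random


def random_user(labels, rng=random):
    """RU baseline: one worker's label for the item, chosen uniformly at random."""
    return rng.choice(labels)


def oracle(labels, gold, score):
    """Upper bound: the worker label that scores best against the gold answer.

    score(label, gold) -> float, higher is better (e.g., GLEU or span F1).
    """
    return max(labels, key=lambda label: score(label, gold))


def token_overlap(label, gold):
    # Hypothetical stand-in metric: fraction of gold tokens that appear in the label.
    gold_tokens = gold.lower().split()
    label_tokens = set(label.lower().split())
    return sum(t in label_tokens for t in gold_tokens) / len(gold_tokens)


labels = ["A cat is eating", "The cat eats", "A beautiful picture"]
print(oracle(labels, "a cat eating", token_overlap))  # A cat is eating
print(random_user(labels, random.Random(0)))  # any one of the three labels
```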
• 60. Results
Task | Metric | RU | ZC | CHMM | MAS | Oracle
Translations | GLEU | 0.185 | 0.188 | - | 0.217 | 0.246
Sequences | F1 | 0.561 | 0.569 | 0.702 | 0.709 | 0.827
Parses | EVALB | 0.812 | 0.819 | - | 0.932 | 0.939
Rankings | | 0.491 | 0.495 | - | 0.710 | 0.724
• Diverse complex-label datasets • MAS is the best non-oracle method on every dataset, coming closest to the ground truth with no model changes between datasets
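One way to read the table is to ask how much of the gap between the Random User baseline and the Oracle upper bound MAS closes. This ratio is not reported on the slides; it is simply computed from the RU, MAS, and Oracle numbers in the table, and the names below are illustrative.

```python
# Scores copied from the results table: (RU, MAS, Oracle) for each task.
results = {
    "Translations (GLEU)": (0.185, 0.217, 0.246),
    "Sequences (F1)": (0.561, 0.709, 0.827),
    "Parses (EVALB)": (0.812, 0.932, 0.939),
    "Rankings": (0.491, 0.710, 0.724),
}


def gap_closed(ru, mas, oracle):
    """Fraction of the Random User -> Oracle gap that MAS closes."""
    return (mas - ru) / (oracle - ru)


for task, (ru, mas, orc) in results.items():
    print(f"{task}: {gap_closed(ru, mas, orc):.2f}")
```

This prints roughly 0.52 for translations, 0.56 for sequences, and 0.94 for both parses and rankings.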
• 62. Good Systems: an 8-year, $10M UT Austin Grand Challenge • Goal: Design a future of Artificial Intelligence (AI) technologies to meet society’s needs and values. • http://goodsystems.utexas.edu
• 63. What’s an Information School? • “The place where people & technology meet” ~ Wobbrock et al., 2009 • “iSchools” now exist at over 100 universities around the world
• 64. Task-specific workflows • Pros: – Empower workers for complex tasks • Cons: – Need a new workflow for every task – Complicated and difficult to formulate • Examples: Noronha et al. 2011 (image analysis); Lasecki et al. 2012 (transcription)