THE HUMAN FACE OF AI:
HOW COLLECTIVE AND
AUGMENTED INTELLIGENCE
CAN HELP SOLVE SOCIETAL
PROBLEMS
Elena Simperl
ACM-W UK, June 2020
@esimperl
AUGMENTED
INTELLIGENCE
Human-centred design paradigm for systems
that utilise artificial intelligence (AI)
People and AI work together to enhance
cognitive performance, support decision
making and create new experiences
AI DEPENDS
ON PEOPLE
Applications require more or better data, e.g.
from mobile or IoT devices
Machine learning algorithms learn from
human labellers
Knowledge-based AI approaches acquire
domain knowledge from people
AI BENEFITS
FROM
COLLECTIVE
INTELLIGENCE
Collective intelligence (CI) emerges when
groups or communities come together,
implicitly or explicitly, to achieve a common
goal
CI techniques help AI applications design and
manage interactions with people
In human computation a machine performs a
function by outsourcing some steps to people
How do we design systems that bring together
human, collective & computational intelligence?
IN THIS TALK
Design patterns for socio-
technical systems
Socio-technical challenges
when defining and applying
the patterns
Directions for future research
EXAMPLE:
SUPPORTING DISASTER RELIEF
Human computation has driven major advances in
disaster relief efforts
• 40,000 independent reports mapped through Ushahidi after
the 2010 Haiti earthquake
Crisis teams sift through large volumes of
crowdsourced reports from social media and other
sources
Volunteer efforts are predominantly limited to the
initial phase of recovery
Human interest and effort often fade before the later
stages of the process
CHALLENGE:
MAKING CROWDSOURCING SUSTAINABLE
Learning
Engagement
TASK ALLOCATION
EXPERIMENT
Increase learning and engagement by ordering
tasks by difficulty or by content similarity
Public dataset of tweet URLs about hurricanes
Harvey, Irma and Maria, curated manually to 2,000
tweets: 1,000 text-only, 1,000 with images
People were asked to classify tweets to help
recovery teams process social media reports
Recruitment via Amazon’s Mechanical Turk
Labels train machine learning classifiers
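As a rough illustration of the three ordering policies compared in this experiment, the sketch below orders a task queue by a crowd-rated difficulty score, by greedy content similarity, or at random. The task fields and the similarity function are assumptions for illustration, not the experiment's actual implementation.

```python
import random

def order_by_difficulty(tasks, easiest_first=True):
    """Order tasks by an assumed crowd-rated difficulty score in [0, 1]."""
    return sorted(tasks, key=lambda t: t["difficulty"], reverse=not easiest_first)

def order_by_similarity(tasks, similarity):
    """Greedy chain: always show the task most similar in content to the previous one."""
    remaining = list(tasks)
    ordered = [remaining.pop(0)]
    while remaining:
        nxt = max(remaining, key=lambda t: similarity(ordered[-1], t))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered

def order_randomly(tasks, seed=None):
    """Random-baseline condition."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```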
TASK DESIGN
Presented participants with disaster relief tweets
(text or text + image)
Participants asked to:
• Classify text based on content
• Rate task according to difficulty
Three conditions:
• Random baseline
• Difficult tweets
• Easy tweets
Monitored accuracy of responses
FINDINGS
Accuracy was influenced by difficulty
• Text: weak association when comparing easy and difficult
clusters
• Images: strong association when comparing difficult and random
clusters
No significant association between difficulty and
volume of completed tasks
Only 30% of workers completed more than one task
FEEDBACK EXPERIMENT
Two forms of feedback
• Expert feedback (using gold standard)
• Crowd feedback (randomly selected)
Workflow:
• Participant gives answer
• Prompted with pre-existing answer and offered chance to
edit
• Asked to explain decision
Monitored decisions and justifications
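The workflow can be summarised in a short sketch. The participant object and its classify/revise/justify methods are hypothetical stand-ins for the real task interface; the gold-standard and crowd-answer lookups follow the two conditions described above.

```python
import random

def feedback_trial(task, participant, condition, gold_labels, crowd_answers):
    """One trial: answer, see feedback, optionally revise, justify."""
    answer = participant.classify(task)                 # initial answer
    if condition == "expert":
        shown = gold_labels[task.id]                    # feedback from the gold standard
    else:
        shown = random.choice(crowd_answers[task.id])   # randomly selected crowd answer
    revised = participant.revise(task, answer, shown)   # chance to edit
    reason = participant.justify(task, revised, shown)  # explain the decision
    return {"initial": answer, "shown": shown, "final": revised,
            "changed": revised != answer, "justification": reason}
```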
FINDINGS
Participants were generally poor at taking feedback
into account:
• 57% of workers felt the expert feedback matched their responses
(it actually did for only 7%)
• 36% of workers felt the crowd feedback matched (it actually did
for only 4%)
Participants presented with crowd feedback were more
likely to change their answer in response (41% vs
26%)
They were also more likely to deem crowd feedback
incorrect than expert feedback (22% vs 16%)
FUTURE WORK
Difficulty impacts accuracy, but not
engagement
Participants struggled with more
complex tasks
Significant support required for
maximum accuracy
Generic feedback is not sufficient; more
personalised support is required, which is
resource-intensive
EXAMPLE:
URBAN AUDITING
ON DEMAND
Urban datasets are often out of date
• Survey methodologies: expensive,
error-prone, no validation
• VGI (e.g. OpenStreetMap): no
control over data updates,
coverage, etc.
Online tool using paid microtask
crowdsourcing
• Uses digital street view imagery
• Task performed remotely
• Participants recruited from online
marketplaces
VIRTUAL CITY EXPLORER
QROWD-POI.HEROKUAPP.COM/
Urban planner defines an area
and the instructions for the
participants
Participants explore an area
virtually and identify points of
interest
Urban planner monitors task
execution, quality and rewards
CHALLENGE:
CROWDSOURCING DESIGN
TASK DESIGN · DATA QUALITY · INCENTIVES · FAIRNESS
EXPERIMENT: CYCLING TRENTO & NANTES
150 participants per city, random starting positions
5 PoIs (bike racks) per participant for $0.15
Total cost per city: $45 (7 days)
Mixed methods approach, including metrics and
manual inspection
• RQ1: Feasibility and precision as task progresses
• RQ2: Completeness (overlap with benchmark datasets)
• RQ3: Coverage (percentage of visited nodes on explorable path)
• RQ4: Crowd experience (interface errors triggered, number of
escapes)
                      Trento     Nantes
Area                  0.347 km²  0.336 km²
Nodes                 906        1,177
Explorable distance   9,127 m    12,104 m
StreetView coverage   93%        92%
RQ1: TASK FEASIBILITY AND PRECISION AS TASK PROGRESSES
UX supports discovery
of PoIs
Photoshoot paradigm
and triangulation
method help identify
low-quality answers
Precision drops as all
PoIs are submitted
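A minimal sketch of the triangulation idea, assuming a flat local coordinate frame in metres and compass headings taken from two StreetView positions; the actual VCE pipeline may differ. Rays that are parallel or that intersect behind a camera are flagged as likely low-quality submissions.

```python
import math

def bearing_vector(heading_deg):
    """Unit direction vector for a compass heading (0° = north, 90° = east)."""
    rad = math.radians(heading_deg)
    return (math.sin(rad), math.cos(rad))

def cross(a, b):
    return a[0] * b[1] - a[1] * b[0]

def triangulate(p1, heading1, p2, heading2):
    """Intersect two bearing rays from points p1, p2 (local x/y in metres)."""
    d1, d2 = bearing_vector(heading1), bearing_vector(heading2)
    denom = cross(d1, d2)
    if abs(denom) < 1e-9:
        return None  # parallel bearings: cannot triangulate
    diff = (p2[0] - p1[0], p2[1] - p1[1])
    t = cross(diff, d2) / denom
    s = cross(diff, d1) / denom
    if t < 0 or s < 0:
        return None  # intersection behind a camera: likely low-quality answer
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```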
RQ2: DATA COMPLETENESS
Approach complements existing
data sources and is able to find
new PoIs
Highly customisable (area of
interest, budget, questions, timing)
RQ3: COVERAGE OF THE
DESIGNATED AREA
Approach achieves high coverage of the
area of interest
Some parts of the map are visited more
often than others (resources)
Black dots are points on StreetView that
are difficult to explore
RQ4: CROWD EXPERIENCE
Most participants were able to complete
their tasks without any incidents. Some did
not manage to triangulate or stepped
outside of the designated area
Positive feedback and payment perceived as fair,
despite the taboo mechanism. A small percentage
submitted some data and then dropped out.
Most participants who dropped out did not seem to
attempt the task
FINDINGS
VCE adds value to urban auditing methods
• Accuracy comparable to OpenStreetMap, easier to
manage than VGI
• Additional resources on demand (at a cost)
Free exploration achieves good coverage
Taboo mechanism helps reduce costs and avoid
duplicated work
FUTURE WORK
Allocating starting positions: randomly, at the centre, to confirm an item, to cover
a new area, etc.
Coordinating among participants: map showing progress of other participants
Understanding the impact of urban topology on feasibility, accuracy,
coverage
Direct comparisons with other approaches
Hybrid workflows with crowds on the ground and online
EXAMPLE:
UNDERSTANDING MOBILITY
PATTERNS
City planners lack detailed mobility
information about their residents
Human-AI workflow
Bespoke app for data collection
Combination of symbolic and numerical ML
classifiers to match trip segments to modes of
transport
Active learning approach to ask travellers to
validate trips the machine is unsure about
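A hedged sketch of the active-learning step, assuming a trained classifier that exposes per-segment mode probabilities; the dict-based interface and the 0.7 threshold are illustrative, not the project's actual code. Only segments below the confidence threshold are routed to the traveller for validation.

```python
def segments_to_validate(segments, classifier, confidence_threshold=0.7):
    """Pick trip segments whose predicted transport mode is too uncertain."""
    uncertain = []
    for seg in segments:
        probs = classifier.predict_proba(seg)  # e.g. {"walk": 0.4, "bus": 0.35, ...}
        mode, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence < confidence_threshold:
            uncertain.append((seg, mode, confidence))
    # ask the traveller about the least confident segments first
    return sorted(uncertain, key=lambda x: x[2])
```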
CHALLENGE:
USER EXPERIENCE
Iterative UX development via citizen lab to improve
journey data and ML predictions
Lab and field studies with 250+ participants
CHALLENGE:
ASSESSING THE QUALITY OF THE DATA
Naïve model assumes people will notice and correct errors in journeys detected by the
algorithm
Is this true? If not, can we detect errors and estimate the residual error rate?
Are people employing specific ‘strategies’ to check and correct journeys?
EXPERIMENT DESIGN
No independent ground truth!
Inject artificial errors and measure whether
they are corrected
Assume artificial errors are not corrected
by accident
Use the ratio of discovered natural errors to
discovered artificial errors to estimate the
initial and residual natural error rates
Assume natural errors are comparable to
artificial ones and that people are not
adding new errors ('mis-corrections')
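The estimation logic amounts to a capture-style calculation: if participants correct a fraction p of the injected artificial errors, and natural errors are assumed to behave the same way, the discovered natural errors scale up by 1/p. A minimal sketch, with illustrative variable names:

```python
def estimate_natural_errors(injected, injected_found, natural_found):
    """Estimate initial and residual natural error counts from correction rates."""
    if injected_found == 0:
        raise ValueError("no injected errors found; detection rate unidentifiable")
    detection_rate = injected_found / injected  # p: fraction of artificial errors corrected
    initial = natural_found / detection_rate    # estimated natural errors before checking
    residual = initial - natural_found          # estimated natural errors left afterwards
    return detection_rate, initial, residual

# Illustrative numbers: 20 injected, 15 corrected (p = 0.75); 6 natural errors
# corrected -> ~8 natural errors estimated initially, ~2 estimated to remain.
```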
EXPERIMENT DESIGN (2)
10 participants, ~5 journeys per participant, from Google Timelines (KML)
Pre-process to add artificial errors in four classes:
• Under-segmentation
• Over-segmentation
• Bad mode
• Bad point (100 m or 400 m GPS point move)
Scoring: manual process, tool-supported
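For illustration, a sketch of the error-injection pre-processing under an assumed journey representation (a list of segments, each with a transport mode and GPS points); the schema and mode list are assumptions, not the project's actual format.

```python
import copy
import random

MODES = ["walk", "bike", "bus", "car", "train"]  # assumed mode list

def inject_error(journey, kind, rng=None):
    """Return a copy of the journey with one artificial error of the given kind.

    A journey is assumed to be a list of segments:
    {"mode": str, "points": [(lat, lon), ...]} with at least two points each.
    """
    rng = rng or random.Random()
    j = copy.deepcopy(journey)
    i = rng.randrange(len(j))
    seg = j[i]
    if kind == "over_segmentation":      # split one real segment into two
        mid = len(seg["points"]) // 2
        j[i:i + 1] = [{"mode": seg["mode"], "points": seg["points"][:mid]},
                      {"mode": seg["mode"], "points": seg["points"][mid:]}]
    elif kind == "under_segmentation" and len(j) > 1:  # merge two segments
        k = min(i, len(j) - 2)
        j[k:k + 2] = [{"mode": j[k]["mode"],
                       "points": j[k]["points"] + j[k + 1]["points"]}]
    elif kind == "bad_mode":             # label a segment with a wrong mode
        seg["mode"] = rng.choice([m for m in MODES if m != seg["mode"]])
    elif kind == "bad_point":            # move one GPS point by 100 m or 400 m
        lat, lon = seg["points"][0]
        offset = rng.choice([100, 400]) / 111_000  # metres -> degrees latitude (approx.)
        seg["points"][0] = (lat + offset, lon)
    return j
```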
PRELIMINARY FINDINGS:
MORE RESEARCH NEEDED INTO DATA COLLECTION
METHODOLOGIES FOR ML
Errors can be corrected
Errors can mislead
Errors can persist
A range of complex cases
How do we design systems that bring together
human, collective & computational intelligence?
Mix of CI approaches
Iterative UX design
Methods to assess data quality and
improve human-AI interactions
Aligned motivation and incentives
THANKS TO LUIS-DANIEL IBÁÑEZ, EDDY
MADDALENA, RICHARD GOMER, NEAL REEVES,
THE QROWD PROJECT, NESTA AND THE
EUROPEAN COMMISSION
@esimperl
Maddalena, E., Ibáñez, L.D. and Simperl, E., 2020. On the mapping of
Points of Interest through StreetView imagery and paid crowdsourcing. To
appear in ACM TIST.
qrowd-poi.herokuapp.com
Nesta, June 2020. Combining Crowds and Machines: Experiments in
collective intelligence design 1.0. nesta.org.uk/report/combining-crowds-
and-machines/
Nesta, June 2020. Collective intelligence grants 1.0.
nesta.org.uk/feature/collective-intelligence-grants/