HUMAN COMPUTATION
FOR VGI MANAGEMENT
Irene Celino – irene.celino@cefriel.com
Cefriel, Viale Sarca 226, 20126 Milano
Workshop on Volunteered Geographic Information – Milano, April 16th, 2018
AGENDA
1. Introduction
2. Human Computation and Games with a Purpose
3. GWAP examples for VGI Management
4. Indirect people involvement
5. Future perspectives
copyright © 2018 Cefriel – All rights reserved
from ideation to business value
1. INTRODUCTION
Relation between VGI and Human Computation
VGI AND HUMAN COMPUTATION
• VGI is carried out by volunteers, so by definition it implies human intervention
• Still, VGI suffers from all the issues related to that:
• Varying participation → impact on sustainability (long tail effect)
• Reliability of volunteers → impact on information quality
• Uneven distribution of contributions → impact on coverage
• Human Computation is an approach that can bring benefits to VGI…
• …and VGI can reveal more than you could expect!
WISDOM OF CROWDS
• “Why the Many Are Smarter Than the Few and
How Collective Wisdom Shapes Business, Economies, Societies and Nations”
• Criteria for a wise crowd
• Diversity of opinion (importance of interpretation)
• Independence (not a “single mind”)
• Decentralization (importance of local knowledge)
• Aggregation (aim to get a collective decision)
• There are also failures/risks in crowd decisions:
• Homogeneity, centralization, division, imitation, emotionality
James Surowiecki. The Wisdom of Crowds. Anchor, 2005
2. HUMAN COMPUTATION &
GAMES WITH A PURPOSE
What is Human Computation? What goals can humans help machines to achieve? How to
involve a crowd of persons?
What extrinsic rewards (money, prizes, etc.) or intrinsic incentives can we adopt to
motivate people?
HUMAN COMPUTATION
• Human Computation is a computer science technique in which a computational process
is performed by outsourcing certain steps to humans. Unlike traditional computation,
in which a human delegates a task to a computer, in Human Computation the computer
asks a person or a large group of people to solve a problem; then it collects, interprets
and integrates their solutions
• The original concept of Human Computation by its inventor Luis von Ahn derived from the
common sense observation that people are intrinsically very good at solving some
kinds of tasks which are, on the other hand, very hard to address for a computer;
this is the case of a number of targets of Artificial Intelligence (like image recognition or
natural language understanding) for which research is still open
Edith Law and Luis von Ahn. Human computation.
Synthesis Lectures on Artificial Intelligence and Machine Learning, 2011
HUMAN COMPUTATION
Problem: an Artificial Intelligence algorithm is unable to achieve an adequate result with a satisfactory level of confidence
Solution: ask people to intervene when the AI system fails, “masking” the task within another human process
Example: https://www.google.com/recaptcha/
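The Problem/Solution pattern on the Human Computation slide (ask people only when the AI is not confident enough) can be sketched as a simple routing step. This is a minimal illustration, not the method of any specific system: the Prediction type, the route function, the 0.8 threshold and the toy classifier are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to lie in [0, 1]


def route(
    items: List[str],
    classify: Callable[[str], Prediction],
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> Tuple[Dict[str, str], List[str]]:
    """Accept confident machine labels; queue the remaining items for humans."""
    accepted: Dict[str, str] = {}
    human_queue: List[str] = []
    for item in items:
        pred = classify(item)
        if pred.confidence >= threshold:
            accepted[item] = pred.label
        else:
            human_queue.append(item)  # the AI "fails": ask people
    return accepted, human_queue


def toy_classifier(item: str) -> Prediction:
    # Hypothetical stand-in for a real model, with hard-coded outputs.
    table = {"img_1": ("cat", 0.95), "img_2": ("dog", 0.55), "img_3": ("cat", 0.81)}
    label, confidence = table[item]
    return Prediction(label, confidence)


accepted, human_queue = route(["img_1", "img_2", "img_3"], toy_classifier)
```

In this sketch only img_2 falls below the threshold, so it is the only item that would be shown to people.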
WHY HUMAN COMPUTATION FOR VGI?
• Collection of new data – as a complement to VGI itself, exploiting redundancy of multiple
contributions
• Validation of collected data or automatic processing – as “third party” to solve
discrepancies
• Completion of data, to fill out “missing pieces”
• Identification of mistakes/outdated information and respective “correction”
GAMES WITH A PURPOSE
• A GWAP outsources some steps of a computational process to humans in an
entertaining way
• The application has a “collateral effect”, because players’ actions are exploited to
solve a hidden task
• The application *IS* a fully-fledged game (as opposed to gamification, which is the use
of game-like features in non-gaming environments)
• The players are (usually) unaware of the hidden purpose; they simply meet game
challenges
Luis Von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006
Luis Von Ahn and Laura Dabbish. Designing games with a purpose.
Communications of the ACM, 51(8):58–67, 2008
GAMES WITH A PURPOSE (GWAP)
Problem: the same as in Human Computation (ask humans when AI fails)
Solution: hide the task within a game, so that users are motivated by game challenges, often remaining unaware of the hidden purpose; the task solution comes from agreement between players
SOME “VARIATIONS” OF HUMAN COMPUTATION
• Other terms have been used to indicate approaches and methods that are similar to
Human Computation and sometimes mistaken for it
• While there is of course quite a large overlap, it is useful to distinguish them
• Crowdsourcing
• Citizen Science
CROWDSOURCING
• Crowdsourcing is the process of outsourcing tasks to a “crowd” of distributed people.
The possibility to exploit the Internet as a vehicle to recruit contributors and to assign
tasks led to the rise of micro-work platforms, thus often (but not always) implying a
monetary reward. The term Crowdsourcing, although quite recent, is used to indicate a
wide range of practices; however, the most common meaning of Crowdsourcing implies
that the “crowd” of workers involved in the solution of tasks is different from the traditional
or intended groups of task solvers
Jeff Howe. Crowdsourcing: How the power of the crowd
is driving the future of business. Random House, 2008
CROWDSOURCING
Problem: a company needs to execute a lot of simple tasks, but cannot afford to hire a person to do that job
Solution: pack tasks in bunches (human intelligence tasks or HITs) and outsource them to a very cheap workforce through an online platform
Example: https://www.mturk.com/
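Packing simple tasks into bunches, as in the Crowdsourcing solution above, can be illustrated with a few lines of code. This is only a sketch: pack_into_hits and the bunch size of 3 are hypothetical, and it does not call any real micro-work platform API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pack_into_hits(tasks: Sequence[T], hit_size: int) -> List[List[T]]:
    """Group tasks into consecutive bunches of at most hit_size items."""
    if hit_size <= 0:
        raise ValueError("hit_size must be positive")
    return [list(tasks[i:i + hit_size]) for i in range(0, len(tasks), hit_size)]


# Seven hypothetical image-labelling tasks, packed three per bunch.
tasks = [f"image_{n}" for n in range(7)]
hits = pack_into_hits(tasks, hit_size=3)
```

The last bunch is simply shorter when the number of tasks is not a multiple of the bunch size.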
CITIZEN SCIENCE
• Citizen Science is the involvement of volunteers to collect or process data as part of
a scientific or research experiment; those volunteers can be the scientists and
researchers themselves, but more often the name of this discipline “implies a form of
science developed and enacted by citizens” including those “outside of formal scientific
institutions”, thus representing a form of public participation in science. Formally, Citizen
Science has been defined as “the systematic collection and analysis of data; development
of technology; testing of natural phenomena; and the dissemination of these activities by
researchers on a primarily avocational basis”.
Alan Irwin. Citizen science: A study of people, expertise
and sustainable development. Psychology Press, 1995
CITIZEN SCIENCE
Example: https://www.zooniverse.org/
Problem: a scientific experiment requires the execution of a lot of simple tasks, but researchers are busy
Solution: engage the general audience in solving those tasks, explaining that they are contributing to science, research and the public good
SPOT THE DIFFERENCE…
• Similarities:
• Involvement of people
• Aggregation of multiple contributions
• No automatic replacement
• Variations:
• Motivation
• Reward (glory, money, passion/need)
• Hybrids or parallel!
[Figure labels: Citizen Science, Crowdsourcing, Human Computation]
3. GWAP EXAMPLES FOR VGI
MANAGEMENT
Can we embed VGI management tasks within Games with a Purpose?
3 EXAMPLES OF GAMES WITH A PURPOSE FOR VGI
• Collection of missing data: GWAP enabler for OSM Restaurants
• Validation of automatically collected information: LCV game
• Collection, validation and correction of data: Urbanopoly
GWAP ENABLER TUTORIAL FOR OSM RESTAURANTS
• Restaurant POIs (amenity=restaurant) from OSM may miss the cuisine type (cuisine key)
• Input: OSM restaurants in a given area with/without cuisine tag (those with the tag are used for assessing player reliability)
• Goal: assign score 𝜎 to each restaurant-cuisine pair to discover the “right” category
• Score 𝜎 of each pair is updated on the basis of players’ choices (incremented if link selected)
• When the score reaches the threshold (𝜎 ≥ 𝑡), the restaurant’s category is considered “true” (and removed from the game)
Pure GWAP with double-player game mechanics
Points, badges, leaderboard as intrinsic reward
A player scores if he/she chooses the same cuisine as his/her gameplay “mate”
Data validation is a result of the “agreement” between players
https://github.com/STARS4ALL/gwap-enabler-tutorial
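The score mechanism described for the OSM restaurants game (a score 𝜎 per restaurant-cuisine pair, incremented when the link is selected, accepted once 𝜎 ≥ 𝑡) can be sketched as follows. This is an illustrative sketch and not the code of the gwap-enabler-tutorial repository: the CuisineScores class and the reliability function are hypothetical names, and weighting each increment by the player's reliability is an assumption about how the reliability check on already-tagged restaurants might be used.

```python
from collections import defaultdict
from typing import Dict, Tuple


class CuisineScores:
    """Keeps a score sigma for each (restaurant, cuisine) pair."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # the threshold t
        self.scores: Dict[Tuple[str, str], float] = defaultdict(float)
        self.resolved: Dict[str, str] = {}  # restaurant -> accepted cuisine

    def record_choice(self, restaurant: str, cuisine: str, weight: float = 1.0) -> bool:
        """Increment sigma for the selected pair; return True once resolved."""
        if restaurant in self.resolved:
            return True  # already removed from the game
        self.scores[(restaurant, cuisine)] += weight
        if self.scores[(restaurant, cuisine)] >= self.threshold:
            self.resolved[restaurant] = cuisine
            return True
        return False


def reliability(answers: Dict[str, str], known: Dict[str, str]) -> float:
    """Share of a player's answers on already-tagged restaurants that match the tag."""
    checked = [r for r in answers if r in known]
    if not checked:
        return 0.0
    return sum(answers[r] == known[r] for r in checked) / len(checked)


# Restaurants that already carry a cuisine tag act as a control set (assumption).
known = {"r1": "italian", "r2": "chinese"}
alice = reliability({"r1": "italian", "r2": "chinese"}, known)
bob = reliability({"r1": "italian", "r2": "pizza"}, known)

board = CuisineScores(threshold=2.0)
board.record_choice("r9", "japanese", weight=alice)
board.record_choice("r9", "sushi", weight=bob)
done = board.record_choice("r9", "japanese", weight=alice)
```

With these toy values, the pair (r9, japanese) reaches the threshold on the third choice and r9 leaves the game.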
LAND COVER VALIDATION GAME
• Two automatic land cover classifications in disagreement:
• DUSAF (Lombardy Region) and GlobeLand 30 (Chinese governmental agency)
• Input: set of pixels where the two classifications “disagree”
• Goal: assign score 𝜎 to each pixel-category pair to discover the “right” land cover class
• Score 𝜎 of each pair is updated on the basis of players’ choices (incremented if selected, decremented if not selected)
• When the score reaches the threshold (𝜎 ≥ 𝑡), the pixel’s category is considered “true” (and removed from the game)
https://youtu.be/Q0ru1hhDM9Q
http://bit.ly/foss4game
Pure GWAP with not-so-hidden purpose (played by “experts”)
Points, badges, leaderboard as intrinsic reward
A player scores if he/she guesses one of the two disagreeing classifications
Data validation is a result of the “agreement” between players
Maria Antonia Brovelli, Irene Celino, Andrea Fiano, Monia Elisa Molinari, Vijaycharan Venkatachalam.
A crowdsourcing-based game for land cover validation. Applied Geomatics, 2017
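The Land Cover Validation score update differs from the restaurant case because a pair is also decremented when it is not selected. A minimal sketch of that rule follows. It assumes a unit step in both directions and that the candidates for a pixel are the two disagreeing classes; update_pixel_scores and accepted_category are hypothetical names, and the actual step sizes used in the game are not stated in the slides.

```python
from typing import Dict, List, Optional, Tuple

Scores = Dict[Tuple[str, str], float]


def update_pixel_scores(
    scores: Scores, pixel: str, candidates: List[str], selected: str, step: float = 1.0
) -> None:
    """Increment sigma for the selected category, decrement it for the others."""
    for category in candidates:
        delta = step if category == selected else -step
        scores[(pixel, category)] = scores.get((pixel, category), 0.0) + delta


def accepted_category(
    scores: Scores, pixel: str, candidates: List[str], threshold: float
) -> Optional[str]:
    """Return the category whose score reached the threshold, if any."""
    for category in candidates:
        if scores.get((pixel, category), 0.0) >= threshold:
            return category
    return None


scores: Scores = {}
candidates = ["forest", "cropland"]  # the two disagreeing classes (toy values)
for choice in ["forest", "cropland", "forest", "forest"]:
    update_pixel_scores(scores, "px_42", candidates, choice)

winner = accepted_category(scores, "px_42", candidates, threshold=2.0)
```

Because of the decrement, one dissenting choice cancels one agreeing choice, so a pixel needs a net majority of the threshold size before it is accepted.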
URBANOPOLY
• POI information from OSM to be collected or validated/corrected
• Input: data from OSM
• Goal: if data doesn’t exist, collect it; if data exists, validate it; if data is wrong, correct it
• Complex game embedding “mini-games” for data collection, validation and correction
• Same score mechanisms, with score 𝜎 updated on the basis of players’ choices
• When the score reaches the threshold (𝜎 ≥ 𝑡), data is considered “true” (and can be sent back to OSM)
Monopoly-like game to win venues in the real world
Wheel of fortune and mini-games to acquire venues and become “rich” in the game
Data acquisition challenges as contributions for missing data
Data validation challenges to check pre-existing data
Result from players’ “agreement”
Irene Celino. Geospatial dataset curation through a location-based game. Semantic Web Journal, Volume 6, Number 2, IOS Press, 2015
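The Urbanopoly goal (collect when data is missing, validate when it exists, correct when it is wrong) amounts to choosing which mini-game to propose for a given piece of data. The sketch below is one hypothetical way to express that choice, not the game's actual logic: next_challenge is an invented name, and treating a score at or below the negative threshold as evidence that the value is wrong is an assumption.

```python
from typing import Optional


def next_challenge(value: Optional[str], score: float, threshold: float) -> str:
    """Pick the kind of mini-game to propose for one POI attribute."""
    if value is None:
        return "collect"  # data doesn't exist yet
    if score >= threshold:
        return "done"  # considered "true"; can be sent back to OSM
    if score <= -threshold:
        return "correct"  # assumption: repeated rejections mark the value as wrong
    return "validate"  # data exists but is not yet confirmed


examples = [
    next_challenge(None, 0.0, 3.0),
    next_challenge("pub", 1.0, 3.0),
    next_challenge("pub", -3.0, 3.0),
    next_challenge("pub", 3.0, 3.0),
]
```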
LESSONS LEARNED BY DESIGNING AND RUNNING THOSE GAMES
• Designing and developing a full game is expensive
• The simpler the game, the better its acceptance by players and its “throughput”
• Different players are motivated by different incentives
• Fun is not always enough to engage people, especially in the long term
• Data collected via games can be enough to train automatic models
Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning:
an Experimental Comparison for Image Classification. Human Computation Journal, 2018.
4. INDIRECT PEOPLE INVOLVEMENT
Are there indirect ways to involve humans in data processing?
HUMANS AS A SOURCE OF INFORMATION
• People are not only task executors, they are also information providers!
• Open content and cooperative knowledge
• Data explicitly provided by people, like VGI, can “hide” further information
• e.g., logs of wiki editing, statistical distribution of contributions
• Opportunistic sensing
• Voluntary or involuntary digital traces of human-related activities
• e.g., phone call logs, GPS traces, social media activities
FROM SPATIAL ANALYTICS TO GEO-SPATIAL “SEMANTICS”
• Spatial distribution and conglomeration of specific points of interest (POI)
from OpenStreetMap can give hints about the geographical space
• Re-engineering of spatial features through comparison between areas:
same POI type shows different distribution → evidence for different
semantics (e.g. what is a pub in Milano vs. London)
• Semantic specification of spatial neighbourhoods:
• Emerging neighbourhoods from spatial clustering of POIs (as opposed
to administrative divisions)
• Spatial version of tf-idf to compare between different areas (e.g.
central or peripheral areas in different cities) and to characterise
neighbourhoods (e.g. shopping district)
Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni. Supporting Geo-Ontology
Engineering through Spatial Data Analytics. 13th Extended Semantic Web Conference, 2016
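The "spatial version of tf-idf" mentioned above can be illustrated by treating each area as a document and each POI type as a term. The sketch below uses the textbook tf-idf definition (relative frequency times the log of the inverse area frequency), which is an assumption: the exact formulation in the cited paper may differ. The spatial_tfidf name and the toy areas are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def spatial_tfidf(pois_by_area: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Weight each POI type in each area by tf * idf, with areas as 'documents'."""
    n_areas = len(pois_by_area)
    counts = {area: Counter(types) for area, types in pois_by_area.items()}

    # Area frequency: in how many areas each POI type appears at least once.
    area_freq: Counter = Counter()
    for type_counts in counts.values():
        area_freq.update(type_counts.keys())

    weights: Dict[str, Dict[str, float]] = {}
    for area, type_counts in counts.items():
        total = sum(type_counts.values())
        weights[area] = {
            poi_type: (n / total) * math.log(n_areas / area_freq[poi_type])
            for poi_type, n in type_counts.items()
        }
    return weights


# Toy POI types per area (not real OpenStreetMap data).
areas = {
    "centre": ["shop", "shop", "shop", "pub"],
    "suburb": ["school", "pub"],
    "campus": ["school", "library", "pub"],
}
weights = spatial_tfidf(areas)
top_centre = max(weights["centre"], key=weights["centre"].get)
```

In this toy input, pubs appear in every area and therefore get weight zero, while shops characterise the centre, which is the kind of signal that could label it as a shopping district.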
FROM POI INFORMATION AND PHONE CALL LOGS TO LAND USE
• General topic: exploit “low-cost” information about a geographic area as features to
train a predictive model that outputs “expensive” information about the same area
• “Inexpensive” input information:
• Geo-information about points of interest processed to characterize space
(distance from the nearest POI of type X)
• Mobile traffic data processed using different time series techniques
(smoothing, decomposition, filtering, time-windowing)
• “Expensive” output information:
• Land use characterization (usually collected through long and expensive
workflows that mix machine processing and costly human labour)
Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Extracting Urban Land Use from Linked Open Geospatial Data. IJGI, 2015
Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Filtering and Windowing Mobile Traffic Time Series for Territorial Land Use Classification. COMCOM, 2016
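The two kinds of "inexpensive" input features listed above (distance from the nearest POI of type X, and smoothed mobile traffic series) can be sketched as below. This is a hedged illustration rather than the pipeline of the cited papers: the function names, the haversine distance, the simple moving average and the toy coordinates are all assumptions chosen for brevity.

```python
import math
from typing import List, Tuple

Poi = Tuple[str, float, float]  # (type, latitude, longitude)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_poi_distance(cell: Tuple[float, float], pois: List[Poi], poi_type: str) -> float:
    """Distance from a cell centre to the closest POI of the given type."""
    dists = [haversine_km(cell[0], cell[1], lat, lon) for t, lat, lon in pois if t == poi_type]
    return min(dists) if dists else math.inf


def moving_average(series: List[float], window: int) -> List[float]:
    """Simple smoothing: mean over each full sliding window."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


# Toy data: one grid cell, three POIs, five traffic readings.
cell = (45.0, 9.0)
pois = [("school", 45.0, 9.0), ("school", 46.0, 9.0), ("pub", 45.5, 9.0)]
d_school = nearest_poi_distance(cell, pois, "school")
d_pub = nearest_poi_distance(cell, pois, "pub")
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)

# One feature vector per cell, ready to feed a land use classifier.
features = [d_school, d_pub] + smoothed
```

A feature vector like this would be computed for every cell of the area and paired with a known land use label to train the predictive model.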
5. FUTURE PERSPECTIVES
Are we there yet?!?
FUTURE PERSPECTIVES
• VGI management is still an open issue
• Human Computation methods (and the like) can be employed to support VGI
management
• Parallel/joint adoption of different methods to get the best out of them
• Research challenges are still the same
• Collection, completion/coverage, quality, (in)homogeneity, update/sustainability, …
• Human-in-the-loop is an emerging trend and paradigm also in Machine Learning
research (e.g. active learning)
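As an illustration of the human-in-the-loop idea behind active learning, the sketch below picks the items a model is least sure about, so that people are asked to label only those. It shows uncertainty sampling for a binary classifier, one common active learning strategy; the most_uncertain name and the probabilities are hypothetical.

```python
from typing import Dict, List


def most_uncertain(probabilities: Dict[str, float], k: int) -> List[str]:
    """Return the k items whose predicted probability is closest to 0.5."""
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:k]


# Hypothetical model outputs: probability of the positive class per item.
probs = {"a": 0.97, "b": 0.52, "c": 0.10, "d": 0.40}
to_label = most_uncertain(probs, k=2)
```

The selected items would be sent to human labellers, and their answers added to the training set before the model is retrained.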
MILANO
viale Sarca 226,
20126,
Milano - Italy
LONDON
4th floor
57 Rathbone Place
London W1T 1JU – UK
NEW YORK
One Liberty Plaza,
165 Broadway, 23rd Floor,
New York City, New York, 10006 USA
Cefriel.com
Thanks for your attention!
Any questions?
Irene Celino
Knowledge Technologies
Digital Interaction Division
irene.celino@cefriel.com

More Related Content

PDF
Human-in-the-loop @ ISWS 2019
PDF
Bigger than Any One: Solving Large Scale Data Problems with People and Machines
PDF
Crowdsourcing challenges and opportunities 2012
PPTX
The Need for Deep Learning Transparency
PDF
Event Tech Live: Crunched by Event Wallet
PPTX
Crowdsourcing presentation
PDF
The Leadership Challenges of Digital Transformation - The Conference Board - ...
PDF
Creating Effective Adoption of Social Tools with Design and Measurement | DW2...
Human-in-the-loop @ ISWS 2019
Bigger than Any One: Solving Large Scale Data Problems with People and Machines
Crowdsourcing challenges and opportunities 2012
The Need for Deep Learning Transparency
Event Tech Live: Crunched by Event Wallet
Crowdsourcing presentation
The Leadership Challenges of Digital Transformation - The Conference Board - ...
Creating Effective Adoption of Social Tools with Design and Measurement | DW2...

Similar to Human Computation for VGI Management (20)

PDF
Human computation @ Data Semantics
PDF
BDVe Webinar Series - QROWD: The Human Factor in Big Data
PDF
BDVe Webinar Series - QROWD: The Human Factor in Big Data
PDF
Human factor in big data qrowd bdve
PDF
Human Computation
PDF
Increasing Perfomance via Gamification in a Volunteer-Based Evolutionary Comp...
DOCX
Chapter 7Evaluating and Controlling TechnologyBased.docx
PDF
AI Orange Belt - Session 2
PDF
PDF
Crowdsourcing: A Survey
PDF
Dashboards are Dumb Data - Why Smart Analytics Will Kill Your KPIs
PPTX
The big data revolution in healthcare
PDF
Minne analytics presentation 2018 12 03 final compressed
PDF
The Unreasonable Effectiveness of Data
PDF
Scientific revenue unreasonable effectiveness of data
PPTX
LT-Innovate Brussels June 26, 2013 Innovation Session III: Cooperation
PDF
Clipperton - AI - Deep Learning: From Hype to Maturity?
PDF
Minne analytics presentation 2018 12 03 final compressed
PPTX
The Purdue IronHacks
PPTX
Counting the World with AI Models
Human computation @ Data Semantics
BDVe Webinar Series - QROWD: The Human Factor in Big Data
BDVe Webinar Series - QROWD: The Human Factor in Big Data
Human factor in big data qrowd bdve
Human Computation
Increasing Perfomance via Gamification in a Volunteer-Based Evolutionary Comp...
Chapter 7Evaluating and Controlling TechnologyBased.docx
AI Orange Belt - Session 2
Crowdsourcing: A Survey
Dashboards are Dumb Data - Why Smart Analytics Will Kill Your KPIs
The big data revolution in healthcare
Minne analytics presentation 2018 12 03 final compressed
The Unreasonable Effectiveness of Data
Scientific revenue unreasonable effectiveness of data
LT-Innovate Brussels June 26, 2013 Innovation Session III: Cooperation
Clipperton - AI - Deep Learning: From Hype to Maturity?
Minne analytics presentation 2018 12 03 final compressed
The Purdue IronHacks
Counting the World with AI Models
Ad

More from Irene Celino (20)

PDF
Knowledge Technologies group at Cefriel
PDF
Interplay of Game Incentives, Player Profiles and Task Difficulty in Games with ...
PDF
A Framework to build Games with a Purpose for Linked Data Refinement
PDF
Involving people in Citizen Science through game incentives: the case of the ...
PDF
Ninja Riders: sensibilizzare i giovani a una mobilità più sicura attraverso i...
PDF
Ninja Riders - Youth and Road Safety: Discovering Urban Mobility Behaviours
PDF
BotDCAT-AP: An Extension of the DCAT Application Profile for Describing Datas...
PPTX
Give and Take in Citizen Science
PPTX
Ninja Riders @ Human Factory Day 2017
PDF
Night Knights: exploiting games to engage people in a citizen science campaign
PDF
STARS4ALL-CAPSSI-Workshop
PDF
Towards Talkin'Piazza: Engaging Citizens through Playful Interaction with Urb...
PDF
SSSW 2016 Cognition Tutorial
PDF
Analysis of a Cultural Heritage Game with a Purpose with an Educational Incen...
PDF
Supporting Geo-Ontology Engineering through Spatial Data Analytics
PDF
Smart City Semantics - Data Analytics and Human Computation to understand the...
PDF
Towards a Semantic City Service Ecosystem
PDF
Living Land Use - Telecom Big Data Challenge - Trento ICT Days 2014
PDF
Urbanopoly @ PlanetData review
PDF
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Knowledge Technologies group at Cefriel
Interplay of Game Incentives, Player Profiles and Task Difficulty in Games with ...
A Framework to build Games with a Purpose for Linked Data Refinement
Involving people in Citizen Science through game incentives: the case of the ...
Ninja Riders: sensibilizzare i giovani a una mobilità più sicura attraverso i...
Ninja Riders - Youth and Road Safety: Discovering Urban Mobility Behaviours
BotDCAT-AP: An Extension of the DCAT Application Profile for Describing Datas...
Give and Take in Citizen Science
Ninja Riders @ Human Factory Day 2017
Night Knights: exploiting games to engage people in a citizen science campaign
STARS4ALL-CAPSSI-Workshop
Towards Talkin'Piazza: Engaging Citizens through Playful Interaction with Urb...
SSSW 2016 Cognition Tutorial
Analysis of a Cultural Heritage Game with a Purpose with an Educational Incen...
Supporting Geo-Ontology Engineering through Spatial Data Analytics
Smart City Semantics - Data Analytics and Human Computation to understand the...
Towards a Semantic City Service Ecosystem
Living Land Use - Telecom Big Data Challenge - Trento ICT Days 2014
Urbanopoly @ PlanetData review
Urbanopoly: Collection and Quality Assessment of Geo-spatial Linked Data via ...
Ad

Recently uploaded (20)

PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Encapsulation theory and applications.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Approach and Philosophy of On baking technology
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Machine learning based COVID-19 study performance prediction
PPTX
Big Data Technologies - Introduction.pptx
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
MIND Revenue Release Quarter 2 2025 Press Release
The Rise and Fall of 3GPP – Time for a Sabbatical?
Agricultural_Statistics_at_a_Glance_2022_0.pdf
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
sap open course for s4hana steps from ECC to s4
Spectral efficient network and resource selection model in 5G networks
Encapsulation theory and applications.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Approach and Philosophy of On baking technology
Unlocking AI with Model Context Protocol (MCP)
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Machine learning based COVID-19 study performance prediction
Big Data Technologies - Introduction.pptx
Per capita expenditure prediction using model stacking based on satellite ima...
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Building Integrated photovoltaic BIPV_UPV.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
MIND Revenue Release Quarter 2 2025 Press Release

Human Computation for VGI Management

  • 1. HUMAN COMPUTATION FOR VGI MANAGEMENT Irene Celino – irene.celino@cefriel.com Cefriel, Viale Sarca 226, 20126 Milano Workshop on Volunteered Geographic Information – Milano, April 16th, 2018
  • 2. 1. Introduction 2. Human Computation and Games with a Purpose 3. GWAP examples for VGI Management 4. Indirect people involvement 5. Future perspectives AGENDA 2copyright © 2018 Cefriel – All rights reserved
  • 3. from ideation to business value 3 1. INTRODUCTION Relation between VGI and Human Computation copyright © 2018 Cefriel – All rights reserved
  • 4. VGI AND HUMAN COMPUTATION • VGI is carried out by volunteers, so by definition it implies human intervention • Still VGI suffers of all issues related to that: • Varying participation  impact on sustainability (long tail effect) • Reliability of volunteers  impact of information quality • Uneven distribution of contributions  impact on coverage • Human Computation is an approach that can bring benefits to VGI… • …and VGI can reveal more than you could expect! 4copyright © 2018 Cefriel – All rights reserved
  • 5. WISDOM OF CROWDS • “Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations” • Criteria for a wise crowd • Diversity of opinion (importance of interpretation) • Independence (not a “single mind”) • Decentralization (importance of local knowledge) • Aggregation (aim to get a collective decision) • The are also failures/risks in crowd decisions: • Homogeneity, centralization, division, imitation, emotionality 5copyright © 2018 Cefriel – All rights reserved James Surowiecki The wisdom of crowds Anchor, 2005
  • 6. from ideation to business value 6 2. HUMAN COMPUTATION & GAMES WITH A PURPOSE What is Human Computation? What goals can humans help machines to achieve? How to involve a crowd of persons? What extrinsic rewards (money, prizes, etc.) or intrinsic incentives can we adopt to motivate people? copyright © 2018 Cefriel – All rights reserved
  • 7. HUMAN COMPUTATION • Human Computation is a computer science technique in which a computational process is performed by outsourcing certain steps to humans. Unlike traditional computation, in which a human delegates a task to a computer, in Human Computation the computer asks a person or a large group of people to solve a problem; then it collects, interprets and integrates their solutions • The original concept of Human Computation by its inventor Luis von Ahn derived from the common sense observation that people are intrinsically very good at solving some kinds of tasks which are, on the other hand, very hard to address for a computer; this is the case of a number of targets of Artificial Intelligence (like image recognition or natural language understanding) for which research is still open 7copyright © 2018 Cefriel – All rights reserved Edith Law and Luis von Ahn. Human computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2011
  • 8. HUMAN COMPUTATION 8copyright © 2018 Cefriel – All rights reserved Problem: an Artificial Intelligence algorithm is unable to achieve an adequate result with a satisfactory level of confidence Solution: ask people to intervene when the AI system fails, “masking” the task within another human process Example: https://www.google.com/recaptcha/
  • 9. WHY HUMAN COMPUTATION FOR VGI? • Collection of new data – as a complement to VGI itself, exploiting redundancy of multiple contributions • Validation of collected data or automatic processing – as “third party” to solve discrepancies • Completion of data, to fill out “missing pieces” • Identification of mistakes/outdated information and respective “correction” 9copyright © 2018 Cefriel – All rights reserved
  • 10. GAMES WITH A PURPOSE • A GWAP lets to outsource to humans some steps of a computational process in an entertaining way • The application has a “collateral effect”, because players’ actions are exploited to solve a hidden task • The application *IS* a fully-fledged game (opposed to gamification, which is the use of game-like features in non-gaming environments) • The players are (usually) unaware of the hidden purpose, they simply meet game challenges 10copyright © 2018 Cefriel – All rights reserved Luis Von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006 Luis Von Ahn and Laura Dabbish. Designing games with a purpose. Communications of the ACM, 51(8):58–67, 2008
  • 11. GAMES WITH A PURPOSE (GWAP) 11copyright © 2018 Cefriel – All rights reserved Problem: it’s the same of Human Computation (ask humans when AI fails) Solution: Solution: hide the task within a game, so that users are motivated by game challenges, often remaining unaware of the hidden purpose, task solution comes from agreement between players
  • 12. SOME “VARIATIONS” OF HUMAN COMPUTATION • Other terms have been used to indicate approaches and methods that are similar to Human Computation and sometimes mistaken for it • While there is of course quite a large overlap, it is useful to distinguish them • Crowdsourcing • Citizen Science 12copyright © 2018 Cefriel – All rights reserved
  • 13. CROWDSOURCING • Crowdsourcing is the process to outsource tasks to a “crowd” of distributed people. The possibility to exploit the Internet as vehicle to recruit contributors and to assign tasks led to the rise of micro-work platforms, thus often (but not always) implying a monetary reward. The term Crowdsourcing, although quite recent, is used to indicate a wide range of practices; however, the most common meaning of Crowdsourcing implies that the “crowd” of workers involved in the solution of tasks is different from the traditional or intended groups of task solvers 13copyright © 2018 Cefriel – All rights reserved Jeff Howe. Crowdsourcing: How the power of the crowd is driving the future of business. Random House, 2008
  • 14. CROWDSOURCING 14copyright © 2018 Cefriel – All rights reserved Problem: a company needs to execute a lot of simple tasks, but cannot afford hiring a person to do that job Solution: pack tasks in bunches (human intelligence tasks or HITs) and outsource them to a very cheap workforce through an online platform Example: https://www.mturk.com/
  • 15. CITIZEN SCIENCE • Citizen Science is the involvement of volunteers to collect or process data as part of a scientific or research experiment; those volunteers can be the scientists and researchers themselves, but more often the name of this discipline “implies a form of science developed and enacted by citizens” including those “outside of formal scientific institutions”, thus representing a form of public participation to science. Formally, Citizen Science has been defined as “the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis”. 15copyright © 2018 Cefriel – All rights reserved Alan Irwin. Citizen science: A study of people, expertise and sustainable development. Psychology Press, 1995
  • 16. CITIZEN SCIENCE 16copyright © 2018 Cefriel – All rights reserved Example: https://www.zooniverse.org/ Problem: a scientific experiment requires the execution of a lot of simple tasks, but researchers are busy Solution: engage the general audience in solving those tasks, explaining that they are contributing to science, research and the public good
  • 17. SPOT THE DIFFERENCE… • Similarities: • Involvement of people • Aggregation of multiple contributions • No automatic replacement • Variations: • Motivation • Reward (glory, money, passion/need) • Hybrids or parallel! 17copyright © 2018 Cefriel – All rights reserved Citizen Science Crowdsourcing Human Computation
  • 18. from ideation to business value 18 3. GWAP EXAMPLES FOR VGI MANAGEMENT Can we embed VGI management tasks within Games with a Purpose? copyright © 2018 Cefriel – All rights reserved
  • 19. 3 EXAMPLES OF GAMES WITH A PURPOSE FOR VGI • Collection of missing data: GWAP enabler for OSM Restaurants • Validation of automatically collected information: LCV game • Collection, validation and correction of data: Urbanopoly 19copyright © 2018 Cefriel – All rights reserved
  • 20. 20 • Input: OSM restaurants in a given area with/without cuisine tag (those with the tag are used for assessing player reliability) • Goal: assign score 𝜎 to each restaurant-cuisine pair to discover the “right” category • Score 𝜎 of each pair is updated on the basis of players’ choices (incremented if link selected) • When the score overcomes the threshold 𝜎 ≥ 𝑡 , the restaurant’s category is considered “true” (and removed from the game) • Restaurant POIs (amenity=restaurant) from OSM may miss the cuisine type (cuisine key) GWAP ENABLER TUTORIAL FOR OSM RESTAURANTS copyright © 2018 Cefriel – All rights reserved Pure GWAP with double player game mechanics Points, badges, leaderboard as intrinsic reward A player scores if he/she chooses the same cuisine of its gameplay “mate” Data validation is a result of the “agreement” between players https://github.com/STARS4ALL/ gwap-enabler-tutorial Points, badges, leaderboard as intrinsic reward
  • 21. LAND COVER VALIDATION GAME • Two automatic land cover classifications in disagreement: DUSAF (Lombardy Region) and GlobeLand30 (Chinese governmental agency) • Input: set of pixels where the two classifications “disagree” • Goal: assign a score 𝜎 to each pixel-category pair to discover the “right” land cover class • The score 𝜎 of each pair is updated on the basis of players’ choices (incremented if selected, decremented if not selected) • When the score reaches the threshold (𝜎 ≥ 𝑡), the pixel’s category is considered “true” (and the pixel is removed from the game) • Pure GWAP with not-so-hidden purpose (played by “experts”) • Points, badges, leaderboard as intrinsic reward • A player scores if he/she guesses one of the two disagreeing classifications • Data validation is a result of the “agreement” between players • https://youtu.be/Q0ru1hhDM9Q • http://bit.ly/foss4game • Maria Antonia Brovelli, Irene Celino, Andrea Fiano, Monia Elisa Molinari, Vijaycharan Venkatachalam. A crowdsourcing-based game for land cover validation. Applied Geomatics, 2017
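The increment/decrement rule for pixel-category pairs could look like the sketch below. It is a simplified reading of the slide, not the game's actual implementation: the function names, the unit `step` and the plain dictionary used to hold the scores are assumptions.

```python
def update_pixel_scores(scores, pixel, candidates, selected, step=1.0):
    """Apply one player's choice to the scores of a pixel.

    scores: dict mapping (pixel, category) -> sigma, modified in place.
    candidates: categories offered to the player for this pixel.
    selected: the category the player picked.
    The selected pair is incremented, every other offered pair is decremented.
    """
    for category in candidates:
        delta = step if category == selected else -step
        key = (pixel, category)
        scores[key] = scores.get(key, 0.0) + delta
    return scores


def validated_category(scores, pixel, candidates, threshold):
    """Return the category whose sigma reached the threshold, or None."""
    for category in candidates:
        if scores.get((pixel, category), 0.0) >= threshold:
            return category
    return None
```

Because non-selected categories lose score, a pixel on which players disagree stays in the game longer than one on which they agree.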
  • 22. URBANOPOLY • POI information from OSM to be collected or validated/corrected • Input: data from OSM • Goal: if data doesn’t exist, collect it; if data exists, validate it; if data is wrong, correct it • Complex game embedding “mini-games” for data collection, validation and correction • Same score mechanism, with the score 𝜎 updated on the basis of players’ choices • When the score reaches the threshold (𝜎 ≥ 𝑡), data is considered “true” (and can be sent back to OSM) • Monopoly-like game to win venues in the real world • Wheel of fortune and mini-games to acquire venues and become “rich” in the game • Data acquisition challenges as contributions for missing data • Data validation challenges to check pre-existing data • Results come from players’ “agreement” • Irene Celino. Geospatial dataset curation through a location-based game. Semantic Web Journal, Volume 6, Number 2, IOS Press, 2015
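The collect/validate/correct goal can be read as a small decision rule for choosing which mini-game to offer. The sketch below is hypothetical: the slide only states the acceptance threshold 𝑡, so the `reject` threshold used here to decide that a value is “wrong”, the function name and the returned labels are all assumptions.

```python
def next_challenge(poi, key, scores, accept=3.0, reject=-3.0):
    """Pick the mini-game type for one attribute of one POI.

    poi: dict with an "id" entry plus whatever attributes OSM provides.
    scores: dict mapping (poi_id, key, value) -> current sigma.
    accept: threshold t above which the value is considered "true".
    reject: assumed lower bound below which the value is considered wrong.
    """
    value = poi.get(key)
    if value is None:
        return "collect"  # data doesn't exist yet
    sigma = scores.get((poi["id"], key, value), 0.0)
    if sigma >= accept:
        return "send_to_osm"  # agreement reached, no further play needed
    if sigma <= reject:
        return "correct"  # players disagree with the existing value
    return "validate"  # data exists but is not yet confirmed
```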
  • 23. LESSONS LEARNED BY DESIGNING AND RUNNING THOSE GAMES • Designing and developing a full game is expensive • The simpler the game, the better its acceptance by players and its “throughput” • Different players are motivated by different incentives • Fun is not always enough to engage people, especially in the long term • Data collected via games can be enough to train automatic models • Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification. Human Computation Journal, 2018.
  • 24. 4. INDIRECT PEOPLE INVOLVEMENT Are there indirect ways to involve humans in data processing?
  • 25. HUMANS AS A SOURCE OF INFORMATION • People are not only task executors, they are also information providers! • Open content and cooperative knowledge • Data explicitly provided by people, like VGI, can “hide” further information • e.g., logs of wiki editing, statistical distribution of contributions • Opportunistic sensing • Voluntary or involuntary digital traces of human-related activities • e.g., phone call logs, GPS traces, social media activities
  • 26. FROM SPATIAL ANALYTICS TO GEO-SPATIAL “SEMANTICS” • Spatial distribution and conglomeration of specific points of interest (POI) from OpenStreetMap can give hints about the geographical space • Re-engineering of spatial features through comparison between areas: the same POI type shows a different distribution → evidence for different semantics (e.g. what is a pub in Milano vs. London) • Semantic specification of spatial neighbourhoods: • Emerging neighbourhoods from spatial clustering of POIs (as opposed to administrative divisions) • Spatial version of tf-idf to compare different areas (e.g. central or peripheral areas in different cities) and to characterise neighbourhoods (e.g. shopping district) • Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni. Supporting Geo-Ontology Engineering through Spatial Data Analytics. 13th Extended Semantic Web Conference, 2016
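One way to read the “spatial version of tf-idf” is to treat each area as a document and each POI type as a term. The sketch below follows the textbook tf-idf definition under that analogy; the exact weighting used in the cited paper may differ, and the function name and input layout are assumptions.

```python
import math
from collections import Counter


def spatial_tfidf(pois_by_area):
    """Weight POI types per area, tf-idf style.

    pois_by_area: dict area -> list of POI types found in that area.
    tf  = share of the area's POIs that have the given type.
    idf = log(number of areas / number of areas containing the type).
    Returns dict area -> {poi_type: weight}.
    """
    n_areas = len(pois_by_area)
    counts = {area: Counter(types) for area, types in pois_by_area.items()}
    # In how many areas does each POI type appear at least once?
    area_freq = Counter()
    for type_counts in counts.values():
        area_freq.update(type_counts.keys())
    weights = {}
    for area, type_counts in counts.items():
        total = sum(type_counts.values())
        weights[area] = {
            poi_type: (n / total) * math.log(n_areas / area_freq[poi_type])
            for poi_type, n in type_counts.items()
        }
    return weights
```

A POI type that appears in every area (a pub, say) gets weight 0 everywhere, while a type concentrated in one area dominates that area's profile, which is what makes the measure usable for labelling a neighbourhood as, for example, a shopping district.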
  • 27. FROM POI INFORMATION AND PHONE CALL LOGS TO LAND USE • General topic: exploit “low-cost” information about a geographic area as features to train a predictive model that outputs “expensive” information about the same area • “Inexpensive” input information: • Geo-information about points of interest processed to characterize space (distance from the nearest POI of type X) • Mobile traffic data processed using different time series techniques (smoothing, decomposition, filtering, time-windowing) • “Expensive” output information: • Land use characterization (usually collected through long and expensive workflows that mix machine processing and costly human labour) • Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Extracting Urban Land Use from Linked Open Geospatial Data. IJGI, 2015 • Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Filtering and Windowing Mobile Traffic Time Series for Territorial Land Use Classification. COMCOM, 2016
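The two kinds of “inexpensive” features can be illustrated with a minimal sketch. The haversine great-circle distance and the simple moving average are stand-ins chosen for the example; the cited papers may use other distance measures and smoothing techniques, and all function names and data layouts here are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_poi_distances(cell, pois, poi_types):
    """Feature vector for one cell: distance to the nearest POI of each type.

    cell: (lat, lon) of the cell centre.
    pois: list of (poi_type, lat, lon).
    Types with no POI at all get an infinite distance.
    """
    features = {}
    for wanted in poi_types:
        dists = [
            haversine_km(cell[0], cell[1], lat, lon)
            for poi_type, lat, lon in pois
            if poi_type == wanted
        ]
        features[wanted] = min(dists) if dists else math.inf
    return features


def moving_average(series, window):
    """Smooth a traffic time series with a simple sliding mean."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]
```

The resulting per-cell values would then be concatenated into one feature row and paired with the known land use class of that cell to train a classifier.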
  • 28. 5. FUTURE PERSPECTIVES Are we there yet?!?
  • 29. FUTURE PERSPECTIVES • VGI management is still an open issue • Human Computation methods (and the like) can be employed to support VGI management • Parallel/joint adoption of different methods to get the best out of them • Research challenges are still the same • Collection, completion/coverage, quality, (in)homogeneity, update/sustainability, … • Human-in-the-loop is an emerging trend and paradigm also in Machine Learning research (e.g. active learning)
  • 30. Thanks for your attention! Any questions? Irene Celino, Knowledge Technologies, Digital Interaction Division, irene.celino@cefriel.com • Cefriel.com • MILANO: viale Sarca 226, 20126, Milano - Italy • LONDON: 4th floor, 57 Rathbone Place, London W1T 1JU – UK • NEW YORK: One Liberty Plaza, 165 Broadway, 23rd Floor, New York City, New York, 10006 USA