Crowdsourcing tasks in Linked Data management
 Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1)
 (1) Institute AIFB, Karlsruhe Institute of Technology, Germany
 (2) Ontotext AD, Bulgaria
 Institute of Applied Informatics and Formal Description Methods (AIFB)
 KIT – University of the State of Baden-Wuerttemberg and
 National Research Center of the Helmholtz Association
 www.kit.edu
Motivation

        Various aspects of Linked Data management naturally rely on human intelligence to yield optimal results
        But reaching a critical mass of useful contributions from all relevant stakeholders is still more an art than an engineering exercise




23.10.2011   Seminar: The Role of Ontologies in Linked Data, Kickoff   Institute of Applied Informatics and Formal Description Methods (AIFB)
Microtask platforms



     Define task -> Break task into smaller units -> Evaluate the results
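A minimal sketch of the middle step, assuming a task is just a list of items and a unit is a fixed-size batch. The name `split_into_units` and the batch size are illustrative and do not come from any microtask platform's API.

```python
def split_into_units(items, unit_size):
    """Break a task (a list of items) into smaller units of at most unit_size items."""
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    # Slice the list in steps of unit_size; the last unit may be shorter.
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]
```

Each unit can then be published as one microtask, and the results evaluated per unit.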




Approach
        Formal, declarative description of the data and tasks, using SPARQL patterns as a basis for the automatic design of HITs (Human Intelligence Tasks)

         Integral part of Linked Data tools and applications
                 At design time, the application developer specifies which portions of the data workers can process, and via which types of HITs
                 At run time
                      The system materializes the data
                      Workers process it
                      Data and application are updated to reflect the crowdsourcing results
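The design-time / run-time split above could be sketched as follows. This is an illustration under assumptions, not the authors' system: the class `HitSpec`, its fields, and `materialize_hits` are hypothetical names, and the bindings are assumed to come from running the generated query against some SPARQL endpoint.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HitSpec:
    """Design-time description of a crowdsourcing task (hypothetical structure)."""
    name: str
    input_pattern: str   # SPARQL graph pattern selecting the data shown to workers
    output_pattern: str  # SPARQL graph pattern describing what workers contribute

    def input_variables(self):
        # Collect variable names in order of first appearance.
        seen = []
        for var in re.findall(r"\?(\w+)", self.input_pattern):
            if var not in seen:
                seen.append(var)
        return seen

    def select_query(self):
        # Run-time step 1: the query that materializes the data for workers.
        projection = " ".join("?" + v for v in self.input_variables())
        return f"SELECT {projection} WHERE {self.input_pattern}"


def materialize_hits(spec, bindings):
    # One HIT per solution (row of variable bindings) of the input pattern.
    return [{"task": spec.name, "data": row} for row in bindings]
```

The output pattern is kept alongside the input pattern so that worker answers can later be written back as triples matching it.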


Examples of Linked Data tasks amenable to crowdsourcing

         Identity resolution
         Metadata completion and checking/correction
         Classification
         Ordering
           Quantitative
           Qualitative
         Translation




Running Example

     [Figure not recovered: slide image of the running example]

Identity resolution

    Identity Resolution “involves the creation of sameAs
    links, either by comparison of metadata or by
    investigation of links on the human Web.”
    Input: {?station a metar:Station;
                      rdfs:label ?slabel;
                      wgs84:lat ?slat;
                      wgs84:long ?slong .
             ?airport a dbp-owl:Airport;
                      rdfs:label ?alabel;
                      wgs84:lat ?alat;
                      wgs84:long ?along}
    Output: {OPTIONAL
             {?airport owl:sameAs ?station}}
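One way the "comparison of metadata" could narrow down which station/airport pairs are shown to workers is a distance pre-filter on the coordinates from the input pattern. This is a sketch under assumptions: the slides do not describe such a filter, and the dictionary keys (`uri`, `lat`, `long`) and the 5 km threshold are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def candidate_pairs(stations, airports, max_km=5.0):
    """Return (station, airport, distance) triples close enough to ask workers about."""
    pairs = []
    for s in stations:
        for a in airports:
            d = haversine_km(s["lat"], s["long"], a["lat"], a["long"])
            if d <= max_km:
                pairs.append((s["uri"], a["uri"], round(d, 2)))
    return pairs
```

Each surviving pair would become one yes/no question; a "yes" corresponds to the optional owl:sameAs triple in the output pattern.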



Metadata completion & correction

    “Certain properties, necessary for a given query,
    may not be uniformly populated. Manually conducted
    research might be necessary to transfer this
    information from the human-readable Web”
     Input: {?station a metar:Station;
                      rdfs:label ?label;
                      wgs84:lat ?lat;
                      wgs84:long ?long;
                      dbp:icao ?badicao}


     Output: {?station dbp:icao ?goodicao}
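Before asking workers to correct a value, an automated check can pick out the stations whose dbp:icao value looks wrong or is absent. The sketch below assumes that a well-formed ICAO code is four uppercase letters; the function names and the row layout are hypothetical.

```python
import re

# Assumption: a plausible ICAO code is exactly four uppercase ASCII letters.
ICAO_SHAPE = re.compile(r"[A-Z]{4}")


def needs_review(icao):
    """True if the code is missing or does not have the expected shape."""
    return icao is None or ICAO_SHAPE.fullmatch(icao.strip()) is None


def select_for_correction(rows):
    """rows: dicts with a 'station' key and an optional 'icao' key."""
    return [row["station"] for row in rows if needs_review(row.get("icao"))]
```

The check only flags suspicious values; deciding the correct code (?goodicao) is left to the workers.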




Classification

    “Linked Data emphasis[es…] relationships between
    resources [over classification]. [D]ue to the promoted
    use of generic vocabularies, it is not always possible
    to infer classification from […] properties”
    Input: {?station a metar:Station;
                     rdfs:label ?label;
                     wgs84:lat ?lat;
                     wgs84:long ?long}



    Output: {?station a ?type.
             ?type rdfs:subClassOf metar:Station}


Ordering

     “Having means to rank Linked Data content along
     specific dimensions is typically deemed useful for
     querying and browsing […both] “specific” ordering
     [(e.g. timestamps) … and] orderings […] via
     “less straightforward” built-ins [(e.g. pref/alt labels)]”

     Quantitative: “specific” ordering
     Qualitative: “less straightforward” built-ins

 Input: {?station foaf:depiction ?x, ?y}




 Output: {{(?x ?y) a rdf:List}
          UNION {(?y ?x) a rdf:List}}
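Each worker answer to the pattern above is one ordered pair. A simple way to combine many such pairs into a ranking is to count how often each item was placed first. This is a sketch of one possible aggregation, not a method named on the slides; `rank_by_wins` is a hypothetical name.

```python
from collections import Counter


def rank_by_wins(judgements):
    """judgements: iterable of (first, second) pairs, meaning a worker
    placed `first` before `second`. Returns items sorted by number of wins."""
    wins = Counter()
    for first, second in judgements:
        wins[first] += 1
        wins[second] += 0  # make sure items that never win still appear
    # Most wins first; ties are broken alphabetically to keep the result stable.
    return sorted(wins, key=lambda item: (-wins[item], item))
```

Win counting ignores who was beaten; more careful schemes exist, but this keeps the aggregation easy to inspect.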



Translation

     “[An important] aspect of the labeling of resources for
     humans is multi-linguality […] actual provision of labels
     in non-English languages is currently rather low”


     Input: {?station rdfs:label ?enlabel.
             FILTER (langMatches(LANG(?enlabel), "en"))}




     Output: {?station rdfs:label ?bglabel.
              FILTER (langMatches(LANG(?bglabel), "bg"))}
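Translation HITs are only needed for resources that have a label in the source language but none in the target language. A minimal sketch of that selection, assuming labels are available as (resource, text, language tag) triples; the function name and data layout are hypothetical, and regional subtags such as "en-GB" are not handled.

```python
def missing_translations(labels, source="en", target="bg"):
    """labels: iterable of (resource, text, language_tag) triples.
    Returns {resource: source_text} for resources lacking a target-language label."""
    by_resource = {}
    for resource, text, lang in labels:
        # Language tags are compared case-insensitively.
        by_resource.setdefault(resource, {})[lang.lower()] = text
    return {
        resource: langs[source]
        for resource, langs in by_resource.items()
        if source in langs and target not in langs
    }
```

The returned source-language text is what a worker would be asked to translate.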




Open query answering

          Query a FOAF-file using the vCard vocabulary

     hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ;
        foaf:nick "Harry" ; foaf:familyName "Potter" .


     SELECT ?name ?email WHERE
     { ?p vcard:email ?email ; vcard:fn ?name }



          In order to answer the query as intended
                  Vocabulary mapping and entity resolution (foaf to vcard)
                  Metadata completion (full name is Harry Potter)
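To see which parts of the query still need human input, one can apply whatever vocabulary mappings are already known and list the query properties that remain unanswered. This is a sketch under assumptions: the mapping from foaf:mbox to vcard:email is an illustrative guess at what a mapping task would produce, and `unanswered_properties` is a hypothetical helper.

```python
def unanswered_properties(triples, query_properties, mapping):
    """triples: (subject, property, object) tuples from the data.
    mapping: known property mappings, e.g. from a vocabulary-mapping HIT.
    Returns the query properties that no (mapped) data property can answer."""
    available = {mapping.get(prop, prop) for _, prop, _ in triples}
    return [prop for prop in query_properties if prop not in available]
```

With the FOAF data above and the single mapping applied, vcard:fn is left over, which corresponds to the metadata completion task for the full name.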
Limitations of microtask crowdsourcing

          Decomposability
          Verifiability
          Expertise

         Compositions to deal with tasks with
        underspecified workflow and/or multiple correct
        answers




Challenges
          Decomposition of user-visible queries:
                  SPARQL
                       Easy: Low-quality (meta)data can be subject to automated checking (even if not automated fixing)
                       Medium: Missing data (and missing translations) can be identified automatically (but it is not necessarily clear to which dataset the data should belong)
                       Difficult:
                            Interlinking (at least sameAs) is somewhat implicit (it relies on entailment), and it is hard to know where the user expects it
                            Query optimisation obscures which data is actually used, and should factor in the cost of human tasks
                  Pig might be somewhat easier in the latter regard
          Caching
                  Naively, we can materialise HIT results into datasets
                  How to deal with partial coverage and dynamic datasets?
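A naive cache of HIT results, with one possible answer to the dynamic-dataset question, might look like the sketch below. It assumes the dataset exposes some version marker and treats answers recorded against an older version as stale; the class and its methods are hypothetical, and the slides leave the actual strategy open.

```python
class HitResultCache:
    """Stores crowd answers keyed by task and input, tagged with a dataset version."""

    def __init__(self):
        self._store = {}

    def put(self, task, key, answer, dataset_version):
        self._store[(task, key)] = (answer, dataset_version)

    def get(self, task, key, dataset_version):
        # Returns None both for unseen inputs (partial coverage)
        # and for answers recorded against a different dataset version.
        entry = self._store.get((task, key))
        if entry is None:
            return None
        answer, version = entry
        return answer if version == dataset_version else None
```

A miss in either case means the input has to be sent to workers (again), so the version granularity directly affects cost.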
Further Challenges

          Appropriate level of granularity for HIT design, for specific SPARQL constructs and for typical functionality of Linked Data management components
          Optimal user interfaces for graph-like content
                  (Contextual) rendering of LOD entities and tasks
          Pricing and worker assignment
                  Can we connect the end users of an application, and their wish for specific data to be consumed, with the payment of workers and the prioritization of HITs?
                  Dealing with spam / gaming
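A common first defence against spam is to give the same HIT to several workers and accept an answer only when enough of them agree. The sketch below illustrates that idea; the slides do not prescribe it, and the function name and the 0.6 threshold are assumptions.

```python
from collections import Counter


def majority_answer(answers, min_agreement=0.6):
    """Return the most frequent answer if its share of all answers
    reaches min_agreement; otherwise return None (no consensus)."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count / len(answers) >= min_agreement else None
```

HITs without consensus can be reissued to more workers, which ties this check back to the pricing question.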
QUESTIONS



