A semantically enabled architecture for
 crowdsourced Linked Data management
 Elena Simperl (1), Maribel Acosta (1), Barry Norton (2)
 (1) Institute AIFB, Karlsruhe Institute of Technology, Germany
 (2) Ontotext AD, Bulgaria
 Institute of Applied Informatics and Formal Description Methods (AIFB)
 KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association, www.kit.edu
Background: What is Linked Data?
Linked Data: set of best practices to publish and connect structured data on the Web.
- URIs to identify entities and concepts in the world
- HTTP to access and retrieve resources and descriptions of these resources
- RDF as generic graph-based data model to structure and link data
Taken together, Linked Data is said to form a ‘cloud’ of shared references and vocabularies.
Query language: SPARQL.

Source: http://linkeddata.org/faq
07.06.2012, CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Linked Data management. Institute of Applied Informatics and Formal Description Methods (AIFB)
Background: Why Linked Data?
- Data.gov & public sector information: more transparency and accountability in governance
- BBC & media: added value of content through interlinking
- Google, Yahoo, Bing & schema.org: enhanced search




Outline

                 1       • Motivation

                 2       • Our Approach

                 3       • Extensions to VoID and SPARQL

                 4       • Crowdsourced query processing tasks

                 5       • Advantages

                 6       • Challenges

1. Motivation
User Query: Give me the German names of all commercial airports in Baden-Württemberg, ordered by their most informative description.

"Retrieve the German labels of commercial airports located in Baden-Württemberg, ordered by the quality of the human-readable description of each airport given in its comment."

This query cannot be optimally answered automatically:
- Incorrect/missing classification of entities (e.g. classification as airports instead of commercial airports).
- Missing information in data sets (e.g. German labels).
- Subjective operations cannot be performed optimally by a machine (e.g. comparisons of pictures or natural-language comments).
1. Motivation
Answering the example query as intended requires:
- Classification of airports as commercial airports.
- Identity resolution of places (Baden-Württemberg).
- Translation of the labels of the airports.
- Ordering of the comments by a subjective comparison.




1. Motivation
    SPARQL Query (the numbered comments mark the parts that need human input):

    SELECT ?label WHERE {
      ?x a metar:CommercialHubAirport ;                                     # 1 Classification
          rdfs:label ?label ;
          rdfs:comment ?comment .
      ?x geonames:parentFeature ?z .
      ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg> .     # 2 Identity Resolution
      FILTER (LANG(?label) = "de")                                          # 3 Missing Information
    } ORDER BY CROWD(?comment, "Better description of %x")                  # 4 Ordering


1. Motivation: Our Aim
A SPARQL query engine able to process queries through a seamless combination of automatic query processing and crowdsourcing.

[Figure: architecture diagram. A query goes in and results come out of a Mediator, which contains a SPARQL query engine (query parsing, query optimization, query execution) and a crowdsourced query processing component (task design, UI generation). Several Wrapper components sit below the Mediator.]




2. Our Approach

Parser (the query parsing stage of the SPARQL query engine)
- Decomposes the input query.
- Selects the data sets that should be accessed to produce answers.
- Rewrites the query into the internal structures.
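The parser steps can be illustrated with a minimal sketch. Python is used here only for illustration; the deck does not show the engine's implementation. The prefix-to-data-set table and the function names `decompose` and `select_datasets` are assumptions, and the input is a simplified basic graph pattern with one triple pattern per line rather than full SPARQL.

```python
# Hypothetical mapping from vocabulary prefix to the data set that uses it.
PREFIX_TO_DATASET = {"metar": "METAR", "geonames": "Geonames"}


def decompose(bgp):
    """Split a basic graph pattern (one 'subject predicate object .' per line)
    into a list of (subject, predicate, object) triple patterns."""
    patterns = []
    for line in bgp.strip().splitlines():
        parts = line.strip().rstrip(" .").split(None, 2)
        if len(parts) == 3:
            patterns.append(tuple(parts))
    return patterns


def select_datasets(patterns):
    """Return the data sets whose vocabulary prefix occurs in any pattern."""
    datasets = set()
    for pattern in patterns:
        for term in pattern:
            prefix, sep, _ = term.partition(":")
            if sep and prefix in PREFIX_TO_DATASET:
                datasets.add(PREFIX_TO_DATASET[prefix])
    return datasets


example_bgp = """
?x a metar:CommercialHubAirport .
?x rdfs:label ?label .
?x geonames:parentFeature ?z .
"""
example_patterns = decompose(example_bgp)
example_datasets = select_datasets(example_patterns)
```

A real parser would build an algebra tree from the full query; the point here is only that decomposition yields individual triple patterns, which can then be matched against data set descriptions.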




2. Our Approach

Optimizer (the query optimization stage of the SPARQL query engine)
- Draws on DB statistics and crowdsourcing statistics: estimated time to completion and other information about the performance (quality, cost) of the crowd.
- Implements traditional database optimization techniques.
- Determines which parts of the query should be solved by human input, based on the VoID and SPARQL extensions.
- Generates logical and physical plans.
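As a rough sketch of the third step, the optimizer could partition the triple patterns into those answered automatically and those routed to the crowd. The function name `split_plan` and the representation of crowd declarations as plain Python sets are assumptions made for this illustration; the deck only states that the decision relies on the VoID and SPARQL extensions.

```python
def split_plan(patterns, crowd_classes, crowd_properties):
    """Partition triple patterns into (machine, crowd) lists.

    A pattern goes to the crowd when it asks for membership in a class
    declared as crowdsourced, or when its predicate is declared as a
    crowdsourced property. Input order is preserved within each list.
    """
    machine, crowd = [], []
    for s, p, o in patterns:
        is_crowd_type = p in ("a", "rdf:type") and o in crowd_classes
        if is_crowd_type or p in crowd_properties:
            crowd.append((s, p, o))
        else:
            machine.append((s, p, o))
    return machine, crowd


example_patterns = [
    ("?x", "a", "metar:CommercialHubAirport"),
    ("?x", "rdfs:label", "?label"),
    ("?x", "geonames:parentFeature", "?z"),
]
machine_part, crowd_part = split_plan(
    example_patterns,
    crowd_classes={"metar:CommercialHubAirport"},
    crowd_properties=set(),
)
```

Running the machine part first would typically shrink the set of bindings that reach the crowd, which matters because every crowd answer costs time and money.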
2. Our Approach

Executor (the query execution stage of the SPARQL query engine)
- Implements physical operators.
- Invokes crowdsourcing component:
    - Creates tasks.
    - Generates UI.
    - Infers facts automatically.
- Executes query against Linked Data: computational tasks.
- Incorporates results from the human input.
3. Extensions to VoID and SPARQL
The RDF-based schema for describing data sets is VoID (Vocabulary of Interlinked Datasets).

Common VoID terms: void:Dataset, void:inDataset, void:Linkset, void:linkPredicate, void:target.

VoID extensions:
- Automatic interlinking of datasets
- CrowdClass
- CrowdProperty

3. Extensions to VoID and SPARQL
Automatic interlinking of data sets

     Example - Specification of Data Sets:

     :METAR rdf:type void:Dataset .
     :Geonames rdf:type void:Dataset .

     :METAR2Geonames rdf:type void:Linkset ;
          void:linkPredicate owl:sameAs ;
          void:target :METAR ;
          void:target :Geonames .

[Figure: the METAR and Geonames data sets connected by an owl:sameAs link.]

3. Extensions to VoID and SPARQL

CrowdClass
      - Specifies which entities of a data set can be crowdsourced.
      - All subclasses of a crowdClass are also (implicitly) defined as crowdsourced entities.

     Example:
     metar:Airport void:inDataset :METAR .
     metar:CommercialHubAirport void:inDataset :METAR ;
            rdfs:subClassOf metar:Airport .
     metar:Airport rdf:type void:crowdClass .
     metar:CommercialHubAirport rdf:type void:crowdClass .
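The implicit rule for subclasses can be sketched as a small fixed-point computation. The function name `crowdsourced_classes`, the plain-set representation of the declarations, and the class `ex:CargoHub` are hypothetical and serve only to show a subclass two levels down.

```python
def crowdsourced_classes(declared, subclass_of):
    """Return every class that counts as crowdsourced.

    declared:    classes explicitly typed as crowd classes.
    subclass_of: (child, parent) pairs, i.e. rdfs:subClassOf statements.
    A class is included if it is declared, or if it is a direct or
    indirect subclass of an included class.
    """
    result = set(declared)
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of:
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


example_classes = crowdsourced_classes(
    declared={"metar:Airport"},
    subclass_of={
        ("metar:CommercialHubAirport", "metar:Airport"),
        ("ex:CargoHub", "metar:CommercialHubAirport"),  # hypothetical class
    },
)
```

Under this reading, declaring metar:Airport alone would already cover metar:CommercialHubAirport, so the second declaration in the example is redundant but harmless.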


3. Extensions to VoID and SPARQL
          RDF data can be queried using the language SPARQL.

         Common SPARQL operators: join, union, optional,
        filter, order by.


         Properties related to general ontology languages such as
        OWL are treated as extensions of SPARQL operators,
        and are modeled in our architecture as tasks.




4. Tasks

A formal, declarative description of the data and tasks, using SPARQL patterns, serves as the basis for the automatic design of HITs (Human Intelligence Tasks).

Task types:
- Identity resolution
- Missing information
- Ontological classification
- Ordering (new operator)


4.1. Ontological Classification
         It is not always possible to automatically infer classification
        from the properties.
         Example: Retrieve the names (labels) of METAR stations that
        correspond to commercial airports.

     SELECT ?label WHERE {
       ?station a metar:CommercialHubAirport;
         rdfs:label ?label .}

     Input:        {?station a metar:Station;
                      rdfs:label ?label;
                      wgs84:lat ?lat;
                      wgs84:long ?long}

     Output: {?station a ?type.
              ?type rdfs:subClassOf metar:Station}
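To show how the input and output patterns could drive task design, here is a minimal sketch that turns one binding of the input pattern into a multiple-choice task and maps the worker's answer back to a triple matching the output pattern. The function names, the dictionary layout of the task, and the example values are assumptions for illustration, not the authors' design.

```python
def build_classification_hit(binding, candidate_types):
    """Create a multiple-choice task from one binding of the input pattern.

    binding:         values for ?station, ?label, ?lat and ?long.
    candidate_types: subclasses of metar:Station offered to the worker.
    """
    return {
        "subject": binding["station"],
        "question": (
            f"Which type best describes '{binding['label']}' "
            f"(lat {binding['lat']}, long {binding['long']})?"
        ),
        "options": sorted(candidate_types),
    }


def answer_to_triple(hit, chosen_type):
    """Map a worker's choice to a triple matching '?station a ?type'."""
    if chosen_type not in hit["options"]:
        raise ValueError("answer is not one of the offered types")
    return (hit["subject"], "rdf:type", chosen_type)


# Illustrative values only; they are not taken from the METAR data set.
example_hit = build_classification_hit(
    {"station": "metar:EDDS", "label": "Stuttgart", "lat": "48.69", "long": "9.22"},
    {"metar:CommercialHubAirport", "metar:Airport"},
)
example_triple = answer_to_triple(example_hit, "metar:CommercialHubAirport")
```

Because the task is derived from the patterns rather than written by hand, the same two functions would apply unchanged to other classes, which is the sense in which the description is declarative.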
4.2. Ordering
Orderings defined via less straightforward built-ins; for instance, the ordering of pictorial representations of entities.
SPARQL extension: ORDER BY CROWD
Example: Retrieve all airports and their pictures, with the pictures ordered by how representative each image is of the given airport.

SELECT ?airport ?picture WHERE {
  ?airport a metar:Airport;
    foaf:depiction ?picture .
} ORDER BY CROWD(?picture,
"Most representative image for %airport")

 Input:       {?airport foaf:depiction ?x, ?y}

Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
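The output pattern yields one ordered pair per judgment. How such pairs are combined into a full ordering is not specified in the deck; the sketch below uses a simple majority vote per pair followed by a win count, which is one possible aggregation chosen here for illustration. The function name `order_by_crowd` and the example identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def order_by_crowd(items, judgments):
    """Order items by the number of pairwise majority votes they win.

    judgments: (preferred, other) pairs, one per worker answer, mirroring
    the two alternatives (?x ?y) and (?y ?x) of the output pattern.
    Items with equal win counts keep their input order.
    """
    votes = Counter(judgments)
    wins = {item: 0 for item in items}
    for x, y in combinations(items, 2):
        if votes[(x, y)] > votes[(y, x)]:
            wins[x] += 1
        elif votes[(y, x)] > votes[(x, y)]:
            wins[y] += 1
    return sorted(items, key=lambda item: -wins[item])


example_order = order_by_crowd(
    ["pic1", "pic2", "pic3"],
    [
        ("pic2", "pic1"), ("pic2", "pic1"), ("pic1", "pic2"),
        ("pic2", "pic3"),
        ("pic3", "pic1"),
    ],
)
```

Comparing every pair needs a number of tasks that grows quadratically with the number of items, so a real system would likely sample pairs or rely on transitivity to reduce cost.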

4.3. Computational tasks expressed as
     SPARQL queries

          Transitive relations inferred automatically, without
         requiring human intervention.

           Implementation of restrictions in SPIN.

     Identity Resolution:
     CONSTRUCT {
       ?a owl:sameAs ?c .
     } WHERE {
       ?a owl:sameAs ?b .
       ?b owl:sameAs ?c .
     }

     Classification:
     CONSTRUCT {
       ?a a ?b .
       ?b rdfs:subClassOf ?c .
     } WHERE {
       ?a rdfs:subClassOf ?c .
       ?b rdfs:subClassOf ?b1 .
       ?b1 rdfs:subClassOf ?c .
     }

     Ordering:
     CONSTRUCT {
       (?a ?b) a rdf:List .
     } WHERE {
       (?a ?x) a rdf:List .
       (?x ?b) a rdf:List .
     }
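The identity resolution rule derives one new owl:sameAs link per application, so reaching all transitive links means applying it repeatedly until nothing new appears. A minimal sketch of that fixed-point evaluation follows, with links modelled as plain (subject, object) pairs; the function name `transitive_closure` and the example identifiers are assumptions for illustration.

```python
def transitive_closure(pairs):
    """Apply the rule (a, b), (b, c) => (a, c) until no new pair is derived."""
    closure = set(pairs)
    while True:
        inferred = {
            (a, c)
            for (a, b) in closure
            for (b2, c) in closure
            if b == b2
        }
        if inferred <= closure:  # fixed point reached
            return closure
        closure |= inferred


example_links = transitive_closure(
    {("metar:A", "geo:B"), ("geo:B", "dbp:C"), ("dbp:C", "ex:D")}
)
```

The same loop covers the ordering rule, since it has the same shape: two adjacent ordered pairs imply a third.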


5. Advantages
- A declarative description of the data allows the query to be decomposed.
- User interfaces are generated automatically.
- Human tasks are generated on the fly, and the task design can be adjusted.
- Results are checked for consistency automatically by reasoning against a validating ontology.




6. Challenges
- Appropriate level of granularity in HIT design for specific SPARQL constructs.
- Caching
    - Naively, we can materialise HIT results into datasets.
    - How to deal with partial coverage and dynamic datasets?
- Optimal user interfaces for graph-like content.
- Pricing and workers’ assignment.
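The naive caching idea and the partial-coverage problem can be made concrete with a small sketch: answers are stored per task and subject, and a lookup reports which subjects are already covered and which still need a HIT. The class `HitCache` and its methods are hypothetical, and the sketch does not address dynamic datasets, where stored answers may go stale.

```python
class HitCache:
    """Naive materialisation of crowd answers, keyed by (task, subject)."""

    def __init__(self):
        self._answers = {}

    def store(self, task, subject, answer):
        """Record the crowd's answer for one subject of one task."""
        self._answers[(task, subject)] = answer

    def split(self, task, subjects):
        """Return (cached, missing) for the given subjects.

        cached:  subject -> answer, for subjects answered earlier.
        missing: subjects that still need a HIT, in input order.
        """
        cached = {
            s: self._answers[(task, s)]
            for s in subjects
            if (task, s) in self._answers
        }
        missing = [s for s in subjects if (task, s) not in self._answers]
        return cached, missing


cache = HitCache()
cache.store("classify", "metar:EDDS", "metar:CommercialHubAirport")
example_cached, example_missing = cache.split(
    "classify", ["metar:EDDS", "metar:EDSB"]
)
```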




QUESTIONS



23   07.06.2012   CrowdSearch 2012 - A semantically enabled architecture for crowdsourced   Institut für Angewandte Informatik und Formale
                  Linked Data management                                                                     Beschreibungsverfahren (AIFB)

More Related Content

PDF
"Ontology-centric navigation of the scientific literature"
PPTX
Hadoop summit EU - Crowd Sourcing Reflected Intelligence
PPTX
Presentation for turkot v2 0 (dh)
PDF
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
PDF
Jarrar.lecture notes.ontologyintroduction
PDF
Hibernate training at HarshithaTechnologySolutions @ Nizampet
PDF
Veda Semantic Technology
PDF
FAST Search for SharePoint
"Ontology-centric navigation of the scientific literature"
Hadoop summit EU - Crowd Sourcing Reflected Intelligence
Presentation for turkot v2 0 (dh)
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
Jarrar.lecture notes.ontologyintroduction
Hibernate training at HarshithaTechnologySolutions @ Nizampet
Veda Semantic Technology
FAST Search for SharePoint

What's hot (10)

PDF
Hadoop - Now, Next and Beyond
PPTX
Pacename
KEY
02 Web Search
PDF
Treasure Data: Big Data Analytics on Heroku
PDF
J2EE ieee projects 2011 SBGC ( Trichy, Chennai, Tirupati, Nellore, Kadapa, Ku...
PDF
Show and tell program 04 2014-09-04
PDF
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS
PDF
Linked Open data: CNR
PDF
Big Data and Data Standardization at LinkedIn
PPTX
Flexible querying of relational databases fuzzy set based approach 27-11
Hadoop - Now, Next and Beyond
Pacename
02 Web Search
Treasure Data: Big Data Analytics on Heroku
J2EE ieee projects 2011 SBGC ( Trichy, Chennai, Tirupati, Nellore, Kadapa, Ku...
Show and tell program 04 2014-09-04
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS
Linked Open data: CNR
Big Data and Data Standardization at LinkedIn
Flexible querying of relational databases fuzzy set based approach 27-11
Ad

Similar to Crowdsourcing-enabled Linked Data management architecture (20)

PDF
Aaai2012
PPTX
Crowdsourcing tasks in Linked Data management
PPTX
Linked data for Enterprise Data Integration
PPT
Semtech 2011 impressions
PDF
Big Data Real Time Applications
PDF
“Semantic Technologies for Smart Services”
PDF
Session 0.0 poster minutes madness
PDF
X api chinese cop monthly meeting feb.2016
PDF
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
PDF
Applied Semantic Search with Microsoft SQL Server
PDF
MongoDB_Spark
KEY
How to Share and Reuse Learning Resources: the ARIADNE Experience
PDF
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
PPT
Sem tech 2011 v8
PDF
Analytic Platforms in the Real World with 451Research and Calpont_July 2012
PDF
ConceptClassifier for SharePoint Turbo Charging the Public Sector
PPTX
Role of Semantic Web in Health Informatics
DOCX
MarkAndrews
PPTX
LRMI in Context, Brandt Redd
Aaai2012
Crowdsourcing tasks in Linked Data management
Linked data for Enterprise Data Integration
Semtech 2011 impressions
Big Data Real Time Applications
“Semantic Technologies for Smart Services”
Session 0.0 poster minutes madness
X api chinese cop monthly meeting feb.2016
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Applied Semantic Search with Microsoft SQL Server
MongoDB_Spark
How to Share and Reuse Learning Resources: the ARIADNE Experience
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
Sem tech 2011 v8
Analytic Platforms in the Real World with 451Research and Calpont_July 2012
ConceptClassifier for SharePoint Turbo Charging the Public Sector
Role of Semantic Web in Health Informatics
MarkAndrews
LRMI in Context, Brandt Redd
Ad

More from Elena Simperl (20)

PDF
When stars align: studies in data quality, knowledge graphs, and machine lear...
PDF
Knowledge engineering: from people to machines and back
PDF
This talk was not generated with ChatGPT: how AI is changing science
PDF
Knowledge graph use cases in natural language generation
PDF
Knowledge engineering: from people to machines and back
PDF
The web of data: how are we doing so far
PDF
What Wikidata teaches us about knowledge engineering
PDF
Open government data portals: from publishing to use and impact
PDF
Ten myths about knowledge graphs.pdf
PDF
What Wikidata teaches us about knowledge engineering
PDF
Data commons and their role in fighting misinformation.pdf
PDF
Are our knowledge graphs trustworthy?
PDF
The web of data: how are we doing so far?
PDF
Crowdsourcing and citizen engagement for people-centric smart cities
PDF
Pie chart or pizza: identifying chart types and their virality on Twitter
PDF
High-value datasets: from publication to impact
PDF
The story of Data Stories
PDF
The human face of AI: how collective and augmented intelligence can help sol...
PDF
Qrowd and the city: designing people-centric smart cities
PDF
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Knowledge engineering: from people to machines and back
This talk was not generated with ChatGPT: how AI is changing science
Knowledge graph use cases in natural language generation
Knowledge engineering: from people to machines and back
The web of data: how are we doing so far
What Wikidata teaches us about knowledge engineering
Open government data portals: from publishing to use and impact
Ten myths about knowledge graphs.pdf
What Wikidata teaches us about knowledge engineering
Data commons and their role in fighting misinformation.pdf
Are our knowledge graphs trustworthy?
The web of data: how are we doing so far?
Crowdsourcing and citizen engagement for people-centric smart cities
Pie chart or pizza: identifying chart types and their virality on Twitter
High-value datasets: from publication to impact
The story of Data Stories
The human face of AI: how collective and augmented intelligence can help sol...
Qrowd and the city: designing people-centric smart cities
One does not simply crowdsource the Semantic Web: 10 years with people, URIs,...

Recently uploaded (20)

PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Basic Mud Logging Guide for educational purpose
PPTX
Cell Types and Its function , kingdom of life
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
Classroom Observation Tools for Teachers
PPTX
Institutional Correction lecture only . . .
PDF
01-Introduction-to-Information-Management.pdf
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
TR - Agricultural Crops Production NC III.pdf
PDF
Pre independence Education in Inndia.pdf
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
O5-L3 Freight Transport Ops (International) V1.pdf
Basic Mud Logging Guide for educational purpose
Cell Types and Its function , kingdom of life
Abdominal Access Techniques with Prof. Dr. R K Mishra
Module 4: Burden of Disease Tutorial Slides S2 2025
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Classroom Observation Tools for Teachers
Institutional Correction lecture only . . .
01-Introduction-to-Information-Management.pdf
Pharmacology of Heart Failure /Pharmacotherapy of CHF
VCE English Exam - Section C Student Revision Booklet
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
human mycosis Human fungal infections are called human mycosis..pptx
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPH.pptx obstetrics and gynecology in nursing
TR - Agricultural Crops Production NC III.pdf
Pre independence Education in Inndia.pdf
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...

Crowdsourcing-enabled Linked Data management architecture

  • 1. A semantically enabled architecture for crowdsourced Linked Data management Elena Simperl,1 Maribel Acosta,1 Barry Norton2 1Institute AIFB, Karlsruhe Institute of Technology, Germany 2Ontotext AD, Bulgaria Institute of Applied Informatics and Formal Description Methods (AIFB) Institute of Applied Informatics and Formal Description Methods (AIFB) KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu
  • 2. Background: What is Linked Data? Linked Data: set of best practices to publish and connect structured data on the Web. URIs to identify entities and concepts in the world HTTP to access and retrieve resources and descriptions of these resources RDF as generic graph-based data model to structure and link data Taken together Linked Data is said to form a ‘cloud’ of shared references and vocabularies. Query language: SPARQL. http://linkeddata.org/faq 2 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 3. Background: Why Linked Data? Data.gov & public sector information: more BBC & media: added value of transparency and accountability in content through interlinking governance Google, Yahoo, Bing & schema.org: enhanced search 3 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 4. Outline 1 • Motivation 2 • Our Approach 3 • Extensions to VoID and SPARQL 4 • Crowdsourced query processing tasks 5 • Advantages 6 • Challenges 4 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 5. 1. Motivation User Query: Give me the German names of all commercial airports in Baden-Württemberg, ordered by their most informative description. „Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human-readable description of the airport given in the comment“. This query cannot be optimally answered automatically: Incorrect/missing classification of entities (e.g. classification as airports instead of commercial airports). Missing information in data sets (e.g. German labels). It is not possible to optimally perform subjective operations (e.g. comparisons of pictures or NL comments). 5 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 6. 1. Motivation „Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human- readable description of the airport given in the comment“. In order to answer the query as intended: Classification of airports as commercial airports. Identity resolution of places (Baden-Württemberg). Translation of the labels of the airports. Ordering of the comments by a subjective comparison. 6 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 7. 1. Motivation „Retrieve the labels in German of commercial airports located in Baden-Württemberg, ordered by the better human- readable description of the airport given in the comment“. SPARQL Query: SELECT ?label WHERE { Classification 1 ?x a metar:CommercialHubAirport; rdfs:label ?label; rdfs:comment ?comment . ?x geonames:parentFeature ?z . Identity Resolution 2 ?z owl:sameAs <http://dbpedia.org/resource/Baden-Wuerttemberg> . FILTER (LANG(?label) = "de") 3 Missing Information 4 Ordering } ORDER BY CROWD(?comment, "Better description of %x") 7 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 8. 1. Motivation: Our Aim SPARQL query engine, able to process queries using seamless combination of automatic query processing and crowdsourcing. Query Results Mediator SPARQL query engine Crowdsourced query processing Query parsing Task design UI generation Query optimization Query execution Wrapper Wrapper Wrapper Wrapper 8 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 9. 2. Our Approach Parser Query Results Decomposes the input query. SPARQL query engine Selects the data sets that should be Query parsing accessed to produce answers. Query optimization Rewrites the query into the internal Query execution structures. 9 07.06.2012 CrowdSearch 2012 - A semantically enabled architecture for crowdsourced Institut für Angewandte Informatik und Formale Linked Data management Beschreibungsverfahren (AIFB)
  • 10. 2. Our Approach: Optimizer (query optimization stage of the SPARQL query engine). Uses DB statistics and crowdsourcing statistics: estimated time to completion and other information about the performance (quality, cost) of the crowd. Implements traditional database optimization techniques. Determines which parts of the query should be solved by human input, based on the VoID and SPARQL extensions. Generates logical and physical plans.
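The optimizer's decision about which parts of a query go to the crowd could be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the `TriplePattern` type, the `split_patterns` helper, and the choice to mark `owl:sameAs` as a crowdsourced property are assumptions made for the example. The idea is simply to route a triple pattern to the crowd when its class or property has been declared crowdsourceable in the data set description.

```python
from typing import NamedTuple


class TriplePattern(NamedTuple):
    """One triple pattern of a query (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


def split_patterns(patterns, crowd_classes, crowd_properties):
    """Partition triple patterns into (automatic, crowd) lists.

    A pattern is routed to the crowd if it is a type assertion on a
    crowdsourced class, or if it uses a crowdsourced property.
    """
    automatic, crowd = [], []
    for p in patterns:
        is_crowd_type = p.predicate in ("a", "rdf:type") and p.obj in crowd_classes
        is_crowd_property = p.predicate in crowd_properties
        if is_crowd_type or is_crowd_property:
            crowd.append(p)
        else:
            automatic.append(p)
    return automatic, crowd


# Triple patterns of the motivating query.
patterns = [
    TriplePattern("?x", "a", "metar:CommercialHubAirport"),
    TriplePattern("?x", "rdfs:label", "?label"),
    TriplePattern("?x", "geonames:parentFeature", "?z"),
    TriplePattern("?z", "owl:sameAs", "dbpedia:Baden-Wuerttemberg"),
]

automatic, crowd = split_patterns(
    patterns,
    crowd_classes={"metar:CommercialHubAirport"},
    crowd_properties={"owl:sameAs"},  # assumption for this example
)
```

A real optimizer would additionally weigh the crowd statistics (time, quality, cost) before committing a pattern to human input; the sketch only covers the declarative part of the decision.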
  • 11. 2. Our Approach: Executor (query execution stage of the SPARQL query engine). Implements physical operators. Invokes the crowdsourcing component: creates tasks, generates the UI. Infers facts automatically. Executes the query against Linked Data: computational tasks. Incorporates results from the human input.
  • 12. 3. Extensions to VoID and SPARQL: The RDF-based schema for describing data sets is VoID (Vocabulary of Interlinked Datasets). Common VoID terms: void:Dataset, void:inDataset, void:Linkset, void:linkPredicate, void:target. Automatic interlinking of data sets. VoID extensions: CrowdClass, CrowdProperty.
  • 13. 3. Extensions to VoID and SPARQL: Automatic interlinking of data sets. Example, specification of data sets (METAR is linked to Geonames via owl:sameAs):
      :METAR rdf:type void:Dataset .
      :Geonames rdf:type void:Dataset .
      :METAR2Geonames rdf:type void:Linkset ;
          void:linkPredicate owl:sameAs ;
          void:target :METAR ;
          void:target :Geonames .
  • 14. 3. Extensions to VoID and SPARQL: CrowdClass. Specifies which entities of a data set could be crowdsourced. All subclasses of a crowdClass are also (implicitly) defined as crowdsourced entities. Example:
      metar:Airport void:inDataset :METAR .
      metar:CommercialHubAirport void:inDataset :METAR ;
          rdfs:subClassOf metar:Airport .
      metar:Airport rdf:type void:crowdClass .
      metar:CommercialHubAirport rdf:type void:crowdClass .
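The rule that subclasses of a crowdClass are implicitly crowdsourced can be illustrated with a small fixed-point computation. This is a sketch under assumptions: the function name `crowd_class_closure` is hypothetical, and the extra `metar:Airport rdfs:subClassOf metar:Station` pair is added only to show that superclasses are not pulled in.

```python
def crowd_class_closure(declared, subclass_of):
    """Return all classes that count as crowdsourced.

    declared:    classes explicitly typed as crowdClass
    subclass_of: set of (child, parent) pairs, i.e. rdfs:subClassOf statements
    """
    result = set(declared)
    changed = True
    while changed:  # repeat until no new subclass is discovered
        changed = False
        for child, parent in subclass_of:
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


closure = crowd_class_closure(
    declared={"metar:Airport"},
    subclass_of={
        ("metar:CommercialHubAirport", "metar:Airport"),
        ("metar:Airport", "metar:Station"),  # assumed, for illustration
    },
)
```

With this rule, the explicit `metar:CommercialHubAirport rdf:type void:crowdClass` statement in the slide's example is redundant: declaring `metar:Airport` alone already covers it.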
  • 15. 3. Extensions to VoID and SPARQL: RDF data can be queried using the SPARQL language. Common SPARQL operators: join, union, optional, filter, order by. Properties related to general ontology languages such as OWL are treated as extensions of SPARQL operators and are modeled in our architecture as tasks.
  • 16. 4. Tasks: Formal, declarative description of the data and tasks, using SPARQL patterns as a basis for the automatic design of HITs. Task types: identity resolution; missing information; ontological classification; ordering (new operator).
  • 17. 4.1. Ontological Classification: It is not always possible to infer a classification automatically from the properties. Example: retrieve the names (labels) of METAR stations that correspond to commercial airports.
      SELECT ?label WHERE { ?station a metar:CommercialHubAirport ; rdfs:label ?label . }
      Input: { ?station a metar:Station ; rdfs:label ?label ; wgs84:lat ?lat ; wgs84:long ?long }
      Output: { ?station a ?type . ?type rdfs:subClassOf metar:Station }
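How the input and output patterns of a classification task might drive HIT generation can be sketched as below. Everything here is an assumption for illustration: the `build_classification_hit` and `answer_to_triple` helpers, the dictionary layout of a HIT, and the sample station with its label and coordinates are all made up; the slides only state that SPARQL patterns serve as the basis for automatic HIT design.

```python
def build_classification_hit(binding, candidate_types):
    """Turn one solution of the input pattern into a multiple-choice task."""
    question = (
        f"Which type best describes '{binding['label']}' "
        f"(lat {binding['lat']}, long {binding['long']})?"
    )
    return {
        "subject": binding["station"],
        "question": question,
        "options": sorted(candidate_types),
    }


def answer_to_triple(hit, chosen_type):
    """Map a worker's answer back to a triple matching the output pattern."""
    if chosen_type not in hit["options"]:
        raise ValueError(f"unexpected answer: {chosen_type}")
    return (hit["subject"], "rdf:type", chosen_type)


# Illustrative binding for ?station, ?label, ?lat, ?long (made-up values).
binding = {"station": "metar:EDDS", "label": "Stuttgart", "lat": 48.69, "long": 9.22}
hit = build_classification_hit(
    binding,
    candidate_types={"metar:CommercialHubAirport", "metar:Airport"},
)
triple = answer_to_triple(hit, "metar:CommercialHubAirport")
```

The candidate types correspond to the `?type rdfs:subClassOf metar:Station` part of the output pattern: only subclasses of the input class are offered as options.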
  • 18. 4.2. Ordering: Orderings defined via less straightforward built-ins, for instance the ordering of pictorial representations of entities. SPARQL extension: ORDER BY CROWD. Example: retrieve all airports and their pictures, with the pictures ordered by how representative each image is of the given airport.
      SELECT ?airport ?picture WHERE { ?airport a metar:Airport ; foaf:depiction ?picture . }
      ORDER BY CROWD(?picture, "Most representative image for %airport")
      Input: { ?airport foaf:depiction ?x, ?y }
      Output: { { (?x ?y) a rdf:List } UNION { (?y ?x) a rdf:List } }
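The output pattern above yields pairwise judgments: each answer is either the list (?x ?y) or the list (?y ?x). One possible way to turn such judgments into a full ordering is sketched below. The aggregation strategy (majority vote per pair, then ranking by number of pairs won) and the name `order_by_crowd` are assumptions for illustration; the slides do not specify how pairwise answers are combined.

```python
from collections import Counter
from itertools import combinations


def order_by_crowd(items, votes):
    """Rank items from pairwise crowd judgments.

    votes: list of (first, second) tuples, each meaning that one worker
    placed `first` before `second`.
    """
    tally = Counter(votes)
    wins = {item: 0 for item in items}
    for x, y in combinations(items, 2):
        x_first, y_first = tally[(x, y)], tally[(y, x)]
        if x_first > y_first:      # majority prefers x
            wins[x] += 1
        elif y_first > x_first:    # majority prefers y
            wins[y] += 1
        # ties award no win to either item
    # sorted() is stable, so items with equal wins keep their input order
    return sorted(items, key=lambda item: -wins[item])


pictures = ["p1", "p2", "p3"]
votes = [("p2", "p1"), ("p2", "p1"), ("p1", "p2"), ("p2", "p3"), ("p1", "p3")]
ranking = order_by_crowd(pictures, votes)
```

Asking about every pair costs a number of HITs quadratic in the number of pictures, which is one reason the granularity of HITs appears among the challenges later in the deck.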
  • 19. 4.3. Computational tasks expressed as SPARQL queries: Transitive relations are inferred automatically, without requiring human intervention. Implementation of restrictions in SPIN.
      Identity resolution:
      CONSTRUCT { ?a owl:sameAs ?c . }
      WHERE { ?a owl:sameAs ?b . ?b owl:sameAs ?c . }
      Classification:
      CONSTRUCT { ?a a ?b . ?b rdfs:subClassOf ?c . }
      WHERE { ?a rdfs:subClassOf ?c . ?b rdfs:subClassOf ?b1 . ?b1 rdfs:subClassOf ?c . }
      Ordering:
      CONSTRUCT { (?a ?b) a rdf:List . }
      WHERE { (?a ?x) a rdf:List . (?x ?b) a rdf:List . }
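The identity-resolution and ordering rules above both derive a pair (a, c) from two pairs (a, b) and (b, c). Applying such a rule repeatedly until nothing new is derived gives the transitive closure, which the following Python sketch computes over plain tuples. It is an illustration of the inference step only, not of SPIN or of the authors' engine, and the identifiers in the example data are made up.

```python
def transitive_closure(pairs):
    """Apply the rule (a, b), (b, c) => (a, c) until a fixed point is reached."""
    closure = set(pairs)
    while True:
        derived = {
            (a, c)
            for a, b in closure
            for b2, c in closure
            if b == b2
        }
        if derived <= closure:  # nothing new: fixed point reached
            return closure
        closure |= derived


# Crowd-provided owl:sameAs links (made-up identifiers).
same_as = {("metar:BW", "geonames:BW"), ("geonames:BW", "dbpedia:BW")}
inferred = transitive_closure(same_as)

# The same function works for ordering pairs: (a before x), (x before b).
order_pairs = transitive_closure({("p1", "p2"), ("p2", "p3"), ("p3", "p4")})
```

Each inferred pair is one fewer question that has to be put to the crowd.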
  • 20. 5. Advantages: The declarative description of the data allows the query to be decomposed. UIs are generated automatically. Human tasks are generated on the fly, and the task design can be adjusted. Results are checked for consistency automatically by reasoning against a validating ontology.
  • 21. 6. Challenges: Appropriate level of granularity in HIT design for specific SPARQL constructs. Caching: naively, HIT results can be materialised into data sets, but how should partial coverage and dynamic data sets be handled? Optimal user interfaces for graph-like content. Pricing and assignment of workers.
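The naive caching approach mentioned above can be sketched as a lookup table keyed by task type and input bindings, so that the crowd is asked only once per distinct question. The `HitCache` class and the `fake_crowd` function are hypothetical names for this example; the sketch deliberately does not address the open issues of partial coverage and dynamic data sets.

```python
class HitCache:
    """Naive materialisation of HIT results (illustrative sketch)."""

    def __init__(self, ask_crowd):
        self._ask_crowd = ask_crowd  # callable(task_type, bindings) -> answer
        self._answers = {}
        self.crowd_calls = 0

    def resolve(self, task_type, bindings):
        # Sort the bindings so that dictionary order does not affect the key.
        key = (task_type, tuple(sorted(bindings.items())))
        if key not in self._answers:
            self.crowd_calls += 1
            self._answers[key] = self._ask_crowd(task_type, bindings)
        return self._answers[key]


def fake_crowd(task_type, bindings):
    """Stand-in for posting a HIT and collecting the answer."""
    return "metar:CommercialHubAirport"


cache = HitCache(fake_crowd)
first = cache.resolve("classification", {"station": "metar:EDDS"})
second = cache.resolve("classification", {"station": "metar:EDDS"})
```

Here the second call is served from the cache, so only one HIT is posted. If the underlying data set changes, cached answers may become stale, which is exactly the difficulty the slide raises.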
  • 22. Questions