SPARQL - Query
   Language for RDF

Fulvio Corno, Laura Farinetti
Politecnico di Torino
Dipartimento di Automatica e Informatica
e-Lite Research Group – http://elite.polito.it
The “new” Semantic Web vision
    To make data machine processable, we need:
        Unambiguous names for resources (that may also
         bind data to real world objects): URIs
        A common data model to access, connect, describe
         the resources: RDF
        Access to that data: SPARQL
        Define common vocabularies: RDFS, OWL, SKOS
        Reasoning logics: OWL, Rules

    “SPARQL will make a huge difference” (Tim
     Berners-Lee, May 2006)
F. Corno, L. Farinetti - Politecnico di Torino               2
The Semantic Web timeline




SPARQL basics
SPARQL
    Queries are very important for distributed RDF
     data
        Complex queries into the RDF data are often
         necessary
        E.g.: “give me the (a,b) pair of resources, for which
         there is an x such that (x parent a) and (b brother x)
         holds” (i.e., return the uncles)
    This is the goal of SPARQL (Query Language
     for RDF)
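
The "uncles" request above can be sketched in plain Python as a join over a set of triples. This is only an illustration of the idea, not how a SPARQL engine is implemented; the toy graph and the names `parent`, `brother`, `alice`, `bob`, `carol` are assumptions made up for the example.

```python
# Toy RDF graph as a set of (subject, predicate, object) tuples.
# All names are hypothetical and do not come from a real vocabulary.
triples = {
    ("carol", "parent", "alice"),   # carol is a parent of alice
    ("bob", "brother", "carol"),    # bob is a brother of carol
    ("dave", "brother", "erin"),    # erin is nobody's parent in this graph
}

def uncles(graph):
    """Return sorted (a, b) pairs with some x where (x parent a) and (b brother x)."""
    pairs = set()
    for x, p1, a in graph:
        if p1 != "parent":
            continue
        # Join on x: look for a brother of the same x.
        for b, p2, x2 in graph:
            if p2 == "brother" and x2 == x:
                pairs.add((a, b))
    return sorted(pairs)

print(uncles(triples))
```

A query language lets you state the two conditions declaratively and leaves the join to the engine.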

SPARQL
    SPARQL 1.0: W3C Recommendation January
     15th, 2008
    SPARQL 1.1: W3C Working Draft January 5th,
     2012 (it became a W3C Recommendation in March 2013)
    SPARQL queries RDF graphs
        An RDF graph is a set of triples
    SPARQL can be used to express queries across
     diverse data sources, whether the data is stored
     natively as RDF or viewed as RDF via
     middleware
SPARQL and RDF
    It is the triples that matter, not the serialization
        RDF/XML is the W3C recommendation, but it is not a
           good choice because it allows multiple ways to
           encode the same graph
    SPARQL uses the Turtle syntax, an
     N-Triples extension




Turtle - Terse RDF Triple
Language
    N-Triples ⊂ Turtle ⊂ N3

    A serialization format for RDF
    A subset of Tim Berners-Lee and Dan
     Connolly’s Notation 3 (N3) language
        Unlike full N3, doesn’t go beyond RDF’s graph model
    A superset of the minimal N-Triples format
    Turtle had no official status with any standards
     organization when these slides were written (it became
     a W3C Recommendation in February 2014), but it was
     already popular amongst Semantic Web developers as a
     human-friendly alternative to RDF/XML
“Triple” or “Turtle” notation




“Triple” or “Turtle” notation
  <http://www.w3.org/People/EM/contact#me>
  <http://www.w3.org/2000/10/swap/pim/contact#fullName>
  "Eric Miller" .

  <http://www.w3.org/People/EM/contact#me>
  <http://www.w3.org/2000/10/swap/pim/contact#mailbox>
  <mailto:em@w3.org> .

  <http://www.w3.org/People/EM/contact#me>
  <http://www.w3.org/2000/10/swap/pim/contact#personalTitle>
  "Dr." .

  <http://www.w3.org/People/EM/contact#me>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  <http://www.w3.org/2000/10/swap/pim/contact#Person> .




“Triple” or “Turtle” notation
(abbreviated)
        w3people:EM#me contact:fullName "Eric Miller" .

        w3people:EM#me contact:mailbox <mailto:em@w3.org> .

        w3people:EM#me contact:personalTitle "Dr." .

        w3people:EM#me rdf:type contact:Person .




Turtle - Terse RDF Triple
Language
    Plain text syntax for RDF
        Based on Unicode
 Mechanisms for namespace abbreviation
 Allows grouping of triples according to
  subject
 Shortcuts for collections


Turtle - Terse RDF Triple
Language
    Simple triple:
     subject predicate object .
         :john rdfs:label "John" .

    Grouping triples:
     subject predicate object ; predicate object ...
         :john
           rdfs:label "John" ;
           rdf:type ex:Person ;
           ex:homePage <http://example.org/johnspage/> .


Prefixes
    Mechanism for namespace abbreviation
        @prefix abbr: <URI> .

    Example:
        @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    Default:
        @prefix : <URI> .

    Example:
        @prefix : <http://example.org/myOntology#> .
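
A prefix declaration is only a string abbreviation: a prefixed name expands to the namespace URI followed by the local name. A minimal Python sketch of that expansion follows; the `expand` helper and the `PREFIXES` table are hypothetical names introduced for illustration, and real Turtle parsers also handle escapes and relative URIs, which are ignored here.

```python
# Prefix table mirroring the two declarations in the examples above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "": "http://example.org/myOntology#",   # the default (empty) prefix
}

def expand(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full URI by string concatenation."""
    prefix, _, local = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local

print(expand("rdf:type"))
print(expand(":hasName"))
```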

Identifiers
    URIs: <URI>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    Qnames (Qualified names)
        namespace-abbr?:localname
            rdf:type
            dc:title
            :hasName
    Literals
        "string"(@lang | ^^type)?
            "John"
            "Hello"@en-GB
            "1.4"^^xsd:decimal
Blank nodes
    Simple blank node:
        [] or _:x
            :john ex:hasFather [] .
            :john ex:hasFather _:x .

    Blank node as subject:
        [ predicate object ; predicate object ... ] .
            [ ex:hasName "John" ] .
            [ ex:authorOf :lotr ;
              ex:hasName "Tolkien" ] .



Blank nodes




Collections
    ( object1 ... objectn )

                  :doc1 ex:hasAuthor (:john :mary) .



    Short for
        :doc1 ex:hasAuthor
          [ rdf:first :john ;
            rdf:rest [ rdf:first :mary ;
                       rdf:rest rdf:nil ]
          ] .
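
The expansion of a collection into an rdf:first / rdf:rest chain can be sketched in Python as follows. The function name `collection_triples` and the blank node labels `_:b0`, `_:b1` are assumptions chosen for illustration; a real parser generates its own blank node identifiers.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def collection_triples(items):
    """Return (head, triples) for an RDF collection holding `items` in order."""
    if not items:
        # The empty collection is simply rdf:nil.
        return RDF + "nil", []
    nodes = [f"_:b{i}" for i in range(len(items))]  # one blank node per item
    triples = []
    for i, item in enumerate(items):
        # The last cell points to rdf:nil, every other cell to the next one.
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples

head, cells = collection_triples([":john", ":mary"])
for triple in cells:
    print(triple)
```

The triple `:doc1 ex:hasAuthor head` then links the document to the first cell of the list.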


Example
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://example.org/#> .

<http://www.w3.org/TR/rdf-syntax-grammar>
  dc:title "RDF/XML Syntax Specification (Revised)" ;
  :editor [
    :fullName "Dave Beckett";
    :homePage <http://purl.org/net/dajobe/>
  ] .




RDF Triplestores
    Basically, databases for triples




    Triplestores do not only store triples; they also let
     you extract the “interesting” triples via SPARQL
     queries
Comparison
                       Relational database         RDF Triplestore
    Data model         Relational data (tables)    RDF graphs
    Data instances     Records in tables           RDF triples
    Query support      SQL                         SPARQL
    Indexing           Optimized for evaluating    Optimized for evaluating
     mechanisms         queries as relational       queries as graph patterns
                        expressions

SPARQL
    Uses SQL-like syntax

        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?title
        WHERE { <http://example.org/book/book1> dc:title ?title }

    PREFIX: prefix mechanism to abbreviate URIs
    SELECT: variables to be returned
    FROM: name of the graph
    WHERE: query pattern (list of triple patterns)

SELECT
    Variables selection
    Variables: ?string
        ?x
        ?title
        ?name

    Syntax: SELECT var1 … varn (variables separated by
     whitespace)
        SELECT ?name
        SELECT ?x ?title




WHERE
    Graph patterns to match
    Set of triples
       { (subject predicate object .)* }
    Subject: URI, QName, Blank node, Literal,
     Variable
    Predicate: URI, QName, Variable
    Object: URI, QName, Blank node, Literal,
     Variable


Graph patterns
   The pattern contains unbound symbols
   By binding the symbols (if possible), subgraphs
    of the RDF graph are selected
   If there is such a selection, the query returns the
    bound resources




Graph patterns

   E.g.: (subject,?p,?o)
      ?p and ?o are “unknowns”




Graph patterns
        SELECT ?p ?o
        WHERE { subject ?p ?o }

    The triples in WHERE define the graph pattern,
     with ?p and ?o “unbound” symbols
    The query returns a list of matching p,o pairs
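
Matching one triple pattern against a graph can be sketched in a few lines of Python. This is a simplified illustration, not a SPARQL implementation: terms are plain strings, anything starting with `?` is treated as a variable, and the sample graph and the helper names `is_var` and `match_pattern` are assumptions.

```python
def is_var(term):
    """A term is a variable if it starts with '?'."""
    return term.startswith("?")

def match_pattern(pattern, graph):
    """Yield one binding dict per triple in `graph` that matches `pattern`."""
    for triple in graph:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break   # a constant in the pattern does not match
        else:
            yield binding

graph = [
    (":john", "ex:hasName", "John"),
    (":john", "rdf:type", "ex:Person"),
    (":mary", "ex:hasName", "Mary"),
]

for row in match_pattern((":john", "?p", "?o"), graph):
    print(row["?p"], row["?o"])
```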




Example 1



SELECT ?cat ?val
WHERE { ?x rdf:value ?val .
        ?x category ?cat }


    Returns:
[["Total Members",100],["Total Members",200],…,
["Full Members",10],…]
Example 2



SELECT ?cat ?val
WHERE { ?x rdf:value ?val .
        ?x category ?cat .
        FILTER(?val >= 200) }

    Returns:
              [["Total Members",200],…]
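
The way Example 2 combines two triple patterns that share ?x and then applies the FILTER can be sketched in Python. The graph below is an assumption, since the figure with the slide's data is not available; `unify` and `solve` are hypothetical helper names, and a real engine would use indexes rather than nested loops.

```python
def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if isinstance(p_term, str) and p_term.startswith("?"):
            if result.setdefault(p_term, t_term) != t_term:
                return None      # variable already bound to something else
        elif p_term != t_term:
            return None          # constant mismatch
    return result

def solve(patterns, graph):
    """Join the triple patterns one by one, carrying the bindings along."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = unify(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions

# Assumed data: three nodes, each with a value and a category.
graph = [
    ("_:m1", "rdf:value", 100), ("_:m1", "category", "Total Members"),
    ("_:m2", "rdf:value", 200), ("_:m2", "category", "Total Members"),
    ("_:m3", "rdf:value", 10),  ("_:m3", "category", "Full Members"),
]

patterns = [("?x", "rdf:value", "?val"), ("?x", "category", "?cat")]
solutions = solve(patterns, graph)
# FILTER(?val >= 200) is a test applied to each solution.
rows = [[s["?cat"], s["?val"]] for s in solutions if s["?val"] >= 200]
print(rows)
```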

Example 3



               SELECT ?cat ?val ?uri
               WHERE { ?x rdf:value ?val .
                       ?x category ?cat .
                       ?al contains ?x .
                       ?al linkTo ?uri }

     Returns:
              [["Total Members",100,http://...],…]

Example 4



 SELECT ?cat ?val ?uri
 WHERE { ?x rdf:value ?val .
         ?x category ?cat .
         OPTIONAL { ?al contains ?x .
                    ?al linkTo ?uri } }

     Returns:
                        [["Total Members",100,http://...],…,
                        ["Full Members",20, ],…]
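
OPTIONAL behaves like a left join: every solution of the required part is kept, and the optional bindings are added only where a compatible match exists. A minimal Python sketch, with made-up solution rows and a hypothetical `left_join` helper (the URI `http://example.org/total` is an assumption, not a value from the slides):

```python
def left_join(solutions, optional_solutions):
    """Keep every solution; merge in compatible optional bindings when any exist."""
    out = []
    for s in solutions:
        # Two bindings are compatible if they agree on all shared variables.
        compatible = [
            o for o in optional_solutions
            if all(s.get(var, val) == val for var, val in o.items())
        ]
        if compatible:
            out.extend({**s, **o} for o in compatible)
        else:
            out.append(s)    # no optional match: ?uri simply stays unbound
    return out

required = [
    {"?x": "_:m1", "?cat": "Total Members", "?val": 100},
    {"?x": "_:m3", "?cat": "Full Members", "?val": 20},
]
optional = [
    {"?al": "_:a1", "?x": "_:m1", "?uri": "http://example.org/total"},
]

for row in left_join(required, optional):
    print(row["?cat"], row["?val"], row.get("?uri"))
```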
Other SPARQL Features
    Limit the number of returned results
    Remove duplicates, sort them,…
    Specify several data sources (via URIs) within
     the query (essentially, a merge)
    Construct a graph combining a separate pattern
     and the query results
    Use datatypes and/or language tags when
     matching a pattern
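
The first two features correspond to the SPARQL keywords ORDER BY, DISTINCT and LIMIT, which post-process the list of solutions. Their effect can be imitated on a plain Python list; the rows below are made-up data used only for illustration.

```python
# Solutions as (category, value) rows; the data is invented for this sketch.
rows = [
    ("Total Members", 200),
    ("Full Members", 10),
    ("Total Members", 200),   # duplicate row
    ("Total Members", 100),
]

# ORDER BY DESC(?val): sort the solutions (sorted() is stable).
ordered = sorted(rows, key=lambda r: r[1], reverse=True)

# DISTINCT: drop duplicate rows while keeping the order.
distinct = list(dict.fromkeys(ordered))

# LIMIT 2: keep only the first two rows.
limited = distinct[:2]

print(limited)
```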

SPARQL use in practice
    Locally, i.e., bound to a programming
     environment like Jena
        Jena is a Java framework for building Semantic Web
           applications; it provides an environment for RDF,
           RDFS, OWL and SPARQL, and includes a rule-based
           inference engine
    Remotely, e.g., over the network or into a
     database


Providing RDF on the Web
    RDF data usually resides in an RDF database
     (triplestore)
        …how do we ‘put it out’ on the web?
    SPARQL endpoints (SPARQL query over HTTP)
        Direct connection to triplestore over HTTP
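
In the SPARQL protocol, a query can be sent to an endpoint as an HTTP GET request with the query text in the `query` parameter. The sketch below only builds such a URL and does not contact any server; the endpoint address `http://example.org/sparql` and the helper name `sparql_get_url` are assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL query (endpoint must not already contain '?')."""
    return endpoint + "?" + urlencode({"query": query})

QUERY = (
    "SELECT ?title WHERE { <http://example.org/book/book1> "
    "<http://purl.org/dc/elements/1.1/title> ?title }"
)

url = sparql_get_url("http://example.org/sparql", QUERY)
print(url)

# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

Endpoints typically also look at the Accept header to decide the result format, so a complete client would set that as well.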




Example of SPARQL endpoint


    [Figure: example SPARQL endpoint, with callouts “Dataset” and “SPARQL query”]




Providing RDF
on the Web




Exposing RDF on the web
    Problem: usually HTML content and RDF data are
     separate
    [Diagram: the web page (HTML) and the RDF data (XML) as two separate documents]




Exposing RDF on the web
    Separate HTML content and RDF data
    Maintenance problem
        Both need to be managed separately
        RDF content and web content have much overlap
         (redundancy)
        RDF/XML difficult to author: extra overhead
    Verification problem
        How to reconcile differences as content changes?
    Visibility problem
        Easy to ignore the RDF content (out of sight, out
           of mind)
Exposing RDF on the web
    Solution: embed RDF into web content using
     RDFa
    [Diagram: the RDF data (XML) is ‘embedded’ into the web page (HTML)]


RDFa
    W3C Recommendation (October, 2008)
    Set of extensions to XHTML that allows annotating
     XHTML markup with semantics
    Uses attributes from XHTML meta and link
     elements, and generalizes them so that they
     are usable on all elements
    A simple mapping is defined so that RDF
     triples may be extracted

Exposing RDF on the web
    RDFa: Resource Description Framework-in-attributes




    Extra (RDFa) markup is ignored by web browsers
XHTML + RDFa example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
   "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   version="XHTML+RDFa 1.0" xml:lang="en">
  <head>
    <title>John's Home Page</title>
    <base href="http://example.org/john-d/" />
    <meta property="dc:creator" content="Jonathan Doe" />
  </head>
  <body>
    <h1>John's Home Page</h1>
    <p>My name is <span property="foaf:nick">John D</span> and I like
      <a href="http://www.neubauten.org/" rel="foaf:interest"
      xml:lang="de">Einstürzende Neubauten</a>. </p>
    <p> My <span rel="foaf:interest" resource="urn:ISBN:0752820907">
      favorite book</span> is the inspiring
      <span about="urn:ISBN:0752820907"><cite property="dc:title">
      Weaving the Web</cite> by <span property="dc:creator">Tim
      Berners-Lee</span></span> </p>
  </body>
</html>
Automatic conversion to RDF/XML
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/john-d/">
    <dc:creator xml:lang="en">Jonathan Doe</dc:creator>
    <foaf:nick xml:lang="en">John D</foaf:nick>
    <foaf:interest rdf:resource="http://www.neubauten.org/"/>
    <foaf:interest>
      <rdf:Description rdf:about="urn:ISBN:0752820907">
        <dc:creator xml:lang="en">Tim Berners-Lee</dc:creator>
        <dc:title xml:lang="en">Weaving the Web</dc:title>
      </rdf:Description>
    </foaf:interest>
  </rdf:Description>
</rdf:RDF>



RDFa annotations
    Less than 5% of web pages have RDFa
     annotations (Google, 2010)
    However, many organizations already publish
     or consume RDFa
        Google, Yahoo
        Facebook, MySpace, LinkedIn
        Best Buy, Tesco, O’Reilly
        SlideShare, Digg
        WhiteHouse.gov, Library of Congress, UK
         government
        Newsweek, BBC
RDFa is not the only solution …




   Source: Peter Mika (Yahoo!), RDFa, 2011


Rich snippets
    Several solutions for embedding semantic data
     in Web pages
    Three syntaxes known (by Google) as “rich
     snippets”
        Microformats
        RDFa
        HTML microdata
    All three are supported by Google, while
     microdata is the “recommended” syntax

First came microformats
    Microformats emerged around 2005
    Some key principles
        Start by solving simple, specific problems
        Design for humans first, machines second

    Wide deployment
        Used on billions of Web pages
    Formats exist for marking up atom feeds,
     calendars, addresses and contact info, geo-
     location, multimedia, news, products, recipes,
     reviews, resumes, social relationships, etc.
Microformats example
<div class="vcard">
  <a class="fn org url"
     href="http://www.commerce.net/">CommerceNet</a>
  <div class="adr">
    <span class="type">Work</span>:
    <div class="street-address">169 University Avenue</div>
    <span class="locality">Palo Alto</span>,
    <abbr class="region"
      title="California">CA</abbr>&nbsp;&nbsp;
    <span class="postal-code">94301</span>
    <div class="country-name">USA</div>
  </div>
  <div class="tel">
    <span class="type">Work</span> +1-650-289-4040
  </div>
  <div>Email:
    <span class="email">info@commerce.net</span>
  </div>
</div>
Then came RDFa
    RDFa aims to bridge the gap between human-oriented
     HTML and machine-oriented RDF
     documents
    Provides XHTML attributes to indicate
     machine-understandable information
    Uses the RDF data model, and Semantic Web
     vocabularies directly



RDFa example

<div typeof="foaf:Person"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
<p property="foaf:name">Alice Birpemswick</p>
<p>Email: <a rel="foaf:mbox"
   href="mailto:alice@example.com">alice@example.com</a>
</p>
<p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332">
   +1 617.555.7332</a>
</p>
</div>




Last but not least, microdata
    Microdata syntax is based on nested groups of
     name-value pairs
    HTML microdata specification includes
        An unambiguous parsing model
        An algorithm to convert microdata to RDF

    Compatible with the Semantic Web via
     mappings



Microdata syntax
    Microdata properties
        Annotate an item with text-valued properties using the
           “itemprop” attribute



  <div itemscope>
    <p>My name is <span itemprop="name">Daniel</span>.</p>
  </div>




Microdata syntax
    Multiple values are ok
        As in RDF, the same property can appear more than once
           on the same item (subject), each time with a
           different value (object)

         <div itemscope>
           <p>Flavors in my favorite ice cream:</p>
           <ul>
             <li itemprop="flavor">Lemon sorbet</li>
             <li itemprop="flavor">Apricot sorbet</li>
           </ul>
         </div>
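
To see how a consumer could read these values, here is a rough Python sketch based on the standard library's `html.parser`. It only collects the text of elements carrying `itemprop` and is far simpler than the parsing model in the microdata specification: nested items, `itemtype`, URL-valued properties and same-named tags nested inside a property are not handled. The class name `ItempropCollector` is hypothetical.

```python
from html.parser import HTMLParser

class ItempropCollector(HTMLParser):
    """Collect text values of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}   # property name -> list of text values
        self._open = []    # stack of [tag, property name, text parts]

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name is not None:
            self._open.append([tag, name, []])

    def handle_data(self, data):
        if self._open:
            self._open[-1][2].append(data)

    def handle_endtag(self, tag):
        # Close the innermost open property when its own tag ends.
        if self._open and self._open[-1][0] == tag:
            _, name, parts = self._open.pop()
            self.values.setdefault(name, []).append("".join(parts).strip())

HTML = """
<div itemscope>
  <p>Flavors in my favorite ice cream:</p>
  <ul>
    <li itemprop="flavor">Lemon sorbet</li>
    <li itemprop="flavor">Apricot sorbet</li>
  </ul>
</div>
"""

collector = ItempropCollector()
collector.feed(HTML)
print(collector.values)
```

Both values end up under the same property name, which mirrors two triples with the same subject and predicate.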




Microdata syntax
    Item types
        Correspond to classes in RDF

        <section itemscope
                 itemtype="http://example.org/animals#cat">
          <h1 itemprop="name">Hedral</h1>
          <p itemprop="desc">Hedral is a male american
             domestic shorthair, with a fluffy black fur with
             white paws and belly.</p>
          <img itemprop="img" src="hedral.jpeg" alt=""
               title="Hedral, age 18 months">
        </section>


Microdata syntax
    Global IDs
        Items may be given global identifiers, which are URLs
        They may be, but do not need to be Semantic Web URIs

    <dl itemscope
      itemtype="http://vocab.example.net/book"
      itemid="urn:isbn:0-330-34032-8">
      <dt>Title</dt>
        <dd itemprop="title">The Reality Dysfunction</dd>
      <dt>Author</dt>
        <dd itemprop="author">Peter F. Hamilton</dd>
      <dt>Publication date</dt>
        <dd><time itemprop="pubdate" datetime="1996-01-26">
            26 January 1996</time></dd>
    </dl>

Schema.org
    Schema.org is one of a number of microdata
     vocabularies
        It is a shared collection of microdata schemas for use
           by webmasters
    Includes a type hierarchy, like an RDFS schema
        Starts with top-level Thing (which has four properties:
         name, description, url, and image) and DataType types
        More specific types share properties with broader
         types; for example, a Place is a more specific type of
         Thing, and a TouristAttraction is a more specific type of
         Place
        More specific items inherit the properties of their parent

Example
 <div itemscope itemtype="http://schema.org/Movie">
   <h1 itemprop="name">Avatar</h1>
   <div itemprop="director" itemscope
     itemtype="http://schema.org/Person"> Director:
     <span itemprop="name">James Cameron</span> (born
     <span itemprop="birthDate">August 16, 1954</span>)
   </div>
   <span itemprop="genre">Science fiction</span>
   <a href="../movies/avatar-theatrical-trailer.html"
     itemprop="trailer">Trailer</a>
 </div>




Current schema.org types




Schema.org full hierarchy




    http://www.schema.org/docs/full.html




Item
examples




SPARQL use in practice
    Where to find meaningful RDF data to search?

    The Linked Data Project




The Linked Data
Project
The Linked Data Project
     A fundamental prerequisite of the Semantic Web is
      the existence of large amounts of meaningfully
      interlinked RDF data on the Web
     “To make the Semantic Web a reality, it is necessary
      to have a large volume of data available on the Web
      in a standard, reachable and manageable format. In
      addition the relationships among data also need to be
      made available. This collection of interrelated data on
      the Web can also be referred to as Linked Data.
      Linked Data lies at the heart of the Semantic Web:
      large scale integration of, and reasoning on, data on
      the Web.” (W3C)
The Linked Data Project
    Linked Data is about using the Web to connect
     related data that wasn’t previously linked, or
     using the Web to lower the barriers to linking
     data currently linked using other methods
    Linked Data is a set of principles that allows
     publishing, querying and browsing of RDF data,
     distributed across different servers



The Linked Data Project
    Community effort to make various open datasets
     available on the Web as RDF and to set RDF
     links between data items from different datasets
    The datasets are published according to the
     Linked Data principles and can therefore be
     crawled by Semantic Web search engines and
     navigated using Semantic Web browsers
    Supported by W3C
    Began early 2007
        http://linkeddata.org/home
The Web of Documents
    Analogy: a global filesystem
    Designed for human consumption
    Primary objects: documents
    Links between documents (or sub-parts)
    Degree of structure in objects: fairly low
    Semantics of content and links: implicit



The Web of Linked Data
    Analogy: a global database
    Designed for machines first, humans later
    Primary objects: things (or descriptions of things)
    Links between things
    Degree of structure in (descriptions of) things:
     high
    Semantics of content and links: explicit



Linked Data example




Linked Data example




Why publish Linked Data?
    Ease of discovery
    Ease of consumption
        Standards-based data sharing
    Reduced redundancy
    Added value
        Build ecosystems around your data/content




Linked Open Data cloud: May 2007




DBpedia
    DBpedia is a community effort to extract
     structured information from Wikipedia
     and to make this information available on the
     Web
    DBpedia lets users ask sophisticated queries
     against Wikipedia, and link other data sets
     on the Web to Wikipedia data



GeoNames
    GeoNames is a geographical database that
     contains over eight million geographical names
    Available for download free of charge under a
     Creative Commons Attribution license




Main contributors
    DBLP: computer science bibliography
        Richard Cyganiak, Chris Bizer (FU Berlin)
    DBpedia: structured information from Wikipedia
        Universität Leipzig, FU Berlin, OpenLink
    DBtune, Jamendo: Creative Commons music repositories
        Yves Raimond (University of London)
    Geonames: world-wide geographical database
        Bernard Vatant (Mondeca), Marc Wick (Geonames)
    Musicbrainz: music and artist database
        Frederick Giasson, Kingsley Idehen (Zitgist)
    Project Gutenberg: literary works in the public domain
        Piet Hensel, Hans Butschalowsky (FU Berlin)
    Revyu: community reviews about anything
        Tom Heath, Enrico Motta (Open University)
    RDF Book Mashup: books from the Amazon API
        Tobias Gauß, Chris Bizer (FU Berlin)
    US Census Data: statistical information about the U.S.
        Josh Tauberer (University of Pennsylvania), OpenLink
    World Factbook: country statistics, compiled by CIA
        Piet Hensel, Hans Butschalowsky (FU Berlin)

Linked Open Data cloud: later snapshots
    [Figures: July 2007, August 2007, November 2007, February 2008,
     September 2008, March 2009, July 2009, September 2010 (two
     slides), September 2011]
Statistics on datasets
    http://ckan.net/group/lodcloud
    http://www4.wiwiss.fu-berlin.de/lodcloud/




Statistics on links between datasets




Linked Data success stories
    BBC Music
        Integrates information from MusicBrainz and
         Wikipedia for artist/band infopages
        Information also available in RDF (in addition to web
         pages)
        3rd party applications built on top of the BBC data
        BBC also contributes data back to MusicBrainz
    The New York Times
        Maps its thesaurus of 1 million entity descriptions
           (people, organisations, places, etc.) to DBpedia and
           Freebase

Linked Data shopping list
    List of sites/datasets that the “community” would
     like to see published as Linked Data
        This list may form the basis for some campaign/action
           to encourage these data publishers to embrace
           Linked Data

    http://linkeddata.org/linked-data-shopping-list




The Linked Data principles
(“expectations of behavior”)
    The Semantic Web isn't just about putting data
     on the web. It is about making links, so that a
     person or machine can explore the web of
     data. With linked data, when you have some of
     it, you can find other, related, data
    It is the unexpected re-use of information
     which is the value added by the web
                                                 (Tim Berners-Lee)



The Linked Data principles
(“expectations of behavior”)
    Unambiguous identifiers for objects (resources)
        Use URIs as names for things
    Use the structure of the web
        Use HTTP URIs so that people can look up the
         names
    Make it easy to discover information about an
     object (resource)
        When someone looks up a URI, provide useful
         information
    Link the object (resource) to related objects
        Include links to other URIs
Link to existing vocabularies
    For describing classes (categories) and properties
     (relationships), try to re-use existing vocabularies
        Easier to interoperate if we’re talking the same
         language!
    Many vocabularies/ontologies out there
        schema.org is a great place to start looking!
        Vocabs for products (Good Relations), people
           (FOAF), social media (SIOC), places, events,
           businesses, e-commerce, music, etc., you name it…
    If nothing relevant, you can create your own, but
     make sure you…
        Publish it!
        Reconcile (map) it to other vocabularies, if you can
Link to other datasets
    Popular predicates for linking

        owl:equivalentClass                                 foaf:homepage
        owl:sameAs                                          foaf:topic
        rdfs:seeAlso                                        foaf:based_near
        skos:closeMatch                                     foaf:maker/foaf:made
        skos:exactMatch                                     foaf:page
        skos:related                                        foaf:primaryTopic

    Example:
        http://dbpedia.org/resource/Canberra
            owl:sameAs
        http://rdf.freebase.com/rdf/en.canberra
Link to other Data Sets
                                                   (Wikicompany is a free content
                                                   licensed worldwide business
                                                   directory that anyone can edit)




                                                 (flickr wrappr extends DBpedia
                                                 with RDF links to photos posted
                                                 on flickr)

Linked Data – open issues
    Schema diversity & proliferation
    Quality of data is poor
    No consistency of any kind is guaranteed
    Issues with reliability of data end-points
        High down-time is not unusual
        There is no Service Level Agreement provided
    Querying of linked data is slow
        Data  is distributed on the web
        Even single SPARQL endpoints can be slow
        Most end-points are experimental/research projects
         with no resources for quality guarantees
    Licensing issues
        The majority of datasets carry no explicit open license
Linked Data tools
    Tools for Publishing Linked Data
        D2R Server: a tool for publishing relational databases
         as Linked Data
        Triplify: transforms relational data into
         RDF/LinkedData
        Pubby: a Linked Data frontend for SPARQL endpoints

    Tools for consuming Linked Data
        Semantic Web Browsers and Client Libraries
        Semantic Web Search Engines




Pubby
    Many triple stores and other SPARQL endpoints
     can be accessed only by SPARQL client
     applications that use the SPARQL protocol
        They cannot be accessed by the growing variety of
         Linked Data clients
    Pubby is designed to provide a Linked Data
     interface to those RDF data sources
    http://www4.wiwiss.fu-berlin.de/pubby/
Pubby




Linked Data browsers – Marbles
    http://marbles.sourceforge.net
    XHTML views of RDF data (SPARQL endpoint), caching,
     predicate traversal




Linked Data browsers – RelFinder
    http://relfinder.dbpedia.org
    Explore & navigate relationships in an RDF graph




Linked Data browsers – gFacet
    http://semanticweb.org/wiki/GFacet
    Graph based visualisation & faceted filtering of RDF data




Linked Data browsers – Forest
    Front-end to FactForge and LinkedLifeData




FactForge and LinkedLifeData
    FactForge
          Integrates some of the most central LOD datasets
          General-purpose information (not specific to a domain)
          1.2B explicit plus 1B inferred statements
          The largest upper-level knowledge base
          http://www.FactForge.net/
    LinkedLifeData
        25 of the most popular life-science datasets
        2.7B explicit and 1.4B inferred triples
        http://www.LinkedLifeData.com
Linked Data browsers – Information Workbench
     http://iwb.fluidops.com/resource/Help:Start




http://it.ckan.net/




SPARQL syntax
SPARQL query structure
    A SPARQL query includes, in order
        Prefix declarations, for abbreviating URIs
        A result clause, identifying what information to return
         from the query
        The query pattern, specifying what to query for in the
         underlying dataset
        Query modifiers: slicing, ordering, and otherwise
         rearranging query results




SPARQL query structure
    A SPARQL query includes, in order
    # prefix declarations
    PREFIX foo: <http://example.com/resources/>
    ...
    # result clause
    SELECT ...
    # query pattern
    WHERE {
        ...
    }
    # query modifiers
    ORDER BY ...
Dataset: Friend of a Friend (FOAF)
    FOAF is a standard RDF vocabulary for describing
     people and relationships
    Tim Berners-Lee's FOAF information available at
     http://www.w3.org/People/Berners-Lee/card

@prefix card: <http://www.w3.org/People/Berners-Lee/card#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
card:i foaf:name "Timothy Berners-Lee" .
<http://bblfish.net/people/henry/card#me>
    foaf:name "Henry Story" .
<http://www.cambridgesemantics.com/people/about/lee>
    foaf:name "Lee Feigenbaum" .
card:amy foaf:name "Amy van der Hiel" .
...
Example 1 – simple triple pattern
    In the graph http://www.w3.org/People/Berners-Lee/card,
     find all subjects (?person) and objects (?name) linked
     with the foaf:name predicate.
    Then return all the values of ?name.
    In other words, find all names mentioned in Tim Berners-
     Lee’s FOAF file
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    WHERE {
        ?person foaf:name ?name .
    }
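
To make the matching step concrete, here is a minimal Python sketch of how a single triple pattern could be evaluated. The graph is a hand-copied subset of the FOAF excerpt above; the tuple representation and the `match_pattern` helper are illustrative assumptions, not how any real SPARQL engine is implemented.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
CARD = "http://www.w3.org/People/Berners-Lee/card#"

# A tiny in-memory graph: a set of (subject, predicate, object) triples.
graph = {
    (CARD + "i", FOAF_NAME, "Timothy Berners-Lee"),
    ("http://bblfish.net/people/henry/card#me", FOAF_NAME, "Henry Story"),
    (CARD + "amy", FOAF_NAME, "Amy van der Hiel"),
}


def is_var(term):
    """Assumption of this sketch: variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, graph):
    """Return one binding dict for every triple that matches the pattern."""
    solutions = []
    for triple in graph:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice must bind to the same value.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            solutions.append(binding)
    return solutions


# WHERE { ?person foaf:name ?name . } followed by SELECT ?name
names = sorted(b["?name"] for b in match_pattern(("?person", FOAF_NAME, "?name"), graph))
print(names)
```

Every matching triple yields one solution; SELECT then keeps only the requested variables from each solution.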


SPARQL endpoints
    Accept queries and return results via HTTP
        Generic endpoints query any Web-accessible RDF data
        Specific endpoints are hardwired to query against
         particular datasets
    The results of SPARQL queries can be returned in a
     variety of formats:
        XML, JSON, RDF, HTML
        JSON (JavaScript Object Notation): lightweight computer
         data interchange format; text-based, human-readable
         format for representing simple data structures and
         associative arrays
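
As a rough illustration of the HTTP exchange, the sketch below builds a GET request URL carrying the query in the `query` parameter and reads a response in the SPARQL Query Results JSON format. The endpoint address is hypothetical and the response is a hand-written sample, so no network access is involved.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, for illustration only

query = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name . }"""


def build_request_url(endpoint, query):
    """The SPARQL protocol passes the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hand-written sample of a JSON result document
# (media type application/sparql-results+json).
sample_response = """{
  "head": {"vars": ["name"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Timothy Berners-Lee"}},
    {"name": {"type": "literal", "value": "Henry Story"}}
  ]}
}"""


def rows_from_json(text):
    """Flatten each binding to {variable: value}; unbound variables are skipped."""
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


url = build_request_url(ENDPOINT, query)
rows = rows_from_json(sample_response)
print(url)
print(rows)
```

A real client would send the URL with an Accept header naming the desired result format and pass the response body to `rows_from_json`.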

SPARQL endpoints
    This query is for an arbitrary bit of RDF data
     (Tim Berners-Lee's FOAF file)
    => it needs a generic endpoint to run
    Possible choices
        SPARQLer, a general-purpose processor (sparql.org)
         http://sparql.org/sparql.html
        OpenLink's Virtuoso (make sure to choose "Retrieve
         remote RDF data for all missing source graphs")
         http://bbc.openlinksw.com/sparql/
        Redland’s Rasqal
         http://librdf.org/rasqal/


SPARQLer
    [Screenshot of the SPARQLer query form; callouts mark the "SPARQL query" and "Dataset" fields]
OpenLink’s Virtuoso
    [Screenshot of the Virtuoso query form; callouts mark the "Dataset" and "SPARQL query" fields]
Example 1 - simple triple pattern

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    WHERE {
        ?person foaf:name ?name .
    }
Example 2 – multiple triple pattern
    Find all people in Tim Berners-Lee’s FOAF file that have
     names and email addresses
    Return each person’s URI, name, and email address

    Multiple triple patterns retrieve multiple properties about
     a particular resource
    SELECT * selects all variables mentioned in the query
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT *
    WHERE {
        ?person foaf:name ?name .
        ?person foaf:mbox ?email .
    }
Example 2 - multiple triple pattern




Example 3 – traversing a graph
    Find the homepage of anyone known by
     Tim Berners-Lee




Example 3 – traversing a graph
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX card: <http://www.w3.org/People/Berners-Lee/card#>
    SELECT ?homepage
    FROM <http://www.w3.org/People/Berners-Lee/card>
    WHERE {
        card:i foaf:knows ?known .
        ?known foaf:homepage ?homepage .
    }

    The FROM keyword specifies the target graph in the
     query
    By using ?known as an object of one triple and the
     subject of another, it is possible to traverse multiple links
     in the graph
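
The traversal amounts to a join on the shared variable. The Python sketch below walks the two patterns in order; the people and homepages in the sample graph are invented placeholders, and `objects` is a helper defined only for this illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
TIM = "http://www.w3.org/People/Berners-Lee/card#i"

# Invented sample data: these people and homepages are placeholders.
graph = {
    (TIM, FOAF + "knows", "http://example.org/people#alice"),
    (TIM, FOAF + "knows", "http://example.org/people#bob"),
    ("http://example.org/people#alice", FOAF + "homepage", "http://example.org/alice/"),
    ("http://example.org/people#carol", FOAF + "homepage", "http://example.org/carol/"),
}


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


homepages = set()
# card:i foaf:knows ?known .
for known in objects(graph, TIM, FOAF + "knows"):
    # ?known foaf:homepage ?homepage .
    homepages |= objects(graph, known, FOAF + "homepage")

print(homepages)
```

Bob is known but has no homepage, and Carol has a homepage but is not known, so neither contributes a result: both patterns must match for a solution to exist.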
Dataset: DBPedia
    DBPedia is an RDF version of information from
     Wikipedia
    Contains data derived from Wikipedia’s
     infoboxes, category hierarchy, article abstracts,
     and various external links
    Contains over 100 million triples
    Dataset: http://dbpedia.org/sparql/
Example 4 – exploring DBPedia
    Find 15 example concepts in the DBPedia
     dataset
    SELECT DISTINCT ?concept
    WHERE {
        ?s a ?concept .
    } LIMIT 15




Example 4 – exploring DBPedia
    LIMIT is a solution modifier that limits the
     number of rows returned from a query
    SPARQL has two other solution modifiers
        ORDER BY for sorting query solutions on the value of
         one or more variables
        OFFSET, used in conjunction with LIMIT and ORDER
         BY to take a slice of a sorted solution set (e.g. for
         paging)
    The SPARQL keyword a is a shortcut for the
     common predicate rdf:type (class of a resource)
    The DISTINCT modifier eliminates duplicate
     rows from the query results
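
The effect of these modifiers can be sketched in plain Python. The rows below are invented sample solutions and the three helper functions are assumptions of this sketch; they are applied in the order SPARQL uses: ordering first, then DISTINCT, then OFFSET and LIMIT.

```python
# Invented sample solutions for a single variable ?concept.
rows = [{"concept": c} for c in ["Person", "Place", "Person", "Work", "Place", "Species"]]


def order_by(rows, var, descending=False):
    """ORDER BY: sort the solutions on the value of one variable."""
    return sorted(rows, key=lambda row: row[var], reverse=descending)


def distinct(rows):
    """DISTINCT: drop duplicate rows while keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def slice_rows(rows, limit=None, offset=0):
    """OFFSET skips rows, LIMIT caps how many are returned."""
    end = None if limit is None else offset + limit
    return rows[offset:end]


# SELECT DISTINCT ?concept ... ORDER BY ?concept LIMIT 2 OFFSET 2  (second page of two)
page = slice_rows(distinct(order_by(rows, "concept")), limit=2, offset=2)
print([row["concept"] for row in page])
```

Without ORDER BY the order of solutions is not guaranteed, so paging with LIMIT and OFFSET alone may return overlapping or missing rows.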
Example 5 – basic SPARQL filters
    Find all landlocked countries with a population greater
     than 15 million
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX type: <http://dbpedia.org/class/yago/>
    PREFIX prop: <http://dbpedia.org/property/>
    SELECT ?country_name ?population
    WHERE {
        ?country a type:LandlockedCountries ;
                 rdfs:label ?country_name ;
                 prop:populationEstimate ?population .
        FILTER (?population > 15000000) .
    }

    FILTER constraints use boolean conditions to filter out
     unwanted query results
    A semicolon (;) can be used to separate two triple
     patterns that share the same subject
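
A FILTER can be pictured as a predicate applied to each candidate solution. In the Python sketch below the country names and population figures are invented, and `sparql_filter` is a hypothetical helper; it also mimics the SPARQL rule that a condition which raises an error (for example on an unbound variable) removes the solution.

```python
# Invented figures, for illustration only.
solutions = [
    {"country_name": "Examplestan", "population": 20_000_000},
    {"country_name": "Samplonia", "population": 9_500_000},
    {"country_name": "Nodataland"},  # ?population is unbound here
]


def sparql_filter(solutions, condition):
    """Keep the solutions for which the condition evaluates to true."""
    kept = []
    for solution in solutions:
        try:
            if condition(solution):
                kept.append(solution)
        except KeyError:
            # An unbound variable makes the expression an error,
            # and a FILTER that errors drops the solution.
            pass
    return kept


# FILTER (?population > 15000000)
large = sparql_filter(solutions, lambda s: s["population"] > 15_000_000)
print([s["country_name"] for s in large])
```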
SPARQL filters
    Conditions on literal values
    Syntax
        FILTER expression

    Examples
        FILTER (?age > 30)
        FILTER isIRI(?x)
        FILTER (!BOUND(?y))
SPARQL filters
    BOUND(var)
        true if var is bound in query answer
        false, otherwise
        !BOUND(var) enables negation-as-failure
    Testing types
        isIRI(A): A is an “Internationalized Resource
         Identifier”
        isBLANK(A): A is a blank node
        isLITERAL(A): A is a literal
SPARQL filters
    Comparison between RDF terms               A = B,  A != B
    Comparison between numeric and date types  A = B,  A != B,  A <= B,  A >= B,  A < B,  A > B
    Boolean AND/OR                             A && B,  A || B
    Basic arithmetic                           A + B,  A - B,  A * B,  A / B
Example 5 – basic SPARQL filters

    Note all the translated duplicates in the results
    How can we deal with that?
Example 6 – SPARQL filters
    Find me all landlocked countries with a
     population greater than 15 million (revisited),
     with the highest population country first
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       PREFIX type: <http://dbpedia.org/class/yago/>
       PREFIX prop: <http://dbpedia.org/property/>
       SELECT ?country_name ?population
       WHERE {
           ?country a type:LandlockedCountries ;
                    rdfs:label ?country_name ;
                    prop:populationEstimate ?population .
           FILTER (?population > 15000000 &&
                   langMatches(lang(?country_name), "EN")) .
       } ORDER BY DESC(?population)



Example 6 – SPARQL filters
    lang extracts a literal’s language tag, if any
    langMatches matches a language tag against a
     language range
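
The matching rule behind langMatches is the basic filtering scheme of RFC 4647. A minimal Python sketch, assuming literals are modelled as (text, language tag) pairs and using a hypothetical `lang_matches` helper:

```python
def lang_matches(tag, lang_range):
    """Basic language-range filtering, case-insensitive.

    A range matches a tag that is equal to it or that starts with the
    range followed by '-'. The range '*' matches any non-empty tag.
    """
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


# Sample labels for one resource, modelled as (text, language tag) pairs.
labels = [("Uganda", "en"), ("Ouganda", "fr"), ("Uganda", "de")]

# FILTER langMatches(lang(?country_name), "EN")
english = [text for text, tag in labels if lang_matches(tag, "EN")]
print(english)
```

Keeping only one language removes the translated duplicates, because each country then contributes a single label.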




Dataset: Jamendo
    Jamendo is a community collection of music all
     freely licensed under Creative Commons
     licenses
        http://www.jamendo.com/it/

    DBTune.org hosts a queryable RDF version of
     information about Jamendo's music collection
        Data on thousands of artists, tens of thousands of
         albums, and nearly 100,000 tracks
        http://dbtune.org/jamendo/store/
Example 7 – the wrong way
    Find all Jamendo artists along with their image,
     home page, and the location they’re near

    PREFIX mo: <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?img ?hp ?loc
    WHERE {
      ?a a mo:MusicArtist ;
         foaf:name ?name ;
         foaf:img ?img ;
         foaf:homepage ?hp ;
         foaf:based_near ?loc .
    }
Example 7 – DBTune SPARQL endpoint
    http://dbtune.org/jamendo/store/

    Jamendo has information on about 3,500 artists
    Running the query returns only 2,667 results. What's
     wrong?
Example 7 – the right way
    Not every artist has an image, homepage, or location!
    OPTIONAL tries to match a graph pattern, but doesn't
     fail the whole query if the optional match fails
    If an OPTIONAL pattern fails to match for a particular
     solution, any variables in that pattern remain unbound
     (no value) for that solution
    PREFIX mo: <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?img ?hp ?loc
    WHERE {
      ?a a mo:MusicArtist ;
         foaf:name ?name .
      OPTIONAL { ?a foaf:img ?img }
      OPTIONAL { ?a foaf:homepage ?hp }
      OPTIONAL { ?a foaf:based_near ?loc }
    }
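
The difference between the two versions is the difference between an inner join and a left join. The Python sketch below contrasts them on invented artist data; the helper names are hypothetical, and the sketch assumes at most one value per artist for each property.

```python
# Invented sample data: two artists, only one of them has an image.
artists = [
    {"a": "artist1", "name": "Artist One"},
    {"a": "artist2", "name": "Artist Two"},
]
images = {"artist1": "http://example.org/img/1.png"}


def join_required(rows, var, values):
    """Required pattern (inner join): rows without a value are dropped."""
    return [dict(row, **{var: values[row["a"]]}) for row in rows if row["a"] in values]


def join_optional(rows, var, values):
    """OPTIONAL pattern (left join): rows without a value are kept, var stays unbound."""
    result = []
    for row in rows:
        row = dict(row)
        if row["a"] in values:
            row[var] = values[row["a"]]
        result.append(row)
    return result


strict = join_required(artists, "img", images)   # the "wrong way"
lenient = join_optional(artists, "img", images)  # the "right way"
print(len(strict), len(lenient))
```

In `lenient` the second artist is still present, simply without an `img` entry, which corresponds to an unbound variable in the query results.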

Dataset: GovTrack
    GovTrack provides SPARQL access to data on
     the U.S. Congress
    Contains over 13,000,000 triples about
     legislators, bills, and votes
    http://www.govtrack.us/
Example 8 – querying alternatives
    Find Senate bills that either John McCain or Barack
     Obama sponsored and the other cosponsored
     PREFIX bill: <http://www.rdfabout.com/rdf/schema/usbill/>
     PREFIX dc: <http://purl.org/dc/elements/1.1/>
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?title ?sponsor ?status
     WHERE {
       { ?bill bill:sponsor ?mccain ; bill:cosponsor ?obama . }
       UNION
       { ?bill bill:sponsor ?obama ; bill:cosponsor ?mccain . }
       ?bill a bill:SenateBill ;
             bill:status ?status ;
             bill:sponsor ?sponsor ;
             dc:title ?title .
       ?obama foaf:name "Barack Obama" .
       ?mccain foaf:name "John McCain" .
     }
Example 8 – GovTrack specific endpoint
    http://www.govtrack.us/developers/rdf.xpd

    The UNION keyword forms a disjunction of two graph
     patterns: solutions to both sides of the UNION are
     included in the results
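
In terms of solutions, UNION simply concatenates the results of its two branches. A small Python sketch with invented bill records (the identifiers and the helper function are assumptions made for illustration):

```python
# Invented sample records: bill id, sponsor, and set of cosponsors.
bills = [
    {"bill": "s1", "sponsor": "mccain", "cosponsors": {"obama"}},
    {"bill": "s2", "sponsor": "obama", "cosponsors": {"mccain", "other"}},
    {"bill": "s3", "sponsor": "obama", "cosponsors": {"other"}},
]


def sponsored_and_cosponsored(bills, sponsor, cosponsor):
    """One branch of the UNION: bills with this sponsor and this cosponsor."""
    return [b["bill"] for b in bills
            if b["sponsor"] == sponsor and cosponsor in b["cosponsors"]]


# { sponsor McCain, cosponsor Obama } UNION { sponsor Obama, cosponsor McCain }
result = (sponsored_and_cosponsored(bills, "mccain", "obama")
          + sponsored_and_cosponsored(bills, "obama", "mccain"))
print(result)
```

Solutions from both branches are kept as they are; UNION does not remove duplicates unless DISTINCT is also used.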
RDF datasets
    All queries so far have been against a single graph
    In SPARQL this is known as the default graph
    RDF datasets are composed of a single default graph
     and zero or more named graphs, identified by a URI
    Named graphs can be specified with one or more
     FROM NAMED clauses, or they can be hardwired into a
     particular SPARQL endpoint
    The SPARQL GRAPH keyword allows portions of a
     query to match against the named graphs in the RDF
     dataset
    Anything outside a GRAPH clause matches against the
     default graph
Dataset: semanticweb.org
    data.semanticweb.org hosts RDF data regarding
     workshops, schedules, and presenters for the
     International Semantic Web Conference (ISWC) and European
     Semantic Web Conference (ESWC) series of events
    Presents data via FOAF, SWRC, and iCal ontologies
    The data for each individual ISWC or ESWC event is
     stored in its own named graph
          i.e., there is one named graph per conference event contained in
           this dataset
    http://data.semanticweb.org/
Example 9 – querying named graphs

    Find people who have been involved with at
     least three ISWC or ESWC conference events
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?person
    WHERE {
        GRAPH ?g1 { ?person a foaf:Person }
        GRAPH ?g2 { ?person a foaf:Person }
        GRAPH ?g3 { ?person a foaf:Person }
        FILTER(?g1 != ?g2 && ?g1 != ?g3 && ?g2 != ?g3) .
    }
Example 9 – querying named graphs

    The GRAPH ?g construct allows a pattern to match
     against one of the named graphs in the RDF dataset
    The URI of the matching graph is bound to ?g
     (or whatever variable was actually used)
    The FILTER ensures that we’re finding a person who
     occurs in three distinct graphs
    The Web interface used for this SPARQL query predefines
     the foaf: prefix, so the PREFIX declaration could also be omitted there
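
The same question can be phrased as "in how many distinct named graphs does each person occur?". The Python sketch below answers it over an invented dataset that maps graph URIs to the people typed as foaf:Person in that graph; all URIs and names are placeholders.

```python
from collections import defaultdict

# Invented dataset: named graph URI -> people typed foaf:Person in that graph.
dataset = {
    "http://example.org/graph/conf-a": {"alice", "bob"},
    "http://example.org/graph/conf-b": {"alice", "carol"},
    "http://example.org/graph/conf-c": {"alice", "bob"},
    "http://example.org/graph/conf-d": {"bob", "dave"},
}

# Equivalent of GRAPH ?g { ?person a foaf:Person }: collect ?g for each ?person.
graphs_per_person = defaultdict(set)
for graph_uri, people in dataset.items():
    for person in people:
        graphs_per_person[person].add(graph_uri)

# Three pairwise-different graphs exist exactly when the set has 3 or more members.
frequent = sorted(p for p, graphs in graphs_per_person.items() if len(graphs) >= 3)
print(frequent)
```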



Data.semanticweb.org specific SPARQL endpoint
    http://data.semanticweb.org/snorql/

SPARQL 1.1 extensions
    Projected expressions
        Adds the ability for query results to contain values
         derived from constants, function calls, or other
         expressions in the SELECT list
    Aggregates
        Adds the ability to group results and calculate
         aggregate values (e.g. count, min, max, avg, sum, …)
    Sub-queries
        Allows one query to be nested within another
SPARQL 1.1 extensions
    Negation
        Includes improved language syntax for querying
         negations
    Property paths
        Adds the ability to query arbitrary-length paths
         through a graph via a regular-expression-like syntax
         known as property paths
    Basic federated query
        Defines a mechanism for splitting a single query
         among multiple SPARQL endpoints and combining
         the results from each
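
As an illustration of what an arbitrary-length property path computes, the Python sketch below follows one property for one or more steps, which is what the `+` path operator expresses. The `knows` relation is invented and `one_or_more` is a hypothetical helper, not part of any SPARQL library.

```python
from collections import deque

# Invented "knows" edges: person -> set of people they know.
knows = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": {"alice"},
    "dave": set(),
}


def one_or_more(start, edges):
    """Nodes reachable from start by following one or more edges."""
    seen = set()
    queue = deque(edges.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, ()))
    return seen


reachable = one_or_more("alice", knows)
print(sorted(reachable))
```

Because the sample relation contains a cycle, alice is reachable from herself; the `seen` set is what keeps the traversal from looping forever.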
SPARQL exercise
Exercises - RDF
 @prefix : <http://example.org/data#> .
 @prefix ont: <http://example.org/myOntology#> .
 @prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .

 :john
   vcard:FN "John Smith" ;
   vcard:N [
     vcard:Given "John" ;
     vcard:Family "Smith" ] ;
   ont:hasAge 32 ;
   ont:marriedTo :mary .
 :mary
   vcard:FN "Mary Smith" ;
   vcard:N [
     vcard:Given "Mary" ;
     vcard:Family "Smith" ] ;
   ont:hasAge 29 .

SPARQL query – exercise 1
    Return the full names of all people in the graph
    PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>
    SELECT ?fullName
    WHERE { ?x vCard:FN ?fullName }

    Result
        fullName
        ===============
        "John Smith"
        "Mary Smith"

SPARQL query – exercise 2
    Return the relation between John and Mary
    PREFIX : <http://example.org/data#>
    SELECT ?p
    WHERE { :john ?p :mary }

    Result
        p
        =================
        <http://example.org/myOntology#marriedTo>

SPARQL query – exercise 3
    Return the spouse of a person whose name is
     John Smith
    PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>
    PREFIX ont: <http://example.org/myOntology#>
    SELECT ?y
    WHERE { ?x vCard:FN "John Smith" .
            ?x ont:marriedTo ?y }

    Result
        y
        =================
        <http://example.org/data#mary>

SPARQL query – exercise 4
    Return the name and the first name of all people
     in the knowledge base
    PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>
    SELECT ?name ?firstName
    WHERE { ?x vCard:N ?name .
            ?name vCard:Given ?firstName }

    Result (?name is bound to the blank node holding the
     structured name; blank node labels vary by engine)
        name     firstName
        ========================
        _:b0     "John"
        _:b1     "Mary"

SPARQL query – exercise 5
    Return all people over 30 in the knowledge base

    PREFIX ont: <http://example.org/myOntology#>
    SELECT ?x
    WHERE { ?x ont:hasAge ?age .
            FILTER(?age > 30) }

    Result
        x
        =================
        <http://example.org/data#john>

FROM

    Selects the RDF graph (= dataset) to be queried
    In case of multiple FROM clauses, graphs are
     merged
    Example

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    FROM <http://example.org/foaf/aliceFoaf>
    WHERE { ?x foaf:name ?name }
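
When several FROM clauses are given, the default graph behaves like the union of the triples of all listed graphs. A minimal Python sketch, assuming graphs are sets of triples that contain only IRIs and literals (blank nodes would have to be kept apart during a real merge); the two sample graphs are invented:

```python
EX = "http://example.org/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Two invented graphs that share one triple.
graph_a = {(EX + "alice", FOAF_NAME, "Alice"), (EX + "bob", FOAF_NAME, "Bob")}
graph_b = {(EX + "bob", FOAF_NAME, "Bob"), (EX + "carol", FOAF_NAME, "Carol")}

# FROM <graph_a> FROM <graph_b>: the default graph is the merge of both.
default_graph = graph_a | graph_b

# WHERE { ?x foaf:name ?name }
names = sorted(o for s, p, o in default_graph if p == FOAF_NAME)
print(names)
```

The triple present in both graphs appears only once after the merge, because an RDF graph is a set of triples.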



SPARQL query – exercise 6
    Graph http://example.org/bob
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        _:a foaf:name "Bob" .
        _:a foaf:mbox <mailto:bob@oldcorp.example.org> .

    Graph http://example.org/alice
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        _:a foaf:name "Alice" .
        _:a foaf:mbox <mailto:alice@work.example> .

SPARQL query – exercise 6
    Return the names of people in both graphs
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?src ?name
    FROM NAMED <http://example.org/alice>
    FROM NAMED <http://example.org/bob>
    WHERE { GRAPH ?src { ?x foaf:name ?name } }

    Result
        src                          name
        =======================================
        <http://example.org/bob>     "Bob"
        <http://example.org/alice>   "Alice"

References
    W3C, “Introduction to the Semantic Web”
          http://www.w3.org/2006/Talks/0524-Edinburgh-IH/
    Lee Feigenbaum, “SPARQL By Example”
          http://www.cambridgesemantics.com/2008/09/sparql-by-example
    Valentina Tamma, “Chapter 4: SPARQL”
          http://www.csc.liv.ac.uk/~valli/Comp318/PDF/SPARQL.pdf
    Tom Heath, “An Introduction to Linked Data”
          http://tomheath.com/slides/2009-02-austin-linkeddata-tutorial.pdf
    HTML Microdata
          http://www.w3.org/TR/microdata
    Schema.org
          http://schema.org

License
     This work is licensed under the Creative
     Commons Attribution-Noncommercial-Share Alike
     3.0 Unported License.
     To view a copy of this license, visit
     http://creativecommons.org/licenses/by-nc-sa/3.0/
     or send a letter to Creative
     Commons, 171 Second Street, Suite 300,
     San Francisco, California, 94105, USA.

PPTX
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012
PDF
Framester: A Wide Coverage Linguistic Linked Data Hub
PPTX
Open library data and embrace the world library linked data
PPTX
F# Data: Making structured data first class citizens
PDF
Machine learning-and-data-mining-19-mining-text-and-web-data
PPTX
Fantoni Urgo - Cirp Dictionary
PPTX
Linked data 101: Getting Caught in the Semantic Web
PPTX
SPARQL introduction and training (130+ slides with exercices)
PPTX
Sem webmaubeuge
PPT
RDFS In A Nutshell V1
PDF
LP&IIS2013.Chinese Named Entity Recognition with Conditional Random Fields in...
PDF
RuleML2015: FOWLA, a federated architecture for ontologies
PPTX
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Logic and Reasoning in the Semantic Web (part I –RDF/RDFS)
Rule-based reasoning in the Semantic Web
Logic and Reasoning in the Semantic Web
Linked Open Data for Libraries
Methods and experiences in cultural heritage enhancement
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012
Framester: A Wide Coverage Linguistic Linked Data Hub
Open library data and embrace the world library linked data
F# Data: Making structured data first class citizens
Machine learning-and-data-mining-19-mining-text-and-web-data
Fantoni Urgo - Cirp Dictionary
Linked data 101: Getting Caught in the Semantic Web
SPARQL introduction and training (130+ slides with exercices)
Sem webmaubeuge
RDFS In A Nutshell V1
LP&IIS2013.Chinese Named Entity Recognition with Conditional Random Fields in...
RuleML2015: FOWLA, a federated architecture for ontologies
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Ad

Recently uploaded (20)

PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Insiders guide to clinical Medicine.pdf
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
RMMM.pdf make it easy to upload and study
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PPTX
Cell Types and Its function , kingdom of life
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PPTX
master seminar digital applications in india
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
Pre independence Education in Inndia.pdf
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
Pharma ospi slides which help in ospi learning
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Classroom Observation Tools for Teachers
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Insiders guide to clinical Medicine.pdf
Microbial diseases, their pathogenesis and prophylaxis
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
RMMM.pdf make it easy to upload and study
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
Cell Types and Its function , kingdom of life
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
FourierSeries-QuestionsWithAnswers(Part-A).pdf
VCE English Exam - Section C Student Revision Booklet
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
master seminar digital applications in india
STATICS OF THE RIGID BODIES Hibbelers.pdf
Pre independence Education in Inndia.pdf
O7-L3 Supply Chain Operations - ICLT Program
Pharma ospi slides which help in ospi learning
Anesthesia in Laparoscopic Surgery in India
Classroom Observation Tools for Teachers
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
O5-L3 Freight Transport Ops (International) V1.pdf

SPARQL and Linked Data

  • 1. SPARQL - Query Language for RDF Fulvio Corno, Laura Farinetti Politecnico di Torino Dipartimento di Automatica e Informatica e-Lite Research Group – http://elite.polito.it
  • 2. The “new” Semantic Web vision  To make data machine processable, we need:  Unambiguous names for resources (that may also bind data to real world objects): URIs  A common data model to access, connect, describe the resources: RDF  Access to that data: SPARQL  Define common vocabularies: RDFS, OWL, SKOS  Reasoning logics: OWL, Rules  “SPARQL will make a huge difference” (Tim Berners-Lee, May 2006) F. Corno, L. Farinetti - Politecnico di Torino 2
  • 3. The Semantic Web timeline F. Corno, L. Farinetti - Politecnico di Torino 3
  • 5. SPARQL  Queries are very important for distributed RDF data  Complex queries into the RDF data are often necessary  E.g.: “give me the (a,b) pair of resources, for which there is an x such that (x parent a) and (b brother x) holds” (i.e., return the uncles)  This is the goal of SPARQL (Query Language for RDF) F. Corno, L. Farinetti - Politecnico di Torino 5
  • 6. SPARQL  SPARQL 1.0: W3C Recommendation January 15th, 2008  SPARQL 1.1: W3C Working Draft January 5th, 2012  SPARQL queries RDF graphs  An RDF graph is a set of triples  SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware F. Corno, L. Farinetti - Politecnico di Torino 6
  • 7. SPARQL and RDF  It is the triples that matter, not the serialization  RDF/XML is the W3C recommendation but it not a good choice because it allows multiple ways to encode the same graph  SPARQL uses the Turtle syntax, an N-Triples extension F. Corno, L. Farinetti - Politecnico di Torino 7
  • 8. Turtle - Terse RDF Triple Language N-Triples ⊂ Turtle ⊂ N3  A serialization format for RDF  A subset of Tim Berners-Lee and Dan Connolly’s Notation 3 (N3) language  Unlike full N3, doesn’t go beyond RDF’s graph model  A superset of the minimal N-Triples format  Turtle has no official status with any standards organization, but has become popular amongst Semantic Web developers as a human-friendly alternative to RDF/XML F. Corno, L. Farinetti - Politecnico di Torino 8
  • 9. “Triple” or “Turtle” notation F. Corno, L. Farinetti - Politecnico di Torino 9
  • 10. “Triple” or “Turtle” notation <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#fullName> "Eric Miller" . <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#mailbox> <mailto:em@w3.org> . <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#personalTitle> "Dr." . <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/10/swap/pim/contact#Person> . F. Corno, L. Farinetti - Politecnico di Torino 10
  • 11. “Triple” or “Turtle” notation (abbreviated) w3people:EM#me contact:fullName "Eric Miller" . w3people:EM#me contact:mailbox <mailto:em@w3.org> . w3people:EM#me contact:personalTitle "Dr." . w3people:EM#me rdf:type contact:Person . F. Corno, L. Farinetti - Politecnico di Torino 11
  • 12. Turtle - Terse RDF Triple Language  Plain text syntax for RDF  Based on Unicode  Mechanisms for namespace abbreviation  Allows grouping of triples according to subject  Shortcuts for collections F. Corno, L. Farinetti - Politecnico di Torino 12
  • 13. Turtle - Terse RDF Triple Language  Simple triple: subject predicate object . :john rdf:label "John" .  Grouping triples: subject predicate object ; predicate object ... :john rdf:label "John" ; rdf:type ex:Person ; ex:homePage http://example.org/johnspage/ . F. Corno, L. Farinetti - Politecnico di Torino 13
  • 14. Prefixes  Mechanism for namespace abbreviation @prefix abbr: <URI>  Example: @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>  Default: @prefix : <URI>  Example: @prefix : <http://example.org/myOntology#> F. Corno, L. Farinetti - Politecnico di Torino 14
  • 15. Identifiers  URIs: <URI> http://www.w3.org/1999/02/22-rdf-syntax-ns#  Qnames (Qualified names)  namespace-abbr?:localname rdf:type dc:title :hasName  Literals  "string"(@lang)?(ˆˆtype)? "John" "Hello"@en-GB "1.4"^^xsd:decimal F. Corno, L. Farinetti - Politecnico di Torino 15
  • 16. Blank nodes  Simple blank node:  [] or _:x :john ex:hasFather [] . :john ex:hasFather _:x .  Blank node as subject:  [ predicate object ; predicate object ... ] . [ ex:hasName "John"] . [ ex:authorOf :lotr ; ex:hasName "Tolkien"] . F. Corno, L. Farinetti - Politecnico di Torino 16
  • 17. Blank nodes F. Corno, L. Farinetti - Politecnico di Torino 17
  • 18. Collections  ( object1 ... objectn ) :doc1 ex:hasAuthor (:john :mary) .  Short for :doc1 ex:hasAuthor [ rdf:first :john; rdf:rest [ rdf:first :mary; rdf:rest rdf:nil ] ] . F. Corno, L. Farinetti - Politecnico di Torino 18
  • 19. Example @prefix rdf: http://www.w3.org/1999/02/22-rdf-syntaxns# . @prefix dc: <http://purl.org/dc/elements/1.1/> . @prefix : <http://example.org/#> . <http://www.w3.org/TR/rdf-syntax-grammar> dc:title "RDF/XML Syntax Specification (Revised)" ; :editor [ :fullName "Dave Beckett"; :homePage <http://purl.org/net/dajobe/> ] . F. Corno, L. Farinetti - Politecnico di Torino 19
  • 20. RDF Triplestores  Basically, databases for triples  Triplestores do not only store triples, but allow to extract the “interesting” triples, via SPARQL queries F. Corno, L. Farinetti - Politecnico di Torino 20
  • 21. Comparison Relational database RDF Triplestore  Data model  Data model  Relational data (tables)  RDF graphs  Data instances  Data instances  Records in tables  RDF triples  Query support  Query support  SQL  SPARQL  Indexing mechanisms  Indexing mechanisms  Optimized for evaluating  Optimized for evaluating queries as relational  Queries as graph patterns expressions F. Corno, L. Farinetti - Politecnico di Torino 21
  • 22. SPARQL  Uses SQL-like syntax Prefix mechanism to abbreviate URIs PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?title WHERE { <http://example.org/book/book1> dc:title ?title } Variables to be returned Query pattern (list of triple patterns) FROM Name of the graph F. Corno, L. Farinetti - Politecnico di Torino 22
  • 23. SELECT  Variables selection ?x  Variables: ?string ?title ?name  Syntax: SELECT var1,…,varn SELECT ?name SELECT ?x,?title F. Corno, L. Farinetti - Politecnico di Torino 23
  • 24. WHERE  Graph patterns to match  Set of triples { (subject predicate object .)* }  Subject: URI, QName, Blank node, Literal, Variable  Predicate: URI, QName, Blank node, Variable  Object: URI, QName, Blank node, Literal, Variable F. Corno, L. Farinetti - Politecnico di Torino 24
  • 25. Graph patterns  The pattern contains unbound symbols  By binding the symbols (if possible), subgraphs of the RDF graph are selected  If there is such a selection, the query returns the bound resources F. Corno, L. Farinetti - Politecnico di Torino 25
  • 26. Graph patterns  E.g.: (subject,?p,?o)  ?p and ?o are “unknowns” F. Corno, L. Farinetti - Politecnico di Torino 26
  • 27. Graph patterns SELECT ?p ?o WHERE {subject ?p ?o}  The triplets in WHERE define the graph pattern, with ?p and ?o “unbound” symbols  The query returns a list of matching p,o pairs F. Corno, L. Farinetti - Politecnico di Torino 27
  • 28. Example 1 SELECT ?cat, ?val WHERE { ?x rdf:value ?val. ?x category ?cat }  Returns: [["Total Members",100],["Total Members",200],…, ["Full Members",10],…] F. Corno, L. Farinetti - Politecnico di Torino 28
  • 29. Example 2 SELECT ?cat, ?val WHERE { ?x rdf:value ?val. ?x category ?cat. FILTER(?val>=200). }  Returns: [["Total Members",200],…] F. Corno, L. Farinetti - Politecnico di Torino 29
  • 30. Example 3 SELECT ?cat, ?val, ?uri WHERE { ?x rdf:value ?val. ?x category ?cat. ?al contains ?x. ?al linkTo ?uri }  Returns: [["Total Members",100,http://...)],…,] F. Corno, L. Farinetti - Politecnico di Torino 30
  • 31. Example 4 SELECT ?cat, ?val, ?uri WHERE { ?x rdf:value ?val. ?x category ?cat. OPTIONAL ?al contains ?x. ?al linkTo ?uri }  Returns: [["Total Members",100,http://...], …, ["Full Members",20, ],…,] F. Corno, L. Farinetti - Politecnico di Torino 31
  • 32. Other SPARQL Features  Limit the number of returned results  Remove duplicates, sort them,…  Specify several data sources (via URI-s) within the query (essentially, a merge)  Construct a graph combining a separate pattern and the query results  Use datatypes and/or language tags when matching a pattern F. Corno, L. Farinetti - Politecnico di Torino 32
  • 33. SPARQL use in practice  Locally, i.e., bound to a programming environments like Jena  Jena is a Java framework for building Semantic Web applications; provides an environment for RDF, RDFS and OWL, SPARQL and includes a rule-based inference engine  Remotely, e.g., over the network or into a database F. Corno, L. Farinetti - Politecnico di Torino 33
  • 34. Providing RDF on the Web  RDF data usually resides in a RDF database (triplestore)  …how do we ‘put them out’ on the web?  SPARQL endpoints (SPARQL query over HTTP)  Direct connection to triplestore over HTTP F. Corno, L. Farinetti - Politecnico di Torino 34
  • 35. Example of SPARQL endpoint Dataset SPARQL query F. Corno, L. Farinetti - Politecnico di Torino 35
  • 36. Providing RDF on the Web F. Corno, L. Farinetti - Politecnico di Torino 36
  • 37. Exposing RDF on the web  Problem: usually HTML content and RDF data are separate Web Page (HTML) RDF data (XML) F. Corno, L. Farinetti - Politecnico di Torino 37
  • 38. Exposing RDF on the web  Separate HTML content and RDF data  Maintenance problem  Both need to be managed separately  RDF content and web content have much overlap (redundancy)  RDF/XML difficult to author: extra overhead  Verification problem  How to differences as content changes?  Visibility problem  Easy to ignore the RDF content (out of sight, out of mind) F. Corno, L. Farinetti - Politecnico di Torino 38
  • 39. Exposing RDF on the web  Solution: embed RDF into web content using RDFa Web Page (HTML) RDF data (XML) ‘Embed’ RDF into HTML F. Corno, L. Farinetti - Politecnico di Torino 39
  • 40. RDFa  W3C Recommendation (October, 2008)  Set of extensions to XHTML that allows to annotate XHTML markup with semantics  Uses attributes from XHTML meta and link elements, and generalizes them so that they are usable on all elements A simple mapping is defined so that RDF triples may be extracted F. Corno, L. Farinetti - Politecnico di Torino 40
  • 41. Exposing RDF on the web  RDFa: Resource Description Framework-in-attributes  Extra (RDFa) markup is ignored by web browsers F. Corno, L. Farinetti - Politecnico di Torino 41
  • 42. XHTML + RDFa example <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="XHTML+RDFa 1.0" xml:lang="en"> <head> <title>John's Home Page</title> <base href="http://example.org/john-d/" /> <meta property="dc:creator" content="Jonathan Doe" /> </head> <body> <h1>John's Home Page</h1> <p>My name is <span property="foaf:nick">John D</span> and I like <a href="http://www.neubauten.org/" rel="foaf:interest" xml:lang="de">Einstürzende Neubauten</a>. </p> <p> My <span rel="foaf:interest" resource="urn:ISBN:0752820907"> favorite book</span> is the inspiring <span about="urn:ISBN:0752820907“><cite property="dc:title"> Weaving the Web</cite> by <span property="dc:creator">Tim Berners-Lee</span></span> </p> </body> </html> F. Corno, L. Farinetti - Politecnico di Torino 42
  • 43. Automatic conversion to RDF/XML <?xml version="1.0" encoding="UTF-8"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://example.org/john-d/"> <dc:creator xml:lang="en">Jonathan Doe</dc:creator> <foaf:nick xml:lang="en">John D</foaf:nick> <foaf:interest rdf:resource="http://www.neubauten.org/"/> <foaf:interest> <rdf:Description rdf:about="urn:ISBN:0752820907"> <dc:creator xml:lang="en">Tim Berners-Lee</dc:creator> <dc:title xml:lang="en">Weaving the Web</dc:title> </rdf:Description> </foaf:interest> </rdf:Description> </rdf:RDF> F. Corno, L. Farinetti - Politecnico di Torino 43
  • 44. RDFa annotations  Less than 5% of web pages have RDFa annotations (Google, 2010)  However, many organizations already publish or consume RDFa  Google, Yahoo  Facebook, MySpace, LinkedIn  Best Buy, Tesco, O’Reilly  SlideShare, Digg  WhiteHouse.gov, Library of Congress, UK government  Newsweek, BBC F. Corno, L. Farinetti - Politecnico di Torino 44
  • 45. RDFa is not the only solution … Source: Peter Mika (Yahoo!), RDFa, 2011 F. Corno, L. Farinetti - Politecnico di Torino 45
  • 46. Rich snippets  Several solutions for embedding semantic data in Web  Three syntaxes known (by Google) as “rich snippets”  Microformats  RDFa  HTML microdata  All three are supported by Google, while microdata is the “recommended” syntax F. Corno, L. Farinetti - Politecnico di Torino 46
  • 47. First came microformats  Microformats emerged around 2005  Some key principles  Startby solving simple, specific problems  Design for humans first, machines second  Wide deployment  Used on billions of Web pages  Formats exist for marking up atom feeds, calendars, addresses and contact info, geo- location, multimedia, news, products, recipes, reviews, resumes, social relationships, etc. F. Corno, L. Farinetti - Politecnico di Torino 47
  • 48. Microformats example <div class="vcard"> <a class="fn org url" href="http://www.commerce.net/">CommerceNet</a> <div class="adr"> <span class="type">Work</span>: <div class="street-address">169 University Avenue</div> <span class="locality">Palo Alto</span>, <abbr class="region" title="California">CA</abbr>&nbsp;&nbsp; <span class="postal-code">94301</span> <div class="country-name">USA</div> </div> <div class="tel"> <span class="type">Work</span> +1-650-289-4040 </div> <div>Email: <span class="email">info@commerce.net</span> </div> </div> F. Corno, L. Farinetti - Politecnico di Torino 48
  • 49. Then came RDFa  RDFa aims to bridge the gap between human oriented HTML and machine-oriented RDF documents  Provides XHTML attributes to indicate machine understandable information  Uses the RDF data model, and Semantic Web vocabularies directly F. Corno, L. Farinetti - Politecnico di Torino 49
  • 50. RDFa example <div typeof="foaf:Person" xmlns:foaf="http://xmlns.com/foaf/0.1/"> <p property="foaf:name">Alice Birpemswick</p> <p>Email: <a rel="foaf:mbox" href="mailto:alice@example.com">alice@example.com</a> </p> <p>Phone: <a rel="foaf:phone" href="tel:+1-617-555-7332"> +1 617.555.7332</a> </p> </div> F. Corno, L. Farinetti - Politecnico di Torino 50
  • 51. Last but not least, microdata  Microdata syntax is based on nested groups of name-value pairs  HTML microdata specification includes  An unambiguous parsing model  An algorithm to convert microdata to RDF  Compatible with the Semantic Web via mappings F. Corno, L. Farinetti - Politecnico di Torino 51
  • 52. Microdata syntax  Microdata properties  Annotate an item with text-valued properties using the “itemprop” attribute <div itemscope> <p>My name is <span itemprop="name">Daniel</span>.</p> </div> F. Corno, L. Farinetti - Politecnico di Torino 52
  • 53. Microdata syntax  Multiple values are ok  As in RDF, you can have two properties, for the same item (subject) with the same value (object) <div itemscope> <p>Flavors in my favorite ice cream:</p> <ul> <li itemprop="flavor">Lemon sorbet</li> <li itemprop="flavor">Apricot sorbet</li> </ul> </div> F. Corno, L. Farinetti - Politecnico di Torino 53
  • 54. Microdata syntax  Item types  Correspond to classes in RDF <section itemscope itemtype="http://example.org/animals#cat"> <h1 itemprop="name">Hedral</h1> <p itemprop="desc">Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.</p> <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"> </section> F. Corno, L. Farinetti - Politecnico di Torino 54
  • 55. Microdata syntax  Global IDs  Items may be given global identifiers, which are URLs  They may be, but do not need to be Semantic Web URIs <dl itemscope itemtype="http://vocab.example.net/book" itemid="urn:isbn:0-330-34032-8"> <dt>Title</dt> <dd itemprop="title">The Reality Dysfunction</dd> <dt>Author</dt> <dd itemprop="author">Peter F. Hamilton</dd> <dt>Publication date</dt> <dd><time itemprop="pubdate" datetime="1996-01-26"> 26 January 1996</time></dd> </dl> F. Corno, L. Farinetti - Politecnico di Torino 55
  • 56. Schema.org  Schema.org is one of a number of microdata vocabularies  It is a shared collection of microdata schemas for use by webmasters  Includes a type hierarchy, like an RDFS schema  Starts with top-level Thing (which has four properties: name, description, url, and image) and DataType types  More specific types share properties with broader types; for example, a Place is a more specific type of Thing, and a TouristAttraction is a more specific type of Place  More specific items inherit the properties of their parent F. Corno, L. Farinetti - Politecnico di Torino 56
  • 57. Example <div itemscope itemtype="http://schema.org/Movie"> <h1 itemprop="name"&g;Avatar</h1> <div itemprop="director" itemscope itemtype="http://schema.org/Person"> Director: <span itemprop="name">James Cameron</span>(born <span itemprop="birthDate">August 16, 1954)</span> </div> <span itemprop="genre">Science fiction</span> <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a> </div> F. Corno, L. Farinetti - Politecnico di Torino 57
  • 58. Current schema.org types F. Corno, L. Farinetti - Politecnico di Torino 58
  • 59. Schema.org full hierarchy http://www.schema.org/docs/full.html F. Corno, L. Farinetti - Politecnico di Torino 59
  • 60. Item examples F. Corno, L. Farinetti - Politecnico di Torino 60
  • 61. F. Corno, L. Farinetti - Politecnico di Torino 61
  • 62. SPARQL use in practice  Where to find meaningful RDF data to search?  The Linked Data Project F. Corno, L. Farinetti - Politecnico di Torino 62
  • 64. The Linked Data Project  A fundamental prerequisite of the Semantic Web is the existence of large amounts of meaningfully interlinked RDF data on the Web  “To make the Semantic Web a reality, it is necessary to have a large volume of data available on the Web in a standard, reachable and manageable format. In addition the relationships among data also need to be made available. This collection of interrelated data on the Web can also be referred to as Linked Data. Linked Data lies at the heart of the Semantic Web: large scale integration of, and reasoning on, data on the Web.” (W3C) F. Corno, L. Farinetti - Politecnico di Torino 64
  • 65. The Linked Data Project  Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods  Linked Data is a set of principles that allows publishing, querying and browsing of RDF data, distributed across different servers F. Corno, L. Farinetti - Politecnico di Torino 65
  • 66. The Linked Data Project  Community effort to make various open datasets available on the Web as RDF and to set RDF links between data items from different datasets  The datasets are published according to the Linked Data principles and can therefore be crawled by Semantic Web search engines and navigated using Semantic Web browsers  Supported by W3C  Began early 2007  http://linkeddata.org/home F. Corno, L. Farinetti - Politecnico di Torino 66
  • 67. The Web of Documents  Analogy: a global filesystem  Designed for human consumption  Primary objects: documents  Links between documents (or sub-parts)  Degree of structure in objects: fairly low  Semantics of content and links: implicit F. Corno, L. Farinetti - Politecnico di Torino 67
  • 68. The Web of Linked Data  Analogy: a global database  Designed for machines first, humans later  Primary objects: things (or descriptions of things)  Links between things  Degree of structure in (descriptions of) things: high  Semantics of content and links: explicit F. Corno, L. Farinetti - Politecnico di Torino 68
  • 69. Linked Data example F. Corno, L. Farinetti - Politecnico di Torino 69
  • 70. Linked Data example F. Corno, L. Farinetti - Politecnico di Torino 70
  • 71. Why publish Linked Data?  Ease of discovery  Ease of consumption  Standards-based data sharing  Reduced redundancy  Added value  Build ecosystems around your data/content F. Corno, L. Farinetti - Politecnico di Torino 71
  • 72. Linked Open Data cloud May 2007 F. Corno, L. Farinetti - Politecnico di Torino 72
  • 73. DBpedia  DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web  DBpedia allows to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data F. Corno, L. Farinetti - Politecnico di Torino 73
  • 74. GeoNames  GeoNames is a geographical database that contains over eight million geographical names  Available for download free of charge under a creative commons attribution license F. Corno, L. Farinetti - Politecnico di Torino 74
  • 75. Main contributors  DBLP Computer science  Project Gutenberg Literary works bibliography in the public domain  Richard Cyganiak, Chris Bizer (FU  Piet Hensel, Hans Butschalowsky Berlin) (FU Berlin)  DBpedia Structured information  Revyu Community reviews about from Wikipedia anything  Universität Leipzig, FU Berlin,  Tom Heath, Enrico Motta (Open OpenLink University)  DBtune, Jamendo Creative  RDF Book Mashup Books from Commons music repositories the Amazon API  Yves Raimond (University of  Tobias Gauß, Chris Bizer (FU London) Berlin)  Geonames World-wide  US Census Data Statistical geographical database information about the U.S.  Bernard Vatant (Mondeca), Marc  Josh Tauberer (University of Wick (Geonames) Pennsylvania), OpenLink  Musicbrainz Music and artist  World Factbook Country database statistics, compiled by CIA  Frederick Giasson, Kingsley  Piet Hensel, Hans Butschalowsky Idehen (Zitgist) (FU Berlin) F. Corno, L. Farinetti - Politecnico di Torino 75
  • 76. July 2007 F. Corno, L. Farinetti - Politecnico di Torino 76
  • 77. August 2007 F. Corno, L. Farinetti - Politecnico di Torino 77
  • 78. November 2007 F. Corno, L. Farinetti - Politecnico di Torino 78
  • 79. February 2008 F. Corno, L. Farinetti - Politecnico di Torino 79
  • 80. September 2008 F. Corno, L. Farinetti - Politecnico di Torino 80
  • 81. March 2009 F. Corno, L. Farinetti - Politecnico di Torino 81
  • 82. July 2009 F. Corno, L. Farinetti - Politecnico di Torino 82
  • 83. September 2010 F. Corno, L. Farinetti - Politecnico di Torino 83
  • 84. September 2010 F. Corno, L. Farinetti - Politecnico di Torino 84
  • 85. September 2011 F. Corno, L. Farinetti - Politecnico di Torino 85
  • 86. Statistics on datasets  http://ckan.net/group/lodcloud  http://www4.wiwiss.fu-berlin.de/lodcloud/ F. Corno, L. Farinetti - Politecnico di Torino 86
  • 87. Statistics on links between datasets F. Corno, L. Farinetti - Politecnico di Torino 87
  • 88. Linked Data success stories  BBC Music  Integrates information from MusicBrainz and Wikipedia for artist/band infopages  Information also available in RDF (in addition to web pages)  3rd party applications built on top of the BBC data  BBC also contributes data back to the MusicBrainz  Nytimes  Maps its thesaurus of 1 million entity descriptions (people, organisations, places, etc) to Dbpedia and Freebase F. Corno, L. Farinetti - Politecnico di Torino 88
  • 89. Linked Data shopping list  List of sites/datasets that the “community” would like to see published as Linked Data  This list may form the basis for some campaign/action to encourage these data publishers to embrace Linked Data  http://linkeddata.org/linked-data-shopping-list F. Corno, L. Farinetti - Politecnico di Torino 89
  • 90. The Linked Data principles (“expectations of behavior”)  The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data  It is the unexpected re-use of information which is the value added by the web (Tim Berners-Lee) F. Corno, L. Farinetti - Politecnico di Torino 90
  • 91. The Linked Data principles (“expectations of behavior”)  Unambiguous identifiers for objects (resources)  Use URIs as names for things  Use the structure of the web  Use HTTP URIs so that people can look up the names  Make is easy to discover information about an object (resource)  When someone lookups a URI, provide useful information  Link the object (resource) to related objects  Include links to other URIs F. Corno, L. Farinetti - Politecnico di Torino 91
  • 92. Link to existing vocabularies  For describing classes (categories) and properties (relationships), try to re-use existing vocabularies  Easier to interoperate if we’re talking the same language!  Many vocabularies/ontologies out there  schema.org is a great place to start looking!  Vocabs for products (Good Relations), people (FOAF), social media (SIOC), places, events, businesses, e-commerce, music, etc., you name it…  If nothing relevant, you can create your own, but make sure you…  Publish it!  Reconcile (map) it to other vocabularies, if you can F. Corno, L. Farinetti - Politecnico di Torino 92
  • 93. Link to other datasets  Popular predicates for linking  owl:equivalentClass  foaf:homepage  owl:sameAs  foaf:topic  rdfs:seeAlso  foaf:based_near  skos:closeMatch  foaf:maker/foaf:made  skos:exactMatch  foaf:page  skos:related  foaf:primaryTopic  Example: http://dbpedia.org/resource/Canberra owl:sameAs http://rdf.freebase.com/rdf/en.canberra F. Corno, L. Farinetti - Politecnico di Torino 93
  • 94. Link to other Data Sets (Wikicompany is a free content licensed worldwide business directory that anyone can edit) (flickr wrappr extends DBpedia with RDF links to photos posted on flickr) F. Corno, L. Farinetti - Politecnico di Torino 94
  • 95. Linked Data – open issues  Schema diversity & proliferation  Quality of data is poor  No kind of consistency is guaranteed  Issues with reliability of data end-points  High down-time is not unusual  There is no Service Level Agreement provided  Querying of linked data is slow  Data is distributed on the web  Even single SPARQL endpoints can be slow  Most end-points are experimental/research projects with no resources for quality guarantees  Licensing issues  Majority of datasets carry no explicit open license
  • 96. Linked Data tools  Tools for Publishing Linked Data  D2R Server: a tool for publishing relational databases as Linked Data  Triplify: transforms relational data into RDF/LinkedData  Pubby: a Linked Data frontend for SPARQL endpoints  Tools for consuming Linked Data  Semantic Web Browsers and Client Libraries  Semantic Web Search Engines F. Corno, L. Farinetti - Politecnico di Torino 96
  • 97. Pubby  Many triple stores and other SPARQL endpoints can be accessed only by SPARQL client applications that use the SPARQL protocol  They cannot be accessed by the growing variety of Linked Data clients  Pubby is designed to provide a Linked Data interface to those RDF data sources  http://www4.wiwiss.fu-berlin.de/pubby/
  • 98. Pubby F. Corno, L. Farinetti - Politecnico di Torino 98
  • 99. Linked Data browsers – Marbles  http://marbles.sourceforge.net  XHTML views of RDF data (SPARQL endpoint), caching, predicate traversal F. Corno, L. Farinetti - Politecnico di Torino 99
  • 100. Linked Data browsers – RelFinder  http://relfinder.dbpedia.org  Explore & navigate relationships in an RDF graph
  • 101. Linked Data browsers – gFacet  http://semanticweb.org/wiki/GFacet  Graph based visualisation & faceted filtering of RDF data F. Corno, L. Farinetti - Politecnico di Torino 101
  • 102. Linked Data browsers – Forest  Front-end to FactForge and LinkedLifeData F. Corno, L. Farinetti - Politecnico di Torino 102
  • 103. FactForge and LinkedLifeData  FactForge  Integrates some of the most central LOD datasets  General-purpose information (not specific to a domain)  1.2B explicit plus 1B inferred statements  The largest upper-level knowledge base  http://www.FactForge.net/  LinkedLifeData  25 of the most popular life-science datasets  2.7B explicit and 1.4B inferred triples  http://www.LinkedLifeData.com F. Corno, L. Farinetti - Politecnico di Torino 103
  • 104. Linked Data browsers – Information Workbench  http://iwb.fluidops.com/resource/Help:Start F. Corno, L. Farinetti - Politecnico di Torino 104
  • 107. http://it.ckan.net/ F. Corno, L. Farinetti - Politecnico di Torino 107
  • 109. SPARQL query structure  A SPARQL query includes, in order  Prefix declarations, for abbreviating URIs  A result clause, identifying what information to return from the query  The query pattern, specifying what to query for in the underlying dataset  Query modifiers: slicing, ordering, and otherwise rearranging query results F. Corno, L. Farinetti - Politecnico di Torino 109
  • 110. SPARQL query structure  A SPARQL query includes, in order # prefix declarations PREFIX foo: <http://example.com/resources/> ... # result clause SELECT ... # query pattern WHERE { ... } # query modifiers ORDER BY ... F. Corno, L. Farinetti - Politecnico di Torino 110
  • 111. Dataset: Friend of a Friend (FOAF)  FOAF is a standard RDF vocabulary for describing people and relationships  Tim Berners-Lee's FOAF information available at http://www.w3.org/People/Berners-Lee/card @prefix card: <http://www.w3.org/People/Berners-Lee/card#> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . card:i foaf:name "Timothy Berners-Lee" . <http://bblfish.net/people/henry/card#me> foaf:name "Henry Story" . <http://www.cambridgesemantics.com/people/about/lee> foaf:name "Lee Feigenbaum" . card:amy foaf:name "Amy van der Hiel" . ... F. Corno, L. Farinetti - Politecnico di Torino 111
  • 112. Example 1 – simple triple pattern  In the graph http://www.w3.org/People/Berners-Lee/card, find all subjects (?person) and objects (?name) linked with the foaf:name predicate.  Then return all the values of ?name.  In other words, find all names mentioned in Tim Berners-Lee’s FOAF file PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name WHERE { ?person foaf:name ?name . }
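How a single triple pattern is matched can be pictured with a few lines of Python. This is only a toy sketch, not a SPARQL engine: the `match` and `select` helpers are hypothetical, and the three sample triples are invented data modelled on the FOAF excerpt, with prefixed names kept as plain strings.

```python
# Toy illustration of matching one triple pattern against a list of triples.
# Sample data is invented for illustration (assumption, not the real FOAF file).
triples = [
    ("card:i", "foaf:name", "Timothy Berners-Lee"),
    ("card:amy", "foaf:name", "Amy van der Hiel"),
    ("card:i", "foaf:knows", "card:amy"),
]

def match(pattern, triple):
    """Return the variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it occurs.
            if bindings.get(term, value) != value:
                return None
            bindings[term] = value
        elif term != value:
            # A constant term must be identical to the triple's term.
            return None
    return bindings

def select(variables, pattern, data):
    """Project the requested variables out of every matching solution."""
    solutions = [b for b in (match(pattern, t) for t in data) if b is not None]
    return [tuple(s[v] for v in variables) for s in solutions]

# Counterpart of: SELECT ?name WHERE { ?person foaf:name ?name . }
names = select(["?name"], ("?person", "foaf:name", "?name"), triples)
print(names)
```

The `foaf:knows` triple is skipped because its predicate differs from the constant in the pattern; ?person is matched but not projected.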
  • 113. SPARQL endpoints  Accept queries and return results via HTTP  Generic endpoints query any Web-accessible RDF data  Specific endpoints are hardwired to query against particular datasets  The results of SPARQL queries can be returned in a variety of formats:  XML, JSON, RDF, HTML  JSON (JavaScript Object Notation): lightweight computer data interchange format; text-based, human-readable format for representing simple data structures and associative arrays
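A rough sketch of what "queries and results via HTTP" looks like from a client, using only the Python standard library. The endpoint URL is a placeholder and the JSON document is a hand-written sample in the SPARQL JSON results layout; both are assumptions for illustration, and no network request is made.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

def build_request_url(endpoint, query):
    """Encode a query for an HTTP GET request (the query travels in the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})

def rows_from_json(document):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in data["results"]["bindings"]
    ]

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint (assumption)
QUERY = "SELECT ?name WHERE { ?person ?p ?name } LIMIT 5"
url = build_request_url(ENDPOINT, QUERY)

# Hand-written sample of a JSON result set (assumption); a client would normally
# ask for this format with the header: Accept: application/sparql-results+json
SAMPLE = (
    '{"head": {"vars": ["name"]},'
    ' "results": {"bindings": [{"name": {"type": "literal", "value": "Henry Story"}}]}}'
)
rows = rows_from_json(SAMPLE)
print(url)
print(rows)
```

Variables that are unbound in a solution are simply absent from that binding object, which is why the helper checks `v in binding`.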
  • 114. SPARQL endpoints  This query is for an arbitrary bit of RDF data (Tim Berners-Lee's FOAF file)  => generic endpoint to run it  Possible choices  SPARQLer - General purpose processor - sparql.org  http://sparql.org/sparql.html  OpenLink's Virtuoso (Make sure to choose "Retrieve remote RDF data for all missing source graphs")  http://bbc.openlinksw.com/sparql/  Redland’s Rasqal  http://librdf.org/rasqal/ F. Corno, L. Farinetti - Politecnico di Torino 114
  • 115. SPARQLer SPARQL query Dataset F. Corno, L. Farinetti - Politecnico di Torino 115
  • 116. OpenLink’s Virtuoso Dataset SPARQL query F. Corno, L. Farinetti - Politecnico di Torino 116
  • 117. Example 1 - simple triple pattern PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name WHERE { ?person foaf:name ?name . } F. Corno, L. Farinetti - Politecnico di Torino 117
  • 118. Example 2 – multiple triple pattern  Find all people in Tim Berners-Lee’s FOAF file that have names and email addresses  Return each person’s URI, name, and email address  Multiple triple patterns retrieve multiple properties about a particular resource  SELECT * selects all variables mentioned in the query PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT * WHERE { ?person foaf:name ?name . ?person foaf:mbox ?email . } F. Corno, L. Farinetti - Politecnico di Torino 118
  • 119. Example 2 - multiple triple pattern F. Corno, L. Farinetti - Politecnico di Torino 119
  • 120. Example 3 – traversing a graph  Find the homepage of anyone known by Tim Berners-Lee F. Corno, L. Farinetti - Politecnico di Torino 120
  • 121. Example 3 – traversing a graph PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX card: <http://www.w3.org/People/Berners-Lee/card#> SELECT ?homepage FROM <http://www.w3.org/People/Berners-Lee/card> WHERE { card:i foaf:knows ?known . ?known foaf:homepage ?homepage . }  The FROM keyword specifies the target graph in the query  By using ?known as an object of one triple and the subject of another, it is possible to traverse multiple links in the graph F. Corno, L. Farinetti - Politecnico di Torino 121
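The effect of sharing ?known between two triple patterns is a join, which can be sketched in plain Python. The triples below are invented sample data (assumption), with prefixed names kept as strings.

```python
# Invented sample graph (assumption): card:i knows two people, only one has a homepage.
triples = [
    ("card:i", "foaf:knows", "ex:alice"),
    ("card:i", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:homepage", "http://alice.example/"),
    ("ex:carol", "foaf:homepage", "http://carol.example/"),
]

# First pattern: card:i foaf:knows ?known
known = [o for (s, p, o) in triples if s == "card:i" and p == "foaf:knows"]

# Second pattern: ?known foaf:homepage ?homepage, joined on the shared variable ?known
homepages = [
    o
    for k in known
    for (s, p, o) in triples
    if s == k and p == "foaf:homepage"
]
print(homepages)
```

ex:bob is dropped because no homepage triple exists for him, and ex:carol is dropped because she is not reachable through foaf:knows.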
  • 122. Dataset: DBPedia  DBPedia is an RDF version of information from Wikipedia  Contains data derived from Wikipedia’s infoboxes, category hierarchy, article abstracts, and various external links  Contains over 100 million triples  Dataset: http://dbpedia.org/sparql/ F. Corno, L. Farinetti - Politecnico di Torino 122
  • 123. Example 4 – exploring DBPedia  Find 15 example concepts in the DBPedia dataset SELECT DISTINCT ?concept WHERE { ?s a ?concept . } LIMIT 15 F. Corno, L. Farinetti - Politecnico di Torino 123
  • 124. Example 4 – exploring DBPedia  LIMIT is a solution modifier that limits the number of rows returned from a query  SPARQL has two other solution modifiers  ORDER BY for sorting query solutions on the value of one or more variables  OFFSET, used in conjunction with LIMIT and ORDER BY to take a slice of a sorted solution set (e.g. for paging)  The SPARQL keyword a is a shortcut for the common predicate rdf:type (class of a resource)  The DISTINCT modifier eliminates duplicate rows from the query results F. Corno, L. Farinetti - Politecnico di Torino 124
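Paging with ORDER BY, LIMIT and OFFSET can be sketched as a small query builder. The helper name `paged_query` and its parameters are hypothetical; the base query is the one from Example 4.

```python
def paged_query(base_query, order_by, page, page_size):
    """Append solution modifiers so that the query returns one page of sorted results."""
    offset = page * page_size  # pages are numbered from 0
    return f"{base_query}\nORDER BY {order_by}\nLIMIT {page_size}\nOFFSET {offset}"

BASE = "SELECT DISTINCT ?concept\nWHERE { ?s a ?concept . }"

first = paged_query(BASE, "?concept", 0, 15)   # rows 1-15
third = paged_query(BASE, "?concept", 2, 15)   # rows 31-45
print(third)
```

Without ORDER BY the order of solutions is not guaranteed, so consecutive pages could overlap or skip rows.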
  • 125. Example 5 – basic SPARQL filters  Find all landlocked countries with a population greater than 15 million PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX type: <http://dbpedia.org/class/yago/> PREFIX prop: <http://dbpedia.org/property/> SELECT ?country_name ?population WHERE { ?country a type:LandlockedCountries ; rdfs:label ?country_name ; prop:populationEstimate ?population . FILTER (?population > 15000000) . }  FILTER constraints use boolean conditions to filter out unwanted query results  A semicolon (;) can be used to separate two triple patterns that share the same subject F. Corno, L. Farinetti - Politecnico di Torino 125
  • 126. SPARQL filters  Conditions on literal values  Syntax FILTER expression  Examples FILTER (?age > 30) FILTER isIRI(?x) FILTER (!BOUND(?y))
  • 127. SPARQL filters  BOUND(var)  true if var is bound in query answer  false, otherwise  !BOUND(var) enables negation-as-failure  Testing types  isIRI(A): A is an “Internationalized Resource Identifier”  isBLANK(A): A is a blank node  isLITERAL(A): A is a literal F. Corno, L. Farinetti - Politecnico di Torino 127
  • 128. SPARQL filters  Comparison between RDF terms: A = B, A != B  Comparison between numeric and date types: A = B, A != B, A <= B, A >= B, A < B, A > B  Boolean AND/OR: A && B, A || B  Basic arithmetic: A + B, A - B, A * B, A / B
  • 129. Example 5 – basic SPARQL filters  Note all the translated duplicates in the results  How can we deal with that? F. Corno, L. Farinetti - Politecnico di Torino 129
  • 130. Example 6 – SPARQL filters  Find all landlocked countries with a population greater than 15 million (revisited), with the highest population country first PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX type: <http://dbpedia.org/class/yago/> PREFIX prop: <http://dbpedia.org/property/> SELECT ?country_name ?population WHERE { ?country a type:LandlockedCountries ; rdfs:label ?country_name ; prop:populationEstimate ?population . FILTER (?population > 15000000 && langMatches(lang(?country_name), "EN")) . } ORDER BY DESC(?population)
  • 131. Example 6 – SPARQL filters  lang extracts a literal’s language tag, if any  langMatches matches a language tag against a language range F. Corno, L. Farinetti - Politecnico di Torino 131
  • 132. Dataset: Jamendo  Jamendo is a community collection of music all freely licensed under Creative Commons licenses  http://www.jamendo.com/it/  DBTune.org hosts a queryable RDF version of information about Jamendo's music collection  Data on thousands of artists, tens of thousands of albums, and nearly 100,000 tracks  http://dbtune.org/jamendo/store/ F. Corno, L. Farinetti - Politecnico di Torino 132
  • 133. Example 7 – the wrong way  Find all Jamendo artists along with their image, home page, and the location they’re near PREFIX mo: <http://purl.org/ontology/mo/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?img ?hp ?loc WHERE { ?a a mo:MusicArtist ; foaf:name ?name ; foaf:img ?img ; foaf:homepage ?hp ; foaf:based_near ?loc . } F. Corno, L. Farinetti - Politecnico di Torino 133
  • 134. Example 7 – DBTune SPARQL endpoint http://dbtune.org/jamendo/store/  Jamendo has information on about 3,500 artists  Trying the query we only get 2,667 results. What's wrong? F. Corno, L. Farinetti - Politecnico di Torino 134
  • 135. Example 7 – the right way  Not every artist has an image, homepage, or location!  OPTIONAL tries to match a graph pattern, but doesn't fail the whole query if the optional match fails  If an OPTIONAL pattern fails to match for a particular solution, any variables in that pattern remain unbound (no value) for that solution PREFIX mo: <http://purl.org/ontology/mo/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?img ?hp ?loc WHERE { ?a a mo:MusicArtist ; foaf:name ?name . OPTIONAL { ?a foaf:img ?img } OPTIONAL { ?a foaf:homepage ?hp } OPTIONAL { ?a foaf:based_near ?loc } } F. Corno, L. Farinetti - Politecnico di Torino 135
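The difference between the two versions of Example 7 is essentially an inner join versus a left join. A minimal sketch in Python, assuming each property has at most one value per artist and using two invented artists (assumption); the helper names are hypothetical.

```python
# Invented sample data (assumption): the second artist has no image.
artists = {
    "ex:a1": {"foaf:name": "Artist One", "foaf:img": "http://img.example/1.png"},
    "ex:a2": {"foaf:name": "Artist Two"},
}

def required(data, props):
    """Every property is mandatory: a resource missing any of them yields no row."""
    return [
        tuple(d[p] for p in props)
        for d in data.values()
        if all(p in d for p in props)
    ]

def with_optional(data, must, optional):
    """Mandatory properties must match; optional ones stay unbound (None) when missing."""
    return [
        tuple(d[p] for p in must) + tuple(d.get(p) for p in optional)
        for d in data.values()
        if all(p in d for p in must)
    ]

strict = required(artists, ["foaf:name", "foaf:img"])             # "the wrong way"
lenient = with_optional(artists, ["foaf:name"], ["foaf:img"])     # "the right way"
print(strict)
print(lenient)
```

The strict version silently loses Artist Two, which mirrors the missing rows observed on the DBTune endpoint.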
  • 136. Dataset: GovTrack  GovTrack provides SPARQL access to data on the U.S. Congress  Contains over 13,000,000 triples about legislators, bills, and votes  http://www.govtrack.us/ F. Corno, L. Farinetti - Politecnico di Torino 136
  • 137. Example 8 – querying alternatives  Find Senate bills that either John McCain or Barack Obama sponsored and the other cosponsored PREFIX bill: <http://www.rdfabout.com/rdf/schema/usbill/> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?title ?sponsor ?status WHERE { { ?bill bill:sponsor ?mccain ; bill:cosponsor ?obama . } UNION { ?bill bill:sponsor ?obama ; bill:cosponsor ?mccain . } ?bill a bill:SenateBill ; bill:status ?status ; bill:sponsor ?sponsor ; dc:title ?title . ?obama foaf:name "Barack Obama" . ?mccain foaf:name "John McCain" . } F. Corno, L. Farinetti - Politecnico di Torino 137
  • 138. Example 8 – GovTrack specific endpoint http://www.govtrack.us/developers/rdf.xpd  The UNION keyword forms a disjunction of two graph patterns: a solution that matches either side of the UNION is included in the results
  • 139. RDF datasets  All queries so far have been against a single graph  In SPARQL this is known as the default graph  RDF datasets are composed of a single default graph and zero or more named graphs, identified by a URI  Named graphs can be specified with one or more FROM NAMED clauses, or they can be hardwired into a particular SPARQL endpoint  The SPARQL GRAPH keyword allows portions of a query to match against the named graphs in the RDF dataset  Anything outside a GRAPH clause matches against the default graph F. Corno, L. Farinetti - Politecnico di Torino 139
  • 140. Dataset: semanticweb.org  data.semanticweb.org hosts RDF data regarding workshops, schedules, and presenters for the International Semantic Web Conference (ISWC) and European Semantic Web Conference (ESWC) series of events  Presents data via FOAF, SWRC, and iCal ontologies  The data for each individual ISWC or ESWC event is stored in its own named graph  i.e., there is one named graph per conference event contained in this dataset  http://data.semanticweb.org/
  • 141. Example 9 – querying named graphs  Find people who have been involved with at least three ISWC or ESWC conference events PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT DISTINCT ?person WHERE { GRAPH ?g1 { ?person a foaf:Person } GRAPH ?g2 { ?person a foaf:Person } GRAPH ?g3 { ?person a foaf:Person } FILTER(?g1 != ?g2 && ?g1 != ?g3 && ?g2 != ?g3) . } F. Corno, L. Farinetti - Politecnico di Torino 141
  • 142. Example 9 – querying named graphs  The GRAPH ?g construct allows a pattern to match against one of the named graphs in the RDF dataset  The URI of the matching graph is bound to ?g (or whatever variable was actually used)  The FILTER ensures that we’re finding a person who occurs in three distinct graphs  The Web interface used for this SPARQL query already defines the foaf: prefix, so the explicit PREFIX declaration can be omitted there
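The three GRAPH clauses plus the FILTER amount to asking for people who appear in three pairwise-distinct named graphs. A toy Python sketch, with graph names and people invented for illustration (assumption):

```python
from itertools import permutations

# Invented dataset (assumption): named graph -> set of resources typed foaf:Person in it.
named_graphs = {
    "ex:iswc2006": {"ex:ann", "ex:bob"},
    "ex:eswc2007": {"ex:ann", "ex:bob"},
    "ex:iswc2007": {"ex:ann"},
}

people = set()
# permutations() only yields tuples of different keys, which plays the role of
# FILTER(?g1 != ?g2 && ?g1 != ?g3 && ?g2 != ?g3).
for g1, g2, g3 in permutations(named_graphs, 3):
    # ?person must be present in all three graphs at once.
    people |= named_graphs[g1] & named_graphs[g2] & named_graphs[g3]

print(people)
```

Using a set for `people` corresponds to SELECT DISTINCT: the same person is found once per ordering of the three graphs, but reported only once.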
  • 143. Data.semanticweb.org specific SPARQL endpoint http://data.semanticweb.org/snorql/ F. Corno, L. Farinetti - Politecnico di Torino 143
  • 144. SPARQL 1.1 extensions  Projected expressions  Adds the ability for query results to contain values derived from constants, function calls, or other expressions in the SELECT list  Aggregates  Adds the ability to group results and calculate aggregate values (e.g. count, min, max, avg, sum, …)  Sub-queries  Allows one query to be nested within another
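What an aggregate computes can be pictured in plain Python. This is a rough sketch only: the (author, paper) rows are invented sample bindings and the `ex:author` property in the comment is hypothetical.

```python
from collections import Counter

# Invented solution rows (assumption) for the pattern { ?paper ex:author ?author }.
rows = [
    ("ex:ann", "ex:paper1"),
    ("ex:ann", "ex:paper2"),
    ("ex:bob", "ex:paper3"),
]

# Roughly what a SPARQL 1.1 query of this shape would compute:
#   SELECT ?author (COUNT(?paper) AS ?n)
#   WHERE { ?paper ex:author ?author }
#   GROUP BY ?author
counts = Counter(author for author, _paper in rows)
print(sorted(counts.items()))
```

The `(COUNT(?paper) AS ?n)` form in the comment is also an instance of a projected expression.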
  • 145. SPARQL 1.1 extensions  Negation  Includes improved language syntax for querying negations  Property paths  Adds the ability to query arbitrary-length paths through a graph via a regular-expression-like syntax known as property paths  Basic federated query  Defines a mechanism for splitting a single query among multiple SPARQL endpoints and combining together the results from each
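An arbitrary-length path such as `foaf:knows+` (one or more steps) behaves like a reachability search. A minimal Python sketch over an invented "knows" graph (assumption); the helper `reachable` is hypothetical.

```python
def reachable(edges, start):
    """Nodes reachable from start by following one or more edges (like knows+)."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:      # visit each node once, so cycles terminate
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Invented sample graph (assumption): a -> b -> c -> a, plus an isolated pair d -> e.
knows = {"ex:a": ["ex:b"], "ex:b": ["ex:c"], "ex:c": ["ex:a"], "ex:d": ["ex:e"]}

# Counterpart of the pattern: ex:a foaf:knows+ ?x
print(sorted(reachable(knows, "ex:a")))
```

Because the sample graph contains a cycle, ex:a is reachable from itself in three steps and therefore appears in its own result.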
  • 147. Exercises - RDF @prefix : <http://example.org/data#> . @prefix ont: <http://example.org/myOntology#> . @prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> . :john vcard:FN "John Smith" ; vcard:N [ vcard:Given "John" ; vcard:Family "Smith" ] ; ont:hasAge 32 ; ont:marriedTo :mary . :mary vcard:FN "Mary Smith" ; vcard:N [ vcard:Given "Mary" ; vcard:Family "Smith" ] ; ont:hasAge 29 . F. Corno, L. Farinetti - Politecnico di Torino 147
  • 148. SPARQL query – exercise 1  Return the full names of all people in the graph PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?fullName WHERE {?x vCard:FN ?fullName}  Result fullName =============== "John Smith" "Mary Smith"
  • 149. SPARQL query – exercise 2  Return the relation between John and Mary PREFIX : <http://example.org/data#> SELECT ?p WHERE { :john ?p :mary }  Result p ================= <http://example.org/myOntology#marriedTo> F. Corno, L. Farinetti - Politecnico di Torino 149
  • 150. SPARQL query – exercise 3  Return the spouse of a person whose name is John Smith PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> PREFIX ont: <http://example.org/myOntology#> SELECT ?y WHERE {?x vCard:FN "John Smith". ?x ont:marriedTo ?y}  Result y ================= <http://example.org/data#mary> F. Corno, L. Farinetti - Politecnico di Torino 150
  • 151. SPARQL query – exercise 4  Return the name and the first name of all people in the knowledge base PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?name ?firstName WHERE {?x vCard:N ?name . ?name vCard:Given ?firstName}  Result name firstName ======================== _:b0 "John" _:b1 "Mary"  ?name is bound to the blank node that holds the structured name; blank node labels such as _:b0 are assigned by the query processor
  • 152. SPARQL query – exercise 5  Return all people over 30 in the knowledge base PREFIX ont: <http://example.org/myOntology#> SELECT ?x WHERE {?x ont:hasAge ?age . FILTER(?age > 30)}  Result x ================= <http://example.org/data#john> F. Corno, L. Farinetti - Politecnico di Torino 152
  • 153. FROM  Select RDF graph (= dataset) to be queried  In case of multiple FROM clauses, graphs are merged  Example PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name FROM <http://example.org/foaf/aliceFoaf> WHERE { ?x foaf:name ?name } F. Corno, L. Farinetti - Politecnico di Torino 153
  • 154. SPARQL query – exercise 6  Graph http://example.org/bob @prefix foaf: <http://xmlns.com/foaf/0.1/> . _:a foaf:name "Bob" . _:a foaf:mbox <mailto:bob@oldcorp.example.org> .  Graph http://example.org/alice @prefix foaf: <http://xmlns.com/foaf/0.1/> . _:a foaf:name "Alice" . _:a foaf:mbox <mailto:alice@work.example> . F. Corno, L. Farinetti - Politecnico di Torino 154
  • 155. SPARQL query – exercise 6  Return the names of people in both graphs PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?src ?name FROM NAMED <http://example.org/alice> FROM NAMED <http://example.org/bob> WHERE { GRAPH ?src { ?x foaf:name ?name } }  Result src name ======================================= <http://example.org/bob> "Bob" <http://example.org/alice> "Alice" F. Corno, L. Farinetti - Politecnico di Torino 155
  • 156. References  W3C, “Introduction to the Semantic Web”  http://www.w3.org/2006/Talks/0524-Edinburgh-IH/  Lee Feigenbaum, “SPARQL By Example”  http://www.cambridgesemantics.com/2008/09/sparql-by-example  Valentina Tamma, “Chapter 4: SPARQL”  http://www.csc.liv.ac.uk/~valli/Comp318/PDF/SPARQL.pdf  Tom Heath, “An Introduction to Linked Data”  http://tomheath.com/slides/2009-02-austin-linkeddata-tutorial.pdf  HTML Microdata  http://www.w3.org/TR/microdata  Schema.org  http://schema.org F. Corno, L. Farinetti - Politecnico di Torino 156
  • 157. License This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.