Data Exchange over RDF

       Andrés Letelier
   Advisor: Marcelo Arenas

Pontificia Universidad Católica de Chile


       September 1, 2011
What is data exchange?




   Problem
   Data under one schema S needs to be restructured and translated
   into a target schema T


                              S −→ T
                              I_S −→ I_T
Schema mappings



  Question
  Which source instances correspond to which target instances?

  Answer
  Schema mappings:

                 M ⊆ Instances(S) × Instances(T)

  Usually, schema mappings are defined as M = (S, T, Σ_ST)
Definition (Solution)
I2 is a solution of I1 under M iff (I1, I2) ∈ M
The set of all solutions for I1 under M is denoted by Sol_M(I1)
Resource Description Framework (RDF)



      Data model for representing information about World Wide
      Web resources
      W3C Recommendation (1999)
      Part of the semantic web stack
      Directed, labeled graphs
      Blank nodes (labeled nulls)
      Basically, sets of triples (s, p, o)
Example
 D=   {
          (B1   name    paul)
          (B1   email   paul@example.edu)
          (B2   name    john)
          (B2   city    Liverpool)
                                            }
SPARQL (pronounced “sparkle”)


      Query language for RDF
      W3C Recommendation (2008)
      Standard for querying RDF datasets
      Returns sets of partial mappings
      Operators:
          Projection
          AND (inner join)
          OPT (left join)
          FILTER
          UNION
          and more
Example

          P1 = (?X, name, ?Y)

                        ?X    ?Y
          ⟦P1⟧_D   =    B1    paul
                        B2    john
Example

          P2 = (?X, name, ?Y) AND (?X, email, ?Z)

                        ?X    ?Y      ?Z
          ⟦P2⟧_D   =    B1    paul    paul@example.edu
Example

          P3 = (?X, name, ?Y) OPT (?X, email, ?Z)

                        ?X    ?Y      ?Z
          ⟦P3⟧_D   =    B1    paul    paul@example.edu
                        B2    john
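
The difference between AND and OPT can be sketched as an inner join versus a left outer join over sets of mappings. This is an illustrative sketch, not code from the talk: mappings are plain dicts, and the names `compatible`, `join`, and `left_join` are hypothetical.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)

def join(omega1, omega2):
    """AND: merge every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]

def left_join(omega1, omega2):
    """OPT: extend each left mapping when possible, otherwise keep it unchanged."""
    out = []
    for m1 in omega1:
        extended = [{**m1, **m2} for m2 in omega2 if compatible(m1, m2)]
        out.extend(extended if extended else [m1])
    return out

# Evaluations of (?X, name, ?Y) and (?X, email, ?Z) over D, written out by hand.
names = [{"?X": "B1", "?Y": "paul"}, {"?X": "B2", "?Y": "john"}]
emails = [{"?X": "B1", "?Z": "paul@example.edu"}]

p2 = join(names, emails)       # only B1 has both a name and an email
p3 = left_join(names, emails)  # B2 is kept, with ?Z left unbound
```

The unbound ?Z in the second row of P3 is what makes the result a set of partial mappings rather than a table with nulls.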
Well-designed SPARQL patterns


   Definition (Well-designed patterns)
   A pattern P is well-designed if for every subpattern P′ of the form
   (P1 OPT P2), every variable that appears in P2 and outside P′ also
   appears in P1.

   Example
       (?X, name, ?Y ) OPT ((?X, email, ?Z) OPT (?X, city, ?A))
       is well-designed
       (?X, name, ?Y ) OPT ((?W, email, ?Z) OPT (?X, city, ?A))
       is not
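
A rough sketch of how the well-designedness test could be checked mechanically for patterns built from AND and OPT only. The nested-tuple encoding and the function names are assumptions for illustration: a triple pattern is a tuple of three strings, and a compound pattern is ("AND", left, right) or ("OPT", left, right).

```python
def is_triple(p):
    # In a compound pattern the second component is itself a tuple.
    return isinstance(p[1], str)

def variables(p):
    """All variables mentioned anywhere in pattern p."""
    if is_triple(p):
        return {t for t in p if t.startswith("?")}
    return variables(p[1]) | variables(p[2])

def well_designed(p, outside=frozenset()):
    """`outside` holds the variables that occur in the whole pattern but outside p."""
    if is_triple(p):
        return True
    op, left, right = p
    if op == "OPT":
        # Variables of the optional side that also occur outside must occur in left.
        leaked = (variables(right) & outside) - variables(left)
        if leaked:
            return False
    return (well_designed(left, outside | variables(right))
            and well_designed(right, outside | variables(left)))

good = ("OPT", ("?X", "name", "?Y"),
        ("OPT", ("?X", "email", "?Z"), ("?X", "city", "?A")))
bad = ("OPT", ("?X", "name", "?Y"),
       ("OPT", ("?W", "email", "?Z"), ("?X", "city", "?A")))
```

In `bad`, the variable ?X appears in the innermost optional part and in the outer name triple, but not in (?W, email, ?Z), which is exactly the violation the definition rules out.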
Data Exchange over RDF




      S and T are fixed to be RDF triples
      Tuple generating dependencies have to be redefined
      But first, we need some definitions...
RDF Tuple Generating Dependencies



   Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and
   Ω1 and Ω2 be sets of mappings. Then:
       var(P ) are the variables mentioned in P
       dom(µ1 ) is the domain of µ1
       A SPARQL SELECT query (denoted by (W, P ), where
       W ⊆ var(P )) is the projection of the evaluation of P onto
       the variables in W
RDF Tuple Generating Dependencies



   Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and
   Ω1 and Ω2 be sets of mappings. Then:
       µ1 is subsumed by µ2 (µ1 ⊑ µ2) if dom(µ1) ⊆ dom(µ2), for
       every ?X in dom(µ1) that is not bound to a blank node we
       have that µ1(?X) = µ2(?X), and for every pair of variables
       ?X and ?Y in dom(µ1) such that µ1(?X) = µ1(?Y) it is the
       case that µ2(?X) = µ2(?Y).
       Ω1 is subsumed by Ω2 (Ω1 ⊑ Ω2) if for every mapping µ1 in
       Ω1 there exists a mapping µ2 in Ω2 such that µ1 ⊑ µ2.
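
The two subsumption tests translate almost line by line into code. The sketch below assumes mappings are dicts and that blank nodes are written as strings starting with "_:"; both conventions, and the function names, are illustrative choices rather than part of the talk.

```python
def is_blank(value):
    # Assumed convention: blank nodes are labelled "_:something".
    return value.startswith("_:")

def subsumed(mu1, mu2):
    """True when mu1 ⊑ mu2 under the three conditions of the definition."""
    # 1. dom(mu1) is contained in dom(mu2).
    if not set(mu1) <= set(mu2):
        return False
    # 2. Non-blank values must be preserved exactly.
    for var, value in mu1.items():
        if not is_blank(value) and mu2[var] != value:
            return False
    # 3. Variables that share a value in mu1 must share a value in mu2.
    for x in mu1:
        for y in mu1:
            if mu1[x] == mu1[y] and mu2[x] != mu2[y]:
                return False
    return True

def set_subsumed(omega1, omega2):
    """True when every mapping of omega1 is subsumed by some mapping of omega2."""
    return all(any(subsumed(m1, m2) for m2 in omega2) for m1 in omega1)
```

For instance, {?X → 1} is subsumed by {?X → 1, ?Y → 2}, and a mapping that binds ?X to a blank node is subsumed by one that binds ?X to any concrete value.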
RDF Tuple Generating Dependencies



   (Re)Definition (Tuple Generating Dependencies)
   Let P1 and P2 be SPARQL patterns, and W ⊂ var(P1 ) ∩ var(P2 ).
   An RDF tgd is a sentence of the form

                            (W, P1 ) → (W, P2 )

   Given two RDF graphs G1 and G2 , and a set of tgds Σ,
   (G1 , G2 ) |= Σ if for every tgd (W, P1 ) → (W, P2 ) in Σ it is the
   case that ⟦(W, P1)⟧_G1 ⊑ ⟦(W, P2)⟧_G2
RDF Schema Mappings




  Since S and T are fixed,

                             M=Σ


                G2 ∈ Sol_M(G1) ←→ (G1, G2) |= Σ
Universal solutions

   Example
   Let W = {?Y, ?Z}, Σ =
   {(W, (?X, name, ?Y) AND (?X, email, ?Z)) →
   (W, (?Y, hasmail, ?Z))}
   and consider the dataset D:

   Solution 1
    G2 =     {
                 (paul   hasmail   paul@example.edu)
                                                       }

   Solution 2
    G2 =     {
                 (paul   hasmail   paul@example.edu)
                 (john   hasmail   _:n)
                                                       }
Universal solutions




   Definition
   A solution G2 is universal if for every other solution G2′, G2 ⊑ G2′

       Solution 1 is universal
       Solution 2 is not
Universal solutions




   Not all settings have universal solutions:
   Consider G1 = {(1, 2, 3)}, W = {?X, ?Y } and

             Σ = {(W, (?X, ?Y, ?Z)) →
                   (W, ((?X, a, b) OPT (?W, b, ?Y ))
                    AND ((?X, c, d) OPT (?Z, d, ?Y )))}
Solution 1
 G2 =    {
             (1     a   b)
             (_:n1  b   2)
             (1     c   d)
                             }

Solution 2
 G2 =    {
             (1     a   b)
             (_:n2  d   2)
             (1     c   d)
                             }
This setting has no universal solution!
Good and bad news



  Bad news
  There is no guarantee that an exchange setting that has a solution
  will have a universal solution

  Good news
  If the heads of all tgds in Σ are well-designed and there is a
  solution, there is always a universal solution

  Better news
  We have an algorithm
“Chasing” SPARQL queries

 input   A mapping µ and a (well-designed) SPARQL pattern P
 output  An RDF graph G such that µ ∈ ⟦P⟧_G

  Chase(µ, ν, P, G)
      case P = t (a triple pattern):
          add unbound variables in t as fresh blank nodes to ν
          add ν(t) to G
      case P = P1 AND P2:
          Chase(µ, ν, P1, G)
          Chase(µ, ν, P2, G)
      case P = P1 OPT P2:
          Chase(µ, ν, P1, G)
          if (dom(µ) \ dom(ν)) ∩ var(P2) ≠ ∅: Chase(µ, ν, P2, G)
After chasing:




       µ ⊑ ν
       ν ∈ ⟦P⟧_G
       {µ} ⊑ ⟦P⟧_G
       If we chase every P2 in Heads(Σ) with each mapping in the
       evaluation ⟦(W, P1)⟧_G1, we get a universal solution.
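
A hedged sketch of the chase procedure in Python. Several details are assumptions, since the slide does not spell them out: ν starts empty, a variable bound by µ is copied into ν with µ's value, any other variable gets a fresh blank node, and the OPT test is read as (dom(µ) \ dom(ν)) ∩ var(P2) ≠ ∅. The pattern encoding, the "_:b" label scheme, and all function names are hypothetical.

```python
import itertools

_fresh = itertools.count(1)  # hypothetical source of blank-node labels

def is_triple(p):
    # Triple patterns are 3-tuples of strings; compound patterns are
    # ("AND" | "OPT", left, right) with tuple-valued children.
    return isinstance(p[1], str)

def variables(p):
    if is_triple(p):
        return {t for t in p if t.startswith("?")}
    return variables(p[1]) | variables(p[2])

def chase(mu, nu, pattern, graph):
    """Extend nu and graph in place so that nu matches pattern in graph."""
    if is_triple(pattern):
        for term in pattern:
            if term.startswith("?") and term not in nu:
                # Assumption: reuse mu's value when it has one, else a fresh blank.
                nu[term] = mu[term] if term in mu else f"_:b{next(_fresh)}"
        graph.add(tuple(nu.get(term, term) for term in pattern))
        return
    op, left, right = pattern
    chase(mu, nu, left, graph)
    # AND always chases the right side; OPT does so only when mu still has
    # variables missing from nu that occur in the right side.
    if op == "AND" or (set(mu) - set(nu)) & variables(right):
        chase(mu, nu, right, graph)

head = ("OPT", ("?X", "1", "2"), ("?X", "?Y", "3"))

g_small, nu_small = set(), {}
chase({"?X": "1"}, nu_small, head, g_small)

g_large, nu_large = set(), {}
chase({"?X": "1", "?Y": "2"}, nu_large, head, g_large)
```

With µ = {?X → 1} the optional part is skipped and only the triple (1 1 2) is produced; with µ = {?X → 1, ?Y → 2} the optional triple (1 2 3) is materialised as well, because ?Y still has to be bound.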
Certain answers



   Definition (Certain answers on a regular data exchange setting)
   The set of certain answers is the intersection of the evaluation of
   the query over all the valid solutions

   Example
   Consider G1 = {(1, 2, 3)} and

              {({?X},(?X, ?Y, ?Z)) →
                      ({?X}, (?X, 1, 2) OPT (?X, ?Y, 3))}
Solution 1
 G2 =   {
             (1   1   2)
                           }
 ⟦(W, P2)⟧_G2 = {{?X → 1}}

Solution 2
 G2 =   {
             (1   1   2)
             (1   2   3)
                           }
 ⟦(W, P2)⟧_G2 = {{?X → 1, ?Y → 2}}

  The intersection of the two evaluations is empty!
Certain answers


   Given a pattern P and a set of RDF graphs 𝒢, let Lower(P, 𝒢) be
   the set of all lower bounds, w.r.t. subsumption, of the evaluations
   ⟦P⟧_G for G in 𝒢.
   (Re)Definition (Certain Answers)
   The set of certain answers of a set of RDF graphs 𝒢 and a SPARQL
   pattern P is defined as any set of mappings Ω in Lower(P, 𝒢) such that
   for any other Ω′ in Lower(P, 𝒢) it is the case that Ω′ ⊑ Ω.

   Claim
   All the possible sets of certain answers to an RDF data exchange
   setting are homomorphically equivalent.
Back in our previous example...



 Solution 1
  G2 =   {
              (1   1   2)
                            }
  ⟦(W, P2)⟧_G2 = {{?X → 1}}

 Solution 2
  G2 =   {
              (1   1   2)
              (1   2   3)
                            }
  ⟦(W, P2)⟧_G2 = {{?X → 1, ?Y → 2}}

   The set of certain answers is now {{?X → 1}}
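
The contrast between the two definitions can be checked on this example with a few lines of code. The sketch is illustrative only: it uses a simplified subsumption test that is valid because no blank nodes occur here, and the helper names are hypothetical.

```python
def subsumed(mu1, mu2):
    """mu1 ⊑ mu2, simplified for mappings that contain no blank nodes."""
    return all(var in mu2 and mu2[var] == val for var, val in mu1.items())

def is_lower_bound(omega, evaluations):
    """True when omega is subsumed by every evaluation in the list."""
    return all(
        all(any(subsumed(m, m2) for m2 in ev) for m in omega)
        for ev in evaluations
    )

def as_set(evaluation):
    """Hashable view of an evaluation, used for plain set intersection."""
    return {frozenset(m.items()) for m in evaluation}

eval_sol1 = [{"?X": "1"}]
eval_sol2 = [{"?X": "1", "?Y": "2"}]

plain_intersection = as_set(eval_sol1) & as_set(eval_sol2)
candidate_ok = is_lower_bound([{"?X": "1"}], [eval_sol1, eval_sol2])
too_large = is_lower_bound(eval_sol2, [eval_sol1, eval_sol2])
```

The plain intersection is empty because the two mappings differ as sets of bindings, while {{?X → 1}} is a lower bound of both evaluations and {{?X → 1, ?Y → 2}} is not.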
In conclusion...




   Our contributions so far:
       RDF and SPARQL TGDs
       RDF Schema mappings
       Universal solutions
       Materialization of universal solutions
       Certain answers
In conclusion...




   To do:
       Prove remaining claims
       Query answering (using universal solutions)
       Incomplete information in the source instance
       Knowledge exchange over RDFs
Thank you for listening




   Any questions?
