Exchanging more than Complete Data

                                                Marcelo Arenas
                                                     PUC Chile


                Joint work with Jorge Pérez (U. de Chile) and Juan Reutter (U. Edinburgh)



M. Arenas   –    Exchanging more than Complete Data - RR2011                                1 / 68
Outline: First part




            ◮   The data exchange problem
                   ◮   Some fundamental results in relational data exchange


            ◮   The need for a more general data exchange framework
                   ◮   Two important scenarios: Incomplete databases (open-world
                       databases: RDF) and knowledge bases




The problem of data exchange


       Given: A source schema S, a target schema T and a specification
       Σ of the relationship between these schemas


       Data exchange: Problem of materializing an instance of T given
       an instance of S
            ◮   Target instance should reflect the source data as accurately as
                possible, given the constraints imposed by Σ and T
            ◮   It should be efficiently computable
            ◮   It should allow one to evaluate queries on the target in a way
                that is semantically consistent with the source data



Data exchange in a picture




       [Figure: a mapping Σ from source schema S to target schema T; a query Q is posed over schema T]

Data exchange: Some fundamental questions



       What are the challenges in the area?
            ◮   What is a good language for specifying the relationship
                between source and target data?
                   ◮   Expressiveness versus complexity

            ◮   What is a good instance to materialize?
            ◮   What does it mean to answer a query over target data?
            ◮   How do we answer queries over target data? Can we do this
                efficiently?




Exchanging relational data

       The data exchange problem has been extensively studied in the
       relational world.
            ◮   It has also been commercially implemented: IBM Clio


       Relational data exchange setting:
            ◮   Source and target schemas: Relational schemas
            ◮   Relationship between source and target schemas:
                Source-to-target tuple-generating dependencies (st-tgds)


       Semantics of data exchange has been precisely defined.
            ◮   Efficient algorithms for materializing target instances and for
                answering queries over the target schema have been developed


Schema mapping: The key component in relational data
 exchange


       Schema mapping: M = (S, T, Σ)
            ◮   S and T are disjoint relational schemas
            ◮   Σ is a finite set of st-tgds:

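                                      $\forall \bar{x}\, \forall \bar{y}\, (\varphi(\bar{x}, \bar{y}) \rightarrow \exists \bar{z}\, \psi(\bar{x}, \bar{z}))$

                       $\varphi(\bar{x}, \bar{y})$: conjunction of relational atomic formulas over S
                       $\psi(\bar{x}, \bar{z})$: conjunction of relational atomic formulas over T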




Relational schema mappings: An example

       Example
            ◮   S: Employee(name)

            ◮   T: Dept(name, number)

            ◮   Σ:

                               ∀x Employee(x) → ∃y Dept(x, y )



       Note
       We omit universal quantifiers in st-tgds:
                                Employee(x) → ∃y Dept(x, y )

Relational data exchange problem


       Fixed: M = (S, T, Σ)

       Problem: Given instance I of S, find an instance J of T such that
       (I , J) satisfies Σ
            ◮   (I , J) satisfies $\varphi(\bar{x}, \bar{y}) \rightarrow \exists \bar{z}\, \psi(\bar{x}, \bar{z})$ if, whenever I satisfies
                $\varphi(\bar{a}, \bar{b})$, there is a tuple $\bar{c}$ such that J satisfies $\psi(\bar{a}, \bar{c})$



       Notation
       J is a solution for I under M
            ◮   SolM (I ): Set of solutions for I under M



The notion of solution: Example

       Example
            ◮   S: Employee(name)
            ◮   T: Dept(name, number)
            ◮   Σ: Employee(x) → ∃y Dept(x, y )

       Solutions for I = {Employee(Peter)}:
                J1 : {Dept(Peter,1)}
                J2 : {Dept(Peter,1), Dept(Peter,2)}
                J3 : {Dept(Peter,1), Dept(John,1)}
                J4 : {Dept(Peter,n1 )}
                J5 : {Dept(Peter,n1 ), Dept(Peter,n2 )}
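
       The satisfaction condition can be checked mechanically for this example. The following is a minimal Python sketch, assuming instances are encoded as sets of (relation, tuple) facts; the function name `is_solution`, the extra instance `not_a_solution` and the string encoding of the null `n1` are illustrative choices, not part of the slides.

```python
# Sketch: check whether (I, J) satisfies Employee(x) -> ∃y Dept(x, y).
# Facts are (relation_name, tuple_of_values); a null is just another value here.

def is_solution(source, target):
    """True if every Employee(x) in source has some Dept(x, y) in target."""
    employees = {args[0] for rel, args in source if rel == "Employee"}
    dept_names = {args[0] for rel, args in target if rel == "Dept"}
    return employees <= dept_names

I = {("Employee", ("Peter",))}
J1 = {("Dept", ("Peter", 1))}
J3 = {("Dept", ("Peter", 1)), ("Dept", ("John", 1))}
J4 = {("Dept", ("Peter", "n1"))}          # "n1" stands for a null value
not_a_solution = {("Dept", ("John", 1))}  # hypothetical: no tuple for Peter

print(is_solution(I, J1), is_solution(I, J3), is_solution(I, J4))
print(is_solution(I, not_a_solution))
```

       Extra tuples such as Dept(John,1) do not violate the dependency, which is why J3 is still a solution.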

Canonical universal solution



       Algorithm (Chase)
            Input         :    M = (S, T, Σ) and an instance I of S
            Output        :    Canonical universal solution J ⋆ for I under M

                 let J ⋆ := empty instance of T
                 for every $\varphi(\bar{x}, \bar{y}) \rightarrow \exists \bar{z}\, \psi(\bar{x}, \bar{z})$ in Σ do
                      for every $\bar{a}, \bar{b}$ such that I satisfies $\varphi(\bar{a}, \bar{b})$ do
                           create a fresh tuple $\bar{n}$ of pairwise distinct null values
                           insert $\psi(\bar{a}, \bar{n})$ into J ⋆




Canonical universal solution: Example

       Example
       Consider mapping M specified by dependency:
                                   Employee(x)           → ∃y Dept(x, y )

       Canonical universal solution for
       I = {Employee(Peter), Employee(John)}:
            ◮   For a = Peter do
                   ◮   Create a fresh null value n1
                   ◮   Insert Dept(Peter, n1 ) into J ⋆
            ◮   For a = John do
                   ◮   Create a fresh null value n2
                   ◮   Insert Dept(John, n2 ) into J ⋆

       Result: J ⋆ = {Dept(Peter, n1 ), Dept(John, n2 )}
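
       The chase steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: an st-tgd is encoded as a triple (premise atoms, conclusion atoms, existential variables), and the names `Null`, `matches` and `chase` are introduced here only for the example.

```python
from itertools import count

class Null:
    """A labeled null value, distinct from every constant and every other null."""
    def __init__(self, index):
        self.index = index
    def __repr__(self):
        return f"n{self.index}"

def matches(atoms, facts, binding=None):
    """Yield each variable assignment under which every atom is a fact."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (relation, variables), rest = atoms[0], atoms[1:]
    for fact_relation, values in facts:
        if fact_relation != relation or len(values) != len(variables):
            continue
        extended = dict(binding)
        # Bind each variable, rejecting the fact if it clashes with an earlier binding.
        if all(extended.setdefault(var, val) == val
               for var, val in zip(variables, values)):
            yield from matches(rest, facts, extended)

def chase(source_facts, st_tgds):
    """Build the canonical universal solution for a list of source facts."""
    fresh = count(1)
    target = set()
    for premise, conclusion, existential_vars in st_tgds:
        for binding in matches(premise, source_facts):
            env = dict(binding)
            for var in existential_vars:      # fresh, pairwise distinct nulls
                env[var] = Null(next(fresh))
            for relation, variables in conclusion:
                target.add((relation, tuple(env[v] for v in variables)))
    return target

# Employee(x) -> ∃y Dept(x, y)
tgd = ([("Employee", ("x",))], [("Dept", ("x", "y"))], ["y"])
I = [("Employee", ("Peter",)), ("Employee", ("John",))]
J_star = chase(I, [tgd])
print(sorted(J_star, key=repr))
```

       The result contains one Dept tuple per employee, each with its own null, as in the worked example. The names of the nulls are arbitrary.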

Query answering in data exchange



       Given: Mapping M, source instance I and query Q over the target
       schema
            ◮   What does it mean to answer Q?



       Definition (Certain answers)
                     $\mathrm{certain}_M(Q, I) = \bigcap_{J \text{ is a solution for } I \text{ under } M} Q(J)$




Certain answers: Example



       Example
       Consider mapping M specified by:
                                 Employee(x) → ∃y Dept(x, y )


       Given instance I = {Employee(Peter)}:

                            certainM (∃y Dept(x, y ), I )     =   {Peter}
                            certainM (Dept(x, y ), I )        =   ∅
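
       The two results can be illustrated by intersecting query answers over a handful of solutions. The Python sketch below is only an approximation of the definition: the set of all solutions is infinite, so intersecting over a finite sample can only give a superset of the certain answers. The solution `other` and all function names are assumptions made for the example.

```python
# Sketch: intersect the answers to a query over a finite sample of solutions.

def q_names(J):
    """Q1(x) = ∃y Dept(x, y)."""
    return {(name,) for rel, (name, _number) in J if rel == "Dept"}

def q_pairs(J):
    """Q2(x, y) = Dept(x, y)."""
    return {args for rel, args in J if rel == "Dept"}

def intersect_answers(query, solutions):
    """Answers returned by the query in every sampled solution."""
    return set.intersection(*[query(J) for J in solutions])

J1 = {("Dept", ("Peter", 1))}
J2 = {("Dept", ("Peter", 1)), ("Dept", ("Peter", 2))}
J3 = {("Dept", ("Peter", 1)), ("Dept", ("John", 1))}
other = {("Dept", ("Peter", 7))}   # hypothetical additional solution
solutions = [J1, J2, J3, other]

print(intersect_answers(q_names, solutions))
print(intersect_answers(q_pairs, solutions))
```

       Peter appears in every sampled solution, but no single department number does, which matches the two certain-answer sets stated above.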




Query rewriting: An approach for answering queries


       How can we compute certain answers?
            ◮   Naïve algorithm does not work: infinitely many solutions


       Approach proposed in [FKMP03]: Query Rewriting
                Given a mapping M and a target query Q, compute a query
                Q ⋆ such that for every source instance I with canonical
                universal solution J ⋆ :

                                      certainM (Q, I ) = Q ⋆ (J ⋆ )




Query rewriting over the canonical universal solution

       Theorem (FKMP03)
       Given a mapping M specified by st-tgds and a union of
       conjunctive queries Q, there exists a query Q ⋆ such that for every
       source instance I with canonical universal solution J ⋆ :

                                      certainM (Q, I ) = Q ⋆ (J ⋆ )



       Proof idea: Assume that C(a) holds whenever a is a constant.

       Then:
                Q ⋆ (x1 , . . . , xm ) = C(x1 ) ∧ · · · ∧ C(xm ) ∧ Q(x1 , . . . , xm )
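
       The rewriting in the proof idea can be sketched in Python: evaluate Q over the canonical universal solution, treating nulls as ordinary values, and keep only the answers made of constants. This is a minimal sketch under assumptions: the class `Null`, the predicate `is_constant` (playing the role of C) and the helper `rewrite` are names chosen for the example.

```python
class Null:
    """A labeled null value."""
    def __init__(self, label):
        self.label = label
    def __repr__(self):
        return self.label

def is_constant(value):
    """Plays the role of the predicate C from the proof idea."""
    return not isinstance(value, Null)

def rewrite(query):
    """Return Q*: run Q, then keep answers whose components all satisfy C."""
    def q_star(instance):
        return {t for t in query(instance) if all(is_constant(v) for v in t)}
    return q_star

def q_names(J):
    """Q1(x) = ∃y Dept(x, y)."""
    return {(name,) for rel, (name, _number) in J if rel == "Dept"}

def q_pairs(J):
    """Q2(x, y) = Dept(x, y)."""
    return {args for rel, args in J if rel == "Dept"}

# Canonical universal solution for I = {Employee(Peter)}
J_star = {("Dept", ("Peter", Null("n1")))}

print(rewrite(q_names)(J_star))
print(rewrite(q_pairs)(J_star))
```

       On this instance the rewritten queries return {Peter} and the empty set, in agreement with the certain answers computed earlier for the same mapping.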



Computing certain answers: Complexity



       Data complexity: Data exchange setting and query are considered
       to be fixed.


       Corollary (FKMP03)
       For mappings given by st-tgds, certain answers for UCQ can be
       computed in polynomial time (data complexity)




Relational data exchange: Some lessons learned

       Key steps in the development of the area:
            ◮   Definition of schema mappings: Precise syntax and semantics
                   ◮   Definition of the notion of solution

            ◮   Identification of good solutions
            ◮   Polynomial time algorithms for materializing good solutions
            ◮   Definition of target queries: Precise semantics
            ◮   Polynomial time algorithms for computing certain answers for
                UCQ


       Creating schema mappings is a time-consuming and expensive
       process
            ◮   Manual or semi-automatic process in general

Ongoing project: Reusing schema mappings


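       [Figure: schemas S, T and U, with mapping $\Sigma_{ST}$ from S to T and mapping $\Sigma_{TU}$ from T to U;
        the mapping from S to U is obtained as $\Sigma_{SU} = \Sigma_{ST} \circ \Sigma_{TU}$]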




       We need some operators for schema mappings
            ◮   Composition in the above case
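
       Viewed as binary relations between instances, two mappings compose in the usual relational way: (I, K) belongs to the composition when some intermediate instance J links them. The Python sketch below shows this on a toy example, assuming each mapping is given as a finite set of (source instance, target instance) pairs; real mappings relate infinitely many pairs, and the `Office` relation and the function `compose` are hypothetical names used only here.

```python
def compose(m_st, m_tu):
    """Pairs (I, K) such that some J has (I, J) in m_st and (J, K) in m_tu."""
    return {(i, k) for (i, j) in m_st for (j2, k) in m_tu if j == j2}

# Instances are frozensets of facts so that they can be members of sets.
I = frozenset({("Employee", ("Peter",))})
J1 = frozenset({("Dept", ("Peter", 1))})
J2 = frozenset({("Dept", ("Peter", 2))})
K1 = frozenset({("Office", (1, "A"))})   # hypothetical third schema U
K2 = frozenset({("Office", (2, "B"))})

m_st = {(I, J1), (I, J2)}    # finite fragment of a mapping from S to T
m_tu = {(J1, K1), (J2, K2)}  # finite fragment of a mapping from T to U
m_su = compose(m_st, m_tu)

print(len(m_su))
```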




Metadata management



       Contributions mentioned in the previous slides are just a first step
       towards the development of a general framework for data exchange.


       In fact, as pointed out in [B03],
            many information system problems involve not only the design
            and integration of complex application artifacts, but also their
            subsequent manipulation.




Metadata management



       This has motivated the need for the development of a general
       infrastructure for managing schema mappings.

       The problem of managing schema mappings is called metadata
       management.

       High-level algebraic operators, such as compose, are used to
       manipulate mappings.
            ◮   What other operators are needed?




An inverse operator is also needed


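       [Figure: schemas S, T, U and V, with mappings $\Sigma_{ST}$ (S to T), $\Sigma_{TU}$ (T to U) and $\Sigma_{SV}$ (S to V).
        Taking $\Sigma_{VS} = \Sigma_{SV}^{-1}$ gives the mapping $\Sigma_{SV}^{-1} \circ \Sigma_{ST}$ from V to T
        and the mapping $(\Sigma_{SV}^{-1} \circ \Sigma_{ST}) \circ \Sigma_{TU}$ from V to U]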



                            V




       Composition and inverse operators have to be combined




Metadata management: A more general data exchange
 framework is needed

       Composition and inverse operators have been extensively studied in the
       relational world.
            ◮   Semantics, computation, . . .


       Combining these operators is an open issue.
            ◮   Key observation: A target instance of a mapping can be the source
                instance of another mapping
            ◮   Source instances may contain null values


       There is a need for a data exchange framework that can handle databases
       with incomplete information.
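
       The key observation can be made concrete by chaining two exchanges. In the Python sketch below, the first step applies Employee(x) → ∃y Dept(x, y) and the second applies a hypothetical mapping Dept(x, y) → ∃z Office(y, z); the second mapping and all function names are assumptions introduced for illustration. The point is only that the input of the second step already contains nulls.

```python
from itertools import count

class Null:
    """A labeled null value."""
    def __init__(self, index):
        self.index = index
    def __repr__(self):
        return f"n{self.index}"

fresh = count(1)  # shared supply of fresh null indices

def employee_to_dept(employees):
    """Employee(x) -> ∃y Dept(x, y): one fresh null per employee."""
    return {(name, Null(next(fresh))) for name in sorted(employees)}

def dept_to_office(depts):
    """Hypothetical second mapping: Dept(x, y) -> ∃z Office(y, z)."""
    return {(number, Null(next(fresh))) for _name, number in depts}

employees = {"Peter", "John"}
depts = employee_to_dept(employees)   # target of the first mapping
offices = dept_to_office(depts)       # the same instance used as a source

# The source instance of the second exchange is incomplete: it contains nulls.
print(any(isinstance(number, Null) for _name, number in depts))
```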


Data exchange in the RDF world

       There is an increasing interest in publishing relational data as RDF
            ◮   Resulted in the creation of the W3C RDB2RDF Working Group


       The problem of translating relational data into RDF can be seen as a
       data exchange problem
            ◮   Schema mappings can be used to describe how the relational data is
                to be mapped into RDF


       But there is a mismatch here: A relational database under a closed-world
       semantics is to be translated into an RDF graph under an open-world
       semantics
            ◮   There is a need for a data exchange framework that can handle
                both databases with complete and incomplete information


Data exchange in the RDF world


       An issue discussed at the W3C RDB2RDF Working Group: Is a
       mapping information preserving?
            ◮   In particular: For the default mapping defined by this group


       How can we address this issue?
            ◮   Metadata management can help us


       Question to answer: Is a mapping invertible?
            ◮   This time an RDF graph is to be translated into a relational
                database!
            ◮   We want to have a unifying framework for all these cases



But these are not the only reasons . . .

       Nowadays several applications use knowledge bases to represent data.
            ◮   A knowledge base has not only data but also rules that allow
                new data to be inferred
            ◮   In the Semantic Web: RDFS and OWL ontologies


       In a data exchange application over the Semantic Web:
                The input is a mapping and a source specification including data
                and rules, and the output is a target specification also including
                data and rules


       There is a need for a data exchange framework that can handle
       knowledge bases.


Knowledge exchange: A more general data exchange
 framework is needed

       Example
       Assume that we are given the following source knowledge base:

       Data:
                                   Father                          Mother
                               Andy    Bob                       Carrie Bob
                               Bob     Danny
                               Danny Eddie

       Rules:
                                           Father(x, y ) → Parent(x, y )
                                           Mother(x, y ) → Parent(x, y )
                     Parent(x, y ) ∧ Parent(y , z)            → Grandparent(x, z)


Knowledge exchange: A more general data exchange
 framework is needed

       Example (cont’d)
       Given a mapping:
                                         Father(x, y ) → F(x, y )
                                    Grandparent(x, y ) → G(x, y )


       What is a good translation of the initial knowledge base?

       Data:
                                       F                               G
                             Andy          Bob                Andy         Danny
                             Bob           Danny              Carrie       Danny
                             Danny         Eddie              Bob          Eddie

       Rules: ∅
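       The target data above can be reproduced mechanically. The following Python
       sketch closes the source data under the source rules and then applies the
       mapping. The tuple encoding of facts and the helper names `matches` and
       `apply_rules` are illustrative assumptions, not part of the talk.

```python
def matches(atoms, facts, binding=None):
    """Yield every variable binding that maps all atoms into facts."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        extended = dict(binding)
        # setdefault binds a fresh variable, or returns its earlier value
        if all(extended.setdefault(var, val) == val for var, val in zip(args, fargs)):
            yield from matches(rest, facts, extended)


def apply_rules(facts, rules):
    """Add rule consequences until nothing new is derived."""
    facts = set(facts)
    while True:
        derived = {(head_pred, tuple(b[v] for v in head_args))
                   for body, (head_pred, head_args) in rules
                   for b in matches(body, facts)}
        if derived <= facts:
            return facts
        facts |= derived


source = {("Father", ("Andy", "Bob")), ("Father", ("Bob", "Danny")),
          ("Father", ("Danny", "Eddie")), ("Mother", ("Carrie", "Bob"))}
source_rules = [
    ([("Father", ("x", "y"))], ("Parent", ("x", "y"))),
    ([("Mother", ("x", "y"))], ("Parent", ("x", "y"))),
    ([("Parent", ("x", "y")), ("Parent", ("y", "z"))], ("Grandparent", ("x", "z"))),
]
mapping = [
    ([("Father", ("x", "y"))], ("F", ("x", "y"))),
    ([("Grandparent", ("x", "y"))], ("G", ("x", "y"))),
]

closed = apply_rules(source, source_rules)
target = {f for f in apply_rules(closed, mapping) if f[0] in ("F", "G")}
```

       Under these assumptions, `target` holds the three F-tuples and the three
       G-tuples shown in the tables, and no rules are carried over to the target.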

Knowledge exchange: A more general data exchange
 framework is needed

       Example (cont’d)
       Our first alternative does not include any translation of the source rules:
                                          Father(x, y )       → Parent(x, y )
                                    Mother(x, y )             → Parent(x, y )
                    Parent(x, y ) ∧ Parent(y , z)             → Grandparent(x, z)




Knowledge exchange: A more general data exchange
 framework is needed

       Example (cont’d)
       A second alternative translates the source rules into a target rule:

                                      F(x, y) ∧ F(y, z)     → G(x, z)

       What data should we materialize?
                                      F                                  G
                            Andy          Bob
                            Bob           Danny                 Carrie       Danny
                            Danny         Eddie

       Only G(Carrie, Danny) has to be stored: G(Andy, Danny) and G(Bob, Eddie)
       follow from F and the rule above.

       Is this a good translation? Why?
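       One way to see why the smaller G table can suffice is to apply the target
       rule to the stored data. The sketch below is only an illustration; the
       function name `close_under_target_rule` and the pair encoding are
       assumptions made here.

```python
def close_under_target_rule(f_pairs, g_pairs):
    """Apply F(x, y) ∧ F(y, z) → G(x, z) once; F never grows, so one pass suffices."""
    derived = {(x, z) for (x, y1) in f_pairs for (y2, z) in f_pairs if y1 == y2}
    return set(g_pairs) | derived


stored_f = {("Andy", "Bob"), ("Bob", "Danny"), ("Danny", "Eddie")}
stored_g = {("Carrie", "Danny")}  # the only G-tuple that is materialized

implied_g = close_under_target_rule(stored_f, stored_g)
```

       Here `implied_g` contains (Andy, Danny), (Carrie, Danny) and (Bob, Eddie),
       that is, the full G table of the first alternative.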

One can exchange more than complete data


            ◮   In data exchange one starts with a database instance (with
                complete information).

            ◮   What if we have an initial object that has several
                interpretations?
                   ◮   A representation of a set of possible instances


            ◮   We propose a new general formalism to exchange
                representations of possible instances
                   ◮   We apply it to the problems of exchanging instances with
                       incomplete information and exchanging knowledge bases




Outline: Second part



            ◮   Formalism for exchanging representation systems

            ◮   Applications to incomplete instances

            ◮   Applications to knowledge bases

            ◮   Concluding remarks




Representation systems

       A representation system R = (W, rep) consists of:
            ◮   a set W of representatives
            ◮   a function rep that assigns a set of instances to every element
                in W
                                rep(V) = {I1 , I2 , I3 , . . .} for every V ∈ W


       Uniformity assumption: For every V ∈ W, there exists a relational
       schema S (the type of V) such that rep(V) ⊆ Inst(S)



       Incomplete instances and knowledge bases are representation
       systems


In classical data exchange we consider only complete data


       Recall that given M = (S, T, Σ), I ∈ Inst(S) and J ∈ Inst(T): J is
       a solution for I under M if (I , J) |= Σ


                                                  J ∈ SolM (I )



       This can be extended to sets of instances. Given X ⊆ Inst(S):


                                      SolM (X)  =  ⋃_{I ∈ X} SolM (I)
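       As a rough illustration, membership in SolM (I) and in SolM (X) can be
       tested as follows for a finite X. The encoding of st-tgds as (body, head)
       pairs of atom lists and the names `matches`, `is_solution` and
       `in_sol_of_set` are assumptions of this sketch.

```python
def matches(atoms, facts, binding=None):
    """Yield every extension of binding that maps all atoms into facts."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        extended = dict(binding)
        if all(extended.setdefault(var, val) == val for var, val in zip(args, fargs)):
            yield from matches(rest, facts, extended)


def is_solution(I, J, sigma):
    """(I, J) |= sigma: every body match in I has a matching head in J."""
    return all(any(True for _ in matches(head, J, b))
               for body, head in sigma
               for b in matches(body, I))


def in_sol_of_set(X, J, sigma):
    """J belongs to the union of Sol(I) over the listed instances I in X."""
    return any(is_solution(I, J, sigma) for I in X)


sigma = [
    ([("Father", ("x", "y"))], [("F", ("x", "y"))]),
    ([("Grandparent", ("x", "y"))], [("G", ("x", "y"))]),
]
I1 = {("Father", ("Andy", "Bob"))}
I2 = I1 | {("Grandparent", ("Andy", "Danny"))}
J = {("F", ("Andy", "Bob"))}
```

       With these inputs J is a solution for I1 but not for I2 (the G-tuple is
       missing), so J still belongs to SolM ({I1, I2}).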




Extending the definition to representation systems

       Given:
            ◮   a mapping M = (S, T, Σ)
            ◮   a representation system R = (W, rep)
            ◮   U, V ∈ W of types S and T, respectively



       Definition (APR11)
       V is an R-solution of U under M if
                                       rep(V) ⊆ SolM (rep(U))



       Or equivalently: V is an R-solution of U if for every J ∈ rep(V),
       there exists I ∈ rep(U) such that J ∈ SolM (I ).

Universal solutions




       What is a good solution in this framework?



       Definition (APR11)
       V is a universal R-solution of U under M if
                                       rep(V) = SolM (rep(U))




Strong representation systems

       Let C be a class of mappings.


       Definition (APR11)
       R = (W, rep) is a strong representation system for C if for every
       M ∈ C from S to T, and for every U ∈ W of type S, there exists a
       V ∈ W of type T such that:

                                       rep(V) = SolM (rep(U))



       If R = (W, rep) is a strong representation system, then the
       universal solutions for the representatives in W can be represented
       in the same system.


Motivating questions



       What is a strong representation system for the class of mappings
       specified by st-tgds?
            ◮   Are instances including nulls enough?


       Can the fundamental data exchange problems be solved in
       polynomial time in this setting?
            ◮   Computing (universal) solutions
            ◮   Computing certain answers




Naive instances

       We have already considered naive instances: Instances with null values
            ◮   Example: Canonical universal solution


       A naive instance I has labeled nulls:
                                                     R(1, n1 )
                                                     R(n1 , 2)
                                                     R(1, n2 )


       The interpretations of I are constructed by replacing nulls by constants:

                        rep(I)      =     {K | µ(I) ⊆ K for some valuation µ}
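       The definition of rep(I) suggests a simple brute-force membership test.
       The sketch below assumes a small `Null` class and a tuple encoding of
       facts, both introduced here for illustration. It only tries valuations
       into the constants of K, which is enough because every null of I occurs
       in some fact that must land in K.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A labeled null such as n1 (hypothetical helper type)."""
    name: str


def apply_valuation(instance, mu):
    """Replace each null by its value under mu; constants stay as they are."""
    return {(pred, tuple(mu.get(a, a) for a in args)) for pred, args in instance}


def in_rep(naive, K):
    """True if mu(naive) is contained in K for some valuation mu."""
    nulls = list({a for _, args in naive for a in args if isinstance(a, Null)})
    constants = list({a for _, args in K for a in args})
    for values in product(constants, repeat=len(nulls)):
        if apply_valuation(naive, dict(zip(nulls, values))) <= set(K):
            return True
    return False


n1, n2 = Null("n1"), Null("n2")
naive = {("R", (1, n1)), ("R", (n1, 2)), ("R", (1, n2))}

K1 = {("R", (1, 3)), ("R", (3, 2))}   # take mu(n1) = mu(n2) = 3
K2 = {("R", (1, 3)), ("R", (4, 2))}   # no value for n1 fits both facts
```

       In this example `in_rep(naive, K1)` holds while `in_rep(naive, K2)` does not.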




Are naive instances expressive enough?


       Naive instances have been extensively used in data exchange:


       Proposition (FKMP03)
       Let M = (S, T, Σ), where Σ is a set of st-tgds. Then for every
       instance I of S, there exists a naive instance J of T such that:

                                           rep(J ) = SolM (I )


       In fact, the canonical universal solution satisfies the property
       mentioned above.




Are naive instances expressive enough?



       But naive instances are not expressive enough to deal with
       incomplete information in the source instances:



       Proposition (APR11)
       Naive instances are not a strong representation system for the class
       of mappings specified by st-tgds




Are naive instances expressive enough?


       Example
       Consider a mapping M specified by:
                                Manager(x, y ) → Reports(x, y )
                                Manager(x, x) → SelfManager(x)


       The canonical universal solution for I = {Manager(n, Peter)} under M:
                                       J     = {Reports(n, Peter)}


       But J is not a good solution for I.
            ◮   It cannot represent the fact that if n is given value Peter, then
                SelfManager(Peter) should hold in the target.
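       The failure can be checked concretely. Taking µ(n) = Peter, the instance
       K = {Reports(Peter, Peter)} is in rep(J), yet it is not a solution for any
       member of rep(I). The sketch below tests only the smallest members of
       rep(I), one per candidate value of n; this suffices because adding source
       facts can only add requirements on the target. The constant pool and the
       helper names are assumptions of this sketch.

```python
def matches(atoms, facts, binding=None):
    """Yield every extension of binding that maps all atoms into facts."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        extended = dict(binding)
        if all(extended.setdefault(var, val) == val for var, val in zip(args, fargs)):
            yield from matches(rest, facts, extended)


def is_solution(I, J, sigma):
    """(I, J) |= sigma: every body match in I has a matching head in J."""
    return all(any(True for _ in matches(head, J, b))
               for body, head in sigma
               for b in matches(body, I))


sigma = [
    ([("Manager", ("x", "y"))], [("Reports", ("x", "y"))]),
    ([("Manager", ("x", "x"))], [("SelfManager", ("x",))]),
]

K = {("Reports", ("Peter", "Peter"))}           # a member of rep(J)
candidates = ["Peter", "John"]                   # hypothetical values for n
sources = [{("Manager", (c, "Peter"))} for c in candidates]

k_is_covered = any(is_solution(I, K, sigma) for I in sources)
```

       Here `k_is_covered` is false: with n = Peter the target lacks
       SelfManager(Peter), and with any other value it lacks the Reports tuple.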



Conditional instances


       What should be added to naive instances to obtain a strong
       representation system?
            ◮   Answer from database theory: Conditions on the nulls


       Conditional instances: Naive instances plus tuple conditions

       A tuple condition is a positive Boolean combination of:
            ◮   equalities and inequalities between nulls, and between nulls
                and constants




Conditional instances

       Example
                                      R(1, n1 )        n1 = n2
                                      R(n1 , n2 )      n1 ≠ n2 ∨ n2 = 2


       Semantics:
            µ(n1 ) = µ(n2 ) = 2:            R(1, 2), R(2, 2)
            µ(n1 ) = µ(n2 ) = 3:            R(1, 3)
            µ(n1 ) = 2, µ(n2 ) = 3:         R(2, 3)


       Interpretations of a conditional instance I:

                        rep(I)      =     {K | µ(I) ⊆ K for some valuation µ}
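       The semantics can be spelled out in a few lines of Python. The sketch
       reads the second tuple condition as n1 ≠ n2 ∨ n2 = 2, which is the reading
       implied by the three valuations listed in the example. Encoding conditions
       as functions of the valuation and nulls as strings is an assumption made
       here.

```python
def evaluate(conditional_instance, mu):
    """Keep the tuples whose condition holds under mu, then replace the nulls."""
    return {(pred, tuple(mu.get(a, a) for a in args))
            for (pred, args), condition in conditional_instance
            if condition(mu)}


example = [
    (("R", (1, "n1")), lambda mu: mu["n1"] == mu["n2"]),
    (("R", ("n1", "n2")), lambda mu: mu["n1"] != mu["n2"] or mu["n2"] == 2),
]

world_a = evaluate(example, {"n1": 2, "n2": 2})
world_b = evaluate(example, {"n1": 3, "n2": 3})
world_c = evaluate(example, {"n1": 2, "n2": 3})
```

       The three results are the minimal instances of the example; rep(I) then
       contains every instance that extends one of the instances µ(I).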



Positive conditional instances




       Many problems are intractable over conditional instances.
            ◮   We also consider a restricted class of conditional instances


       Positive conditional instances: Conditional instances without
       inequalities




(Positive) conditional instances are enough

       Theorem (APR11)
       Both conditional instances and positive conditional instances are strong
       representation systems for the class of mappings specified by st-tgds.


       Example
       Consider again the mapping M specified by:
                                Manager(x, y ) → Reports(x, y )
                                Manager(x, x) → SelfManager(x)


       The following is a universal solution for I = {Manager(n, Peter)}
                                    Reports(n, Peter)         true
                                    SelfManager(Peter)        n = Peter
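       As a spot check (not a proof of universality), the sketch below compares,
       for two sample values of n, the instance obtained from the conditional
       solution with the minimal solution of the corresponding source instance.
       The function names and the constant "John" are assumptions of this sketch.

```python
def evaluate(conditional_instance, mu):
    """Keep the tuples whose condition holds under mu, then replace the nulls."""
    return {(pred, tuple(mu.get(a, a) for a in args))
            for (pred, args), condition in conditional_instance
            if condition(mu)}


def minimal_solution(source):
    """Apply Manager(x, y) → Reports(x, y) and Manager(x, x) → SelfManager(x)."""
    out = {("Reports", (x, y)) for pred, (x, y) in source if pred == "Manager"}
    out |= {("SelfManager", (x,)) for pred, (x, y) in source
            if pred == "Manager" and x == y}
    return out


conditional_solution = [
    (("Reports", ("n", "Peter")), lambda mu: True),
    (("SelfManager", ("Peter",)), lambda mu: mu["n"] == "Peter"),
]

agreement = {
    value: evaluate(conditional_solution, {"n": value})
           == minimal_solution({("Manager", (value, "Peter"))})
    for value in ["Peter", "John"]
}
```

       Both entries of `agreement` are true: the condition n = Peter switches
       SelfManager(Peter) on exactly when the source forces it.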


Positive conditional instances are exactly the needed
 representation system

       Positive conditional instances are minimal:

       Theorem (APR11)
       All the following are needed to obtain a strong representation
       system for the class of mappings specified by st-tgds:
            ◮   equalities between nulls
            ◮   equalities between constants and nulls
            ◮   conjunctions and disjunctions



       Conditional instances are enough but not minimal.


Positive conditional instances can be used in practice!

       Let M = (S, T, Σ), where Σ is a set of st-tgds.

       Theorem (APR11)
       There exists a polynomial time algorithm that, given a positive
       conditional instance I over S, computes a positive conditional instance
       J over T that is a universal solution for I under M.



       Let Q be a union of conjunctive queries over T.

                                     Q(J)  =  ⋂_{K ∈ rep(J)} Q(K)

                       certainM (Q, I)  =  ⋂_{J is a solution for I under M} Q(J)
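       The intersection semantics can be illustrated on a finite sample of
       instances. This is only a sketch: rep(J) is infinite in general, so the
       explicit list `worlds` below (two instances taken from the Manager
       example) and the two sample queries are assumptions, and the polynomial
       time algorithm of the theorem is not what is shown here.

```python
def certain_answers(query, worlds):
    """Intersect the query answers over all listed instances."""
    results = [set(query(w)) for w in worlds]
    return set.intersection(*results) if results else set()


worlds = [
    {("Reports", ("Peter", "Peter")), ("SelfManager", ("Peter",))},
    {("Reports", ("John", "Peter"))},
]


def q_second_column(world):
    """Values y such that Reports(x, y) holds for some x."""
    return {args[1] for pred, args in world if pred == "Reports"}


def q_self_managers(world):
    """Values x such that SelfManager(x) holds."""
    return {args[0] for pred, args in world if pred == "SelfManager"}


certain_second = certain_answers(q_second_column, worlds)
certain_self = certain_answers(q_self_managers, worlds)
```

       Peter is returned by the first query in every listed instance, so it is a
       certain answer on this sample; the second query has no certain answer.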




Positive conditional instances can be used in practice!



       Theorem (APR11)
       There exists a polynomial time algorithm that, given a positive
       conditional instance I over S, computes certainM (Q, I).



       The same result holds for the class of unions of conjunctive queries with
       at most one inequality per disjunct.
            ◮   The other important class of queries in the data exchange area for
                which certain answers can be computed in polynomial time




The semantics of knowledge bases is given by sets of
 instances



       Knowledge base over S: (I , Γ) such that
            ◮   I ∈ Inst(S)
            ◮   Γ a set of rules over S


       Semantics: finite models

                     Mod(I , Γ) = {K ∈ Inst(S) | I ⊆ K and K |= Γ}
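       Membership in Mod(I, Γ) can be tested directly from this definition when
       Γ consists of rules with a single head atom and no existential variables.
       The sketch below makes that assumption and reuses the family rules from
       the first part; the helper names are introduced here for illustration.

```python
def matches(atoms, facts, binding=None):
    """Yield every variable binding that maps all atoms into facts."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        extended = dict(binding)
        if all(extended.setdefault(var, val) == val for var, val in zip(args, fargs)):
            yield from matches(rest, facts, extended)


def satisfies(K, rules):
    """K |= rules: every body match in K has its head fact in K."""
    return all((head_pred, tuple(b[v] for v in head_args)) in K
               for body, (head_pred, head_args) in rules
               for b in matches(body, K))


def in_mod(K, I, gamma):
    """K is in Mod(I, gamma): I is contained in K and K satisfies gamma."""
    return set(I) <= set(K) and satisfies(K, gamma)


gamma = [
    ([("Father", ("x", "y"))], ("Parent", ("x", "y"))),
    ([("Mother", ("x", "y"))], ("Parent", ("x", "y"))),
    ([("Parent", ("x", "y")), ("Parent", ("y", "z"))], ("Grandparent", ("x", "z"))),
]
I = {("Father", ("Andy", "Bob")), ("Father", ("Bob", "Danny"))}
K_model = I | {("Parent", ("Andy", "Bob")), ("Parent", ("Bob", "Danny")),
               ("Grandparent", ("Andy", "Danny"))}
K_missing = K_model - {("Grandparent", ("Andy", "Danny"))}
```

       Here `K_model` is a model of (I, Γ), while `K_missing` violates the
       Grandparent rule and is therefore not in Mod(I, Γ).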




We can apply our formalism to knowledge bases


       (I2 , Γ2 ) is a KB-solution for (I1 , Γ1 ) under M if:


                                Mod(I2 , Γ2 ) ⊆ SolM (Mod(I1 , Γ1 ))



       (I2 , Γ2 ) is a universal KB-solution for (I1 , Γ1 ) under M if:


                                Mod(I2 , Γ2 ) = SolM (Mod(I1 , Γ1 ))




Motivating questions




       The same questions as for instances with incomplete information:
            ◮   Constructing universal KB-solutions
            ◮   Answering target queries


       New fundamental problem: Construct solutions including as much
       implicit knowledge as possible.




What are good knowledge-base solutions?

       First alternative: universal KB-solutions

       But other KB-solutions may also be desirable to materialize
            ◮   Minimality comes into play


       Given sets X , Y of instances:
            ◮   X ≡min Y if X and Y coincide in the minimal instances under ⊆



       Definition
       (I2 , Γ2 ) is a minimal KB-solution of (I1 , Γ1 ) under M if:

                               Mod(I2 , Γ2 )       ≡min       SolM (Mod(I1 , Γ1 ))
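       For finite collections of instances, the relation ≡min can be computed
       directly. The sketch below is an illustration only: Mod and SolM are
       infinite in general, and the names `minimal` and `equiv_min` as well as
       the sample data are assumptions.

```python
def minimal(instances):
    """Return the instances that have no proper subset in the collection."""
    sets = [frozenset(x) for x in instances]
    return {x for x in sets if not any(y < x for y in sets)}


def equiv_min(X, Y):
    """X ≡min Y: both collections have the same ⊆-minimal instances."""
    return minimal(X) == minimal(Y)


X = [{"a"}, {"a", "b"}]
Y = [{"a"}, {"a", "c"}, {"a", "b", "c"}]
Z = [{"a", "b"}]
```

       X and Y differ as sets but share the single minimal instance {a}, so they
       are ≡min-equivalent; Z is not equivalent to either.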



Two requirements to construct minimal knowledge-base
 solutions

       Given (I1 , Γ1 ) and M, when constructing a minimal KB-solution
       (I2 , Γ2 ) we would like:


            1. Γ2 to only depend on Γ1 and M:

                                             Γ2 is safe for Γ1 and M


       Definition
       Γ2 is safe for Γ1 and M if for every I1 there exists I2 such that:

                   (I2 , Γ2 ) is a minimal KB-solution of (I1 , Γ1 ) under M


Two requirements to construct minimal knowledge-base
 solutions



            2. Γ2 to be as informative as possible (thus minimizing the size
               of I2 ):


       Definition
       Γ2 is optimal-safe if for every other safe set Γ′ :

                                                     Γ2 |= Γ′




Computing minimal KB-solutions

       To obtain algorithms for computing minimal KB-solutions, we need
       to specify the language used in knowledge bases.
            ◮   Full st-tgd:

                                          ∀¯∀¯ (ϕ(¯ , y ) → ψ(¯ ))
                                           x y    x ¯         x




       Theorem (APR11)
       There exists a polynomial-time algorithm that, given
       M = (S, T, Σ), where Σ is a set of full st-tgds, and given a set Γ1
       of full tgds over S, computes a set Γ2 of second-order logic
       sentences over T that is optimal-safe for Γ1 and M.


Computing minimal KB-solutions

       Unfortunately, first-order logic is not expressive enough.


       Theorem (APR11)
       There exist M = (S, T, Σ), where Σ is a set of full st-tgds, and a
       set Γ1 of full tgds over S such that:

                        no FO-sentence is optimal-safe for Γ1 and M.



       How can we deal with these problems in practice?
            ◮   We need to restrict the language used to specify knowledge
                bases: Description logics [ABC11]



We can exchange more than complete data

       We propose a general formalism to exchange representation
       systems
            ◮   Applications to incomplete instances
            ◮   Applications to knowledge bases


       Next step: Apply our general setting to the Semantic Web
            ◮   Semantic Web data has nulls (blank nodes)
            ◮   Semantic Web specifications have rules (RDFS, OWL)


       Lots of interesting problems to solve if knowledge bases are
       specified by means of description logics.
            ◮   Better results can be obtained


Thank you!




Bibliography



            [ABC11]         M. Arenas, E. Botoeva, D. Calvanese. Knowledge Base Exchange.
                            DL 2011.

            [APR11]         M. Arenas, J. Pérez, J. Reutter. Data Exchange beyond Complete
                            Data. PODS 2011.

            [B03]           P. A. Bernstein. Applying Model Management to Classical Meta
                            Data Problems. CIDR 2003.

            [FKMP03]        R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa. Data Exchange:
                            Semantics and Query Answering. ICDT 2003.




Bonus track: Computation of solutions and its associated
 decision problem


       Decision problem: Check-KB-Sol
            Input:         M = (S, T, Σ), where Σ is a set of st-tgds
                           (I₁, Γ₁), a KB over S, with Γ₁ a set of tgds
                           (I₂, Γ₂), a KB over T, with Γ₂ a set of tgds

            Output:        Is (I₂, Γ₂) a KB-solution of (I₁, Γ₁) under M?




       Theorem (APR11)
                   Check-KB-Sol is undecidable (even for a fixed M).


Bonus track: Computation of solutions and its associated
 decision problem




       Undecidability is a consequence of allowing existential
       quantification (∃) in the rules of the knowledge bases.
            ◮   We need to restrict the input


       Check-Full-KB-Sol: Γ₁, Γ₂ are assumed to be sets of full tgds
       (tgds without existentially quantified variables)
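
One way to see why full tgds are easier to handle: applying them never creates new values, so repeatedly firing the rules reaches a fixpoint after finitely many steps. The sketch below is only an illustration of that saturation step, not the decision procedure from [APR11]. The encoding of facts and rules, and the names `matches`, `saturate`, `FACTS` and `RULES`, are assumptions made for this example; the data and rules are the family knowledge base used earlier in the deck.

```python
# Illustrative sketch (hypothetical encoding, not the algorithm of [APR11]).
# A fact is (relation, tuple_of_constants).
# An atom is (relation, tuple_of_variable_names).
# A full tgd is (body_atoms, head_atoms); every head variable occurs in the body.

def matches(body, facts):
    """Yield every variable assignment that maps all body atoms onto facts."""
    def extend(i, env):
        if i == len(body):
            yield dict(env)
            return
        rel, variables = body[i]
        for fact_rel, consts in facts:
            if fact_rel != rel or len(consts) != len(variables):
                continue
            new_env = dict(env)
            # setdefault binds an unbound variable, or returns the existing binding.
            if all(new_env.setdefault(v, c) == c for v, c in zip(variables, consts)):
                yield from extend(i + 1, new_env)
    yield from extend(0, {})

def saturate(facts, full_tgds):
    """Fire full tgds until no new fact is derived (naive fixpoint)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in full_tgds:
            # Materialize the matches first so the fact set is not mutated mid-iteration.
            for env in list(matches(body, facts)):
                for rel, variables in head:
                    fact = (rel, tuple(env[v] for v in variables))
                    if fact not in facts:
                        facts.add(fact)
                        changed = True
    return facts

# Family knowledge base from the earlier example slides.
FACTS = {
    ("Father", ("Andy", "Bob")),
    ("Father", ("Bob", "Danny")),
    ("Father", ("Danny", "Eddie")),
    ("Mother", ("Carrie", "Bob")),
}
RULES = [
    ([("Father", ("x", "y"))], [("Parent", ("x", "y"))]),
    ([("Mother", ("x", "y"))], [("Parent", ("x", "y"))]),
    ([("Parent", ("x", "y")), ("Parent", ("y", "z"))], [("Grandparent", ("x", "z"))]),
]
```

Calling `saturate(FACTS, RULES)` adds four `Parent` facts and the three `Grandparent` facts (Andy, Danny), (Carrie, Danny) and (Bob, Eddie). Termination is guaranteed because only constants already present in the input can appear in derived facts; a rule with ∃ in its head would need fresh nulls, and this loop could then run forever.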




Bonus track: Computation of solutions and its associated
 decision problem


       Theorem (APR11)
                       Check-Full-KB-Sol is EXPTIME-complete.





       Theorem (APR11)
       If M = (S, T, Σ) is fixed:

                     Check-Full-KB-Sol is ∆^P_2[O(log n)]-complete.


            ∆^P_2[O(log n)]:       P^NP with a logarithmic number of
                                   calls to the NP oracle.
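
To make the "logarithmic number of calls" restriction concrete, here is a generic textbook-style illustration that is unrelated to the proof in [APR11]: a binary search that finds the size of a largest clique using only about log n questions to an NP oracle. The function names, the graph encoding and the sample `GRAPH` are assumptions for this sketch, and the oracle is replaced by a brute-force stand-in, since only the number of calls matters here.

```python
from itertools import combinations

def has_clique(graph, k):
    """Stand-in for an NP oracle: does `graph` contain a clique with k vertices?

    `graph` maps each vertex to the set of its neighbours (symmetric).
    Brute force is used only so that the example runs.
    """
    if k <= 0:
        return True
    return any(
        all(v in graph[u] for u, v in combinations(group, 2))
        for group in combinations(list(graph), k)
    )

def max_clique_size(graph):
    """Return (size of a largest clique, number of oracle calls made)."""
    calls = 0
    lo, hi = 0, len(graph)  # a clique of size lo exists; none is larger than hi
    while lo < hi:
        mid = (lo + hi + 1) // 2
        calls += 1
        if has_clique(graph, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo, calls

# Hypothetical sample graph: a triangle a-b-c plus a pendant vertex d.
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

On this four-vertex graph, `max_clique_size(GRAPH)` returns `(3, 3)`: the largest clique has three vertices and three oracle questions were asked. In general the search asks at most ⌈log₂(n + 1)⌉ questions for a graph with n vertices, so a decision derived from the result (for instance, whether the maximum clique size is odd) lies in P^NP with logarithmically many oracle calls.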


Exchanging more than Complete Data

  • 1. Exchanging more than Complete Data Marcelo Arenas PUC Chile Joint work with Jorge P´rez (U. de Chile) and Juan Reutter (U. Edinburgh) e M. Arenas – Exchanging more than Complete Data - RR2011 1 / 68
  • 2. Outline: First part ◮ The data exchange problem ◮ Some fundamental results in relational data exchange ◮ The need for a more general data exchange framework ◮ Two important scenarios: Incomplete databases (open-world databases: RDF) and knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 2 / 68
  • 3. Outline: First part ◮ The data exchange problem ◮ Some fundamental results in relational data exchange ◮ The need for a more general data exchange framework ◮ Two important scenarios: Incomplete databases (open-world databases: RDF) and knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 3 / 68
  • 4. The problem of data exchange Given: A source schema S, a target schema T and a specification Σ of the relationship between these schemas Data exchange: Problem of materializing an instance of T given an instance of S ◮ Target instance should reflect the source data as accurately as possible, given the constraints imposed by Σ and T ◮ It should be efficiently computable ◮ It should allow one to evaluate queries on the target in a way that is semantically consistent with the source data M. Arenas – Exchanging more than Complete Data - RR2011 4 / 68
  • 5. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 6. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 7. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 8. Data exchange in a picture Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 9. Data exchange in a picture Query Q Σ Schema S Schema T M. Arenas – Exchanging more than Complete Data - RR2011 5 / 68
  • 10. Data exchange: Some fundamental questions What are the challenges in the area? ◮ What is a good language for specifying the relationship between source and target data? ◮ Expressiveness versus complexity ◮ What is a good instance to materialize? ◮ What does it mean to answer a query over target data? ◮ How do we answer queries over target data? Can we do this efficiently? M. Arenas – Exchanging more than Complete Data - RR2011 6 / 68
  • 11. Exchanging relational data The data exchange problem has been extensively studied in the relational world. ◮ It has also been commercially implemented: IBM Clio Relational data exchange setting: ◮ Source and target schemas: Relational schemas ◮ Relationship between source and target schemas: Source-to-target tuple-generating dependencies (st-tgds) Semantics of data exchange has been precisely defined. ◮ Efficient algorithms for materializing target instances and for answering queries over the target schema have been developed M. Arenas – Exchanging more than Complete Data - RR2011 7 / 68
  • 12. Schema mapping: The key component in relational data exchange Schema mapping: M = (S, T, Σ) ◮ S and T are disjoint relational schemas ◮ Σ is a finite set of st-tgds: ∀¯∀¯ (ϕ(¯ , y ) → ∃¯ ψ(¯ , z )) x y x ¯ z x ¯ ϕ(¯, y ): conjunction of relational atomic formulas over S x ¯ ψ(¯, z ): conjunction of relational atomic formulas over T x ¯ M. Arenas – Exchanging more than Complete Data - RR2011 8 / 68
  • 13. Relational schema mappings: An example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: ∀x Employee(x) → ∃y Dept(x, y ) M. Arenas – Exchanging more than Complete Data - RR2011 9 / 68
  • 14. Relational schema mappings: An example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: ∀x Employee(x) → ∃y Dept(x, y ) Note We omit universal quantifiers in st-tgds: Employee(x) → ∃y Dept(x, y ) M. Arenas – Exchanging more than Complete Data - RR2011 9 / 68
  • 15. Relational data exchange problem Fixed: M = (S, T, Σ) Problem: Given instance I of S, find an instance J of T such that (I , J) satisfies Σ ◮ (I , J) satisfies ϕ(¯ , y ) → ∃¯ ψ(¯ , z ) if whenever I satisfies x ¯ z x ¯ a ¯ ϕ(¯, b), there is a tuple c such that J satisfies ψ(¯, c ) ¯ a ¯ M. Arenas – Exchanging more than Complete Data - RR2011 10 / 68
  • 16. Relational data exchange problem Fixed: M = (S, T, Σ) Problem: Given instance I of S, find an instance J of T such that (I , J) satisfies Σ ◮ (I , J) satisfies ϕ(¯ , y ) → ∃¯ ψ(¯ , z ) if whenever I satisfies x ¯ z x ¯ a ¯ ϕ(¯, b), there is a tuple c such that J satisfies ψ(¯, c ) ¯ a ¯ Notation J is a solution for I under M ◮ SolM (I ): Set of solutions for I under M M. Arenas – Exchanging more than Complete Data - RR2011 10 / 68
  • 17. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 18. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 19. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 20. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} J3 : {Dept(Peter,1), Dept(John,1)} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 21. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} J3 : {Dept(Peter,1), Dept(John,1)} J4 : {Dept(Peter,n1 )} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 22. The notion of solution: Example Example ◮ S: Employee(name) ◮ T: Dept(name, number) ◮ Σ: Employee(x) → ∃y Dept(x, y ) Solutions for I = {Employee(Peter)}: J1 : {Dept(Peter,1)} J2 : {Dept(Peter,1), Dept(Peter,2)} J3 : {Dept(Peter,1), Dept(John,1)} J4 : {Dept(Peter,n1 )} J5 : {Dept(Peter,n1 ), Dept(Peter,n2 )} M. Arenas – Exchanging more than Complete Data - RR2011 11 / 68
  • 23. Canonical universal solution Algorithm (Chase) Input : M = (S, T, Σ) and an instance I of S Output : Canonical universal solution J ⋆ for I under M let J ⋆ := empty instance of T for every ϕ(¯ , y ) → ∃¯ ψ(¯ , z ) in Σ do x ¯ z x ¯ a ¯ a ¯ for every ¯, b such that I satisfies ϕ(¯, b) do create a fresh tuple n of pairwise distinct null values ¯ insert ψ(¯, n) into J a ¯ ⋆ M. Arenas – Exchanging more than Complete Data - RR2011 12 / 68
  • 24. Canonical universal solution: Example Example Consider mapping M specified by dependency: Employee(x) → ∃y Dept(x, y ) Canonical universal solution for I = {Employee(Peter), Employee(John)}: ◮ For a = Peter do ◮ Create a fresh null value n1 ◮ Insert Dept(Peter, n1 ) into J ⋆ ◮ For a = John do ◮ Create a fresh null value n2 ◮ Insert Dept(John, n2 ) into J ⋆ Result: J ⋆ = {Dept(Peter, n1 ), Dept(John, n2 )} M. Arenas – Exchanging more than Complete Data - RR2011 13 / 68
  • 25. Query answering in data exchange Given: Mapping M, source instance I and query Q over the target schema ◮ What does it mean to answer Q? M. Arenas – Exchanging more than Complete Data - RR2011 14 / 68
  • 26. Query answering in data exchange Given: Mapping M, source instance I and query Q over the target schema ◮ What does it mean to answer Q? Definition (Certain answers) certainM (Q, I ) = Q(J) J is a solution for I under M M. Arenas – Exchanging more than Complete Data - RR2011 14 / 68
  • 27. Certain answers: Example Example Consider mapping M specified by: Employee(x) → ∃y Dept(x, y ) Given instance I = {Employee(Peter)}: certainM (∃y Dept(x, y ), I ) = {Peter} certainM (Dept(x, y ), I ) = ∅ M. Arenas – Exchanging more than Complete Data - RR2011 15 / 68
  • 28. Query rewriting: An approach for answering queries How can we compute certain answers? ◮ Na¨ algorithm does not work: infinitely many solutions ıve M. Arenas – Exchanging more than Complete Data - RR2011 16 / 68
  • 29. Query rewriting: An approach for answering queries How can we compute certain answers? ◮ Na¨ algorithm does not work: infinitely many solutions ıve Approach proposed in [FKMP03]: Query Rewriting Given a mapping M and a target query Q, compute a query Q ⋆ such that for every source instance I with canonical universal solution J ⋆ : certainM (Q, I ) = Q ⋆ (J ⋆ ) M. Arenas – Exchanging more than Complete Data - RR2011 16 / 68
  • 30. Query rewriting over the canonical universal solution Theorem (FKMP03) Given a mapping M specified by st-tgds and a union of conjunctive queries Q, there exists a query Q ⋆ such that for every source instance I with canonical universal solution J ⋆ : certainM (Q, I ) = Q ⋆ (J ⋆ ) M. Arenas – Exchanging more than Complete Data - RR2011 17 / 68
  • 31. Query rewriting over the canonical universal solution Theorem (FKMP03) Given a mapping M specified by st-tgds and a union of conjunctive queries Q, there exists a query Q ⋆ such that for every source instance I with canonical universal solution J ⋆ : certainM (Q, I ) = Q ⋆ (J ⋆ ) Proof idea: Assume that C(a) holds whenever a is a constant. Then: Q ⋆ (x1 , . . . , xm ) = C(x1 ) ∧ · · · ∧ C(xm ) ∧ Q(x1 , . . . , xm ) M. Arenas – Exchanging more than Complete Data - RR2011 17 / 68
  • 32. Computing certain answers: Complexity Data complexity: Data exchange setting and query are considered to be fixed. Corollary (FKMP03) For mappings given by st-tgds, certain answers for UCQ can be computed in polynomial time (data complexity) M. Arenas – Exchanging more than Complete Data - RR2011 18 / 68
  • 33. Relational data exchange: Some lessons learned Key steps in the development of the area: ◮ Definition of schema mappings: Precise syntax and semantics ◮ Definition of the notion of solution ◮ Identification of good solutions ◮ Polynomial time algorithms for materializing good solutions ◮ Definition of target queries: Precise semantics ◮ Polynomial time algorithms for computing certain answers for UCQ M. Arenas – Exchanging more than Complete Data - RR2011 19 / 68
  • 34. Relational data exchange: Some lessons learned Key steps in the development of the area: ◮ Definition of schema mappings: Precise syntax and semantics ◮ Definition of the notion of solution ◮ Identification of good solutions ◮ Polynomial time algorithms for materializing good solutions ◮ Definition of target queries: Precise semantics ◮ Polynomial time algorithms for computing certain answers for UCQ Creating schema mappings is a time consuming and expensive process ◮ Manual or semi-automatic process in general M. Arenas – Exchanging more than Complete Data - RR2011 19 / 68
  • 35. Outline: First part ◮ The data exchange problem ◮ Some fundamental results in relational data exchange ◮ The need for a more general data exchange framework ◮ Two important scenarios: Incomplete databases (open-world databases: RDF) and knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 20 / 68
  • 36. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 37. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 38. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU ? M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 39. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU ? We need some operators for schema mappings M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 40. Ongoing project: Reusing schema mappings S T U ΣST ΣTU ΣSU = ΣST ◦ ΣTU We need some operators for schema mappings ◮ Composition in the above case M. Arenas – Exchanging more than Complete Data - RR2011 21 / 68
  • 41. Metadata management Contributions mentioned in the previous slides are just a first step towards the development of a general framework for data exchange. In fact, as pointed in [B03], many information system problems involve not only the design and integration of complex application artifacts, but also their subsequent manipulation. M. Arenas – Exchanging more than Complete Data - RR2011 22 / 68
  • 42. Metadata management This has motivated the need for the development of a general infrastructure for managing schema mappings. The problem of managing schema mappings is called metadata management. High-level algebraic operators, such as compose, are used to manipulate mappings. ◮ What other operators are needed? M. Arenas – Exchanging more than Complete Data - RR2011 23 / 68
  • 43. An inverse operator is also needed S T U ΣST ΣTU ΣVS =ΣVS ? Σ−1 SV ΣSV Σ−1 ◦ ΣST VS (Σ−1 ◦ ΣST ) ◦ ΣTU VS V M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68
  • 44. An inverse operator is also needed S T U ΣST ΣTU ΣVS =ΣVS ? ΣSV −1 ΣSV Σ−1 ◦ ΣST VS (ΣVS ◦ ΣST ) ◦ ΣTU −1 V M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68
  • 45. An inverse operator is also needed S T U ΣST ΣTU ΣVS = Σ−1 SV ΣVS ? ΣSV Σ−1 ◦ ΣST VS (Σ−1 ◦ ΣST ) ◦ ΣTU VS V M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68
  • 46. An inverse operator is also needed S T U ΣST ΣTU ΣVS = Σ−1 SV ΣSV Σ−1 ◦ ΣST VS (Σ−1 ◦ ΣST ) ◦ ΣTU VS V M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68
  • 47. An inverse operator is also needed S T U ΣST ΣTU ΣVS = Σ−1 SV ΣSV Σ−1 ◦ ΣST SV (Σ−1 ◦ ΣST ) ◦ ΣTU VS V Composition and inverse operators have to be combined M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68
  • 48. An inverse operator is also needed S T U ΣST ΣTU ΣVS = Σ−1 SV ΣSV Σ−1 ◦ ΣST SV (Σ−1 ◦ ΣST ) ◦ ΣTU SV V Composition and inverse operators have to be combined M. Arenas – Exchanging more than Complete Data - RR2011 24 / 68
  • 49. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68
  • 50. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. ◮ Key observation: A target instance of a mapping can be the source instance of another mapping M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68
  • 51. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. ◮ Key observation: A target instance of a mapping can be the source instance of another mapping ◮ Sources instances may contain null values M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68
  • 52. Metadata management: A more general data exchange framework is needed Composition and inverse operators have been extensively studied in the relational world. ◮ Semantics, computation, . . . Combining these operators is an open issue. ◮ Key observation: A target instance of a mapping can be the source instance of another mapping ◮ Sources instances may contain null values There is a need for a data exchange framework that can handle databases with incomplete information. M. Arenas – Exchanging more than Complete Data - RR2011 25 / 68
  • 53. Data exchange in the RDF world There is an increasing interest in publishing relational data as RDF ◮ Resulted in the creation of the W3C RDB2RDF Working Group The problem of translating relational data into RDF can be seen as a data exchange problem ◮ Schema mappings can be used to describe how the relational data is to be mapped into RDF M. Arenas – Exchanging more than Complete Data - RR2011 26 / 68
  • 54. Data exchange in the RDF world There is an increasing interest in publishing relational data as RDF ◮ Resulted in the creation of the W3C RDB2RDF Working Group The problem of translating relational data into RDF can be seen as a data exchange problem ◮ Schema mappings can be used to describe how the relational data is to be mapped into RDF But there is a mismatch here: A relational database under a closed-world semantics is to be translated into an RDF graph under an open-world semantics ◮ There is a need for a data exchange framework that can handle both databases with complete and incomplete information M. Arenas – Exchanging more than Complete Data - RR2011 26 / 68
  • 55. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 56. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us Question to answer: Is a mapping invertible? M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 57. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us Question to answer: Is a mapping invertible? ◮ This time an RDF graph is to be translated into a relational database! M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 58. Data exchange in the RDF world An issue discussed at the W3C RDB2RDF Working Group: Is a mapping information preserving? ◮ In particular: For the default mapping defined by this group How can we address this issue? ◮ Metadata management can help us Question to answer: Is a mapping invertible? ◮ This time an RDF graph is to be translated into a relational database! ◮ We want to have a unifying framework for all these cases M. Arenas – Exchanging more than Complete Data - RR2011 27 / 68
  • 59. But these are not the only reasons . . . Nowadays several applications use knowledge bases to represent data. ◮ A knowledge base has not only data but also rules that allows to infer new data ◮ In the Semantics Web: RDFS and OWL ontologies M. Arenas – Exchanging more than Complete Data - RR2011 28 / 68
  • 60. But these are not the only reasons . . . Nowadays several applications use knowledge bases to represent data. ◮ A knowledge base has not only data but also rules that allows to infer new data ◮ In the Semantics Web: RDFS and OWL ontologies In a data exchange application over the Semantics Web: The input is a mapping and a source specification including data and rules, and the output is a target specification also including data and rules M. Arenas – Exchanging more than Complete Data - RR2011 28 / 68
  • 61. But these are not the only reasons . . . Nowadays several applications use knowledge bases to represent data. ◮ A knowledge base has not only data but also rules that allows to infer new data ◮ In the Semantics Web: RDFS and OWL ontologies In a data exchange application over the Semantics Web: The input is a mapping and a source specification including data and rules, and the output is a target specification also including data and rules There is a need for a data exchange framework that can handle knowledge bases. M. Arenas – Exchanging more than Complete Data - RR2011 28 / 68
  • 62. Knowledge exchange: A more general data exchange framework is needed Example Assume given the following source knowledge base: Data: Father Mother Andy Bob Carrie Bob Bob Danny Danny Eddie Rules: Father(x, y ) → Parent(x, y ) Mother(x, y ) → Parent(x, y ) Parent(x, y ) ∧ Parent(y , z) → Grandparent(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 29 / 68
  • 63. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Given a mapping: Father(x, y ) → F(x, y ) Grandparent(x, y ) → G(x, y ) What is a good translation of the initial knowledge base? M. Arenas – Exchanging more than Complete Data - RR2011 30 / 68
  • 64. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Given a mapping: Father(x, y ) → F(x, y ) Grandparent(x, y ) → G(x, y ) What is a good translation of the initial knowledge base? Data: F G Andy Bob Andy Danny Bob Danny Carrie Danny Danny Eddie Bob Eddie Rules: ∅ M. Arenas – Exchanging more than Complete Data - RR2011 30 / 68
  • 65. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: Father(x, y ) → Parent(x, y ) Mother(x, y ) → Parent(x, y ) Parent(x, y ) ∧ Parent(y , z) → Grandparent(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 66. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: Mother(x, y ) → Parent(x, y ) Parent(x, y ) ∧ Parent(y , z) → Grandparent(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 67. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: Parent(x, y ) ∧ Parent(y , z) → Grandparent(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 68. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y ) ∧ F(y , z) → G(x, z) M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 69. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y ) ∧ F(y , z) → G(x, z) What data should we materialize? M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 70. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y ) ∧ F(y , z) → G(x, z) What data should we materialize? F G Andy Bob Andy Danny Bob Danny Carrie Danny Danny Eddie Bob Eddie M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 71. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y ) ∧ F(y , z) → G(x, z) What data should we materialize? F G Andy Bob Bob Danny Carrie Danny Danny Eddie M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 72. Knowledge exchange: A more general data exchange framework is needed Example (cont’d) Our first alternative does not include any translation of the source rules: F(x, y ) ∧ F(y , z) → G(x, z) What data should we materialize? F G Andy Bob Bob Danny Carrie Danny Danny Eddie Is this a good translation? Why? M. Arenas – Exchanging more than Complete Data - RR2011 31 / 68
  • 73. One can exchange more than complete data ◮ In data exchange one starts with a database instance (with complete information). ◮ What if we have an initial object that has several interpretations? ◮ A representation of a set of possible instances ◮ We propose a new general formalism to exchange representations of possible instances ◮ We apply it to the problems of exchanging instances with incomplete information and exchanging knowledge bases M. Arenas – Exchanging more than Complete Data - RR2011 32 / 68
  • 74. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 77. Representation systems A representation system R = (W, rep) consists of: ◮ a set W of representatives ◮ a function rep that assigns a set of instances to every element in W: rep(V) = {I_1, I_2, I_3, ...} for every V ∈ W Uniformity assumption: For every V ∈ W, there exists a relational schema S (the type of V) such that rep(V) ⊆ Inst(S) Incomplete instances and knowledge bases are representation systems
  • 79. In classical data exchange we consider only complete data Recall that given M = (S, T, Σ), I ∈ Inst(S) and J ∈ Inst(T): J is a solution for I under M if (I, J) |= Σ (notation: J ∈ Sol_M(I)) This can be extended to sets of instances. Given X ⊆ Inst(S): Sol_M(X) = ⋃_{I ∈ X} Sol_M(I)
  • 82. Extending the definition to representation systems Given: ◮ a mapping M = (S, T, Σ) ◮ a representation system R = (W, rep) ◮ U, V ∈ W of types S and T, respectively Definition (APR11) V is an R-solution of U under M if rep(V) ⊆ Sol_M(rep(U)) Or equivalently: V is an R-solution of U if for every J ∈ rep(V), there exists I ∈ rep(U) such that J ∈ Sol_M(I).
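The equivalent formulation of an R-solution can be checked directly when both rep sets are finite. The sketch below is only an illustration under that assumption (in general rep(U) and rep(V) are infinite); the names `is_r_solution` and `copy_manager_to_reports`, the fact encoding, and the sample data are hypothetical and not taken from the slides.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Fact = Tuple[str, Tuple[str, ...]]   # (relation name, arguments)
Instance = FrozenSet[Fact]


def is_r_solution(
    rep_u: Iterable[Instance],
    rep_v: Iterable[Instance],
    is_solution: Callable[[Instance, Instance], bool],
) -> bool:
    """For every J in rep(V) there must be some I in rep(U) with J in Sol_M(I)."""
    rep_u = list(rep_u)
    return all(any(is_solution(i, j) for i in rep_u) for j in rep_v)


def copy_manager_to_reports(i: Instance, j: Instance) -> bool:
    """Assumed toy mapping with the single st-tgd Manager(x, y) -> Reports(x, y)."""
    return all(("Reports", args) in j for rel, args in i if rel == "Manager")


# Hypothetical finite rep sets.
rep_u = [frozenset({("Manager", ("a", "b"))}),
         frozenset({("Manager", ("a", "c"))})]
rep_v = [frozenset({("Reports", ("a", "b"))}),
         frozenset({("Reports", ("a", "c")), ("Reports", ("d", "d"))})]

print(is_r_solution(rep_u, rep_v, copy_manager_to_reports))
```

A universal R-solution would additionally require the reverse inclusion, which cannot be tested by enumeration because Sol_M(I) is closed under adding target facts.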
  • 84. Universal solutions What is a good solution in this framework? Definition (APR11) V is a universal R-solution of U under M if rep(V) = Sol_M(rep(U))
  • 90. Strong representation systems Let C be a class of mappings. Definition (APR11) R = (W, rep) is a strong representation system for C if for every M ∈ C from S to T, and for every U ∈ W of type S, there exists a V ∈ W of type T such that rep(V) = Sol_M(rep(U)) If R = (W, rep) is a strong representation system, then the universal solutions for the representatives in W can be represented in the same system.
  • 91. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 92. Motivating questions What is a strong representation system for the class of mappings specified by st-tgds? ◮ Are instances including nulls enough? Can the fundamental data exchange problems be solved in polynomial time in this setting? ◮ Computing (universal) solutions ◮ Computing certain answers M. Arenas – Exchanging more than Complete Data - RR2011 41 / 68
  • 94. Naive instances We have already considered naive instances: Instances with null values ◮ Example: Canonical universal solution A naive instance I has labeled nulls: R(1, n_1), R(n_1, 2), R(1, n_2) The interpretations of I are constructed by replacing nulls by constants: rep(I) = {K | µ(I) ⊆ K for some valuation µ}
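As a rough illustration of the definition of rep for naive instances, the following sketch tests whether a complete instance K belongs to rep(I) by brute force over valuations. The `Null` class, the function names, and the two candidate instances are assumptions made for this example. Only constants occurring in K need to be tried, since every null occurs in some tuple that must land inside K.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A labeled null such as n_1."""
    name: str


def apply_valuation(valuation, instance):
    """Replace every null by the constant the valuation assigns to it."""
    return {(rel, tuple(valuation.get(a, a) for a in args))
            for rel, args in instance}


def in_rep(naive, k):
    """True iff mu(naive) is a subset of k for some valuation mu."""
    nulls = sorted({a for _, args in naive for a in args
                    if isinstance(a, Null)}, key=lambda n: n.name)
    constants = sorted({a for _, args in k for a in args})
    for values in product(constants, repeat=len(nulls)):
        if apply_valuation(dict(zip(nulls, values)), naive) <= k:
            return True
    return False


n1, n2 = Null("n1"), Null("n2")
naive = {("R", (1, n1)), ("R", (n1, 2)), ("R", (1, n2))}

print(in_rep(naive, {("R", (1, 3)), ("R", (3, 2))}))  # n1 = n2 = 3 works
print(in_rep(naive, {("R", (1, 3)), ("R", (4, 2))}))  # no valuation works
```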
  • 95. Are naive instances expressive enough? Naive instances have been extensively used in data exchange: Proposition (FKMP03) Let M = (S, T, Σ), where Σ is a set of st-tgds. Then for every instance I of S, there exists a naive instance J of T such that: rep(J) = Sol_M(I) In fact, the canonical universal solution satisfies the property mentioned above.
  • 96. Are naive instances expressive enough? But naive instances are not expressive enough to deal with incomplete information in the source instances: Proposition (APR11) Naive instances are not a strong representation system for the class of mappings specified by st-tgds M. Arenas – Exchanging more than Complete Data - RR2011 44 / 68
  • 97. Are naive instances expressive enough? Example Consider a mapping M specified by: Manager(x, y) → Reports(x, y) and Manager(x, x) → SelfManager(x) The canonical universal solution for I = {Manager(n, Peter)} under M: J = {Reports(n, Peter)} But J is not a good solution for I. ◮ It cannot represent the fact that if n is given value Peter, then SelfManager(Peter) should hold in the target.
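The failure in this example can be made concrete. The sketch below takes the complete target instance K = {Reports(Peter, Peter)}, which lies in rep(J) via the valuation n ↦ Peter, and checks that K is not a solution for any interpretation {Manager(c, Peter)} of I. The function name and the finite list of candidate constants are assumptions for illustration; larger source instances in rep(I) only add requirements, so checking the minimal ones suffices.

```python
def is_solution(source, target):
    """Check (source, target) |= Σ for the two st-tgds of the example:
    Manager(x, y) -> Reports(x, y) and Manager(x, x) -> SelfManager(x)."""
    for rel, args in source:
        if rel != "Manager":
            continue
        x, y = args
        if ("Reports", (x, y)) not in target:
            return False
        if x == y and ("SelfManager", (x,)) not in target:
            return False
    return True


# K is an interpretation of J = {Reports(n, Peter)} with n mapped to Peter.
k = {("Reports", ("Peter", "Peter"))}

# Hypothetical constants that the null n of the source could take.
candidates = ["Peter", "Ann", "Bob"]

justified = any(is_solution({("Manager", (c, "Peter"))}, k) for c in candidates)
print(justified)  # K is justified by no interpretation of I

# Adding the missing fact repairs the case n = Peter.
k_fixed = k | {("SelfManager", ("Peter",))}
print(is_solution({("Manager", ("Peter", "Peter"))}, k_fixed))
```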
  • 100. Conditional instances What should be added to naive instances to obtain a strong representation system? ◮ Answer from database theory: Conditions on the nulls Conditional instances: Naive instances plus tuple conditions A tuple condition is a positive Boolean combination of: ◮ equalities and inequalities between nulls, and between nulls and constants
  • 107. Conditional instances Example: a conditional instance with two tuples, each paired with its condition: R(1, n_1) with condition n_1 = n_2; R(n_1, n_2) with condition n_1 ≠ n_2 ∨ n_2 = 2. Semantics: each valuation keeps exactly the tuples whose condition it satisfies: µ(n_1) = µ(n_2) = 2 yields {R(1, 2), R(2, 2)}; µ(n_1) = µ(n_2) = 3 yields {R(1, 3)}; µ(n_1) = 2, µ(n_2) = 3 yields {R(2, 3)}. Interpretations of a conditional instance I: rep(I) = {K | µ(I) ⊆ K for some valuation µ}
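The three valuations of the example can be replayed with a few lines of code. This is a minimal sketch: nulls are encoded as the strings "n1" and "n2", conditions as Python predicates over a valuation, and the second condition is taken to be n_1 ≠ n_2 ∨ n_2 = 2, which is the reading consistent with the three results listed on the slide. All names are illustrative.

```python
# Each entry pairs a tuple with its condition.
conditional_instance = [
    (("R", (1, "n1")), lambda v: v["n1"] == v["n2"]),
    (("R", ("n1", "n2")), lambda v: v["n1"] != v["n2"] or v["n2"] == 2),
]


def apply_valuation(cinst, valuation):
    """Keep the tuples whose condition holds, then replace nulls by constants."""
    return {
        (rel, tuple(valuation.get(a, a) for a in args))
        for (rel, args), condition in cinst
        if condition(valuation)
    }


for valuation in ({"n1": 2, "n2": 2}, {"n1": 3, "n2": 3}, {"n1": 2, "n2": 3}):
    print(valuation, sorted(apply_valuation(conditional_instance, valuation)))
```

Every superset of one of these results is then an interpretation of the conditional instance.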
  • 108. Positive conditional instances Many problems are intractable over conditional instances. ◮ We also consider a restricted class of conditional instances Positive conditional instances: Conditional instances without inequalities M. Arenas – Exchanging more than Complete Data - RR2011 48 / 68
  • 109. (Positive) conditional instances are enough Theorem (APR11) Both conditional instances and positive conditional instances are strong representation systems for the class of mappings specified by st-tgds. Example Consider again the mapping M specified by: Manager(x, y) → Reports(x, y) and Manager(x, x) → SelfManager(x) The following is a universal solution for I = {Manager(n, Peter)}: Reports(n, Peter) with condition true; SelfManager(Peter) with condition n = Peter
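One way to see where the condition n = Peter comes from is to match each rule premise against the source tuple and record the equalities that the match forces. The sketch below does this only for st-tgds with a single premise atom, variable-only arguments, and no existential variables; it is an illustrative simplification and not the algorithm of [APR11]. The `Null` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    name: str


def match(premise_args, fact_args):
    """Bind rule variables to fact values; a repeated variable forces an equality.
    Returns (binding, equalities), or None if two distinct constants clash."""
    binding, equalities = {}, []
    for var, value in zip(premise_args, fact_args):
        if var not in binding:
            binding[var] = value
            continue
        bound = binding[var]
        if bound == value:
            continue
        if not isinstance(bound, Null) and not isinstance(value, Null):
            return None                    # distinct constants can never be equal
        equalities.append((bound, value))
        if isinstance(bound, Null):
            binding[var] = value           # continue with the other representative
    return binding, equalities


def exchange(rules, source):
    """Produce (target tuple, list of required equalities) pairs."""
    result = []
    for (prel, pargs), (crel, cargs) in rules:
        for rel, args in source:
            if rel != prel or len(args) != len(pargs):
                continue
            matched = match(pargs, args)
            if matched is not None:
                binding, equalities = matched
                result.append(((crel, tuple(binding[v] for v in cargs)), equalities))
    return result


n = Null("n")
rules = [(("Manager", ("x", "y")), ("Reports", ("x", "y"))),
         (("Manager", ("x", "x")), ("SelfManager", ("x",)))]
for tup, condition in exchange(rules, [("Manager", (n, "Peter"))]):
    print(tup, condition or "true")
```

An empty list of equalities corresponds to the condition true; a non-empty list is read as the conjunction of its equalities.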
  • 110. Positive conditional instances are exactly the needed representation system Positive conditional instances are minimal: Theorem (APR11) All the following are needed to obtain a strong representation system for the class of mappings specified by st-tgds: ◮ equalities between nulls ◮ equalities between constants and nulls ◮ conjunctions and disjunctions Conditional instances are enough but not minimal.
  • 113. Positive conditional instances can be used in practice! Let M = (S, T, Σ), where Σ is a set of st-tgds. Theorem (APR11) There exists a polynomial time algorithm that, given a positive conditional instance I over S, computes a positive conditional instance 𝒥 over T that is a universal solution for I under M. Let Q be a union of conjunctive queries over T. Q(𝒥) = ⋂_{J ∈ rep(𝒥)} Q(J), and certain_M(Q, I) = ⋂ Q(𝒥), where the intersection ranges over all 𝒥 that are solutions for I under M
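Certain answers keep only the tuples that a query returns on every possible instance. The sketch below computes that intersection over a finite, explicitly listed family of instances; the real sets rep(𝒥) are infinite, so this only illustrates the definition and is not the polynomial time algorithm of the theorem. The query and the sample data are hypothetical.

```python
def certain_answers(query, instances):
    """Intersect the answers of `query` over a non-empty finite family of instances."""
    answers = [set(query(inst)) for inst in instances]
    if not answers:
        raise ValueError("need at least one instance")
    return set.intersection(*answers)


def who_reports(instance):
    """Conjunctive query Q(x) :- Reports(x, y)."""
    return {(args[0],) for rel, args in instance if rel == "Reports"}


# Hypothetical family of complete target instances.
family = [
    {("Reports", ("a", "b")), ("Reports", ("c", "d"))},
    {("Reports", ("a", "b"))},
]
print(certain_answers(who_reports, family))
```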
  • 115. Positive conditional instances can be used in practice! Theorem (APR11) There exists a polynomial time algorithm that, given a positive conditional instance I over S, computes certain_M(Q, I). The same result holds for the class of unions of conjunctive queries with at most one inequality per disjunct. ◮ The other important class of queries in the data exchange area for which certain answers can be computed in polynomial time
  • 116. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 117. The semantics of knowledge bases is given by sets of instances Knowledge base over S: (I , Γ) such that ◮ I ∈ Inst(S) ◮ Γ a set of rules over S Semantics: finite models Mod(I , Γ) = {K ∈ Inst(S) | I ⊆ K and K |= Γ} M. Arenas – Exchanging more than Complete Data - RR2011 54 / 68
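When Γ consists of full tgds (no existential quantifiers), Mod(I, Γ) has a least element under ⊆, and it can be computed by applying the rules until nothing changes. The sketch below is a naive fixpoint evaluator under that assumption, with rule atoms containing variables only; the function names and the sample facts are illustrative and not taken from the slides.

```python
def matches(atoms, facts, binding=None):
    """Yield every variable binding that maps all body atoms into `facts`."""
    binding = binding or {}
    if not atoms:
        yield binding
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, fact_args in facts:
        if fact_rel != rel or len(fact_args) != len(args):
            continue
        extended = dict(binding)
        # setdefault binds a fresh variable or returns the value already bound.
        if all(extended.setdefault(a, v) == v for a, v in zip(args, fact_args)):
            yield from matches(rest, facts, extended)


def least_model(instance, rules):
    """Smallest K with instance ⊆ K and K |= rules (full tgds only)."""
    model = set(instance)
    changed = True
    while changed:
        changed = False
        for body, (head_rel, head_args) in rules:
            for b in list(matches(body, model)):
                fact = (head_rel, tuple(b[a] for a in head_args))
                if fact not in model:
                    model.add(fact)
                    changed = True
    return model


# Rule F(x, y) ∧ F(y, z) → G(x, z) over hypothetical facts.
rules = [([("F", ("x", "y")), ("F", ("y", "z"))], ("G", ("x", "z")))]
facts = {("F", ("Andy", "Bob")), ("F", ("Bob", "Danny")), ("F", ("Danny", "Eddie"))}
print(sorted(least_model(facts, rules)))
```

Every other element of Mod(I, Γ) is a superset of this least model that is itself closed under the rules.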
  • 118. We can apply our formalism to knowledge bases (I_2, Γ_2) is a KB-solution for (I_1, Γ_1) under M if: Mod(I_2, Γ_2) ⊆ Sol_M(Mod(I_1, Γ_1)) (I_2, Γ_2) is a universal KB-solution for (I_1, Γ_1) under M if: Mod(I_2, Γ_2) = Sol_M(Mod(I_1, Γ_1))
  • 119. Motivating questions Same as for the case of instances with incomplete information. ◮ Constructing universal KB-solutions ◮ Answering target queries New fundamental problem: Construct solutions including as much implicit knowledge as possible. M. Arenas – Exchanging more than Complete Data - RR2011 56 / 68
  • 121. What are good knowledge-base solutions? First alternative: universal KB-solutions But there exist some other KB-solutions desirable to materialize ◮ Minimality comes into play Given sets X, Y of instances: ◮ X ≡_min Y if X and Y coincide in the minimal instances under ⊆ Definition (I_2, Γ_2) is a minimal KB-solution of (I_1, Γ_1) under M if: Mod(I_2, Γ_2) ≡_min Sol_M(Mod(I_1, Γ_1))
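For finite families of instances, the relation ≡_min can be tested by extracting the ⊆-minimal members of each family and comparing the results. This is a small sketch under that finiteness assumption; the function names and the sample sets are hypothetical.

```python
def minimal_instances(instances):
    """Members of the family that have no proper subset in the family."""
    family = {frozenset(inst) for inst in instances}
    return {i for i in family if not any(j < i for j in family)}


def equiv_min(xs, ys):
    """X ≡_min Y: both families have the same ⊆-minimal instances."""
    return minimal_instances(xs) == minimal_instances(ys)


x = [{"a"}, {"a", "b"}]
y = [{"a"}, {"a", "c"}, {"a", "b", "c"}]
z = [{"b"}]
print(equiv_min(x, y), equiv_min(x, z))
```

Here x and y differ as sets but share the single minimal instance {a}, which is the sense in which a minimal KB-solution may drop non-minimal models.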
  • 124. Two requirements to construct minimal knowledge-base solutions Given (I_1, Γ_1) and M, when constructing a minimal KB-solution (I_2, Γ_2) we would like: 1. Γ_2 to only depend on Γ_1 and M: Γ_2 is safe for Γ_1 and M Definition Γ_2 is safe for Γ_1 and M if for every I_1 there exists I_2 such that (I_2, Γ_2) is a minimal KB-solution of (I_1, Γ_1) under M
  • 126. Two requirements to construct minimal knowledge-base solutions 2. Γ_2 to be as informative as possible (thus minimizing the size of I_2): Definition Γ_2 is optimal-safe if for every other safe set Γ′: Γ_2 |= Γ′
  • 128. Computing minimal KB-solutions To obtain algorithms for computing minimal KB-solutions, we need to specify the language used in knowledge bases. ◮ Full st-tgd: ∀x̄ ∀ȳ (ϕ(x̄, ȳ) → ψ(x̄)) Theorem (APR11) There exists a polynomial-time algorithm that, given M = (S, T, Σ), where Σ is a set of full st-tgds, and given a set Γ_1 of full tgds over S, computes a set Γ_2 of second-order logic sentences over T that is optimal-safe for Γ_1 and M.
  • 131. Computing minimal KB-solutions Unfortunately, first-order logic is not expressive enough. Theorem (APR11) There exist M = (S, T, Σ), where Σ is a set of full st-tgds, and a set Γ_1 of full tgds over S such that no FO-sentence is optimal-safe for Γ_1 and M. How can we deal with these problems in practice? ◮ We need to restrict the language used to specify knowledge bases: Description logics [ABC11]
  • 132. Outline: Second part ◮ Formalism for exchanging representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases ◮ Concluding remarks
  • 135. We can exchange more than complete data We propose a general formalism to exchange representation systems ◮ Applications to incomplete instances ◮ Applications to knowledge bases Next step: Apply our general setting to the Semantic Web ◮ Semantic Web data has nulls (blank nodes) ◮ Semantic Web specifications have rules (RDFS, OWL) Lots of interesting problems to solve if knowledge bases are specified by means of description logics. ◮ Better results can be obtained M. Arenas – Exchanging more than Complete Data - RR2011 63 / 68
  • 136. Thank you! M. Arenas – Exchanging more than Complete Data - RR2011 64 / 68
  • 137. Bibliography [ABC11] M. Arenas, E. Botoeva, D. Calvanese. Knowledge Base Exchange. DL 2011. [APR11] M. Arenas, J. Pérez, J. Reutter. Data Exchange beyond Complete Data. PODS 2011. [B03] P. A. Bernstein. Applying Model Management to Classical Meta Data Problems. CIDR 2003. [FKMP03] R. Fagin, P. G. Kolaitis, R. J. Miller, L. Popa. Data Exchange: Semantics and Query Answering. ICDT 2003.
  • 139. Bonus track: Computation of solutions and its associated decision problem Decision problem: Check-KB-Sol Input: M = (S, T, Σ), where Σ is a set of st-tgds; (I_1, Γ_1), a KB over S with Γ_1 a set of tgds; (I_2, Γ_2), a KB over T with Γ_2 a set of tgds Output: Is (I_2, Γ_2) a KB-solution of (I_1, Γ_1) under M? Theorem (APR11) Check-KB-Sol is undecidable (even for a fixed M).
  • 140. Bonus track: Computation of solutions and its associated decision problem Undecidability is a consequence of using ∃ in knowledge bases. ◮ We need to restrict the input Check-Full-KB-Sol: Γ_1, Γ_2 are assumed to be sets of full tgds
  • 142. Bonus track: Computation of solutions and its associated decision problem Theorem (APR11) Check-Full-KB-Sol is EXPTIME-complete. Theorem (APR11) If M = (S, T, Σ) is fixed: Check-Full-KB-Sol is Δ_2^P[O(log n)]-complete. Δ_2^P[O(log n)]: P^NP with a logarithmic number of calls to the NP oracle.