An Evolutionary Perspective on
 Approximate RDF Query Answering



Christophe Guéret, Eyal Oren, Stefan Schlobach,
     Frank van Harmelen and Martijn Schut



         Vrije Universiteit, Amsterdam
Problem and context          Method proposed   Experimental results   Conclusion



SUM 2008 - October 2, 2008                                                 2 / 24



 The next 30 minutes in 4 points...

              RDF
                      Data on the Web
                      Inconsistent, uncertain, heterogeneous, huge and growing!

              RDF Query answering
                      Finding data that matches given criteria
                      ... but many queries are actually not satisfiable

              Approximate RDF Query answering
                      Finding some, almost valid, data

              The Evolutionary Perspective
                      Test different solutions
                      Progressive optimisation of the result

       1    What’s the problem?
              Querying RDF datastores
              Standard techniques

       2    And Now for Something Completely Different
              Guessing the solution instead
              The way we do it

       3    Does it work?
              Evolution of the quality
              Some characteristics of this method

       4    TODO list

 Example

              RDF dataset
              <Ullman88> type Book .
              <Ullman88> label "Principles of Database and
                  Knowledge-Base Systems" .
              <Ullman88> author b1 .
              b1 _1 ullman .
              ullman homepage <http://stanford.edu/~ullman/> .

              SPARQL query
              SELECT ?title WHERE {
                  ?publication type Book .
                  ?publication label ?title .
              }

              Expected answer
              ?title = "Principles of Database and Knowledge-Base Systems"

 Problem description

              Triple = subject, predicate, object
              Dataset = graph of triples
              Querying: find a pattern in the graph

              [Figure: A query and a graph [PSPARQL07]]



 Standard techniques

              Standard approach:
                 1    Find all the possible results for ?publication type Book
                              ?publication
                              <Ullman88>
                 2    Find all the possible results for ?publication label ?title
                              ?publication          ?title
                              <Ullman88>            "Principles of ..."
                 3    Join the two tables and return the result
                              ?title = "Principles of ..."

              Fast thanks to indexes and query optimisation
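
The three steps of the standard approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory set `TRIPLES` holds a subset of the example triples, and the helper names `is_var`, `match_pattern` and `join` are hypothetical, not taken from any RDF library. A real store would use indexes instead of the full scan shown here.

```python
# Toy in-memory triple store holding a subset of the example triples.
TRIPLES = {
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of Database and Knowledge-Base Systems"),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
}


def is_var(term):
    """Variables are written SPARQL-style, with a leading question mark."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Steps 1 and 2: build the binding table for a single triple pattern."""
    table = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term, t_term) != t_term:
                    break  # same variable would get two different values
            elif p_term != t_term:
                break  # a constant of the pattern does not match
        else:
            table.append(binding)
    return table


def join(left, right):
    """Step 3: keep pairs of rows that agree on their shared variables."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[v] == r_row[v] for v in shared):
                result.append({**l_row, **r_row})
    return result


step1 = match_pattern(("?publication", "type", "Book"), TRIPLES)
step2 = match_pattern(("?publication", "label", "?title"), TRIPLES)
answers = [row["?title"] for row in join(step1, step2)]
print(answers)
```

Note that both lookups must finish before the join can produce anything, which is why this style of evaluation returns nothing when any pattern has no exact match.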



 Motivation
              Standard techniques are:
                      Designed to return results only when exact matches exist
                      Not designed for incomplete or approximate queries and answers
                      Hard to distribute

              Approximate answers to precise queries
                      If the query is unsatisfiable, return the best almost-satisfying
                      solution found

              Precise answers to approximate queries
                      Return a subset of the existing solutions instead of showing
                      them all

              Interactive querying
                      Use intermediate results to help the user improve the query



 Approach

              “I’m Feeling Lucky” approach:
                 1    Assign random values to the variables
                             ?publication     =    <Ullman88>
                             ?title           =    Book
                 2    Check whether the solution is valid
                             Triple                          Is in the graph?
                             <Ullman88> type Book                    yes
                             <Ullman88> label Book                   no
                 3    If the solution is valid, stop. Otherwise, try again with
                      other values

              Relies on membership testing (instead of lookup)
              The testing loop can be stopped at any time
              A result may satisfy only part of the query
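
A minimal sketch of this guess-and-test loop, assuming a plain Python set as the graph and drawing candidate values from the terms that occur in it. The names `instantiate`, `satisfied` and `feeling_lucky` are illustrative and do not come from the slides.

```python
import random

TRIPLES = {
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of Database and Knowledge-Base Systems"),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
}
PATTERNS = [("?publication", "type", "Book"), ("?publication", "label", "?title")]
VARIABLES = ["?publication", "?title"]
# Assumption: candidate values are all terms that appear in the dataset.
TERMS = sorted({term for triple in TRIPLES for term in triple})


def instantiate(pattern, assignment):
    """Replace every variable of the pattern by its assigned value."""
    return tuple(assignment.get(term, term) for term in pattern)


def satisfied(assignment):
    """Count the patterns whose instantiated triple is in the graph."""
    return sum(instantiate(p, assignment) in TRIPLES for p in PATTERNS)


def feeling_lucky(max_tries, rng):
    """Guess, test by membership, and remember the best guess seen so far."""
    best, best_score = None, -1
    for _ in range(max_tries):
        guess = {v: rng.choice(TERMS) for v in VARIABLES}  # step 1
        score = satisfied(guess)                           # step 2
        if score > best_score:
            best, best_score = guess, score
        if score == len(PATTERNS):                         # step 3
            break
    return best, best_score


best, score = feeling_lucky(max_tries=10_000, rng=random.Random(0))
print(best, score)
```

Because the best guess is kept at every iteration, the loop can be interrupted at any point and still return an assignment that satisfies part of the query.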



 Our choices

              Two aspects need attention
                 1    Each try should be a step closer to the solution
                             Random guessing may never end
                             Stopping the process at t + 1 should give better results
                             than at t
                 2    Testing a candidate solution must be fast
                             Many candidate solutions will be tried

              We made the following choices
                      Generation of solutions: evolutionary algorithm
                      Verification of solutions: Bloom-filter-based testing
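
To make the first requirement concrete, here is a hedged sketch of a generic (1+1) evolutionary scheme: one candidate is mutated at each step and the child replaces the parent only if it is not worse. This is not the authors' algorithm, whose details are not given on these slides. The fitness definition, the mutation operator and the use of a Python set in place of Bloom filters are all assumptions made for illustration.

```python
import random

TRIPLES = {
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of Database and Knowledge-Base Systems"),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
}
PATTERNS = [("?publication", "type", "Book"), ("?publication", "label", "?title")]
VARIABLES = ["?publication", "?title"]
TERMS = sorted({term for triple in TRIPLES for term in triple})


def fitness(candidate):
    """Fraction of query patterns satisfied by the candidate assignment."""
    hits = sum(
        tuple(candidate.get(term, term) for term in pattern) in TRIPLES
        for pattern in PATTERNS
    )
    return hits / len(PATTERNS)


def mutate(candidate, rng):
    """Copy the candidate and re-draw the value of one random variable."""
    child = dict(candidate)
    child[rng.choice(VARIABLES)] = rng.choice(TERMS)
    return child


def evolve(steps, rng):
    """(1+1) scheme: the kept candidate never gets worse over time."""
    parent = {v: rng.choice(TERMS) for v in VARIABLES}
    history = [fitness(parent)]
    for _ in range(steps):
        child = mutate(parent, rng)
        if fitness(child) >= fitness(parent):  # reject any worse child
            parent = child
        history.append(fitness(parent))
        if history[-1] == 1.0:  # every pattern is satisfied
            break
    return parent, history


solution, history = evolve(steps=5_000, rng=random.Random(1))
print(solution, history[-1])
```

The `history` list is non-decreasing by construction, so stopping at step t + 1 never yields a worse candidate than stopping at step t.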

 Binary Bloom filters (1/2)

              Compact representation of information: a set of n = 8 bits

                                1      2    3      4   5   6     7     8

              Supports two operations
                      INSERT(key): insert a key into the filter
                      CONTAINS(key): test for the presence of a key

              Use k = 3 hash functions to compute a set of bits from a key
                      HASH1(“Hello world”) = 8
                      HASH2(“Hello world”) = 6
                      HASH3(“Hello world”) = 3

 Binary Bloom filters (2/2)

              INSERT(“Hello world”)
                      Current OR “Hello world” = New
                      Bit-wise OR operation
                      Always successful (i.e. unlimited capacity)
                      Precision depends on the number of elements m

              CONTAINS(“Bonjour !”)
                      Current AND “Bonjour !” = Test result
                      Bit-wise AND operation
                      A positive result can be a collision
                      $p_{error} = (1 - e^{-km/n})^k$ for n bits, m inserted elements
                      and k hash functions
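
The two operations can be sketched as follows. This is a minimal illustration, assuming SHA-256 with a per-function salt as a stand-in for the k hash functions (the slides do not say which hashes are used) and a Python integer as the bit array. `BloomFilter` and `false_positive_rate` are hypothetical names. The error estimate uses the usual approximation with n bits, m inserted keys and k hash functions.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, n_bits, k_hashes):
        self.n_bits = n_bits
        self.k_hashes = k_hashes
        self.bits = 0  # Python int used as a bit array

    def _mask(self, key):
        """Bit mask with the (at most k) positions selected by the k hashes."""
        mask = 0
        for i in range(self.k_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % self.n_bits
            mask |= 1 << position
        return mask

    def insert(self, key):
        """Bit-wise OR: always succeeds, whatever the filter already holds."""
        self.bits |= self._mask(key)

    def contains(self, key):
        """Bit-wise AND: True may be a collision, False is always correct."""
        mask = self._mask(key)
        return (self.bits & mask) == mask


def false_positive_rate(n_bits, m_elements, k_hashes):
    """Approximate probability that CONTAINS wrongly answers True."""
    return (1 - math.exp(-k_hashes * m_elements / n_bits)) ** k_hashes


bf = BloomFilter(n_bits=1024, k_hashes=3)
bf.insert("Hello world")
print(bf.contains("Hello world"))  # True: an inserted key is always found
print(bf.contains("Bonjour !"))    # False unless the hashed bits collide
print(false_positive_rate(n_bits=8, m_elements=1, k_hashes=3))
```

The filter size of 1024 bits is an arbitrary choice for the demonstration; with the 8 bits of the slide, even a single inserted key already gives a noticeable error rate.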



 A first (naive) approach

              Insert all the triples into a single Bloom filter.
                      INSERT (“<Ullman88>_type_Book”)
                      INSERT (“<Ullman88>_label_"Principles of ..."”)
                      ...
              Use the CONTAINS operation to verify a solution
                      CONTAINS (“<Ullman88>_type_Book”) ⇒ true
                      CONTAINS (“<Ullman88>_label_Book”) ⇒ false

              Not the best approach! Let’s see what happens in detail...
                      Query clause:  ?publication label ?title
                      Check:         CONTAINS (“<Ullman88>_label_Book”)
                      On failure:    modify ?publication and ?title

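A minimal Python sketch of this naive approach follows. It is an illustration under stated assumptions, not the authors' implementation: the class name BloomFilter, the filter size, the number of hash functions and the SHA-256 based hashing are all hypothetical choices. Only the INSERT and CONTAINS operations and the subject_predicate_object key format come from the slide.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical sizes; not the authors' code)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, key):
        # False is definitive; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# The two example triples shown on the slide.
triples = [
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", '"Principles of ..."'),
]

naive = BloomFilter()
for s, p, o in triples:
    naive.insert(f"{s}_{p}_{o}")

print(naive.contains("<Ullman88>_type_Book"))   # key was inserted
print(naive.contains("<Ullman88>_label_Book"))  # key was never inserted
```

A negative answer is definitive, while a positive answer may be a false positive. A failed check on the whole triple also gives no indication of which variable, ?publication or ?title, holds the wrong value.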
 Graph parsing
              Every triple of the graph is inserted into 4 Bloom filters
                      Triple:  <Ullman88> type Book

                      filter    key inserted
                      SPO       <Ullman88>_type_Book
                      SP        <Ullman88>_type
                      PO        type_Book
                      SO        <Ullman88>_Book

              Three domains are defined
                      S = <Ullman88> b1 ullman
                      P = type label author _1 homepage
                      O = Book "Principles of ..." b1 ullman <http://...>

              Each term is replaced by an integer (with a dictionary)
                      <Ullman88> → 46

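The graph parsing step can be sketched as follows. This is a hedged illustration, not the authors' code: TermDictionary and index_graph are hypothetical names, plain Python sets stand in for the four Bloom filters, and the five example triples are an assumption reconstructed from the three domains listed on the slide. The integer identifiers are arbitrary, so they will not match the 46 shown on the slide.

```python
class TermDictionary:
    """Maps each RDF term to an integer and back (hypothetical helper)."""

    def __init__(self):
        self._ids = {}
        self._terms = []

    def encode(self, term):
        # Assign the next free integer the first time a term is seen.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, term_id):
        return self._terms[term_id]


def index_graph(triples):
    """Build the SPO, SP, PO and SO key sets and the S, P, O domains."""
    dictionary = TermDictionary()
    # Plain sets stand in for the four Bloom filters in this sketch.
    filters = {"spo": set(), "sp": set(), "po": set(), "so": set()}
    domains = {"s": set(), "p": set(), "o": set()}
    for s, p, o in triples:
        si, pi, oi = (dictionary.encode(t) for t in (s, p, o))
        filters["spo"].add((si, pi, oi))
        filters["sp"].add((si, pi))
        filters["po"].add((pi, oi))
        filters["so"].add((si, oi))
        domains["s"].add(si)
        domains["p"].add(pi)
        domains["o"].add(oi)
    return dictionary, filters, domains


# Assumed example graph, reconstructed from the domains on the slide.
triples = [
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", '"Principles of ..."'),
    ("<Ullman88>", "author", "_:b1"),
    ("_:b1", "_1", "dblp:ullman"),
    ("dblp:ullman", "homepage", "<http://...>"),
]

dictionary, filters, domains = index_graph(triples)
print({name: len(values) for name, values in domains.items()})
print(sorted(dictionary.decode(i) for i in domains["s"]))
```

Keeping separate SP, PO and SO filters lets a partially correct candidate be recognised, which a single filter over whole triples cannot do.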
 Evolutionary algorithm flowchart [Eiben2003]

              Set of populations + Set of operators

 Query parsing
              Definition of the chromosome for the individuals
                      ?publication1 ?publication2 ?title

              Creation of constraints to verify
                      Clause ?publication type Book .
                       bloom(spo | ?publication1 type Book)
                       bloom(sp  | ?publication1 type)
                       bloom(po  | type Book)      (removed because always true)
                      Clause ?publication label ?title .
                       bloom(spo | ?publication2 label ?title)
                       bloom(sp  | ?publication2 label)
                       bloom(po  | label ?title)
                       bloom(so  | ?publication2 ?title)
                      Equality constraint
                       equal(?publication1, ?publication2)

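The query parsing step can be sketched in Python as below. This is a hedged reading of the slide, not the authors' implementation: the function names and the tuple encoding of constraints are hypothetical. The sketch assumes that a variable appearing in several clauses gets one numbered copy per clause, tied together by equality constraints, and that any constraint without a variable is dropped. It also emits an so constraint for the first clause, which the slide does not list, so that part of the rule is an assumption.

```python
from collections import Counter


def is_var(term):
    return term.startswith("?")


def parse_query(clauses):
    """Return (chromosome, constraints) for a list of (s, p, o) patterns."""
    # Count in how many clauses each variable appears.
    usage = Counter(t for clause in clauses for t in set(clause) if is_var(t))
    seen = Counter()
    chromosome, constraints, copies = [], [], {}
    for clause in clauses:
        renamed = {}
        for term in clause:
            if is_var(term) and term not in renamed:
                if usage[term] > 1:
                    # Shared variable: one numbered copy per clause.
                    seen[term] += 1
                    renamed[term] = f"{term}{seen[term]}"
                    copies.setdefault(term, []).append(renamed[term])
                else:
                    renamed[term] = term
                if renamed[term] not in chromosome:
                    chromosome.append(renamed[term])
        s, p, o = (renamed.get(t, t) for t in clause)
        for kind, terms in (("spo", (s, p, o)), ("sp", (s, p)),
                            ("po", (p, o)), ("so", (s, o))):
            # A constraint without any variable cannot guide the search.
            if any(is_var(t) for t in terms):
                constraints.append(("bloom", kind, terms))
    # Copies of the same variable must end up with the same value.
    for names in copies.values():
        for a, b in zip(names, names[1:]):
            constraints.append(("equal", None, (a, b)))
    return chromosome, constraints


query = [("?publication", "type", "Book"),
         ("?publication", "label", "?title")]
chromosome, constraints = parse_query(query)
print(chromosome)
for constraint in constraints:
    print(constraint)
```

Splitting ?publication into two copies lets each clause be scored on its own, while the equality constraint rewards candidates that bind both copies to the same resource.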
 Evaluation of a candidate solution

              The solution is checked against all the constraints. For each
              constraint that is satisfied,
                      A global reward w is won
                      Each variable used is equally rewarded

              Rewards for: bloom(spo | ?publication2 label ?title)
                      reward(solution) += w
                      reward(?publication2) += w/2
                      reward(?title) += w/2

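The reward scheme can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the evaluate function, the tuple encoding of constraints and the use of sets in place of Bloom filters are all hypothetical, and w is set to 1 for the example. Each satisfied constraint adds w to the solution and splits another w equally among the variables it uses.

```python
def evaluate(candidate, constraints, filters, w=1.0):
    """Return (total_reward, per_variable_reward) for one candidate.

    candidate maps variable names to values; filters maps "spo", "sp",
    "po" and "so" to any container supporting `in` (sets here).
    """
    total = 0.0
    per_variable = {var: 0.0 for var in candidate}
    for kind, pattern, terms in constraints:
        variables = [t for t in terms if t in candidate]
        # Substitute the candidate's values for the variables.
        values = tuple(candidate.get(t, t) for t in terms)
        if kind == "equal":
            satisfied = values[0] == values[1]
        else:
            satisfied = values in filters[pattern]
        if satisfied:
            total += w
            for var in variables:
                per_variable[var] += w / len(variables)
    return total, per_variable


# Sets standing in for the Bloom filters, built from two example triples.
graph = [("<Ullman88>", "type", "Book"),
         ("<Ullman88>", "label", '"Principles of ..."')]
filters = {
    "spo": {(s, p, o) for s, p, o in graph},
    "sp": {(s, p) for s, p, o in graph},
    "po": {(p, o) for s, p, o in graph},
    "so": {(s, o) for s, p, o in graph},
}

# The constraints listed on the query parsing slide.
constraints = [
    ("bloom", "spo", ("?publication1", "type", "Book")),
    ("bloom", "sp", ("?publication1", "type")),
    ("bloom", "spo", ("?publication2", "label", "?title")),
    ("bloom", "sp", ("?publication2", "label")),
    ("bloom", "po", ("label", "?title")),
    ("bloom", "so", ("?publication2", "?title")),
    ("equal", None, ("?publication1", "?publication2")),
]

candidate = {"?publication1": "dblp:ullman",
             "?publication2": "<Ullman88>",
             "?title": '"Principles of ..."'}
total, per_variable = evaluate(candidate, constraints, filters)
print(total, per_variable)
```

In this example the first variable earns nothing, which marks it as the one to change in the next generation.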
 Creation of new individuals

              Select two individuals and do a one-point crossover
                      1. Randomly pick a pivot point (here, after the second variable)
                           dblp:ullman    <Ullman88>     | "Principles..."
                           <Ullman88>     dblp:ullman    | _:b1
                      2. Swap the two parts
                           dblp:ullman    <Ullman88>     | _:b1
                           <Ullman88>     dblp:ullman    | "Principles..."

              Mutate the least efficient variable
                      1. Select the variable with the lowest reward
                           values:    dblp:ullman    <Ullman88>    "Principles..."
                           rewards:   0              3×w           2×w
                      2. Assign a random new value
                           values:    <Ullman88>     <Ullman88>    "Principles..."

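The two operators can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the function names, the per-position domains argument and the seeded random generator are assumptions made for the example.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at a random pivot."""
    pivot = rng.randrange(1, len(parent_a))
    return (parent_a[:pivot] + parent_b[pivot:],
            parent_b[:pivot] + parent_a[pivot:])


def mutate_weakest(individual, rewards, domains, rng):
    """Give a random new value to the variable with the lowest reward.

    domains[i] lists the values allowed at position i (an assumption of
    this sketch).
    """
    weakest = min(range(len(individual)), key=lambda i: rewards[i])
    child = list(individual)
    child[weakest] = rng.choice(domains[weakest])
    return child


rng = random.Random(0)
parent_a = ["dblp:ullman", "<Ullman88>", '"Principles..."']
parent_b = ["<Ullman88>", "dblp:ullman", "_:b1"]
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
print(child_a, child_b)

subjects = ["<Ullman88>", "_:b1", "dblp:ullman"]
objects = ["Book", '"Principles..."', "_:b1", "dblp:ullman"]
domains = [subjects, subjects, objects]
rewards = [0, 3, 2]  # per-variable rewards, in units of w
mutant = mutate_weakest(parent_a, rewards, domains, rng)
print(mutant)
```

Targeting the variable with the lowest reward focuses the random change on the part of the candidate that contributes least, instead of disturbing values that already satisfy constraints.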
 Results on some (small) datasets
              Database FOAF (15k triples) and DBLP (3M triples)
              Query with, respectively, 4 and 11 different variables
              Average result for 200 individuals and 500 generations

              [Figure: two plots of fitness value against the n-th generation, from 0 to 500]

              Solutions with maximum reward (52) are found for FOAF
              Not enough time for DBLP (max 319)

 Scalability & speed

              Low memory requirements
                      Only depends on the number of individuals and the size of
                      the Bloom filters

                      dataset    (a) parsing    (b) querying
                      FOAF         65 MB          15 MB
                      DBLP        230 MB         140 MB

                Table: Average memory usage (mostly due to dictionary)

              Computation can be distributed
                      Candidate solutions are independent
                      The dictionary can be based on a DHT

 Status and future work
              Current status
                      The search process can be slow to converge
                      Several parameters to tune (rewards, size of the population,
                      number of generations, ...)

              Current work
                 1    Improve benchmarking
                             Test with more queries and more datasets
                             Better study of the influence of the parameters
                 2    Improve evolution
                             Experiment with different types of crossover and mutation
                             Implement dynamic valuations for the rewards
                             Improve early results on the tabu search approach
                 3    Test other anytime optimizers that are easy to parallelize
                             Swarm-based algorithm (PSO, ...) or another EA
                             CSP solver

An Evolutionary Perspective on Approximate RDF Query Answering

  • 1. An Evolutionary Perspective on Approximate RDF Query Answering Christophe Guéret, Eyal Oren, Stefan Schlobach, Frank van Harmelen and Martijn Schut Vrije Universiteit, Amsterdam
  • 12. Problem and context / Method proposed / Experimental results / Conclusion
       The next 30 minutes in 4 points...
       RDF: data on the Web; inconsistent, uncertain, heterogeneous, huge and growing!
       RDF query answering: finding data matching a criterion... but many queries are actually not satisfiable.
       Approximate RDF query answering: finding some, almost valid, data.
       The evolutionary perspective: test different solutions; progressive optimisation of the result.
       SUM 2008 - October 2, 2008
  • 13. 1. What's the problem? Querying RDF datastores; standard techniques.
       2. And Now for Something Completely Different: guessing the solution instead; the way we do it.
       3. Does it work? Evolution of the quality; some characteristics of this method.
       4. TODO list
  • 15. Example
       RDF dataset:
         <Ullman88> type Book .
         <Ullman88> label "Principles of Database and Knowledge-Base Systems" .
         <Ullman88> author b1 .
         b1 _1 ullman .
         ullman homepage <http://stanford.edu/~ullman/> .
       SPARQL query:
         SELECT ?title WHERE { ?publication type Book . ?publication label ?title . }
       Expected answer:
         ?title = "Principles of Database and Knowledge-Base Systems"
  • 16. Problem description
       Triple = subject, predicate, object. Dataset = graph of triples. Querying: find a pattern in the graph.
       Figure: a query and a graph [PSPARQL07]
  • 24. Standard techniques
       Standard approach:
       1. Find all the possible results for "?publication type Book": ?publication = <Ullman88>
       2. Find all the possible results for "?publication label ?title": ?publication = <Ullman88>, ?title = "Principles of ..."
       3. Do a join on the two tables and return the result: ?title = "Principles of ..."
       Fast thanks to the creation of indexes and query optimisation.
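The three steps of the standard approach can be sketched in a few lines of Python. This is only an illustration on the example dataset from the slides: the function names (`match_pattern`, `join`, `answer`) are hypothetical, the homepage object is written `<http://...>` as on the graph-parsing slide, and a real store would use indexes and a query optimiser rather than a full scan.

```python
# Illustrative sketch only: a tiny in-memory triple store, not a real SPARQL engine.
TRIPLES = {
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of Database and Knowledge-Base Systems"),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
    ("ullman", "homepage", "<http://...>"),
}

QUERY = [("?publication", "type", "Book"), ("?publication", "label", "?title")]


def match_pattern(pattern, triples):
    """Step 1/2: return one binding dict per triple matching a triple pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def join(left, right):
    """Step 3: natural join, merging bindings that agree on shared variables."""
    joined = []
    for a in left:
        for b in right:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                joined.append({**a, **b})
    return joined


def answer(patterns, triples, select):
    """Match every pattern, join the intermediate tables, project the result."""
    rows = [{}]
    for pattern in patterns:
        rows = join(rows, match_pattern(pattern, triples))
    return [{v: row[v] for v in select} for row in rows]


print(answer(QUERY, TRIPLES, ["?title"]))
```

Note that an unsatisfiable query simply yields an empty list here, which is the behaviour the following slides set out to improve on.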
  • 28. Motivation
       Standard techniques are designed to return results only when there are some, are not designed for incomplete and approximate queries/answers, and are hard to distribute.
       Approximate answers to precise queries: if the query is unsatisfiable, return the best almost-satisfying solution found.
       Precise answers to approximate queries: return a subset of the existing solutions instead of showing them all.
       Interactive querying: use intermediate results to help the user improve the query.
  • 38. Approach
       “I’m Feeling Lucky” approach:
       1. Assign some random values to the variables: ?publication = <Ullman88>, ?title = Book
       2. Verify whether the solution is valid:
            Triple                     In the graph?
            <Ullman88> type Book       yes
            <Ullman88> label Book      no
       3. If the solution is OK, stop. Otherwise, try again with something else.
       Relies on membership testing (instead of lookup). The testing loop can be stopped at any time. A result may satisfy only part of the query.
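A minimal sketch of this guess-and-check loop, assuming plain uniform random guessing and an exact Python set for the membership test (the talk replaces both, with an evolutionary algorithm and Bloom filters respectively). The names `instantiate`, `satisfied` and `feeling_lucky` are hypothetical.

```python
import random

TRIPLES = {
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of Database and Knowledge-Base Systems"),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
    ("ullman", "homepage", "<http://...>"),
}

QUERY = [("?publication", "type", "Book"), ("?publication", "label", "?title")]


def instantiate(pattern, assignment):
    """Replace each variable of a triple pattern by its assigned value."""
    return tuple(assignment.get(term, term) for term in pattern)


def satisfied(assignment, query, triples):
    """Count the query clauses whose instantiated triple is in the graph."""
    return sum(instantiate(p, assignment) in triples for p in query)


def feeling_lucky(query, triples, max_tries=10_000, seed=0):
    """Guess random assignments; keep the best one seen so far (anytime)."""
    rng = random.Random(seed)
    # Candidate values: every term of the graph (sorted for reproducibility).
    terms = sorted({t for triple in triples for t in triple})
    variables = sorted({t for p in query for t in p if t.startswith("?")})
    best, best_score = None, -1
    for _ in range(max_tries):
        candidate = {v: rng.choice(terms) for v in variables}
        score = satisfied(candidate, query, triples)
        if score > best_score:
            best, best_score = candidate, score
        if score == len(query):  # every clause holds: stop
            break
    return best, best_score


# The guess shown on the slide satisfies one clause out of two.
print(satisfied({"?publication": "<Ullman88>", "?title": "Book"}, QUERY, TRIPLES))
print(feeling_lucky(QUERY, TRIPLES))
```

Because the loop always holds a best-so-far candidate, interrupting it early still returns a partial answer, which is the anytime property listed on the slide.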
  • 42. Our choices
       Two aspects need attention:
       1. Each try should be a step closer to the solution: random guessing may never end, and stopping the process at t + 1 should give better results than stopping at t.
       2. Testing a candidate solution must be fast, because a lot of solutions will be tried.
       We made the following choices. Generation of solutions: evolutionary algorithm. Verification of solutions: Bloom filter based testing.
  • 43. Binary Bloom filters (1/2)
       Compact representation of information: a set of n = 8 bits, numbered 1 to 8.
       Supports two operations. INSERT(KEY): insert a key into the filter. CONTAINS(KEY): test for the presence of a key.
       Uses k = 3 hash functions to compute a set of bits from a key: HASH1(“HELLO WORLD”) = 8, HASH2(“HELLO WORLD”) = 6, HASH3(“HELLO WORLD”) = 3.
  • 44. Binary Bloom filters (2/2)
       INSERT(“HELLO WORLD”): a bit-wise OR of the current filter with the bits of “Hello world” gives the new filter. Always successful (i.e. unlimited capacity), but the precision depends on the number of elements m.
       CONTAINS(“BONJOUR !”): a bit-wise AND of the current filter with the bits of “Bonjour !” gives the test result. A positive result can be a collision.
       With n bits, k hash functions and m inserted elements: $p_{\mathrm{error}} = \left(1 - e^{-km/n}\right)^{k}$
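The two operations can be sketched as follows. This is a minimal illustration, not the authors' implementation: the k hash functions are derived from SHA-256 with a different prefix per function (an assumption, the slides do not say which hashes are used), and the filter is stored in a Python integer. `false_positive_rate` uses the usual approximation with n bits, k hash functions and m inserted elements.

```python
import hashlib
import math


class BloomFilter:
    """Binary Bloom filter with n_bits bits and k_hashes hash functions."""

    def __init__(self, n_bits, k_hashes):
        self.n_bits = n_bits
        self.k_hashes = k_hashes
        self.bits = 0  # a Python int used as a bit array

    def _mask(self, key):
        """Bits selected by the k hash functions for this key."""
        mask = 0
        for i in range(self.k_hashes):
            # Assumption: hash function i is SHA-256 over the key prefixed by i.
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % self.n_bits
            mask |= 1 << position
        return mask

    def insert(self, key):
        """INSERT: bit-wise OR of the current filter with the key's bits."""
        self.bits |= self._mask(key)

    def contains(self, key):
        """CONTAINS: bit-wise AND; True may be a collision, False is certain."""
        mask = self._mask(key)
        return (self.bits & mask) == mask


def false_positive_rate(n_bits, k_hashes, m_elements):
    """Approximate collision probability (1 - e^(-k*m/n))^k."""
    return (1.0 - math.exp(-k_hashes * m_elements / n_bits)) ** k_hashes


bloom = BloomFilter(n_bits=1024, k_hashes=3)
bloom.insert("HELLO WORLD")
print(bloom.contains("HELLO WORLD"))  # an inserted key is always reported present
print(false_positive_rate(n_bits=1024, k_hashes=3, m_elements=100))
```

An inserted key can never be reported absent, so a negative answer is reliable; only positive answers carry the error probability, and that probability grows as more elements are inserted into a filter of fixed size.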
  • 48. A first (naive) approach
       Insert all the triples into a single Bloom filter: INSERT(“<Ullman88>_type_Book”), INSERT(“<Ullman88>_label_"Principles of ..."”), ...
       Use the CONTAINS operation to verify a solution: CONTAINS(“<Ullman88>_type_Book”) ⇒ true; CONTAINS(“<Ullman88>_label_Book”) ⇒ false.
       Not the best approach! Let’s see what happens in detail: ?publication label ?title → CONTAINS(“<Ullman88>_label_Book”) → modify ?publication and ?title (a single test on the whole triple does not tell which variable is wrong).
  • 55. Graph parsing
       Every triple of the graph is inserted into 4 Bloom filters. For <Ullman88> type Book:
         SPO: <Ullman88>_type_Book
         SP:  <Ullman88>_type
         PO:  type_Book
         SO:  <Ullman88>_Book
       Three domains are defined:
         S = { <Ullman88>, b1, ullman }
         P = { type, label, author, _1, homepage }
         O = { Book, "Principles of ...", b1, ullman, <http://...> }
       Each term is replaced by an integer (with a dictionary), e.g. <Ullman88> → 46.
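The parsing step can be sketched as below, under stated assumptions: the class names `Bloom` and `GraphIndex`, the filter size, the hash construction and the way integer ids are assigned (in order of first appearance) are all hypothetical choices made for the illustration; the slides only specify the four filters, the three domains and the dictionary.

```python
import hashlib


class Bloom:
    """Compact Bloom filter; size and hash functions are arbitrary choices."""

    def __init__(self, n_bits=1 << 16, k=3):
        self.n_bits, self.k, self.bits = n_bits, k, 0

    def _mask(self, key):
        mask = 0
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            mask |= 1 << (int.from_bytes(digest[:8], "big") % self.n_bits)
        return mask

    def insert(self, key):
        self.bits |= self._mask(key)

    def contains(self, key):
        mask = self._mask(key)
        return (self.bits & mask) == mask


class GraphIndex:
    """Four Bloom filters (SPO, SP, PO, SO), three domains, one dictionary."""

    def __init__(self):
        self.ids = {}  # dictionary: term -> integer
        self.filters = {name: Bloom() for name in ("spo", "sp", "po", "so")}
        self.domains = {"s": set(), "p": set(), "o": set()}

    def encode(self, term):
        """Replace a term by an integer, allocating a new id if needed."""
        return self.ids.setdefault(term, len(self.ids))

    def add(self, s, p, o):
        s, p, o = self.encode(s), self.encode(p), self.encode(o)
        self.domains["s"].add(s)
        self.domains["p"].add(p)
        self.domains["o"].add(o)
        self.filters["spo"].insert(f"{s}_{p}_{o}")
        self.filters["sp"].insert(f"{s}_{p}")
        self.filters["po"].insert(f"{p}_{o}")
        self.filters["so"].insert(f"{s}_{o}")

    def maybe_contains(self, name, *terms):
        """Membership test on one filter; True may be a false positive."""
        if any(t not in self.ids for t in terms):
            return False  # unknown term: cannot be in the graph
        key = "_".join(str(self.ids[t]) for t in terms)
        return self.filters[name].contains(key)


index = GraphIndex()
for triple in [
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", "Principles of Database and Knowledge-Base Systems"),
    ("<Ullman88>", "author", "b1"),
    ("b1", "_1", "ullman"),
    ("ullman", "homepage", "<http://...>"),
]:
    index.add(*triple)

print(index.maybe_contains("sp", "<Ullman88>", "label"))
print({name: len(values) for name, values in index.domains.items()})
```

The partial filters are what allow a failed candidate to be diagnosed: if the SP test passes but the SPO test fails, the object is the likely culprit.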
  • 56. Evolutionary algorithm: set of populations + set of operators. Figure: flowchart [Eiben2003]
  • 62. Query parsing
       Definition of the chromosome for the individuals: ?publication1 | ?publication2 | ?title
       Creation of constraints to verify:
         Clause "?publication type Book .": bloom(spo | ?publication1 type Book), bloom(sp | ?publication1 type), bloom(po | type Book)
         Clause "?publication label ?title .": bloom(spo | ?publication2 label ?title), bloom(sp | ?publication2 label), bloom(po | label ?title), bloom(so | ?publication2 ?title)
         Equality constraint: equal(?publication1, ?publication2)
       Annotation next to the first clause: removed because always true.
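One way to derive the chromosome and the constraints from a query is sketched below. It is an assumption-laden illustration: the function name `parse_query` and the tuple encoding of constraints are hypothetical, and for simplicity the sketch emits all four projections (spo, sp, po, so) for every clause, whereas the slide lists only a subset for the first clause and marks a constraint as removed.

```python
from collections import Counter


def is_var(term):
    return term.startswith("?")


def parse_query(clauses):
    """Return (chromosome, constraints) for a list of triple patterns.

    A variable used in several places gets one numbered copy per occurrence
    (?publication1, ?publication2, ...); equality constraints tie the copies
    back together. A variable used once keeps its name.
    """
    totals = Counter(t for clause in clauses for t in clause if is_var(t))
    seen = Counter()
    copies = {}  # original variable -> list of its numbered copies
    chromosome, constraints = [], []
    for clause in clauses:
        renamed = []
        for term in clause:
            if is_var(term) and totals[term] > 1:
                seen[term] += 1
                term = f"{term}{seen[term]}"
                copies.setdefault(term[: -len(str(seen[term[: -1]])) or None], [])
            renamed.append(term)
        s, p, o = renamed
        chromosome.extend(t for t in renamed if is_var(t))
        constraints += [
            ("bloom", "spo", (s, p, o)),
            ("bloom", "sp", (s, p)),
            ("bloom", "po", (p, o)),
            ("bloom", "so", (s, o)),
        ]
    # Rebuild the copy lists from the chromosome, in order of appearance.
    copies = {}
    for gene in chromosome:
        base = gene.rstrip("0123456789")
        if base != gene:
            copies.setdefault(base, []).append(gene)
    for names in copies.values():
        for a, b in zip(names, names[1:]):
            constraints.append(("equal", a, b))
    return chromosome, constraints


QUERY = [("?publication", "type", "Book"), ("?publication", "label", "?title")]
chromosome, constraints = parse_query(QUERY)
print(chromosome)
for constraint in constraints:
    print(constraint)
```

Splitting a shared variable into independent genes lets each clause be satisfied separately at first; the equality constraint then rewards candidates in which the copies agree. The sketch assumes original variable names do not themselves end in a digit.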
  • 63. Evaluation of a candidate solution
       The solution is checked against all the constraints. If one is satisfied, a global reward w is won and each variable used is equally rewarded.
       Rewards for bloom(spo | ?publication2 label ?title): reward(solution) += w; reward(?publication2) += w/2; reward(?title) += w/2.
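The reward scheme can be sketched as follows. Assumptions are flagged in the comments: exact Python sets stand in for the four Bloom filters, constraints are encoded as tuples, and the function name `evaluate` is hypothetical. Each satisfied constraint adds w to the solution and splits w equally among the variables it uses.

```python
def build_known(triples):
    """Exact sets standing in for the SPO, SP, PO and SO Bloom filters."""
    known = {"spo": set(), "sp": set(), "po": set(), "so": set()}
    for s, p, o in triples:
        known["spo"].add(f"{s}_{p}_{o}")
        known["sp"].add(f"{s}_{p}")
        known["po"].add(f"{p}_{o}")
        known["so"].add(f"{s}_{o}")
    return known


def evaluate(assignment, constraints, known, w=1.0):
    """Return (reward of the solution, reward of each variable)."""
    total = 0.0
    per_variable = {v: 0.0 for v in assignment}
    for kind, *args in constraints:
        if kind == "bloom":
            name, terms = args
            used = [t for t in terms if t in assignment]
            key = "_".join(assignment.get(t, t) for t in terms)
            ok = key in known[name]
        else:  # "equal" constraint between two copies of a variable
            used = list(args)
            ok = assignment[args[0]] == assignment[args[1]]
        if ok:
            total += w  # global reward
            for v in used:  # each variable used is equally rewarded
                per_variable[v] += w / len(used)
    return total, per_variable


TITLE = "Principles of Database and Knowledge-Base Systems"
KNOWN = build_known([
    ("<Ullman88>", "type", "Book"),
    ("<Ullman88>", "label", TITLE),
])
CONSTRAINTS = [
    ("bloom", "spo", ("?publication2", "label", "?title")),
    ("bloom", "sp", ("?publication2", "label")),
    ("bloom", "po", ("label", "?title")),
    ("bloom", "so", ("?publication2", "?title")),
]

good = {"?publication2": "<Ullman88>", "?title": TITLE}
bad = {"?publication2": "<Ullman88>", "?title": "Book"}
print(evaluate(good, CONSTRAINTS, KNOWN))
print(evaluate(bad, CONSTRAINTS, KNOWN))
```

The per-variable rewards are what the mutation operator later relies on to pick the variable that contributes least to the solution.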
  • 64. Creation of new individuals
       Select two individuals and do a one-point crossover: randomly pick a pivot point, then swap the two parts.
         Parents:   (dblp:ullman, <Ullman88>, "Principles...") and (<Ullman88>, dblp:ullman, _:b1)
         Offspring: (dblp:ullman, <Ullman88>, _:b1) and (<Ullman88>, dblp:ullman, "Principles...")
       Mutate the least efficient variable: select the variable with the lowest reward, then assign it a random new value.
         Before: (dblp:ullman, <Ullman88>, "Principles...") with rewards (0, 3×w, 2×w)
         After:  (<Ullman88>, <Ullman88>, "Principles...")
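Both operators are short enough to sketch directly. The function names, the optional `pivot` argument and the flat `domain` list are hypothetical; the slides do not say how the random new value is drawn, so the sketch simply picks uniformly from a list of candidate terms.

```python
import random


def one_point_crossover(parent_a, parent_b, rng, pivot=None):
    """Swap the tails of two individuals after a (random) pivot point.

    Individuals must have at least two genes.
    """
    if pivot is None:
        pivot = rng.randrange(1, len(parent_a))
    child_a = parent_a[:pivot] + parent_b[pivot:]
    child_b = parent_b[:pivot] + parent_a[pivot:]
    return child_a, child_b


def mutate_least_rewarded(individual, rewards, domain, rng):
    """Assign a random new value to the variable with the lowest reward."""
    weakest = min(range(len(individual)), key=lambda i: rewards[i])
    child = list(individual)
    child[weakest] = rng.choice(domain)
    return child


rng = random.Random(0)
a = ["dblp:ullman", "<Ullman88>", "Principles..."]
b = ["<Ullman88>", "dblp:ullman", "_:b1"]

# With the pivot after the second gene, this reproduces the slide's example.
print(one_point_crossover(a, b, rng, pivot=2))

# Rewards (0, 3w, 2w) with w = 1: the first variable is the one mutated.
domain = ["<Ullman88>", "dblp:ullman", "_:b1", "Principles..."]
print(mutate_least_rewarded(a, [0, 3, 2], domain, rng))
```

Targeting the least rewarded variable, rather than a random one, is what makes the mutation directed: well-performing parts of a candidate are left untouched.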
  • 66. Results on some (small) datasets
       Databases: FOAF (15k triples) and DBLP (3M triples). Queries with, respectively, 4 and 11 different variables. Average result for 200 individuals and 500 generations.
       [Two plots: fitness value against n-th generation, 0 to 500]
       Solutions with maximum reward (52) are found for FOAF. Not enough time for DBLP (max 319).
  • 67. Scalability & speed
       Low memory requirements: only depends on the number of individuals and the size of the Bloom filters.
       Table: Average memory usage (mostly due to dictionary)
         dataset   (a) parsing   (b) querying
         FOAF      65 MB         15 MB
         DBLP      230 MB        140 MB
       Computation can be distributed: candidate solutions are independent, and the dictionary can be based on a DHT.
  • 73. Status and future work
       Current status: the search process can be slow to converge; several parameters to tune (rewards, size of the population, number of generations, ...).
       Current work:
       1. Improve benchmarking: test with more queries and more datasets; better study of the influence of the parameters.
       2. Improve evolution: experiment with different types of crossover and mutation; implement dynamic valuations for the rewards; improve early results on the tabu search approach.
       3. Test other easy-to-parallelize, anytime optimizers: a swarm-based algorithm (PSO, ...) or another EA; a CSP solver.