Dependable Cardinality Forecasts for XQuery
                   Jens Teubner, ETH (formerly IBM Research)
                   Torsten Grust, U Tübingen (formerly TUM)
                     Sebastian Maneth, UNSW and NICTA
                          Sherif Sakr, UNSW and NICTA




© Systems Group — Department of Computer Science — ETH Zürich
                                                        August 26, 2008
Cardinality Estimation for XQuery
   The feature-richness and semantics of the language make
   cardinality estimation for XQuery notoriously hard.
                    for $d in doc ("forecast.xml")/descendant::day    (: card1 = ? :)
                    let $day := $d/@t
                    let $ppcp := data ($d/descendant::ppcp)
                    return
                      if ($ppcp > 50)                                 (: card2 = ? :)
                        then ("rain likely on", $day,                 (: card3 = ? :)
                              "chance of precipitation:", $ppcp)
                        else ("no rain on", $day)                     (: card4 = ? :)

         Language features involved: for iteration, conditionals (if-then-else),
         existential quantification, sequence construction, ...
         (XPath is not a focus of this work.)

     → Goal: Compute subexpression-level cardinalities cardᵢ.
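To make the goal concrete, the following Python sketch computes the exact cardinalities for the example query on a tiny, hypothetical forecast.xml. It assumes every <day> has exactly one @t and one <ppcp> (so `$ppcp > 50` compares a single value), and it reads card2 as the number of evaluations of the `if` condition; both are our assumptions, not statements from the slides. An estimator has to predict these numbers without evaluating the query.

```python
# Hypothetical contents of forecast.xml: one (t, ppcp) pair per <day> element.
# Assumption: each <day> has exactly one @t attribute and exactly one <ppcp>
# descendant, so data($d/descendant::ppcp) yields a single value.
days = [("Mon", 80), ("Tue", 20), ("Wed", 55), ("Thu", 10)]


def subexpression_cardinalities(days):
    """Exact item counts for the marked subexpressions of the example query."""
    rainy = sum(1 for _t, ppcp in days if ppcp > 50)
    return {
        "for": len(days),                 # card1: one binding of $d per <day>
        "if": len(days),                  # card2 (read here as): one test per iteration
        "then": 4 * rainy,                # card3: 2 strings + $day + $ppcp per rainy day
        "else": 2 * (len(days) - rainy),  # card4: 1 string + $day per dry day
    }


print(subexpression_cardinalities(days))
# {'for': 4, 'if': 4, 'then': 8, 'else': 4}
```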

Idea: Perform cardinality estimation on relational plan equivalents for XQuery.

                    relational cardinality estimation (System R)
                  + existing work on XPath estimation
                  + histograms for value predicates
                  = cardinality estimation for XQuery
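For the "histograms for value predicates" ingredient, here is a minimal sketch of how a histogram could supply the selectivity of the predicate `$ppcp > 50`. The equi-width buckets and their counts are hypothetical, and the within-bucket uniformity assumption is the textbook one, not necessarily what the authors' prototype uses.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    lo: float    # inclusive lower bound
    hi: float    # exclusive upper bound
    count: int   # number of values falling into [lo, hi)


def selectivity_gt(buckets, c):
    """Estimate the fraction of values greater than c, assuming values are
    spread uniformly within each bucket."""
    total = sum(b.count for b in buckets)
    if total == 0:
        return 0.0
    matching = 0.0
    for b in buckets:
        if c < b.lo:                 # bucket lies entirely above c
            matching += b.count
        elif c < b.hi:               # c falls inside the bucket: take the upper share
            matching += b.count * (b.hi - c) / (b.hi - b.lo)
    return matching / total


# Hypothetical equi-width histogram over ppcp values (percent).
ppcp_hist = [Bucket(0, 25, 10), Bucket(25, 50, 20), Bucket(50, 75, 30), Bucket(75, 100, 40)]
print(selectivity_gt(ppcp_hist, 50))  # 0.7
```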


           Build on Pathfinder’s XQuery-to-relational-algebra compiler.¹
            → tuple count ≡ XQuery item count
           Cardinality information for each subexpression.
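The correspondence "tuple count ≡ XQuery item count" follows from the relational sequence encoding: each item of the sequence produced in iteration iter at position pos becomes one (iter, pos, item) row. The sketch below is a simplified illustration of that encoding (the column names follow the plans shown in this deck; everything else is an assumption), not Pathfinder's implementation.

```python
def encode(sequences):
    """Encode one XQuery item sequence per loop iteration as (iter, pos, item)
    rows, the column layout used in Pathfinder-style plans (simplified)."""
    return [
        (it, pos, item)
        for it, seq in enumerate(sequences, start=1)
        for pos, item in enumerate(seq, start=1)
    ]


# Hypothetical results of two iterations of the example query's return clause.
result = encode([
    ("rain likely on", "Mon", "chance of precipitation:", 80),
    ("no rain on", "Tue"),
])
for row in result:
    print(row)
# Tuple count equals item count: 4 + 2 = 6 rows.
```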


      ¹ http://www.pathfinder-xquery.org/
Example (Plan details not of interest today.)

   for $d in doc ("forecast.xml")/descendant::day      (: 1 :)
   let $day := $d/@t
   let $ppcp := data ($d/descendant::ppcp)
   return                                              (: 2 :)
     if ($ppcp > 50)                                   (: 3 :)
       then ("rain likely on", $day,
             "chance of precipitation:", $ppcp)
       else ("no rain on", $day)                       (: 4 :)

   [Figure: relational algebra plan produced by Pathfinder for this query;
   the subplans marked 1 to 4 correspond to the query parts marked 1 to 4.]

      Pathfinder compiles XQuery with arbitrary nesting.
      Maintain correspondence to the original query if the back-end is not relational.
      Derive estimates based on an inference rule set.
Relational XQuery Cardinality Estimation
   Apply System R-style estimation to relational XQuery plans, e.g.,

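   Disjoint union:      $|q_1 \mathbin{\dot{\cup}} q_2| = |q_1| + |q_2|$
   Cartesian product:   $|q_1 \times q_2| = |q_1| \cdot |q_2|$

   Equi-join:

   $$
   |q_1 \bowtie_{a=b} q_2| =
   \begin{cases}
     \dfrac{|q_1| \cdot |q_2|}{\max\{|a|_{\mathrm{idx}},\, |b|_{\mathrm{idx}}\}} & \text{if there are indexes on both join columns (?),} \\[1.5ex]
     \dfrac{|q_1| \cdot |q_2|}{|a|_{\mathrm{idx}}} & \text{if there is only an index on column } a \text{ (?),} \\[1.5ex]
     |q_1| \cdot |q_2| \cdot 1/10 & \text{otherwise.}
   \end{cases}
   $$

          Our joins typically operate over computed relations, which carry no indexes,
          so the index-based cases (?) rarely apply.
       $|c|_{\mathrm{idx}}$: number of unique values in the index on column $c$.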
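The three equi-join cases can be written as one small Python function. This is a sketch of the System R-style formula above; the parameter names are ours, and the symmetric "only b is indexed" case is our addition by analogy with the "only a" case.

```python
from typing import Optional


def equijoin_estimate(card_q1: int, card_q2: int,
                      distinct_a: Optional[int] = None,
                      distinct_b: Optional[int] = None) -> float:
    """System R-style estimate of |q1 join_{a=b} q2|.

    distinct_a / distinct_b: number of unique values in an index on column
    a / b, or None if that column has no index.
    """
    product = card_q1 * card_q2
    if distinct_a is not None and distinct_b is not None:
        return product / max(distinct_a, distinct_b)
    if distinct_a is not None:
        return product / distinct_a
    if distinct_b is not None:   # assumption: symmetric to the "only a" case
        return product / distinct_b
    return product / 10          # no index information: the 1/10 guess


print(equijoin_estimate(1000, 500, 100, 50))  # 5000.0
print(equijoin_estimate(1000, 500))           # 50000.0
```

On computed relations both distinct-value counts are None, so every join falls through to the 1/10 guess, which is the weakness that the remark about computed relations points at.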
Abstract Domain Identifiers
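   A simple form of data flow analysis provides the information needed.
             Introduce abstract domain identifiers α, β, ... as placeholders
             for the active runtime domain of each column c.
             (Read $c^\alpha$ as "column c contains values from domain α.")
             Estimate the size $\|\alpha\|$ of each domain α, e.g.,²

   $$\mathrm{dom}\bigl(\varrho_{a:\langle b_1,\ldots,b_n\rangle}(q)\bigr) \supseteq \mathrm{dom}(q) \cup \{a^\alpha\} \;\wedge\; \|\alpha\| \overset{!}{=} |q| .$$

             Identify inclusion relationships $\alpha \sqsubseteq \beta$ (domain α is contained in domain β) between domains, e.g.,

   $$a^\alpha \in \mathrm{dom}(q) \;\wedge\; a^\beta \in \mathrm{dom}\bigl(\sigma_{\cdots}(q)\bigr) \;\Rightarrow\; \beta \sqsubseteq \alpha .$$

        ² Operator $\varrho_{a:\langle b_1,\ldots,b_n\rangle}$ introduces a new key column $a$ (holding row numbers).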
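A minimal Python sketch of this bookkeeping, assuming a relation estimate is just a cardinality plus one abstract domain per column. It implements the two example rules (row numbering creates a fresh domain of size |q|; selection creates domains contained in the input's). The size rule used for selection is an assumption of ours, since the slide only states the inclusion; all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional

_ids = count()  # source of fresh domain names


@dataclass(frozen=True)
class Domain:
    """Abstract domain identifier (alpha, beta, ...) with an estimated size."""
    name: str
    size: float
    parent: Optional["Domain"] = None  # recorded inclusion: self ⊑ parent

    def included_in(self, other: "Domain") -> bool:
        """True if self ⊑ other follows from the recorded inclusion chain."""
        d = self
        while d is not None:
            if d is other:
                return True
            d = d.parent
        return False


def fresh(size: float, parent: Optional[Domain] = None) -> Domain:
    return Domain(f"dom{next(_ids)}", size, parent)


@dataclass
class Est:
    """Estimate for a relation q: cardinality |q| and a domain per column."""
    card: float
    dom: Dict[str, Domain]


def rownum(q: Est, a: str) -> Est:
    """Row numbering into new key column a: dom grows by a^alpha, ||alpha|| = |q|."""
    return Est(q.card, {**q.dom, a: fresh(q.card)})


def select(q: Est, selectivity: float) -> Est:
    """Selection: each column c^alpha becomes c^beta with beta ⊑ alpha.
    Assumption (not from the slides): ||beta|| = min(||alpha||, new |q|)."""
    card = q.card * selectivity
    return Est(card, {c: fresh(min(d.size, card), d) for c, d in q.dom.items()})


q = Est(100, {"iter": fresh(100)})
q1 = rownum(q, "pos")
q2 = select(q1, 0.5)
print(q1.dom["pos"].size, q2.card)               # 100 50.0
print(q2.dom["pos"].included_in(q1.dom["pos"]))  # True
print(q1.dom["pos"].included_in(q.dom["iter"]))  # False
```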
Abstract Domain Identifiers
   Use abstract domain information for cardinality estimation.
   E.g., “foreign key” join:

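   $$\frac{a^\alpha \in \mathrm{dom}(q_1) \qquad b^\beta \in \mathrm{dom}(q_2) \qquad \alpha \sqsubseteq \beta}{|q_1 \bowtie_{a=b} q_2| = \dfrac{|q_1| \cdot |q_2|}{\|\beta\|}}$$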

           Domain inclusion guarantees that each tuple in q1 finds
           (at least one) join partner in q2 .

   Other examples:
           $|q_1 \setminus q_2| = |q_1| - |q_2|$   if $q_2$ is a subset of $q_1$.
           $|q_1 \setminus q_2| = 0$               if $q_1$ is a subset of $q_2$.
           $|q_1 \setminus q_2| = |q_1|$           if $q_1$ and $q_2$ are disjoint.
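The foreign-key join rule and the three difference rules, as a Python sketch. The caller is assumed to have already established the premises (e.g. α ⊑ β from the domain analysis); the function and enum names are hypothetical.

```python
from enum import Enum


def fk_join_estimate(card_q1: float, card_q2: float, beta_size: float) -> float:
    """|q1 join_{a=b} q2| = |q1| * |q2| / ||beta||, applicable when a^alpha in
    dom(q1), b^beta in dom(q2) and alpha ⊑ beta: every q1 tuple finds a partner,
    and (assuming uniformity) each b value occurs |q2| / ||beta|| times."""
    return card_q1 * card_q2 / beta_size


class SetRelation(Enum):
    Q2_SUBSET_OF_Q1 = "q2 is a subset of q1"
    Q1_SUBSET_OF_Q2 = "q1 is a subset of q2"
    DISJOINT = "q1 and q2 are disjoint"


def difference_estimate(card_q1: int, card_q2: int, rel: SetRelation) -> int:
    """|q1 minus q2| in the three special cases listed on the slide."""
    if rel is SetRelation.Q2_SUBSET_OF_Q1:
        return card_q1 - card_q2
    if rel is SetRelation.Q1_SUBSET_OF_Q2:
        return 0
    if rel is SetRelation.DISJOINT:
        return card_q1
    raise ValueError(f"no rule for {rel}")


print(fk_join_estimate(300, 1000, 100))                         # 3000.0
print(difference_estimate(10, 4, SetRelation.Q2_SUBSET_OF_Q1))  # 6
```

With |q1| = 300, |q2| = 1000 and ‖β‖ = 100, each b value occurs about 10 times in q2, so each of the 300 q1 tuples finds about 10 partners, giving 3000.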
Interfacing with XPath—Projection Paths

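   Track XPath navigation by means of projection paths³

        Document tree:        a
                            / | \
                           b  c  d
                                 |
                                 e

        q1:  a | b            q2 (output of step c:child::*(b) on q1):  a | b  | c
             1 | γa                                                     1 | γa | γb
             2 | γb                                                     1 | γa | γc
             3 | γd                                                     1 | γa | γd
                                                                        3 | γd | γe

   $$b \Rightarrow p \in \mathrm{path}(q_1) \;\Longrightarrow\; c \Rightarrow p/\texttt{child::*} \in \mathrm{path}(q_2)$$

             The step operator (here: c:child::*(b)) makes XPath navigation explicit in relational
             plans (it compiles to a join on SQL back-ends).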



        ³ A. Marian and J. Siméon. Projecting XML Documents. VLDB 2003.
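A small Python simulation of the example, assuming node identifiers are plain strings and the tree is given as a child list. It evaluates the step c:child::*(b) directly (an SQL back-end would use a join instead) and applies the projection-path rule to a dictionary that maps each column to one path, a simplification of path(q).

```python
# Toy document from the slide: a has children b, c, d; d has child e.
# Node identifiers γa ... γe are plain strings here (an assumption).
children = {"γa": ["γb", "γc", "γd"], "γb": [], "γc": [], "γd": ["γe"], "γe": []}

q1 = [(1, "γa"), (2, "γb"), (3, "γd")]  # columns a, b
paths_q1 = {"b": "p"}                   # projection path: b ⇒ p


def child_step(rel, children):
    """Evaluate c:child::*(b): one output row per child of the node in column b."""
    return [(a, b, c) for (a, b) in rel for c in children[b]]


def step_paths(paths, in_col, out_col, step):
    """Projection-path rule: in_col ⇒ p in the input yields out_col ⇒ p/step."""
    return {**paths, out_col: f"{paths[in_col]}/{step}"}


q2 = child_step(q1, children)
paths_q2 = step_paths(paths_q1, "b", "c", "child::*")
for row in q2:
    print(row)
print(paths_q2)   # {'b': 'p', 'c': 'p/child::*'}
```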
Interfacing with XPath—Cardinality Inference
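   (Document tree: a with children b, c, d; d with child e.
    q2 is the output of the step c:child::*(b) applied to q1.)

        q1:  a | b^α           q2:  a | b^α' | c
             1 | γa                 1 | γa   | γb
             2 | γb                 1 | γa   | γc
             3 | γd                 1 | γa   | γd
                                    3 | γd   | γe        with α' ⊑ α

   Cardinality:
   $$|q_2| = |q_1| \cdot \frac{\texttt{fn:count}(p/\texttt{child::*})}{\texttt{fn:count}(p)} = |q_1| \cdot \underbrace{\Pr_{\texttt{child::*}}(p)}_{\text{fanout (here: 4/3)}}$$

   Domain sizes:
   $$\|\alpha'\| = \|\alpha\| \cdot \frac{\texttt{fn:count}(p[\texttt{child::*}])}{\texttt{fn:count}(p)} = \|\alpha\| \cdot \underbrace{\Pr_{[\texttt{child::*}]}(p)}_{\text{selectivity (here: 2/3)}}$$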
   Any XPath estimator that provides $\Pr_{p_2}(p_1)$ and $\Pr_{[p_2]}(p_1)$ will do.
           Our prototype uses a simple DataGuide-based implementation.
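On the toy tree, both probabilities can be computed exactly; the sketch below does so with Python's `fractions` module to reproduce the 4/3 and 2/3 from the slide. A real estimator (such as a DataGuide-based one) would approximate the fn:count values from a document summary instead of counting nodes; the function names here are hypothetical.

```python
from fractions import Fraction

# Same toy tree as in the projection-path example.
children = {"γa": ["γb", "γc", "γd"], "γb": [], "γc": [], "γd": ["γe"], "γe": []}
p_nodes = ["γa", "γb", "γd"]   # the nodes in column b of q1, i.e. the result of p


def fanout(nodes, children):
    """Pr_{child::*}(p) = fn:count(p/child::*) / fn:count(p)."""
    return Fraction(sum(len(children[n]) for n in nodes), len(nodes))


def selectivity(nodes, children):
    """Pr_{[child::*]}(p) = fn:count(p[child::*]) / fn:count(p)."""
    return Fraction(sum(1 for n in nodes if children[n]), len(nodes))


card_q1 = 3      # |q1|
size_alpha = 3   # ||α||: distinct nodes in column b of q1
print(fanout(p_nodes, children), selectivity(p_nodes, children))  # 4/3 2/3
print(card_q1 * fanout(p_nodes, children))          # 4, the estimate for |q2|
print(size_alpha * selectivity(p_nodes, children))  # 2, the estimate for ||α'||
```

Because the probabilities are exact here, the estimates coincide with the actual q2 (4 rows, 2 distinct nodes in its b column).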
πiter:outer,pos:pos1,item

                                                                                                                      pos1: ord,pos        outer

Back to our Example Plan                                                                                2                    iter=inner

                                                                                                        ·
                                                                                                        ∪
                                                                                                                            4
                                                                                                                   πiter,pos:pos1,item
                                                                                  3
                                                                             πiter,pos:pos1,item                   pos1: ord,pos       iter
                      res:(item,item1)
                                                                              pos1: ord,pos      iter                       ·
                                                                                                                            ∪
                                                                                                  πiter,pos:1,ord:1,
                                                                                      ·
                                                                                      ∪                                                ·
                                                                                                                                       ∪
                         iter1=iter                                                                 item:"rain..."
                                                                                                               πiter,pos,item,ord:2          ∪·
                                                                                   πiter,pos:1,ord:1,
       πiter1:iter,                                                                   item:"no rain..."                       πiter,pos:1,ord:3,
                                                                                                                                   item:"chance..."
         pos1:1,item1:50                   item                         πiter,pos,item,ord:2                                                  πiter,pos,item,ord:4

                                                                              iter1=iter                              iter1=iter                    iter=iter1
  πiter                               pos: item    iter
                                                          iterα   Projection path: δ
                                                                                

                           item:descendant::ppcp(item)             item⇒···/descendant::ppcp ∈ path
                                                                                      πiter                                                               ... (q)
                                                                                                               σres
            πiter:inner,item            iter   α
  Cardinality (uses fanout):
      $|\mathrm{step}_{\text{item:ax::nt(item)}}(q)| = |q| \cdot \Pr_{ax::nt}(\cdots)$
  Domain inclusion:
      $iter^{\alpha} \in \mathrm{dom}(q) \Rightarrow iter^{\alpha'} \in \mathrm{dom}(\mathrm{step}_{\text{item:ax::nt(item)}}(q))$ with $\alpha' \sqsubseteq \alpha$
  Domain size (uses step selectivity):
      $\|\alpha'\| = \|\alpha\| \cdot \Pr_{[ax::nt]}(\cdots)$
  Atomization on column a: retrieve typed values for node identifiers in column a.
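The two step rules can be sketched as follows. This is a minimal sketch, not the prototype's code: `XPathEstimator`, `CountingEstimator`, `Estimate` and `estimate_step` are hypothetical names, and the toy counts mirror the earlier child::* example (3 context nodes, 4 children in total, 2 context nodes with at least one child). The rules need only the two quantities Pr_{p2}(p1) (fanout) and Pr_{[p2]}(p1) (selectivity), so any XPath estimator that supplies them can be plugged in.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Protocol


class XPathEstimator(Protocol):
    """Hypothetical plug-in interface for an XPath estimation subsystem."""

    def fanout(self, path: str, step: str) -> Fraction:
        """Pr_step(path) = count(path/step) / count(path)."""
        ...

    def selectivity(self, path: str, step: str) -> Fraction:
        """Pr_[step](path) = count(path[step]) / count(path)."""
        ...


@dataclass
class CountingEstimator:
    """Toy estimator backed by precomputed node counts
    (a stand-in for, e.g., DataGuide-based statistics)."""
    counts: dict[str, int]

    def fanout(self, path: str, step: str) -> Fraction:
        return Fraction(self.counts[f"{path}/{step}"], self.counts[path])

    def selectivity(self, path: str, step: str) -> Fraction:
        return Fraction(self.counts[f"{path}[{step}]"], self.counts[path])


@dataclass(frozen=True)
class Estimate:
    card: Fraction         # |q|: estimated tuple count (= XQuery item count)
    domain_size: Fraction  # ||alpha||: estimated size of the iter column's domain


def estimate_step(inp: Estimate, path: str, step: str,
                  est: XPathEstimator) -> Estimate:
    """Apply the step rules:
       |step(q)|   = |q|       * Pr_step(path)     (fanout)
       ||alpha'||  = ||alpha|| * Pr_[step](path)   (selectivity)"""
    return Estimate(
        card=inp.card * est.fanout(path, step),
        domain_size=inp.domain_size * est.selectivity(path, step),
    )


# Counts mirroring the child::* example: fanout 4/3, selectivity 2/3.
est = CountingEstimator({"p": 3, "p/child::*": 4, "p[child::*]": 2})
q1 = Estimate(card=Fraction(3), domain_size=Fraction(3))
q2 = estimate_step(q1, "p", "child::*", est)
print(q2.card, q2.domain_size)  # 4 2
```

Using exact fractions keeps the small example free of rounding noise; a real optimizer would typically work with floats.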
Back to our Example Plan
  [Figure: Pathfinder relational plan for the forecast query, carrying the projection-path, cardinality, domain-inclusion and domain-size annotations of the previous slide, plus abstract domain annotations iter^α on the join inputs.]
  "Foreign key" join:
      $|q_1 \bowtie_{iter1=iter} q_2| = \dfrac{|q_1| \cdot |q_2|}{\|\alpha\|}$ (since $\alpha \sqsubseteq \alpha$)
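The "foreign key" join rule can be checked on toy data. This is a minimal sketch under one stated assumption: every value of q2's join column occurs equally often across its domain β. The relations and the helper names `fk_join_estimate` and `actual_join_card` are hypothetical. Domain inclusion guarantees that each q1 tuple finds a partner, so under that uniformity assumption |q1|·|q2|/||β|| is exact.

```python
from collections import Counter


def fk_join_estimate(card_q1: int, card_q2: int, beta_size: int) -> float:
    """'Foreign key' join rule: |q1 join_{a=b} q2| = |q1| * |q2| / ||beta||,
    applicable when the domain of q1.a is included in the domain beta of q2.b."""
    return card_q1 * card_q2 / beta_size


def actual_join_card(q1_a: list[int], q2_b: list[int]) -> int:
    """Exact equi-join size: for each q1 value, count its matches in q2."""
    freq = Counter(q2_b)
    return sum(freq[v] for v in q1_a)


# Toy data: q2.b covers the domain {0, 1, 2, 3}, each value exactly twice;
# q1.a draws only from that domain (alpha is included in beta).
q2_b = [v for v in range(4) for _ in range(2)]
q1_a = [0, 0, 1, 3, 3, 2]
assert set(q1_a) <= set(q2_b)  # domain inclusion

estimate = fk_join_estimate(len(q1_a), len(q2_b), len(set(q2_b)))
print(estimate, actual_join_card(q1_a, q2_b))  # 12.0 12
```

With skewed value frequencies in q2.b the formula becomes an average-case estimate rather than an exact count.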
Back to our Example Plan
  [Figure: the same annotated plan, now labelled with the estimated cardinalities card1: 990, card2: 4104, card3: 3540, card4: 564.]
Forecasting New Zealand’s Weather
   The obtained cardinalities can be mapped back to predict item
   counts for the corresponding XQuery expressions:⁴
           for $d in doc ("forecast.xml")/descendant::day   card1 = 990/990
           let $day := $d/@t
           let $ppcp := data ($d/descendant::ppcp)
           return
             if ($ppcp > 50)                                card2 = 4104/3402
               then ("rain likely on", $day,                card3 = 3540/2370
                     "chance of precipitation:", $ppcp)
               else ("no rain on", $day)                    card4 = 564/1032


            Value statistics are based on histograms (see paper).
            Inaccuracy is mainly due to correlations in the data.
                    Rain in the morning likely means rain in the afternoon, too.
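The deck defers value statistics to the paper. As a generic illustration only, and not necessarily the paper's estimator, the selectivity of a predicate such as `$ppcp > 50` can be read off an equi-width histogram by assuming that values are spread uniformly within each bucket. The function names and bucket counts below are hypothetical. The second function makes one further assumption explicit: it treats several ppcp values per day as independent when estimating the existential comparison. Positive correlation between those values (rain in the morning, rain in the afternoon) makes such an estimate too high.

```python
def greater_than_selectivity(bounds: list[float], counts: list[int],
                             threshold: float) -> float:
    """Fraction of values > threshold, estimated from a histogram.

    Bucket i covers [bounds[i], bounds[i + 1]) and holds counts[i] values.
    Values are assumed to be uniformly spread within each bucket.
    """
    total = sum(counts)
    matching = 0.0
    for lo, hi, n in zip(bounds, bounds[1:], counts):
        if threshold <= lo:
            matching += n                                 # whole bucket qualifies
        elif threshold < hi:
            matching += n * (hi - threshold) / (hi - lo)  # part of the bucket
    return matching / total


def existential_selectivity(per_value: float, k: int) -> float:
    """Pr[at least one of k values qualifies], ASSUMING the k values are
    independent; positively correlated values make this an overestimate."""
    return 1 - (1 - per_value) ** k


# Hypothetical precipitation-probability histogram: 4 equi-width buckets over 0..100.
bounds = [0, 25, 50, 75, 100]
counts = [40, 30, 20, 10]
s = greater_than_selectivity(bounds, counts, 50)  # 30 of 100 values qualify
print(s, existential_selectivity(s, 2))
```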

       ⁴ estimated/observed, based on data for New Zealand taken two weeks before this talk.
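The forecast numbers can be checked with the disjoint-union rule |q1 ∪· q2| = |q1| + |q2|. The conditional's result (card2) is the union of the then- and else-branch results, so card2 = card3 + card4 must hold for both the estimates and the observations. The sketch below also computes the estimated/observed ratios. The per-iteration item counts are an assumption that the slide does not state: 5 items per then-iteration (three items plus two ppcp values per day) and 2 per else-iteration.

```python
# (estimated, observed) item counts, copied from the slide above.
card = {
    "card1": (990, 990),    # for $d in doc ("forecast.xml")/descendant::day
    "card2": (4104, 3402),  # the whole if-then-else expression
    "card3": (3540, 2370),  # then-branch
    "card4": (564, 1032),   # else-branch
}

# Disjoint-union rule: the conditional's result is then-branch ∪· else-branch,
# for estimates and observations alike.
for i in (0, 1):
    assert card["card2"][i] == card["card3"][i] + card["card4"][i]

# ASSUMPTION (not stated on the slide): a then-iteration yields 5 items,
# an else-iteration yields 2 items.
THEN_ITEMS, ELSE_ITEMS = 5, 2
for i, label in ((0, "estimated"), (1, "observed")):
    assert card["card3"][i] % THEN_ITEMS == 0 and card["card4"][i] % ELSE_ITEMS == 0
    then_iters = card["card3"][i] // THEN_ITEMS
    else_iters = card["card4"][i] // ELSE_ITEMS
    print(label, then_iters, else_iters, then_iters + else_iters)

# Estimated/observed ratio per subexpression (> 1 means overestimate).
for name, (est, obs) in card.items():
    print(name, round(est / obs, 2))
```

Under that assumption, both rows add up to the card1 = 990 iterations (708 + 282 estimated, 474 + 516 observed). In other words, the estimate sends too many iterations into the then-branch.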
More Realistic Queries: W3C XQuery Use Cases
 Prototype implementation based on Pathfinder
 For each subexpression: estimated cardinality / observed cardinality
 Diameter indicates data point “stacking”
 Plan root: filled circle
 Applicable to “real” queries
 Recovery from intermediate mis-estimations
      e.g., existential semantics
 [Figure: estimated cardinality/observed cardinality per subexpression (log scale, 0.01 to 10) for XMP Q4, Q5, Q6, Q8, Q9, Q10, Q11; SGML Q3, Q4, Q8a, Q8b; R Q2, Q3, Q10, Q11, Q13, Q14, Q15, Q17.]

Wrap-Up
  Cardinality estimation framework for XQuery
  Subexpression-level estimates for arbitrary XQuery expressions
  Based on Pathfinder’s XQuery-to-relational algebra compiler

                     relational cardinality estimation (System R)
                   + existing work on XPath estimation
                   + histograms for value predicates
                   = cardinality estimation for XQuery


  High-quality estimates for realistic XQuery workloads
      Robust with respect to intermediate errors
  Pluggable and extensible
      e.g., XPath estimation subsystem, positional predicates
 August 26, 2008         Systems Group — Department of Computer Science — ETH Z¨rich
                                                                               u       13

More Related Content

PDF
Mining Frequent Closed Graphs on Evolving Data Streams
PDF
Cloud jpl
PPT
An Effective Rule Miner for Instance Matching in a Web of Data
PDF
Extending lifespan with Hadoop and R
PPTX
Python GC
PDF
PDF
Making Big Data Analytics Interactive and Real-­Time
PDF
Lecture12 xing
Mining Frequent Closed Graphs on Evolving Data Streams
Cloud jpl
An Effective Rule Miner for Instance Matching in a Web of Data
Extending lifespan with Hadoop and R
Python GC
Making Big Data Analytics Interactive and Real-­Time
Lecture12 xing

Viewers also liked (20)

PPT
Beths Powerpoint
PPT
50 Words Powerpoint Jayden
KEY
ECAWA Keynote
PPT
Bankruptcy Information
PPT
Tbfs uc commercial presentation
PPTX
Η συνάρτηση. Μια προξενήτρα αλλοιώτικη από τις άλλες! Function: The matchmaker!
PPT
Autobiography Anthony[1]
PPTX
Gang announcements 2011 01
PPT
GrouperEye
PPT
Make-A-Wish Presentation
PDF
Does the social game business model work for mobile apps?
PPT
City University Food Thinkers
PDF
Transforming HR in an Uncertain Economy: Priorities and Processes That Delive...
PDF
Jack Thurston talk at EU budget and CAP conference: Prague, 11 December 2008
PPT
2009 03 31 Healthstory Webinar Presentation
PPTX
Gang announcements 2010 09
PPT
A Ri Zona Powerpoint
PPT
GrouperEye Product Plan
PDF
Farm Subsidy Transparency
PDF
Gray Routes Full Time JDs
Beths Powerpoint
50 Words Powerpoint Jayden
ECAWA Keynote
Bankruptcy Information
Tbfs uc commercial presentation
Η συνάρτηση. Μια προξενήτρα αλλοιώτικη από τις άλλες! Function: The matchmaker!
Autobiography Anthony[1]
Gang announcements 2011 01
GrouperEye
Make-A-Wish Presentation
Does the social game business model work for mobile apps?
City University Food Thinkers
Transforming HR in an Uncertain Economy: Priorities and Processes That Delive...
Jack Thurston talk at EU budget and CAP conference: Prague, 11 December 2008
2009 03 31 Healthstory Webinar Presentation
Gang announcements 2010 09
A Ri Zona Powerpoint
GrouperEye Product Plan
Farm Subsidy Transparency
Gray Routes Full Time JDs
Ad

Similar to Dependable Cardinality Forecast for XQuery (7)

KEY
Arrows in perl
PDF
Rough Set based Decision Tree for Identifying Vulnerable and Food Insecure Ho...
KEY
Perl saved a lady.
PDF
☣ ppencode ♨
PDF
Computing Marginal in CCMRFs - NIPS 2010
PDF
Engr 371 final exam april 2006
PDF
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
Arrows in perl
Rough Set based Decision Tree for Identifying Vulnerable and Food Insecure Ho...
Perl saved a lady.
☣ ppencode ♨
Computing Marginal in CCMRFs - NIPS 2010
Engr 371 final exam april 2006
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
Ad

More from University of New South Wales (11)

PDF
Declarative analysis of noisy information networks
PDF
DHHT - Modeling beyond plain graphs
PDF
Ontological Conjunctive Query Answering over Large Knowledge Bases
PPTX
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
PDF
GraphREL: A Relational Graph Query Processor
PDF
XML Compression Benchmark
Declarative analysis of noisy information networks
DHHT - Modeling beyond plain graphs
Ontological Conjunctive Query Answering over Large Knowledge Bases
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
GraphREL: A Relational Graph Query Processor
XML Compression Benchmark

Recently uploaded (20)

PDF
TR - Agricultural Crops Production NC III.pdf
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
Cell Structure & Organelles in detailed.
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
Pharma ospi slides which help in ospi learning
PDF
RMMM.pdf make it easy to upload and study
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
Basic Mud Logging Guide for educational purpose
PPTX
master seminar digital applications in india
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Microbial disease of the cardiovascular and lymphatic systems
TR - Agricultural Crops Production NC III.pdf
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Cell Structure & Organelles in detailed.
Final Presentation General Medicine 03-08-2024.pptx
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
O7-L3 Supply Chain Operations - ICLT Program
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Pharma ospi slides which help in ospi learning
RMMM.pdf make it easy to upload and study
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
Basic Mud Logging Guide for educational purpose
master seminar digital applications in india
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
VCE English Exam - Section C Student Revision Booklet
Anesthesia in Laparoscopic Surgery in India
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
Module 4: Burden of Disease Tutorial Slides S2 2025
Microbial disease of the cardiovascular and lymphatic systems

Dependable Cardinality Forecast for XQuery

  • 1. Dependable Cardinality Forecasts for XQuery Jens Teubner, ETH (formerly IBM Research) Torsten Grust, U T¨bingen (formerly TUM) u Sebastian Maneth, UNSW and NICTA Sherif Sakr, UNSW and NICTA c Systems Group — Department of Computer Science — ETH Z¨rich u August 26, 2008
  • 2. Cardinality Estimation for XQuery The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard. for $d in doc ("forecast.xml")/descendant::day let $day := $d/@t let $ppcp := data ($d/descendant::ppcp) return if ($ppcp > 50) then ("rain likely on", $day, "chance of precipitation:", $ppcp) else ("no rain on", $day) for iteration sequence construction conditionals (if-then-else) ... existential quantification (XPath is not a focus of this work.) August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 2
  • 3. Cardinality Estimation for XQuery The feature-richness and semantics of the language make cardinality estimation for XQuery notoriously hard. for $d in doc ("forecast.xml")/descendant::day card1 = ? let $day := $d/@t let $ppcp := data ($d/descendant::ppcp) return if ($ppcp > 50) card2 = ? then ("rain likely on", $day, card3 = ? "chance of precipitation:", $ppcp) else ("no rain on", $day) card4 = ? for iteration sequence construction conditionals (if-then-else) ... existential quantification (XPath is not a focus of this work.) → Goal: Compute subexpression-level cardinalities cardi . August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 2
  • 4. Idea: Perform cardinality estimation on relational plan equivalents for XQuery. relational cardinality estimation (System R) + existing work on XPath estimation + histograms for value predicates = cardinality estimation for XQuery Build on Pathfinder’s XQuery-to-relational algebra compiler.1 → tuple count ≡ XQuery item count Cardinality information for each subexpression. 1 http://www.pathfinder-xquery.org/ August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 3
  • 5. πiter:outer,pos:pos1,item pos1: ord,pos outer Example (Plan details not of interest today.) 2 iter=inner · ∪ 4 1 3 πiter,pos:pos1,item for $d in doc ("forecast.xml")/descendant::day πiter,pos:pos1,item pos1: ord,pos iter let $day := $d/@t pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, let $ppcp := data ($d/descendant::ppcp) · ∪ item:"rain..." · ∪ return 2 πiter,pos:1,ord:1, πiter,pos,item,ord:2 ∪· if ($ppcp > 50) 3 item:"no rain..." πiter,pos:1,ord:3, item:"chance..." then ("rain likely on", $day, πiter,pos,item,ord:2 πiter,pos,item,ord:4 "chance of precipitation:", $ppcp) iter1=iter iter1=iter iter=iter1 else ("no rain on", $day) 4 δ πiter σres res:(item,item1) Pathfinder compiles XQuery iter1=iter πiter1:iter, πiter1:iter,pos,item with arbitrary nesting. πiter1:iter, pos1:1,item1:50 item pos,item pos: item iter Maintain correspondence to πiter pos: item iter item:attribute::t(item) item:descendant::ppcp(item) original query if back-end is πiter:inner,item πinner,outer:iter, ord:pos not relational. inner: iter,pos 1 pos: item iter Derive estimates based on an item:descendant::day(item) inference rule set. docitem πiter,item:"forecast.xml" iter 1
  • 6. Relational XQuery Cardinality Estimation Apply System R-style estimation to relational XQuery plans, e.g., Disjoint union: Cartesian product: · |q1 ∪ q2 | = |q1 | + |q2 | |q1 × q2 | = |q1 | · |q2 | Equi-join:    |q1 | · |q2 | if there are indexes on  max {|a| , |b| } both join columns,     idx idx  |q1 q2 | = |q1 | · |q2 | if there is only an index a=b |a|idx on column a,        |q1 | · |q2 | · 1/10 otherwise  |c|idx : Number of unique values in index on column c. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 5
  • 7. Relational XQuery Cardinality Estimation Apply System R-style estimation to relational XQuery plans, e.g., Disjoint union: Cartesian product: · |q1 ∪ q2 | = |q1 | + |q2 | |q1 × q2 | = |q1 | · |q2 | Equi-join:  |q1 | · |q2 | if there are indexes on    max {|a| , |b| }     idx idx both join columns, ?  |q1 q2 | = |q1 | · |q2 | if there is only an index a=b       |a|idx on column a, ?  |q1 | · |q2 | · 1/10 otherwise  Our joins typically operate over computed relations. |c|idx : Number of unique values in index on column c. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 5
  • 8. Abstract Domain Identifiers A simple form of data flow analysis provides the information needed. Introduce abstract domain identifiers α, β, . . . as placeholders for the active runtime domain for each column c. (Read c α as “column c contains values from domain α.”) Estimate the size α of each domain α, e.g.,2 dom a: b1 ,...,bn (q) ⊇ dom (q) ∪ aα ∧ α =! |q| . Identify inclusion relationships α β between domains, e.g., aα ∈ dom (q) ∧ aβ ∈ dom (σ··· (q)) ⇒ β α . 2 Operator a: b1 ,...,bn introduces a new key column (holding row numbers). August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 6
  • 9. Abstract Domain Identifiers Use abstract domain information for cardinality estimation. E.g., “foreign key” join: aα ∈ dom (q1 ) bβ ∈ dom (q2 ) α β |q1 | · |q2 | |q1 q2 | = a=b β Domain inclusion guarantees that each tuple in q1 finds (at least one) join partner in q2 . Other examples: |q1 q2 | = |q1 | − |q2 | if q2 is a subset of q1 . |q1 q2 | = 0 if q1 is a subset of q2 . |q1 q2 | = |q1 | if q1 and q2 are disjoint. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 7
  • 10. Interfacing with XPath—Projection Paths Track XPath navigation by means of projection paths3 a a b a b c c:child::*(b) 1 γa γb 1 γa b c d 2 γb 1 γa γc 1 γa γd 3 γd e 3 γd γe q1 q2 b⇒p ∈ path (q1 ) ⇒ c⇒p/child::* ∈ path (q2 ) Step operator makes XPath navigation explicit in relational plans (compiles to join on SQL back-ends). 3 A. Marian and J. Sim´ on. Projecting XML Documents. VLDB 2003. e August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 8
  • 11. Interfacing with XPath—Cardinality Inference bα bα a a b a b c c:child::*(b) 1 γa γb 1 γa b c d 2 γb 1 γa γc α α 1 γa γd 3 γd e 3 γd γe q1 q2 Cardinality: fn:count (p/child::*) |q2 | = |q1 | · fn:count (p) = |q1 | · Prchild::* (p) fanout (here: 4/3) Domain Sizes: fn:count (p [ child::* ]) α = α · fn:count (p) = α · Pr[child::*] (p) selectivity (here: 2/3) Any XPath estimator that provides Prp2 (p1 ) and Pr[p2 ] (p1 ) will do. Our prototype uses a simple Data Guide-based implementation. August 26, 2008 Systems Group — Department of Computer Science — ETH Z¨rich u 9
  • 12. πiter:outer,pos:pos1,item pos1: ord,pos outer Back to our Example Plan 2 iter=inner · ∪ 4 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, · ∪ · ∪ iter1=iter item:"rain..." πiter,pos,item,ord:2 ∪· πiter,pos:1,ord:1, πiter1:iter, item:"no rain..." πiter,pos:1,ord:3, item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter δ item:descendant::ppcp(item) πiter σres πiter:inner,item res:(item,item1) iter1=iter πiter1:iter, πiter1:iter,pos,item pos,item πiter1:iter, pos1:1,item1:50 item pos: item iter πiter pos: item iter item:attribute::t(item) item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, ord:pos inner: iter,pos 1 pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
  • 13. πiter:outer,pos:pos1,item pos1: ord,pos outer Back to our Example Plan 2 iter=inner · ∪ 4 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, · ∪ · ∪ iter1=iter item:"rain..." πiter,pos,item,ord:2 ∪· πiter,pos:1,ord:1, πiter1:iter, item:"no rain..." πiter,pos:1,ord:3, item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter Projection path: δ item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path πiter ... (q) σres πiter:inner,item Cardinality (uses fanout): res:(item,item1) item:ax::nt(item) (q) = |q|·Prax::nt (· · · ) πiter1:iter,pos,item iter1=iter πiter1:iter, pos,item πiter1:iter, pos1:1,item1:50 item pos: item iter πiter pos: item iter item:attribute::t(item) item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, ord:pos inner: iter,pos 1 pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
  • 14. πiter:outer,pos:pos1,item pos1: ord,pos outer Back to our Example Plan 2 iter=inner · ∪ 4 πiter,pos:pos1,item 3 πiter,pos:pos1,item pos1: ord,pos iter res:(item,item1) pos1: ord,pos iter · ∪ πiter,pos:1,ord:1, · ∪ · ∪ iter1=iter item:"rain..." πiter,pos,item,ord:2 ∪· πiter,pos:1,ord:1, πiter1:iter, item:"no rain..." πiter,pos:1,ord:3, item:"chance..." pos1:1,item1:50 item πiter,pos,item,ord:2 πiter,pos,item,ord:4 iter1=iter iter1=iter iter=iter1 πiter pos: item iter iterα Projection path: δ item:descendant::ppcp(item) item⇒···/descendant::ppcp ∈ path πiter ... (q) σres πiter:inner,item iter α Cardinality (uses fanout): res:(item,item1) item:ax::nt(item) (q) = |q|·Prax::nt (· · · ) πiter1:iter,pos,item iter1=iter πiter1:iter, pos,item πiter1:iter, pos1:1,item1:50 item Domain iter πiter pos: item inclusion: pos: item iter iterα ∈ dom item:attribute::t(item) ... (q) ; α α item:descendant::ppcp(item) πiter:inner,item πinner,outer:iter, ord:pos Domain size (uses step selectivity): inner: iter,pos α = α 1· Pr[ax::nt] (· · · ) pos: item iter item:descendant::day(item) docitem a: Retrieve typed values for node identifiers πiter,item:"forecast.xml" in column a (atomization). iter 1
  • 15. Back to our Example Plan (continued)
   [Figure: the same algebraic plan, highlighting the equi-join iter1 = iter.]
   “Foreign key” join: $|q_1 \bowtie_{\mathit{iter1}^{\alpha} = \mathit{iter}^{\alpha}} q_2| = \frac{|q_1| \cdot |q_2|}{\|\alpha\|}$ (both join columns range over the same domain α)
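Once both join columns are known to range over the same domain α, the foreign-key join rule is ordinary System R-style join estimation. Below is a minimal sketch with the hypothetical function name `fk_join_cardinality`. The inputs are illustrative.

```python
def fk_join_cardinality(card_q1: float, card_q2: float, domain_size: float) -> float:
    """Estimate |q1 join_{iter1 = iter} q2| = |q1| * |q2| / ||alpha||.

    Assumes both join columns draw their values from the same domain alpha
    and that values are spread uniformly (System R-style estimation).
    """
    if domain_size <= 0:
        raise ValueError("domain size must be positive")
    return card_q1 * card_q2 / domain_size


# Illustrative numbers: q1 is a loop relation with exactly one row per
# iteration, so |q1| = ||alpha||.
iterations = 990.0
print(fk_join_cardinality(iterations, 4104.0, iterations))
```

When q1 holds exactly one row per iteration, the formula reduces to |q2|. Every row of q2 then finds exactly one join partner, which is the behavior expected of a foreign-key join.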
  • 16. Back to our Example Plan (continued)
   [Figure: the same algebraic plan, annotated with the resulting cardinality estimates.]
   card1: 990   card2: 4104   card3: 3540   card4: 564
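The estimates on this slide add up: 3540 + 564 = 4104, so card2 equals card3 + card4. This matches the then and else results being combined by a disjoint union (∪·), whose cardinality is simply the sum of its inputs. Here is a sketch with the hypothetical helper `disjoint_union_cardinality`. It rests on our reading of the plan that card2 labels that union.

```python
def disjoint_union_cardinality(*inputs: float) -> float:
    """|q1 ∪· q2 ∪· ...| = |q1| + |q2| + ...

    A disjoint union neither removes duplicates nor creates new rows,
    so its cardinality is the sum of its inputs' cardinalities.
    """
    return sum(inputs)


card3 = 3540  # estimate for the then branch
card4 = 564   # estimate for the else branch
card2 = disjoint_union_cardinality(card3, card4)
print(card2)
```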
  • 17. Forecasting New Zealand’s Weather
   The obtained cardinalities can be mapped back to predict item counts for the corresponding XQuery expressions (estimated/observed⁴):
                    for $d in doc ("forecast.xml")/descendant::day       card1 = 990/990
                    let $day := $d/@t
                    let $ppcp := data ($d/descendant::ppcp)
                    return
                      if ($ppcp > 50)                                    card2 = 4104/3402
                        then ("rain likely on", $day,                    card3 = 3540/2370
                              "chance of precipitation:", $ppcp)
                        else ("no rain on", $day)                        card4 = 564/1032
   Value statistics are based on histograms (see paper).
   The inaccuracy is mainly due to correlations in the data: rain in the morning likely means rain in the afternoon, too.
   ⁴ Estimated/observed, based on data taken for New Zealand two weeks ago.
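The slide estimates value predicates such as $ppcp > 50 from histograms and leaves the details to the paper. The sketch below assumes a simple bucket histogram with values spread uniformly inside each bucket. That histogram type is our assumption and not necessarily the paper's. `selectivity_greater_than`, `exists_greater_than`, and the histogram contents are all hypothetical.

The second function illustrates one way correlation can distort an estimate. In XQuery, `$ppcp > 50` is an existential comparison over all ppcp values of a day. If those values are treated as independent, the chance that some value exceeds 50 is overestimated when the values are in fact positively correlated.

```python
from typing import Sequence


def selectivity_greater_than(edges: Sequence[float], counts: Sequence[int], c: float) -> float:
    """Estimate Pr[x > c] from a bucket histogram.

    edges:  bucket boundaries e_0 < e_1 < ... < e_n; bucket i covers [e_i, e_{i+1})
    counts: number of values per bucket, len(counts) == len(edges) - 1
    Values are assumed to be spread uniformly inside each bucket.
    """
    if len(counts) != len(edges) - 1:
        raise ValueError("need exactly one count per bucket")
    total = sum(counts)
    if total == 0:
        return 0.0
    above = 0.0
    for lo, hi, n in zip(edges, edges[1:], counts):
        if c <= lo:
            above += n                          # whole bucket lies above c
        elif c < hi:
            above += n * (hi - c) / (hi - lo)   # only part of the bucket lies above c
    return above / total


def exists_greater_than(p: float, k: int) -> float:
    """Pr[some of k values > c], assuming the k values are independent."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical ppcp histogram (percent); these counts are made up.
edges = [0, 25, 50, 75, 100]
counts = [40, 30, 20, 10]
p = selectivity_greater_than(edges, counts, 50)
print(f"Pr[ppcp > 50] = {p:.2f}")
# Two ppcp values per day (assumed), treated as independent:
print(f"Pr[some ppcp > 50] = {exists_greater_than(p, 2):.2f}")
```

If the k values were perfectly correlated, the existential probability would stay at p instead of growing toward 1 - (1 - p)^k. That gap is one way in which morning and afternoon rain chances can inflate the estimate for the then branch.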
  • 18. More Realistic Queries: W3C XQuery Use Cases
   [Figure: estimated cardinality/observed cardinality (log scale, 0.01 to 10) for each subexpression of the W3C XQuery Use Cases XMP Q4, Q5, Q6, Q8, Q9, Q10, Q11; SGML Q3, Q4, Q8a, Q8b; R Q2, Q3, Q10, Q11, Q13, Q14, Q15, Q17. Marker diameter indicates data-point “stacking”; plan roots are shown as filled circles.]
   Prototype implementation based on Pathfinder
   Applicable to “real” queries
   Recovery from intermediate mis-estimations, e.g., existential semantics
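The plot's x-axis is the ratio of estimated to observed cardinality on a logarithmic scale. As a small illustration, the sketch computes that ratio for the four New Zealand forecast estimates from slide 17; it is not part of the original evaluation. The name `estimation_ratio` is hypothetical.

```python
import math

# (estimated, observed) item counts from the New Zealand forecast example (slide 17)
FORECAST = {
    "card1": (990, 990),
    "card2": (4104, 3402),
    "card3": (3540, 2370),
    "card4": (564, 1032),
}


def estimation_ratio(estimated: float, observed: float) -> float:
    """Ratio plotted on the x-axis: estimated / observed (1.0 means exact)."""
    if observed <= 0:
        raise ValueError("observed cardinality must be positive")
    return estimated / observed


for name, (est, obs) in FORECAST.items():
    r = estimation_ratio(est, obs)
    # log10 places the point on the plot's logarithmic axis: 0 is exact,
    # +1 a tenfold overestimate, -1 a tenfold underestimate.
    print(f"{name}: ratio = {r:.2f}, log10 = {math.log10(r):+.2f}")
```

Using a ratio instead of an absolute error lets subexpressions with very different cardinalities share one axis. Over- and underestimates by the same factor lie symmetrically around 1.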
  • 19. Wrap-Up
   Cardinality estimation framework for XQuery
     Subexpression-level estimates for arbitrary XQuery expressions
     Based on Pathfinder’s XQuery-to-relational algebra compiler
   relational cardinality estimation (System R)
   + existing work on XPath estimation
   + histograms for value predicates
   = cardinality estimation for XQuery
   High-quality estimates for realistic XQuery workloads
     Robust with respect to intermediate errors
   Pluggable and extensible, e.g., XPath estimation subsystem, positional predicates