Realtime Analytics with Cassandra
      Acunu Analytics

     Tom Wilkie, Acunu
     21st August 2012
•   Motivation / alternatives
•   What is it?
•   How does it work?
•   Approximate Analytics
•   What's it good for?



                                    Analytics
Why bother?
    “Companies that can harness big data will
         trample data incompetents”
                The Economist, May 26th 2011




[Figure: a dense pile of overlapping web-log records with columns time, page, session id and duration; visible values include 14:58:03.234, /index.html, 248.180.3.40 and 175]
Combining “big” and “real-time” is hard

    •   Live & historical aggregates...
    •   Trends...
    •   Drill downs and roll ups




Solution                  Con

(logo not recovered)      Scalability, $$$
(logo not recovered)      Not realtime
(logo not recovered)      Spartan query semantics => complex, DIY solutions

Analytics

[Diagram: click stream, sensor data, etc → events → Acunu Analytics → counter updates]




     •   Aggregate incrementally, on the fly
     •   Store live + historical aggregates
     {
         time : TIME(HOUR; MIN; SEC),
         page : PATH(/),
         category : STRING,
         loadTime : LONG
     }

     {
         select : ["COUNT", "AVG(loadTime)"],
         where : "time, ?path",
         group : "time, ?category"
     }
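
To make "aggregate incrementally, on the fly" concrete, here is a minimal Python sketch of how COUNT and AVG(loadTime) grouped by hour and category could be maintained one event at a time. It is an illustration under assumptions, not Acunu's implementation: the in-memory `aggregates` dict, the `RunningAggregate` class, the `ingest` function and the sample events (including their category values) are all hypothetical.

```python
from collections import defaultdict


class RunningAggregate:
    """Keeps COUNT and a running total so AVG can be derived at query time."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def add(self, value):
        # O(1) work per event: no raw events are stored.
        self.count += 1
        self.total += value

    @property
    def avg(self):
        return self.total / self.count if self.count else None


# Hypothetical store: (hour, category) -> RunningAggregate.
aggregates = defaultdict(RunningAggregate)


def ingest(event):
    """Fold one event into the aggregate for its (hour, category) group."""
    hour = event["time"][:2]  # "14:58:03" -> "14"
    aggregates[(hour, event["category"])].add(event["loadTime"])


# Made-up sample events, shaped like the schema above.
sample_events = [
    {"time": "14:58:03", "page": "/index.html", "category": "home", "loadTime": 175},
    {"time": "14:58:04", "page": "/docs/access/chapter8.txt", "category": "docs", "loadTime": 1234},
    {"time": "14:59:10", "page": "/index.html", "category": "home", "loadTime": 52},
]
for sample in sample_events:
    ingest(sample)

home = aggregates[("14", "home")]
print(home.count, home.avg)
```

Storing a count and a total rather than the average itself is what lets each new event be applied without re-reading earlier ones.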



Dashboard UI




Aggregates: count, count distinct (session), avg(duration)
                grouped by ... day
                           ... geography
                           ... browser
Data Definition

     time : TIME(HOUR; MIN; SEC),
     cust_id : LONG,
     session_id : LONG,
     geography : STRING,
     browser : STRING,
     load_time : LONG

Query Patterns

     { select: "COUNT",
       patterns: [
          { where : "?time", group : "?time" },
          { where : "", group : "geography" },
          { where : "", group : "browser" }
       ]
     }, {
       select: ["COUNT_DISTINCT(session_id)",
                "AVG(load_time)"],
       where: "time", group: ""
     }



Event:

{
     cust_id: user01,
     session_id: 102,
     geography: UK,
     browser: IE,
     time: 22:02,
}

Counters before the event:

21:00        all→1345     :00→45     :01→62      :02→87       ...
22:00        all→3221     :00→22     :01→19      :02→104      ...
...
UK           all→228      user01→1   user14→12   user99→7     ...
US           all→354      user01→4   user04→8    user56→17    ...
...
UK, 22:00    all→1904     ...
∅            all→87314    UK→238     US→354      ...




Counters after the event (cust_id: user01, session_id: 102, geography: UK, browser: IE, time: 22:02):

21:00        all→1345     :00→45     :01→62      :02→87       ...
22:00        all→3222     :00→22     :01→19      :02→105      ...
...
UK           all→229      user01→2   user14→12   user99→7     ...
US           all→354      user01→4   user04→8    user56→17    ...
...
UK, 22:00    all→1905     ...
∅            all→87315    UK→239     US→354      ...
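
The update step on this slide can be sketched in Python as follows. This is only an illustration: the row/column layout is inferred from the slide, while the `counters` dict and the `apply_event` function are hypothetical stand-ins, and how Acunu Analytics actually maps these rows onto Cassandra counters is not shown in the deck.

```python
from collections import defaultdict

ROOT = "∅"  # the row with no "where" restriction, as drawn on the slide

# Row key -> {column -> count}. Seeded with the "before" values from the slide.
counters = defaultdict(lambda: defaultdict(int))
counters["22:00"].update({"all": 3221, ":02": 104})
counters["UK"].update({"all": 228, "user01": 1})
counters["UK, 22:00"].update({"all": 1904})
counters[ROOT].update({"all": 87314, "UK": 238})


def apply_event(event):
    """Increment every counter that the event contributes to."""
    hour, minute = event["time"].split(":")
    hour_row = f"{hour}:00"
    geo = event["geography"]
    increments = [
        (hour_row, "all"),            # events in this hour
        (hour_row, f":{minute}"),     # events in this minute of the hour
        (geo, "all"),                 # events from this geography
        (geo, event["cust_id"]),      # ... broken down by user
        (f"{geo}, {hour_row}", "all"),  # geography and hour combined
        (ROOT, "all"),                # all events
        (ROOT, geo),                  # all events, broken down by geography
    ]
    for row, column in increments:
        counters[row][column] += 1


apply_event({
    "cust_id": "user01",
    "session_id": 102,
    "geography": "UK",
    "browser": "IE",
    "time": "22:02",
})
print(counters["22:00"]["all"], counters["UK"]["user01"], counters[ROOT]["UK"])
```

Each event turns into a fixed, small set of increments, so write cost does not grow with the amount of historical data.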




Query                                      Counters read

where time 21:00-22:00, count(*)           21:00        all→1345     :00→45     :01→62      :02→87       ...
where time 22:00-23:00, group by minute    22:00        all→3222     :00→22     :01→19      :02→105      ...
                                           ...
where geography=UK, group all by user      UK           all→229      user01→2   user14→12   user99→7     ...
                                           US           all→354      user01→4   user04→8    user56→17    ...
                                           ...
                                           UK, 22:00    all→1905     ...
count all; group all by geo                ∅            all→87315    UK→239     US→354      ...
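
Read this way, each query becomes a lookup in one pre-aggregated row. The Python sketch below illustrates that idea with the counter values visible on the slide; columns hidden behind "..." are omitted rather than guessed. The `counters` dict and the `count` and `group` helpers are hypothetical names, not part of any Acunu API.

```python
ROOT = "∅"  # the unrestricted row, as drawn on the slide

# Only the counters that are visible on the slide.
counters = {
    "21:00": {"all": 1345, ":00": 45, ":01": 62, ":02": 87},
    "22:00": {"all": 3222, ":00": 22, ":01": 19, ":02": 105},
    "UK": {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    "US": {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    "UK, 22:00": {"all": 1905},
    ROOT: {"all": 87315, "UK": 239, "US": 354},
}


def count(row):
    """A plain count is the 'all' column of the matching row."""
    return counters[row]["all"]


def group(row):
    """A grouped count is every other column of the matching row."""
    return {key: value for key, value in counters[row].items() if key != "all"}


print(count("21:00"))   # where time 21:00-22:00, count(*)
print(group("22:00"))   # where time 22:00-23:00, group by minute
print(group("UK"))      # where geography=UK, group all by user
print(count(ROOT))      # count all
print(group(ROOT))      # group all by geo
```

No events are scanned at query time, which is why the queries have to be known in advance for this scheme to work.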
Approximate Analytics
                 Exact




     Real-time           Large Scale


Count Distinct

     Plan A: keep a list of all the things you’ve seen
               count them at query time


                Quick to update
                  ... but at scale ...
                Takes lots of space
                Takes a long time to query
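
Plan A can be sketched in a few lines of Python. This is an illustrative sketch only; the `ExactDistinct` class and the sample session ids are made up for the example.

```python
class ExactDistinct:
    """Plan A: remember every item, deduplicate when asked."""

    def __init__(self):
        self.items = []

    def add(self, item):
        # Quick to update: one append per event.
        self.items.append(item)

    def count(self):
        # Query-time work and memory both grow with the number of events.
        return len(set(self.items))


sessions = ExactDistinct()
for session_id in [102, 175, 102, 1234, 175]:  # made-up session ids
    sessions.add(session_id)
print(sessions.count())
```

The answer is exact, but the structure never shrinks, which is the scaling problem the slide points at.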
Approximate Distinct

     max # leading zeroes seen so far
         item          hash        leading zeroes   max so far

         x        00101001110...          2            2
         y        11010100111...          0            2
         z        00011101011...          3            3
                       ...
     ... to see a max of M takes about 2^M items
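
The table above can be reproduced with a short Python sketch. It feeds the three hash prefixes from the slide straight into the estimator; the `MaxZeroesEstimator` class, the `leading_zeroes` and `hash_bits` helpers, and the choice of SHA-1 truncated to 32 bits for real items are assumptions made for illustration, not details from the talk.

```python
import hashlib


def leading_zeroes(bits):
    """Number of '0' characters before the first '1' in a bit string."""
    return len(bits) - len(bits.lstrip("0"))


def hash_bits(item):
    """Hypothetical hash: first 32 bits of SHA-1, as a bit string."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return format(int.from_bytes(digest[:4], "big"), "032b")


class MaxZeroesEstimator:
    """Tracks only the largest run of leading zeroes seen so far."""

    def __init__(self):
        self.max_zeroes = 0

    def add_bits(self, bits):
        self.max_zeroes = max(self.max_zeroes, leading_zeroes(bits))

    def add(self, item):
        self.add_bits(hash_bits(item))

    def estimate(self):
        # Seeing a maximum of M zeroes takes about two to the power M items.
        return 2 ** self.max_zeroes


estimator = MaxZeroesEstimator()
running_max = []
for bits in ["00101001110", "11010100111", "00011101011"]:  # x, y, z from the slide
    estimator.add_bits(bits)
    running_max.append(estimator.max_zeroes)
print(running_max, estimator.estimate())
```

A single maximum is a very noisy estimate, since one unlucky hash can double or quadruple it.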

Approximate Distinct

            to reduce variance, average over m = 2^k sub-streams

     item          hash          index, zeroes   max so far

     x       00101001110...          0, 0        0,0,0,0
     y       11010100111...          3, 1        0,0,0,1
     z       00011101011...          0, 1        1,0,0,1
                   ...
            take the harmonic mean
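
The sub-stream idea is the core of the HyperLogLog family of estimators. The Python sketch below shows one way it could look, under stated assumptions: it uses the standard HyperLogLog raw estimate (registers store zeroes + 1 and a bias constant is applied), which goes slightly beyond what the slide shows, and it omits the small-range and large-range corrections a production implementation would need. The `ApproxDistinct` class, the `index_and_zeroes` helper and the use of SHA-1 are hypothetical choices.

```python
import hashlib


def index_and_zeroes(bits, k):
    """Split a bit string into (sub-stream index, leading zeroes of the rest)."""
    index = int(bits[:k], 2)
    rest = bits[k:]
    return index, len(rest) - len(rest.lstrip("0"))


class ApproxDistinct:
    """HyperLogLog-style raw estimator over m = 2^k sub-streams (k >= 7)."""

    def __init__(self, k=10):
        self.k = k
        self.m = 2 ** k
        self.registers = [0] * self.m

    def add(self, item):
        # Assumed hash: first 64 bits of SHA-1, as a bit string.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        bits = format(int.from_bytes(digest[:8], "big"), "064b")
        index, zeroes = index_and_zeroes(bits, self.k)
        # Standard HyperLogLog stores zeroes + 1 so an empty register stays 0.
        self.registers[index] = max(self.registers[index], zeroes + 1)

    def estimate(self):
        # Bias constant from the HyperLogLog paper, valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        # Harmonic mean of 2^register across the sub-streams, scaled by m.
        inverse_sum = sum(2.0 ** -register for register in self.registers)
        return alpha * self.m * self.m / inverse_sum


sketch = ApproxDistinct(k=10)
for i in range(100000):
    sketch.add(f"session-{i}")
    sketch.add(f"session-{i}")  # duplicates do not change the registers
print(round(sketch.estimate()))
```

With k = 10 the sketch keeps 1024 small registers regardless of how many items it sees, and the expected relative error is roughly 1.04 / sqrt(m), about 3%.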
Was it worth it?




What’s Coming?

     •   Ad Hoc: same queries, but without the need
         to pre-define them
     •   Geolocation: support for location-based
         events and queries
     •   Drill down: see the events that make up any
         given aggregate


Manufacturing   Social Media         Ad Analytics
Oil + Gas       Systems Monitoring   Financial Services
“Up and running in about 4 hours”
“We found out a competitor was scraping our data”
“We keep discovering use cases we hadn’t thought of”




Analytics
www.acunu.com @acunu




Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and
elephant logos are trademarks of the Apache Software Foundation.
 35
                                                                        Analytics
