Five factors to consider when
choosing a big data solution!
Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra
How do I model my application?

©2012 DataStax
Popular options
  • Key/value
  • Tabular
  • Document
  • Graph?




Schema is your friend

{
    "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c",
    "name": "jbellis",
    "state": "TX",
    "birthdate": "1/1/1976",
    "email_addresses": ["jbellis@gmail.com", "jbellis@datastax.com"]
}

SQL can be your friend too

 CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date date
 );



 CREATE INDEX ON users(state);

 SELECT * FROM users
 WHERE state = 'Texas' AND birth_date > '1950-01-01';
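
As a rough illustration of the schema and query above, the sketch below runs them against SQLite through Python's standard sqlite3 module. SQLite is only a stand-in chosen so the example is self-contained; it is not Cassandra. The index name users_state, the sample rows, and the helper texans_born_after are assumptions made for the example.

```python
import sqlite3
import uuid


def texans_born_after(conn, cutoff):
    """Return names of users in Texas born after `cutoff` (ISO date string)."""
    rows = conn.execute(
        "SELECT name FROM users WHERE state = ? AND birth_date > ? ORDER BY name",
        ("Texas", cutoff),
    )
    return [name for (name,) in rows]


# In-memory database; the uuid and date values are stored as text here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id uuid PRIMARY KEY, name text, state text, birth_date date)"
)
# SQLite needs an explicit index name (hypothetical: users_state).
conn.execute("CREATE INDEX users_state ON users(state)")

# Hypothetical sample rows; ISO dates compare correctly as strings.
sample_users = [
    ("jbellis", "Texas", "1976-01-01"),
    ("alice", "Texas", "1945-06-30"),
    ("bob", "Ohio", "1980-02-14"),
]
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(str(uuid.uuid4()), name, state, born) for name, state, born in sample_users],
)

print(texans_born_after(conn, "1950-01-01"))
```

The schema is declared once and the query filters on typed columns, which is the point of the slide: the application does not have to parse each record to find the state or the birth date.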




Collections
 CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date date
 );

 CREATE TABLE users_addresses (
    user_id uuid REFERENCES users,
    email text
 );

 SELECT *
 FROM users NATURAL JOIN users_addresses;




Collections
 (The users_addresses table and the NATURAL JOIN above are crossed out on this slide.)

Collections
 CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date date,
    email_addresses set<text>
 );

 UPDATE users
 SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'}
 WHERE id = e451dd42-ece3-11e1-a0a3-34159e154f4c;
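
One way to read the `email_addresses + {...}` update is as a set union. The sketch below is an assumption about the semantics, not Cassandra's implementation; the function name add_emails is made up for the example. It shows why such an update is safe to repeat or reorder.

```python
def add_emails(current, new):
    """Return the union of the stored addresses and the new ones.

    Set union is idempotent and commutative, so applying the same
    update twice, or two updates in either order, gives the same result.
    """
    return frozenset(current) | frozenset(new)


stored = frozenset()
stored = add_emails(stored, {"jbellis@gmail.com"})
stored = add_emails(stored, {"jbellis@datastax.com", "jbellis@gmail.com"})
print(sorted(stored))
```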




Joins don’t scale
  • No joins
  • No subqueries
  • No aggregation functions* or GROUP BY
  • ORDER BY?




SELECT * FROM tweets
WHERE user_id IN (SELECT follower FROM followers
                  WHERE user_id = 'driftx')

[Diagram: the followers and tweets tables, with a "?" between them]

Clustering in Cassandra
CREATE TABLE timeline (
  user_id uuid,
  tweet_id timeuuid,
  tweet_author uuid,
  tweet_body text,
  PRIMARY KEY (user_id, tweet_id)
);

user_id   tweet_id     tweet_author   tweet_body
jbellis   3290f9da..   rbranson       lorem
jbellis   3895411a..   tjake          ipsum
  ...
driftx    3290f9da..   rbranson       lorem
driftx    71b46a84..   yzhang         dolor
  ...
yukim     3290f9da..   rbranson       lorem
yukim     e451dd42..   tjake          amet
  ...

Clustering in Cassandra
SELECT * FROM timeline
WHERE user_id = 'driftx';
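
The timeline table replaces the join with one row per (reader, tweet), so a reader's timeline is a single-partition lookup. Below is a minimal in-memory sketch of that idea, assuming tweets are copied into each follower's partition at write time. The Timeline class, post_tweet, the follower map, and the integer tweet ids (standing in for timeuuid values) are all hypothetical.

```python
import bisect
from collections import defaultdict


class Timeline:
    """Rows grouped by user_id and kept sorted by tweet_id within each group."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, user_id, tweet_id, author, body):
        # Keep each partition ordered by tweet_id (the clustering column).
        bisect.insort(self._partitions[user_id], (tweet_id, author, body))

    def select(self, user_id):
        # Equivalent of: SELECT * FROM timeline WHERE user_id = ?
        return list(self._partitions.get(user_id, []))


def post_tweet(timeline, followers, author, tweet_id, body):
    """Write the tweet into the timeline partition of every follower."""
    for follower in followers.get(author, ()):
        timeline.insert(follower, tweet_id, author, body)


# Hypothetical follower map: author -> users who follow that author.
followers = {"rbranson": ["jbellis", "driftx"], "yzhang": ["driftx"]}
timeline = Timeline()
post_tweet(timeline, followers, "rbranson", 2, "lorem")
post_tweet(timeline, followers, "yzhang", 1, "dolor")
print(timeline.select("driftx"))
```

The cost moves from read time to write time: each tweet is written once per follower, and reading a timeline touches only one partition.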
How does it perform?

Larger than memory datasets




Locking




Efficiency




UPDATE users
 SET email_addresses = email_addresses + {...}
 WHERE user_id = 'jbellis';




Durability




C* storage engine very briefly

write( k1 , c1:v1 )
    Memtable (memory):        k1  c1:v1
    Commit log (hard drive):  k1  c1:v1

write( k1 , c2:v2 )
    Memtable (memory):        k1  c1:v1 c2:v2
    Commit log (hard drive):  k1  c1:v1
                              k1  c2:v2

write( k2 , c1:v1 c2:v2 )
    Memtable (memory):        k1  c1:v1 c2:v2
                              k2  c1:v1 c2:v2
    Commit log (hard drive):  k1  c1:v1
                              k1  c2:v2
                              k2  c1:v1 c2:v2

write( k1 , c1:v4 c3:v3 )
    Memtable (memory):        k1  c1:v4 c2:v2 c3:v3
                              k2  c1:v1 c2:v2
    Commit log (hard drive):  k1  c1:v1
                              k1  c2:v2
                              k2  c1:v1 c2:v2
                              k1  c1:v4 c3:v3

flush
    SSTable (hard drive), with index:
                              k1  c1:v4 c2:v2 c3:v3
                              k2  c1:v1 c2:v2
    Commit log: cleanup

No random writes
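
The write path shown in the storage engine slides only appends to the commit log and later writes the memtable out in one pass, which is where "no random writes" comes from. The sketch below replays that sequence in memory. It is a toy model under stated assumptions, not Cassandra's code: the StorageEngine class and its attribute names are made up, and Python lists stand in for files on disk.

```python
class StorageEngine:
    """Toy model of the memtable / commit log / SSTable write path."""

    def __init__(self):
        self.commit_log = []  # append-only; stands in for a sequential file
        self.memtable = {}    # key -> {column: value}, held in memory
        self.sstables = []    # immutable snapshots written by flush()

    def write(self, key, columns):
        # 1. Append the mutation to the commit log (sequential write).
        self.commit_log.append((key, dict(columns)))
        # 2. Merge it into the in-memory row; newer values replace older ones.
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the whole memtable out sorted by key, in one sequential pass.
        sstable = tuple(
            (key, tuple(sorted(cols.items())))
            for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        # The flushed data is now durable, so the log entries can be discarded.
        self.memtable = {}
        self.commit_log = []
        return sstable


engine = StorageEngine()
engine.write("k1", {"c1": "v1"})
engine.write("k1", {"c2": "v2"})
engine.write("k2", {"c1": "v1", "c2": "v2"})
engine.write("k1", {"c1": "v4", "c3": "v3"})
print(engine.memtable)
print(engine.flush())
```

Note that the overwrite of c1 never modifies earlier log entries; the log keeps all four mutations until the flush, and only the memtable holds the merged row.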




[Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0; vertical axis from 0 to 35000]

How does it handle failure?

Classic partitioning with SPOF
[Diagram: client, router, and partition 1, partition 2, partition 3, partition 4; the client reaches the partitions through the router]

Availability
  • “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’”
    -- Ben Coverston: DataStax
  • “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.”
    -- Rick Branson: Instagram

Fully distributed, no SPOF
[Diagram: a client and a cluster of nodes labeled with partitions (p1, p3, p6); p1 appears on three nodes]
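
One common way to get this property is a hash ring with replication: every node can compute which nodes own a key, so no router is needed, and each key lives on several nodes. The sketch below assumes MD5-derived tokens, one token per node, and a replication factor of 3. The Ring class, the node names, and the `down` parameter are hypothetical and only illustrate the idea.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (assumption: MD5-based tokens)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # One token per node, sorted so the ring can be walked clockwise.
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, key, down=()):
        """Return the live nodes that hold `key`, in ring order."""
        # First node whose token is at or after the key's token.
        start = bisect.bisect_right(self.tokens, (token(key), ""))
        count = min(self.rf, len(self.tokens))
        owners = [
            self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)
        ]
        return [node for node in owners if node not in down]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"])
owners = ring.replicas("jbellis")
print(owners)
# With one replica down, the other two can still serve the key.
print(ring.replicas("jbellis", down={owners[0]}))
```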
Multiple datacenters




How does it scale?

Scaling antipatterns
  • Metadata servers
  • Router bottlenecks
  • Overloading existing nodes when adding capacity




How flexible is it?

Data model: Realtime

LiveStocks
    stock   last
    GOOG    $95.52
    AAPL    $186.10
    AMZN    $112.98

Portfolios
    user      stock   shares
    jbellis   GOOG    80
    jbellis   LNKD    20
    yukim     AMZN    100

StockHist
    stock   date         price
    GOOG    2011-01-01   $8.23
    GOOG    2011-01-02   $6.14
    GOOG    2011-01-03   $7.78

Data model: Analytics
HistLoss
    portfolio    worst_date   loss
    Portfolio1   2011-07-23   -$34.81
    Portfolio2   2011-03-11   -$11432.24
    Portfolio3   2011-05-21   -$1476.93

Data model: Analytics
10dayreturns
    stock   rdate        return
    GOOG    2011-07-25   $8.23
    GOOG    2011-07-24   $6.14
    GOOG    2011-07-23   $7.78
    AAPL    2011-07-25   $15.32
    AAPL    2011-07-24   $12.68


     INSERT OVERWRITE TABLE 10dayreturns
     SELECT a.stock,
            b.date as rdate,
            b.price - a.price
     FROM StockHist a
     JOIN StockHist b
     ON (a.stock = b.stock
         AND date_add(a.date, 10) = b.date);
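
For readers less familiar with the self-join above, the same computation can be sketched in plain Python: pair each price with the price of the same stock ten days later and take the difference. The function name ten_day_returns and the sample rows are hypothetical, and prices are whole cents to keep the arithmetic exact.

```python
from datetime import date, timedelta


def ten_day_returns(stock_hist):
    """stock_hist: iterable of (stock, date, price) rows.

    Returns sorted (stock, rdate, return) rows, where rdate is the later
    date and return is price(rdate) - price(rdate - 10 days).
    """
    prices = {(stock, day): price for stock, day, price in stock_hist}
    rows = []
    for (stock, day), price in prices.items():
        rdate = day + timedelta(days=10)
        later = prices.get((stock, rdate))
        if later is not None:  # inner join: skip dates with no match
            rows.append((stock, rdate, later - price))
    return sorted(rows)


# Hypothetical sample data, prices in cents.
history = [
    ("GOOG", date(2011, 1, 1), 100),
    ("GOOG", date(2011, 1, 2), 50),
    ("GOOG", date(2011, 1, 11), 108),
]
print(ten_day_returns(history))
```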

Data model: Analytics
portfolio_returns
    portfolio    rdate        preturn
    Portfolio1   2011-07-25   $118.21
    Portfolio1   2011-07-24   $60.78
    Portfolio1   2011-07-23   -$34.81
    Portfolio2   2011-07-25   $2143.92
    Portfolio3   2011-07-24   -$10.19


       INSERT OVERWRITE TABLE portfolio_returns
       SELECT portfolio,
              rdate,
              SUM(b.return)
       FROM portfolios a JOIN 10dayreturns b
       ON (a.stock = b.stock)
       GROUP BY portfolio, rdate;

Data model: Analytics
HistLoss
    portfolio    worst_date   loss
    Portfolio1   2011-07-23   -$34.81
    Portfolio2   2011-03-11   -$11432.24
    Portfolio3   2011-05-21   -$1476.93



    INSERT OVERWRITE TABLE HistLoss
    SELECT a.portfolio, rdate, minp
    FROM (
      SELECT portfolio, min(preturn) as minp
      FROM portfolio_returns
      GROUP BY portfolio
    ) a
    JOIN portfolio_returns b
    ON (a.portfolio = b.portfolio and a.minp = b.preturn);
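
The query above finds each portfolio's minimum return and joins back to recover the date on which it occurred. A rough Python equivalent is sketched below; hist_loss is a made-up name, and unlike the join, which returns every date that ties for the minimum, this version keeps only the first one it sees.

```python
from decimal import Decimal


def hist_loss(portfolio_returns):
    """portfolio_returns: iterable of (portfolio, rdate, preturn) rows.

    Returns one (portfolio, worst_date, loss) row per portfolio.
    """
    worst = {}
    for portfolio, rdate, preturn in portfolio_returns:
        # Keep the row with the smallest return seen so far.
        if portfolio not in worst or preturn < worst[portfolio][1]:
            worst[portfolio] = (rdate, preturn)
    return [(p, d, r) for p, (d, r) in sorted(worst.items())]


# Portfolio1 rows taken from the portfolio_returns table on the earlier slide.
returns = [
    ("Portfolio1", "2011-07-25", Decimal("118.21")),
    ("Portfolio1", "2011-07-24", Decimal("60.78")),
    ("Portfolio1", "2011-07-23", Decimal("-34.81")),
]
print(hist_loss(returns))
```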

Some Cassandra users




Questions?

Image credits
•    http://www.flickr.com/photos/26817893@N05/2573006312/
•    http://www.flickr.com/photos/rowanbank/7686239548
•    http://www.flickr.com/photos/mervtheswerve/6081933265
•    http://www.flickr.com/photos/dg_pics/2526208830
•    http://www.flickr.com/photos/wainwright/351684037
•    http://www.flickr.com/photos/mikeneilson/1606662529
•    http://www.flickr.com/photos/sbisson/3852905534
•    http://www.flickr.com/photos/breadnbadger/2674928517
