Increasing Your Prospects: Cassandra in Online Advertising
                                                                          Let 'em know: #cassandra12




© 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential
A little about what we do




Impressions look like…




A High Level look at RTB




  1. Browsers visit Publishers and create impressions.
  2. Publishers sell impressions via Exchanges.
  3. Exchanges serve as auction houses for the impressions.
  4. M6d bids on an impression. If we win, we display an ad.

Key Cassandra features
  • Horizontal scalability
      ● More nodes, more storage
      ● More nodes, more throughput
  • Cassandra is a high availability solution
      ● Almost all changes can be made at run time
      ● Rolling updates
      ● Survives node failures
  • One configuration file




Key storage model features
  • Type validation gives us creature comforts
      ● Helps prevent insertion of bad data
          – Columns named 'age' should be a number
      ● Makes data easier to read and write for end users
      ● Encourages/enforces storage in terse format
          – Store 478 as 478 not "478"
  • Rows do not need to have fixed columns
  • Writes do not read
  • Optimal for set/get/slice operations
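
The type-validation idea above can be sketched in a few lines. This is only an illustration: the `VALIDATORS` table and the `validate` and `insert` helpers are hypothetical names, not a Cassandra API, and a plain dict stands in for a row.

```python
# Hypothetical per-column validators; illustrative only, not a Cassandra API.
VALIDATORS = {"age": int, "name": str}


def validate(column, value):
    """Return the value if it matches the declared type, else raise TypeError."""
    expected = VALIDATORS.get(column)
    if expected is not None and not isinstance(value, expected):
        raise TypeError(
            f"column {column!r} expects {expected.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


def insert(row, column, value):
    """Store a validated value in a dict-based row; columns are not fixed."""
    row[column] = validate(column, value)
    return row


row = insert({}, "age", 478)          # stored as the number 478
row = insert(row, "nickname", "ed")   # undeclared columns are accepted as-is
```

Inserting the string "478" into 'age' raises a TypeError in this sketch, which is the "prevent bad data" comfort the slide describes.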




Things I have learned on the presentation circuit
  • Gratuitous use of Meme Generator (tx Nathan)
  • Gratuitous buzzwords for maximum tweet-ability
      ● Big Data
      ● Real Time analytics
      ● Cloud
      ● Web scale
  • Make prolific statements that contradict current software trends (tx Dean)

  • Attempted Prolific Statement: Transactions and locking are highly overrated



Signal De-duplication and frequency capping
  • Solution must be "web-scale"
      ● billions of users
      ● one to thousands of events per user
  • Solution must record events
  • Do not store the same event N times a minute
      ● Control data growth
          – Spiders, nagios, pathological cases
          – Small statistical difference in signal
              ● An action 10 times a day vs 1 time a minute




What this would look like




'?' Solution with transactions and locking

  ● Likely need scalable redundant lock layer
      ● Built in locks are not free
  ● Lots of code
  ● Lots of sockets
  ● Likely need to read to write
      ● Results in more nodes or caching layer for disk io




Remember with Cassandra...
  • Rows have one to many columns
  • A column is composed of { name, value, timestamp }
      ● If two columns have the same name, the higher timestamp wins
  • Memtables absorb overwrites
  • Writes are fast
      ● Sorted structure in memory
      ● Commit log to disk
  • Log-structured storage prunes old values and deletes
  • No reads on write path
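
A minimal sketch of the "higher timestamp wins" rule above. The `Column` tuple and `reconcile` function are names made up for this illustration, and equal timestamps are not given any special treatment here.

```python
from collections import namedtuple

# Illustrative stand-in for a column: { name, value, timestamp }.
Column = namedtuple("Column", ["name", "value", "timestamp"])


def reconcile(columns):
    """Keep, for each column name, the version with the highest timestamp."""
    winners = {}
    for col in columns:
        current = winners.get(col.name)
        if current is None or col.timestamp > current.timestamp:
            winners[col.name] = col
    return winners


# Three versions of the same column, arriving out of order.
versions = [
    Column("last_seen", "page_a", 100),
    Column("last_seen", "page_b", 250),
    Column("last_seen", "page_c", 175),
]
row = reconcile(versions)
```

No version is read back before writing; the winner is decided only when versions are merged, which is why the write path needs no lock.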






                                                                               Cassandr'ified solution




Consistent Hashing distributes data

  ● RandomPartitioner: row keys are MD5-hashed to locate the node
      – Results in even distribution of rows across nodes
      – Limits/removes hot spots
  ● Big Data is not so big when you have N nodes attacking it
  * Wife asked me if diagram above was a flag. Pledge your allegiance to the United Nodes of Big Data
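
A rough sketch of how MD5 hashing can spread row keys over a ring of nodes. It is a simplification for illustration: the `Ring` class, the evenly spaced tokens, and the use of the full 128-bit hash space are assumptions, not Cassandra's actual partitioner code.

```python
import bisect
import hashlib


def token(row_key):
    """Hash a row key to an integer token using MD5."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring; each node owns the range ending at its token."""

    def __init__(self, nodes):
        # Evenly spaced tokens over the 128-bit space (assumption for the sketch).
        step = 2 ** 128 // len(nodes)
        self.tokens = [i * step for i in range(len(nodes))]
        self.nodes = list(nodes)

    def node_for(self, row_key):
        """Return the node with the first token >= the key's token, wrapping around."""
        idx = bisect.bisect_left(self.tokens, token(row_key))
        return self.nodes[idx % len(self.nodes)]


ring = Ring(["node1", "node2", "node3", "node4"])
counts = {}
for i in range(10000):
    owner = ring.node_for(f"user{i}")
    counts[owner] = counts.get(owner, 0) + 1
```

Because the hash output is effectively uniform, `counts` ends up roughly even across the four nodes, even though the keys themselves are sequential.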



Memtables absorb overwrites

  ● Memtables give de-duplication for free
      – A larger memtable has a larger chance of absorbing a write
  ● This solves our original requirement:
      – Do not store the same event N times per interval
  ● Worst case: data is written to disk N times and compacted away
  ● Automatically de-duplicate on read with last-update-wins rule
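
The overwrite absorption described above can be sketched with a toy buffer. The `Memtable` class, the row key "user42" and the column name "viewed:shoes" are all hypothetical; a real memtable is a sorted structure with a commit log behind it.

```python
class Memtable:
    """Toy in-memory write buffer keyed by (row key, column name)."""

    def __init__(self):
        self.cells = {}
        self.writes_seen = 0

    def write(self, row_key, column, value, timestamp):
        """Record a write; a newer write to the same cell replaces the older one."""
        self.writes_seen += 1
        existing = self.cells.get((row_key, column))
        if existing is None or timestamp >= existing[1]:
            self.cells[(row_key, column)] = (value, timestamp)

    def flush(self):
        """Return the cells that would go to disk, sorted by key, and reset."""
        flushed = sorted(self.cells.items())
        self.cells = {}
        return flushed


# The same event arrives once a second for a minute.
memtable = Memtable()
for second in range(60):
    memtable.write("user42", "viewed:shoes", "1", timestamp=second)
flushed = memtable.flush()
```

Sixty writes go in and a single cell comes out at flush time, without any read or lock on the write path.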
Cassandra & stream processing as an alternative to ETL
  ● ETL (Extract, Transform, Load) is a useful paradigm
  ● Batch process can be obtuse
      – Processes with long startup
      – Little support for appends, inserts, updates
      – Throughput issues for small files
  ● Difficult for small windows of time
  ● Overhead from MapReduce
  ● Sample scenario: breakdown of state, city, and count




City, State, count(1) in ETL system

  ● Several phases / copies
  ● Storing the entire log to build/rebuild aggregation
  ● Difficult to do on small intervals
  ● Needs scheduling, needs log push system




City, State, count(1) stream system

  ● Could use Cassandra's counter feature directly
  ● Added Apache Kafka layer
      ● Decouples producers and consumers
      ● Allows message replay
      ● Allows backlog and recovery from failures (never happens btw)
      ● Near real time
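
A small sketch of the streaming count, under loud assumptions: a Python list stands in for the Kafka topic, a `Counter` stands in for Cassandra counter columns, and `consume` is a made-up helper. No real Kafka or Cassandra client is used.

```python
from collections import Counter


def consume(log, counts, offset=0):
    """Apply events from `log` starting at `offset`; return the next offset."""
    for event in log[offset:]:
        counts[(event["state"], event["city"])] += 1
    return len(log)


# The list plays the role of an append-only topic.
log = [
    {"state": "NY", "city": "New York"},
    {"state": "NY", "city": "Albany"},
    {"state": "NY", "city": "New York"},
]
counts = Counter()
offset = consume(log, counts)

# A new event arrives; only the unread tail is processed.
log.append({"state": "CA", "city": "Los Angeles"})
offset = consume(log, counts, offset)

# Replay: rebuilding from offset 0 gives the same totals.
rebuilt = Counter()
consume(log, rebuilt)
```

Counts are updated as each event arrives rather than on a batch schedule, and keeping the log around is what makes the replay possible.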



An application to search logs
  ● In 2008 this article sold me on MapReduce
  ● Take logs from all servers
  ● Put them into Hadoop
  ● Generate Lucene indexes
  ● Load into sharded SOLR cluster on interval




Pseudo diagram of solution

  ● Process to get files from servers into Hadoop
  ● MapReduce process to build indexes
  ● Embedded SOLR on Hadoop Datanodes




* Go here for real story: http://www.slideshare.net/schubertzhang/case-study-how-rackspace-query-terabytes-of-data-2400928




But now it's the future!
  ● Every component or layer of an architecture is another thing to document and manage
  ● DataStax has built SOLR into Cassandra
  ● Applications can write to solr/cassandra directly
  ● Applications can read solr/cassandra directly




Ah ha! moment

  ● Determined the Rackspace log application could be done with simple pieces
  ● Someone called it Taco Bell Programming
    'The more I write code and design systems, the more I understand that many
    times, you can achieve the desired functionality simply with clever
    reconfigurations of the basic Unix tool set. After all, functionality is an
    asset, but code is a liability.'
  ● Cassandra is my main taco ingredient




Prolific statement: Design stuff with fewer arrows
  ● More layers/components
  ● Batch driven

  ● Fewer layers/components
  ● Low latency




Solr has wide adoption
  ● Clients for many programming languages
  ● Many hip jQuery Ajax widgets and stuff
  ● Open source Reuters Ajax Solr demo worked seamlessly with cassandra/solr
  ● Implemented a Rackspace-like solution with little code




Game Changer: Compression

  Operation                              Time (ns)     Time (ms)   Notes
  Main memory reference                  100                       20x L2 cache, 200x L1 cache
  Compress 1K bytes with Zippy           3,000
  Send 1K bytes over 1 Gbps network      10,000        0.01
  Read 4K randomly from SSD*             150,000       0.15
  Read 1 MB sequentially from memory     250,000       0.25
  Round trip within same datacenter      500,000       0.5
  Read 1 MB sequentially from SSD*       1,000,000     1           4X memory
  Disk seek                              10,000,000    10          20x datacenter roundtrip
  Read 1 MB sequentially from disk       20,000,000    20          80x memory, 20X SSD




  Source: https://gist.github.com/2841832
Why compression helps
  ● Compressed data is smaller on disk
  ● If we compress data, more of it fits in RAM and is cached

  ● Rotational disks:
      ● Rotational disks have very slow seeks
      ● RAM not used by processes will cache disk

  ● Solid State Disks do seek faster than rotational
      ● But they are more expensive than rotational




Enabling Compression
  ● Rolling update to Cassandra
  ● update column family my_stuff with compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
  ● bin/nodetool -h cdbla120 -p 8585 rebuildsstables my_stuff

  ● 68 GB of data shrinks to 36 GB
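
Back-of-the-envelope arithmetic for the 68 GB to 36 GB result. The 20 ms per MB sequential disk read is the figure from the latency numbers quoted earlier in the deck; treating 1 GB as 1000 MB and ignoring decompression cost and caching are simplifying assumptions, and `scan_seconds` is a made-up helper.

```python
before_gb = 68
after_gb = 36

# Fraction of the original size that remains, and the space saved.
ratio = after_gb / before_gb
saved_pct = (1 - ratio) * 100

# Sequential disk read cost from the latency numbers: 20 ms per MB.
DISK_MS_PER_MB = 20


def scan_seconds(size_gb):
    """Seconds to read `size_gb` sequentially from disk (1 GB taken as 1000 MB)."""
    return size_gb * 1000 * DISK_MS_PER_MB / 1000


print(f"compressed size is {ratio:.2f} of original, {saved_pct:.0f}% saved")
print(f"full scan: {scan_seconds(before_gb):.0f} s before, "
      f"{scan_seconds(after_gb):.0f} s after")
```

Under these assumptions the data shrinks to a little over half its size, and a full sequential scan drops from 1360 seconds to 720 seconds.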


Compression in action
  ● Disk activity reduced drastically as more/all data fit in cache

  ● Better performance
  ● Disks that spin less should last longer




Compression lessons
  ● Creates extra CPU usage (but not really much)
  ● Creates more young gen garbage (some)
  ● Anecdotal experimentation with chunk_length_kb
      ● 64KB is good for sparse, less frequently used tables
      ● 16KB had same compression ratio and made less garbage
      ● Found 4KB to be less effective than 16KB
  ● This is easy to experiment with
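
A way to get a feel for the chunk-size trade-off offline. Important assumptions: `zlib` from the standard library stands in for Snappy, the sample data is synthetic, and the helper names are invented, so the ratios will not match what Cassandra reports for real tables.

```python
import zlib


def compress_in_chunks(data, chunk_kb):
    """Compress `data` as independent chunks of `chunk_kb` kilobytes each."""
    size = chunk_kb * 1024
    return [zlib.compress(data[i:i + size]) for i in range(0, len(data), size)]


def decompress(chunks):
    """Reassemble the original bytes from independently compressed chunks."""
    return b"".join(zlib.decompress(c) for c in chunks)


def ratio(data, chunk_kb):
    """Compressed size divided by original size for a given chunk length."""
    compressed = compress_in_chunks(data, chunk_kb)
    return sum(len(c) for c in compressed) / len(data)


# Synthetic, repetitive event log (an assumption; real data will differ).
sample = b"".join(b"user%06d,viewed,shoes\n" % i for i in range(50000))

for chunk_kb in (4, 16, 64):
    print(chunk_kb, "KB chunks ->", round(ratio(sample, chunk_kb), 3))
```

Smaller chunks mean less data to decompress per read but fewer repeated patterns per chunk, which is the tension behind the 4KB, 16KB and 64KB observations.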




We have reached the point of the presentation where we...




Hate on everything not Cassandra




Cassandra's uptime story
  ● Main cluster in continuous operation since 8/6/11
  ● Doubled physical nodes in the cluster
  ● Upgraded Cassandra twice: 0.7.7 -> 0.8.6 -> 1.0.7
  ● Rolling reboots for a kernel update, 1 for leap second
  ● No maintenance windows
  ● Let's compare Cassandra with other things I use/used




Cassandra vs MySQL master/slave...

                 MySQL                                      Cassandra
  Replication    Single thread, binlogs, manual recovery    Per operation
  Scaling        Add more nodes, initial sync, setup        Bootstrap new Cassandra node,
                 replication, configure applications        re-balance off-peak
  Consistency    Applications that care read master, or     Per operation
                 application check status of replication
  Backup         Mysqldump/LVM snapshot                     Sstabletojson | snapshot
  Restore        Re-insert everything/Restore snapshot      Copy files into place
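
"Per operation" consistency in the Cassandra column means each read or write picks how many replicas must respond. A minimal sketch of the usual overlap rule (reads see the latest write when R + W > N); the function names are invented for this illustration and only three of the consistency levels are modelled.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# With three replicas, each operation chooses its own trade-off.
strong = read_sees_latest_write("QUORUM", "QUORUM", 3)
fast = read_sees_latest_write("ONE", "ONE", 3)
```

Here `strong` is True and `fast` is False: the application decides per call whether it wants the overlap guarantee or the lower latency, instead of routing "reads that care" to a master.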




So with MySQL...
  ● Replication breaking often
      ● requiring manual intervention for many fixes
  ● Blocking writes for 30 minutes to add a column to a table
  ● Scale up to big iron then...
      ● Restart takes 30 minutes to fsck all disks
  ● Applications needing to be coded with state-aware logic
      ● Which node should I query?
      ● Is replication behind?
      ● Is there some merge table trickery going on?




Cassandra vs Memcache

                 Memcache                 Cassandra
  Replication    None (client managed)    Per operation
  Scaling        None (client managed)    Grow or shrink without bad reads
  Consistency    Yes (and really no)      Per operation
  Backup         No persistence           sstabletojson|snapshot
  Restore        No persistence           Cache warming




So memcache is...
  ● Not persistent
  ● Not clear on sharding
  ● Not clear on failure modes
  ● Actual experiences with memcache
      ● Memcache client was not sharding requests evenly. 60% were going to node 1.
      ● We lost a rack with 40% of the memcache nodes
          – Site slowed to a crawl as DBs were overloaded
          – Took 1 hour to warm up again




Cassandra vs DRBD

                 DRBD                                   Cassandra
  Replication    1 or 2 nodes per block                 Per operation
  Scaling        No scaling. Just more availability.    Grow or shrink dynamically
  Consistency    Sync modes change failure              Per operation
                 consistency, deadtime between
                 flip-flops
  Backup         Like a disk                            sstabletojson|snapshot
  Restore        Like a disk                            Like a disk




So DRBD is...
  ● A 30 second to 1 minute fail over/outage
  ● An alert that might wake you up
      ● but hopefully allows you to sleep again
  ● Handcuffed to linux-ha/keepalived etc
      ● Making it an involved setup
      ● Making it involved to troubleshoot
  ● Might need a crossover cable or dedicated network
  ● cpu/network intensive with very active disks
  ● Can successfully fail over a data file in an inconsistent state




Cassandra vs HDFS

                 Hadoop                                Cassandra
  Replication    Per file                              Per operation
  Scaling        Add nodes                             Add nodes
  Consistency    Very, to the point that getting       Per operation
                 data in becomes difficult
  Backup         Distcp                                sstabletojson|snapshot
  Restore        Distcp                                Like a disk




So HDFS...
  ● Comes up with about 4 or 5 reasons a year for master node/full cluster restart
      ● Grow NameNode heap
      ● Enable jobtracker setting to stop 100,000 task jobs
      ● Enable/update trash feature (off by default)
      ● Forced to do a fail over by hardware fault
      ● Random DRBD/Kernel brain fart
      ● Need to update a JVM/kernel eventually
  ● Now finally new versions have HA NameNode
  ● Running jobs lose progress and will not automatically restart




Questions?




© 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential

More Related Content

PPTX
Applying Boyd's OODA Loop Strategy to Drive IT Security Decision and Action
PPT
Introduction to cassandra
PDF
Meltwater Buzz Service Overview
PPTX
마이클 수업 과제2 1
PDF
33 deputados norte-americanos criticam o golpe
PDF
Practical eCommerce with WooCommerce
PDF
Kimpton Hotels Mid-Atlantic Region Brochure
KEY
Globalisation starter
Applying Boyd's OODA Loop Strategy to Drive IT Security Decision and Action
Introduction to cassandra
Meltwater Buzz Service Overview
마이클 수업 과제2 1
33 deputados norte-americanos criticam o golpe
Practical eCommerce with WooCommerce
Kimpton Hotels Mid-Atlantic Region Brochure
Globalisation starter

Viewers also liked (17)

PDF
Parking
PPTX
Working progress preliminary task
PPTX
The civil war, lincoln, lee
PDF
Slides open stack emily_updated_2
PPTX
Kanji from the Start - Unit 1 p12 spelling test
PDF
General Quiz (Finals) | Elixir '12
PPTX
PDF
長野市放課後子ども総合プラン有料化の方針
PPT
PDF
Exposicion redes sociales, buscadores, correos y paginas
PDF
PDF
Offshore Operations Maintenance[2]
PDF
A guide to selling and buying a business 1.0
PDF
Panorama economy 12 aprile 2012
PPT
Новогодний счастливый купон
PDF
Kuronen: Oppilas- ja opiskelijahuolto osaksi lasten ja nuorten hyvinvointisuu...
PDF
Alive Day - series of features
Parking
Working progress preliminary task
The civil war, lincoln, lee
Slides open stack emily_updated_2
Kanji from the Start - Unit 1 p12 spelling test
General Quiz (Finals) | Elixir '12
長野市放課後子ども総合プラン有料化の方針
Exposicion redes sociales, buscadores, correos y paginas
Offshore Operations Maintenance[2]
A guide to selling and buying a business 1.0
Panorama economy 12 aprile 2012
Новогодний счастливый купон
Kuronen: Oppilas- ja opiskelijahuolto osaksi lasten ja nuorten hyvinvointisuu...
Alive Day - series of features
Ad

Similar to M6d cassandra summit (20)

PDF
Achieving genuine elastic multitenancy with the Waratek Cloud VM for Java : J...
PPTX
Cloud as a Flexible & Collaborative Tool for Creators
PPTX
Data distribution in the cloud with Node.js
PDF
Master agile development and testing
PDF
Mobile Development Meets Semantic Technology
PDF
LSM 2011 AdaLabs presentation slides: How to make my business opensource & vi...
PDF
Santo Leto - MySQL Connect 2012 - Getting Started with Mysql Cluster
PDF
Integrating Big Data Technologies
PDF
“Startup - it’s not just an IT project” - a random sampling of problems we’ve...
PDF
Dynomite - PerconaLive 2017
PPTX
MongoDB at NoSQL Now! 2012: Benefits and Challenges of Using MongoDB in the E...
PDF
2011-05-22 Domain Driven Design
PDF
2011-05-22 Domain Driven Design
PPTX
Webinar: Designing a Storage Consolidation Strategy for Today, the Future and...
PPTX
Feedback on DDD Europe - short -event storming.pptx
KEY
Writing GREAT Agile User Stories
PDF
Dutch entrepreneurs visiting twago in Berlin
PPTX
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12
PPTX
Enabling Edge-Cloud Duality of Time Series Data
PDF
Data Patterns
Achieving genuine elastic multitenancy with the Waratek Cloud VM for Java : J...
Cloud as a Flexible & Collaborative Tool for Creators
Data distribution in the cloud with Node.js
Master agile development and testing
Mobile Development Meets Semantic Technology
LSM 2011 AdaLabs presentation slides: How to make my business opensource & vi...
Santo Leto - MySQL Connect 2012 - Getting Started with Mysql Cluster
Integrating Big Data Technologies
“Startup - it’s not just an IT project” - a random sampling of problems we’ve...
Dynomite - PerconaLive 2017
MongoDB at NoSQL Now! 2012: Benefits and Challenges of Using MongoDB in the E...
2011-05-22 Domain Driven Design
2011-05-22 Domain Driven Design
Webinar: Designing a Storage Consolidation Strategy for Today, the Future and...
Feedback on DDD Europe - short -event storming.pptx
Writing GREAT Agile User Stories
Dutch entrepreneurs visiting twago in Berlin
Webinar- Simple and Cost-Effective Disaster Recovery in the Cloud - 7-19-12
Enabling Edge-Cloud Duality of Time Series Data
Data Patterns
Ad

More from Edward Capriolo (16)

PPT
Nibiru: Building your own NoSQL store
ODP
Web-scale data processing: practical approaches for low-latency and batch
ODP
Big data nyu
PPT
Cassandra4hadoop
ODP
Intravert Server side processing for Cassandra
ODP
Apache Kafka Demo
ODP
Cassandra NoSQL Lan party
PPTX
M6d cassandrapresentation
PPT
Breaking first-normal form with Hive
ODP
Casbase presentation
PPT
Hadoop Monitoring best Practices
PPT
Whirlwind tour of Hadoop and HIve
ODP
Cli deep dive
ODP
Cassandra as Memcache
PPT
Counters for real-time statistics
PPT
Real world capacity
Nibiru: Building your own NoSQL store
Web-scale data processing: practical approaches for low-latency and batch
Big data nyu
Cassandra4hadoop
Intravert Server side processing for Cassandra
Apache Kafka Demo
Cassandra NoSQL Lan party
M6d cassandrapresentation
Breaking first-normal form with Hive
Casbase presentation
Hadoop Monitoring best Practices
Whirlwind tour of Hadoop and HIve
Cli deep dive
Cassandra as Memcache
Counters for real-time statistics
Real world capacity


M6d cassandra summit

  • 1. Increasing Your Prospects: Cassandra in Online Advertising Let 'em know: #cassandra12 © 2012 Media6Degrees. All Rights Reserved. Proprietary and Confidential
  • 2. A little about what we do
  • 3. Impressions look like…
  • 4. A High Level look at RTB
      1. Browsers visit Publishers and create impressions.
      2. Publishers sell impressions via Exchanges.
      3. Exchanges serve as auction houses for the impressions.
      4. M6d bids on impressions. If we win, we display an ad.
  • 5. Key Cassandra features
      • Horizontal scalability
          ● More nodes, more storage
          ● More nodes, more throughput
      • Cassandra is a high availability solution
          ● Almost all changes can be made at run time
          ● Rolling updates
          ● Survives node failures
      • One configuration file
  • 6. Key storage model features
      • Type validation gives us creature comforts
          ● Helps prevent insertion of bad data
              – Columns named 'age' should be a number
          ● Makes data easier to read and write for end users
          ● Encourages/enforces storage in a terse format
              – Store 478 as 478 not “478”
      • Rows do not need to have fixed columns
      • Writes do not read
      • Optimal for set/get/slice operations
  • 7. Things I have learned on the presentation circuit
      • Gratuitous use of Meme Generator (tx Nathan)
      • Gratuitous buzzwords for maximum tweet-ability
          ● Big Data
          ● Real Time analytics
          ● Cloud
          ● Web scale
      • Make prolific statements that contradict current software trends (tx Dean)
      • Attempted Prolific Statement: Transactions and locking are highly overrated
  • 8. Signal De-duplication and frequency capping
      • Solution must be “web-scale”
          ● Billions of users
          ● One to thousands of events per user
      • Solution must record events
      • Do not store the same event N times a minute
          ● Control data growth
              – Spiders, nagios, pathological cases
              – Small statistical difference in signal
          ● An action 10 times a day vs 1 time a minute
  • 9. What this would look like
  • 10. '?' Solution with transactions and locking
      ● Likely need a scalable, redundant lock layer
      ● Built-in locks are not free
      ● Lots of code
      ● Lots of sockets
      ● Likely need to read to write
      ● Results in more nodes or a caching layer for disk I/O
  • 11. Remember with Cassandra...
      • Rows have one to many columns
      • A column is composed of { name, value, timestamp }
          ● If two columns have the same name, the greater timestamp wins
      • Memtables absorb overwrites
      • Writes are fast
          ● Sorted structure in memory
          ● Commit log to disk
      • Log-structured storage prunes old values and deletes
      • No reads on write path
  • 12. Cassandr'ified solution
  • 13. Consistent Hashing distributes data
      ● With the Random Partitioner, row keys are hashed with MD5 to locate the node
          – Results in even distribution of rows across nodes
          – Limits/removes hot spots
      ● Big Data is not so big when you have N nodes attacking it
      * Wife asked me if diagram above was a flag. Pledge your allegiance to the United Nodes of Big Data
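The key-to-node lookup described on this slide can be sketched roughly as follows. This is a minimal illustration, not Cassandra's code: the `Ring` class, the `token` helper, the ring size, and the evenly spaced node tokens are all assumptions made for the example.

```python
import bisect
import hashlib

# Assumed token range for illustration; each node owns one arc of this ring.
RING_SIZE = 2 ** 127


def token(key: str) -> int:
    """Hash a row key with MD5 and map the digest onto the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    """Hypothetical ring with one evenly spaced token per node."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        step = RING_SIZE // len(self.nodes)
        # Sorted list of tokens; node i sits at token i * step.
        self.tokens = [i * step for i in range(len(self.nodes))]

    def node_for(self, key: str) -> str:
        # Pick the first node whose token is >= the key's token,
        # wrapping around to the first node at the end of the ring.
        idx = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.nodes[idx]
```

Because MD5 output is close to uniform, keys such as `user1`, `user2`, ... spread roughly evenly over the nodes even when the keys themselves are sequential, which is the "limits hot spots" point above.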
  • 14. Memtables absorb overwrites
      ● Memtables give de-duplication for free
          – A large memtable has a larger chance of absorbing a write
      ● This solves our original requirement:
          – Do not store the same event N times per interval
      ● Worst case, data is written to disk N times and compacted away
      ● Automatically de-duplicate on read with the last-update-wins rule
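A toy model may make the de-duplication argument concrete. The sketch below is an assumption-laden illustration, not Cassandra's implementation: `Memtable`, `flush`, and `merge_on_read` are hypothetical names, and the in-memory dictionary lookup stands in for the real sorted structure (no disk read happens on the write path).

```python
class Memtable:
    """Toy in-memory table: (row_key, column) -> (value, timestamp)."""

    def __init__(self):
        self.cells = {}
        self.writes_seen = 0

    def write(self, row_key, column, value, timestamp):
        # Repeated writes to the same column overwrite each other in memory,
        # so N identical events cost one cell at flush time.
        self.writes_seen += 1
        key = (row_key, column)
        current = self.cells.get(key)
        if current is None or timestamp >= current[1]:
            self.cells[key] = (value, timestamp)

    def flush(self):
        """Return the sorted cells that would be written out as one file."""
        return sorted((k[0], k[1], v, ts) for k, (v, ts) in self.cells.items())


def merge_on_read(*sstables):
    """Resolve duplicates across flushed files: the greater timestamp wins."""
    latest = {}
    for sstable in sstables:
        for row_key, column, value, ts in sstable:
            key = (row_key, column)
            if key not in latest or ts > latest[key][1]:
                latest[key] = (value, ts)
    return latest
```

Writing the same event five times into one `Memtable` yields a single cell on `flush()`. If the event straddles two flushes (the worst case on the slide), `merge_on_read` still returns one value per column.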
  • 15. Cassandra & stream processing as an alternative to ETL
      ● ETL (Extract, Transform, Load) is a useful paradigm
      ● Batch process can be obtuse
          – Processes with long startup
          – Little support for appends, inserts, updates
          – Throughput issues for small files
      ● Difficult for small windows of time
      ● Overhead from MapReduce
      ● Sample scenario: breakdown of state, city, and count
  • 16. City, State, count(1) in ETL system
      ● Several phases / copies
      ● Storing the entire log to build/rebuild aggregation
      ● Difficult to do on small intervals
      ● Needs scheduling, needs log push system
  • 17. City, State, count(1) stream system
      ● Could use Cassandra's counter feature directly
      ● Added Apache Kafka layer
      ● Decouples producers and consumers
      ● Allows message replay
      ● Allows backlog and recovery from failures (never happens btw)
      ● Near real time
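As a rough sketch of the streaming version of `city, state, count(1)`: the consumer below increments a counter per event as it arrives instead of re-scanning a stored log. Everything here is hypothetical. A plain Python iterable stands in for the Kafka topic, a `collections.Counter` stands in for Cassandra counter columns, and the event field names `state` and `city` are assumed.

```python
from collections import Counter


def consume(events, counters=None):
    """Apply one increment per event; safe to call again on a later batch."""
    counters = Counter() if counters is None else counters
    for event in events:
        # In a real deployment this would be a counter-column increment
        # keyed by (state, city); here it is an in-memory stand-in.
        counters[(event["state"], event["city"])] += 1
    return counters


def top_cities(counters, n=3):
    """Read side: the aggregate is already maintained, so this is a lookup."""
    return counters.most_common(n)
```

Passing the returned counters back into `consume` with the next batch mimics working through a backlog after an outage: the totals pick up where they left off.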
  • 18. An application to search logs
      ● In 2008 this article sold me on MapReduce
      ● Take logs from all servers
      ● Put them into Hadoop
      ● Generate Lucene indexes
      ● Load into sharded SOLR cluster on interval
  • 19. Pseudo diagram of solution
      ● Process to get files from servers into Hadoop
      ● MapReduce process to build indexes
      ● Embedded SOLR on Hadoop Datanodes
      * Go here for real story: http://www.slideshare.net/schubertzhang/case-study-how-rackspace-query-terabytes-of-data-2400928
  • 20. But now it's the future!
      ● Every component or layer of an architecture is another thing to document and manage
      ● DataStax has built SOLR into Cassandra
      ● Applications can write to solr/cassandra directly
      ● Applications can read solr/cassandra directly
  • 21. Ah ha! moment
      ● Determined the Rackspace log application could be done with simple pieces
      ● Someone called it Taco Bell Programming: 'The more I write code and design systems, the more I understand that many times, you can achieve the desired functionality simply with clever reconfigurations of the basic Unix tool set. After all, functionality is an asset, but code is a liability.'
      ● Cassandra is my main taco ingredient
  • 22. Prolific statement: Design stuff with less arrows ● More layers/components ● Batch driven ● Less layers/components ● Low latency
  • 23. Solr has wide adoption
      ● Clients for many programming languages
      ● Many hip JQuery Ajax widgets and stuff
      ● Open source Reuters Ajax Solr demo worked seamlessly with cassandra/solr
      ● Implemented a Rackspace-like solution with a small amount of code
  • 24. Game Changer: Compression
      Operation                              Latency (ns)   Latency (ms)   Notes
      Main memory reference                           100                  20x L2 cache, 200x L1 cache
      Compress 1K bytes with Zippy                  3,000
      Send 1K bytes over 1 Gbps network            10,000           0.01
      Read 4K randomly from SSD*                  150,000           0.15
      Read 1 MB sequentially from memory          250,000           0.25
      Round trip within same datacenter           500,000           0.5
      Read 1 MB sequentially from SSD*          1,000,000           1      4x memory
      Disk seek                                10,000,000          10      20x datacenter round trip
      Read 1 MB sequentially from disk         20,000,000          20      80x memory, 20x SSD
      Source: https://gist.github.com/2841832
  • 25. Why compression helps
      ● Compressed data is smaller on disk
      ● If we compress data, more of it fits in RAM and is cached
      ● Rotational disks:
          – Rotational disks have very slow seeks
          – RAM not used by processes will cache disk
      ● Solid state disks do seek faster than rotational disks
      ● But they are more expensive than rotational disks
  • 26. Enabling Compression
      ● Rolling update to Cassandra
      ● update column family my_stuff with compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
      ● bin/nodetool -h cdbla120 -p 8585 rebuildsstables my_stuff
      ● 68 GB of data shrinks to 36 GB
  • 27. Compression in action
      ● Disk activity reduced drastically as more/all data fit in cache
      ● Better performance
      ● Disks that spin less should last longer
  • 28. Compression lessons
      ● Creates extra CPU usage (but not really much)
      ● Creates more young gen garbage (some)
      ● Anecdotal experimentation with chunk_length_kb
          – 64KB is good for sparse, less frequently used tables
          – 16KB had the same compression ratio and made less garbage
          – Found 4KB to be less effective than 16KB
      ● This is easy to experiment with
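One way to get a feel for the chunk-size trade-off offline is to compress the same data in independent chunks of different sizes and compare the totals. The sketch below is only an approximation of the experiment on the slide: it uses `zlib` from the Python standard library as a stand-in for Snappy (so absolute ratios will differ), and `sample_log`, `compressed_size`, and `ratio_by_chunk` are hypothetical helpers operating on made-up log lines.

```python
import zlib


def sample_log(lines: int = 5000) -> bytes:
    """Build repetitive, log-like bytes to compress (made-up content)."""
    line = b"2012-08-08 GET /pixel?user=12345&city=NYC&state=NY\n"
    return line * lines


def compressed_size(data: bytes, chunk_kb: int) -> int:
    """Compress each chunk independently and sum the compressed sizes."""
    chunk = chunk_kb * 1024
    return sum(
        len(zlib.compress(data[i:i + chunk]))
        for i in range(0, len(data), chunk)
    )


def ratio_by_chunk(data: bytes, chunk_sizes=(4, 16, 64)) -> dict:
    """Map chunk size in KB to compressed/original size ratio."""
    return {kb: compressed_size(data, kb) / len(data) for kb in chunk_sizes}
```

Calling `ratio_by_chunk(sample_log())` gives one ratio per chunk size. Smaller chunks mean less data to decompress per read but more per-chunk overhead, which is the trade-off the 4KB/16KB/64KB observations above point at. Real table data is the only meaningful input for choosing a value.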
  • 29. We have reached the point of the presentation where we...
  • 30. Hate on everything not Cassandra
  • 31. Cassandra's uptime story
      ● Main cluster in continuous operation since 8/6/11
      ● Doubled physical nodes in the cluster
      ● Upgraded Cassandra twice: 0.7.7 -> 0.8.6 -> 1.0.7
      ● Rolling reboots for kernel updates, one for the leap second
      ● No maintenance windows
      ● Let's compare Cassandra with other things I use/used
  • 32. Cassandra vs MySQL master/slave...
      ● Replication
          – MySQL: Single thread, binlogs, manual recovery
          – Cassandra: Per operation
      ● Scaling
          – MySQL: Add more nodes, initial sync, set up replication, configure applications
          – Cassandra: Bootstrap new Cassandra node, re-balance off-peak
      ● Consistency
          – MySQL: Applications that care read the master, or the application checks the status of replication
          – Cassandra: Per operation
      ● Backup
          – MySQL: mysqldump / LVM snapshot
          – Cassandra: sstabletojson | snapshot
      ● Restore
          – MySQL: Re-insert everything / restore snapshot
          – Cassandra: Copy files into place
  • 33. So with mysql...
      ● Replication breaking often
          – Requiring manual intervention for many fixes
      ● Blocking writes for 30 minutes to add a column to a table
      ● Scale up to big iron then...
          – Restart takes 30 minutes to fsck all disks
      ● Applications needing to be coded with state-aware logic
          – Which node should I query?
          – Is replication behind?
          – Is there some merge table trickery going on?
  • 34. Cassandra vs Memcache
      ● Replication
          – Memcache: None (client managed)
          – Cassandra: Per operation
      ● Scaling
          – Memcache: None (client managed)
          – Cassandra: Grow or shrink without bad reads
      ● Consistency
          – Memcache: Yes (and really no)
          – Cassandra: Per operation
      ● Backup
          – Memcache: No persistence
          – Cassandra: sstabletojson | snapshot
      ● Restore
          – Memcache: No persistence
          – Cassandra: Cache warming
  • 35. So memcache is...
      ● Not persistent
      ● Not clear on sharding
      ● Not clear on failure modes
      ● Actual experiences with memcache
          – Memcache client was not sharding requests evenly: 60% were going to node 1
          – We lost a rack with 40% of the memcache nodes
          – Site slowed to a crawl as the DBs were overloaded
          – Took 1 hour to warm up again
  • 36. Cassandra vs DRBD
      ● Replication
          – DRBD: 1 or 2 nodes per block
          – Cassandra: Per operation
      ● Scaling
          – DRBD: No scaling. Just more availability.
          – Cassandra: Grow or shrink dynamically
      ● Consistency
          – DRBD: Sync modes change failure consistency, dead time between flip-flops
          – Cassandra: Per operation
      ● Backup
          – DRBD: Like a disk
          – Cassandra: sstabletojson | snapshot
      ● Restore
          – DRBD: Like a disk
          – Cassandra: Like a disk
  • 37. So DRBD is...
      ● A 30 second to 1 minute fail over/outage
      ● An alert that might wake you up
          – But hopefully allows you to sleep again
      ● Handcuffed to linux-ha/keepalived etc.
          – Making it an involved setup
          – Making it involved to troubleshoot
      ● Might need a crossover cable or dedicated network
      ● CPU/network intensive with very active disks
      ● Can successfully fail over a data file in an inconsistent state
  • 38. Cassandra vs HDFS
      ● Replication
          – Hadoop: Per file
          – Cassandra: Per operation
      ● Scaling
          – Hadoop: Add nodes
          – Cassandra: Add nodes
      ● Consistency
          – Hadoop: Very, to the point that getting data in becomes difficult
          – Cassandra: Per operation
      ● Backup
          – Hadoop: Distcp
          – Cassandra: sstabletojson | snapshot
      ● Restore
          – Hadoop: Distcp
          – Cassandra: Like a disk
  • 39. So HDFS...
      ● Comes up with about 4 or 5 reasons a year for a master node / full cluster restart
          – Grow NameNode heap
          – Enable jobtracker setting to stop 100,000 task jobs
          – Enabled/updated trash feature (off by default)
          – Forced to do a fail over by hardware fault
          – Random DRBD/kernel brain fart
          – Need to update a JVM/kernel eventually
      ● Now finally new versions have HA NameNode
      ● Running jobs lose progress and will not automatically restart
  • 40. Questions?