Jeff Bollinger – CTO - @jbollinger
Jeff Smoley – Infrastructure Architect
Scaling With Cassandra
About NativeX
The Backstory
Why Cassandra
Cassandra Overview
NativeX Cassandra Implementation / Metrics
What we Learned
Agenda
Formerly W3i
Marketing technology platform
that enables developers to build
successful businesses around
their apps.
NativeX
Over 620M unique devices on our network
Over 500 apps in network
> 100M Monthly Active Users
100 GB of data ingest per week
Vanity Metrics
A growing mobile advertising network
Backstory
[Chart: API Requests (billions) by quarter, 2011 Q4 through 2013 Q1]
Infrastructure Intensive Model
[Chart: Session Calls (millions) by Week After User Acquired, weeks 0 through 12, over the lifetime of a user]
Microsoft SQL Server
2 Node Cluster (failover)
12 cores / node
192 GB of RAM / node
Compellent SAN
172 Disks (SSD, FC, SATA)
Scale Up Architecture
Consistency
Partition Tolerance
Availability
CAP Theorem
SQL Server, MySQL
Cassandra
MongoDB
Scale
•Horizontal
•Incremental cost structure
Resiliency
•No single point of failure
•Geographically distributed
Objectives
Web Application Tier
Database Tier
What Needed to Scale
The Web Application Tier is already a server farm that can scale
horizontally through our VMware environment.
The Database Tier was one giant monolithic Microsoft SQL
Server machine.
Stands for Not Only SQL
The NoSQL movement is not about silver bullets and
black boxes.
It’s about understanding problems and focusing on
solutions.
It’s about using the right tool for the right problem.
What is NoSQL?
Selecting Cassandra
DB            | Distributed | Maturity | High Availability | Style                     | Documentation | Native Language Drivers | Popularity
MongoDB       | Yes         | Medium   | Yes               | Document - NoSQL          | Excellent     | Major Languages         | High
VoltDB        | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
MySQL Cluster | Yes         | High     | Yes               | RDBMS - SQL & Key/Value   | Excellent     | Major Languages         | Medium
MySQL ScaleDB | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
Cassandra     | Yes         | Medium   | Yes               | Key/Value - Column Family | Excellent     | Major; Poor .Net        | High
CouchDB       | No          | Medium   | Yes               | Document - NoSQL          | ?             | No - REST only          | Medium
RavenDB       | Yes?        | Low      | No                | Document - NoSQL          | Poor          | C#, JS, REST            | Medium
Couchbase     | Yes         | Medium   | Yes               | Key/Value - Document      | Good          | Major Languages         | Medium
*Disclaimer: this data was compiled in spring of 2012 and may not reflect the
current state of each database system shown here.
http://nosql.mypopescu.com/ is a helpful site for discovering and learning about
different DB Systems.
Considered Multiple DB Providers
MySQL Cluster
Relational and very familiar.
Has physical row limitations.
MongoDB
Data modeling was simpler than C*.
Not very clear if it had multi-cluster support.
Cassandra
At the very core it’s all about scalability and resiliency.
Data modeling a little scary, limited .Net support.
Top Choices
Cassandra
Multi-node
Multi-cluster
Highly Available
Durable
Tunable Consistency
Shared Nothing
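"Tunable Consistency" can be made concrete with the usual replica-overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The sketch below is only an illustration; the replication factor of 3 is an assumption (the slides do not state one), and the function names are hypothetical helpers, not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor


RF = 3  # assumed replication factor, not taken from the slides

# QUORUM writes + QUORUM reads overlap on at least one replica.
quorum_is_consistent = reads_see_latest_write(RF, quorum(RF), quorum(RF))

# ONE write + ONE read is faster but may return stale data.
one_is_consistent = reads_see_latest_write(RF, 1, 1)
```

Choosing lower levels trades that guarantee for lower latency and higher availability, which is the tuning the slide refers to.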
C* was not a replacement DB system, but an addition.
C* solves a very specific problem (for us).
Writing large volumes of data quickly.
Reading very specific data out of a large record set.
NoSQL solutions, like C*, are not meant to be a
replacement for everything.
You will make your life harder if you try!
The same should be said about Relational Databases.
They don’t solve every problem!
C* at NativeX
We have three major classifications of data.
Configuration
Activity Tracking
Device History
Data Classification
This data is relatively small in total size and is used
to operationally run our products. Examples
include:
Mobile Apps
Offers
Campaigns
Restrictions
Queue Settings
This data is typically relational and therefore
continues to be stored in MS SQL Server.
Configuration Data
Data is stored inside of Column Families using nested Key/Value pairs.
A Row Key maps to a collection of Columns.
A Column Name (AKA Column Key) maps to a Column Value.
The Column Name is stored alongside the Value.
A common strategy is to store JSON/XML in the Column Value.
(Side note, if you’ve heard of Super Columns, forget about them, they
hurt more than they help)
The Very Basics of C* Data Modeling
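The nested key/value structure described on this slide can be pictured as a dictionary of dictionaries. The following is a minimal in-memory sketch, not Cassandra client code; the column family name, row keys, column names, and JSON fields are all hypothetical and chosen only to illustrate the shape of the data.

```python
import json

# Hypothetical column family modeled as: row key -> {column name -> column value}.
device_history = {}


def insert(column_family, row_key, column_name, value):
    """Store a JSON-encoded value under (row_key, column_name)."""
    row = column_family.setdefault(row_key, {})
    row[column_name] = json.dumps(value)


def get(column_family, row_key, column_name):
    """Read one specific column out of one row and decode its JSON value."""
    return json.loads(column_family[row_key][column_name])


insert(device_history, "device-123", "click:offer-42",
       {"offer_id": 42, "ts": "2013-04-06T10:15:00Z"})
insert(device_history, "device-123", "run:app-7",
       {"app_id": 7, "ts": "2013-04-06T10:20:00Z"})

# Cassandra keeps the columns of a row sorted by column name;
# sorted() imitates that ordering for this in-memory sketch.
column_names = sorted(device_history["device-123"])
```

Because the column name is stored with every value, short and meaningful column names keep rows compact.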
Raw tracking data for all activities used by the ETL process to
produce OLAP data on an hourly basis.
Synonymous with Time Series, Event Series, or Logging data.
Examples include:
Running of Mobile Apps
Viewing Offers
Clicking on Offers
Receiving Rewards
Activity Tracking Data
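One common way to lay out time series data for an hourly consumer is to bucket rows by activity type and hour, so each ETL run reads a bounded set of rows. The slides do not describe the actual schema, so the key format and the helper below are purely an assumption for illustration.

```python
from datetime import datetime, timezone


def hourly_row_key(activity: str, ts: datetime) -> str:
    """Build a hypothetical row key such as 'offer_click:2013040610' (UTC hour bucket)."""
    return f"{activity}:{ts.astimezone(timezone.utc).strftime('%Y%m%d%H')}"


event_time = datetime(2013, 4, 6, 10, 15, tzinfo=timezone.utc)
key = hourly_row_key("offer_click", event_time)
```

With keys of this shape, every event of one type in a given hour lands in the same row, and the event's own timestamp or ID would serve as the column name.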
Historical activities that each device has performed while
being part of NativeX’s network.
Used for offer classification for a given device.
Examples include:
Clicking on Offers
Running Mobile Apps
Redeeming Rewards
Device History Data
12 Nodes
Cisco UCS Blades
12 Cores @ 2.0GHz with Hyper-threading
64 GB of RAM
2 x 480GB Intel commodity SSDs in RAID 0
10.5 TB total, ~7 TB usable
Red Hat Linux
Hardware
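The raw capacity figure can be checked with a little arithmetic. One reading that reproduces "10.5 TB total" is that the SSD sizes are decimal gigabytes while the total is expressed in binary terabytes; that interpretation is an assumption, as is treating "~7 TB usable" as exactly 7.

```python
NODES = 12
SSDS_PER_NODE = 2
SSD_GB = 480  # decimal gigabytes per SSD, as listed on the slide

# RAID 0 stripes the two drives, so each node exposes their combined capacity.
raw_bytes = NODES * SSDS_PER_NODE * SSD_GB * 10**9
raw_tib = raw_bytes / 2**40          # about 10.48, which rounds to 10.5

# Fraction of raw capacity the slide counts as usable (assumes exactly 7).
usable_fraction = 7 / raw_tib        # roughly two thirds
```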
We chose to use Enterprise hardware for the servers
so that we would have support for them.
However, our workload is very read heavy and 15K
rpm rotational disks were a bottleneck.
We chose to swap out the rotational for commodity
SSDs. (Enterprise SSDs were 10x as expensive)
We have limited support on the hardware because of
this.
Commodity Vs. Enterprise
240 peak Writes per second per node
2,880/sec cluster wide
888 peak Reads per second per node
10,656/sec cluster wide
0.53 ms average Write Latency per request
1.7 ms average Read Latency per request
Almost 3 TB of data adding 1 TB a month
Internal C* Cluster Stats
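The cluster-wide numbers follow directly from the per-node peaks and the 12-node cluster size. The short calculation below reproduces them and adds a rough growth projection; the projection assumes linear growth, treats "almost 3 TB" as 3 and "~7 TB usable" as 7, and ignores the free space Cassandra needs for compaction, so it is an optimistic upper bound rather than a capacity plan.

```python
NODES = 12

peak_writes_per_node = 240
peak_reads_per_node = 888

cluster_writes = peak_writes_per_node * NODES   # 2,880 per second
cluster_reads = peak_reads_per_node * NODES     # 10,656 per second

# Reads outnumber writes by this factor at peak.
read_write_ratio = peak_reads_per_node / peak_writes_per_node

# Rough runway before the usable capacity is reached (see assumptions above).
current_tb, usable_tb, growth_tb_per_month = 3, 7, 1
months_until_full = (usable_tb - current_tb) / growth_tb_per_month
```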
MS SQL
Writes 12 ms
Reads 1.5 ms
C*
Writes 3 ms
Reads 4 ms
Application Side Latencies
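Expressed as ratios, the application-side numbers show the trade-off more plainly. This is simple arithmetic on the four figures from the slide; the variable names are illustrative.

```python
sql_write_ms, sql_read_ms = 12.0, 1.5
cstar_write_ms, cstar_read_ms = 3.0, 4.0

# How many times faster C* writes are than SQL Server writes.
write_speedup = sql_write_ms / cstar_write_ms

# How many times slower C* reads are than SQL Server reads.
read_slowdown = cstar_read_ms / sql_read_ms
```

Writes come out 4 times faster on C*, while reads are a little under 3 times slower.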
We think that in SQL Server, reads were faster
because most of the data sat in memory.
We might be able to achieve lower latencies in C* if
we gave each node just as much memory as our SQL
Server.
To counteract the increased latencies we used
certain techniques like parallel reads using
multi-threading in our web application.
Can We Make Reads Faster?
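A minimal sketch of the parallel-read idea, using a thread pool so that several blocking reads overlap instead of running back to back. The `read_row` function is a stand-in that only sleeps for the 4 ms application-side read latency quoted earlier; it is not a real Cassandra client call, and the key names and worker count are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

READ_LATENCY_S = 0.004  # 4 ms, the application-side C* read latency from the slides


def read_row(row_key):
    """Stand-in for a blocking Cassandra read; sleeps to mimic the round trip."""
    time.sleep(READ_LATENCY_S)
    return {"row_key": row_key}


def read_rows_parallel(row_keys, max_workers=8):
    """Issue all reads concurrently and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_row, row_keys))


keys = [f"device-{i}" for i in range(8)]
rows = read_rows_parallel(keys)
```

With eight workers, the eight reads take roughly one round trip of wall-clock time instead of eight, at the cost of more concurrent load on the cluster.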
There are still challenges with C*, like any complex
system.
More moving parts and things that need to stay in
sync.
Misconfigurations can literally destroy your data.
Certain config settings cannot be changed after you
are live, such as the number of virtual Racks.
Not all Roses
Get into production early
Data Import = Reality
Break down communication barriers
Understanding your IO profile is really important
Cassandra changes quickly, you need to keep up
Scalable systems like C* have a massive number of
knobs, you need to know them
Leverage cloud resources in working toward right
sizing your cluster
Lessons Learned
We’re hiring
http://nativex.com/careers/
Join the MSP C* Meetup
http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/
Email us
Jeff.Smoley@nativex.com
Jeff.Bollinger@nativex.com or @jbollinger
Slide Deck
http://www.slideshare.net/JBollinger/minnebar-2013-scaling-with-cassandra
Thanks

MinneBar 2013 - Scaling with Cassandra