Why your RDBMS fails at scale
And one built for it…
DataStax: from validation to momentum.
400+
Employees
$190M
Funding
500+
Customers
Founded in April 2010
Santa Clara • San Francisco • Austin •
London • Paris • Berlin • Tokyo • Sydney
(Series E – Sept. 2014) 30% +
2016, 2017 World’s Best
100 Cloud Companies
Ranked #1 in multiple operational
database categories
© 2017 DataStax, All Rights Reserved. Company Confidential
Let’s take
a moment
Your business interacts with people,
processes and things all the time
CONTEXTUAL
Real-time, globally distributed cloud
applications must meet expectations.
ALWAYS-ON • DISTRIBUTED • SCALABLE • REAL-TIME
Netflix disrupted video distribution
and creation with a cloud application
70 million
Customers
400
Cities
125 million
Hours Watched per Day
Microsoft remains a leader
in collaboration with a cloud application
#1
Deployed App
in Enterprises
5 Million
Events Per
Organization a Month
60 Million
Monthly Active Users
No Downtime: 4 Black Fridays in a row
Potted history of the Database
Database and the Internet
1970 Relational model invented by E.F. Codd at IBM
1979 First commercial RDBMS available (Oracle V2)
1983 Official birth of the internet (switch to TCP/IP)
1986 SQL becomes an ANSI standard (ISO follows in 1987)
1993 WWW finally available
1995 First internet-based applications arrive
Explosion of Cloud Applications
Some of the issues faced
• How do you scale the database?
• Add more RAM
• Add more CPU
• Add faster and more disks
• How do you do this?
• Bring the database OFFLINE
• Vertical scaling has a finite limit
Some of the issues faced
• How do you scale client connections?
• Add a connection pool
• But this has a finite limit
• Adds complexity
Listener
Connection Pool
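
The finite limit of a connection pool can be illustrated with a minimal sketch. Everything here is hypothetical (the `BoundedPool` class, the `conn-N` strings standing in for real connections); it is not any particular database driver's API, only a toy showing what happens when more callers arrive than the pool has connections.

```python
import queue

class BoundedPool:
    """Toy connection pool: hands out at most `size` connections."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for i in range(size):
            self._free.put(f"conn-{i}")  # stand-in for a real DB connection

    def acquire(self, timeout=0.01):
        # Wait up to `timeout` seconds; return None when the pool is exhausted.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            return None

    def release(self, conn):
        self._free.put(conn)

pool = BoundedPool(size=2)
a = pool.acquire()
b = pool.acquire()
c = pool.acquire()   # third caller finds the pool empty
pool.release(a)
d = pool.acquire()   # succeeds once a connection has been returned
print(a, b, c, d)
```

Raising `size` only moves the ceiling: the database listener behind the pool still accepts a bounded number of connections.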
Single Points of Failure
• With a single database we have a SPOF
• Use replication
• Problem solved
• But now
• Single Master
• Scales for Reads not Writes
• Action needed if Master goes down
• Only suitable for LAN deployments
Master
Read Only
Subscriber
But How to Horizontally Scale
• Shard your data across databases
• Each shard needs a replica
• Need a load balancer
• Just showing 2 shards
• Things get more complicated
• Could have multiple read-only subscribers
A-M
N-Z
Master
Read Only
Subscriber
Load balancer
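
The routing work that sharding pushes into the load balancer or application can be sketched as follows. The shard ranges mirror the slide's A-M / N-Z split; the host names and the functions `shard_for` and `route` are hypothetical, chosen only for illustration.

```python
# Hypothetical shard map mirroring the slide's two ranges (A-M, N-Z).
SHARDS = {
    "A-M": {"master": "db1-master", "replica": "db1-replica"},
    "N-Z": {"master": "db2-master", "replica": "db2-replica"},
}

def shard_for(key):
    """Pick a shard from the first letter of the key."""
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"no shard for key {key!r}")
    return "A-M" if first <= "M" else "N-Z"

def route(key, write):
    """Writes go to the shard's master; reads may use its read-only subscriber."""
    shard = SHARDS[shard_for(key)]
    return shard["master"] if write else shard["replica"]

print(route("alice", write=True))   # a write for a key in A-M
print(route("Zed", write=False))    # a read for a key in N-Z
```

Every new shard, replica or range split means changing this map and moving data, which is where the "things get more complicated" bullet comes from.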
What about multiple Data Centres?
• Extremely complicated
• Difficult to support Active-Active
• Need to consider conflicts
• More Disaster Recovery than Disaster Avoidance
Traditional Data Models don’t help
• Normalised Data Model
• Random seeks result in a high volume of I/O operations
• Joins extremely expensive
• Won’t scale horizontally
• De-Normalised Data Model
• Sequential seek to return results
• Joins eliminated
• Scales indefinitely
1:M
M:N
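
The difference between the two models can be shown with a small in-memory sketch. The sample data and function names are hypothetical; plain dictionaries and lists stand in for tables, so this illustrates the access pattern only, not real storage behaviour.

```python
# Normalised: two "tables" joined at read time (hypothetical sample data).
users = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
orders = [
    {"order_id": 10, "user_id": 1, "item": "book"},
    {"order_id": 11, "user_id": 2, "item": "laptop"},
    {"order_id": 12, "user_id": 1, "item": "pen"},
]

def orders_normalised(user_id):
    # Scan the orders, then look up the user for each match (the "join").
    return [(users[o["user_id"]]["name"], o["item"])
            for o in orders if o["user_id"] == user_id]

# De-normalised: the join is done once, at write time, and stored per user.
orders_by_user = {}
for o in orders:
    orders_by_user.setdefault(o["user_id"], []).append(
        (users[o["user_id"]]["name"], o["item"]))

def orders_denormalised(user_id):
    # Single key lookup; all rows for one user sit together.
    return orders_by_user.get(user_id, [])

print(orders_normalised(1))
print(orders_denormalised(1))
```

Both calls return the same rows; the de-normalised version trades duplicated data and extra work on write for a read that touches one key, which is what lets each key live on a single node.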
Summary
• Traditional databases were developed before the web and cloud-based applications
• Scaling up results in downtime
• Single node is a single point of failure
• Number of client connections finite
• Add a read only replica for high availability
• Shard to horizontally scale
• Data Center support extremely difficult
• Data model not built for horizontal scale
A new approach is required
Client/Server
1990s
Cloud
Today
Web
2000s
Scaling out solves the distributed problem
SCALE-OUT APP LAYER
SCALE-OUT DATA LAYER
MASTER-SLAVE DATABASE
San Francisco
New York
London
So What’s the answer?
• Distributed masterless NoSQL database
• Continuous Availability
• Disaster Avoidance
• Linear Scale Performance
• Add nodes to scale
• Runs on Commodity Hardware
• Cloud or on Premise or Hybrid
Linear Scalability
• Have More Data? Add more nodes.
• Need More Throughput? Add more nodes.
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
9000 Nodes
700 Nodes
400 Nodes
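
One way to see why "add more nodes" can work without reshuffling everything is a consistent-hashing ring. The sketch below is a simplified illustration under stated assumptions: the `Ring` class, the MD5-based `token` function and the node names are hypothetical, and a production partitioner and token allocation differ in detail.

```python
import bisect
import hashlib

def token(value):
    # Stable hash mapped to an integer position on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        self._tokens = []      # sorted list of (token, node)
        self._vnodes = vnodes  # virtual nodes per physical node
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._tokens, (token(f"{node}#{i}"), node))

    def owner(self, key):
        # First token clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._tokens, (token(key), ""))
        return self._tokens[idx % len(self._tokens)][1]

keys = [f"user-{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Only the keys that the new node takes over change owner; the rest stay where they were, so capacity can grow one node at a time.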
Continuous Availability
• Nodes Down != Database Down
• Datacenter Down != Database Down
• Upgrade != Database Down
Platform for Cloud Applications
The most innovative companies use DataStax
2010 2012 2014 2016 2017
Key takeaways
• Why your RDBMS fails at scale
• Fundamentally not built for cloud based applications
• World’s leading brands rely on DataStax for globally distributed data management
• Next steps: Download today at www.datastax.com and register for your DataStax Academy account for free online training
DataStax: The power behind the moment
Backup Slides
ACID is a lie with data replication
Scenario: a client with a read-heavy workload adds asynchronous replication, so data propagates from the master to the slave with some lag.
• Consistency: a read sent to the slave before the write has replicated returns the old data, so the client sees an inconsistent view
• Atomicity: a transaction that depends on that stale read can fail as a whole
• Isolation: reading the old data means the transaction no longer sees an isolated view
• Durability: from the client's point of view, a write acknowledged by the master appears lost while the slave still returns the old data
When an RDBMS is scaled out with asynchronous replication for Big Data, the ACID guarantees no longer hold across the cluster
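
The stale read in this scenario can be reproduced with a toy model. The `AsyncReplicatedStore` class is hypothetical and replication is triggered by hand to make the lag visible; a real system ships changes in the background.

```python
class AsyncReplicatedStore:
    """Toy master/slave pair; the slave only catches up when replicate() runs."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self._pending = []  # writes acknowledged by the master, not yet on the slave

    def write(self, key, value):
        self.master[key] = value
        self._pending.append((key, value))  # replication is deferred (the "lag")

    def read_from_slave(self, key):
        return self.slave.get(key)

    def replicate(self):
        # Apply queued writes in order, as a replication stream would.
        for key, value in self._pending:
            self.slave[key] = value
        self._pending.clear()

store = AsyncReplicatedStore()
store.write("balance", 100)
store.replicate()
store.write("balance", 50)                # acknowledged by the master
stale = store.read_from_slave("balance")  # slave has not caught up yet
store.replicate()
fresh = store.read_from_slave("balance")
print(stale, fresh)
```

Between the second write and the next `replicate()` call the slave still serves the old value, which is the window the bullets above describe.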
CAP tradeoffs
• Relational databases typically choose strong consistency over high availability
• Latency between data centers makes synchronous consistency impractical
• NoSQL databases like Cassandra choose high availability and partition tolerance over strict consistency
• Data is replicated asynchronously across multiple data centers. Propagation is limited by the speed of light, so replicas in distant data centers cannot be instantly consistent
• Cassandra lets you specify the consistency level (one replica vs. a majority of replicas) that suits your application
A system can’t be both consistent and highly available during a network partition
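
The "one replica vs. majority of replicas" choice comes down to simple arithmetic: a read is guaranteed to touch at least one replica holding the latest write when the number of replicas read plus the number written exceeds the replication factor. The helper names below are hypothetical; the sketch shows only the arithmetic, not a driver API.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(rf, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf

rf = 3
q = quorum(rf)
one_one = reads_see_latest_write(rf, write_replicas=1, read_replicas=1)
quorum_quorum = reads_see_latest_write(rf, write_replicas=q, read_replicas=q)
all_one = reads_see_latest_write(rf, write_replicas=rf, read_replicas=1)
print(q, one_one, quorum_quorum, all_one)
```

Writing and reading at a single replica favours latency and availability; majority reads and writes give overlapping replica sets at the cost of waiting for more nodes.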
Replication Complexity in RDBMS
*Source: Oracle Database 12c New Features, Slide 17. (http://bit.ly/1MIxKc1)
DSE Real-time Analytics IoT Reference Architecture
[Diagram: HTTP Application, Message Queue, Streaming Analytics, Batch Analytics, Real-time]

