ROI on Big Data: RDBMS,
NoSQL or Both?
Robin Schumacher
VP Products, DataStax
3 Big Ideas for today’s conversation
• Big data != big money
• Big words require big back-up
• All questions are big (and never foolish)
• Exciting News: Launch of DataStax Enterprise 3.1
Agenda: What Will We Cover?
• Introduction to DataStax and NoSQL
• Overview of legacy vs. modern, big data applications
• Comparing RDBMSs and NoSQL
• Customer examples of RDBMS-to-NoSQL swap-outs
and co-existence strategies
• Conclusions
Avoid Big Data FUD
• Cost compared to what?
• Value compared to what?
• How to plan for success?
http://www.informationweek.com/big-data/commentary/big-data-analytics/when-big-data-equals-big-money-waste/240157956
DataStax: An Overview
• Founded in April 2010
• We drive Apache Cassandra™,
the popular open-source NoSQL database
• We provide DataStax Enterprise for
enterprise NoSQL implementations
• 300+ customers
• 100+ employees
• Home to Apache Cassandra Chair & most
committers
• Headquartered in the San Francisco Bay Area
• Funded by prominent venture firms
What is Apache Cassandra?
A massively scalable NoSQL database
• That offers uptime, all the time (continuous availability)
• And easy data distribution (datacenter, cloud)
Source: (http://www.datastax.com/resources/whitepapers/bigdata)
What is DataStax Enterprise?
DataStax Enterprise --
powered by Apache Cassandra™, certified for production
1. DataStax Enterprise Server
2. OpsCenter Enterprise
3. Expert Support & Services
• Massive scalability
• Continuous availability, and
• Operational simplicity for real-time,
analytic, and enterprise search data.
Details of DataStax Enterprise Server
• Production-certified version of
Cassandra for online applications.
• Integrated Hadoop for batch
analytics.
• Built-in Solr for enterprise search.
• Comprehensive security for
sensitive data.
• Active everywhere architecture.
• Gold standard for multi-data center
and cloud deployments.
• Built-in data replication; removes
need for ETL.
• Complete isolation between different
workloads.
• Methods for data migration from
legacy RDBMS’s.
Details of DataStax OpsCenter
A new, 10-node Cassandra (or Hadoop) cluster with OpsCenter running in 3 minutes…
A new, 10-node DSE cluster with OpsCenter running on AWS in 3 minutes…
Steps: 1, 2, 3, Done
Launch Today: DataStax Enterprise 3.1
• Lower Total Cost of Ownership
• Better ROI
• Simpler & faster development
• Greater insight
• More flexibility and functionality
What’s New: Cassandra 1.2 Integration
• Manage up to 10x more Cassandra
data per node than prior versions for
many use cases
• Use vnodes and parallel operations
to increase capacity and perform
maintenance operations much faster
• Get much greater functionality with
new CQL binary protocol via Java
and .NET drivers
• Store arrays and lists of data much
more easily with collections
• Get deeper visibility into the
response times of your queries and
other database operations with
tracing
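The vnodes bullet above can be illustrated with a small, self-contained sketch. This is not Cassandra's implementation: it is a toy consistent-hash ring in which each node owns several tokens, and the node names, the token counts, and the use of MD5 are all assumptions made for illustration. It shows the general idea that giving each node many small token ranges tends to spread keys more evenly than one large range per node.

```python
import bisect
import hashlib
from collections import Counter


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: every node owns `tokens_per_node` positions."""

    def __init__(self, nodes, tokens_per_node):
        pairs = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.tokens = [t for t, _ in pairs]
        self.owners = [n for _, n in pairs]

    def owner(self, key: str) -> str:
        # The first token at or after the key's token owns the key;
        # the modulo wraps around to the start of the ring.
        idx = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owners[idx]


def key_counts(nodes, tokens_per_node, num_keys=10000):
    """Count how many of `num_keys` synthetic keys land on each node."""
    ring = Ring(nodes, tokens_per_node)
    return Counter(ring.owner(f"key-{k}") for k in range(num_keys))


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
single = key_counts(nodes, tokens_per_node=1)
vnodes = key_counts(nodes, tokens_per_node=256)
print("1 token per node:   ", dict(single))
print("256 tokens per node:", dict(vnodes))
```

With many tokens per node, adding or removing a node also moves many small ranges rather than one large one, which is the intuition behind faster maintenance operations.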
What’s New: Solr 4.3 Integration
• 60+ new features
• Even faster performance
• Stability Improvements
• New memory caches and memory
monitoring
• Easier customization with new
pluggable document handling
Cassandra/DataStax Users: A Sample
Cassandra: NoSQL Performance Leader
Netflix Cloud Benchmark…
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
“In terms of scalability, there is a clear winner throughout
our experiments. Cassandra achieves the highest
throughput for the maximum number of nodes in all
experiments with a linear increasing throughput.”
Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August
2013, p. 10. Benchmark paper presented at the Very Large Database Conference, 2013.
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2013.pdf
End Point Independent NoSQL Benchmark
• Highest in throughput…
• Lowest in latency…
Use Cases Handled By DataStax Enterprise
Managed by Cassandra
• Time series data
• Device/Sensor/Data “exhaust” systems
• Distributed applications
• Media streaming
• Online Web retail (transactional, shopping carts, etc.)
• Real-time data analytics
• Social media capture and analysis
• Web click-stream analysis
• Write-intensive transactional systems
Managed by Hadoop
• Buyer behavior analytics
• Compliance/regulatory analysis
• Customer recommendation output
• Fraud detection
• Risk analysis
• Sales program campaign analysis
• Supply chain analytics
• Batch Web clickstream analysis
Managed by Solr
• General Web search
• Web retail faceted (categorization) search
• Search/hit prioritization and highlighting
• Application log search and analysis
• Document (PDF, MS Word, etc.) search and analysis
• Geospatial search
• Real estate location and property search
• Social media match ups
NoSQL Momentum
“The economics don’t look
great for Oracle.
According to analysis by
Wikibon’s David Floyer
(and highlighted in the
Wall Street Journal), the
NoSQL database market
is expected to grow at a
compound annual
growth rate of nearly
60% between 2011 and
2017. The SQL slice of
the Big Data market, in
contrast, will grow at just a
26% CAGR during that
same time period.”
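As a quick check on what those growth rates imply, the compounding can be worked out directly. The sketch below is plain arithmetic, not a figure from the quoted analysis: it assumes six compounding years (2011 to 2017), rounds "nearly 60%" to 60%, and normalizes each starting market size to 1.0.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


YEARS = 2017 - 2011  # six compounding periods (assumption)

nosql = growth_multiple(0.60, YEARS)  # "nearly 60%" rounded to 60%
sql = growth_multiple(0.26, YEARS)

print(f"NoSQL multiple over {YEARS} years: {nosql:.1f}x")  # 16.8x
print(f"SQL multiple over {YEARS} years:   {sql:.1f}x")    # 4.0x
```

These are multiples of each segment's own starting size, so they say nothing about which segment is larger in absolute terms.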
NoSQL Momentum
“NoSQL is the stuff of the Internet
Age.”
- Andrew Oliver,
InfoWorld
Examples of Oracle RDBMS Replacements
But does this mean the RDBMS is on the way out…?
The truth is the vast
majority of modern
application architectures
use both an RDBMS and
NoSQL. The question is
when and where should
each be used?
Legacy vs. Today’s Data Applications
Legacy Line-of-Business Apps
• LOB App on RDBMS (Oracle)
• LOB App on RDBMS (MySQL)
• LOB App on RDBMS (SQL Server)
• Data Warehouse: RDBMS (Teradata/Column DB’s)
Today’s Line-of-Business Apps
• Three LOB Apps, each on a NoSQL ring of Cassandra (C*) nodes
• Data Warehouse: Hadoop
Components of Legacy vs. Today’s Data Applications
Legacy LOB Apps (RDBMS: Oracle, MySQL, SQL Server)
• Transactions: LOB style, full consistency
• Analytics: ROLAP, rank, windowing, partition by, etc.
• Search: full text
Today’s LOB Apps (NoSQL: Cassandra, C*)
• Transactions: LOB style, tunable consistency
• Analytics: MapReduce, Hive, Pig, Mahout
• Search: Solr
Legacy Data Warehouse (RDBMS: Teradata/Column DB’s)
• Transactions: DW style
• Analytics: ROLAP, rank, windowing, partition by, etc.
• Search: full text
Today’s Data Warehouse (Hadoop)
• Transactions: none
• Analytics: MapReduce, Hive, Pig, Mahout
• Search: Solr
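The analytics split above (SQL-style ranking and windowing on the RDBMS side, MapReduce on the Hadoop side) can be made more concrete with a toy example. The following is plain Python, not Hadoop: it imitates the map, shuffle, and reduce phases on a handful of invented click records to count page views, the kind of batch click-stream job associated with MapReduce. All record values and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit one (page, 1) pair per click record."""
    _user, page = record
    yield page, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts emitted for one page."""
    return key, sum(values)


# Invented click-stream records: (user, page)
clicks = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/cart"),
    ("u3", "/home"), ("u2", "/cart"), ("u3", "/checkout"),
]

mapped = chain.from_iterable(map_phase(record) for record in clicks)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(counts)  # {'/home': 3, '/cart': 2, '/checkout': 1}
```

In a real cluster the map and reduce calls run in parallel across machines; here they run in one process purely to show the data flow.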
Previous Generation vs. Modern Applications
Legacy Applications | Today’s Applications
Slow/medium velocity data | High velocity data
Data coming in from one/few locations | Data coming in from many locations
Rigid, static structured data | Flexible, fluid, multi-type data
Low/medium data volumes; purge often | High data volumes; retain forever
Deploy app central location/one server | Deploy app everywhere/many servers
Write data in one location | Write data everywhere/anywhere
Primary concern: scale reads | Scale writes and reads
Scale up for more users/data | Scale out for more users/data
Downtime tolerated | Downtime not tolerated
DataStax / Cassandra vs. Legacy RDBMS
DataStax Enterprise/Cassandra | Legacy RDBMS
Fluid and flexible data model | Rigid data model
Easily supports modern data types | Difficulty in supporting all datatypes
Automatic data sharding/distribution | Manual data sharding/distribution
Multi-data center/cloud support | Single DC with data shipping options
Continuous availability | Medium to high availability
Read from anywhere | Read from primary, possibly slaves
Write data anywhere | Write data to primary or specified shards
AID transactions; tunable consistency | ACID transactions
Unlimited scale out for more capacity | Limited scale up for capacity (out-reads)
CQL for primary interface | SQL for primary interface
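"Tunable consistency" in the comparison above can be made concrete with the usual quorum-overlap rule: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). The sketch below is a simplified model of that rule only. The level names mirror Cassandra's ONE, QUORUM, and ALL, but the helper functions are hypothetical and no database is contacted.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


RF = 3  # illustrative replication factor
for write_level, read_level in [
    ("ONE", "ONE"),
    ("QUORUM", "QUORUM"),
    ("ALL", "ONE"),
]:
    overlap = reads_see_latest_write(write_level, read_level, RF)
    print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {overlap}")
```

Lower levels wait on fewer replicas and so respond faster; higher levels trade some latency and availability for the overlap guarantee.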
Business Catalysts For NoSQL - Do You Need To…
…keep business always online and serving customers?
…serve customers everywhere (i.e. in multiple locations)?
…deliver information fast both internally and externally?
…handle increasing customer demand?
…protect information that runs the business?
…make business decisions based on right information?
…easily find needed information?
…receive strong payback for IT investments?
Keep Business Online
Netflix systems are run in the cloud across multiple availability zones
with Cassandra and sport constant uptime. Over 95% of Netflix’s data
is stored in Cassandra (much of it previously on Oracle).
Keep Business Online
Commenting on Amazon outage in Oct 2012: “We configure all our clusters
to use a replication factor of three, with each replica located in a different
Availability Zone. This allowed Cassandra to handle the outage remarkably
well. When a single zone became unavailable, we didn't need to do
anything. Cassandra routed requests around the unavailable zone and when
it recovered, the ring was repaired.”
- Netflix Tech Blog
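The configuration described in the quote can be modeled in a few lines. The sketch takes the replication factor of three and the one-replica-per-zone placement from the quote; the zone names are hypothetical, and the use of quorum requests is an assumption, since the quote does not state which consistency level was in use.

```python
REPLICATION_FACTOR = 3
ZONES = ("zone-a", "zone-b", "zone-c")  # hypothetical names, one replica in each


def quorum(replication_factor: int) -> int:
    """Majority of replicas."""
    return replication_factor // 2 + 1


def live_replicas(down_zones) -> int:
    """With one replica per zone, live replicas equal the zones still up."""
    return sum(1 for zone in ZONES if zone not in down_zones)


def can_serve(down_zones, replicas_needed: int) -> bool:
    """True when enough replicas remain to satisfy the request."""
    return live_replicas(down_zones) >= replicas_needed


needed = quorum(REPLICATION_FACTOR)
for down in (set(), {"zone-b"}, {"zone-b", "zone-c"}):
    status = "served" if can_serve(down, needed) else "unavailable"
    print(f"zones down: {sorted(down)} -> quorum requests {status}")
```

Under these assumptions, losing a single zone still leaves two of three replicas, which is enough for a quorum; losing two zones is not.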
Serve Customers Everywhere
RightScale keeps its customers in contact with each other all over the
world via DataStax clusters in 5+ global data centers.
Deliver Information Fast Everywhere
Adobe delivers on very stringent response time requirements (12 ms
or less for 95% of requests) for its marketing cloud with
DataStax clusters in two data centers.
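A requirement such as "12 ms or less for 95% of requests" is a 95th-percentile latency target. The sketch below shows one common way to check such a target, using the nearest-rank percentile method. The sample latencies are invented for illustration and are not Adobe's data; the function names are hypothetical.

```python
import math


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples, pct: int = 95, limit_ms: float = 12.0) -> bool:
    """True when the pct-th percentile latency is within the limit."""
    return percentile(samples, pct) <= limit_ms


# Invented response times in milliseconds, including one slow outlier.
latencies_ms = [
    3.1, 4.0, 4.2, 5.5, 6.0, 6.3, 7.1, 7.4, 8.0, 8.2,
    8.8, 9.0, 9.5, 9.9, 10.2, 10.8, 11.0, 11.5, 11.9, 40.0,
]

print("p95 (ms):", percentile(latencies_ms, 95))        # 11.9
print("meets 12 ms target:", meets_target(latencies_ms))  # True
```

The single 40 ms request falls in the slowest 5% and so does not break the target, which is why percentile targets are usually paired with a separate limit on worst-case latency.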
Handle Increasing Customer Demand
Gnip delivers social media data to 95% of the Fortune 500 by using
DataStax Enterprise. Data velocity rates for Twitter alone can be
20,000 tweets per second.
Handle Increasing Customer Demand
Ooyala distributes and analyzes media/video content for companies
like ESPN, Rolling Stone and others. They track about one quarter of
all online video viewers each day and generate 1-2 billion events that
are streaming in real-time through their DataStax cluster.
Handle Increasing Customer Demand
Make Right Business Decisions
“DataStax made it all work together”
• Cassandra, Hadoop, Solr, Security
Manage costs & improve performance
• 400% ROI over five years
• $750K five-year savings in support costs
• 90% better response and upload time
Analyzing Information
• Doctors’ notes
• Analyze notes to bill back Medicare /
Medicaid
Find Information Instantly
Datafiniti, which is a search engine for data, needs to consume lots
of data in real time and provide fast search on top of the same data.
Get Strong Payback on IT Investment
Constant Contact found that scaling out with NoSQL vs. IBM DB2
saved them 90% in software costs, and was implemented in 1/3 the
time...
“To do what we need to do today
without Cassandra would cost a
couple million dollars more and
would be significantly harder to
manage operationally.”
Conclusions
When to Choose a Legacy RDBMS over NoSQL/Cassandra
• No need for a flexible data model; data is all structured and fits
well within an RDBMS schema.
• Data does not come in at high rates and the speed at which
data is written is not important.
• You need detailed/complex/nested ACID transactions.
• All your data can fit into memory or reside on 1-2 machines and
substantial growth is not expected.
• You have no need for constant uptime; unexpected downtime
has no/little impact.
• You don’t need to distribute data to multiple locations, various
cloud availability zones, or have multiple copies for disaster
recovery purposes.
• No need to integrate/seamlessly move data between real-time,
analytics, and search systems.
• Software costs are not a concern.
When to Choose DataStax/Cassandra over a Legacy RDBMS
• You need a more flexible data model.
• You have to store a variety of data types.
• You need constant uptime/continuous availability.
• You need to distribute data across multiple data centers or
cloud availability zones.
• You need linear scale-out performance for growing data.
• You need very fast write capabilities.
• You need to write and read data in multiple locations.
• You need transactions but eventual consistency is OK (or
strong consistency with performance impact for many data
copies).
• You need an easy way to integrate real-time, analytics, and
search data.
• You need cost savings/a better ROI.
How Can I Try DataStax Enterprise?
• Go to
www.datastax.com/download.
• Download a copy of DataStax
Enterprise.
• Installs and configures in minutes.
• Completely free for development
evaluation (no trial time bombs,
etc.); subscription required for
production deployments.
For More Information
Thank You – Questions?
We power the big data applications
that transform business.

Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing How to Choose

  • 1. ROI on Big Data: RDBMS, NoSQL or Both? Robin Schumacher VP Products, DataStax
  • 2. 3 Big Ideas for today’s conversation •Big data != big money •Big words require big back-up •All questions are big (and never foolish) •Exciting News: Launch of DataStax Enterprise 3.1
  • 3. Agenda: What Will We Cover? • Introduction to DataStax and NoSQL • Overview of legacy vs. modern, big data applications • Comparing RDBMS’s and NoSQL • Customers examples of RDBMS-to-NoSQL swap out’s and co-existence strategies • Conclusions
  • 4. Avoid Big Data FUD • Cost compared to what? • Value compared to what? • How to plan for success? http://www.informationweek.com/big-data/commentary/big-data-analytics/when-big-data-equals-big- money-waste/240157956
  • 5. DataStax: An Overview • Founded in April 2010 • We drive Apache Cassandra™, the popular open-source NoSQL database • We provide DataStax Enterprise for enterprise NoSQL implementations • 300+ customers • 100+ employees • Home to Apache Cassandra Chair & most committers • Headquartered in San Francisco Bay area • Funded by prominent venture firms
  • 6. What is Apache Cassandra? Datacenter Cloud Massively scalable NoSQL database Source: (http://www.datastax.com/resources/whitepapers/bigdata) And easy data distribution That offers uptime, all the time (continuous availability)
  • 7. What is DataStax Enterprise? DataStax Enterprise -- powered by Apache Cassandra™, certified for production 1. DataStax Enterprise Server 2. OpsCenter Enterprise 3. Expert Support & Services • Massive scalability • Continuous availability, and • Operational simplicity for real-time, analytic, and enterprise search data.
  • 8. Details of DataStax Enterprise Server • Production-certified version of Cassandra for online applications. • Integrated Hadoop for batch analytics. • Built-in Solr for enterprise search. • Comprehensive security for sensitive data. • Active everywhere architecture. • Gold standard for multi-data center and cloud deployments. • Built-in data replication; removes need for ETL. • Complete isolation between different workloads. • Methods for data migration from legacy RDBMS’s.
  • 9. Details of DataStax OpsCenter A new, 10-node Cassandra (or Hadoop) cluster with OpsCenter running in 3 minutes…A new, 10-node DSE cluster with OpsCenter running on AWS in 3 minutes… Done1 2 3
  • 10. Launch Today: DataStax Enterprise 3.1 • Lower Total Cost of Ownership • Better ROI • Simpler & faster development • Greater insight • More flexibility and functionality
  • 11. What’s New: Cassandra 1.2 Integration • Manage up to 10x more Cassandra data per node than prior versions for many use cases • Use vnodes and parallel operations to increase capacity and perform maintenance operations much faster • Get much greater functionality with new CQL binary protocol via Java and .NET drivers • Store arrays and lists of data much more easily with collections • Get deeper visibility into the response times of your queries and other database operations with tracing
  • 12. What’s New: Solr 4.3 Integration • 60+ new features • Even faster performance • Stability Improvements • New memory caches and memory monitoring • Easier customization with new pluggable document handling
  • 14. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability- on.html Netflix Cloud Benchmark… “In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.” Solving Big Data Challenges for Enterprise Application Performance Management, Tilman Rable, et al., August 2013, p. 10. Benchmark paper presented at the Very Large Database Conference, 2013. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2013.pdf End Point Independent NoSQL Benchmark Highest in throughput… Lowest in latency… Cassandra: NoSQL Performance Leader
  • 15. Use Cases Handled By DataStax Enterprise Managed by Cassandra Managed by Hadoop Managed by Solr • Time series data • Device/Sensor/Data “exhaust” systems • Distributed applications • Media streaming • Online Web retail (transactional, shopping carts, etc.) • Real-time data analytics • Social media capture and analysis • Web click-stream analysis • Write-intensive transactional systems • Buyer behavior analytics • Compliance/regulatory analysis • Customer recommendation output • Fraud detection • Risk analysis • Sales program campaign analysis • Supply chain analytics • Batch Web clickstream analysis • General Web search • Web retail faceted (categorization) search • Search/hit prioritization and highlighting • Application log search and analysis • Document (PDF, MS Word, etc.) search and analysis • Geospatial search • Real estate location and property search • Social media match ups
  • 16. NoSQL Momentum “The economics don’t look great for Oracle. According to analysis by Wikibon’s David Floyer (and highlighted in the Wall Street Journal), the NoSQL database market is expected to grow at a compound annual growth rate of nearly 60% between 2011 and 2017. The SQL slice of the Big Data market, in contrast, will grow at just a 26% CAGR during that same time period.”
  • 17. NoSQL Momentum “NoSQL is the stuff of the Internet Age.” - Andrew Oliver, InfoWorld
  • 18. Examples of Oracle RDBMS Replacements
  • 19. But does this mean the RDBMS is on the way out…? The truth is the vast majority of modern application architectures use both an RDBMS and NoSQL. The question is when and where should each be used?
  • 20. Legacy vs. Today’s Data Applications LOB App RDBMS Oracle LOB App RDBMS MySQL LOB App RDBMS SQL Server Data Warehouse RDBMS Teradata/ Column DB’s LOB App NoSQL LOB App NoSQL LOB App NoSQL C * C * C * C * C *C * C * C * C * C * C * C * C * C * C *C * C * C * C * C * C * C * C * C * C *C * C * C * C * C * Data Warehouse Hadoop Legacy Line-of- Business Apps Today’s Line-of- Business Apps
  • 21. Components of Legacy vs. Today’s Data Applications LOB App RDBMS Oracle LOB App RDBMS MySQL LOB App RDBMS SQL Server Data Warehouse RDBMS Teradata/ Column DB’s LOB App NoSQL LOB App NoSQL LOB App NoSQL C * C * C * C * C *C * C * C * C * C * C * C * C * C * C *C * C * C * C * C * C * C * C * C * C *C * C * C * C * C * Data Warehouse Hadoop Transactions: • LOB Style • Full consistency Analytics: • ROLAP • Rank • Windowing • Partition by, etc. Search • Full Text Transactions: • LOB Style • Tunable consistency Analytics: • MapReduce • Hive • Pig • Mahout Search • Solr Transactions: • DW style Analytics: • ROLAP • RANK • Windowing • Partition by, etc. Search • Full Text Transactions: • None Analytics: • MapReduce • Hive • Pig • Mahout Search • Solr
  • 22. Previous Generation vs. Modern Applications
      Legacy Applications | Today’s Applications
      Slow/medium velocity data | High velocity data
      Data coming in from one/few locations | Data coming in from many locations
      Rigid, static structured data | Flexible, fluid, multi-type data
      Low/medium data volumes; purge often | High data volumes; retain forever
      Deploy app central location/one server | Deploy app everywhere/many servers
      Write data in one location | Write data everywhere/anywhere
      Primary concern: scale reads | Scale writes and reads
      Scale up for more users/data | Scale out for more users/data
      Downtime tolerated | Downtime not tolerated
  • 23. DataStax / Cassandra vs. Legacy RDBMS
      DataStax Enterprise/Cassandra | Legacy RDBMS
      Fluid and flexible data model | Rigid data model
      Easily supports modern data types | Difficulty in supporting all datatypes
      Automatic data sharding/distribution | Manual data sharding/distribution
      Multi-data center/cloud support | Single DC with data shipping options
      Continuous availability | Medium to high availability
      Read from anywhere | Read from primary, possibly slaves
      Write data anywhere | Write data to primary or specified shards
      AID transactions; tunable consistency | ACID transactions
      Unlimited scale out for more capacity | Limited scale up for capacity (out-reads)
      CQL for primary interface | SQL for primary interface
  • 24. Business Catalysts For NoSQL - Do You Need To… …keep business always online and serving customers? …serve customers everywhere (i.e. in multiple locations)? …deliver information fast both internally and externally? …handle increasing customer demand? …protect information that runs the business? …make business decisions based on right information? …easily find needed information? …receive strong payback for IT investments?
  • 25. Keep Business Online Netflix systems are run in the cloud across multiple availability zones with Cassandra and sport constant uptime. Over 95% of Netflix’s data is stored in Cassandra (much of it previously on Oracle).
  • 26. Keep Business Online Commenting on Amazon outage in Oct 2012: “We configure all our clusters to use a replication factor of three, with each replica located in a different Availability Zone. This allowed Cassandra to handle the outage remarkably well. When a single zone became unavailable, we didn't need to do anything. Cassandra routed requests around the unavailable zone and when it recovered, the ring was repaired.” - Netflix Tech Blog
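
The behavior described in the quote can be illustrated with a toy model. This is a sketch under stated assumptions, not Cassandra's actual request routing: the node and zone names are hypothetical, and the quote does not say which consistency level Netflix used, so a quorum (a majority of replicas) is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    node: str  # hypothetical node name
    zone: str  # hypothetical availability zone name


def quorum(replication_factor: int) -> int:
    """Majority of replicas, e.g. 2 of 3."""
    return replication_factor // 2 + 1


def live_replicas(replicas, down_zones):
    """Replicas whose zone is still reachable."""
    return [r for r in replicas if r.zone not in down_zones]


def can_serve(replicas, down_zones, required_acks):
    """True if enough replicas survive to acknowledge a request."""
    return len(live_replicas(replicas, down_zones)) >= required_acks


# Replication factor of three, one replica per zone, as in the quote.
replicas = [
    Replica("node-a", "zone-1"),
    Replica("node-b", "zone-2"),
    Replica("node-c", "zone-3"),
]
needed = quorum(len(replicas))

one_zone_down = can_serve(replicas, {"zone-1"}, needed)
two_zones_down = can_serve(replicas, {"zone-1", "zone-2"}, needed)
print(one_zone_down, two_zones_down)
```

With one replica per zone, losing a single zone still leaves two of three replicas, which is enough for a majority; losing two zones is not.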
  • 27. Serve Customers Everywhere RightScale keeps its customers in contact with each other all over the world via DataStax clusters in 5+ global data centers.
  • 28. Deliver Information Fast Everywhere Adobe meets very stringent response time requirements (12 ms or less for 95% of requests) for its marketing cloud with DataStax clusters in two data centers.
  • 29. Handle Increasing Customer Demand Gnip delivers social media data to 95% of the Fortune 500 by using DataStax Enterprise. Data velocity rates for Twitter alone can be 20,000 tweets per second.
  • 30. Handle Increasing Customer Demand Ooyala distributes and analyzes media/video content for companies like ESPN, Rolling Stone and others. They track about one quarter of all online video viewers each day and generate 1-2 billion events that are streaming in real-time through their DataStax cluster.
  • 32. Make Right Business Decisions
      “DataStax made it all work together”: • Cassandra, Hadoop, Solr, Security
      Manage costs & improve performance: • 400% ROI over five years • $750K five-year savings in support costs • 90% better response and upload time
      Analyzing information: • Doctors’ notes • Analyze notes to bill back Medicare / Medicaid
  • 33. Find Information Instantly Datafiniti, which is a search engine for data, needs to consume lots of data in real time and provide fast search on top of the same data.
  • 34. Get Strong Payback on IT Investment Constant Contact found that scaling out with NoSQL vs. IBM DB2 saved them 90% in software costs, and was implemented in 1/3 the time... “To do what we need to do today without Cassandra would cost a couple million dollars more and would be significantly harder to manage operationally.”
  • 36. When Legacy RDBMS over NoSQL/Cassandra • No need for a flexible data model; data is all structured and fits well within an RDBMS schema. • Data does not come in at high rates and the speed at which data is written is not important. • You need detailed/complex/nested ACID transactions. • All your data can fit into memory or reside on 1-2 machines and substantial growth is not expected. • You have no need for constant uptime; unexpected downtime has no/little impact. • You don’t need to distribute data to multiple locations, various cloud availability zones, or have multiple copies for disaster recovery purposes. • No need to integrate/seamlessly move data between real-time, analytics, and search systems. • Software costs not a concern.
  • 37. When DataStax/Cassandra Over Legacy RDBMS • You need a more flexible data model. • You have to store a variety of data types. • You need constant uptime/continuous availability. • You need to distribute data across multiple data centers or cloud availability zones. • You need linear scale-out performance for growing data. • You need very fast write capabilities. • You need to write and read data in multiple locations. • You need transactions but eventual consistency is OK (or strong consistency with performance impact for many data copies). • You need an easy way to integrate real-time, analytics, and search data. • You need cost savings/a better ROI.
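
The trade-off between eventual and strong consistency mentioned above can be made concrete with the commonly cited overlap rule: reads see the latest write when the number of replicas acknowledging a write plus the number consulted on a read exceeds the replication factor. The sketch below is a simplified model, assuming a single data center and only the ONE, QUORUM, and ALL consistency levels; the function names are illustrative and are not part of any DataStax API.

```python
def acks(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a given consistency level (simplified)."""
    return {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[level]


def is_strongly_consistent(write_level: str, read_level: str, replication_factor: int) -> bool:
    """Overlap rule: write acks + read acks > replication factor."""
    return acks(write_level, replication_factor) + acks(read_level, replication_factor) > replication_factor


RF = 3  # assumed replication factor for the example
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    verdict = "strong" if is_strongly_consistent(write_level, read_level, RF) else "eventual"
    print(f"write={write_level:<6} read={read_level:<6} -> {verdict}")
```

Higher levels wait on more replicas per request, which is the performance impact the slide refers to when many data copies are kept.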
  • 38. How Can I Try DataStax Enterprise? • Go to www.datastax.com/download. • Download a copy of DataStax Enterprise. • Installs and configures in minutes. • Completely free for development evaluation (no trial time bombs, etc.); subscription required for production deployments.
  • 40. Thank You – Questions? We power the big data applications that transform business.

Editor's Notes

  • #3: Point 1: Big data does not equal big money. In fact, choosing a NoSQL solution will almost certainly save your business money, in terms of hardware, licensing, and total cost of ownership. What's more, choosing the correct technology for your use case will almost certainly increase your top line as well. Point 2: Don’t settle for big words without big back-up. In this webinar and in lots of other materials on our web site, we'll back up what we say with customer case studies and lots of details. After today’s conversation, you’ll know the basics for growing your business in a profitable way. What's the use of growing your top line but outspending any gains on cumbersome, ineffective, outdated IT? We'll take you through the specific use cases and business models that are the best fit for NoSQL solutions. Point 3: No prior knowledge is required at this point. If you don't even know what RDBMS or NoSQL stand for, you are in the right place. Get your questions answered, and get your business on the right track to meeting your customers' needs in today's data environment.
  • #5: Every once in a while a prominent information provider gets it wrong, and that was certainly the case with last week’s InformationWeek post. Writer Todd Holmes perpetuated some of the fears that you might have already experienced as you’ve looked into non-relational database technologies. We’ll dispel those concerns during today’s talk, and give you the information you need to take the right next steps for your business.
  • #8: DataStax Enterprise is an enterprise NoSQL platform built on Cassandra that lets you scale with no surprises and keep your applications running, no matter what. The platform gives you operational simplicity for real-time, analytic, and enterprise search data. Its components are the DataStax Enterprise Server, OpsCenter Enterprise, and Expert Support & Services.
  • #27: http://techblog.netflix.com/2012/10/post-mortem-of-october-222012-aws.html