One Size Doesn’t Fit All
Choosing which big data,
NoSQL or database
technology to use

March 14, 2012

Mark R. Madsen
http://ThirdNature.net
The problem of “big” is three problems of volume
  ▪ Computations!
  ▪ Amount of data!
  ▪ Number of users!
Big data?
Unstructured data isn’t really unstructured.
The problem is that this data is unmodeled.
The real challenge is complexity.
The holy grail of databases under current market hype
A key problem is that we’re talking mostly about computation over data
when we talk about “big data” and analytics, a potential mismatch for
both relational and NoSQL.
Solving the Problem Depends on the Diagnosis
You must understand your 
workload ‐ throughput and 
response time requirements 
aren’t enough.
  ▪ 100 simple queries accessing 
    month‐to‐date data
  ▪ 90 simple queries accessing 
    month‐to‐date data plus 10 
    complex queries using two 
    years of history
  ▪ Hazard calculation for the 
    entire customer master
  ▪ Performance problems are 
    rarely due to a single factor. 
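The first two examples above run the same number of queries, so a throughput figure alone cannot tell them apart. A minimal sketch of that point follows; the row counts (`MTD_ROWS`, `HISTORY_ROWS`) and the `QueryClass` type are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryClass:
    name: str
    count: int         # queries in the measurement window
    rows_scanned: int  # hypothetical rows scanned per query


def total_queries(mix):
    return sum(q.count for q in mix)


def total_rows_scanned(mix):
    return sum(q.count * q.rows_scanned for q in mix)


# Assumed sizes, not taken from the slides.
MTD_ROWS = 1_000_000        # month-to-date data
HISTORY_ROWS = 24_000_000   # two years of history

mix_a = [QueryClass("simple, month-to-date", 100, MTD_ROWS)]
mix_b = [
    QueryClass("simple, month-to-date", 90, MTD_ROWS),
    QueryClass("complex, two years of history", 10, HISTORY_ROWS),
]

queries_a, queries_b = total_queries(mix_a), total_queries(mix_b)
scan_a, scan_b = total_rows_scanned(mix_a), total_rows_scanned(mix_b)

print(queries_a, queries_b)  # same query count in both mixes
print(scan_a, scan_b)        # very different volume of data touched
```

Under these assumed sizes both mixes issue 100 queries, yet the second scans more than three times as many rows, which is why the query count by itself says little about the workload.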
Workload: One big query or many small queries?
Retrieval: small return set or large?
Selectivity: large volume of data scanned or small?
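Retrieval and selectivity are separate questions: a query can scan a great deal of data and still return very little. The sketch below labels a query on both axes; the function name `profile_query` and the 10% `threshold` are assumptions made for illustration, not a standard definition.

```python
def profile_query(table_rows, rows_scanned, rows_returned, threshold=0.1):
    """Label a query by how much data it scans and how much it returns.

    threshold is a hypothetical cut-off: a fraction of the table at or
    above it counts as "large".
    """
    if table_rows <= 0:
        raise ValueError("table_rows must be positive")
    scan = "large" if rows_scanned / table_rows >= threshold else "small"
    result = "large" if rows_returned / table_rows >= threshold else "small"
    return {"scan": scan, "return_set": result}


TABLE_ROWS = 1_000_000  # assumed table size

point_lookup = profile_query(TABLE_ROWS, rows_scanned=1, rows_returned=1)
monthly_totals = profile_query(TABLE_ROWS, rows_scanned=TABLE_ROWS, rows_returned=12)
bulk_extract = profile_query(TABLE_ROWS, rows_scanned=TABLE_ROWS, rows_returned=TABLE_ROWS)

print(point_lookup)
print(monthly_totals)
print(bulk_extract)
```

The three sample queries land in three different cells (small/small, large/small, large/large), and each cell stresses a system differently.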
Important workload parameters to know
• Read‐intensive  vs. write‐intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long access latency
• Predictable vs. unpredictable data access patterns
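One way to make these five parameters concrete is to record them per workload and compare. This is only a sketch: the `WorkloadProfile` type and the two sample workloads (`reporting`, `order_entry`) are hypothetical, and real workloads rarely reduce to clean yes/no answers.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class WorkloadProfile:
    read_intensive: bool         # False means write-intensive
    mutable_data: bool           # False means immutable data
    immediate_consistency: bool  # False means eventual consistency is acceptable
    short_access_latency: bool   # False means long latency is acceptable
    predictable_access: bool     # False means unpredictable access patterns


def differences(a, b):
    """Return the names of the parameters on which two workloads differ."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


# Hypothetical workloads, for illustration only.
reporting = WorkloadProfile(
    read_intensive=True,
    mutable_data=False,
    immediate_consistency=False,
    short_access_latency=False,
    predictable_access=True,
)
order_entry = WorkloadProfile(
    read_intensive=False,
    mutable_data=True,
    immediate_consistency=True,
    short_access_latency=True,
    predictable_access=True,
)

differing = differences(reporting, order_entry)
print(differing)
```

Two workloads that differ on four of five parameters, as these do, are unlikely to be served equally well by one technology.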
Types of workloads
Write‐biased:
  ▪ OLTP
  ▪ OLTP, batch
  ▪ OLTP, lite
  ▪ Object persistence
  ▪ Data ingest, batch
  ▪ Data ingest, real‐time
Read‐biased:
  ▪ Query
  ▪ Query, simple retrieval
  ▪ Query, complex
  ▪ Query‐hierarchical / object / network
  ▪ Analytic
Mixed?
  ▪ Inline analytic execution, operational BI
Matching to parameters, at assumption of data scale
Workload parameters (columns): write‐biased; read‐biased; updateable data; eventual consistency ok; unpredictable query path; compute intensive
Technologies (rows): Standard RDBMS; Parallel RDBMS; NoSQL (kv, dht, obj); Hadoop*; Streaming database

    You see the problem: it’s an intersection of multiple parameters, and
    this chart only includes the first tier of parameters. Plus, workload
    factors can completely invert these general rules of thumb.
Matching to parameters, at assumption of data scale
Workload parameters (columns): complex queries; selective queries; low latency queries; high concurrency; high ingest rate
Technologies (rows): Standard RDBMS; Parallel RDBMS; NoSQL (kv, dht, obj); Hadoop; Streaming database

   You have to look at the combination of workload factors: data scale,
   concurrency, latency & response time, then chart the parameters.
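Charting the parameters can be sketched as a lookup against a matrix you fill in yourself. Everything below is a placeholder: the candidate names and the True/False ratings are invented for illustration and are not the ratings from the slides, and the `shortlist` helper is hypothetical.

```python
def shortlist(requirements, capabilities):
    """Return candidates rated True for every required workload parameter."""
    return sorted(
        name
        for name, rated in capabilities.items()
        if all(rated.get(param, False) for param in requirements)
    )


# Placeholder ratings: replace with your own findings at your data scale.
capabilities = {
    "candidate_a": {
        "complex queries": True,
        "selective queries": True,
        "low latency queries": True,
        "high concurrency": False,
        "high ingest rate": False,
    },
    "candidate_b": {
        "complex queries": False,
        "selective queries": True,
        "low latency queries": True,
        "high concurrency": True,
        "high ingest rate": True,
    },
    "candidate_c": {
        "complex queries": True,
        "selective queries": False,
        "low latency queries": False,
        "high concurrency": False,
        "high ingest rate": True,
    },
}

ingest_heavy = shortlist({"high concurrency", "high ingest rate"}, capabilities)
analytic_ingest = shortlist({"complex queries", "high ingest rate"}, capabilities)
everything = shortlist(
    {"complex queries", "low latency queries", "high concurrency"}, capabilities
)

print(ingest_heavy)
print(analytic_ingest)
print(everything)
```

With these placeholder ratings the last combination matches no candidate at all, which illustrates why the intersection of parameters matters more than any single column.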
Always build a proof of concept!
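A proof of concept needs measurements taken on your own workload. The following is a minimal timing sketch, assuming the query under test can be wrapped in a Python callable; `measure_latency` and `stand_in_query` are hypothetical names, and the stand-in does no real database work.

```python
import statistics
import time


def measure_latency(run_query, repetitions=50):
    """Time repeated calls and report latency percentiles in seconds."""
    if repetitions < 2:
        raise ValueError("need at least two repetitions")
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return {
        "runs": len(samples),
        "p50": cuts[49],
        "p95": cuts[94],
        "max": max(samples),
    }


def stand_in_query():
    # Placeholder for a real query against the candidate system.
    sum(range(10_000))


report = measure_latency(stand_in_query)
print(report)
```

Run it against each candidate with representative data volumes and concurrency; a single-threaded loop like this one measures response time only and says nothing about behavior under concurrent load.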
Image Attributions
Thanks to the people who supplied the images used in this presentation:

Holy Grail – © Monty Python Ltd.
Cupcakes – <lost attribution on Flickr>
rock‐fall‐roadblock.jpg ‐ http://www.flickr.com/photos/wsdot/4679360979/
roadblock‐sheep.jpg ‐ http://www.flickr.com/photos/brizo_the_scot/4013939756/




About the Presenter
Mark Madsen is president of Third Nature, a technology research and
consulting firm focused on business intelligence, analytics and
information management. Mark is an award-winning author, architect and
former CTO whose work has been featured in numerous industry
publications. During his career Mark received awards from the American
Productivity & Quality Center, TDWI, Computerworld and the Smithsonian
Institute. He is an international speaker, contributing editor at
Intelligent Enterprise, and manages the open source channel at the
Business Intelligence Network. For more information or to contact Mark,
visit http://ThirdNature.net.
About Third Nature

Third Nature is a research and consulting firm focused on new and
emerging technology and practices in analytics, business intelligence, and
performance management. If your question is related to data, analytics,
information strategy and technology infrastructure then you’re at the right
place.
Our goal is to help companies take advantage of information-driven
management practices and applications. We offer education, consulting
and research services to support business and IT organizations as well as
technology vendors.
We fill the gap between what the industry analyst firms cover and what IT
needs. We specialize in product and technology analysis, so we look at
emerging technologies and markets, evaluating technology and how it is
applied rather than vendor market positions.
