This presentation includes information that is confidential and
proprietary to Basho Technologies and should not be forwarded or
distributed without Basho's prior written consent. © 2014. Basho
Technologies, Inc. All Rights Reserved.
Matt Brender
Developer Advocate
Taming
Big Data with NoSQL
Relational databases are
not bad
Data Scientist
Big Data
Big Data is
And it’s a distributed
systems problem
Beyond the
Scope of
one tool
Beyond the
Scope of
one file
system
Beyond the
Scope of
one
database
Ergo, NoSQL
NoSQL is
For Good Reason
Consistency Level
Conflict Resolution
Partitioning Strategy
Consistency Level
Eventually Consistent
C = Consistency
A = Availability
P = Partition Tolerance
Client Client
DB DB DB
Network Partition
The CAP theorem states that a distributed system can support at most
2 of these 3 properties
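A minimal sketch of the trade-off in the diagram above, assuming a toy three-replica cluster. The `Replica` and `Cluster` classes and the "AP" / "CP" mode names are hypothetical and only illustrate the idea: during a network partition, a system either keeps accepting writes and lets replicas diverge, or refuses writes to stay consistent.

```python
class Replica:
    """One copy of a single value (hypothetical toy replica)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Three replicas that favor either availability (AP) or consistency (CP)."""

    def __init__(self, mode):
        self.mode = mode  # "AP" or "CP" (illustrative labels, not a real API)
        self.replicas = [Replica("db1"), Replica("db2"), Replica("db3")]
        self.partitioned = False

    def write(self, replica_index, value):
        """Return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for replica in self.replicas:
                replica.value = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write (gives up availability).
            return False
        # Stay available by accepting the write locally (replicas diverge).
        self.replicas[replica_index].value = value
        return True

    def values(self):
        return [replica.value for replica in self.replicas]


ap = Cluster("AP")
ap.write(0, "a")
ap.partitioned = True
ap.write(0, "b")      # accepted; ap.values() is now ["b", "a", "a"]

cp = Cluster("CP")
cp.write(0, "a")
cp.partitioned = True
cp.write(0, "b")      # refused; cp.values() stays ["a", "a", "a"]
```

In the AP case the diverged replicas still have to be reconciled once the partition heals, which is where eventual consistency and conflict resolution come in.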
Consistency Level
Conflict Resolution
Last Write Wins
vs.
Causal Context
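A rough sketch of the two approaches, under stated assumptions: writes are modeled as plain tuples, and causal context is modeled as a vector clock (a dict of per-node counters). The function names are hypothetical and this is not any particular database's API. Last Write Wins silently discards one of two concurrent writes, while causal context can detect that neither write saw the other and keep both.

```python
def last_write_wins(a, b):
    """Each write is (timestamp, value); keep the one with the later timestamp."""
    return a if a[0] >= b[0] else b


def descends(clock_a, clock_b):
    """True if clock_a has seen every event recorded in clock_b."""
    return all(clock_a.get(node, 0) >= count for node, count in clock_b.items())


def resolve_with_context(a, b):
    """Each write is (vector_clock, value); return the list of surviving values."""
    clock_a, value_a = a
    clock_b, value_b = b
    if descends(clock_a, clock_b):
        return [value_a]          # a supersedes b
    if descends(clock_b, clock_a):
        return [value_b]          # b supersedes a
    return [value_a, value_b]     # concurrent writes: keep both for the client


# Two clients update the same key without seeing each other's write.
lww = last_write_wins((1, "book"), (2, "pen"))
# lww is (2, "pen"): the "book" write is lost.

siblings = resolve_with_context(({"x": 1}, "book"), ({"y": 1}, "pen"))
# siblings is ["book", "pen"]: the conflict is surfaced instead of hidden.
```

The trade-off is simplicity against safety: Last Write Wins needs no client involvement but depends on timestamps and can drop data, while causal context pushes the final merge decision to the application.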
Conflict Resolution
Partition Strategy
Master
Slave Slave Slave
OR
Node 1   Node 2   Node 3
Partition Strategy
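One way the masterless "Node 1 / Node 2 / Node 3" option could work is hash-based partitioning on a ring. The slides do not say which scheme is meant, so the following is only an illustrative sketch of consistent hashing; the `Ring` class, the `vnodes` parameter, and the key names are assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hash ring: every node owns several points on the ring."""

    def __init__(self, nodes, vnodes=8):
        # Each node is placed at `vnodes` pseudo-random positions.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def owner(self, key):
        """Return the node at the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


ring = Ring(["Node 1", "Node 2", "Node 3"])
placement = {key: ring.owner(key) for key in ("user:1", "user:2", "user:3")}
```

Unlike a master/slave layout, no single node is special here. Any node can work out where a key lives, and adding a fourth node only moves the keys that the new node takes over.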
“data is powerful when
stored and analyzed”
Relational databases are not
bad
Storing
Report on this
vs
A single scalable system
Analyzing
What kind of questions
do you need to ask?
Error Analysis?
As simple as NoSQL + Solr
Patterns?
Machine Learning?
Multi-client writes to NoSQL
& HDFS
OR
NoSQL + ETL process =>
other datastore + Spark
and/or Hadoop M/R
OR
NoSQL & Apache Storm to
Kafka to HDFS
LAMBDA
So we agree.
NoSQL is helpful.
Side Note: NoSQL is a
terrible term
In Review
You can’t analyze what you
don’t have.
And you don’t want an analysis
system to be unreliable.
THE COST OF DOWNTIME
everything works
at small scale
Nothing matters if...
Summary
•  Hadoop: A framework that allows for the distributed processing of large data sets across clusters
•  Spark: A fast, general engine for large-scale data processing
•  Storm: A distributed real-time computation system
•  NoSQL: A collection of highly scalable, highly available systems that make their trade-offs within the CAP theorem
•  Solr: Apache project for indexing text for search
•  Kafka: Distributed, scalable pub/sub messaging queue
Thank You!
Matt Brender
@mjbrender
