Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Framing the Argument: How to Scale Faster with NoSQL
Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
Mission
•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today's innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!
Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
More Than One Way to Skin a Cat
NoSQL engines provide escape hatches
•  Force-fitting all data into relational will fail, because:
Performance is ALWAYS important, now more than ever
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
IBM Cloudant
•  IBM Cloudant offers a non-relational, cloud-based distributed database
•  The product is based on Apache CouchDB and provides data management, search, hosting, admin tools and analytics
•  Cloudant’s database-as-a-service is often used for web or mobile application development
Guest: Ryan Millay
Ryan Millay started with IBM® Cloudant® in May 2014 after three years as a software engineer. Now he is part of the Field Engineering team working on both pre- and post-sales opportunities with a variety of different accounts. He is also a member of the Cloudant Local Services team to help customers scope and install Cloudant’s on-premises software. When not at Cloudant, Ryan enjoys travelling, playing a round of golf, or binging on the latest show on Netflix.
SQL to NoSQL: Top 5 Questions
Mike Broberg
Marketing Communications, Cloudant, IBM Cloud Data Services
Ryan Millay
Field Engineer, Cloudant, IBM Cloud Data Services
Agenda
•  About Cloudant
•  Top 5 Questions When Moving to NoSQL
•  Live Q&A
Housekeeping Notes
•  Today’s webcast is being recorded. We will send you a link to the recording, a link to the library and its code examples, and a copy of the slide deck after the presentation.
•  The webcast recording will be available on our website: https://cloudant.com
•  If you would like to ask a question during today’s presentation, please type in your question using the GoToWebinar tool bar.
1. Why NoSQL?
But, What Is NoSQL, Really?
•  Umbrella term for databases using non-SQL query languages
•  Key-Value stores
•  Wide column stores
•  Document stores
•  Graph stores
•  Some also say "non-relational," because data is not decomposed into separate tables, rows, and columns
•  As we’ll see, it’s still possible to represent relationships in NoSQL
•  The question is, are these relationships always necessary?
Schema Flexibility
•  Cloudant uses JavaScript Object Notation (JSON) as its data format
•  Cloudant is based on Apache CouchDB. In both systems, a "database" is simply a collection of JSON documents
{
  "docs": [
    {
      "_id": "df8cecd9809662d08eb853989a5ca2f2",
      "_rev": "1-8522c9a1d9570566d96b7f7171623270",
      "Movie_runtime": 162,
      "Movie_rating": "PG-13",
      "Person_name": "Zoe Saldana",
      "Actor_actor_id": "0757855",
      "Movie_genre": "AVYS",
      "Movie_name": "Avatar",
      "Actor_movie_id": "0499549",
      "Movie_earnings_rank": "1",
      "Person_pob": "New Jersey, USA",
      "Person_id": "0757855",
      "Movie_id": "0499549",
      "Movie_year": 2009,
      "Person_dob": "1978-06-19"
    }
  ]
}
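To make the schema-flexibility point concrete, here is a minimal Python sketch. It assumes an in-memory list standing in for a database (a real Cloudant database is reached over HTTP), and the "_id" values are hypothetical. It only shows that documents with different fields can sit side by side without a declared schema.

```python
import json

# An in-memory list stands in for a "database" (an assumption for illustration).
database = []


def insert(doc):
    """Store any JSON-serializable document; no schema is declared up front."""
    json.dumps(doc)  # raises TypeError if the document is not JSON-serializable
    database.append(doc)


# Two documents with different sets of fields in the same database.
insert({"_id": "movie:0499549", "Movie_name": "Avatar", "Movie_year": 2009})
insert({"_id": "person:0757855", "Person_name": "Zoe Saldana", "Person_dob": "1978-06-19"})

# Each document carries its own field names.
field_sets = [sorted(doc.keys()) for doc in database]
```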
Horizontal Scaling
•  Many commodity servers vs. few expensive ones
•  Cost grows roughly linearly with performance, not exponentially
Master-Master Replication
•  Or "masterless replica architecture"
•  Minimize latency by putting data close to users
•  Replicate data widely to mitigate disasters
•  Cloudant excels at data movement
2. Rows and Tables Become ... What?
... This!
SQL Terms/Concepts --> Document Store Terms/Concepts
database --> database
table --> bunch of documents
row --> document
column --> field
materialized view --> index/database view/secondary index
primary key --> "_id":
table JOIN operations --> entity relations
Rows --> Documents
•  Use some field to group documents by schema
•  Example: "type":"user" or "type":"edge:follower"
Tables --> Databases
•  Put all tables in one database; use "type": to distinguish
•  Model entity relationships with secondary indexes
•  More on this later in the webinar
•  If you're curious, we're talking about concepts described in the CouchDB documentation on entity relations
•  http://wiki.apache.org/couchdb/EntityRelationship
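The "rows become documents, tables share one database" idea can be sketched in a few lines of Python. The table contents, column names, and the table-to-type mapping below are hypothetical; only the "type" values "user" and "edge:follower" come from the slide.

```python
# Hypothetical relational rows, keyed by table name.
tables = {
    "users": [{"id": 1, "name": "kocolosk"}, {"id": 2, "name": "alice"}],
    "followers": [{"follower_id": 1, "followed_id": 2}],
}

# Assumed mapping from table name to the "type" field used to group documents.
type_for_table = {"users": "user", "followers": "edge:follower"}


def rows_to_documents(tables, type_for_table):
    """Turn every row into a document tagged with a "type" field."""
    docs = []
    for table_name, rows in tables.items():
        for row in rows:
            doc = dict(row)  # copy so the source row is left untouched
            doc["type"] = type_for_table[table_name]
            docs.append(doc)
    return docs


# All former tables now live together in one collection of documents.
documents = rows_to_documents(tables, type_for_table)
```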
Indexes and Queries
•  An "index" in Cloudant is not strictly a performance optimization
•  Instead, more akin to "materialized view" in RDBMS terms
•  Index also called a "database view" in Cloudant
•  Index, then query.
•  You need one before you can do the other
•  Create index, then query by URL
•  Can create a secondary index on any field within a document
•  You get primary index (based on reserved "_id": field) by default
•  Indexes precomputed, updated in real time
•  Performant at big-honkin' scale
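As a rough illustration of "index, then query," the Python sketch below precomputes a sorted list of rows from a map function and then looks keys up by binary search. It is a simplified in-memory model, not Cloudant's view engine: the documents, the `by_city` map function, and the helper names are all hypothetical, and the index here is rebuilt in full rather than updated incrementally.

```python
import bisect


def build_index(docs, map_fn):
    """Precompute sorted (key, doc_id, value) rows, much like a materialized view."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query(index, key):
    """Return every row whose key equals `key`, found by binary search."""
    keys = [row[0] for row in index]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return index[lo:hi]


docs = [
    {"_id": "a", "type": "user", "city": "Boston"},
    {"_id": "b", "type": "user", "city": "Austin"},
    {"_id": "c", "type": "user", "city": "Boston"},
]


def by_city(doc):
    """Hypothetical map function: a secondary index on the "city" field."""
    if doc.get("type") == "user":
        yield doc["city"], None


index = build_index(docs, by_city)  # step 1: create the index
boston = query(index, "Boston")     # step 2: query it
```

The query never scans the documents themselves; it only reads the precomputed rows, which is why the index has to exist first.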
3. Will I Have to Rebuild My App?
Yes
By ripping out the bad parts:
•  Extract, Transform, Load
•  Schema migrations
•  JOINs that don't scale
A little more work up-front, but your application will adapt to scale much better
4. So Each of My Tables Becomes a Different Type of JSON Document?
No
•  Fancy explanation:
•  Best practice is to denormalize data rather than keep it in 3rd normal form
•  Or, less fancy:
•  Smoosh relationships for each entry all together into one JSON doc
•  Denormalization
•  Approach to data modeling that shards well and scales well
•  Works well with data that is somewhat static, or infrequently updated
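A minimal Python sketch of that "smooshing" step, assuming three hypothetical normalized tables (movies, people, and a roles link table). The output is one self-contained JSON-style document per movie, with the cast embedded instead of joined at query time.

```python
# Hypothetical normalized tables.
movies = [{"movie_id": "0499549", "name": "Avatar", "year": 2009}]
people = [{"person_id": "0757855", "name": "Zoe Saldana"}]
roles = [{"movie_id": "0499549", "person_id": "0757855"}]


def denormalize(movies, people, roles):
    """Fold each movie's cast into the movie document itself."""
    people_by_id = {person["person_id"]: person for person in people}
    docs = []
    for movie in movies:
        cast = [
            people_by_id[role["person_id"]]["name"]
            for role in roles
            if role["movie_id"] == movie["movie_id"]
        ]
        docs.append({
            "_id": "movie:" + movie["movie_id"],  # hypothetical _id scheme
            "type": "movie",
            "name": movie["name"],
            "year": movie["year"],
            "cast": cast,
        })
    return docs


movie_docs = denormalize(movies, people, roles)
```

The trade-off is the one named on the slide: the embedded cast list is cheap to read but has to be rewritten if a person's name changes, so this fits data that is rarely updated.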
Static Data Example: TV Cast Members
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
What Doesn't Scale
•  RDBMS JOINs across shards
•  Presumably across different machines
•  Common pain point when scaling RDBMS
What Does Scale
•  Denormalized data models + modern distributed systems
•  More efficient to distribute data if it's already in one compact unit
5. But What if I Need Relationships? Can Cloudant Do JOINs?
Yes ... But First, Don't Do This
Relationships as single documents
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Some "Key" Concepts
•  Inject logic into "_id": field to enforce uniqueness
•  Example: "_id":"<course>-<student>" ensures at most one document per course per student
•  Give your documents a "type": field
•  Add relations as separate "edge" documents
•  Exploit powerful materialized view engine
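The "_id" trick can be sketched with a tiny in-memory store in Python. The `Store` class, `ConflictError`, and the course and student names are hypothetical stand-ins; the behavior modeled is that creating a second document with an "_id" that already exists is rejected as a conflict.

```python
class ConflictError(Exception):
    """Raised when a document with the same _id already exists."""


class Store:
    """Hypothetical in-memory stand-in for a document database."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        if doc["_id"] in self._docs:
            raise ConflictError(doc["_id"])
        self._docs[doc["_id"]] = doc

    def __len__(self):
        return len(self._docs)


def enrollment_id(course, student):
    """Build the "<course>-<student>" _id described on the slide."""
    return f"{course}-{student}"


store = Store()
store.create({"_id": enrollment_id("math101", "alice"), "type": "enrollment"})

try:
    # Same course and student, so the same _id: the duplicate is rejected.
    store.create({"_id": enrollment_id("math101", "alice"), "type": "enrollment"})
    duplicate_rejected = False
except ConflictError:
    duplicate_rejected = True
```

One design caveat: if course or student identifiers can themselves contain "-", a different separator is needed to keep the combined "_id" unambiguous.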
Preview: Defining an Index/View
•  This design document (built in Cloudant Web dashboard) encapsulates everything that follows
•  It builds our secondary index/database view, which we will soon query
•  It's the incremental MapReduce view engine we cited earlier
•  https://webinar.cloudant.com/relational/_design/join
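The design document itself is not reproduced in this transcript, so the following is only a guess at its general shape, built in Python. The names "join" and "follows" come from the URLs in the deck, and the "_design/..." id with a "views" entry holding a JavaScript "map" string follows the CouchDB design document layout. The map function body and the field names "follower" and "followed" are assumptions.

```python
import json

# Hypothetical map function, written in JavaScript as CouchDB-style views expect.
# The fields "follower" and "followed" are assumed, not taken from the webinar.
MAP_FOLLOWS = """
function (doc) {
  if (doc.type === "user") {
    emit([doc._id], null);
  } else if (doc.type === "edge:follower") {
    emit([doc.follower, doc.followed], {"_id": doc.followed});
  }
}
""".strip()

# A design document is an ordinary JSON document whose _id starts with "_design/".
design_doc = {
    "_id": "_design/join",
    "views": {"follows": {"map": MAP_FOLLOWS}},
}

# Serialized form, as it would be stored in the database.
body = json.dumps(design_doc, indent=2)
```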
Sample Related Data: Twitter
User documents flexible & straightforward
How Do We Deal With Followers?
a.  Update each user document with a list
b.  Create relation documents and "join"
E.g., Follower Graph
Relationships as Documents
Goal: Materialize Users & Following List
"join" by selecting rows at lines 103–105
Index Sorting Rules
http://wiki.apache.org/couchdb/View_collation
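The sorting rules matter because the later queries rely on an array key such as ["user:kocolosk"] sorting before any longer array with the same first element, and on an empty object {} sorting after every string. The Python sketch below is a simplified model of that ordering (null, then false, true, numbers, strings, arrays, objects). It compares strings by code point, whereas CouchDB uses ICU collation, so treat it as an approximation; the sample keys are hypothetical.

```python
def collation_key(value):
    """Map a JSON value to a tuple that sorts in (simplified) view-collation order."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        return (4, value)  # simplification: code-point order, not ICU collation
    if isinstance(value, list):
        # Arrays compare element by element; a shorter prefix sorts first.
        return (5, [collation_key(item) for item in value])
    if isinstance(value, dict):
        return (6, [(name, collation_key(item)) for name, item in value.items()])
    raise TypeError(f"not a JSON value: {value!r}")


keys = [
    ["user:kocolosk", {}],
    ["user:zed"],
    ["user:kocolosk", "user:alice"],
    ["user:kocolosk"],
]

ordered = sorted(keys, key=collation_key)
```

With this ordering, the range from ["user:kocolosk"] to ["user:kocolosk", {}] brackets every key that starts with "user:kocolosk" and nothing else.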
Materialize Users, With All Followed
Let's Query That View
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]
[Figure: query result, annotated with the system-generated unique doc "_id":, the sort key, and a pointer to the related followed user's doc "_id":]
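When such a query is issued from code rather than pasted into a browser, the key parameters have to be JSON-encoded and then URL-encoded. A minimal Python sketch using only the standard library follows; the helper name `view_url` is hypothetical, and the database URL is the one shown in the deck.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def view_url(database_url, design_doc, view, **params):
    """Build a view query URL; every parameter value is JSON-encoded first."""
    encoded = {
        name: json.dumps(value, separators=(",", ":"))
        for name, value in params.items()
    }
    return f"{database_url}/_design/{design_doc}/_view/{view}?{urlencode(encoded)}"


url = view_url(
    "https://webinar.cloudant.com/relational",
    "join",
    "follows",
    startkey=["user:kocolosk"],
    endkey=["user:kocolosk", {}],
)

# Round trip: decoding the query string recovers the original JSON values.
decoded = {
    name: json.loads(values[0])
    for name, values in parse_qs(urlsplit(url).query).items()
}
```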
Let's Query That View, and Follow Pointers
https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
Wait. What Did We Get?
•  kocolosk’s USER document
•  list of all USERs kocolosk FOLLOWS
•  full USER document for all USERs that kocolosk FOLLOWS
•  In a fast, single query
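How a single query can return both the user and the followed users' documents is easier to see in a small simulation. The Python sketch below models the idea in memory: edge documents emit a value containing the "_id" of the followed user, and with include_docs the row's document is fetched through that pointer instead of the edge itself. The documents, the field names "follower" and "followed", and the function names are hypothetical.

```python
# Hypothetical database contents: users plus "edge" documents for follows.
docs = {
    "user:kocolosk": {"_id": "user:kocolosk", "type": "user"},
    "user:alice": {"_id": "user:alice", "type": "user"},
    "user:bob": {"_id": "user:bob", "type": "user"},
    "edge:1": {"_id": "edge:1", "type": "edge:follower",
               "follower": "user:kocolosk", "followed": "user:alice"},
    "edge:2": {"_id": "edge:2", "type": "edge:follower",
               "follower": "user:bob", "followed": "user:kocolosk"},
}


def map_follows(doc):
    """Assumed map function: users and their outgoing edges share a key prefix."""
    if doc["type"] == "user":
        yield [doc["_id"]], None
    elif doc["type"] == "edge:follower":
        yield [doc["follower"], doc["followed"]], {"_id": doc["followed"]}


def query_follows(docs, user_id, include_docs=False):
    """Return rows whose key starts with user_id, optionally following pointers."""
    rows = []
    for doc in docs.values():
        for key, value in map_follows(doc):
            if key[0] == user_id:  # stands in for the startkey/endkey range
                rows.append({"id": doc["_id"], "key": key, "value": value})
    rows.sort(key=lambda row: row["key"])
    if include_docs:
        for row in rows:
            # A value with an "_id" points at another document to include.
            target = row["value"]["_id"] if row["value"] else row["id"]
            row["doc"] = docs[target]
    return rows


result = query_follows(docs, "user:kocolosk", include_docs=True)
```

The first row carries kocolosk's own user document, and each following row carries the full document of a user kocolosk follows, which matches the three items listed on the slide.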
Legal Slide #1
© "Apache", "CouchDB", "Apache CouchDB", "Apache Lucene," "Lucene", and the CouchDB logo are trademarks or registered
trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
Legal Slide #2
© Copyright IBM Corporation 2015.
IBM and the IBM Cloudant logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml
Thank You
@cloudant
mbroberg@us.ibm.com
rmillay@us.ibm.com
Perceptions & Questions
Analyst:
Robin Bloor
Robin Bloor, PhD
Database is Being Disrupted
u  Data volumes
u  Speed of arrival
u  Content data (JSON)
u  IOT data
u  Cloud deployment
u  Schema on read
u  Memory for disk
u  Analytic workloads
THIS IS A PERFECT
STORM OF A KIND
What Is a Database?
A database is software that presides over a heap of data that:
•  Implements a data model
•  Manages multiple concurrent requests for data
•  Implements a security model
•  Is ACID compliant (?)
•  Is resilient
RDBMS
Databases that:
u  Assume you can represent all data in related
tables
u  Assume that you want to process data in a set-wise
manner
u  Can be used for many problems
u  Are absolutely not universal, hence:
•  The Null kluge
•  The impedance mismatch
•  BLOBS
•  OR Databases
Another Couple of Issues…
Programmers prefer JSON
The SEMANTICS of data
u  It is already beginning to look as though
graph databases are a separate category of
engine
u  The triple store tactic (representing data in
triples) is required for semantics, otherwise
meaning is limited
Data Access
In reality there is no DATA ACCESS STANDARD
There are several different approaches according to the data model
u  How much evangelizing of JSON do you find it
necessary to do?
u  How swiftly do SQL developers adjust to JSON?
u  JOINs are performance hogs in all database
systems. Please explain why you think they are
more economic with Cloudant.
u  Does Cloudant scale better than, say, a column
store SQL model?
u  Can you explain the tuning and other DBA
activities with Cloudant?
u  Is recovery the same as with RDBMS?
u  What is the database size of your largest
customer (users, data volume)?
Twitter Tag: #briefr The Briefing Room
Twitter Tag: #briefr The Briefing Room
Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons
