NoSQL Data Modeling Using JSON Documents –
A Practical Approach
David Segleau
Dir. Technical Product Marketing
Couchbase
©2016 Couchbase Inc. 2
About the speaker – David Segleau
David Segleau
Director, Technical Product Marketing
Couchbase (since Nov 2015)
Experience:
- Database guy
- Couchbase, Oracle, Sleepycat, Informix, Illustra, Teradata
- Tech Marketing, VP Eng, Prod Mgmt, QA, Support, Training, Docs
- Technology is only useful when it’s deployed
- Expertise:
- Database server technology, RDBMS, and NoSQL
Today’s agenda
§ What is Couchbase?
§ Why NoSQL?
§ Identifying the right application
§ Modeling your data
§ Accessing your data
§ Migrating your data
§ Q & A
What is Couchbase?
Couchbase delivers the Data Platform for the Digital Economy
• Products: Couchbase Server & Couchbase Mobile
• Open source NoSQL, JSON document database
• Founded 2010
• 500+ enterprise customers, including 20+ Fortune 100
UNIFIED ADMINISTRATION
UNIFIED PROGRAMMING INTERFACE
Data, Query, Index, Search, Mobile, Replication, Analytics
{N1QL}
Who is using Couchbase?
• 6 of the top 10 e-commerce companies in the US
• 3 of the 3 GDS companies
• 3 of the 10 airlines
• 6 of the top 10 US & European broadcast companies
• 6 of the top 10 online casino gaming companies
• 6 of the top 10 financial services companies in the US
Who is using Couchbase?
§ Gannett, publisher of 90+ media properties, replaced relational database
technology with NoSQL to power its digital publishing platform.
§ eBay, with over 2 billion page views per day, uses Couchbase + RDBMS for
their Listing cache, and Couchbase as database of record for Token
management.
§ Cars.com, with over 30 million visits per month, replaced SQL Server with
NoSQL to store customer and vehicle data.
§ Marriott deployed NoSQL to modernize its hotel reservation system that
supports $38 billion in annual bookings.
§ Equifax uses Couchbase to generate insights from historic credit data,
leveraging the JSON documents to represent complex data objects without
normalization.
What is NoSQL?
§ No SQL?
§ Not only SQL?
✔ Non-relational
§ Distributed (most)
– Scaled out, not up
• Elasticity and commodity hardware
– Partitioned and replicated
• Scalability, performance, availability
§ Schema-less (most)
– Flexible model
– JSON (some)
§ Multi-model
– Key-value & Document
– Columnar & Graph
– Graph & Key-value
Why are they using NoSQL?
Technology Drivers
§ Customers are going online
§ The internet is connecting everything
§ Big Data is getting bigger
§ Applications are moving to the cloud
§ The world has gone mobile
Technical Needs
§ Develop with agility
– Flexibility + Simplicity
– Easier + Faster
§ Operate at any scale
– Elasticity + Availability
– Performance at scale
– Always-on, global deployment
Business Needs
§ Innovate and compete
– Faster time to market
– Reduced costs (operational + hardware)
– Increased revenue
NoSQL vs. RDBMS
§ Replace or Complement? → It depends
– Replace: NoSQL is often the operational
database of record
– Complement: NoSQL adds perf, scale, and
availability to legacy RDBMS
§ Most customers use RDBMS and NoSQL
§ NoSQL is adding RDBMS features
– Security, Query Language, Analytics
§ RDBMS is adding NoSQL features
– Sharding, JSON, Distributed Processing
Why migrate from an RDBMS to NoSQL?
§ Easier to scale
3 nodes to 100s, 1 data center to many, commodity hardware
§ Better performance
Integrated caching, memory-optimized indexes, memory-based replication
§ Up to 40x lower cost
Open source, subscription-based, per instance (not per core)
§ Greater agility
JSON-based data model, SQL-based query language
§ Cross-platform
Runs on Windows or Linux (Red Hat, Ubuntu, Debian, etc.)
How do you get started?
1. Identify the right application
2. Model your data
3. Access your data
4. Migrate your data
5. Q&A
Identifying the right application
Identifying the right application
Have one or more of the following characteristics or requirements:
✔ Innovate and iterate faster
✔ Send and receive JSON
✔ Provide low latency at any throughput
✔ Support many concurrent users
✔ Support users anywhere and everywhere
✔ Be available 24x7
✔ Store terabytes of data
✔ Read and write to multiple data centers
[Diagram: an Application made up of Services, backed by RDBMS and NoSQL]
Examples:
• High performance, high availability caching service
• Independent application with a narrow scope
• Logical or physical service within a large application
• Global service that powers multiple applications
Model your data
Demystifying terminology
Relational              NoSQL (Couchbase)
Failover Cluster        Cluster
Availability Group      Cluster
Database                Bucket
Table                   Bucket
Row (Tuple)             Document (JSON)
Primary Key             Object ID
IDENTITY or Sequence    Counter
Indexed View            View
SQL                     N1QL
Data Modeling Approaches
Relational: Required Normalization
– Schema enforced by DB
– Same fields in all records
• Minimize data inconsistencies (one item = one location)
• Reduce duplicated data
• Preserve storage resources
NoSQL: Relaxed Normalization
– Schema implied by structure
– Fields may be empty, duplicate, or missing
• Optimized based on access patterns
• Flexible, based on application requirements
• Supports clustered architecture
• Reduced server overhead
What and Why JSON?
• What is JSON?
– Schema flexibility
– Lightweight data interchange format
– Based on JavaScript
– Programming language independent
– Field names must be unique
• Why JSON?
– Less verbose
– Can represent Objects and Arrays
(including nested documents)
No impedance mismatch between a JSON Document and a Java Object
Modeling your data: Fixed vs. self-describing schema
Modeling your data: The flexibility of JSON
Same document type,
Different fields
• Different types
• Optional
• On demand
Tip: Add a version field to track changes.
{"docType": "user", "docVersion": "1", …}
{"docType": "user", "docVersion": "2", …}
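One way the version field can be used is to upgrade older documents when they are read. The sketch below is only an illustration: the class name `UserDocUpgrader` and the assumed schema change (a single `name` field in version 1 split into `firstName` and `lastName` in version 2) are hypothetical and not taken from the slides, and documents are modeled as plain Java maps rather than SDK types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: upgrades a "user" document from docVersion "1" to "2" when it is read.
final class UserDocUpgrader {
    // Assumed schema change, for illustration only: version 1 stores a single
    // "name" field; version 2 splits it into "firstName" and "lastName".
    static Map<String, Object> upgrade(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc); // leave the caller's copy untouched
        if ("1".equals(out.get("docVersion"))) {
            Object raw = out.remove("name");
            String name = raw == null ? "" : raw.toString().trim();
            int space = name.indexOf(' ');
            out.put("firstName", space < 0 ? name : name.substring(0, space));
            out.put("lastName", space < 0 ? "" : name.substring(space + 1));
            out.put("docVersion", "2");
        }
        return out; // version 2 (or unknown) documents pass through unchanged
    }
}
```

Because old and new versions can sit side by side in the same bucket, the application can upgrade documents lazily instead of running a one-time schema migration.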
Modeling your data: Changing the data model
Relational database
• Modify the database schema
• Modify the application code (e.g., Java)
• Modify the interface (e.g., HTML5/JS)
Document database
• Modify the interface (e.g., HTML5/JS)
Modeling your data: Object IDs
Best Practices
• Natural Keys
• Human Readable
• Deterministic
• Semantic
Examples
• author::shane
• author::shane::blogs
• blog::nosql_fueled_hadoop
• blog::nosql_fueled_hadoop::comments
What about identity columns?
Document<Long> nextAuthorIdDoc = bucket.counter("authorIdCounter", 1);
Long nextAuthorId = nextAuthorIdDoc.content();
String authDocId = "author::" + nextAuthorId; // author::101
Tip: Reserve IDs in blocks by incrementing the counter by 10, 20, etc., instead of incrementing it on every insert.
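One reading of this tip is block allocation: reserve a range of IDs with a single counter increment, then hand them out locally. The sketch below assumes that reading. `BlockIdAllocator` is a hypothetical name, and an in-memory `AtomicLong` stands in for the shared counter document so the example runs without a cluster.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: hands out author keys from locally reserved blocks of IDs.
final class BlockIdAllocator {
    private final AtomicLong sharedCounter; // in-memory stand-in for the "authorIdCounter" document
    private final long blockSize;
    private long next = 1;  // next ID to hand out locally
    private long limit = 0; // last ID of the reserved block (inclusive); 0 forces a reservation

    BlockIdAllocator(AtomicLong sharedCounter, long blockSize) {
        this.sharedCounter = sharedCounter;
        this.blockSize = blockSize;
    }

    synchronized String nextAuthorKey() {
        if (next > limit) {
            // One increment of the shared counter reserves a whole block of IDs.
            limit = sharedCounter.addAndGet(blockSize);
            next = limit - blockSize + 1;
        }
        return "author::" + next++;
    }
}
```

The trade-off is that IDs reserved by a process that stops early are never used, so the key sequence can have gaps.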
Modeling your data: Relationships
Bottom up / "BelongsTo": Author ← Blog (FK), Blog (FK) ← Comment (FK), Comment (FK)
Top down / "Has": Author (FK x2) → Blog (FK x2), Blog (FK x2) → Comment, Comment
Modeling your data: Relationships - Related or Nested
Modeling your data: Strategies and best practices
If …                                                 Then …
Relationship is one-to-one or one-to-many            Store related data as nested objects
Relationship is many-to-one or many-to-many          Store related data as separate documents
Data reads are mostly parent fields                  Store children as separate documents
Data reads are mostly parent + child fields          Store children as nested objects
Data writes are mostly parent or child (not both)    Store children as separate documents
Data writes are mostly parent and child (both)       Store children as nested objects
Modeling your data: Strategies and best practices
§ Are there a lot of concurrent writes, continuous updates?
  – Store children as separate documents
Blog
§ Thread
  – Comment
  – Comment
§ Thread
  – Comment
  – Comment
Blog
{
  "docType": "blog",
  "author": "author::shane",
  "title": "Couchbase Wins",
  "threads": [
    "blog::couchbase_wins::threads::001",
    "blog::couchbase_wins::threads::002"
  ]
}
Thread
{
  "docType": "thread",
  "comments": [
    {
      "visitor": "Brendan Bond",
      "text": "This blog is amazing!",
      "replies": [
        {
          "user": "Dustin Johnson",
          "text": "No, it is not."
        }
      ]
    }
  ]
}
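With this layout the blog document only stores thread keys, so the application follows those keys to load the threads. The sketch below shows that lookup under simplifying assumptions: the bucket is modeled as an in-memory map from document key to document, and `BlogLoader` is a hypothetical helper rather than part of any Couchbase SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolves the thread documents referenced by a blog document.
final class BlogLoader {
    // "bucket" is an in-memory stand-in for key-value lookups by document key.
    static List<Map<String, Object>> loadThreads(Map<String, Map<String, Object>> bucket,
                                                 String blogKey) {
        List<Map<String, Object>> threads = new ArrayList<>();
        Map<String, Object> blog = bucket.get(blogKey);
        if (blog == null) {
            return threads; // unknown blog: nothing to resolve
        }
        Object refs = blog.get("threads");
        if (refs instanceof List) {
            for (Object key : (List<?>) refs) {
                Map<String, Object> thread = bucket.get(String.valueOf(key));
                if (thread != null) {
                    threads.add(thread); // dangling references are skipped
                }
            }
        }
        return threads;
    }
}
```

Each thread can then be updated on its own, which is the point of the separate-document design when comments arrive concurrently.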
Some JSON Design Choices
• Couchbase Server neither enforces nor validates any particular document
structure
• Choices that impact JSON document design:
– Single Root Attributes vs. Document type
– Objects vs. Arrays
– Array Element Types
– Timestamp Formats
– Property Names
– Empty and Null Property Values vs. Missing Properties
– JSON Schema Options
• See "Agile document modeling and data structures" from Couchbase
Connect16 On-Demand Recordings
Access your data
Accessing your data: Options
Key-Value (CRUD): Documents
N1QL (Query): Indexes
Views (Query): MapReduce
Full Text (Search): Indexes
Geospatial (Search): MapReduce
We’ll focus on N1QL for now.
Accessing your data – N1QL queries: Capabilities
Feature SQL N1QL
JOIN ✔ ✔
TRANSFORM ✔ ✔
FILTER ✔ ✔
AGGREGATE ✔ ✔
SORT ✔ ✔
SUBQUERIES ✔ ✔
PAGINATION ✔ ✔
OPERATORS ✔ ✔
FUNCTIONS ✔ ✔
Accessing your data: N1QL queries – referenced data
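The query shown on this slide is not part of the transcript. As a rough illustration of querying referenced data, the sketch below builds the text of a N1QL lookup join that follows the author key stored in each blog document. The class name, the bucket name parameter, and the `author.name` field are assumptions; the statement is only assembled as a string and is not run against a cluster here.

```java
// Hypothetical sketch: builds the text of a N1QL lookup join over referenced documents.
final class ReferencedDataQuery {
    // For each blog document, fetch the author document whose key is stored in blog.author.
    // "name" is an assumed field on the author document.
    static String blogsWithAuthors(String bucketName) {
        String b = "`" + bucketName + "`"; // backticks escape the bucket name
        return "SELECT blog.title, author.name"
             + " FROM " + b + " AS blog"
             + " JOIN " + b + " AS author ON KEYS blog.author"
             + " WHERE blog.docType = \"blog\"";
    }
}
```

`ON KEYS` joins on document keys, which is why storing the full key (for example `author::shane`) in the referencing document matters.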
Accessing your data: N1QL queries – nested data
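The slide's query is likewise missing from the transcript. A minimal sketch of querying nested data, assuming thread documents shaped like the earlier blog/thread example, is to flatten the nested `comments` array with `UNNEST`. The class name and bucket parameter are hypothetical, and the statement is only built as text.

```java
// Hypothetical sketch: builds the text of a N1QL query over a nested array.
final class NestedDataQuery {
    // UNNEST produces one result row per element of the nested "comments" array.
    static String commentsInThreads(String bucketName) {
        return "SELECT c.visitor, c.`text`" // backticks guard against reserved words
             + " FROM `" + bucketName + "` AS t"
             + " UNNEST t.comments AS c"
             + " WHERE t.docType = \"thread\"";
    }
}
```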
Accessing your data: N1QL queries – CRUD
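The statements from this slide are not in the transcript either. The sketch below assembles one N1QL statement per CRUD operation, addressed by document key. `CrudStatements` and its parameters are hypothetical; values are concatenated only to keep the example short, and real code should prefer parameterized queries.

```java
// Hypothetical sketch: N1QL statement text for create, read, update and delete by key.
final class CrudStatements {
    private static String quote(String s) {
        return "\"" + s + "\""; // naive quoting, for illustration only
    }

    static String insert(String bucket, String key, String json) {
        return "INSERT INTO `" + bucket + "` (KEY, VALUE) VALUES (" + quote(key) + ", " + json + ")";
    }

    static String select(String bucket, String key) {
        return "SELECT * FROM `" + bucket + "` USE KEYS " + quote(key);
    }

    static String updateTitle(String bucket, String key, String title) {
        return "UPDATE `" + bucket + "` USE KEYS " + quote(key) + " SET title = " + quote(title);
    }

    static String delete(String bucket, String key) {
        return "DELETE FROM `" + bucket + "` USE KEYS " + quote(key);
    }
}
```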
Accessing your data: N1QL queries – indexes
Simple
Compound
Functional
Partial
Couchbase Index Options
#  Index Type              Description
1  Primary Index           Index on the document key on the whole bucket
2  Simple Index            Index on the key-value or document-key
3  Composite Index         Index on more than one key-value
4  Functional Index        Index on function or expression on key-values
5  Partial Index           Index subset of items in the bucket; uses WHERE clause
6  Array Index             Index individual elements of the arrays
7  Memory Optimized Index  Index that is pinned in memory; defined when the cluster is configured
8  Covering Index          Index that lets a query be resolved 100% within the index
9  Duplicate Index         Ability to create a copy of the index on specific nodes within the cluster,
                           thereby providing load balancing and failover; uses WITH { "nodes": } clause
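To make several of these index types concrete, the sketch below collects example `CREATE INDEX` statement text for the blog documents used earlier. The index names, field choices, and the `IndexExamples` class are illustrative assumptions, and the statements are only built as strings rather than executed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: example N1QL index definitions keyed by index type.
final class IndexExamples {
    static Map<String, String> statements(String bucket) {
        String b = "`" + bucket + "`";
        Map<String, String> ddl = new LinkedHashMap<>();
        ddl.put("primary", "CREATE PRIMARY INDEX ON " + b);
        ddl.put("simple", "CREATE INDEX idx_author ON " + b + "(author)");
        ddl.put("composite", "CREATE INDEX idx_type_author ON " + b + "(docType, author)");
        ddl.put("functional", "CREATE INDEX idx_title_lower ON " + b + "(LOWER(title))");
        // Partial index: only documents matching the WHERE clause are indexed.
        ddl.put("partial", "CREATE INDEX idx_blog_title ON " + b + "(title) WHERE docType = \"blog\"");
        // Array index: one index entry per distinct element of the "threads" array.
        ddl.put("array", "CREATE INDEX idx_threads ON " + b + "(DISTINCT ARRAY t FOR t IN threads END)");
        return ddl;
    }
}
```

A covering index is not a separate statement: a query such as `SELECT author FROM ... WHERE docType = "blog"` can be answered from `idx_type_author` alone because every field it touches is in the index.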
Accessing your data: Indexing Considerations
Relational vs. Couchbase
• Relational: Indexes are synchronous, index & data are in sync
  Couchbase: Indexes are asynchronous, index updates lag behind the data, application specifies read consistency
• Relational: Indexes slow down write operations
  Couchbase: Indexes do not affect write throughput
• Relational: Index load balancing for queries can only be implemented in the application
  Couchbase: Index load balancing for queries is automatic, based on index signature
• Relational: Indexes contend with other memory usage
  Couchbase: Memory Optimized indexes are pinned in memory and provide low-latency, high mutation throughput
Understanding your Query Plan: Explain
§ EXPLAIN shows the query plan, i.e., the exact steps N1QL
plans to take to execute the query
cbq> EXPLAIN INSERT INTO default VALUES ("1", { "make" : "Toyota"});
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "ValueScan",
"values": "[[\"1\", {\"make\": \"Toyota\"}]]"
},
{
"#operator": "Parallel",
"maxParallelism": 1,
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "SendInsert",
Accessing your data: Strategies and best practices
Concept, followed by Strategies & Best Practices
Key-Value operations provide the best possible performance
• Create an effective key naming strategy
• Create an optimized data model
Incremental MapReduce (Views) are well suited to aggregation
• Ideal for large data sets
• Data set can be used to create complex view indexes
N1QL queries provide the most flexibility – everything else
• Query data regardless of how it is modeled
• Remember to create secondary indexes, leverage covering indexes where possible
Migrate your data
So many options! Remember the KISS principle
1) Identify the requirements
• ETL vs. Data cleanse vs. Data enrichment
• Duration vs. Resources
• Data governance
2) Pick your strategy
• Batch vs. Incremental
• Single-threaded vs. multi-threaded
3) Pick your tools
• Data migration tools (Informatica, Looker, Talend)
• BYO-tool (PHP & Python scripts, Hadoop, Spark)
• KISS with Couchbase
  – Export to CSV; Import as documents; Use N1QL to transform & insert into new bucket
  – Use SQL to transform & export; Insert into Couchbase
• Best Practices
  – Align with your data model
  – Plan for failure (bad source data, hardware failure, resource limitations)
  – Ensure the migration is interruptible, restartable, logged, and predictable
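The best practices above can be sketched as a small batch loop that records a checkpoint and sets bad rows aside instead of failing. Everything in the sketch is an assumption made for illustration: the two-column `id,name` export format, the `CsvMigrator` class, and the in-memory map standing in for the target bucket.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: restartable, batch-wise migration of exported rows into documents.
final class CsvMigrator {
    // Turns one exported row ("id,name") into a keyed document and stores it.
    // Returns false for rows that cannot be parsed so the caller can log them.
    static boolean migrateRow(String row, Map<String, Map<String, Object>> target) {
        String[] cols = row.split(",", -1);
        if (cols.length != 2 || cols[0].trim().isEmpty()) {
            return false;
        }
        Map<String, Object> doc = new HashMap<>();
        doc.put("docType", "author");
        doc.put("docVersion", "1");
        doc.put("name", cols[1].trim());
        target.put("author::" + cols[0].trim(), doc); // same key on re-run, so retries overwrite
        return true;
    }

    // Processes rows [checkpoint, checkpoint + batchSize) and returns the new checkpoint,
    // so an interrupted run can resume where it stopped.
    static int migrateBatch(List<String> rows, int checkpoint, int batchSize,
                            Map<String, Map<String, Object>> target, List<String> rejected) {
        int end = Math.min(rows.size(), checkpoint + batchSize);
        for (int i = checkpoint; i < end; i++) {
            if (!migrateRow(rows.get(i), target)) {
                rejected.add(rows.get(i)); // bad source data is recorded, not fatal
            }
        }
        return end;
    }
}
```

Because document keys are derived deterministically from the source IDs, replaying a batch after a failure rewrites the same documents rather than creating duplicates.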
How can you sync NoSQL and relational?
§ 1. Application Code (Manual)
§ 2. Replication (Automatic)
– From NoSQL to relational: Couchbase → DCP Stream → Producer → Kafka Queue → Consumer → RDBMS
– From relational to NoSQL: RDBMS → GoldenGate → Handler → Couchbase
https://github.com/mahurtado/CouchbaseGoldenGateAdapter
Data Modeling Best Practices Recap
• Pick the right application
• Focus on SOA, application/use case specific
• Drive data model from data access patterns
• Use document type and version ID fields
• Create optimized, understandable keys
• Weigh nested, referenced or mixed designs
• Add indexes: Simple, Compound, Functional, Partial, Array, Covering, Memory Optimized
• Match the data access method to requirements
• N1QL, Key-value, Views
• Proof of Concept
• Focus, Success Criteria, Review Architecture
Questions?
Want to learn more?
Getting Started guide:
http://www.couchbase.com/get-started-developing-nosql
Download Couchbase software:
http://www.couchbase.com/nosql-databases/downloads
Free Online Training
http://training.couchbase.com/online
“Why NoSQL” white paper
http://www.couchbase.com/nosql-resources/why-nosql
Additional Resources
§ General Docs: http://docs.couchbase.com
§ Developer Portal: http://developer.couchbase.com
§ Couchbase Labs: https://github.com/couchbaselabs
§ Query Portal: http://query.couchbase.com
§ Sample Applications:
  – https://github.com/couchbaselabs?utf8=%E2%9C%93&query=try
  – https://github.com/couchbaselabs?utf8=%E2%9C%93&query=beer
§ Blog: http://blog.couchbase.com
§ Forum: http://forums.couchbase.com
Additional Resources – Data Modeling
Webinar: The Why, When, and How of NoSQL: A Practical Approach
Webinar: Relational to NoSQL: How to Get Started from SQL Server
Presentation: Data Modeling with Couchbase Server
Connect16 On Demand Recordings
• Agile document modeling and data structures
• Migrating from relational – Data modeling and access
• LINQing to data: Easing the transition from SQL
• Tuning for Performance: Indexes and Queries
Documentation: Data Modeling with JSON
Training class: CD210 Couchbase NoSQL Data Modeling, Querying, and Tuning Using N1QL
Thank you

More Related Content

PPT
9. Document Oriented Databases
PPTX
SpringBoot with MyBatis, Flyway, QueryDSL
PPTX
Angular 6 Form Validation with Material
PPTX
Applying Domain-Driven Design to craft Rich Domain Models
PDF
Ml ops on AWS
PPSX
Domain Driven Design
PDF
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
PPTX
Domain Driven Design - Strategic Patterns and Microservices
9. Document Oriented Databases
SpringBoot with MyBatis, Flyway, QueryDSL
Angular 6 Form Validation with Material
Applying Domain-Driven Design to craft Rich Domain Models
Ml ops on AWS
Domain Driven Design
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
Domain Driven Design - Strategic Patterns and Microservices

What's hot (20)

PPTX
Introduction to NodeJS
PDF
Domain Driven Design
PDF
Laravel Introduction
ZIP
NoSQL databases
PPTX
Introducing MongoDB Atlas
PDF
How to govern and secure a Data Mesh?
PPTX
Introduction to NoSQL Databases
PDF
D2 domain driven-design
PDF
AWS glue technical enablement training
KEY
Solid principles
PPTX
Introduction to MongoDB
PDF
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
PDF
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
PPTX
LINQ in C#
PPTX
Mongodb basics and architecture
PPTX
Indexing with MongoDB
PPT
Domain Driven Design (DDD)
PPTX
The Semantic Knowledge Graph
PDF
Node.js Tutorial for Beginners | Node.js Web Application Tutorial | Node.js T...
PPTX
Web tier-framework-mvc
Introduction to NodeJS
Domain Driven Design
Laravel Introduction
NoSQL databases
Introducing MongoDB Atlas
How to govern and secure a Data Mesh?
Introduction to NoSQL Databases
D2 domain driven-design
AWS glue technical enablement training
Solid principles
Introduction to MongoDB
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
LINQ in C#
Mongodb basics and architecture
Indexing with MongoDB
Domain Driven Design (DDD)
The Semantic Knowledge Graph
Node.js Tutorial for Beginners | Node.js Web Application Tutorial | Node.js T...
Web tier-framework-mvc
Ad

Similar to Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach (20)

PDF
The Why, When, and How of NoSQL - A Practical Approach
PDF
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
PDF
NoSQL on ACID - Meet Unstructured Postgres
 
PDF
NoSQL Simplified: Schema vs. Schema-less
PPTX
PPTX
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
PDF
Couchbase Mobile on Android
PDF
Manuel Hurtado. Couchbase paradigma4oct
PPTX
NoSQL and MongoDB Introdction
PDF
Agile Document Models & Data Structures
PPTX
Introduction to NoSQL and MongoDB
PDF
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
PPTX
Architecting Your First Big Data Implementation
PDF
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
PPTX
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
PPTX
I Have a NoSQL toaster - DC - August 2017
PPTX
Why no sql ? Why Couchbase ?
PPTX
GraphTalks Rome - Selecting the right Technology
PPTX
MongoDB Days UK: Building an Enterprise Data Fabric at Royal Bank of Scotland...
PPTX
Relational databases vs Non-relational databases
The Why, When, and How of NoSQL - A Practical Approach
Migration and Coexistence between Relational and NoSQL Databases by Manuel H...
NoSQL on ACID - Meet Unstructured Postgres
 
NoSQL Simplified: Schema vs. Schema-less
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Couchbase Mobile on Android
Manuel Hurtado. Couchbase paradigma4oct
NoSQL and MongoDB Introdction
Agile Document Models & Data Structures
Introduction to NoSQL and MongoDB
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Architecting Your First Big Data Implementation
[WITH THE VISION 2017] IoT/AI時代を生き抜くためのデータ プラットフォーム (Leveraging Azure Data Se...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
I Have a NoSQL toaster - DC - August 2017
Why no sql ? Why Couchbase ?
GraphTalks Rome - Selecting the right Technology
MongoDB Days UK: Building an Enterprise Data Fabric at Royal Bank of Scotland...
Relational databases vs Non-relational databases
Ad

More from DATAVERSITY (20)

PDF
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
PDF
Data at the Speed of Business with Data Mastering and Governance
PDF
Exploring Levels of Data Literacy
PDF
Building a Data Strategy – Practical Steps for Aligning with Business Goals
PDF
Make Data Work for You
PDF
Data Catalogs Are the Answer – What is the Question?
PDF
Data Catalogs Are the Answer – What Is the Question?
PDF
Data Modeling Fundamentals
PDF
Showing ROI for Your Analytic Project
PDF
How a Semantic Layer Makes Data Mesh Work at Scale
PDF
Is Enterprise Data Literacy Possible?
PDF
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
PDF
Emerging Trends in Data Architecture – What’s the Next Big Thing?
PDF
Data Governance Trends - A Look Backwards and Forwards
PDF
Data Governance Trends and Best Practices To Implement Today
PDF
2023 Trends in Enterprise Analytics
PDF
Data Strategy Best Practices
PDF
Who Should Own Data Governance – IT or Business?
PDF
Data Management Best Practices
PDF
MLOps – Applying DevOps to Competitive Advantage
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Data at the Speed of Business with Data Mastering and Governance
Exploring Levels of Data Literacy
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Make Data Work for You
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What Is the Question?
Data Modeling Fundamentals
Showing ROI for Your Analytic Project
How a Semantic Layer Makes Data Mesh Work at Scale
Is Enterprise Data Literacy Possible?
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends and Best Practices To Implement Today
2023 Trends in Enterprise Analytics
Data Strategy Best Practices
Who Should Own Data Governance – IT or Business?
Data Management Best Practices
MLOps – Applying DevOps to Competitive Advantage

Recently uploaded (20)

PDF
NewMind AI Monthly Chronicles - July 2025
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Modernizing your data center with Dell and AMD
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Encapsulation theory and applications.pdf
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
Cloud computing and distributed systems.
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Machine learning based COVID-19 study performance prediction
NewMind AI Monthly Chronicles - July 2025
Understanding_Digital_Forensics_Presentation.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
Unlocking AI with Model Context Protocol (MCP)
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Network Security Unit 5.pdf for BCA BBA.
Spectral efficient network and resource selection model in 5G networks
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Modernizing your data center with Dell and AMD
Chapter 3 Spatial Domain Image Processing.pdf
Encapsulation theory and applications.pdf
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Cloud computing and distributed systems.
Advanced methodologies resolving dimensionality complications for autism neur...
NewMind AI Weekly Chronicles - August'25 Week I
Reach Out and Touch Someone: Haptics and Empathic Computing
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
Machine learning based COVID-19 study performance prediction

Slides: NoSQL Data Modeling Using JSON Documents – A Practical Approach

  • 1. NoSQL Data Modeling Using JSON Documents – A Practical Approach David Segleau Dir.Technical Product Marketing Couchbase
  • 2. ©2016 Couchbase Inc. 2 About the speaker – David Segleau David Segleau DirectorTechnical Product Marketing Couchbase (since Nov 2015) Experience: - Database guy - Couchbase, Oracle, Sleepycat, Informix, Illustra,Teradata - Tech Marketing,VP Eng, Prod Mgmt, QA, Support,Training, Docs - Technology is only useful when it’s deployed - Expertise: - Database server technology, RDBMS, and NoSQL
  • 3. ©2016 Couchbase Inc. 3 Today’s agenda § What is Couchbase? § Why NoSQL? § Identifying the right application § Modeling your data § Accessing your data § Migrating your data § Q & A
  • 4. ©2016 Couchbase Inc. 4 What is Couchbase? Couchbase delivers the Data Platform for the Digital Economy • Products: Couchbase Server & Couchbase Mobile • Open source NoSQL, JSON document database • Founded 2010 • 500+ enterprise customers, including 20+ Fortune 100 UNIFIED ADMINISTRATION UNIFIED PROGRAMMING INTERFACE Data Query Index SearchMobileReplication Analytics {N1QL}
  • 5. ©2016 Couchbase Inc. 5 Who is using Couchbase? 6 of the top 10 ECOMMERCE COMPANIES IN THE US 3 of the 3 GDS COMPANIES 3 of the 10 AIRLINES 6 of the top 10 US & EUROPEAN BROADCAST COMPANIES 6 of the top 10 ONLINE CASINO GAMING COMPANIES 6 of the top 10 FIN SERVICES COMPANIES IN THE US
  • 6. ©2016 Couchbase Inc. 6 Who is using Couchbase? § Gannett, publisher of 90+ media properties, replaced relational database technology with NoSQL to power its digital publishing platform. § eBay, with over 2 billion page views per day, uses Couchbase + RDBMS for their Listing cache, and Couchbase as database of record forToken management. § Cars.com, with over 30 million visits per month, replaced SQL Server with NoSQL to store customer and vehicle data. § Marriott deployed NoSQL to modernize its hotel reservation system that supports $38 billion in annual bookings. § Equifax uses Couchbase to generate insights from historic credit data, leveraging the JSON documents to represent complex data objects without normalization.
  • 7. ©2016 Couchbase Inc. 7 What is NoSQL? § No SQL? § Not only SQL? üNon relational § Distributed (most) – Scaled out, not up • Elasticity and commodity hardware – Partitioned and replicated • Scalability, performance, availability § Schema-less (most) – Flexible model – JSON (some) § Multi-model – Key-value & Document – Columnar & Graph – Graph & Key-value
  • 8. ©2016 Couchbase Inc. 8 Why are they using NoSQL? Technology Drivers § Customers are going online § The internet is connecting everything § Big Data is getting bigger § Applications are moving to the cloud § The world has gone mobile Technical Needs § Develop with agility – Flexibility + Simplicity – Easier + Faster § Operate at any scale – Elasticity + Availability – Performance at scale – Always-on, global deployment Business Needs § Innovate and compete – Faster time to market – Reduced costs (operational + hardware) – Increased revenue
  • 9. ©2016 Couchbase Inc. 9 NoSQL vs. RDBMS § Replace or Complement? à It depends – Replace: NoSQL is often the operational database of record – Complement: NoSQL adds perf, scale, and availability to legacy RDBMS § Most customers use RDBMS and NoSQL § NoSQL is adding RDBMS features – Security, Query Language, Analytics § RDBMS is adding NoSQL features – Sharding, JSON, Distributed Processing
  • 10. ©2016 Couchbase Inc. 10 Why migrate from an RDBMS to NoSQL? § Easier to scale 3 nodes to 100s, 1 data center to many, commodity hardware § Better performance Integrated caching, memory-optimized indexes, memory-based replication § Up to 40x lower cost Open source, subscription-based, per instance (not per core) § Greater agility JSON-based data model, SQL-based query language § Cross-platform Runs onWindows or Linux (Red Hat, Ubuntu, Debian, etc.)
  • 11. ©2016 Couchbase Inc. 11 How do you get started? 1. Identify the right application 2. Model your data 3. Access your data 4. Migrate your data 5. Q&A
  • 12. ©2016 Couchbase Inc. 12 Identifying the right application
  • 13. ©2016 Couchbase Inc. 13 Identifying the right application Have one or more of the following characteristics or requirements: ü Innovate and iterate faster ü Send and receive JSON ü Provide low latency at any throughput ü Support many concurrent users ü Supports users anywhere and everywhere ü Be available 24x7 ü Store terabytes of data ü Read and write to multiple data centers Service RDBMS Service Service NoSQL Application Examples: Ø High performance, high availability caching service Ø Independent application with a narrow scope Ø Logical or physical service within a large application Ø Global service that powers multiple applications
  • 14. ©2016 Couchbase Inc. 14 Model your data
  • 15. ©2016 Couchbase Inc. 15 Demystifying terminology Relational NoSQL (Couchbase) Failover Cluster Cluster Availability Group Cluster Database Bucket Table Bucket Row (Tuple) Document (JSON) Primary Key Object ID IDENTITY or Sequence Counter IndexedView View SQL N1QL
  • 16. ©2016 Couchbase Inc. 16 Data Modeling Approaches NoSQL Relaxed Normalization schema implied by structure fields may be empty, duplicate, or missing Relational Required Normalization schema enforced by DB same fields in all records • Minimize data inconsistencies (one item = one location) • Reduced duplicated data • Preserve storage resources • Optimized based on access patterns • Flexible, based on application requirements • Supports clustered architecture • Reduced server overhead
  • 17. ©2016 Couchbase Inc. 17 What and Why JSON? 17 • What is JSON? – Schema flexibility – Lightweight data interchange format – Based on JavaScript – Programming language independent – Field names must be unique • Why JSON? – Less verbose – Can represent Objects and Arrays (including nested documents) No impedance mismatch between a JSON Document and a Java Object
  • 18. ©2016 Couchbase Inc. 18 Modeling your data: Fixed vs. self-describing schema
  • 19. ©2016 Couchbase Inc. 19 Modeling your data:The flexibility of JSON Same document type, Different fields • Different types • Optional • On demand Tip: Add a version field to track changes. {“docType”: “user”, “docVersion”: “1”, …} {“docType”: “user”, “docVersion”: “2”, …}
  • 20. ©2016 Couchbase Inc. 20 Modeling your data: Changing the data model Relational database • Modify the database schema • Modify the application code (e.g., Java) • Modify the interface (e.g., HTML5/JS) Document database • Modify the interface (e.g., HTML5/JS)
  • 21. ©2016 Couchbase Inc. 21 Modeling your data: Object IDs Best Practices • Natural Keys • Human Readable • Deterministic • Semantic Examples • author::shane • author::shane::blogs • blog::nosql_fueled_hadoop • blog::nosql_fueled_hadoop::comments What about identity columns? 1. Document<Long> nextAuthorIdDoc = bucket.counter(“authorIdCounter”, 1); 2. Long nextAuthorId = nextAuthorIdDoc.content(); 3. String authDocId = “author::” + nextAuthorId; // author::101 Tip: Increment the counter by 10, 20, etc. instead of doing it for every insert.
  • 22. ©2016 Couchbase Inc. 22 Modeling your data: Relationships Author Blog (FK)Blog (FK) Comment (FK) Comment (FK) Author (FK x2) BlogBlog (FK x2) Comment Comment Bottom up/”BelongsTo” Top down/”Has”
  • 23. ©2016 Couchbase Inc. 23 Modeling your data: Relationships - Related or Nested
  • 24. ©2016 Couchbase Inc. 24 Modeling your data: Strategies and best practices If … Then … Relationship is one-to-one or one-to-many Store related data as nested objects Relationship is many-to-one or many-to-many Store related data as separate documents Data reads are mostly parent fields Store children as separate documents Data reads are mostly parent + child fields Store children as nested objects Data writes are mostly parent or child (not both) Store children as separate documents Data writes are mostly parent and child (both) Store children as nested objects
  • 25. Modeling your data: Strategies and best practices
      § Are there a lot of concurrent writes or continuous updates?
      § If so, store children as separate documents.
      Blog
        § Thread
          § Comment
          § Comment
        § Thread
          § Comment
          § Comment
      Blog
      {
        "docType": "blog",
        "author": "author::shane",
        "title": "Couchbase Wins",
        "threads": [
          "blog::couchbase_wins::threads::001",
          "blog::couchbase_wins::threads::002"
        ]
      }
      Thread
      {
        "docType": "thread",
        "comments": [
          {
            "visitor": "Brendan Bond",
            "text": "This blog is amazing!",
            "replies": [
              { "user": "Dustin Johnson", "text": "No, it is not." }
            ]
          }
        ]
      }
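      The point of the referenced design is that a new comment touches only its thread document, never the blog document. The sketch below illustrates that with an in-memory `Map` standing in for a bucket. The class `BlogThreads` and its methods are hypothetical, and the three-digit thread suffix simply mirrors the keys shown on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the blog/thread documents from the slide.
final class BlogThreads {
    // Stand-in for a bucket: document key -> document body.
    final Map<String, Map<String, Object>> bucket = new HashMap<>();

    String createBlog(String slug, String authorKey, String title) {
        String key = "blog::" + slug;
        Map<String, Object> blog = new HashMap<>();
        blog.put("docType", "blog");
        blog.put("author", authorKey);
        blog.put("title", title);
        blog.put("threads", new ArrayList<String>());
        bucket.put(key, blog);
        return key;
    }

    // Creates a separate thread document and records its key in the blog.
    @SuppressWarnings("unchecked")
    String addThread(String blogKey) {
        List<String> threads = (List<String>) bucket.get(blogKey).get("threads");
        String threadKey = String.format("%s::threads::%03d", blogKey, threads.size() + 1);
        Map<String, Object> thread = new HashMap<>();
        thread.put("docType", "thread");
        thread.put("comments", new ArrayList<Map<String, Object>>());
        bucket.put(threadKey, thread);
        threads.add(threadKey);
        return threadKey;
    }

    // Writes only to the thread document; the blog document is not modified.
    @SuppressWarnings("unchecked")
    void addComment(String threadKey, String visitor, String text) {
        List<Map<String, Object>> comments =
            (List<Map<String, Object>>) bucket.get(threadKey).get("comments");
        Map<String, Object> comment = new HashMap<>();
        comment.put("visitor", visitor);
        comment.put("text", text);
        comment.put("replies", new ArrayList<Map<String, Object>>());
        comments.add(comment);
    }

    public static void main(String[] args) {
        BlogThreads store = new BlogThreads();
        String blogKey = store.createBlog("couchbase_wins", "author::shane", "Couchbase Wins");
        String threadKey = store.addThread(blogKey);
        store.addComment(threadKey, "Brendan Bond", "This blog is amazing!");
        System.out.println(store.bucket.keySet());
    }
}
```

      With many visitors commenting at once, writers contend only on the thread they are commenting in, which is the reason the slide gives for splitting children out under heavy concurrent writes.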
  • 26. Some JSON Design Choices
      • Couchbase Server neither enforces nor validates any particular document structure
      • Choices that impact JSON document design:
        – Single root attributes vs. document type
        – Objects vs. arrays
        – Array element types
        – Timestamp formats
        – Property names
        – Empty and null property values vs. missing properties
        – JSON Schema options
      • See "Agile document modeling and data structures" from the Couchbase Connect16 On-Demand Recordings
  • 27. Access your data
  • 28. Accessing your data: Options
      - Key-Value (CRUD)
      - N1QL (Query)
      - Views (Query)
      - Full Text (Search)
      - Geospatial (Search)
      [Diagram labels: Documents, Indexes, MapReduce]
      We'll focus on N1QL for now.
  • 29. Accessing your data: N1QL queries – capabilities
      Feature       SQL   N1QL
      JOIN          ✔     ✔
      TRANSFORM     ✔     ✔
      FILTER        ✔     ✔
      AGGREGATE     ✔     ✔
      SORT          ✔     ✔
      SUBQUERIES    ✔     ✔
      PAGINATION    ✔     ✔
      OPERATORS     ✔     ✔
      FUNCTIONS     ✔     ✔
  • 30. Accessing your data: N1QL queries – referenced data
  • 31. Accessing your data: N1QL queries – nested data
  • 32. Accessing your data: N1QL queries – CRUD
  • 33. Accessing your data: N1QL queries – indexes
      - Simple
      - Compound
      - Functional
      - Partial
  • 34. Couchbase Index Options
      #  Index type              Description
      1  Primary Index           Index on the document key across the whole bucket
      2  Simple Index            Index on the key-value or document-key
      3  Composite Index         Index on more than one key-value
      4  Functional Index        Index on a function or expression over key-values
      5  Partial Index           Index on a subset of the items in the bucket; uses a WHERE clause
      6  Array Index             Index on individual elements of arrays
      7  Memory Optimized Index  Index that is pinned in memory; defined when the cluster is configured
      8  Covering Index          Index that lets the query be resolved 100% within the index
      9  Duplicate Index         Copy of the index on specific nodes within the cluster, providing load balancing and failover; uses the WITH { "nodes": } clause
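      To make the simple, composite, functional, and partial index types concrete, the sketch below assembles N1QL `CREATE INDEX` statements as strings. The helper `IndexDdl`, the bucket name `blogs`, and the index names are hypothetical; the statements follow the general `CREATE INDEX name ON bucket(keys) WHERE condition` form and should be checked against the N1QL reference for your server version before use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that composes N1QL CREATE INDEX statements as text.
final class IndexDdl {
    // keys: one entry gives a simple index, several give a composite index,
    // and an expression such as LOWER(title) gives a functional index.
    // where: optional filter that makes the index partial; pass null to omit it.
    static String createIndex(String name, String bucket, List<String> keys, String where) {
        StringBuilder sb = new StringBuilder("CREATE INDEX `")
            .append(name).append("` ON `").append(bucket).append("`(")
            .append(String.join(", ", keys)).append(")");
        if (where != null && !where.isEmpty()) {
            sb.append(" WHERE ").append(where);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simple
        System.out.println(createIndex("idx_title", "blogs", Arrays.asList("title"), null));
        // Composite
        System.out.println(createIndex("idx_author_title", "blogs", Arrays.asList("author", "title"), null));
        // Functional
        System.out.println(createIndex("idx_title_lower", "blogs", Arrays.asList("LOWER(title)"), null));
        // Partial
        System.out.println(createIndex("idx_blog_author", "blogs", Arrays.asList("author"), "docType = \"blog\""));
    }
}
```

      A partial index on docType is a common pairing with the docType field recommended earlier in the deck, since it keeps each index limited to one kind of document.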
  • 35. Accessing your data: Indexing Considerations
      Relational: Indexes are synchronous; index and data are in sync
      Couchbase: Indexes are asynchronous; index updates lag behind the data, and the application specifies read consistency

      Relational: Indexes slow down write operations
      Couchbase: Indexes do not affect write throughput

      Relational: Index load balancing for queries can only be implemented in the application
      Couchbase: Index load balancing for queries is automatic, based on index signature

      Relational: Indexes contend with other memory usage
      Couchbase: Memory Optimized indexes are pinned in memory and provide low latency and high mutation throughput
  • 36. Understanding your Query Plan: Explain
      § EXPLAIN shows the query plan, i.e., the exact steps N1QL plans to take to execute the query
      cbq> EXPLAIN INSERT INTO default VALUES ("1", { "make" : "Toyota"});
      "plan": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "ValueScan",
            "values": "[[\"1\", {\"make\": \"Toyota\"}]]"
          },
          {
            "#operator": "Parallel",
            "maxParallelism": 1,
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "SendInsert",
  • 37. Accessing your data: Strategies and best practices
      Key-Value operations provide the best possible performance
      - Create an effective key naming strategy
      - Create an optimized data model
      Incremental MapReduce (Views) are well suited to aggregation
      - Ideal for large data sets
      - Data set can be used to create complex view indexes
      N1QL queries provide the most flexibility (everything else)
      - Query data regardless of how it is modeled
      - Remember to create secondary indexes, and leverage covering indexes where possible
  • 38. Migrate your data
  • 39. So many options! Remember the KISS principle
      1) Identify the requirements
         - ETL vs. data cleansing vs. data enrichment
         - Duration vs. resources
         - Data governance
      2) Pick your strategy
         - Batch vs. incremental
         - Single-threaded vs. multi-threaded
      3) Pick your tools
         - Data migration tools (Informatica, Looker, Talend)
         - BYO tool (PHP & Python scripts, Hadoop, Spark)
         - KISS with Couchbase
           - Export to CSV; import as documents; use N1QL to transform & insert into a new bucket
           - Use SQL to transform & export; insert into Couchbase
      Best practices
         - Align with your data model
         - Plan for failure (bad source data, hardware failure, resource limitations)
         - Ensure the migration is interruptible, restartable, logged, and predictable
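      The "export to CSV, import as documents" path can be sketched as a conversion from CSV rows to keyed documents. This is a minimal illustration, not a migration tool: the class `CsvToDocs` is hypothetical, the parser assumes no quoted fields or embedded commas, and the choices to build keys as `docType::id` and to omit empty cells rather than store empty strings are assumptions that follow the key and missing-property guidance earlier in the deck.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical converter: one CSV row becomes one document with a semantic key.
final class CsvToDocs {
    // Returns document key -> document fields, in source row order.
    // Assumes a header row and simple comma-separated values (no quoting).
    static Map<String, Map<String, String>> convert(String csv, String docType, String keyColumn) {
        String[] lines = csv.trim().split("\\r?\\n");
        String[] header = lines[0].split(",");
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[i].split(",", -1); // -1 keeps trailing empty cells
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("docType", docType);
            String key = null;
            for (int c = 0; c < header.length && c < cells.length; c++) {
                String name = header[c].trim();
                String value = cells[c].trim();
                if (name.equals(keyColumn) && !value.isEmpty()) {
                    key = docType + "::" + value;
                }
                if (!value.isEmpty()) {
                    doc.put(name, value); // empty cells become missing properties
                }
            }
            if (key == null) {
                // Plan for bad source data: fail loudly and report the row.
                throw new IllegalArgumentException("Row " + i + " has no value for " + keyColumn);
            }
            docs.put(key, doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        String csv = "id,name,email\n101,shane,\n102,dave,dave@example.com";
        System.out.println(convert(csv, "author", "id").keySet());
    }
}
```

      Reporting the failing row number is one small way to keep the run logged and restartable, in line with the best practices listed above.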
  • 40. How can you sync NoSQL and relational?
      § 1. Application code (manual)
      § 2. Replication (automatic)
        – From NoSQL to relational
        – From relational to NoSQL
      [Diagram labels: Couchbase, DCP Stream, Producer, Kafka Queue, Consumer, RDBMS; RDBMS, GoldenGate, Handler, Couchbase]
      https://github.com/mahurtado/CouchbaseGoldenGateAdapter
  • 41. Data Modeling Best Practices Recap
      • Pick the right application
        - Focus on SOA, application/use case specific
      • Drive data model from data access patterns
        - Use document type, version ID
        - Create optimized, understandable keys
        - Weigh nested, referenced, or mixed designs
        - Add indexes: Simple, Compound, Functional, Partial, Array, Covering, Memory Optimized
      • Match the data access method to requirements
        - N1QL, Key-Value, Views
      • Proof of concept
        - Focus, success criteria, review architecture
  • 42. Questions?
  • 43. Want to learn more?
      - Getting Started guide: http://www.couchbase.com/get-started-developing-nosql
      - Download Couchbase software: http://www.couchbase.com/nosql-databases/downloads
      - Free online training: http://training.couchbase.com/online
      - "Why NoSQL" white paper: http://www.couchbase.com/nosql-resources/why-nosql
  • 44. Additional Resources
      § General Docs: http://docs.couchbase.com
      § Developer Portal: http://developer.couchbase.com
      § Couchbase Labs: https://github.com/couchbaselabs
      § Query Portal: http://query.couchbase.com
      § Sample Applications:
        § https://github.com/couchbaselabs?utf8=%E2%9C%93&query=try
        § https://github.com/couchbaselabs?utf8=%E2%9C%93&query=beer
      § Blog: http://blog.couchbase.com
      § Forum: http://forums.couchbase.com
  • 45. Additional Resources – Data Modeling
      - Webinar: The Why, When, and How of NoSQL: A Practical Approach
      - Webinar: Relational to NoSQL: How to Get Started from SQL Server
      - Presentation: Data Modeling with Couchbase Server
      - Connect16 On Demand Recordings
        • Agile document modeling and data structures
        • Migrating from relational – Data modeling and access
        • LINQing to data: Easing the transition from SQL
        • Tuning for Performance: Indexes and Queries
      - Documentation: Data Modeling with JSON
      - Training class: CD210 Couchbase NoSQL Data Modeling, Querying, and Tuning Using N1QL
  • 46. Thank you