JSON Data Modeling
Matthew D. Groves, @mgroves
David Segleau, @dsegleau
©2017 Couchbase Inc. 2
Agenda
Why NoSQL?
JSON Data Modeling
Accessing data
Migrating data
Where am I?
• Pittsburgh Tech Fest
• http://www.pghtechfest.com/
Who am I?
• Matthew D. Groves
• Developer Advocate for Couchbase
• @mgroves on Twitter
• Podcast and blog: http://crosscuttingconcerns.com
• “I am not an expert, but I am an enthusiast.” – Alan Stevens
Major Enterprises Across Industries are Adopting NoSQL
• Communications
• Technology
• Travel & Hospitality
• Media & Entertainment
• E-Commerce & Digital Advertising
• Retail & Apparel
• Games & Gaming
• Finance & Business Services
Why NoSQL?
NoSQL Landscape
Document
• Couchbase
• MongoDB
• DynamoDB
• DocumentDB
Graph
• OrientDB
• Neo4j
• DEX
• GraphBase
Key-Value
• Couchbase
• Riak
• BerkeleyDB
• Redis
• …
Wide Column
• HBase
• Cassandra
• Hypertable
NoSQL Landscape
Document
• Couchbase
• MongoDB
• DynamoDB
• DocumentDB
• Get by key(s)
• Set by key(s)
• Replace by key(s)
• Delete by key(s)
• Map/Reduce
Why NoSQL? Scalability
Why NoSQL? Flexibility
Why NoSQL? Performance
Why NoSQL? Availability
JSON Data Modeling
Models for Representing Data
Data Concern | Relational Model | JSON Document Model
Rich Structure
Relationships
Value Evolution
Structure Evolution
Properties of Real-World Data
Modeling Data in Relational World
Billing
Connections
Purchases
Contacts
Customer
Table: Customer
CustomerID | Name | DOB
CBL2015 | Jane Smith | 1990-01-30
Customer Document (Key: CBL2015)
{
  "Name" : "Jane Smith",
  "DOB" : "1990-01-30"
}
Table: Customer
CustomerID | Name | DOB
CBL2015 | Jane Smith | 1990-01-30
Table: Billing
CustomerID | Type | Cardnum | Expiry
CBL2015 | visa | 5827… | 2019-03
Customer Document (Key: CBL2015)
{
  "Name" : "Jane Smith",
  "DOB" : "1990-01-30",
  "Billing" : [
    {
      "type" : "visa",
      "cardnum" : "5827-2842-2847-3909",
      "expiry" : "2019-03"
    }
  ]
}
Table: Customer
CustomerID | Name | DOB
CBL2015 | Jane Smith | 1990-01-30
Table: Billing
CustomerID | Type | Cardnum | Expiry
CBL2015 | visa | 5827… | 2019-03
CBL2015 | master | 6274… | 2018-12
Customer Document (Key: CBL2015)
{
  "Name" : "Jane Smith",
  "DOB" : "1990-01-30",
  "Billing" : [
    {
      "type" : "visa",
      "cardnum" : "5827-2842-2847-3909",
      "expiry" : "2019-03"
    },
    {
      "type" : "master",
      "cardnum" : "6274-2542-5847-3949",
      "expiry" : "2018-12"
    }
  ]
}
Table: Connections
CustomerID | ConnId | Name
CBL2015 | XYZ987 | Joe Smith
CBL2015 | SKR007 | Sam Smith
Customer Document (Key: CBL2015)
{
  "Name" : "Jane Smith",
  "DOB" : "1990-01-30",
  "Billing" : [
    {
      "type" : "visa",
      "cardnum" : "5827-2842-2847-3909",
      "expiry" : "2019-03"
    },
    {
      "type" : "master",
      "cardnum" : "6274-2542-5847-3949",
      "expiry" : "2018-12"
    }
  ],
  "Connections" : [
    {
      "ConnId" : "XYZ987",
      "Name" : "Joe Smith"
    },
    {
      "ConnId" : "SKR007",
      "Name" : "Sam Smith"
    }
  ]
}
Document Key: CBL2015
{
  "Name" : "Jane Smith",
  "DOB" : "1990-01-30",
  "Billing" : [
    {
      "type" : "visa",
      "cardnum" : "5827-2842-2847-3909",
      "expiry" : "2019-03"
    },
    {
      "type" : "master",
      "cardnum" : "6274-2542-5847-3949",
      "expiry" : "2018-12"
    }
  ],
  "Connections" : [
    {
      "CustId" : "XYZ987",
      "Name" : "Joe Smith"
    },
    {
      "CustId" : "PQR823",
      "Name" : "Dylan Smith"
    }
  ],
  "Purchases" : [
    { "id" : 12, "item" : "mac", "amt" : 2823.52 },
    { "id" : 19, "item" : "ipad2", "amt" : 623.52 }
  ]
}
Table: Customer
CustomerID | Name | DOB
CBL2015 | Jane Smith | 1990-01-30
Table: Billing
CustomerID | Type | Cardnum | Expiry
CBL2015 | visa | 5827… | 2019-03
CBL2015 | master | 6274… | 2018-12
Table: Connections
CustomerID | ConnId | Name
CBL2015 | XYZ987 | Joe Smith
CBL2015 | SKR007 | Sam Smith
Table: Purchases
CustomerID | item | amt
CBL2015 | mac | 2823.52
CBL2015 | ipad2 | 623.52
Table: Contacts
CustomerID | ConnId | Name
CBL2015 | XYZ987 | Joe Smith
CBL2015 | SKR007 | Sam Smith
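The mapping from flat tables to one nested document, as walked through on the slides above, can be sketched in a few lines of Python. This is only an illustration: the row dictionaries stand in for query results, and the helper names `children_of` and `to_document` are assumptions made for the sketch, not part of any Couchbase API.

```python
import json

# Flat rows, as they might come back from the relational tables on the slide.
customers = [{"CustomerID": "CBL2015", "Name": "Jane Smith", "DOB": "1990-01-30"}]
billing = [
    {"CustomerID": "CBL2015", "type": "visa",
     "cardnum": "5827-2842-2847-3909", "expiry": "2019-03"},
    {"CustomerID": "CBL2015", "type": "master",
     "cardnum": "6274-2542-5847-3949", "expiry": "2018-12"},
]
connections = [
    {"CustomerID": "CBL2015", "ConnId": "XYZ987", "Name": "Joe Smith"},
    {"CustomerID": "CBL2015", "ConnId": "SKR007", "Name": "Sam Smith"},
]
purchases = [
    {"CustomerID": "CBL2015", "id": 12, "item": "mac", "amt": 2823.52},
    {"CustomerID": "CBL2015", "id": 19, "item": "ipad2", "amt": 623.52},
]


def children_of(rows, customer_id):
    """Return the rows owned by one customer, with the foreign key dropped."""
    return [
        {k: v for k, v in row.items() if k != "CustomerID"}
        for row in rows
        if row["CustomerID"] == customer_id
    ]


def to_document(customer):
    """Build (key, document) for one customer by nesting its child rows."""
    key = customer["CustomerID"]
    doc = {
        "Name": customer["Name"],
        "DOB": customer["DOB"],
        "Billing": children_of(billing, key),
        "Connections": children_of(connections, key),
        "Purchases": children_of(purchases, key),
    }
    return key, doc


key, doc = to_document(customers[0])
print(key)
print(json.dumps(doc, indent=2))
```

The foreign key disappears from the child rows because nesting already expresses ownership; the customer ID survives only as the document key.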
Models for Representing Data
Data Concern | Relational Model | JSON Document Model (NoSQL)
Rich Structure | Multiple flat tables; constant assembly / disassembly | Documents; no assembly required!
Relationships | Represented; queried (SQL) | Represented; queried: yes, with N1QL (SQL for JSON)
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform and rigid; manual change (disruptive) | Flexible; dynamic change
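One way to picture the "Flexible / Dynamic change" cell: application code reads documents of more than one shape side by side, with no ALTER TABLE step in between. The sketch below is an assumption-laden illustration; the split-name fields (`First`, `Middle`, `Last`) and both helper functions are hypothetical and do not come from the deck.

```python
def display_name(doc):
    """Return a printable name for either document shape."""
    name = doc["Name"]
    if isinstance(name, dict):
        # Newer, hypothetical shape: the name was split into parts.
        parts = (name.get("First"), name.get("Middle"), name.get("Last"))
        return " ".join(part for part in parts if part)
    # Original shape: the name is a single string.
    return name


def connections(doc):
    """Older documents have no Connections field; treat that as an empty list."""
    return doc.get("Connections", [])


old_doc = {"Name": "Jane Smith", "DOB": "1990-01-30"}
new_doc = {
    "Name": {"First": "Jane", "Middle": "Q", "Last": "Smith"},
    "DOB": "1990-01-30",
    "Connections": [{"ConnId": "XYZ987", "Name": "Joe Smith"}],
}

for doc in (old_doc, new_doc):
    print(display_name(doc), len(connections(doc)))
```

The flexibility has a cost: the tolerance for old shapes moves into the application, which is why the speaker notes still call for discipline about your data.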
Demo: Modeling
Modeling your data: Strategies / rules of thumb
If … | Then …
Relationship is one-to-one or one-to-many | Store related data as nested objects
Relationship is many-to-one or many-to-many | Store related data as separate documents
Data reads are mostly parent fields | Store children as separate documents
Data reads are mostly parent + child fields | Store children as nested objects
Data writes are mostly parent or child (not both) | Store children as separate documents
Data writes are mostly parent and child (both) | Store children as nested objects
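The rules of thumb above read naturally as a lookup table. A minimal sketch, assuming made-up labels for each observation (`"parent+child"`, `"parent-or-child"` and so on are shorthand invented here, not terms from the deck):

```python
NESTED = "nested objects"
SEPARATE = "separate documents"

# Each concern maps an observation about the data to the slide's advice.
RULES = {
    "relationship": {
        "one-to-one": NESTED,
        "one-to-many": NESTED,
        "many-to-one": SEPARATE,
        "many-to-many": SEPARATE,
    },
    "reads": {"parent": SEPARATE, "parent+child": NESTED},
    "writes": {"parent-or-child": SEPARATE, "parent-and-child": NESTED},
}


def suggest_layout(**observations):
    """Look up each observation in RULES and return the advice per concern."""
    return {concern: RULES[concern][value] for concern, value in observations.items()}


advice = suggest_layout(
    relationship="one-to-many",
    reads="parent+child",
    writes="parent-and-child",
)
print(advice)
```

When the three answers disagree, the table does not pick a winner; that is the point where the actual access patterns have to decide.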
Accessing Data
Accessing your data (Couchbase)
Key-Value (CRUD): Documents
N1QL (Query): Indexes
Views (Query): MapReduce
Full Text (Search): Indexes
Geospatial (Search): MapReduce
Key/Value
// Read: fetch one document by key and deserialize it into a ShoppingCart.
public ShoppingCart GetCartById(Guid id)
{
    return _bucket.Get<ShoppingCart>(id.ToString()).Value;
}
// Create: insert a new document under a freshly generated GUID key.
public void CreateShoppingCart()
{
    _bucket.Insert(new Document<dynamic>
    {
        Id = Guid.NewGuid().ToString(),
        Content = new { . . . }
    });
}
Key/Value: Recommendations for keys
• Natural Keys
• Human Readable
• Deterministic
• Semantic
Key/Value: Example keys
• author::matt
• author::matt::blogs
• blog::csharp_7_features
• blog::csharp_7_features::comments
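Keys like the ones above can be produced by a small deterministic helper, so the same inputs always yield the same key. This is a sketch only; `slug` and `make_key` are hypothetical names, and the `::` delimiter is simply the convention shown on the slide.

```python
import re

DELIMITER = "::"


def slug(text):
    """Lower-case the text and turn runs of other characters into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def make_key(*parts):
    """Join the slugged parts with the delimiter, e.g. type, id, sub-collection."""
    return DELIMITER.join(slug(part) for part in parts)


print(make_key("author", "matt"))
print(make_key("author", "matt", "blogs"))
print(make_key("blog", "CSharp 7 Features"))
print(make_key("blog", "CSharp 7 Features", "comments"))
```

Because the key is derived from data the application already holds, a related document can be fetched directly by key without running a query first.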
N1QL
Understanding your Query Plan
Map/Reduce
Accessing your data: Strategies and recommendations
Concept | Strategies & Recommendations
Key-Value operations provide the best possible performance | Create an effective key naming strategy; create an optimized data model
Incremental MapReduce (Views) are well suited to aggregation | Ideal for large data sets; data set can be used to create complex view indexes
N1QL queries provide the most flexibility (everything else) | Query data regardless of how it is modeled; good indexing is vital
Migrating Data
Migration options: Requirements
ETL / data cleanse / data enrichment
Migration options: Requirements
Duration vs. Resources
Migration options: Requirements
Data governance
Migration options: Pick your strategy
• Batch vs. Incremental
• Single threaded vs. multi-threaded
Migration options: Pick your tools
• Data migration tools:
• Informatica, Looker, Talend
• BYO-tool
• C# / PowerShell / etc.
• RhinoETL / DTS / SSIS
• Hadoop, Spark
Migration options: KISS
• CSV:
  – Export to CSV
  – Import as documents into a 'staging' bucket
  – Use N1QL to transform
  – Insert into new bucket
• SQL:
  – Transform
  – Export
  – Insert into document database
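The first two CSV steps (export, then import as documents) can be sketched with the Python standard library. Everything here is an assumption for illustration: an in-memory dictionary stands in for the 'staging' bucket, the `type` field and the `customer::` key prefix are choices made for the sketch, and `rows_to_documents` is a hypothetical helper rather than a Couchbase import tool.

```python
import csv
import io
import json

# Stand-in for a CSV file exported from the relational Customer table.
CSV_EXPORT = """CustomerID,Name,DOB
CBL2015,Jane Smith,1990-01-30
"""


def rows_to_documents(csv_text, key_column, doc_type):
    """Yield (key, document) pairs, one per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        key = f"{doc_type}::{row[key_column]}"
        body = {k: v for k, v in row.items() if k != key_column}
        body["type"] = doc_type  # lets later queries tell document kinds apart
        yield key, body


# A plain dict plays the role of the staging bucket in this sketch.
staged = dict(rows_to_documents(CSV_EXPORT, "CustomerID", "customer"))
print(json.dumps(staged, indent=2))
```

At this stage the documents are still flat, one per row; reshaping them into nested documents is the later "transform" step in the list.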
Migration options: Recommendations
• Align with your data model
• Plan for failure
  – Bad source data
  – Hardware failure
  – Resource limitations
• Ensure: Interruptible, restartable, logged, predictable
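"Interruptible, restartable, logged" can be made concrete with a checkpoint file. The following is a minimal sketch under stated assumptions: `migrate`, the checkpoint format, and the deliberately flaky writer are all invented for the example, and the target is a dictionary rather than a real database.

```python
import json
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("migration")


def migrate(rows, write, checkpoint_path, batch_size=2):
    """Copy rows in batches, recording progress so a rerun resumes where it stopped."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
        log.info("resuming at row %d", start)
    for begin in range(start, len(rows), batch_size):
        batch = rows[begin:begin + batch_size]
        for row in batch:
            write(row)
        # Only a fully written batch advances the checkpoint.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": begin + len(batch)}, f)
        log.info("migrated rows %d-%d", begin, begin + len(batch) - 1)


rows = [{"id": i} for i in range(5)]
target = {}
failures = {"remaining": 1}  # make the demo writer fail exactly once


def flaky_write(row):
    if row["id"] == 3 and failures["remaining"]:
        failures["remaining"] -= 1
        raise IOError("simulated outage")
    target[row["id"]] = row  # keyed upsert, so replaying a row is harmless


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    migrate(rows, flaky_write, checkpoint)
except IOError as exc:
    log.warning("interrupted: %s", exc)
migrate(rows, flaky_write, checkpoint)  # second run resumes from the checkpoint
print(sorted(target))
```

The interrupted batch is replayed in full on restart, so the writes need to be idempotent; writing by key, as a document database does, gives that property here.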
Sync NoSQL and relational? Automatic Replication
Diagram components: Couchbase, DCP Stream, Producer, Kafka Queue, Consumer, RDBMS
How can you sync NoSQL and relational?
Diagram components: RDBMS, SSIS, CData, Couchbase
https://www.cdata.com/drivers/couchbase
Sync NoSQL and relational? Manual.
Summary
• Pick the right application
• Drive data model from data access patterns
• Match the data access method to requirements
• Proof of Concept
Resources
• https://blog.couchbase.com/moving-from-sql-server-to-couchbase-part-1-data-modeling/
  – http://tinyurl.com/jsonmodel1
• https://blog.couchbase.com/sql-to-json-data-modeling-hackolade/
  – http://tinyurl.com/jsonmodel2
Couchbase, everybody!
Where do you find us?
• blog.couchbase.com
• @couchbasedev
• @mgroves
Frequently Asked Questions
1. How is Couchbase different than Mongo?
2. Is Couchbase the same thing as CouchDB?
3. How did you get to be both incredibly handsome and tremendously intelligent?
4. What is the Couchbase licensing situation?
5. Is Couchbase a managed cloud service?
6. Transactions?

JSON Data Modeling - June 2017 - Pittsburgh Tech Fest

Editor's Notes

  • #3: Spend just a little time on why people are using NoSQL. Talk about how data is modeled differently in JSON. Let’s talk about why SQL is good and why SQL for JSON is needed. Let’s talk about the exciting stuff happening in the database ecosystem, including but not limited to the stuff Couchbase is doing. If we have time, we’ll look at how a .NET developer (or Java developer, etc.) would interact with SQL for JSON.
  • #6: This session is a WIP. It’s based on my knowledge of Couchbase, my SQL Server experience, and David Segleau’s engagement and lessons learned with customers, all combined into an hour-long presentation. David likes bullet points; I like to break up bullet points and use lots of pictures. David works with customers; I work with the dev community. So you’re going to see a meshing of that. Hopefully it works.
  • #7: What’s also interesting is that we’re seeing the use of NoSQL expand inside many of these companies. Orbitz, the online travel company, is a great example: they started using Couchbase to store their hotel rate data, and now they use Couchbase in many other ways. Same with eBay: they recently presented at the Couchbase conference with a chart tracking how many instances of various NoSQL databases are in use. We see growth in Cassandra and MongoDB, and Couchbase has actually surpassed them within eBay.
  • #8: SQL (relational) databases are great. They give you a LOT of functionality: a great set of abstractions (tables, columns, data types, constraints, triggers, SQL, ACID TRANSACTIONS, stored procedures and more) at a highly reasonable cost. Change is inevitable, and one thing an RDBMS does not handle well is CHANGE: change of schema (both logical and physical), change of hardware, change of capacity. NoSQL databases, ESPECIALLY ONES DESIGNED TO BE DISTRIBUTED, tend to help solve problems with agility, scalability, performance, and availability.
  • #9: Let’s talk about what NoSQL is, first. NoSQL generally refers to databases which lack SQL or don’t use a relational model. Once the SQL language and transactions became optional, a flurry of databases were created using distinct approaches for common use cases. Key-value databases simply provide quick access to data for a given KEY. Wide column databases can store a large number of arbitrary columns in each row. Graph databases store data and relationships as first-class concepts. Document databases aggregate data into a hierarchical structure; JSON is a means to that end. Document databases provide flexible schema, built-in data types, rich structure, and implicit relationships using JSON.
  • #10: When we look at document databases, they originally came with a minimal set of APIs and features. But as they continue to mature, we’re seeing more features being added, and generally I’m seeing a convergent trend between SQL and NoSQL. But anyway, this set of minimal features, lacking a SQL language and tables, gives us the buzzword “NoSQL”.
  • #11: Elastic scaling: size your cluster for today and scale out on demand. Cost-effective scaling: commodity hardware, on premise or in the cloud; scale OUT instead of scale UP. [example: changing the channel to a soccer game or Game of Thrones, everyone makes the same API request in the same 5 minutes] [example: TV show lets watchers vote during some period of the week, so you can scale up during that period of time] [example: Black Friday]
  • #12: Schema flexibility: easier management of change in the business requirements and in the structure of the data. Sometimes you’re pulling together data, integrating from different sources (e.g. ELT), and that flexibility helps. A document database means that you have no rigid schema; you can do whatever the heck you want. That being said, you SHOULDN’T. You should still have discipline about your data.
  • #13: NoSQL systems are optimized for specific access patterns: low response time for web and mobile user experience, millisecond latency, and consistently high throughput to handle growth. [perf measures can be subjective – talk about architecture, integrated cache, maybe mention MDS too]
  • #14: If one machine goes down, customers can still use the other. Or if you need to perform maintenance, upgrade, etc, you don't have to take the whole system down This is related to scaling Built-in replication and fail-over No application downtime when hardware fails Online maintenance & upgrade No application downtime
  • #15: Let’s talk about data modeling a bit, because storing data in JSON is different from storing it in tables.
  • #16: So I want to compare the approaches over 4 key areas. I’m going to fill in this table, traditional SQL on the left and JSON on the right
  • #17: Let’s look at modeling Customer data. This is an example of what a customer might look like. There is a rich structure: attributes, potentially sub-attributes (first name and last name). Relationships: to other data (other customers, to products perhaps). Value evolution: maybe we’d start with one connection, then change to multiple (data is updated). Structure evolution: maybe we start without connections and add those later, or we evolve the name field to be more than first and last name (data is reshaped).
  • #18: Rich structure: in a relational database, this customer's data would be stored in five normalized tables. Each time you want to construct a customer object, you JOIN the data in these tables; each time you persist, you find the appropriate rows in the relevant tables and insert/update. Relationships: enforcement is via referential constraints. Objects are constructed by JOINs, EACH time. Value evolution: additional values of the SAME TYPE (e.g. additional phone, additional address) are managed by additional ROWS in one of the tables. Customer:contacts will have a 1:n relationship. Structure evolution: imagine we didn't start with a billing table. This is the most difficult part. Changing the structure is difficult, both within a table and across tables. While you can make these changes via ALTER TABLE, doing so requires downtime, migration, and application versioning. This is one of the problems document databases try to handle by representing data in JSON.
  • #19: Let’s see how to represent customer data in JSON. The primary key (CustomerID) becomes the document key. Each column name/column value pair becomes a key-value pair.
  • #20: We aren’t in normal form anymore. Rich structure & relationships: billing information is stored as a sub-document. There could be more than a single credit card, so use an array.
  • #21: Value evolution: simply add an additional array element or update a value.
  • #22: Structure evolution: simply add new key-value pairs. No downtime to add new KV pairs. Applications can validate data. Structure evolves over time. Relations via reference.
  • #23: So, finally, you have a JSON document that represents a CUSTOMER. In a single JSON document, the relationships between the data are implicit through the use of sub-structures, arrays, and arrays of sub-structures.
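As a rough illustration of the notes for slides 19 through 23, here is a minimal Python sketch that uses plain dictionaries and the standard json module. The key CBL2015, the name, and the date of birth come from the slides; the Billing and Connections field names, the card values, and the connection entry are made-up placeholders, and no Couchbase SDK calls are shown.

```python
import json

# A row from the relational Customer table (values taken from the slides).
row = {"CustomerID": "CBL2015", "Name": "Jane Smith", "DOB": "1990-01-30"}

# The primary key becomes the document key; every other column name/value
# becomes a key-value pair in the document body.
doc_key = row["CustomerID"]
customer = {name: value for name, value in row.items() if name != "CustomerID"}

# Rich structure: billing lives inside the customer as an array of
# sub-documents (hypothetical field names and card data).
customer["Billing"] = [
    {"type": "visa", "cardnum": "0000-0000-0000-0001", "expiry": "2019-03"}
]

# Value evolution: a second credit card is just another array element.
customer["Billing"].append(
    {"type": "master", "cardnum": "0000-0000-0000-0002", "expiry": "2019-09"}
)

# Structure evolution: a brand-new key is added to this one document,
# with no schema change required (hypothetical field name and values).
customer["Connections"] = [{"ConnId": "XYZ987", "Name": "Joe Smith"}]

print(doc_key)
print(json.dumps(customer, indent=2))
```

Nothing here enforces a shape, which is the flexibility the notes describe, and also why the application is the place where the data gets validated.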
  • #24: Reference slide
  • #26: What types of relationships are being modeled? How are the relationships accessed?
  • #27: Let’s talk about data modeling a bit, because storing data in JSON is different from storing it in tables.
  • #28: We’ll focus on N1QL for now.
  • #29: Notice I’m using a Guid. That may not be a good idea.
  • #32: N1QL is powerful in its flexibility, declarative nature, familiarity to developers, JOINs, etc. Indexing is very important, as it's not as performant as key/value or map/reduce. (Maybe talk about indexing on a SQL table vs indexing on a whole bucket)
  • #33: Couchbase 5.0 has introduced some tools for analyzing query performance, so you can see what indexes are being used, where the biggest costs are in the query, and so on. There are a lot of different types of indexes for N1QL.
  • #34: This is kinda like a materialized view. It's powerful in that it can be run in parallel, can use JavaScript to do filtering/mapping, and is great for aggregation. It's limited in that it can't do anything like a JOIN, can't get input from other views, and more.
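To make the map/reduce idea from the slide 34 notes a little more concrete, here is a small Python analogy. It is only a sketch: real Couchbase views are written in JavaScript and maintained by the server, and the document shapes, the "type" and "state" fields, and the function names below are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical documents, keyed by document key.
docs = {
    "customer::1": {"type": "customer", "state": "PA"},
    "customer::2": {"type": "customer", "state": "OH"},
    "customer::3": {"type": "customer", "state": "OH"},
    "order::1": {"type": "order", "total": 20},
}

def map_fn(key, doc):
    """Map step: look at ONE document and emit zero or more (key, value) pairs."""
    if doc.get("type") == "customer":
        yield doc["state"], 1

def reduce_fn(values):
    """Reduce step: aggregate all values emitted under the same key."""
    return sum(values)

def run_view(all_docs):
    """Apply map to every document, group by emitted key, then reduce each group."""
    grouped = defaultdict(list)
    for key, doc in all_docs.items():
        for emitted_key, emitted_value in map_fn(key, doc):
            grouped[emitted_key].append(emitted_value)
    return {k: reduce_fn(vs) for k, vs in sorted(grouped.items())}

view = run_view(docs)
print(view)
```

Because the map function only ever sees one document at a time, each call is independent (which is why the work can run in parallel), and there is no way to combine two documents in it, which is the "can't do anything like a JOIN" limitation.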
  • #36: Let’s talk about data modeling a bit, because storing data in JSON is different from storing it in tables.
  • #37: Are you going to take the time to clean up the data? Do you need to? Do you need to enrich or restructure the data to take advantage of JSON? Duration v resources: how long is it going to take? What tools and resources are available to you? Data governance: what are the rules for moving data, auditing, etc?
  • #38: Duration v resources: how long is it going to take? What tools and resources are available to you? What’s your biggest constraint – time or resources? Do you need to get the migration done in 1 hr (and have it use as many parallel resources as needed) or do you need to minimize/manage the resource impact on the existing system and it doesn’t matter how long it takes?
  • #39: Data governance: what are the rules for moving data, auditing, etc? Do you need to keep track of where the data came from and who is allowed to access it? Many newer systems need to track where sensitive data originated. 
  • #40: A whole bunch at a time, or one at a time. Single-threaded: easier. Multi-threaded: faster, but more complicated. Is the migration a one-time event, or does it need to happen incrementally (every day, or over a 2-3 month period where the old system and new system are operating in parallel)? Do you plan to do the data migration as a single thread (read all the data, write all of the data) or using a multi-threaded or multi-process approach where each thread or process reads some percentage of the data?
  • #41: If you're writing your own, Entity Framework can be helpful, because it can do the mapping of aggregate root C# objects for you, which you can then write to a document database. So if you already have EF mappings created, you're part way there.
  • #42: KISS: Either export to CSV and use N1QL to do any ETL that’s required (assuming that it’s simple), or use SQL to do simple ETL on export and then just import into Couchbase. Basically keep it as simple as you can and plan for failure. Developers often think of the migration process as “One and Done”, but the reality is that data migration is often an ongoing headache that DevOps needs to monitor and manage in a production environment. Make everyone’s life easier by thinking about the long game as much as possible.
  • #44: From NoSQL to relational
  • #45: From relational to NoSQL: GoldenGate is from Oracle; CData for SSIS and Couchbase. https://github.com/mahurtado/CouchbaseGoldenGateAdapter https://www.cdata.com/drivers/couchbase
  • #46: Make it part of your application directly. May or may not be reusable. This is a lot of work, so make sure you have a good reason.
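The single-threaded versus multi-threaded choice from the slide 40 notes could look something like the following if you do write the migration yourself. This is a sketch under heavy assumptions: the source rows and the target store are in-memory stand-ins for a real relational table and a real document database, and every name in it (SOURCE_ROWS, to_document, migrate_batch, and so on) is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for rows read from the relational source system.
SOURCE_ROWS = [{"CustomerID": f"C{i:03d}", "Name": f"Customer {i}"} for i in range(10)]

# Stand-in for the target document database: document key -> document.
target = {}
target_lock = threading.Lock()

def to_document(row):
    """Turn one relational row into a (document key, document body) pair."""
    doc = {name: value for name, value in row.items() if name != "CustomerID"}
    doc["type"] = "customer"
    return row["CustomerID"], doc

def migrate_batch(rows):
    """Write one batch of rows to the target and report how many were written."""
    for row in rows:
        key, doc = to_document(row)
        with target_lock:
            target[key] = doc
    return len(rows)

def chunks(rows, size):
    """Split the rows into batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def migrate_single_threaded(rows):
    """Easier option: read all the data, write all of the data, in one thread."""
    return migrate_batch(rows)

def migrate_multi_threaded(rows, batch_size=3, workers=4):
    """Faster but more complicated option: each worker handles some of the batches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(migrate_batch, chunks(rows, batch_size)))

migrated = migrate_multi_threaded(SOURCE_ROWS)
print(migrated, "rows migrated,", len(target), "documents in target")
```

Even in this toy version the multi-threaded path needs batching and a lock that the single-threaded path does not. A real migration would also need retry and failure handling, which this sketch leaves out.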
  • #47: Let’s talk about data modeling a bit, because storing data in JSON is different from storing it in tables.
  • #48: Focus on SOA, application/use case specific
  • #49: Use document type and version ID fields. Create optimized, understandable keys. Weigh nested, referenced, or mixed designs. Add indexes: simple, compound, functional, partial, array, covering, memory-optimized.
  • #50: N1QL, key-value, views
  • #51: Focus, Success Criteria, Review Architecture. Consider using a tool like Hackolade to define models rigorously and collaboratively.
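The modeling tips in the slide 49 notes (document type, version ID, understandable keys, nested versus referenced designs) can be sketched in a few lines of Python. The "type" and "version" field names, the "::" key separator, and the order document are assumptions made for this example, not something the slides prescribe.

```python
def make_key(doc_type, natural_id):
    """Build a readable document key such as 'customer::CBL2015'."""
    return f"{doc_type}::{natural_id}"

def make_document(doc_type, version, body):
    """Stamp a document body with its type and the version of its shape."""
    doc = dict(body)
    doc["type"] = doc_type
    doc["version"] = version
    return doc

# Nested design: billing details live inside the customer document.
customer_key = make_key("customer", "CBL2015")
customer = make_document(
    "customer", 2, {"Name": "Jane Smith", "Billing": [{"type": "visa"}]}
)

# Referenced design: the order stores the customer's document key
# instead of embedding the whole customer.
order_key = make_key("order", "1001")
order = make_document("order", 1, {"customer": customer_key, "total": 42.5})

print(customer_key, customer)
print(order_key, order)
```

A type field lets queries and indexes target one kind of document, and a version field lets the application tell older document shapes from newer ones as the structure evolves.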
  • #53: Start the animation
  • #55: Mongo: Features N1QL, XDCR, Full Text Search, Mobile & Sync. Memory-first architecture and proven, easy scaling. CouchDB: Couchbase started as a whole new piece of software that was basically a combination of memcached and CouchDB a long time ago, but has grown far beyond that. Couchbase isn’t a fork of CouchDB or vice versa. They share an acronym and they are both NoSQL. Like MySQL and SQL Server, for instance. Open source Apache license for the community edition; the enterprise edition has a faster release schedule, some advanced features, and a support license. Couchbase is software you can run in the cloud on a VM or in your own data center. CosmosDb is a managed cloud service, but there is an emulator you can run locally. Transactions: if you can use nested modeling, you don't need multi-document transactions.