Semi Formal Model for Document Oriented Databases
Daniel Coupal
Universia.com
1
Agenda
1. Why Have a Model?
2. Modeling Steps
3. Capturing the Model
4. Tools
2
Why Have a Model?
• Documentation, common language
• Repeatable process
• Abstraction from database implementations
• Support for tools
• But a document DB is supposed to be “schemaless”!
• No! Having a schema is a good thing.
The problem is having to declare everything.
3
What if you have many apps?
Info about the schema is in the code of Application A.
Application B wants to read the data in the DB.
Where is the description of what it can read, write, ...?
4
Why do we choose NoSQL?
• Rewards
• Huge amount of data
• Cheap hardware
• Blazing fast
5
Why do we choose NoSQL?
• Rewards
• Huge amount of data
• Cheap hardware
• Blazing fast
• Compromises
• No joins, no transactions, less integrity
• Less mature technology
• Fewer tools
6
Tradeoff between Performance and Data Integrity
NoSQL Little Secrets
• Little experience maintaining
databases and apps over the
years, which is the most
expensive activity in software
development.
• Not all of today’s vendors will
still be around in a few years.
• What if your DB is no longer
maintained?
• What if a better DB becomes
available?
7
NoSQL State of the Art
• Designing by Example
• Used in most tutorials
• Works well on small examples, like blogs
• A database with more tables needs a better way
to capture the design
8
{
    "_id" : ObjectId("508d27069cc1ae293b36928d"),
    "title" : "This is the title",
    "body" : "This is the body text.",
    "tags" : [
        "chocolate",
        "spleen",
        "piano",
        "spatula"
    ],
    "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
    "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
    "category_id" : ObjectId("508d29709cc1ae293b369295"),
    "comments" : [
        {
            "subject" : "This is comment 1",
            "body" : "This is the body of comment 1.",
            "author_id" : ObjectId("508d345f9cc1ae293b369296"),
            "created_date" : ISODate("2012-10-28T13:34:23.929Z")
        },
        {
            "subject" : "This is comment 2",
            "body" : "This is the body of comment 2.",
            "author_id" : ObjectId("508d34739cc1ae293b369297"),
            "created_date" : ISODate("2012-10-28T13:34:43.192Z")
        }
    ]
}
9
NoSQL State of the Art
Complex ER Diagram
10
Northwind ER Diagram
11
Northwind Doc Diagram
11 tables in those 5 collections
No need for:
- CustomerCustomerDemographics
- EmployeeTerritories
because they are N-to-N relationships,
and don’t contain any data
[Diagram: the Northwind tables Products, Suppliers, Orders, Employees, Customers, CustomerDemographics, Shippers, OrderDetails, Region, Categories, and Territories grouped into collections]
12
That was a bad example...
• Why?
13
That was a bad example...
• Why?
• With a document database, you don’t model
data as your first step!
• Data is modeled based on usage.
• SQL’s model-first approach targets general usage, which leads to
poor performance for every app.
NoSQL does the opposite: it models for the current usage.
14
Modeling Steps
                SQL                       NoSQL
Goal            general usage             current usage
Answer to       what answer do I have?    what questions do I have?
Step 1          model data                write queries
Step 2          write application         add indexes
Step 3          write queries             model data
Step 4          add indexes               write application
15
Step 1: Write Queries
• Basic fields to retrieve
• Frequency of the query, requested speed
• Criticality of the query for the system
• Design notes
➡ Sort the queries by importance
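A minimal Python sketch of how such a query list could be captured and sorted. The `QuerySpec` fields, the 1-to-3 criticality scale, and the sort order (criticality first, then frequency, then requested speed) are assumptions made for illustration; only REQ002 comes from the slides, the other entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuerySpec:
    id: str
    name: str
    per_day: int           # expected frequency of the query
    max_latency_ms: float  # requested speed
    criticality: int       # 1 = most critical (hypothetical scale)


def sort_by_importance(queries):
    # Most critical first; ties go to the more frequent query,
    # then to the one with the tighter latency target.
    return sorted(
        queries,
        key=lambda q: (q.criticality, -q.per_day, q.max_latency_ms),
    )


queries = [
    QuerySpec("REQ001", "Get order by customer", 5000, 10.0, 2),
    QuerySpec("REQ002", "Get product by name", 20000, 2.0, 1),
    QuerySpec("REQ003", "List suppliers", 100, 50.0, 3),
]

ranked = sort_by_importance(queries)
for q in ranked:
    print(q.id, q.name)
```

The ranking rule is the part most worth adapting: a team may well weight frequency above criticality.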
16
Step 2: Add Indexes
• Which indexes do you need for the queries to go
fast?
• Attributes of your indexes
17
Step 3: Model Data
• List the collections
• How many documents per collection?
➡ NoSQL is all about size and performance, no?
• Attributes on the collections (capped, ...)
• List the fields, their types, constraints
➡ Only for the important fields
18
Step 4: Write Application
• Integration code/driver/queries/database
• Balance between using the product functionality and
isolating the layer that deals with the database.
• Interesting new tools to normalize to a common
query language: JSONiq, BigSQL, ...
19
Capturing the Model
• JSON is a cool format!
• Your document database is a cool storage facility!
• Language for the model: JSON Schema
• supports things like: types, cardinality, references, acceptable values, ...
20
JSON Schema
JSON document:
{
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        }
    ]
}
JSON Schema:
{
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string"
                },
                "streetAddress": {
                    "type": "string"
                }
            }
        },
        "phoneNumber": {
            "type": "array",
            "items": {
                "properties": {
                    "number": {
                        "type": "string"
                    },
                    "type": {
                        "type": "string"
                    }
                }
            }
        }
    }
}
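To make the relationship between the document and its schema concrete, here is a small Python sketch that checks a document against the three keywords used above (`type`, `properties`, `items`). It is a hand-rolled illustration, not a JSON Schema implementation: the `validate` function and its error format are assumptions, and a real project would more likely use an existing validator library.

```python
# Maps the JSON Schema type names used in the slides to Python checks.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the document passes."""
    errors = []
    expected = schema.get("type")
    if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](instance):
        errors.append(f"{path}: expected {expected}")
        return errors
    if isinstance(instance, dict):
        # Only declared properties that are present get checked.
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], sub, f"{path}.{name}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "streetAddress": {"type": "string"},
            },
        },
        "phoneNumber": {
            "type": "array",
            "items": {
                "properties": {
                    "number": {"type": "string"},
                    "type": {"type": "string"},
                }
            },
        },
    },
}

person = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

print(validate(person, schema))
print(validate({"address": {"city": 10001}}, schema))
```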
21
Model: Query
• Use:
  • the native DB notation
  • or use SQL (everyone can read SQL)
• Avoid joins!!!
• Example:
  • Product by ProductID, ProductName, SupplierID
  • Order by OrderID, CustomerID, ContactName
  • Customer by CustomerID, ContactName, OrderID
22
Example
23
{
    "id" : "REQ002",
    "name" : "Get product by name",
    "n" : "20000/day",
    "t" : "2 ms",
    "notes" : [
        "User asking about a product's availability by product name"
    ],
    "sqlquery" : "select * from product where product.ProductName = 'abcde'",
    "mongoquery" : {
        "ProductName" : "abcde"
    }
}
Model: Index
• Again, use the native DB notation
• Example:
  • Product.ProductID, .ProductName, .SupplierID
  • Order.OrderID, .CustomerID, .ContactName
  • Customer.CustomerID, .ContactName, .OrderID
• Why is this useful when it looks so trivial?
  • Once it is written down, a tool can validate it or create estimates
24
Example
25
{
    "id" : "REQ002",
    "name" : "Get product by name",
    "n" : "20000/day",
    "t" : "2 ms",
    "notes" : [
        "User asking about a product's availability by product name"
    ],
    "sqlquery" : "select * from product where product.ProductName = 'abcde'",
    "mongoquery" : {
        "ProductName" : "abcde"
    },
    "index" : {
        "collection" : "Products",
        "field" : "ProductName"
    }
}
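As an illustration of the kind of tool a written-down model enables, the Python sketch below reads an entry like the one above, reports queried fields that no declared index covers, and turns the `n` rate into a queries-per-second estimate. The function names and the accepted rate units other than `day` are assumptions, not part of the slides.

```python
SECONDS_PER_UNIT = {"s": 1, "min": 60, "hour": 3600, "day": 86400}  # only "day" appears in the slides


def queries_per_second(rate):
    """Convert a rate string such as '20000/day' into queries per second."""
    count, unit = rate.split("/")
    return int(count) / SECONDS_PER_UNIT[unit.strip()]


def missing_index_fields(entry):
    """Return the queried fields that the entry's declared index does not cover."""
    queried = set(entry.get("mongoquery", {}))
    index = entry.get("index")
    indexed = {index["field"]} if index else set()
    return sorted(queried - indexed)


req002 = {
    "id": "REQ002",
    "name": "Get product by name",
    "n": "20000/day",
    "t": "2 ms",
    "mongoquery": {"ProductName": "abcde"},
    "index": {"collection": "Products", "field": "ProductName"},
}

print(missing_index_fields(req002))
print(round(queries_per_second(req002["n"]), 3))
```

Dropping the `index` key from the entry makes `missing_index_fields` report `ProductName`, which is the sort of gap such a check is meant to catch.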
Model: Data
• Collection
• One JSON-Schema document per collection
• Fields for collection and database
• Optionally, add a version number
26
Example for ‘Orders’
27
{
    "database" : "northwind",
    "collection" : "Orders",
    "version" : 1,
    "type": "object",
    "$schema": "http://json-schema.org/draft-03/schema",
    "id": "http://jsonschema.net",
    "properties": {
        "CustomerID": {
            "type": "string",
            "id": "http://jsonschema.net/CustomerID"
        },
        "Details": {
            "type": "array",
            "id": "http://jsonschema.net/Details",
            "items":
            {
                "type": "object",
                "id": "http://jsonschema.net/Details/0",
                "required": [ "ProductID", "Quantity" ],
                "properties": {
                    "ProductID": {
                        "type": "number",
                        "id": "http://jsonschema.net/Details/0/ProductID"
                    },
                    "Quantity": {
                        "type": "number"
                    },
                    ...
Simpler...
28
{
    "database" : "northwind",
    "collection" : "Orders",
    "version" : 1,
    "type": "object",
    "properties": {
        "CustomerID": {
            "type": "string"
        },
        "Details": {
            "type": "array",
            "items":
            {
                "type": "object",
                "properties": {
                    "ProductID": {
                        "type": "number"
                    },
                    "Quantity": {
                        "type": "number"
                    },
                    ...
Model: Versioning
• Each modified version of a
collection is a new document
• db.<database>.find({"version": 2})
➡ shows all collections for version
‘2’ of the schema for the DB.
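The same lookup can be sketched in plain Python over a list of schema documents, which is convenient for tooling that works from exported model files. The helper names and the sample documents are hypothetical; only the `database`, `collection`, and `version` fields come from the slides.

```python
def collections_for_version(schema_docs, version):
    """Names of the collections that have a schema document at this version."""
    return sorted(d["collection"] for d in schema_docs if d.get("version") == version)


def latest_schema(schema_docs, collection):
    """The highest-version schema document for one collection, or None."""
    candidates = [d for d in schema_docs if d["collection"] == collection]
    return max(candidates, key=lambda d: d["version"]) if candidates else None


schema_docs = [
    {"database": "northwind", "collection": "Orders", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Orders", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Products", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Products", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Customers", "version": 1, "type": "object"},
]

print(collections_for_version(schema_docs, 2))
print(latest_schema(schema_docs, "Customers")["version"])
```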
29
Partial Schema
• Example: you only want to validate the ‘version’ field, which
appears as a string in some documents and as a number in others
30
JSON Schema:
{
    "type": "object",
    "properties": {
        "version": {
            "type": "string"
        }
    }
}
JSON:
{
    "version": 1.0,
    ...
},
{
    "version": "1.0.1",
    ...
}
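A short Python sketch of what a partial check does with these two documents: only the fields declared in the schema are examined, and everything else in the document is ignored. The function name, the error wording, and the extra `title` and `extra` fields are assumptions for illustration.

```python
JSON_TYPES = {"string": str, "number": (int, float)}

partial_schema = {
    "type": "object",
    "properties": {"version": {"type": "string"}},
}


def check_declared_fields(doc, schema):
    """Check only the fields the schema declares; undeclared fields are ignored."""
    problems = {}
    for field, rule in schema["properties"].items():
        if field in doc and not isinstance(doc[field], JSON_TYPES[rule["type"]]):
            problems[field] = (
                f"expected {rule['type']}, got {type(doc[field]).__name__}"
            )
    return problems


numeric_version = {"version": 1.0, "title": "undeclared fields are not checked"}
string_version = {"version": "1.0.1", "extra": 3}

print(check_declared_fields(numeric_version, partial_schema))
print(check_declared_fields(string_version, partial_schema))
```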
Tools
• Get some JSON Schema from JSON:
  • http://www.jsonschema.net/
• Validate your schema:
  • http://jsonschemalint.com/
  • https://github.com/dcoupal/godbtools.git
• Validate/edit JSON:
  • http://jsonlint.com/ or RoboMongo
• Import SQL into NoSQL:
  • Pentaho, Talend
31
Tools considerations
• NoSQL often relies on data being in RAM.
Scanning all your data can push the “hot” part of your
dataset out of memory, leaving it “cold”.
• Running incremental validations works better; make sure
you have timestamps on insertions and updates.
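A minimal Python sketch of an incremental validation pass, assuming every document carries an `updated_at` timestamp (a hypothetical field name) and that the time of the previous run is stored somewhere. Against a real database the filter would be a query on an indexed timestamp field rather than a scan of an in-memory list.

```python
from datetime import datetime, timezone


def changed_since(docs, last_run):
    """Documents inserted or updated after the previous validation run."""
    return [d for d in docs if d["updated_at"] > last_run]


def incremental_validate(docs, last_run, is_valid):
    """Validate only recently touched documents; return the ids that fail."""
    return [d["_id"] for d in changed_since(docs, last_run) if not is_valid(d)]


docs = [
    {"_id": 1, "version": "1.0", "updated_at": datetime(2013, 1, 1, tzinfo=timezone.utc)},
    {"_id": 2, "version": 1.0, "updated_at": datetime(2013, 3, 1, tzinfo=timezone.utc)},
    {"_id": 3, "version": "1.0.1", "updated_at": datetime(2013, 3, 2, tzinfo=timezone.utc)},
]
last_run = datetime(2013, 2, 1, tzinfo=timezone.utc)

# Example rule: 'version' must be a string.
failures = incremental_validate(docs, last_run, lambda d: isinstance(d["version"], str))
print(failures)
```

Document 1 is never touched by this run, which is the point: data that has not changed since the last pass stays where it is.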
32
Document Validator
33
[Diagram: Schema (JSON Schema) and Collection (JSON) as inputs to the Validator]
“Eventual Integrity”
• NoSQL databases offer eventual consistency
• With tools that validate and fix the data according
to a set of rules, we get “eventual integrity”
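One way to picture “validate and fix” is a set of small repair rules applied to each document, sketched below in Python. The rule shown (coerce a numeric `version` to a string) and the `repair` helper are hypothetical examples, not rules taken from the slides.

```python
import copy


def version_to_string(doc):
    """Rule: 'version' must be a string; numbers are coerced in place."""
    value = doc.get("version")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        doc["version"] = str(value)
        return "version coerced to string"
    return None


def repair(doc, rules):
    """Apply each rule to a copy of the document; return the copy and the fixes made."""
    fixed = copy.deepcopy(doc)
    applied = []
    for rule in rules:
        message = rule(fixed)
        if message is not None:
            applied.append(message)
    return fixed, applied


fixed, applied = repair({"version": 1.0}, [version_to_string])
print(fixed, applied)
```

Returning the list of applied fixes keeps a record of what the tool changed, which helps when a rule turns out to be wrong.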
34
Tools to be developed
• UI to manipulate a schema graphically
• More Complete Validators:
• constraints
• relationships
• Per-language libraries to validate inserted/updated
documents
35
Conclusion: Takeaways
• Design in this order:
queries, indexes, data,
application.
• Capture your model
outside the application.
• Not having a schema is
not a good thing!
Use the attribute
‘schemaless’ wisely!
36
                NoSQL
Goal            current usage
Answer to       what questions do I have?
Step 1          write queries
Step 2          add indexes
Step 3          model data
Step 4          write application
Questions?
• dcoupal@universia.com
37

