Semi Formal Model for Document Oriented Databases

Daniel Coupal
Universia.com

Agenda
1. Why Have a Model?
2. Modeling Steps
3. Capturing the Model
4. Tools

Why Have a Model?
• Documentation, common language
• Repeatable process
• Abstraction from database implementations
• Support for tools
• But a document DB is supposed to be “schemaless”!
• No! Having a schema is a good thing.
  Having to declare everything up front is the problem.

What if you have many apps?
• Info about the schema is in the code of Application A.
• Application B wants to read the data in the DB.
• Where is the description of what it can read, write, ...?

Why do we choose NoSQL?
• Rewards
  • Huge amounts of data
  • Cheap hardware
  • Blazing fast
• Compromises
  • No joins, no transactions, less integrity
  • Less mature technology
  • Fewer tools
Tradeoff between Performance and Data Integrity

NoSQL Little Secrets
• Little experience maintaining databases and apps over the years, which is the most expensive activity in software development.
• Not all of today’s vendors will still be around in a few years.
• What if your DB is no longer maintained?
• What if a better DB becomes available?

NoSQL State of the Art
• Designing by Example
• Used in most tutorials
• Works well on small examples, like blogs
• Databases with more tables need a better way to capture the design

NoSQL State of the Art
{
  "_id" : ObjectId("508d27069cc1ae293b36928d"),
  "title" : "This is the title",
  "body" : "This is the body text.",
  "tags" : [
    "chocolate",
    "spleen",
    "piano",
    "spatula"
  ],
  "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
  "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
  "category_id" : ObjectId("508d29709cc1ae293b369295"),
  "comments" : [
    {
      "subject" : "This is comment 1",
      "body" : "This is the body of comment 1.",
      "author_id" : ObjectId("508d345f9cc1ae293b369296"),
      "created_date" : ISODate("2012-10-28T13:34:23.929Z")
    },
    {
      "subject" : "This is comment 2",
      "body" : "This is the body of comment 2.",
      "author_id" : ObjectId("508d34739cc1ae293b369297"),
      "created_date" : ISODate("2012-10-28T13:34:43.192Z")
    }
  ]
}

Northwind Doc Diagram
11 tables in 5 collections (grouping shown in the diagram)
No need for:
- CustomerCustomerDemographics
- EmployeeTerritories
because they are N-to-N relationships and don’t contain any data.
[Diagram: the tables Products, Suppliers, Orders, Employees, Customers, CustomerDemographics, Shippers, OrderDetails, Region, Categories and Territories, grouped into 5 collections]

That was a bad example...
• Why?
• With a document database, you don’t model data as your first step!
• Data is modeled based on usage.
• SQL’s model-first approach serves general usage, so no single app gets the best performance.
  NoSQL does the opposite.

Modeling Steps
            SQL                      NoSQL
Goal        general usage            current usage
Answer to   what answer do I have?   what questions do I have?
Step 1      model data               write queries
Step 2      write application        add indexes
Step 3      write queries            model data
Step 4      add indexes              write application

Step 1: Write Queries
• Basic fields to retrieve
• Frequency of the query, required response time
• Criticality of the query for the system
• Design notes
➡ Sort the queries by importance
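A minimal sketch of the query inventory from this step, with the queries sorted by importance. The class, its field names and the sort rule are assumptions made for illustration; only REQ002 and its figures (20000/day, 2 ms) come from the slides, the other two entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuerySpec:
    """One entry of the query inventory (field names are illustrative)."""
    id: str
    name: str
    fields: list          # basic fields to retrieve
    calls_per_day: int    # frequency of the query
    target_ms: float      # required response time
    criticality: int      # assumed scale: 1 = critical, 3 = nice to have
    notes: str = ""       # design notes


def by_importance(queries):
    """Most critical first; ties broken by frequency, then by tighter latency target."""
    return sorted(queries, key=lambda q: (q.criticality, -q.calls_per_day, q.target_ms))


queries = [
    # REQ001 and REQ003 are hypothetical entries added for the example.
    QuerySpec("REQ001", "Get order by id", ["OrderID", "CustomerID"], 5_000, 10, 1),
    QuerySpec("REQ002", "Get product by name", ["ProductName"], 20_000, 2, 1),
    QuerySpec("REQ003", "Monthly sales report", ["OrderID"], 1, 60_000, 3),
]

ranked = by_importance(queries)
for q in ranked:
    print(q.id, q.name)
```

The sort key is one possible definition of "importance"; a team may well weight criticality and frequency differently.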

Step 2: Add Indexes
• Which indexes do you need to make the queries fast?
• Attributes of your indexes

Step 3: Model Data
• List the collections
• How many documents per collection?
➡ NoSQL is all about size and performance, no?
• Attributes on the collections (capped, ...)
• List the fields, their types, constraints
➡ Only for the important fields

Step 4: Write Application
• Integrate code, driver, queries, and database
• Balance using the product’s functionality against isolating the layer that deals with the database.
• Interesting new tools normalize to a common query language: JSONiq, BigSQL, ...

Capturing the Model
• JSON is a cool format!
• Your document database is a cool storage facility!
• Language for the model: JSON Schema
  • Supports things like types, cardinality, references, acceptable values, ...

JSON Schema
JSON:
{
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    }
  ]
}
JSON Schema:
{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string"
        },
        "streetAddress": {
          "type": "string"
        }
      }
    },
    "phoneNumber": {
      "type": "array",
      "items": {
        "properties": {
          "number": {
            "type": "string"
          },
          "type": {
            "type": "string"
          }
        }
      }
    }
  }
}
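To make the relationship between the JSON document and its schema concrete, here is a small sketch of a validator that checks the address example against the schema shown on this slide. It is not a JSON Schema implementation: it is a hand-written function that understands only `type`, `properties` and `items`, and the function name `validate` and its error format are assumptions made for this example.

```python
# Maps JSON Schema type names to Python types (a small subset only).
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, TYPES[expected])
        # In Python, bool is a subclass of int, so exclude it from numeric types.
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in value:  # fields absent from the document are not checked
                errors += validate(value[name], subschema, f"{path}.{name}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    return errors


schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "streetAddress": {"type": "string"},
            },
        },
        "phoneNumber": {
            "type": "array",
            "items": {
                "properties": {
                    "number": {"type": "string"},
                    "type": {"type": "string"},
                }
            },
        },
    },
}

document = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

# A deliberately broken copy: the phone number is stored as a number.
broken = {"phoneNumber": [{"type": "home", "number": 2125551234}]}

print(validate(document, schema))
print(validate(broken, schema))
```

In practice an existing JSON Schema library would replace the hand-written function; the sketch only shows what "the document conforms to the schema" means.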

Model: Query
• Use:
• the native DB notation
• or use SQL (everyone can read SQL)
• Avoid joins!!!
• Example:
• Product by ProductID, ProductName, SupplierID
• Order by OrderID, CustomerID, ContactName
• Customer by CustomerID, ContactName, OrderID

Example
{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about a product availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  }
}

Model: Index
• Again, use the native DB notation
• Example:
• Product.ProductID, .ProductName, .SupplierID
• Order.OrderID, .CustomerID, .ContactName
• Customer.CustomerID, .ContactName, .OrderID
• Why is this useful when it looks so trivial?
• Once it is written down, a tool can validate it or create estimates.

Example
{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about a product availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  },
  "index" : {
    "collection" : "Products",
    "field" : "ProductName"
  }
}
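As an illustration of what "a tool can validate it" could look like, the sketch below checks that every field a query filters on has a declared index. The function name, the top-level `collection` key on each requirement, and the REQ003 entry are assumptions added for this example; the slides only define the `mongoquery` and `index` entries.

```python
def missing_indexes(requirements):
    """Return the ids of requirements that filter on a field with no declared index."""
    # Collect every (collection, field) pair that some requirement declares an index for.
    indexed = {
        (r["index"]["collection"], r["index"]["field"])
        for r in requirements
        if "index" in r
    }
    missing = []
    for r in requirements:
        for field in r["mongoquery"]:
            if (r["collection"], field) not in indexed:
                missing.append(r["id"])
                break  # report each requirement once
    return missing


requirements = [
    {
        "id": "REQ002",
        "name": "Get product by name",
        "collection": "Products",  # assumed key, not on the slide
        "mongoquery": {"ProductName": "abcde"},
        "index": {"collection": "Products", "field": "ProductName"},
    },
    {
        # Hypothetical requirement with no index declared.
        "id": "REQ003",
        "name": "Get products by supplier",
        "collection": "Products",
        "mongoquery": {"SupplierID": 7},
    },
]

print(missing_indexes(requirements))
```

The same inventory could feed rough estimates, for example by summing the "n" values of all queries served by one index, but that is left out here.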

Model: Data
• Collection
• One JSON-Schema document per collection
• Fields for collection and database
• Optionally, add a version number

Example for ‘Orders’
{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type": "object",
  "$schema": "http://json-schema.org/draft-03/schema",
  "id": "http://jsonschema.net",
  "properties": {
    "CustomerID": {
      "type": "string",
      "id": "http://jsonschema.net/CustomerID"
    },
    "Details": {
      "type": "array",
      "id": "http://jsonschema.net/Details",
      "items":
      {
        "type": "object",
        "id": "http://jsonschema.net/Details/0",
        "required": [ "ProductID", "Quantity" ],
        "properties": {
          "ProductID": {
            "type": "number",
            "id": "http://jsonschema.net/Details/0/ProductID"
          },
          "Quantity": {
            "type": "number"
          },
          ...

Simpler...
{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type": "object",
  "properties": {
    "CustomerID": {
      "type": "string"
    },
    "Details": {
      "type": "array",
      "items":
      {
        "type": "object",
        "properties": {
          "ProductID": {
            "type": "number"
          },
          "Quantity": {
            "type": "number"
          },
          ...

Model: Versioning
• Each modified version of a collection’s schema is a new document
• db.<database>.find({"version": 2})
➡ shows all collections for version ‘2’ of the schema for the DB.
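The versioning idea can be tried without a database. The sketch below keeps schema documents in a plain Python list and mimics an equality-only `find`; the helper names `find` and `latest` and the sample schema documents are assumptions for illustration, not part of any driver API.

```python
# Schema documents as they might be stored, one per collection and version.
# Only the bookkeeping fields are shown; the "properties" are omitted.
schemas = [
    {"database": "northwind", "collection": "Orders", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Orders", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Products", "version": 2, "type": "object"},
]


def find(documents, criteria):
    """Equality-only filter, mimicking the simplest form of a find() query."""
    return [d for d in documents if all(d.get(k) == v for k, v in criteria.items())]


def latest(documents):
    """Map each collection name to its highest schema version."""
    result = {}
    for d in documents:
        name = d["collection"]
        result[name] = max(result.get(name, d["version"]), d["version"])
    return result


v2 = find(schemas, {"version": 2})
print([s["collection"] for s in v2])
print(latest(schemas))
```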

Partial Schema
• Example: you just want to validate the ‘version’ field, which holds both ‘string’ and ‘number’ values
JSON Schema:
{
  "type": "object",
  "properties": {
    "version": {
      "type": "string"
    }
  }
}
JSON:
{
  "version": 1.0,
  ...
},
{
  "version": "1.0.1",
  ...
}

Tools
• Generate a JSON Schema from JSON:
  • http://www.jsonschema.net/
• Validate your schema:
  • http://jsonschemalint.com/
  • https://github.com/dcoupal/godbtools.git
• Validate/edit JSON:
  • http://jsonlint.com/ or RoboMongo
• Import SQL into NoSQL:
  • Pentaho, Talend

Tools considerations
• NoSQL often relies on data being in RAM. Scanning all your data can turn your in-memory dataset “cold” instead of “hot”.
• Running incremental validations works better; make sure you have timestamps on insertions and updates.
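A rough sketch of incremental validation: only documents touched since the previous run are checked, so the rest of the dataset is never scanned. The `updated_at` field name, the checkpoint handling and the sample rule are assumptions; in a real database the filter would be a range query on an indexed timestamp field.

```python
from datetime import datetime, timezone


def changed_since(documents, last_run):
    """Yield only documents inserted or updated after the previous validation run."""
    for doc in documents:
        if doc["updated_at"] > last_run:
            yield doc


def incremental_validate(documents, last_run, is_valid):
    """Return the ids of changed documents that fail the is_valid check."""
    return [d["_id"] for d in changed_since(documents, last_run) if not is_valid(d)]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


docs = [
    # Old and invalid, but not rescanned because it predates the last run.
    {"_id": 1, "version": 1.0, "updated_at": utc(2013, 1, 1)},
    {"_id": 2, "version": "1.0.1", "updated_at": utc(2013, 3, 1)},
    {"_id": 3, "version": 2, "updated_at": utc(2013, 3, 2)},
]

# Sample rule (assumed): 'version' must be stored as a string.
def version_is_string(doc):
    return isinstance(doc.get("version"), str)

invalid = incremental_validate(docs, utc(2013, 2, 1), version_is_string)
print(invalid)
```

Document 1 stays unreported on purpose: an incremental pass trades completeness for keeping the working set hot, so an occasional full scan may still be needed.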

Document Validator
[Diagram: Schema (JSON Schema) and Collection (JSON) are the inputs to the Validator]

“Eventual Integrity”
• NoSQL databases have eventual consistency
• With tools that validate and fix the data according to a set of rules, we get “eventual integrity”
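One way to picture “eventual integrity” is a repair pass that applies a list of rules to each document and fixes what it can. The sketch below uses the mixed ‘version’ values from the partial-schema example; the rule, the function names and the choice to convert numbers with `str()` are all assumptions for illustration.

```python
def fix_version(doc):
    """Rule: 'version' must be a string; numeric values are converted in place.

    Returns True when the document was modified.
    """
    value = doc.get("version")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        doc["version"] = str(value)
        return True
    return False


RULES = [fix_version]  # more rules can be appended as the model grows


def repair(documents, rules=RULES):
    """Apply every rule to every document; return the number of fixes made."""
    return sum(rule(doc) for doc in documents for rule in rules)


docs = [{"version": 1.0}, {"version": "1.0.1"}]
fixed = repair(docs)
print(fixed, docs)
```

Running the pass a second time changes nothing, which is the property that lets the data converge toward the schema over repeated runs.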

Tools to be developed
• UI to manipulate a schema graphically
• More complete validators:
  • constraints
  • relationships
• Per-language libraries to validate inserted/updated documents

Conclusion: Take Aways
• Design in this order: queries, indexes, data, application.
• Capture your model outside the application.
• Not having a schema is not a good thing! Use the attribute ‘schemaless’ wisely!

            NoSQL
Goal        current usage
Answer to   what questions do I have?
Step 1      write queries
Step 2      add indexes
Step 3      model data
Step 4      write application

Questions?
• dcoupal@universia.com
