Christian Amor Kvalheim (MongoDB Staff Engineer)
From SQL to MongoDB
How to get from A to B in a reasonably ordered fashion
What’s Up
❖ The Challenge
❖ Explicit Schema
❖ Implicit Schema
❖ Rules of Thumb
❖ Summary
The Challenge
Take an existing SQL Schema and pick an appropriate MongoDB Schema
Our Example SQL Schema
Explicit Schema
❖ Table structure definition
❖ Primary Key definition
❖ Foreign Key relationships
  ❖ 1:n
  ❖ 1:1
  ❖ n:m
Implicit Schema
❖ The SQL Schema as expressed by the following operations and their associated metadata
  ❖ Insert operations
  ❖ Update operations
  ❖ Select operations
  ❖ Join relationships
With Explicit Schema Only
Relationships
[Figure: example schema diagram; each foreign key relationship is labeled with its 1 side and its n side]
No duplication of Data
[Figure: example schema diagram; each foreign key relationship is labeled with its 1 side and its n side]
Collections
❖ Customers
  ❖ Payments [array]
❖ Orders
❖ Orderdetails
❖ Employees
❖ Products
❖ Productlines
❖ Offices
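A minimal sketch of the layout above, assuming Payments are embedded in each customer as an array while Orders stay in their own collection and refer back to the customer. The helper `embed_payments` and all field names (`customerNumber`, `checkNumber`, `amount`) are illustrative assumptions, not taken from the slides.

```python
def embed_payments(customer_row, payment_rows):
    """Return a customer document with its payments embedded as an array.

    All field names are assumed for illustration.
    """
    document = dict(customer_row)
    document["payments"] = [
        # The customer key is implied by the parent document, so drop it.
        {key: value for key, value in payment.items() if key != "customerNumber"}
        for payment in payment_rows
        if payment["customerNumber"] == customer_row["customerNumber"]
    ]
    return document


customer = {"customerNumber": 1, "customerName": "Example Customer"}
payments = [
    {"customerNumber": 1, "checkNumber": "A1", "amount": 100.0},
    {"customerNumber": 2, "checkNumber": "B7", "amount": 250.0},
]
# Orders are not embedded: they keep a reference to the customer instead.
order = {"orderNumber": 10, "customerNumber": 1}

customer_document = embed_payments(customer, payments)
```

Nothing is duplicated here: each payment lives in exactly one customer document, and everything else is still linked by key.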
What if we allow duplication ?
[Figure: example schema diagram; each foreign key relationship is labeled with its 1 side and its n side]
Collections
❖ Customers
  ❖ Payments [array]
❖ Orders
  ❖ Orderdetails [array]
    ❖ Products [document]
      ❖ Productlines [document]
❖ Offices
❖ Products
❖ Productlines
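One way to picture the duplicated layout is a helper that rolls an order up into a single document, copying the product and its product line into each order detail. This is only a sketch: `roll_up_order` and every field name are assumptions made for illustration.

```python
import copy


def roll_up_order(order, details, products, productlines):
    """Build one order document that embeds copies of everything it needs."""
    items = []
    for detail in details:
        # Copy the product so the order keeps its own version of it.
        product = copy.deepcopy(products[detail["productCode"]])
        # Replace the product line key with a copy of the product line itself.
        product["productLine"] = copy.deepcopy(productlines[product["productLine"]])
        items.append({"quantityOrdered": detail["quantityOrdered"], "product": product})
    document = dict(order)
    document["orderdetails"] = items
    return document


# Hypothetical rows, keyed the way the relational tables would be.
productlines = {"Cars": {"name": "Cars"}}
products = {"P1": {"productCode": "P1", "productName": "Example Car", "productLine": "Cars"}}
order = {"orderNumber": 10, "status": "Shipped"}
details = [{"orderNumber": 10, "productCode": "P1", "quantityOrdered": 3}]

order_document = roll_up_order(order, details, products, productlines)
```

Products and Productlines still exist as their own collections; the copies inside the order are the duplication the slide refers to.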
Important Notes
❖ Foreign key relationships are in most cases not representative of application-level queries
❖ Cannot discover the degree of mutability by looking at the SQL schema in isolation
❖ Cannot know the average size of n in the 1:n relationships
Implicit Schema
❖ The Implicit Schema represents the SQL operations executed against the relational schema (the Application Schema)
❖ Can vary hugely from the foreign key relationships
❖ Expresses read vs. write ratios for tables
❖ Can be used to deduce entity mutability
❖ Can be used to estimate n in the 1:n relationships
Example - SELECT
❖ SELECT * FROM orders, orderdetails, products WHERE … [1000]
❖ SELECT * FROM offices, employees WHERE … [100]
❖ SELECT * FROM productlines, products WHERE … [2000]
❖ SELECT * FROM products WHERE … [4000]
❖ SELECT * FROM employees, customers WHERE … [200]
❖ SELECT * FROM customers, orders WHERE … [200]
What We Can Learn
❖ The frequency of the SQL operations
❖ The Application Schema relationships, by studying the join relationships
❖ If the logs include the number of rows returned, we can estimate the size of n in the 1:n relationships
❖ We can also calculate the rate of growth of n over time
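As a sketch of the last two points, assume each log entry for a join gives the day it ran and the number of rows it returned for one parent entity. The sample values below are hypothetical, and the growth estimate is a simple first-to-last slope rather than a proper fit.

```python
from statistics import mean


def average_n(samples):
    """Average number of child rows seen per query."""
    return mean(rows for _, rows in samples)


def growth_per_day(samples):
    """Change in n per day between the first and the last sample."""
    ordered = sorted(samples)
    (first_day, first_n), (last_day, last_n) = ordered[0], ordered[-1]
    if last_day == first_day:
        raise ValueError("need samples from at least two different days")
    return (last_n - first_n) / (last_day - first_day)


# Hypothetical (day, rows returned) observations for one customer's payments.
samples = [(0, 3), (5, 4), (10, 5)]

n_estimate = average_n(samples)
growth = growth_per_day(samples)
```

With these made-up numbers the relationship grows by one row every five days, the kind of annotation the schema diagrams carry.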
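[Figure: example schema diagram with the n side of each relationship annotated with its estimated size; extracted labels: ~5 (+1 every 100 min), ~12, ~10001, ~15, ~10, ~20, ~6]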
Example - INSERT/UPDATE
❖ INSERT INTO orders (…) VALUES (…)
❖ INSERT INTO orderdetails (…) VALUES (…)
❖ UPDATE orders SET … WHERE orderNumber = 1
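Write statements like these, set against the SELECTs, give the read vs. write ratio per table. A minimal sketch, assuming one statement per log line and looking only at the first table each statement names; the `status` column and the sample statements are hypothetical.

```python
import re
from collections import defaultdict

# Where the first table name sits for each kind of statement.
TABLE_AFTER = {
    "SELECT": re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE),
    "INSERT": re.compile(r"\bINTO\s+(\w+)", re.IGNORECASE),
    "UPDATE": re.compile(r"\bUPDATE\s+(\w+)", re.IGNORECASE),
}


def read_write_counts(statements):
    """Count reads and writes per table (first table of each statement only)."""
    counts = defaultdict(lambda: {"reads": 0, "writes": 0})
    for statement in statements:
        verb = statement.split(None, 1)[0].upper()
        pattern = TABLE_AFTER.get(verb)
        match = pattern.search(statement) if pattern else None
        if match is None:
            continue  # statement kind not covered by this sketch
        kind = "reads" if verb == "SELECT" else "writes"
        counts[match.group(1).lower()][kind] += 1
    return dict(counts)


statements = [
    "INSERT INTO orders (orderNumber) VALUES (1)",
    "INSERT INTO orderdetails (orderNumber) VALUES (1)",
    "UPDATE orders SET status = 'Shipped' WHERE orderNumber = 1",
    "SELECT * FROM orders WHERE orderNumber = 1",
]

counts = read_write_counts(statements)
```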
Data Islands
Single Item Mutability Rate
❖ Single Item Mutability Rate (SIMR)
  ❖ How much an entity mutates in a given time period
❖ A low mutation rate
  ❖ Entity reaches a stable state and is a good candidate for rolling up into a single document
  ❖ Duplication of data is ok as the document is a snapshot in time
❖ A high mutation rate
  ❖ Entity does not reach a stable state, keeps mutating, and might not be a good candidate for rollup
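The slides do not give a formula for SIMR; one plausible reading is "mutations per unit of time within a window", sketched below. The function name and the timestamps are assumptions for illustration.

```python
def mutation_rate(mutation_times, start, end):
    """Mutations per unit of time in the half-open window [start, end)."""
    if end <= start:
        raise ValueError("window must have positive length")
    hits = sum(1 for t in mutation_times if start <= t < end)
    return hits / (end - start)


# Hypothetical mutation timestamps for two entities, in arbitrary time units.
settles_down = [0, 1, 2, 3]
keeps_changing = [0, 10, 20, 30, 40, 50]

early_rate = mutation_rate(settles_down, 0, 30)
late_rate = mutation_rate(settles_down, 30, 60)
ongoing_rate = mutation_rate(keeps_changing, 30, 60)
```

An entity whose rate drops to zero in later windows has reached a stable state; one whose rate stays above zero has not.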
Order Life Span Example
❖ Order life span example
  ❖ An order gets created at T=0
  ❖ 10 order details are created at T+1
  ❖ Order is filled and the order record is updated at T+10
  ❖ Order is shipped and the order record is updated at T+15
  ❖ Past T+15 there are no more mutations
T = 0: Order Created
T = 1: Added 10 Order Details
T = 10: Order Fulfilled
T = 15: Order Shipped
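The order timeline can be checked for a stable state with a small helper. This sketch treats an entity as static once no mutation has been seen for a chosen quiet period; the quiet period is an assumption left to the analyst.

```python
# Mutation events from the order life span example, as (time, description).
ORDER_EVENTS = [
    (0, "order created"),
    (1, "10 order details added"),
    (10, "order fulfilled"),
    (15, "order shipped"),
]


def last_mutation(events):
    """Time of the most recent mutation."""
    return max(time for time, _ in events)


def is_static(events, now, quiet_period):
    """True if nothing has changed for at least quiet_period time units."""
    return now - last_mutation(events) >= quiet_period


shortly_after_shipping = is_static(ORDER_EVENTS, now=16, quiet_period=20)
well_after_shipping = is_static(ORDER_EVENTS, now=40, quiet_period=20)
```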
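[Figure: example schema diagram with the n side of each relationship annotated with its estimated size; extracted labels: ~5, ~12, ~10001, ~15, ~10, ~20, ~6]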
Customer Life Span Example
❖ Customer[1:n] -> Payments relationship
  ❖ Payments created at T=5 and T=50
❖ Customer[1:n] -> Orders relationship
  ❖ Orders created at T=0, T=15, T=20 and T=45
❖ Unbounded relationships
T = 0: Order Created
T = 5: Payment Created
T = 15: Order Created
T = 20: Order Created
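[Figure: example schema diagram with the n side of each relationship annotated with its estimated size; extracted labels: ~5 (+1 every 5 days), ~12, ~10001, ~15, ~10, ~20, ~6]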
And The Rest ?
❖ The recursive relationship for Employees makes it unsuitable for rolling up
❖ The same recursive relationship also affects the Offices -> Employees relationship
❖ The ProductLines -> Products relationship is big and possibly unbounded
Rules of Thumb
Levels Of Information
1. SQL Schema + Foreign Key Relationships
  ❖ Only has the explicit relationships and table definitions
2. SQL Operations Logs (mysql general log)
  ❖ Contains only SQL operations (no result set size)
3. Full SQL Operations Logs (mysql slow log)
  ❖ Contains SQL operations with result set size and latencies
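For the third level, the MySQL slow query log puts a comment line carrying `Query_time`, `Lock_time`, `Rows_sent` and `Rows_examined` before each statement. The sketch below pulls out latency and result set size from that line. The exact header layout varies between MySQL versions, so the pattern and the sample entry should be read as assumptions.

```python
import re

HEADER = re.compile(
    r"#\s*Query_time:\s*(?P<query_time>[\d.]+)\s+"
    r"Lock_time:\s*(?P<lock_time>[\d.]+)\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+"
    r"Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_slow_log(text):
    """Return one dict per logged statement: latency, rows sent, SQL text."""
    entries = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "query_time": float(match["query_time"]),
                "rows_sent": int(match["rows_sent"]),
                "sql": "",
            }
            entries.append(current)
        elif current is not None and line.strip() and not line.startswith("#"):
            # Skip the session bookkeeping statements the log adds.
            if line.upper().startswith(("SET TIMESTAMP", "USE ")):
                continue
            current["sql"] = (current["sql"] + " " + line.strip()).strip()
    return entries


# Illustrative log entry, not taken from a real system.
SAMPLE = """# Time: 2015-06-01T10:00:00.000000Z
# User@Host: app[app] @ localhost []
# Query_time: 0.002000  Lock_time: 0.000100 Rows_sent: 12  Rows_examined: 120
SET timestamp=1433152800;
SELECT * FROM customers, orders WHERE customers.customerNumber = 1;
"""

entries = parse_slow_log(SAMPLE)
```

`rows_sent` is the figure that feeds the estimate of n for a join, and `query_time` is the latency.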
Analysis Steps
1. Use Selects with Joins to draw the new relationships
2. Establish the average n of each join relationship
3. Establish the mutation rate over time
  ❖ Does the relationship go static ?
  ❖ Are the relationships unbounded ? (growing n)
Algorithms
1. Roll up relationships
  1. If the entity relationship reaches a static state
  2. If the rate of growth of n is slow enough for the relationship to be treated as static (analyst discretion)
2. Don’t roll up relationships
  1. If the rate of mutability is high
  2. If the average size of n is huge
  3. If the mutation rate of the entity is large
  4. If an entity has a recursive relationship
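These rules can be written down as a small decision function. The sketch below is one possible encoding: the thresholds `max_n` and `max_growth_per_day` stand in for "huge" and "slow enough", which the slides leave to analyst discretion, and the example relationships use hypothetical numbers.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    name: str
    average_n: float
    growth_per_day: float
    reaches_static_state: bool
    high_mutation_rate: bool
    recursive: bool


def should_roll_up(rel, max_n=100, max_growth_per_day=0.5):
    """Apply the don't-roll-up rules first, then the roll-up rules."""
    if rel.recursive or rel.high_mutation_rate or rel.average_n > max_n:
        return False
    return rel.reaches_static_state or rel.growth_per_day <= max_growth_per_day


# Hypothetical inputs; real values would come from the log analysis.
candidates = [
    Relationship("orders -> orderdetails", 10, 0.0, True, False, False),
    Relationship("customers -> payments", 5, 0.2, False, False, False),
    Relationship("employees -> employees", 6, 0.0, True, False, True),
    Relationship("productlines -> products", 1000, 1.0, False, False, False),
]

decisions = {rel.name: should_roll_up(rel) for rel in candidates}
```

Checking the exclusion rules first means a recursive or very large relationship is never rolled up, even if it happens to be static.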
Applying It
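[Figure: example schema diagram with the n side of each relationship annotated with its estimated size; extracted labels: ~5 (+1 every 5 days), ~12, ~10001, ~15, ~10, ~20, ~6]
Collapsing, Duplicating Products and Productlines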
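[Figure: example schema diagram with the n side of each relationship annotated with its estimated size; extracted labels: ~5 (+1 every 5 days), ~12, ~10001, ~15, ~10, ~20, ~6]
Collapsing, Duplicating Products and Productlines
Collapsing Payments into Customers as they are never queried separately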
Collections
❖ Customers
  ❖ Payments [array]
❖ Orders
  ❖ Orderdetails [array]
    ❖ Products [document]
      ❖ Productlines [document]
❖ Offices
❖ Products
❖ Productlines
What’s Cooking
Tooling
1. We are working on building tooling to help
  1. Analyze your relational schema
  2. Propose schema recommendations
  3. Load and transform your data
2. Push the whole subject of schema transformation forward, doing something never done before
Tons of Open Questions
1. Are operation latencies important for recommending a schema ?
2. Can one quantify a schema recommendation (is recommendation A better than B, and if so, why ?)
3. Can Machine Learning produce better recommendations ?
4. … etc etc
Come Work With Us
Are you ready to build a new team, to build a brand new product, and to create a whole new category of products for the most popular NoSQL database?
MongoDB, the leader in NoSQL databases, is building a new team in Dublin. This team will develop products that help our customers adopt our technology by analyzing their legacy relational systems. We need someone who is going to participate in the research, partner with our staff engineers who are prototyping solutions, write production-ready code, and build a team.
This person will report to the Director of Integrations at MongoDB.
http://grnh.se/ge1rfp1
Q/A
http://grnh.se/ge1rfp1
