#MDBlocal
ETL for Pros: Getting Data Into MongoDB
Sam Harley, Senior Solutions Architect
Agenda
● Discuss the MongoDB document model
● Discuss how data can be migrated to MongoDB
● Discuss important MongoDB schema design considerations
● Talk through common approaches to MongoDB schema migration
● Outline how threading and batching can be leveraged for ETL purposes
Introduction
● At some point, most applications need to batch-load large amounts of data
○ Billions of rows
○ Huge initial load
○ Daily updates (CDC)
● This is typically facilitated using an ETL platform
Schema Design Principles
The Document Model
MongoDB vs. RDBMS
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: {
    type: 'Point',
    coordinates: [45.123, 47.232]
  },
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: {
    type: 'Point',
    coordinates: [45.123, 47.232]
  },
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Callouts:
● Fields
● Typed field values (e.g. Number)
● Fields can contain arrays
● Fields can contain an array of sub-documents
Two Big ‘Rules’
● Consider how the data will be used
● Data that works together lives together
Example > Guitar Collectors
ERD
Terminology
Relational DB    MongoDB
Table         →  Collection
Column        →  Field
Row           →  Document
Example > Guitar Collectors
Relational
Consider your consumers: how will this data be used?
Example > Guitar Collectors
MongoDB - Embedding
Example > Guitar Collectors
Consider how the data will be used
Typical queries from a consumer of this data might be:
● “What guitars does Aimee Doe own?”
● “Show me all the people who own a Gibson Les Paul of any age”
● “How many people within 100 miles of Seattle own guitars made before 1970?”
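The three questions above can be sketched as MongoDB query filters, written here as plain Python dictionaries. This is only an illustration under assumptions: the field names (`name`, `location`, `guitars`, `make`, `model`, `year`) and the Seattle coordinates are hypothetical, since the slides do not show the embedded guitar-collector document.

```python
# Hypothetical embedded schema, one document per collector:
# {"name": "Aimee Doe",
#  "location": {"type": "Point", "coordinates": [longitude, latitude]},
#  "guitars": [{"make": "Gibson", "model": "Les Paul", "year": 1959}]}

EARTH_RADIUS_MILES = 3963.2   # $centerSphere takes its radius in radians
SEATTLE = [-122.33, 47.61]    # approximate [longitude, latitude]

# "What guitars does Aimee Doe own?"
guitars_of_aimee = {"name": "Aimee Doe"}
guitars_projection = {"guitars": 1, "_id": 0}  # return only the guitars array

# "Show me all the people who own a Gibson Les Paul of any age"
# $elemMatch requires both conditions to hold for the same array element.
les_paul_owners = {
    "guitars": {"$elemMatch": {"make": "Gibson", "model": "Les Paul"}}
}

# "How many people within 100 miles of Seattle own guitars made before 1970?"
vintage_near_seattle = {
    "location": {
        "$geoWithin": {"$centerSphere": [SEATTLE, 100 / EARTH_RADIUS_MILES]}
    },
    "guitars.year": {"$lt": 1970},
}
```

With PyMongo, these could be passed to `collection.find(guitars_of_aimee, guitars_projection)`, `collection.find(les_paul_owners)` and `collection.count_documents(vintage_near_seattle)`. Because each collector's guitars are embedded, every question is answered from a single collection without joins.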
How Do I Migrate Data?
• Ab Initio
• Talend
• Pentaho
• Informatica
How Do I Migrate Data?
WYOC (Write Your Own Code)
More challenging, but you have ultimate control
How do I do it efficiently?
Image: Julian Lim
An Example
ORDERS
ID  FIRST_NAME  LAST_NAME  SHIPPING_ADDRESS
1   James       Bond       Nassau, Bahamas, US
2   Ernst       Blofeldt   Caracas, Venezuela

ITEMS
ID  ORDER_ID  QTY  DESCRIPTION              PRICE
1   1         1    Aston Martin             120,000
2   1         1    Dinner Jacket            4,000
3   1         3    Champagne Veuve-Cliquot  200
4   2         100  Cat Food                 1
5   2         1    Launch Pad               1,000,000

TRACKING
ORDER_ID  TIMESTAMP            STATUS
1         1985-04-30 09:48:00  ORDERED
2         1985-04-23 01:30:22  ORDERED
2         1985-04-25 08:30:00  SHIPPED
2         1985-05-14 21:37:00  DELIVERED
{
"first_name" : "James",
"last_name" : "Bond",
"address" : "Nassau, Bahamas, US",
"items" : [
{ "qty": 1, "description" : "Aston Martin", "price" : 120000 },
{ "qty": 1, "description" : "Dinner Jacket", "price" : 4000 },
{ "qty": 3, "description" : "Champagne Veuve-Cliquot", "price": 200 }
],
"tracking" : [
{ "timestamp" : "1985-04-30 09:48:00", "status": "ORDERED" }
]
}
Three Common Approaches
Approach #1 – Nested Queries
for x in SELECT * FROM ORDERS
    doc = { "first_name" : x.first_name,
            "last_name" : x.last_name,
            "address" : x.shipping_address,
            "items" : [], "tracking" : [] }
    for y in SELECT * FROM ITEMS WHERE ORDER_ID = x.id
        doc.items.push (y)
    for z in SELECT * FROM TRACKING WHERE ORDER_ID = x.id
        doc.tracking.push (z)
    mongodb.insert (doc)
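As a rough runnable sketch of the nested-query pseudocode, the following uses an in-memory SQLite database as a stand-in for the relational source and a plain callable in place of the MongoDB insert. Both stand-ins, and the helper names `make_source` and `nested_queries`, are assumptions made for illustration; they are not taken from the talk.

```python
import sqlite3

def make_source():
    """In-memory SQLite stand-in for the relational source (assumption)."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row  # allows access by column name
    db.executescript("""
        CREATE TABLE orders (id INTEGER, first_name TEXT, last_name TEXT,
                             shipping_address TEXT);
        CREATE TABLE items (id INTEGER, order_id INTEGER, qty INTEGER,
                            description TEXT, price INTEGER);
        CREATE TABLE tracking (order_id INTEGER, timestamp TEXT, status TEXT);
        INSERT INTO orders VALUES
            (1, 'James', 'Bond', 'Nassau, Bahamas, US'),
            (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
        INSERT INTO items VALUES
            (1, 1, 1, 'Aston Martin', 120000),
            (2, 1, 1, 'Dinner Jacket', 4000),
            (3, 1, 3, 'Champagne Veuve-Cliquot', 200),
            (4, 2, 100, 'Cat Food', 1),
            (5, 2, 1, 'Launch Pad', 1000000);
        INSERT INTO tracking VALUES
            (1, '1985-04-30 09:48:00', 'ORDERED'),
            (2, '1985-04-23 01:30:22', 'ORDERED'),
            (2, '1985-04-25 08:30:00', 'SHIPPED'),
            (2, '1985-05-14 21:37:00', 'DELIVERED');
    """)
    return db

def nested_queries(db, insert):
    """Build one document per order; return the number of source queries."""
    queries = 1  # the single scan of ORDERS
    for order in db.execute("SELECT * FROM orders ORDER BY id").fetchall():
        doc = {"first_name": order["first_name"],
               "last_name": order["last_name"],
               "address": order["shipping_address"],
               "items": [], "tracking": []}
        for item in db.execute(
                "SELECT qty, description, price FROM items WHERE order_id = ?",
                (order["id"],)):
            doc["items"].append(dict(item))
        for state in db.execute(
                "SELECT timestamp, status FROM tracking WHERE order_id = ?",
                (order["id"],)):
            doc["tracking"].append(dict(state))
        queries += 2  # one ITEMS query and one TRACKING query per order
        insert(doc)   # with PyMongo this could be collection.insert_one(doc)
    return queries
```

Calling `nested_queries(make_source(), docs.append)` with an empty list `docs` collects the two order documents. The returned count grows with the number of orders, which is the cost this approach pays on the source side.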
Fan-In & Fan-Out
Number of Database Operations per MongoDB Document
Source database → ETL Job: 1 per order + 2
ETL Job → MongoDB: 1
Benchmark Results
Time (min)
Nested Queries: 14.5
• 1 million orders
• 10 million line items
• 3 million tracking states
• MySQL (local) to MongoDB (local)
• Python
Approach #2 – Build Documents In DB
for x in SELECT * FROM ORDERS
    doc = { "_id" : x.id,
            "first_name" : x.first_name,
            "last_name" : x.last_name,
            "address" : x.shipping_address,
            "items" : [], "tracking" : [] }
    mongodb.insert (doc)
for y in SELECT * FROM ITEMS
    mongodb.update ({"_id" : y.order_id},
                    {"$push" : {"items" : y}})
for z in SELECT * FROM TRACKING
    mongodb.update ({"_id" : z.order_id},
                    {"$push" : {"tracking" : z}})
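To make the write pattern of this approach concrete, here is a minimal sketch that replays it against a dict-backed stand-in for a MongoDB collection and counts the write operations. `FakeCollection` and its `push` method are hypothetical helpers invented for this illustration; against a real server the same steps would be `insert_one` followed by `update_one` calls with `$push`.

```python
ORDERS = [(1, "James", "Bond", "Nassau, Bahamas, US"),
          (2, "Ernst", "Blofeldt", "Caracas, Venezuela")]
ITEMS = [(1, 1, 1, "Aston Martin", 120000),
         (2, 1, 1, "Dinner Jacket", 4000),
         (3, 1, 3, "Champagne Veuve-Cliquot", 200),
         (4, 2, 100, "Cat Food", 1),
         (5, 2, 1, "Launch Pad", 1000000)]
TRACKING = [(1, "1985-04-30 09:48:00", "ORDERED"),
            (2, "1985-04-23 01:30:22", "ORDERED"),
            (2, "1985-04-25 08:30:00", "SHIPPED"),
            (2, "1985-05-14 21:37:00", "DELIVERED")]

class FakeCollection:
    """Hypothetical dict-backed stand-in for a MongoDB collection."""
    def __init__(self):
        self.docs = {}
        self.ops = 0  # number of write operations received

    def insert_one(self, doc):
        self.docs[doc["_id"]] = doc
        self.ops += 1

    def push(self, _id, field, value):
        # Plays the role of update_one({"_id": _id}, {"$push": {field: value}})
        self.docs[_id][field].append(value)
        self.ops += 1

def build_in_db(coll):
    """Insert skeleton documents first, then push child rows one at a time."""
    for order_id, first, last, address in ORDERS:
        coll.insert_one({"_id": order_id, "first_name": first,
                         "last_name": last, "address": address,
                         "items": [], "tracking": []})
    for _item_id, order_id, qty, description, price in ITEMS:
        coll.push(order_id, "items",
                  {"qty": qty, "description": description, "price": price})
    for order_id, timestamp, status in TRACKING:
        coll.push(order_id, "tracking",
                  {"timestamp": timestamp, "status": status})
    return coll
```

For the sample rows, `build_in_db(FakeCollection()).ops` counts one insert per order plus one push per item row and per tracking row, so the number of writes grows with the size of the child tables rather than with the number of orders alone.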
Fan-In & Fan-Out
Number of Database Operations per MongoDB Document
Source database → ETL Job: 3 per order
ETL Job → MongoDB: 1 + i + t
i = # of assoc. line items
t = # of assoc. tracking rows
Benchmark Results
Time (min)
Nested Queries: 14.5
Build in DB: 95.9
Approach #3 – Load It All Into Memory
db_items = SELECT * FROM ITEMS
db_tracking = SELECT * FROM TRACKING
for x in SELECT * FROM ORDERS
    doc = { "first_name" : x.first_name,
            "last_name" : x.last_name,
            "address" : x.shipping_address,
            "items" : [], "tracking" : [] }
    doc.items.pushAll (db_items.getAll(x.id))
    doc.tracking.pushAll (db_tracking.getAll(x.id))
    mongodb.insert (doc)
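One way to realise the `getAll` lookups in the pseudocode is to group the child rows by order ID in a dictionary before iterating over the orders. The sketch below does that with plain tuples as rows; the helper names `index_by_order` and `lookup_from_memory` are hypothetical, and the approach assumes both child tables fit in memory.

```python
from collections import defaultdict

ORDERS = [(1, "James", "Bond", "Nassau, Bahamas, US"),
          (2, "Ernst", "Blofeldt", "Caracas, Venezuela")]
ITEMS = [(1, 1, 1, "Aston Martin", 120000),
         (2, 1, 1, "Dinner Jacket", 4000),
         (3, 1, 3, "Champagne Veuve-Cliquot", 200),
         (4, 2, 100, "Cat Food", 1),
         (5, 2, 1, "Launch Pad", 1000000)]
TRACKING = [(1, "1985-04-30 09:48:00", "ORDERED"),
            (2, "1985-04-23 01:30:22", "ORDERED"),
            (2, "1985-04-25 08:30:00", "SHIPPED"),
            (2, "1985-05-14 21:37:00", "DELIVERED")]

def index_by_order(rows, key_pos, to_doc):
    """Group child rows by order ID so each lookup is a dictionary access."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_pos]].append(to_doc(row))
    return index

def lookup_from_memory(orders, items, tracking):
    """Yield one complete document per order using in-memory lookups."""
    items_by_order = index_by_order(
        items, 1,
        lambda r: {"qty": r[2], "description": r[3], "price": r[4]})
    tracking_by_order = index_by_order(
        tracking, 0,
        lambda r: {"timestamp": r[1], "status": r[2]})
    for order_id, first, last, address in orders:
        yield {"first_name": first, "last_name": last, "address": address,
               "items": items_by_order.get(order_id, []),
               "tracking": tracking_by_order.get(order_id, [])}
```

Each yielded document is complete, so it can be written with a single insert. The trade-off is memory: the two indexes hold every item and tracking row at once.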
Fan-In & Fan-Out
Number of Database Operations per MongoDB Document
Source database → ETL Job: 3 per order
ETL Job → MongoDB: 1
Benchmark Results
Time (min)
Nested Queries: 14.5
Build in DB: 95.9
Lookup from Memory: 8.5
Co-Iteration
(Animation: the ORDERS, ITEMS and TRACKING tables are read side by side, top to bottom. Each order's document is built up row by row: first the order fields, then its items, then its tracking entries.)
{
"first_name" : "James",
"last_name" : "Bond",
"address" : "Nassau, Bahamas, US",
"items" : [
{ ..., "description" : "Aston Martin", ... },
{ ..., "description" : "Dinner Jacket", ... },
{ ..., "description" : "Champagne...", ... }
],
"tracking" : [
{ ... "1985-04-30 09:48:00", ... "ORDERED" }
]
}
{
"first_name" : "Ernst",
"last_name" : "Blofeldt",
"address" : "Caracas, Venezuela",
"items" : [
{ ..., "description" : "Cat Food", ... },
{ ..., "description" : "Launch Pad", ... }
],
"tracking" : [
{ ... "1985-04-23 01:30:22", ... "ORDERED" },
{ ... "1985-04-25 08:30:00", ... "SHIPPED" },
{ ... "1985-05-14 21:37:00", ... "DELIVERED" }
]
}
Done!
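The walk-through can be sketched as a merge over three row streams. This is a minimal illustration that assumes all three inputs arrive sorted by order ID (for example via ORDER BY in the source queries); the function name `co_iterate` and the tuple layout are hypothetical.

```python
ORDERS = [(1, "James", "Bond", "Nassau, Bahamas, US"),
          (2, "Ernst", "Blofeldt", "Caracas, Venezuela")]
ITEMS = [(1, 1, 1, "Aston Martin", 120000),
         (2, 1, 1, "Dinner Jacket", 4000),
         (3, 1, 3, "Champagne Veuve-Cliquot", 200),
         (4, 2, 100, "Cat Food", 1),
         (5, 2, 1, "Launch Pad", 1000000)]
TRACKING = [(1, "1985-04-30 09:48:00", "ORDERED"),
            (2, "1985-04-23 01:30:22", "ORDERED"),
            (2, "1985-04-25 08:30:00", "SHIPPED"),
            (2, "1985-05-14 21:37:00", "DELIVERED")]

def co_iterate(orders, items, tracking):
    """Yield one document per order; inputs must be sorted by order ID."""
    items = iter(items)
    tracking = iter(tracking)
    next_item = next(items, None)
    next_state = next(tracking, None)
    for order_id, first, last, address in orders:
        doc = {"first_name": first, "last_name": last, "address": address,
               "items": [], "tracking": []}
        # Consume item rows up to and including this order; rows that
        # reference an earlier, missing order are skipped.
        while next_item is not None and next_item[1] <= order_id:
            if next_item[1] == order_id:
                _id, _oid, qty, description, price = next_item
                doc["items"].append(
                    {"qty": qty, "description": description, "price": price})
            next_item = next(items, None)
        # Same pattern for the tracking rows.
        while next_state is not None and next_state[0] <= order_id:
            if next_state[0] == order_id:
                doc["tracking"].append(
                    {"timestamp": next_state[1], "status": next_state[2]})
            next_state = next(tracking, None)
        yield doc
```

Only one row from each child stream is held at a time, so memory use stays flat regardless of table size, while the source still sees just three queries.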
Fan-In & Fan-Out
Number of Database Operations per MongoDB Document
Source database → ETL Job: 3 per order
ETL Job → MongoDB: 1
Benchmark Results
Time (min)
Nested Queries: 14.5
Build in DB: 95.9
Lookup from Memory: 8.5
Co-Iteration: 8.1
Oh, And One More Thing …
Threading & Batching
[Figure: throughput as a function of batch size and number of threads]
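A minimal sketch of combining the two ideas: documents are grouped into fixed-size batches and the batches are written by a small thread pool. The names `batches`, `load` and `write_batch` are hypothetical, and the defaults of 1000 documents per batch and 4 threads are placeholders to tune, not recommendations from the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def batches(docs, size):
    """Split any iterable of documents into lists of at most `size` items."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def load(docs, write_batch, batch_size=1000, threads=4):
    """Write batches concurrently; return the total number of documents."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # write_batch must return how many documents it wrote. With PyMongo
        # it could be:
        #   lambda b: len(coll.insert_many(b, ordered=False).inserted_ids)
        return sum(pool.map(write_batch, batches(docs, batch_size)))
```

Note that `Executor.map` submits every batch up front, so for very large loads a real job would likely bound the number of batches in flight. The best batch size and thread count depend on the deployment, which is why the deck recommends prototyping.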
Fan-In & Fan-Out
Number of Database Operations per MongoDB Document
Source database → ETL Job: 3 per order
ETL Job → MongoDB: 1 for every 1000 orders
Benchmark Results
Time (min)
                     Simple   Batch = 1000
Nested Queries       14.5     9.1
Build in DB          95.9     36.2
Lookup from Memory   8.5      4
Co-Iteration         8.1      3.9
Summary
● Remember the two big rules regarding MongoDB schema design
○ Consider how the data will be used
○ Data that works together lives together
● Consider the common approaches to MongoDB schema migration
● Prototype, prototype, prototype
● Don’t forget to utilise batching and threading
What Now?
university.mongodb.com
Ask The Experts
Sydney MongoDB User Group
THANK YOU FOR JOINING!

PPTX
Machine Learning_overview_presentation.pptx
PPTX
Spectroscopy.pptx food analysis technology
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Review of recent advances in non-invasive hemoglobin estimation
Digital-Transformation-Roadmap-for-Companies.pptx
Big Data Technologies - Introduction.pptx
Diabetes mellitus diagnosis method based random forest with bat algorithm
Cloud computing and distributed systems.
Machine learning based COVID-19 study performance prediction
MYSQL Presentation for SQL database connectivity
Encapsulation theory and applications.pdf
Unlocking AI with Model Context Protocol (MCP)
Approach and Philosophy of On baking technology
Chapter 3 Spatial Domain Image Processing.pdf
Assigned Numbers - 2025 - Bluetooth® Document
Electronic commerce courselecture one. Pdf
The AUB Centre for AI in Media Proposal.docx
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
A comparative analysis of optical character recognition models for extracting...
NewMind AI Weekly Chronicles - August'25-Week II
Machine Learning_overview_presentation.pptx
Spectroscopy.pptx food analysis technology
20250228 LYD VKU AI Blended-Learning.pptx
Review of recent advances in non-invasive hemoglobin estimation

ETL for Pros: Getting Data Into MongoDB

  • 1. #MDBlocal ETL for Pros: Getting Data Into MongoDB Sam Harley, Senior Solutions Architect
  • 2. Agenda
      ● Discuss the MongoDB document model
      ● Discuss how data can be migrated to MongoDB
      ● Discuss important MongoDB schema design considerations
      ● Talk through common approaches to MongoDB schema migration
      ● Outline how threading and batching can be leveraged for ETL purposes
  • 3. Introduction
      ● At some point, most applications need to batch-load large amounts of data
          ○ Billions of rows
          ○ Huge initial load
          ○ Daily updates (CDC)
      ● This is typically facilitated using an ETL platform
  • 5. The Document Model (diagram labels: MongoDB, RDBMS)
      {
        first_name: 'Paul',
        surname: 'Miller',
        city: 'London',
        location: { type : 'Point', coordinates : [45.123,47.232] },
        cars: [
          { model: 'Bentley', year: 1973, value: 100000, … },
          { model: 'Rolls Royce', year: 1965, value: 330000, … }
        ]
      }
  • 6. Documents are Rich Data Structures
      {
        first_name: 'Paul',
        surname: 'Miller',
        cell: 447557505611,
        city: 'London',
        location: { type : 'Point', coordinates : [45.123,47.232] },
        Profession: ['banking', 'finance', 'trader'],
        cars: [
          { model: 'Bentley', year: 1973, value: 100000, … },
          { model: 'Rolls Royce', year: 1965, value: 330000, … }
        ]
      }
      Callouts: Fields can contain an array of sub-documents; Typed field values; Fields can contain arrays; Number; Fields
  • 7. Two Big ‘Rules’
      ● Consider how the data will be used
      ● Data that works together lives together
  • 8. Example > Guitar Collectors (ERD). Terminology:
      Relational DB → MongoDB
      Table → Collection
      Column → Field
      Row → Document
  • 9. Example > Guitar Collectors: Relational. Consider your consumers: how will this data be used?
  • 10. Example > Guitar Collectors: MongoDB - Embedding
  • 11–12. Example > Guitar Collectors: Consider how the data will be used. Typical queries from a consumer of this data might be:
      ● “What guitars does Aimee Doe own?”
      ● “Show me all the people who own a Gibson Les Paul of any age”
      ● “How many people within 100 miles of Seattle own guitars made before 1970?”
  • 13. How Do I Migrate Data?
      ● Ab Initio
      ● Talend
      ● Pentaho
      ● Informatica
  • 14. How Do I Migrate Data? WYOC (Write Your Own Code): more challenging, but you have got ultimate control
  • 15. How do I do it efficiently? (Image: Julian Lim)
  • 17. ORDERS, ITEMS and TRACKING tables:
      ORDERS
        ID  FIRST_NAME  LAST_NAME  SHIPPING_ADDRESS
        1   James       Bond       Nassau, Bahamas, US
        2   Ernst       Blofeldt   Caracas, Venezuela
      ITEMS
        ID  ORDER_ID  QTY  DESCRIPTION              PRICE
        1   1         1    Aston Martin             120,000
        2   1         1    Dinner Jacket            4,000
        3   1         3    Champagne Veuve-Cliquot  200
        4   2         100  Cat Food                 1
        5   2         1    Launch Pad               1,000,000
      TRACKING
        ORDER_ID  TIMESTAMP            STATUS
        1         1985-04-30 09:48:00  ORDERED
        2         1985-04-23 01:30:22  ORDERED
        2         1985-04-25 08:30:00  SHIPPED
        2         1985-05-14 21:37:00  DELIVERED
  • 20.
      {
        "first_name" : "James",
        "last_name" : "Bond",
        "address" : "Nassau, Bahamas, US",
        "items" : [
          { "qty": 1, "description" : "Aston Martin", "price" : 120000 },
          { "qty": 1, "description" : "Dinner Jacket", "price" : 4000 },
          { "qty": 3, "description" : "Champagne Veuve-Cliquot", "price": 200 }
        ],
        "tracking" : [
          { "timestamp" : "1985-04-30 09:48:00", "status": "ORDERED" }
        ]
      }
  • 26. Approach #1 – Nested Queries
      for x in SELECT * FROM ORDERS
          doc = { "first_name" : x.first_name, "last_name" : x.last_name, "address" : x.shipping_address, "items" : [], "tracking" : [] }
          for y in SELECT * FROM ITEMS WHERE ORDER_ID = x.id
              doc.items.push (y)
          for z in SELECT * FROM TRACKING WHERE ORDER_ID = x.id
              doc.tracking.push (z)
          mongodb.insert (doc)
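The nested-queries approach can be sketched in runnable form as below. This is a minimal illustration under stated assumptions, not the speaker's benchmark code: an in-memory SQLite database stands in for the MySQL source, the helper names `make_source` and `nested_queries` are hypothetical, and `insert` is any callable (with PyMongo it could be a collection's `insert_one`).

```python
import sqlite3

def make_source():
    """In-memory SQLite stand-in (assumption) for the relational source."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, first_name TEXT,
                             last_name TEXT, shipping_address TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER,
                            qty INTEGER, description TEXT, price INTEGER);
        CREATE TABLE tracking (order_id INTEGER, timestamp TEXT, status TEXT);
        INSERT INTO orders VALUES (1, 'James', 'Bond', 'Nassau, Bahamas, US'),
                                  (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
        INSERT INTO items VALUES (1, 1, 1, 'Aston Martin', 120000),
                                 (2, 1, 1, 'Dinner Jacket', 4000),
                                 (3, 1, 3, 'Champagne Veuve-Cliquot', 200),
                                 (4, 2, 100, 'Cat Food', 1),
                                 (5, 2, 1, 'Launch Pad', 1000000);
        INSERT INTO tracking VALUES (1, '1985-04-30 09:48:00', 'ORDERED'),
                                    (2, '1985-04-23 01:30:22', 'ORDERED'),
                                    (2, '1985-04-25 08:30:00', 'SHIPPED'),
                                    (2, '1985-05-14 21:37:00', 'DELIVERED');
    """)
    return db

def nested_queries(db, insert):
    """One ORDERS query, then two child queries per order, then one insert."""
    for o in db.execute("SELECT * FROM orders ORDER BY id"):
        doc = {"_id": o["id"], "first_name": o["first_name"],
               "last_name": o["last_name"], "address": o["shipping_address"],
               "items": [], "tracking": []}
        for i in db.execute("SELECT qty, description, price FROM items "
                            "WHERE order_id = ? ORDER BY id", (o["id"],)):
            doc["items"].append(dict(i))
        for t in db.execute("SELECT timestamp, status FROM tracking "
                            "WHERE order_id = ? ORDER BY timestamp", (o["id"],)):
            doc["tracking"].append(dict(t))
        insert(doc)

docs = []  # a plain list stands in for the MongoDB collection
nested_queries(make_source(), docs.append)
```

The two inner queries run once per order, which is why the number of source round trips grows with the number of orders.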
  • 32. Fan-In & Fan-Out: number of database operations per MongoDB document. [Diagram around the ETL job; labels as extracted: 1 per order + 2, 1]
  • 33. Benchmark Results: Nested Queries took 14.5 min. [Bar chart, time in minutes]
      ● 1 million orders
      ● 10 million line items
      ● 3 million tracking states
      ● MySQL (local) to MongoDB (local)
      ● Python
  • 34. Approach #2 – Build Documents In DB
      for x in SELECT * FROM ORDERS
          doc = { "_id" : x.id, "first_name" : x.first_name, "last_name" : x.last_name, "address" : x.shipping_address, "items" : [], "tracking" : [] }
          mongodb.insert (doc)
      for y in SELECT * FROM ITEMS
          mongodb.update ({"_id" : y.order_id}, {"$push" : {"items" : y}})
      for z in SELECT * FROM TRACKING
          mongodb.update ({"_id" : z.order_id}, {"$push" : {"tracking" : z}})
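A runnable sketch of the build-in-database approach follows, under loud assumptions: SQLite stands in for MySQL, and `FakeCollection` is a hypothetical in-memory stand-in for a MongoDB collection that understands only `$push` on a filter by `_id`. Its method names mirror PyMongo's `insert_one` and `update_one`, but nothing here talks to a real server. The sample data is trimmed to one order for brevity.

```python
import sqlite3

class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection; counts write operations."""
    def __init__(self):
        self.docs = {}
        self.ops = 0

    def insert_one(self, doc):
        self.ops += 1
        self.docs[doc["_id"]] = doc

    def update_one(self, flt, update):
        self.ops += 1
        doc = self.docs[flt["_id"]]
        for field, value in update["$push"].items():
            doc[field].append(value)

def build_in_db(db, coll):
    """Three full-table scans; every child row becomes its own update."""
    for o in db.execute("SELECT * FROM orders"):
        coll.insert_one({"_id": o["id"], "first_name": o["first_name"],
                         "last_name": o["last_name"],
                         "address": o["shipping_address"],
                         "items": [], "tracking": []})
    for i in db.execute("SELECT * FROM items ORDER BY id"):
        item = {"qty": i["qty"], "description": i["description"], "price": i["price"]}
        coll.update_one({"_id": i["order_id"]}, {"$push": {"items": item}})
    for t in db.execute("SELECT * FROM tracking ORDER BY timestamp"):
        state = {"timestamp": t["timestamp"], "status": t["status"]}
        coll.update_one({"_id": t["order_id"]}, {"$push": {"tracking": state}})

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, first_name TEXT,
                         last_name TEXT, shipping_address TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER,
                        qty INTEGER, description TEXT, price INTEGER);
    CREATE TABLE tracking (order_id INTEGER, timestamp TEXT, status TEXT);
    INSERT INTO orders VALUES (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
    INSERT INTO items VALUES (4, 2, 100, 'Cat Food', 1),
                             (5, 2, 1, 'Launch Pad', 1000000);
    INSERT INTO tracking VALUES (2, '1985-04-23 01:30:22', 'ORDERED'),
                                (2, '1985-04-25 08:30:00', 'SHIPPED'),
                                (2, '1985-05-14 21:37:00', 'DELIVERED');
""")
coll = FakeCollection()
build_in_db(db, coll)
```

For this single order with two items and three tracking rows, the stand-in records six write operations: one insert plus one update per child row.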
  • 40. Fan-In & Fan-Out: number of database operations per MongoDB document. [Diagram around the ETL job; labels as extracted: 3, per order, 1 + i + t] i = # of assoc. line items; t = # of assoc. tracking rows
  • 42. Approach #3 – Load It All Into Memory
      db_items = SELECT * FROM ITEMS
      db_tracking = SELECT * FROM TRACKING
      for x in SELECT * FROM ORDERS
          doc = { "first_name" : x.first_name, "last_name" : x.last_name, "address" : x.shipping_address, "items" : [], "tracking" : [] }
          doc.items.pushAll (db_items.getAll(x.id))
          doc.tracking.pushAll (db_tracking.getAll(x.id))
          mongodb.insert (doc)
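The in-memory lookup can be sketched as follows. This assumes the child tables fit in RAM, uses SQLite as a stand-in for MySQL, and collects documents in a list instead of inserting into MongoDB; `group_by_order` and `load_from_memory` are hypothetical helper names.

```python
import sqlite3
from collections import defaultdict

def group_by_order(db, sql):
    """Read a whole child table once and index its rows by order_id."""
    grouped = defaultdict(list)
    for row in db.execute(sql):
        record = dict(row)
        grouped[record.pop("order_id")].append(record)
    return grouped

def load_from_memory(db, insert):
    """Three source queries in total; child rows are looked up in dictionaries."""
    items = group_by_order(
        db, "SELECT order_id, qty, description, price FROM items ORDER BY id")
    tracking = group_by_order(
        db, "SELECT order_id, timestamp, status FROM tracking ORDER BY timestamp")
    for o in db.execute("SELECT * FROM orders ORDER BY id"):
        insert({"_id": o["id"], "first_name": o["first_name"],
                "last_name": o["last_name"], "address": o["shipping_address"],
                "items": items.get(o["id"], []),
                "tracking": tracking.get(o["id"], [])})

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, first_name TEXT,
                         last_name TEXT, shipping_address TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER,
                        qty INTEGER, description TEXT, price INTEGER);
    CREATE TABLE tracking (order_id INTEGER, timestamp TEXT, status TEXT);
    INSERT INTO orders VALUES (1, 'James', 'Bond', 'Nassau, Bahamas, US'),
                              (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
    INSERT INTO items VALUES (1, 1, 1, 'Aston Martin', 120000),
                             (4, 2, 100, 'Cat Food', 1),
                             (5, 2, 1, 'Launch Pad', 1000000);
    INSERT INTO tracking VALUES (1, '1985-04-30 09:48:00', 'ORDERED'),
                                (2, '1985-04-23 01:30:22', 'ORDERED');
""")
docs = []
load_from_memory(db, docs.append)
```

The trade-off is memory: both child tables are held in the ETL process for the whole run, which may not be feasible at the benchmark's scale on a small machine.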
  • 47. Fan-In & Fan-Out: number of database operations per MongoDB document. [Diagram around the ETL job; labels as extracted: 3, per order, 1]
  • 50–51. ORDERS, ITEMS and TRACKING tables (same data as slide 17; repeated on every slide up to 66).
  • 52–58. The document for order 1 is built up one row at a time:
      {
        "first_name" : "James",
        "last_name" : "Bond",
        "address" : "Nassau, Bahamas, US",
        "items" : [
          { ..., "description" : "Aston Martin", ... },
          { ..., "description" : "Dinner Jacket", ... },
          { ..., "description" : "Champagne...", ... }
        ],
        "tracking" : [
          { ... "1985-04-30 09:48:00", ... "ORDERED" }
        ]
      }
  • 60–65. The document for order 2 is built up the same way:
      {
        "first_name" : "Ernst",
        "last_name" : "Blofeldt",
        "address" : "Caracas, Venezuela",
        "items" : [
          { ..., "description" : "Cat Food", ... },
          { ..., "description" : "Launch Pad", ... }
        ],
        "tracking" : [
          { ... "1985-04-23 01:30:22", ... "ORDERED" },
          { ... "1985-04-25 08:30:00", ... "SHIPPED" },
          { ... "1985-05-14 21:37:00", ... "DELIVERED" }
        ]
      }
  • 66. Done!
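The row-by-row walk above can be read as a merge over three result sets sorted by order ID, which the benchmark slides label Co-Iteration. The sketch below is one possible way to write that, not the speaker's implementation: SQLite stands in for MySQL, a list stands in for MongoDB, and `GroupStream` and `co_iterate` are hypothetical names.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

class GroupStream:
    """Walks a cursor sorted by order_id, handing out one order's rows at a time."""
    def __init__(self, cursor):
        self._groups = groupby(cursor, key=itemgetter("order_id"))
        self._head = next(self._groups, None)

    def take(self, order_id):
        # Skip orphaned child rows that belong to a smaller order_id.
        while self._head is not None and self._head[0] < order_id:
            self._head = next(self._groups, None)
        if self._head is None or self._head[0] != order_id:
            return []
        rows = [{k: r[k] for k in r.keys() if k != "order_id"}
                for r in self._head[1]]
        self._head = next(self._groups, None)
        return rows

def co_iterate(db, insert):
    """Three sorted queries advanced in lockstep; one insert per order."""
    items = GroupStream(db.execute(
        "SELECT order_id, qty, description, price FROM items ORDER BY order_id, id"))
    tracking = GroupStream(db.execute(
        "SELECT order_id, timestamp, status FROM tracking ORDER BY order_id, timestamp"))
    for o in db.execute("SELECT * FROM orders ORDER BY id"):
        insert({"_id": o["id"], "first_name": o["first_name"],
                "last_name": o["last_name"], "address": o["shipping_address"],
                "items": items.take(o["id"]),
                "tracking": tracking.take(o["id"])})

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, first_name TEXT,
                         last_name TEXT, shipping_address TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER,
                        qty INTEGER, description TEXT, price INTEGER);
    CREATE TABLE tracking (order_id INTEGER, timestamp TEXT, status TEXT);
    INSERT INTO orders VALUES (1, 'James', 'Bond', 'Nassau, Bahamas, US'),
                              (2, 'Ernst', 'Blofeldt', 'Caracas, Venezuela');
    INSERT INTO items VALUES (1, 1, 1, 'Aston Martin', 120000),
                             (2, 1, 1, 'Dinner Jacket', 4000),
                             (4, 2, 100, 'Cat Food', 1);
    INSERT INTO tracking VALUES (2, '1985-04-23 01:30:22', 'ORDERED'),
                                (2, '1985-04-25 08:30:00', 'SHIPPED');
""")
docs = []
co_iterate(db, docs.append)
```

Unlike the in-memory lookup, only the rows of the current order are held at once, at the cost of requiring every query to be sorted on the join key.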
  • 67. Fan-In & Fan-Out: number of database operations per MongoDB document. [Diagram around the ETL job; labels as extracted: 3, per order, 1]
  • 68. Benchmark Results, time (min): Nested Queries 14.5; Build in DB 95.9; Lookup from Memory 8.5; Co-Iteration 8.1
  • 69. Oh, And One More Thing …
  • 71. Fan-In & Fan-Out: number of database operations per MongoDB document. [Diagram around the ETL job; labels as extracted: 3, per order, 1 for every 1000 orders]
  • 72. Benchmark Results. [Bar chart, time in minutes, comparing Simple and Batch = 1000 for Nested Queries, Build in DB, Lookup from Memory and Co-Iteration; values as extracted: 14.5 9.1 95.9 36.2 8.5 48.1 3.9]
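Batching, as in "1 for every 1000 orders", can be sketched independently of how the documents are built. In the minimal example below, `batches` and `load_in_batches` are hypothetical helpers and `insert_many` is any callable that accepts a list of documents; with PyMongo, a collection's `insert_many` method would be a natural fit, though that wiring is an assumption and is not shown here.

```python
from itertools import islice

def batches(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def load_in_batches(docs, insert_many, batch_size=1000):
    """Send documents in groups so one write call covers many orders."""
    calls = 0
    for batch in batches(docs, batch_size):
        insert_many(batch)
        calls += 1
    return calls

# Stand-in sink: record each batch instead of writing to MongoDB.
received = []
generated = ({"_id": n, "items": [], "tracking": []} for n in range(2500))
calls = load_in_batches(generated, received.append)
```

Because `batches` pulls lazily from the iterable, it can sit behind any of the document-building approaches without holding more than one batch in memory.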
  • 73. Summary
      ● Remember the two big rules regarding MongoDB schema design
          ○ Consider how the data will be used
          ○ Data that works together lives together
      ● Consider the common approaches to MongoDB schema migration
      ● Prototype, prototype, prototype
      ● Don’t forget to utilise batching and threading
  • 74. What Now?
      ● university.mongodb.com
      ● Ask The Experts
      ● Sydney MongoDB User Group