Gab document db scaling database

{
"name": "SmugMug",
"permalink": "smugmug",
"homepage_url": "http://www.smugmug.com",
"blog_url": "http://blogs.smugmug.com/",
"category_code": "photo_video",
"products": [
{
"name": "SmugMug",
"permalink": "smugmug"
}
],
"offices": [
{
"description": "",
"address1": "67 E. Evelyn Ave",
"address2": "",
"zip_code": "94041",
"city": "Mountain View",
"state_code": "CA",
"country_code": "USA",
"latitude": 37.390056,
"longitude": -122.067692
}
]
}
Perfect for these
Documents
schema-agnostic JSON store
for
hierarchical and de-normalized data at scale

Azure DocumentDB
Millions of RPS
Many TBs of data
Transparent Partitioning
<10ms Reads
<15ms Writes
@P99
Low-latency access
around the globe!
Automatic Indexing
Easy-to-learn query
grammar
Multi-Record
Transactions
Blazing fast, planet scale NoSQL service
99.99% SLAs for availability, latency, and throughput

Item Author Pages Language
Harry Potter and the Sorcerer’s
Stone
J.K. Rowling 309 English
Game of Thrones: A Song of Ice
and Fire
George R.R.
Martin
864 English

Stone
and Fire
George R.R.
Martin
864 English
Lenovo Thinkpad X1 Carbon ??? ??? ???

Item Author Pages Language Processor Memory Storage
Harry Potter
and the
Sorcerer’s
Stone
J.K.
Rowling
309 English ??? ??? ???
Game of
Thrones: A
Song of Ice
and Fire
George
R.R.
Martin
864 English ??? ??? ???
Lenovo
Thinkpad X1
Carbon
??? ??? ??? Core i7
3.3ghz
8 GB 256 GB
SSD

Stone
and Fire
George R.R.
Martin
864 English
Item CPU Memory Storage
Lenovo Thinkpad X1 Carbon Core i7 3.3ghz 8 GB 256 GB
SSD

ProductId Item
1 Harry Potter and the
Sorcerer’s Stone
2 Game of Thrones: A Song of
Ice and Fire
3 Lenovo Thinkpad X1 Carbon
ProductId Attribute Value
1 Author J.K. Rowling
1 Pages 309
…
2 Author George R.R. Martin
2 Pages 864
…
3 Processor Core i7 3.3ghz
3 Memory 8 GB
…

EL reto
▪ Escalar con expectativas de
milliones de usuarios en Day 1
▪ Entregar real time responsiveness
for a lag-free, gaming experience
▪ Altamente competitivo – high scores
y global leaderboards son criticos
More Users, More Problems

The Results
▪ #1 in Apple app store free apps
during launch week
▪ >1M downloads
▪ ~1B queries per day
▪ 99p queries served under 10ms

Cuestion de tirar los datos en la bd….

Cuestion de tirar los datos en la
bd….

Porque es tan difícil?
▪ Caches
▪ Scoreboard siempre se actualiza…
▪ SQL database
▪ Se necesita hacer sharding
▪ Schema and Index Management
▪ Loss of relational benefits
▪ Azure Table Storage
▪ Secondary Indexes
▪ Latencia
▪ Throughput

Planet-Scale NoSQL
▪ Horizontal Scaling for storage and
throughput
▪ High performance with SSDs and
automatic indexing
▪ Operating on a global scale

Request Unit (RU) is the
normalized currency
% Memory
% IOPS
% CPU
Replica gets a fixed budget
of Request Units
Resource
Resource
set
Resource
Resource
DocumentsSQL
sprocs
args
Resource Resource
Predictable Performance

Creando una coleccion particionada
//pre-defined collections
DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
RequestOptions options = new RequestOptions { OfferType = "S3" };
DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync("dbs/" +
database.Id, collectionSpec, options);
//partitioned collections
DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
collectionSpec.PartitionKey.Paths.Add(“/walkerId”);
int collectionThroughput = 100000;
RequestOptions options = new RequestOptions { OfferThroughput = collectionThroughput };
DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync("dbs/" +
database.Id, collectionSpec, options);

Globalmente distribuido
• Not just for disaster recovery…. DocumentDB is unreasonably highly available
• Replicate data across any # of regions of your choice
• Low-latency access to your data around the globe
• Dynamically configure your write and read regions
Azure DocumentDB nos permite supercar la velocidad de la luz!

Strong consistency, High latency Eventual consistency, Low latency

App define preferencias regionales
ConnectionPolicy docClientConnectionPolicy = new ConnectionPolicy { ConnectionMode =
ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp };
docClientConnectionPolicy.PreferredLocations.Add(LocationNames.EastUS2);
docClientConnectionPolicy.PreferredLocations.Add(LocationNames.WestUS);
docClient = new DocumentClient(
new Uri("https://myglobaldb.documents.azure.com:443"),
"PARvqUuBw2QTO4rRXr6d1GnLCR7VinERcYrBQvDRh6EDTJLOHtZxgjTS4pv8nQv2Lg1QQLBLfO6TVziOZKvYow==",
docClientConnectionPolicy);

Automatic Indexing
• Index is a union of all the document trees
Common
structure
Terms Postings List/Values
$/location/0/ 1, 2
location/0/country/ 1, 2
location/0/city/ 1, 2
0/country/Germany 1, 2
1/country/France 2
… …
0/city/Moscow 2
0/dealers/0 2
http://aka.ms/docdbvldb
No need to define secondary indices / schema hints!

Politicas de indexación
customize index management including storage
overhead, throughput and query consistency
▪ range, hash and spatial indexes
▪ included and excluded paths
▪ indexing mode; consistent or lazy
▪ index precision
▪ online, in-place index transformations
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Hash",
"dataType": "String",
"precision": 3
},
{
"kind": "Spatial",
"dataType": "Point"
}
]
}
],
"excludedPaths": []
}

-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"
-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"
-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"
SQL Query Grammar

function(playerId1, playerId2) {
var playersToSwap = __.filter (function (document) {
return (document.id == playerId1 || document.id == playerId2);
});
var player1 = playersToSwap[0], player2 = playersToSwap[1];
var player1ItemTemp = player1.item;
player1.item = player2.item;
player2.item = player1ItemTemp;
__.replaceDocument(player1)
.then(function() { return __.replaceDocument(player2); })
.fail(function(error){ throw 'Unable to update players, abort'; });
}
client.executeStoredProcedureAsync
("procs/1234", ["MasterChief", "SolidSnake“])
.then(function (response) {
console.log(“success!");
}, function (err) {
console.log("Failed to swap!", error);
}
);
Client Database

API and Toolchain Options
DocumentDB
REST over HTTPS/TCPJava .NET
PowerBI

{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"addresses": [
{
"line1": "100 Some Street",
"line2": "Unit 1",
"city": "Seattle",
"state": "WA",
"zip": 98012 }
],
"contactDetails": [
{"email: "thomas@andersen.com"},
{"phone": "+1 555 555-5555", "extension": 5555}
]
}
Try model your entity as a self-
contained document
Generally, use embedded data
models when:
contains
one-to-few
changes infrequently
won’t grow
integral
better read performance

In general, use normalized data
models when:
Write performance
one-to-many
many-to-many
changes frequently
{
"id": "xyz",
"username: "user xyz"
}
{
"id": "address_xyz",
"userid": "xyz",
"address" : {
…
}
}
{
"id: "contact_xyz",
"userid": "xyz",
"email" : "user@user.com"
"phone" : "555 5555"
}
Normalizing typically provides better write performance

No magic bullet
Think about how your data is
going to be written, read and
model accordingly
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": [1, 2, 3],
"images": [
{"thumbnail": "http://....png"}
{"profile": "http://....png"}
]
}
{
"id": 1,
"name": "DocumentDB 101",
"authors": [
{"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
]
}

Gab document db scaling database

More Related Content

What's hot (20)

Similar to Gab document db scaling database (20)

More from MUG Perú (10)

Recently uploaded (20)

Gab document db scaling database