Gab   document db scaling database
Sponsors
{
"name": "SmugMug",
"permalink": "smugmug",
"homepage_url": "http://www.smugmug.com",
"blog_url": "http://blogs.smugmug.com/",
"category_code": "photo_video",
"products": [
{
"name": "SmugMug",
"permalink": "smugmug"
}
],
"offices": [
{
"description": "",
"address1": "67 E. Evelyn Ave",
"address2": "",
"zip_code": "94041",
"city": "Mountain View",
"state_code": "CA",
"country_code": "USA",
"latitude": 37.390056,
"longitude": -122.067692
}
]
}
Perfect for these documents
Schema-agnostic JSON store for hierarchical and de-normalized data at scale
Not these documents
Azure DocumentDB
Blazing fast, planet-scale NoSQL service
▪ Millions of requests per second (RPS)
▪ Many TBs of data
▪ Transparent partitioning
▪ <10 ms reads and <15 ms writes at P99
▪ Low-latency access around the globe
▪ Automatic indexing
▪ Easy-to-learn query grammar
▪ Multi-record transactions
▪ 99.99% SLAs for availability, latency, and throughput
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
Lenovo Thinkpad X1 Carbon | ??? | ??? | ???
Item | Author | Pages | Language | Processor | Memory | Storage
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English | ??? | ??? | ???
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English | ??? | ??? | ???
Lenovo Thinkpad X1 Carbon | ??? | ??? | ??? | Core i7 3.3 GHz | 8 GB | 256 GB SSD
Item | Author | Pages | Language
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English

Item | CPU | Memory | Storage
Lenovo Thinkpad X1 Carbon | Core i7 3.3 GHz | 8 GB | 256 GB SSD
ProductId | Item
1 | Harry Potter and the Sorcerer’s Stone
2 | Game of Thrones: A Song of Ice and Fire
3 | Lenovo Thinkpad X1 Carbon

ProductId | Attribute | Value
1 | Author | J.K. Rowling
1 | Pages | 309
…
2 | Author | George R.R. Martin
2 | Pages | 864
…
3 | Processor | Core i7 3.3 GHz
3 | Memory | 8 GB
…
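The tables above show three ways of forcing books and laptops into one relational schema. As a rough sketch of the document alternative, the same catalog can be held as JSON documents where each product carries only the attributes it has. The in-memory array and the property names (`item`, `author`, `processor`, and so on) are assumptions taken from the column headers, not part of any DocumentDB API.

```javascript
// Hypothetical in-memory stand-in for a document collection.
const products = [
  { id: "1", item: "Harry Potter and the Sorcerer's Stone", author: "J.K. Rowling", pages: 309, language: "English" },
  { id: "2", item: "Game of Thrones: A Song of Ice and Fire", author: "George R.R. Martin", pages: 864, language: "English" },
  { id: "3", item: "Lenovo Thinkpad X1 Carbon", processor: "Core i7 3.3 GHz", memory: "8 GB", storage: "256 GB SSD" },
];

// No NULL-padded columns and no attribute/value pivot: each query simply
// checks for the properties it cares about.
const books = products.filter((p) => "author" in p);
const longBooks = books.filter((p) => p.pages > 500).map((p) => p.item);

console.log(books.length); // number of documents that have an author
console.log(longBooks);    // titles with more than 500 pages
```

Adding a new product type here means adding a document with new properties; nothing about the existing documents has to change.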
The challenge
▪ Scale to an expected millions of users on day 1
▪ Deliver real-time responsiveness for a lag-free gaming experience
▪ Highly competitive: high scores and global leaderboards are critical
More Users, More Problems
The Results
▪ #1 among free apps in the Apple App Store during launch week
▪ >1M downloads
▪ ~1B queries per day
▪ 99th-percentile queries served in under 10 ms
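To put the two query figures in perspective, here is a small sketch of the arithmetic: the average request rate implied by about one billion queries per day, and a nearest-rank percentile function of the kind a "99th percentile under 10 ms" claim is measured with. The latency samples are invented for illustration and are not measurements from the talk.

```javascript
// Average request rate implied by ~1B queries per day.
const SECONDS_PER_DAY = 24 * 60 * 60;
const avgQueriesPerSecond = 1e9 / SECONDS_PER_DAY;

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical latency samples in milliseconds.
const latenciesMs = [2, 3, 3, 4, 4, 5, 5, 6, 7, 40];

console.log(Math.round(avgQueriesPerSecond)); // 11574
console.log(percentile(latenciesMs, 50));     // 4
console.log(percentile(latenciesMs, 99));     // 40
```

The sample set shows why the percentile matters: one slow request barely moves the median but fully determines the 99th percentile.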
How?
Just a matter of dumping the data into the database…
It's not that easy
Why is it so hard?
▪ Caches
  ▪ The scoreboard is always being updated…
▪ SQL database
  ▪ Sharding is required
  ▪ Schema and index management
  ▪ Loss of relational benefits
▪ Azure Table Storage
  ▪ Secondary indexes
  ▪ Latency
  ▪ Throughput
Planet-Scale NoSQL
▪ Horizontal scaling for storage and throughput
▪ High performance with SSDs and automatic indexing
▪ Operating on a global scale
Really painful
Predictable Performance
▪ Request Unit (RU) is the normalized currency across % memory, % IOPS and % CPU
▪ Each replica gets a fixed budget of Request Units
[Diagram labels: Resource, Resource set, Documents, SQL, sprocs, args]
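One way to picture a fixed Request Unit budget is as a counter that resets every second: each operation is charged a cost, and an operation that would exceed the budget is rejected so the caller can retry later. The sketch below is only an illustration of that idea. The class name, the one-second window, and the costs are assumptions, not DocumentDB's actual accounting.

```javascript
// Hypothetical per-second Request Unit budget; the costs used below are
// made-up illustration values, not DocumentDB's real charges.
class RequestUnitBudget {
  constructor(unitsPerSecond) {
    this.unitsPerSecond = unitsPerSecond;
    this.windowStart = 0;
    this.used = 0;
  }

  // Returns true if the operation fits in the current one-second window,
  // false if the caller should back off and retry later.
  tryConsume(cost, nowMs) {
    if (nowMs - this.windowStart >= 1000) {
      this.windowStart = nowMs - (nowMs % 1000);
      this.used = 0;
    }
    if (this.used + cost > this.unitsPerSecond) return false;
    this.used += cost;
    return true;
  }
}

const budget = new RequestUnitBudget(10);
const results = [
  budget.tryConsume(1, 0),    // small read: accepted
  budget.tryConsume(5, 100),  // write: accepted
  budget.tryConsume(5, 200),  // would exceed the budget: rejected
  budget.tryConsume(5, 1000), // new window: accepted
];
console.log(results); // [ true, true, false, true ]
```

Because every operation is priced in the same unit, the budget says in advance how much work fits into a second, which is where the predictability comes from.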
Creating a partitioned collection
// Option 1: pre-defined collection with a fixed performance level (S3)
DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
RequestOptions options = new RequestOptions { OfferType = "S3" };
DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync(
    "dbs/" + database.Id, collectionSpec, options);

// Option 2: partitioned collection with a partition key path and a throughput in RU/s
DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
collectionSpec.PartitionKey.Paths.Add("/walkerId");
int collectionThroughput = 100000;
RequestOptions options = new RequestOptions { OfferThroughput = collectionThroughput };
DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync(
    "dbs/" + database.Id, collectionSpec, options);
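The partition key path decides which documents end up together. The sketch below illustrates the general idea of hash partitioning on `/walkerId`: read the key value from the document, hash it, and map the hash to one of N partitions. The FNV-1a-style hash, the helper names, and the partition count are assumptions for illustration; the service's internal hashing and partition management are not described in the slides.

```javascript
// Illustrative only: a simple FNV-1a-style string hash (32-bit), not the
// hash DocumentDB uses internally.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Reads the partition key value from a document given a path like "/walkerId".
function partitionKeyValue(doc, path) {
  return path.split("/").filter(Boolean).reduce((node, key) => node[key], doc);
}

// Maps a document to one of `partitionCount` partitions.
function partitionFor(doc, path, partitionCount) {
  return fnv1a(String(partitionKeyValue(doc, path))) % partitionCount;
}

// Hypothetical documents for the "Walkers" collection.
const walks = [
  { id: "a", walkerId: "walker-1", steps: 1200 },
  { id: "b", walkerId: "walker-2", steps: 300 },
  { id: "c", walkerId: "walker-1", steps: 900 },
];
for (const doc of walks) {
  console.log(doc.id, "-> partition", partitionFor(doc, "/walkerId", 4));
}
```

Documents `a` and `c` share a `walkerId`, so they always land in the same partition, which is what allows a single-walker query to touch one partition only.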
Globally distributed
• Not just for disaster recovery… DocumentDB is unreasonably highly available
• Replicate data across any number of regions of your choice
• Low-latency access to your data around the globe
• Dynamically configure your write and read regions
Azure DocumentDB lets us beat the speed of light!
Strong consistency, high latency vs. eventual consistency, low latency
The app defines its regional preferences
ConnectionPolicy docClientConnectionPolicy = new ConnectionPolicy
{
    ConnectionMode = ConnectionMode.Direct,
    ConnectionProtocol = Protocol.Tcp
};
// Preferred regions, in priority order
docClientConnectionPolicy.PreferredLocations.Add(LocationNames.EastUS2);
docClientConnectionPolicy.PreferredLocations.Add(LocationNames.WestUS);
docClient = new DocumentClient(
    new Uri("https://myglobaldb.documents.azure.com:443"),
    "PARvqUuBw2QTO4rRXr6d1GnLCR7VinERcYrBQvDRh6EDTJLOHtZxgjTS4pv8nQv2Lg1QQLBLfO6TVziOZKvYow==",
    docClientConnectionPolicy);
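The effect of an ordered preference list can be sketched in a few lines: pick the first preferred region that is currently available, and fall back to the write region otherwise. This is a simplified model of the selection logic, assuming that fallback rule; `chooseReadRegion` and the region strings are hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: returns the first preferred region that is currently
// available, falling back to the write region when none is.
function chooseReadRegion(preferredLocations, availableRegions, writeRegion) {
  const available = new Set(availableRegions);
  for (const region of preferredLocations) {
    if (available.has(region)) return region;
  }
  return writeRegion;
}

const preferred = ["East US 2", "West US"];

// East US 2 is unavailable, so the second preference is used.
console.log(chooseReadRegion(preferred, ["West US", "North Europe"], "North Europe")); // West US

// No preferred region is available, so reads go to the write region.
console.log(chooseReadRegion(preferred, ["North Europe"], "North Europe")); // North Europe
```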
Automatic Indexing
• Index is a union of all the document trees
• No need to define secondary indices / schema hints!
[Diagram: document trees sharing a common structure]
Terms | Postings List/Values
$/location/0/ | 1, 2
location/0/country/ | 1, 2
location/0/city/ | 1, 2
0/country/Germany | 1, 2
1/country/France | 2
… | …
0/city/Moscow | 2
0/dealers/0 | 2
Reference: http://aka.ms/docdbvldb
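The terms-to-postings table can be approximated with a short sketch: walk each document as a tree, emit a term for every path and every leaf value, and record which document ids contain each term. This is a simplified illustration of an inverted index over JSON paths. The term format and the sample documents are assumptions, and the real index described in the referenced paper is considerably more compact.

```javascript
// Walks a JSON tree and emits one term per path plus one per leaf value,
// e.g. "location/0/country" and "location/0/country/Germany".
function termsOf(node, prefix = "") {
  if (node === null || typeof node !== "object") {
    return [`${prefix}/${node}`];
  }
  const terms = [];
  for (const [key, child] of Object.entries(node)) {
    const path = prefix ? `${prefix}/${key}` : key;
    terms.push(path, ...termsOf(child, path));
  }
  return terms;
}

// Builds term -> list of document ids (the postings list), in input order.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    const { id, ...body } = doc;
    for (const term of new Set(termsOf(body))) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(id);
    }
  }
  return index;
}

// Hypothetical documents, loosely modeled on the slide's example.
const docs = [
  { id: 1, location: [{ country: "Germany", city: "Berlin" }] },
  { id: 2, location: [{ country: "Germany", city: "Bonn" }, { country: "France", city: "Paris" }] },
];
const index = buildIndex(docs);

console.log(index.get("location/0/country"));         // [ 1, 2 ]
console.log(index.get("location/0/country/Germany")); // [ 1, 2 ]
console.log(index.get("location/1/country/France"));  // [ 2 ]
```

No schema was declared anywhere: the index terms come straight from whatever paths the documents happen to contain.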
Indexing policies
Customize index management, including storage overhead, throughput and query consistency
▪ Range, hash and spatial indexes
▪ Included and excluded paths
▪ Indexing mode: consistent or lazy
▪ Index precision
▪ Online, in-place index transformations
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Hash",
"dataType": "String",
"precision": 3
},
{
"kind": "Spatial",
"dataType": "Point"
}
]
}
],
"excludedPaths": []
}
SQL Query Grammar
-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"
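For readers more at home in JavaScript than SQL, the shape of these queries can be sketched against an in-memory array. The sample documents are hypothetical, and the JOIN sketch assumes `Languages` holds plain strings (as in the second query) and leaves out the UDF call. It is meant only to show what each clause does, not how the query engine executes it.

```javascript
// Hypothetical documents shaped to match the queries.
const books = [
  { Title: "War and Peace", Price: 15, Author: { Name: "Leo Tolstoy" }, Languages: ["English", "Russian"] },
  { Title: "Short Stories", Price: 8, Author: { Name: "Anton Chekhov" }, Languages: ["Russian"] },
];

// 1. Nested lookup: filter on a nested property, project one property.
const byTolstoy = books
  .filter((b) => b.Author.Name === "Leo Tolstoy")
  .map((b) => ({ Author: b.Author }));

// 2. Transformation, filter, array access: reshape each matching document.
const projected = books
  .filter((b) => b.Price > 10 && b.Languages[0] === "English")
  .map((b) => ({ Name: b.Title, Author: b.Author.Name }));

// 3. JOIN ... IN: one result per (document, array element) pair that matches.
const russianEditions = books.flatMap((b) =>
  b.Languages
    .filter((lang) => lang === "Russian")
    .map((lang) => ({ Title: b.Title, Language: lang }))
);

console.log(byTolstoy.length);       // 1
console.log(projected);              // [ { Name: 'War and Peace', Author: 'Leo Tolstoy' } ]
console.log(russianEditions.length); // 2
```

The JOIN here is a self-join within each document: it pairs a document with the elements of its own array, never with other documents.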
Database (stored procedure):
function(playerId1, playerId2) {
  // Fetch both player documents
  var playersToSwap = __.filter(function (document) {
    return (document.id == playerId1 || document.id == playerId2);
  });
  var player1 = playersToSwap[0], player2 = playersToSwap[1];
  // Swap the two players' items
  var player1ItemTemp = player1.item;
  player1.item = player2.item;
  player2.item = player1ItemTemp;
  // Write both documents; throwing aborts the transaction and rolls back both writes
  __.replaceDocument(player1)
    .then(function() { return __.replaceDocument(player2); })
    .fail(function(error) { throw 'Unable to update players, abort'; });
}
Client:
client.executeStoredProcedureAsync("procs/1234", ["MasterChief", "SolidSnake"])
  .then(function (response) {
    console.log("success!");
  }, function (err) {
    console.log("Failed to swap!", err);
  });
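The all-or-nothing behaviour of the swap can be demonstrated without a database. The sketch below uses a `Map` as a stand-in store, takes a snapshot before writing, and restores it if any write fails. The store, the `replace` parameter, and the item names are hypothetical; in the service, the rollback is performed by the database when the stored procedure throws.

```javascript
// Swaps the `item` of two players atomically: either both writes are kept,
// or the store is restored to its state before the call.
function swapItems(store, id1, id2, replace = (doc) => store.set(doc.id, doc)) {
  const snapshot = new Map(store); // shallow copy of id -> document
  try {
    const player1 = store.get(id1);
    const player2 = store.get(id2);
    if (!player1 || !player2) throw new Error("player not found");
    replace({ ...player1, item: player2.item });
    replace({ ...player2, item: player1.item });
    return true;
  } catch (error) {
    // Roll back: discard partial writes and restore the snapshot.
    store.clear();
    snapshot.forEach((doc, id) => store.set(id, doc));
    return false;
  }
}

// Hypothetical player documents.
const store = new Map([
  ["MasterChief", { id: "MasterChief", item: "Energy Sword" }],
  ["SolidSnake", { id: "SolidSnake", item: "Cardboard Box" }],
]);

console.log(swapItems(store, "MasterChief", "SolidSnake")); // true
console.log(store.get("MasterChief").item);                 // Cardboard Box

// Simulate a failure on the second write: the first write is undone.
let writes = 0;
const flakyReplace = (doc) => {
  writes += 1;
  if (writes === 2) throw new Error("write failed");
  store.set(doc.id, doc);
};
console.log(swapItems(store, "MasterChief", "SolidSnake", flakyReplace)); // false
console.log(store.get("MasterChief").item);                               // Cardboard Box
```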
API and Toolchain Options
DocumentDB
REST over HTTPS/TCP
Java
.NET
Power BI
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"addresses": [
{
"line1": "100 Some Street",
"line2": "Unit 1",
"city": "Seattle",
"state": "WA",
"zip": 98012 }
],
"contactDetails": [
{"email": "thomas@andersen.com"},
{"phone": "+1 555 555-5555", "extension": 5555}
]
}
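Try to model your entity as a self-contained document
Generally, use embedded data models when:
▪ One entity contains another
▪ The relationship is one-to-few
▪ The embedded data changes infrequently
▪ The embedded data won’t grow without bound
▪ The embedded data is integral to the document
Embedding typically provides better read performance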
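In general, use normalized data models when:
▪ Write performance matters more than read performance
▪ Representing one-to-many relationships
▪ Representing many-to-many relationships
▪ Related data changes frequently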
{
"id": "xyz",
"username": "user xyz"
}
{
"id": "address_xyz",
"userid": "xyz",
"address" : {
…
}
}
{
"id": "contact_xyz",
"userid": "xyz",
"email" : "user@user.com",
"phone" : "555 5555"
}
Normalizing typically provides better write performance
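The read/write trade-off between the two models can be made concrete by counting lookups. In the sketch below, a tiny in-memory store counts every read: the embedded profile comes back in one lookup, while the normalized profile takes three but lets each piece be updated on its own. `makeStore` and the id naming scheme (`address_<id>`, `contact_<id>`) are assumptions based on the sample documents.

```javascript
// Hypothetical store keyed by document id; `reads` counts lookups.
function makeStore(docs) {
  const byId = new Map(docs.map((d) => [d.id, d]));
  let reads = 0;
  return {
    get(id) { reads += 1; return byId.get(id); },
    get reads() { return reads; },
  };
}

const embedded = makeStore([
  { id: "1", firstName: "Thomas", addresses: [{ city: "Seattle" }], contactDetails: [{ email: "thomas@andersen.com" }] },
]);
const normalized = makeStore([
  { id: "xyz", username: "user xyz" },
  { id: "address_xyz", userid: "xyz", address: { city: "Seattle" } },
  { id: "contact_xyz", userid: "xyz", email: "user@user.com", phone: "555 5555" },
]);

// Embedded: one read returns the whole profile.
const profileA = embedded.get("1");

// Normalized: three reads, one per document, stitched together by the client.
const user = normalized.get("xyz");
const profileB = {
  ...user,
  address: normalized.get(`address_${user.id}`).address,
  email: normalized.get(`contact_${user.id}`).email,
};

console.log(embedded.reads, normalized.reads); // 1 3
console.log(profileA.addresses[0].city, profileB.address.city);
```

The flip side is the write path: changing the email in the normalized model rewrites one small document, whereas the embedded model rewrites the whole profile.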
No magic bullet
Think about how your data is going to be written and read, and model accordingly
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": [1, 2, 3],
"images": [
{"thumbnail": "http://....png"},
{"profile": "http://....png"}
]
}
{
"id": 1,
"name": "DocumentDB 101",
"authors": [
{"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
]
}
Questions?
