MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
#MDBLocal
“To free the genius within everyone by
making data stunningly easy to work
with.”
Welcome to the World of
Atlas Data Lake
Isabel Peters
Senior Software Engineer, MongoDB
Atlas Backup
Why are we building this?
“IDC predicts that by 2025 worldwide data will reach
175 zettabytes and 49% of it will reside in the public
cloud.”
Atlas Data Lake Technical Deep Dive
1. Design Goals and Requirements
2. Creating an Atlas Data Lake
3. Atlas Data Lake Architecture
4. Future improvements
Design Goals and Requirements
Implementation Requirements
MongoDB Wire Protocol Support
Requirements
1) Look and act like MongoDB
Solution
• Implemented a TCP server in Go
• Used mongo-go-driver's wireprotocol package
• Used mongo-go-driver's bson package
• Read-only
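To give a feel for what "look and act like MongoDB" means at the wire level, here is a minimal sketch that parses the standard 16-byte MongoDB message header (four little-endian int32 fields). The server described in the talk was written in Go with mongo-go-driver packages; this Python version, and the names `MsgHeader` and `parse_header`, are illustrative assumptions and not the actual implementation.

```python
import struct
from typing import NamedTuple

OP_MSG = 2013  # opcode of the OP_MSG message type in the MongoDB wire protocol


class MsgHeader(NamedTuple):
    message_length: int  # total message size in bytes, header included
    request_id: int      # identifier chosen by the sender
    response_to: int     # request_id this message answers (0 for requests)
    op_code: int         # message type


def parse_header(data: bytes) -> MsgHeader:
    """Parse the 16-byte standard message header (four little-endian int32s)."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes for a message header")
    return MsgHeader(*struct.unpack_from("<iiii", data, 0))


# Build a header by hand and parse it back (hypothetical values).
raw = struct.pack("<iiii", 16, 1, 0, OP_MSG)
header = parse_header(raw)
print(header)
```

A real server would read `message_length` bytes from the TCP connection, decode the BSON body, and reject any command that is not read-only.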
MongoDB Security Model
Requirements
2) Access customer’s data securely.
Solution
• Users configured in MongoDB Atlas
• Same authentication and authorization
• Configure buckets
Scalable Processing
Requirements
3) Handle long-running queries over vast amounts
of data using resources efficiently
Solution
• Read-only commands
• Use server’s aggregation engine
• Distributed MQL processing
• Intelligent file targeting
Data Formats
Requirements
4) Support a variety of data formats
Solution
• Avro (gzipped)
• Parquet
• BSON/JSON (gzipped)
• CSV/TSV (gzipped)
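One simple way to tell these formats apart is by file extension, with an optional `.gz` suffix. The sketch below does exactly that; the extension table and the function name `detect_format` are assumptions made for illustration, and the slides do not say how Atlas Data Lake actually detects formats.

```python
import os
from typing import Tuple

# Hypothetical mapping from file extension to format name (illustration only).
FORMATS = {
    ".avro": "avro",
    ".parquet": "parquet",
    ".bson": "bson",
    ".json": "json",
    ".csv": "csv",
    ".tsv": "tsv",
}


def detect_format(path: str) -> Tuple[str, bool]:
    """Return (format, is_gzipped) guessed from the file name alone."""
    root, ext = os.path.splitext(path.lower())
    gzipped = ext == ".gz"
    if gzipped:
        # Strip the compression suffix and look at the extension underneath.
        root, ext = os.path.splitext(root)
    if ext not in FORMATS:
        raise ValueError(f"unsupported file extension: {path}")
    return FORMATS[ext], gzipped


print(detect_format("/archive/invoices/2017.json.gz"))
print(detect_format("/archive/invoices/2019/1.parquet"))
```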
Atlas Data Lake Features
Multiple data formats
Scalable
MongoDB Query Language
Serverless
On Demand
Integrated with Atlas
Creating your Atlas Data Lake
Files in S3 bucket: ent-archive
/archive/customers
  - a-m.json
  - n-z.json
/archive/invoices
  - 2019
    - 1.parquet
    - 2.parquet
  - 2018
    - 1.parquet
  - 2017.json.gz
  - 2016.json.gz
You control your data layout
Stores
Databases
Collections
DataSources
[Diagram: one Database with two Collections, three DataSources, and two Stores]
Data Lake Configuration
1. Configure a new Data Lake in Atlas
2. Connect to your Data Lake
3. Configure your databases and collections
4. Query your Data Lake
Configuration: S3 Store
s3: {
  name: "ent-archive",
  bucket: "ent-archive",
  region: "us-east-1",
  prefix: "/archive/"
}
Configuration: Databases & Collections
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }]
}
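The `{year int}` placeholders in the definitions above suggest how file targeting could work: a value parsed from the file path lets the engine skip files that cannot match a filter on `year`. The sketch below turns a definition into a regular expression and prunes the example bucket's files. The matching rules (`*` stays within one path segment, `int` means digits) and the names `compile_definition`, `extract`, and `target_files` are assumptions for illustration, not documented Atlas Data Lake behavior.

```python
import re
from typing import Dict, List, Optional


def compile_definition(definition: str) -> "re.Pattern":
    """Turn a definition such as '/invoices/{year int}/*' into a regex.

    Assumed semantics: '{name int}' captures digits, and '*' matches any
    characters within a single path segment.
    """
    parts = []
    # Split into '{name type}' placeholders, '*' wildcards, and literal text.
    for token in re.split(r"(\{\w+ \w+\}|\*)", definition):
        if token == "*":
            parts.append(r"[^/]*")
        elif token.startswith("{") and token.endswith("}"):
            name, kind = token[1:-1].split()
            if kind != "int":
                raise ValueError(f"unsupported type: {kind}")
            parts.append(rf"(?P<{name}>\d+)")
        else:
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


def extract(definition: str, path: str) -> Optional[Dict[str, int]]:
    """Return the fields captured from path, or None if it does not match."""
    m = compile_definition(definition).match(path)
    if m is None:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}


# Paths relative to the store prefix '/archive/', from the example bucket.
FILES = [
    "/invoices/2019/1.parquet",
    "/invoices/2019/2.parquet",
    "/invoices/2018/1.parquet",
    "/invoices/2017.json.gz",
    "/invoices/2016.json.gz",
]
DEFINITIONS = ["/invoices/{year int}/*", "/invoices/{year int}.json.gz"]


def target_files(year_gt: int) -> List[str]:
    """Keep only files whose path-derived year is greater than year_gt."""
    selected = []
    for path in FILES:
        for definition in DEFINITIONS:
            fields = extract(definition, path)
            if fields is not None and fields["year"] > year_gt:
                selected.append(path)
                break
    return selected


print(target_files(2017))
```

With a filter such as `year > 2017`, only the three Parquet files need to be opened; the two `.json.gz` files are excluded from their names alone.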
Querying via MongoDB Atlas
• Atlas users require readWriteAnyDatabase or readAnyDatabase roles.
• Use MongoDB drivers/clients including the mongo shell and MongoDB
Compass
• Write queries in MongoDB Query Language (MQL)
Atlas Data Lake Architecture
MQL → Distributed MQL
Parse query
Parallelize processing
Distribute workload
Atlas Data Lake Architecture
[Diagram: Atlas Control Plane, Compute Plane, and Data Plane; three groups, each with a DataLake Frontend, a DataLake Agent, and two Load Balancers]
Architecture
[Diagram: Atlas Control Plane, Compute Plane, and Data Plane; one DataLake Frontend and nine DataLake Agents]
Query Example: $limit
Pipeline:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Map:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Reduce:
{ $limit: 10 }
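The Map and Reduce stages above can be simulated in a few lines: each partition filters and truncates its own documents, and the merge step truncates again, since several partitions may each return up to 10 documents. This is a toy in-memory sketch; the partition data and the function names `map_stage` and `reduce_stage` are made up for illustration.

```python
from typing import Dict, List

Doc = Dict[str, int]


def map_stage(partition: List[Doc], limit: int = 10) -> List[Doc]:
    """Per-partition work: $match year > 2000, then $limit."""
    matched = [d for d in partition if d["year"] > 2000]
    return matched[:limit]


def reduce_stage(partials: List[List[Doc]], limit: int = 10) -> List[Doc]:
    """Merge the partial results and apply $limit once more."""
    merged = [d for part in partials for d in part]
    return merged[:limit]


# Hypothetical partitions, e.g. one per file or per agent.
partitions = [
    [{"year": y} for y in range(1995, 2010)],  # 9 documents match (2001-2009)
    [{"year": y} for y in range(2010, 2020)],  # 10 documents match
]
partials = [map_stage(p) for p in partitions]
result = reduce_stage(partials)
print(len(partials[0]), len(partials[1]), len(result))
```

Applying the limit in the Map stage keeps each partition from shipping more than 10 documents to the merge step.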
Query Example: $group
Pipeline:
{ $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
Map:
{ $group: { _id: "$year",
            totalAvg_sum: { $sum: "$amount" },
            totalAvg_count: { $sum: 1 }
} }
Reduce:
{ $group: { _id: "$_id",
            totalAvg_sum: { $sum: "$totalAvg_sum" },
            totalAvg_count: { $sum: "$totalAvg_count" }
} }
Finalize:
{ $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
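The reason for splitting the average into a sum and a count is that an average of per-partition averages is generally wrong when partitions hold different numbers of documents. The following toy simulation mirrors the Map, Reduce, and Finalize stages; the sample documents and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Partial = Dict[int, Tuple[float, int]]  # year -> (sum of amount, count)


def map_group(partition: List[dict]) -> Partial:
    """Per-partition $group: sum and count of amount per year."""
    acc = defaultdict(lambda: [0.0, 0])
    for doc in partition:
        acc[doc["year"]][0] += doc["amount"]
        acc[doc["year"]][1] += 1
    return {year: (s, c) for year, (s, c) in acc.items()}


def reduce_group(partials: Iterable[Partial]) -> Partial:
    """Merge step: add up the partial sums and counts per year."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for year, (s, c) in partial.items():
            total[year][0] += s
            total[year][1] += c
    return {year: (s, c) for year, (s, c) in total.items()}


def finalize(totals: Partial) -> Dict[int, float]:
    """Final $project: divide the total sum by the total count."""
    return {year: s / c for year, (s, c) in totals.items()}


partitions = [
    [{"year": 2018, "amount": 10}, {"year": 2019, "amount": 30}],
    [{"year": 2018, "amount": 20}, {"year": 2018, "amount": 60}],
]
averages = finalize(reduce_group(map_group(p) for p in partitions))
print(averages)
```

For 2018 the partitions hold one and two documents, so their local averages are 10 and 40; averaging those gives 25, while the sum-and-count route gives the true average of 30.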
Future improvements
On the roadmap …
MongoDB Operators
$out
$merge
$graphLookup
Performance
• Aggregation
• Indexes
• Statistics over data
File Formats
Integrations
Summary
Atlas Data Lake is the best way to:
Access long-term data in multiple formats
Query long-term data using MQL
Analyse long-term data on demand
Give it a try -
Create your own Atlas Data Lake!
THANK YOU
