#MDBlocal
Atlas Data Lake
Technical Deep-Dive
CHICAGO
Craig Wilson, Senior Staff Engineer, MongoDB
State of Affairs
Businesses have a humongous amount of data
• IDC predicts that by 2025 global data will reach 175 zettabytes, 49% of which will reside in the public cloud.
Cloud storage is cost-effective
Cloud storage is hard to operationalize
A New Service Offered by MongoDB Atlas
Access long-term data
Query long-term data
Analyze long-term data
Requirements
Look and act like MongoDB
Access customer’s data securely
Handle queries over vast amounts of data
Handle long-running queries
Efficient use of resources
Emulating MongoDB
Language
Must be able to communicate with our drivers
Written in Go
Implemented a TCP server
Used mongo-go-driver’s wireprotocol package
Used mongo-go-driver's bson package
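A minimal sketch of the first job such a TCP server has: cutting the byte stream into wire protocol messages. It is written in Python for brevity, not the Go the service uses, and it is not the mongo-go-driver code. It relies only on the standard 16-byte message header (four little-endian int32 fields). The names `MsgHeader`, `parse_header`, and `split_frames` are illustrative.

```python
import struct
from typing import List, NamedTuple, Tuple

OP_MSG = 2013  # opcode of the OP_MSG message type

# Standard message header: four little-endian int32 fields, 16 bytes total.
HEADER = struct.Struct("<iiii")


class MsgHeader(NamedTuple):
    message_length: int  # total message size in bytes, header included
    request_id: int
    response_to: int
    op_code: int


def parse_header(data: bytes) -> MsgHeader:
    """Decode the header found at the start of `data`."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 16 bytes for a message header")
    return MsgHeader(*HEADER.unpack_from(data))


def split_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a TCP receive buffer into complete messages plus leftover bytes."""
    frames = []
    while len(buffer) >= HEADER.size:
        length = parse_header(buffer).message_length
        if length < HEADER.size:
            raise ValueError("invalid message length")
        if len(buffer) < length:
            break  # the rest of this message has not arrived yet
        frames.append(buffer[:length])
        buffer = buffer[length:]
    return frames, buffer
```

The leftover bytes are kept and prepended to the next read, so a message split across TCP segments is only handed on once it is complete.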
Security
Must have the same security as MongoDB
Users configured in Atlas
Implemented MongoDB’s security model
Requires the use of TLS + SNI (Server Name Indication)
Behavior
Must behave like MongoDB
Implemented commands for a read-only server
Used the server’s aggregation engine
Customer’s Data
Security: Customers
Customers have complete control
Provide us with an IAM Role
Configure your buckets
Configure your users in Atlas
Security: Atlas
Atlas controls access to your data
Storage of IAM Role
Temporary Credentials
Configuration
Customers control their data layout
Stores
Databases, Collections
DataSources
[Diagram labels: Collection, Store, DataSource]
Configuration: File Formats
• BSON (gzipped)
• JSON (gzipped)
• Avro (gzipped)
• CSV/TSV (gzipped)
• Parquet
• XLSX
Configuration (S3 Bucket): ent-archive
/archive/customers
  - a-m.json
  - n-z.json
/archive/invoices
  - 2019
    - 1.parquet
    - 2.parquet
  - 2018
    - 1.parquet
  - 2017.json.gz
  - 2016.json.gz
Configuration: Store
s3: {
  name: "ent-archive",
  bucket: "ent-archive",
  region: "us-east-1",
  prefix: "/archive/"
}
Configuration: Data
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }]
}
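A rough sketch of how a definition such as "/invoices/{year int}/*" could be matched against object paths to pull out a typed field. This is a Python illustration under stated assumptions, not the service's actual matcher. It assumes `*` matches any remaining characters and that `{name int}` captures a run of digits. The `string` type and the `match_path` helper are hypothetical.

```python
import re
from typing import Any, Dict, Optional

# Assumed type names; only "int" appears in the configuration above.
_CASTS = {"int": int, "string": str}

# A template token is either "{name type}" or the wildcard "*".
_TOKEN = re.compile(r"\{(\w+) (\w+)\}|\*")


def match_path(definition: str, path: str) -> Optional[Dict[str, Any]]:
    """Return the fields extracted from `path`, or None if it does not match."""
    regex = []
    casts = {}
    pos = 0
    for token in _TOKEN.finditer(definition):
        # Literal text between tokens must match exactly.
        regex.append(re.escape(definition[pos:token.start()]))
        if token.group(0) == "*":
            regex.append(".*")  # assumption: wildcard spans the rest of the path
        else:
            name, type_name = token.group(1), token.group(2)
            casts[name] = _CASTS[type_name]
            if type_name == "int":
                regex.append(f"(?P<{name}>\\d+)")
            else:
                regex.append(f"(?P<{name}>[^/]+)")
        pos = token.end()
    regex.append(re.escape(definition[pos:]))

    found = re.fullmatch("".join(regex), path)
    if found is None:
        return None
    return {name: cast(found.group(name)) for name, cast in casts.items()}
```

With the bucket layout above, "/invoices/2019/1.parquet" matches the first invoices definition and "/invoices/2017.json.gz" matches the second, each yielding a `year` value that a query on `year` could use to skip files.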
Configuration: Data (Future)
history: {
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }, {
    store: "atlas",
    db: "customers",
    collection: "invoices"
  }]
}
Queries
Processing
MQL → Distributed MQL
Parse
Parallelize
Distribute
Architecture
[Diagram labels: Atlas, Control Plane, Compute Plane, Data Plane; three DataLake Frontend and DataLake Agent pairs, each with Load Balancers]
Architecture
[Diagram labels: Atlas, Control Plane, Compute Plane, Data Plane; one DataLake Frontend and multiple DataLake Agents]
Query Example: $limit
Original pipeline:
  { $match: { year: { $gt: 2000 } } }
  { $limit: 10 }
Map:
  { $match: { year: { $gt: 2000 } } }
  { $limit: 10 }
Reduce:
  { $limit: 10 }
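The $limit split can be sketched in plain Python. This is an illustration of the idea only, with in-memory lists standing in for the data each agent reads. `limit_map` and `limit_reduce` are hypothetical names.

```python
from itertools import islice
from typing import Dict, Iterable, List

Doc = Dict[str, object]


def limit_map(partition: Iterable[Doc], limit: int = 10) -> List[Doc]:
    """Map side: $match year > 2000, then $limit, on one partition."""
    matched = (
        doc for doc in partition
        if isinstance(doc.get("year"), (int, float)) and doc["year"] > 2000
    )
    return list(islice(matched, limit))


def limit_reduce(partials: Iterable[List[Doc]], limit: int = 10) -> List[Doc]:
    """Reduce side: concatenate the partial results and apply $limit again."""
    merged = (doc for part in partials for doc in part)
    return list(islice(merged, limit))
```

Applying the limit on the map side means each partition sends back at most 10 documents. The reduce side still needs its own limit because several partitions together can return more than 10.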
Query Example: $group
Original pipeline:
  { $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
Map:
  { $group: { _id: "$year",
      totalAvg_sum: { $sum: "$amount" },
      totalAvg_count: { $sum: 1 }
  } }
Reduce:
  { $group: { _id: "$_id",
      totalAvg_sum: { $sum: "$totalAvg_sum" },
      totalAvg_count: { $sum: "$totalAvg_count" }
  } }
Finalize:
  { $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
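The same three stages can be written as a small Python sketch to show why the average is carried as a sum and a count. It assumes every document has numeric `year` and `amount` fields. `group_map`, `group_reduce`, and `group_finalize` are illustrative names, not part of the service.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, float]


def _new_acc() -> Row:
    return {"totalAvg_sum": 0, "totalAvg_count": 0}


def group_map(partition: Iterable[Row]) -> List[Row]:
    """Map side: per-year sum and count for one partition."""
    acc = defaultdict(_new_acc)
    for doc in partition:
        group = acc[doc["year"]]
        group["totalAvg_sum"] += doc["amount"]
        group["totalAvg_count"] += 1
    return [{"_id": year, **group} for year, group in acc.items()]


def group_reduce(partials: Iterable[List[Row]]) -> List[Row]:
    """Reduce side: add up the partial sums and counts per year."""
    acc = defaultdict(_new_acc)
    for part in partials:
        for row in part:
            group = acc[row["_id"]]
            group["totalAvg_sum"] += row["totalAvg_sum"]
            group["totalAvg_count"] += row["totalAvg_count"]
    return [{"_id": year, **group} for year, group in acc.items()]


def group_finalize(rows: Iterable[Row]) -> List[Row]:
    """Finalize: divide sum by count to recover the average."""
    return [
        {"_id": row["_id"], "totalAvg": row["totalAvg_sum"] / row["totalAvg_count"]}
        for row in rows
    ]
```

Averaging the per-partition averages would give the wrong answer whenever partitions hold different numbers of documents for a year, which is why the map stage emits sums and counts instead.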
Future
Future
More supported MongoDB operators.
$out
$merge
Geo operators
Full Text Search
Future
Optimizations
Indexes
Statistics
Future
File Formats
• ORC
• PDF
Compression
• Bzip2
• Snappy
• LZMA
• LZO
• Zstd
Future
Integrations
Atlas
Microsoft Azure
Google Cloud
Hiring
Lots to do
mongodb.com/careers
Craig Wilson
Senior Staff Engineer, MongoDB
Our Developer focused talks
are back on the road!
Find one near you
At your MongoDB.local, you’ll learn technologies, tools, and best practices
that make it easy for you to build data-driven applications without distraction.
THANK YOU

MongoDB .local Chicago 2019: MongoDB Atlas Data Lake Technical Deep Dive