MongoDB .local Houston 2019: MongoDB Atlas Data Lake Technical Deep Dive
Atlas Data Lake
Technical Deep-Dive
Craig Wilson, Senior Staff Engineer, MongoDB
State of Affairs
Businesses have a humongous amount of data
• IDC predicts that by 2025 global data will reach 175 zettabytes and 49% of it will reside in the public cloud.
Cloud storage is cost-effective
Cloud storage is hard to operationalize
A New Service Offered by MongoDB Atlas
Access long-term data
Query long-term data
Analyze long-term data
Requirements
Look and act like MongoDB
Access customer’s data securely
Handle queries over vast amounts of data
Handle long-running queries
Efficient use of resources
Emulating MongoDB
Language
Must be able to communicate with our drivers
Written in Go
Implemented a TCP server
Used mongo-go-driver’s wireprotocol package
Used mongo-go-driver's bson package
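To make "communicate with our drivers" concrete: every MongoDB wire protocol message begins with a 16-byte header of four little-endian int32 values (message length, request ID, response-to, opcode), and a TCP server must read that header before it can decode the BSON body. The service described here is written in Go and uses the mongo-go-driver packages; the Python below is only an illustrative sketch of the framing, and the helper names `pack_header` and `parse_header` are hypothetical.

```python
import struct

# Standard message header: four little-endian int32 fields, 16 bytes in total.
HEADER = struct.Struct("<iiii")
OP_MSG = 2013  # opcode used by current drivers for commands


def pack_header(message_length, request_id, response_to, op_code):
    """Serialize a message header (hypothetical helper for this sketch)."""
    return HEADER.pack(message_length, request_id, response_to, op_code)


def parse_header(data):
    """Decode the first 16 bytes of a wire protocol message."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 16 bytes for a message header")
    message_length, request_id, response_to, op_code = HEADER.unpack_from(data)
    return {
        "message_length": message_length,
        "request_id": request_id,
        "response_to": response_to,
        "op_code": op_code,
    }


header = pack_header(16, 1, 0, OP_MSG)
parsed = parse_header(header)
print(parsed)
```

A real server would read `message_length - 16` further bytes from the socket after the header and hand them to a BSON decoder.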
Security
Must have the same security as MongoDB
Users configured in Atlas
Implemented MongoDB’s security model
Require the use of TLS + SNI (Server Name Indication)
Behavior
Must behave like MongoDB
Implemented commands for a read-only server
Used the server’s aggregation engine
Customer’s Data
Security: Customers
Customers have complete control
Provide us with an IAM Role
Configure your buckets
Configure your users in Atlas
Security: Atlas
Atlas controls access to your data
Storage of IAM Role
Temporary Credentials
Configuration
Customers control their data layout
Stores
Databases, Collections
DataSources
(Diagram: each Collection is backed by one or more DataSources, and each DataSource points at a Store)
Configuration: File Formats
• BSON (gzipped)
• JSON (gzipped)
• Avro (gzipped)
• CSV/TSV (gzipped)
• Parquet
• XLSX
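As a rough illustration of working with this list, the sketch below guesses a file's format and whether it is gzipped from its name. It assumes the format can be inferred from the file extension, which the slides do not state; `FORMATS` and `detect_format` are hypothetical names, not part of Atlas Data Lake.

```python
# Hypothetical extension table built from the formats listed above.
FORMATS = {
    ".bson": "bson",
    ".json": "json",
    ".avro": "avro",
    ".csv": "csv",
    ".tsv": "tsv",
    ".parquet": "parquet",
    ".xlsx": "xlsx",
}


def detect_format(key):
    """Return (format, gzipped) for an object key; format is None if unknown."""
    name = key.lower()
    gzipped = name.endswith(".gz")
    if gzipped:
        name = name[: -len(".gz")]  # look at the extension under the .gz suffix
    for extension, fmt in FORMATS.items():
        if name.endswith(extension):
            return fmt, gzipped
    return None, gzipped


print(detect_format("/archive/invoices/2017.json.gz"))
print(detect_format("/archive/invoices/2019/1.parquet"))
```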
Configuration (S3 Bucket): ent-archive
/archive/customers
  - a-m.json
  - n-z.json
/archive/invoices
  - 2019
    - 1.parquet
    - 2.parquet
  - 2018
    - 1.parquet
  - 2017.json.gz
  - 2016.json.gz
Configuration: Store
s3: {
  name: "ent-archive",
  bucket: "ent-archive",
  region: "us-east-1",
  prefix: "/archive/"
}
Configuration: Data
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }]
}
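The `definition` templates above pair a path pattern with typed placeholders such as `{year int}`. One way to picture how such a template could be applied is to compile it to a regular expression and extract the placeholder values from each object path. This is a minimal sketch, not the service's implementation: it assumes paths are matched relative to the store prefix, that `*` matches a single path segment, and that only the `int` and `string` types exist. `compile_definition` and `match_path` are hypothetical names.

```python
import re

# Placeholders look like {name type}; "*" is treated as a one-segment wildcard.
TOKEN = re.compile(r"\{(\w+) (\w+)\}|\*")
TYPES = {"int": (r"\d+", int), "string": (r"[^/]+", str)}


def compile_definition(definition):
    """Turn a definition template into (compiled regex, value converters)."""
    pattern, converters, pos = "", {}, 0
    for token in TOKEN.finditer(definition):
        pattern += re.escape(definition[pos:token.start()])
        if token.group(0) == "*":
            pattern += r"[^/]+"
        else:
            name, type_name = token.group(1), token.group(2)
            regex, convert = TYPES[type_name]
            pattern += f"(?P<{name}>{regex})"
            converters[name] = convert
        pos = token.end()
    pattern += re.escape(definition[pos:])
    return re.compile(pattern), converters


def match_path(definition, path):
    """Return the extracted fields if path fits the template, else None."""
    regex, converters = compile_definition(definition)
    found = regex.fullmatch(path)
    if found is None:
        return None
    return {name: converters[name](value) for name, value in found.groupdict().items()}


print(match_path("/invoices/{year int}/*", "/invoices/2019/1.parquet"))
print(match_path("/invoices/{year int}.json.gz", "/invoices/2017.json.gz"))
```

Under these assumptions, both invoice layouts in the bucket yield a `year` value even though one encodes it as a directory and the other as a file name.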
Configuration: Data (Future)
history: {
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }, {
    store: "atlas",
    db: "customers",
    collection: "invoices"
  }]
}
Queries
Processing
MQL → Distributed MQL
Parse
Parallelize
Distribute
Architecture
(Diagram labels: Atlas Control; Control Plane, Compute Plane, Data Plane. Three groups, each showing a DataLake Frontend, a DataLake Agent, and two Load Balancers.)
Architecture
(Diagram labels: Atlas Control; Control Plane, Compute Plane, Data Plane. One DataLake Frontend and nine DataLake Agents.)
Query Example: $limit
Map:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Reduce:
{ $limit: 10 }
Original pipeline:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
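The split in this example can be simulated in a few lines: each partition applies the match and its own limit, and the reduce step applies the limit once more to the merged partial results. This is a sketch with made-up sample data, not the service's code; `map_stage`, `reduce_stage`, and the partition layout are hypothetical.

```python
LIMIT = 10


def map_stage(partition, limit):
    """Per-partition work: $match year > 2000, then $limit."""
    out = []
    for doc in partition:
        if doc["year"] > 2000:
            out.append(doc)
            if len(out) == limit:
                break  # this partition never needs to return more than the limit
    return out


def reduce_stage(partials, limit):
    """Merge the partial results and apply $limit once more."""
    merged = [doc for part in partials for doc in part]
    return merged[:limit]


# Made-up partitions standing in for separate files.
partitions = [
    [{"year": y} for y in range(1995, 2010)],
    [{"year": y} for y in range(2005, 2020)],
    [{"year": 1990}],
]

result = reduce_stage([map_stage(p, LIMIT) for p in partitions], LIMIT)
print(len(result))
```

Repeating the limit in the map step is what keeps each partition from sending back more documents than the final answer could ever use.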
Query Example: $group
Map:
{ $group: { _id: "$year",
    totalAvg_sum: { $sum: "$amount" },
    totalAvg_count: { $sum: 1 }
} }
Reduce:
{ $group: { _id: "$_id",
    totalAvg_sum: { $sum: "$totalAvg_sum" },
    totalAvg_count: { $sum: "$totalAvg_count" }
} }
Finalize:
{ $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
Original pipeline:
{ $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
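The reason `$avg` is rewritten into a sum and a count is that averages of partial averages are not, in general, the overall average, whereas sums and counts combine exactly. The sketch below walks the three stages over made-up data; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def _empty_groups():
    return defaultdict(lambda: {"totalAvg_sum": 0, "totalAvg_count": 0})


def map_stage(partition):
    """Per-partition $group: keep a running sum and count per year."""
    groups = _empty_groups()
    for doc in partition:
        group = groups[doc["year"]]
        group["totalAvg_sum"] += doc["amount"]
        group["totalAvg_count"] += 1
    return [{"_id": year, **group} for year, group in groups.items()]


def reduce_stage(partials):
    """Combine partial groups by adding their sums and counts."""
    groups = _empty_groups()
    for part in partials:
        for row in part:
            group = groups[row["_id"]]
            group["totalAvg_sum"] += row["totalAvg_sum"]
            group["totalAvg_count"] += row["totalAvg_count"]
    return [{"_id": year, **group} for year, group in groups.items()]


def finalize_stage(rows):
    """$project step: divide the sum by the count."""
    return {row["_id"]: row["totalAvg_sum"] / row["totalAvg_count"] for row in rows}


# Made-up partitions standing in for separate files.
partitions = [
    [{"year": 2018, "amount": 10}, {"year": 2019, "amount": 50}],
    [{"year": 2018, "amount": 20}, {"year": 2018, "amount": 60}],
]

averages = finalize_stage(reduce_stage([map_stage(p) for p in partitions]))
print(averages)
```

With this data the 2018 average is 90 / 3 = 30, while averaging the two per-partition averages (10 and 40) would give 25.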
Future
Future
More supported MongoDB operators.
$out
$merge
Geo operators
Full Text Search
Future
Optimizations
Indexes
Statistics
Future
File Formats
ORC
PDF
Future
Integrations
Atlas
Microsoft Azure
Google Cloud
Hiring
Lots to do
mongodb.com/careers
Craig Wilson
Senior Staff Engineer, MongoDB
Thank You!