Craig Wilson – Software Engineer
Atlas Data Lake Technical Deep-Dive
@craiggwilson
State of Affairs
Why are we building this?
• Businesses have a humongous amount of data
• IDC predicts that by 2025 global data will reach 175 Zettabytes and 49% of it will reside in the public cloud.
• Cloud storage is cost-effective
• Cloud storage is hard to operationalize
A New Service Offered by MongoDB Atlas
Atlas Data Lake allows you to...
§ Access long-term data
§ Query long-term data
§ Analyze long-term data
Requirements
Every product has requirements!
§ Look and act like MongoDB
§ Access customer’s data securely
§ Handle queries over vast amounts of data
§ Handle long-running queries
§ Efficient use of resources
Emulating MongoDB
Language
Must be able to communicate with our drivers.
§ Written in Go
§ Implemented a TCP server
§ Used mongo-go-driver’s wireprotocol package
§ Used mongo-go-driver's bson package
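To give a feel for what talking to the drivers involves: every MongoDB wire protocol message begins with a 16-byte header of four little-endian int32 fields (messageLength, requestID, responseTo, opCode). The sketch below decodes that header. It is written in Python for brevity, not Go, and it does not use mongo-go-driver; `MsgHeader` and `parse_header` are names made up for this illustration.

```python
import struct
from typing import NamedTuple

OP_MSG = 2013  # opcode that current drivers use to send commands

# Four little-endian int32 values, 16 bytes in total.
_HEADER = struct.Struct("<iiii")


class MsgHeader(NamedTuple):
    message_length: int  # total message size in bytes, header included
    request_id: int
    response_to: int
    op_code: int


def parse_header(data: bytes) -> MsgHeader:
    """Decode the standard message header from the start of a message."""
    if len(data) < _HEADER.size:
        raise ValueError("need at least 16 bytes for a message header")
    return MsgHeader(*_HEADER.unpack_from(data))


# Build a header-only message and decode it again.
raw = _HEADER.pack(16, 1, 0, OP_MSG)
header = parse_header(raw)
```

A server would read these 16 bytes first, then read the remaining `message_length - 16` bytes and decode the body as BSON.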
Security
Must have the same security as MongoDB.
§ Users configured in Atlas
§ Implemented MongoDB’s security model
§ Authentication
§ Authorization
§ Require the use of TLS + SNI
§ SNI = Server Name Indication
Behavior
Must behave like MongoDB.
§ Implemented commands for a read-only server
§ Used the server’s aggregation engine
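A read-only server can be pictured as a dispatcher that accepts a fixed set of read commands and rejects everything else. The sketch below is only an illustration: the slides do not list which commands the service implements, so the `READ_COMMANDS` set and the `dispatch` function are assumptions.

```python
# Hypothetical allowlist; the commands actually implemented are not given in the slides.
READ_COMMANDS = {"aggregate", "find", "count", "distinct", "listCollections"}


def dispatch(command: dict) -> dict:
    """Return a MongoDB-style reply document for a command document."""
    if not command:
        return {"ok": 0, "errmsg": "empty command"}
    # In a command document, the first key names the command.
    name = next(iter(command))
    if name not in READ_COMMANDS:
        return {"ok": 0, "errmsg": f"command not supported on a read-only server: {name}"}
    # A real server would execute the command here; this sketch only acknowledges it.
    return {"ok": 1, "command": name}


accepted = dispatch({"find": "invoices", "filter": {"year": 2019}})
rejected = dispatch({"insert": "invoices", "documents": [{"year": 2019}]})
```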
Customer’s Data
Security: Customers
Customers have complete control.
§ Provide us with an IAM Role
§ Configure your buckets
§ Configure your users in Atlas
Security: Atlas
Atlas controls access to your data.
§ Storage of IAM Role
§ Temporary Credentials
Configuration
Customers control their data layout.
§ Stores
§ Databases, Collections
§ DataSources
[Diagram: a Database containing Collections; each Collection is backed by one or more DataSources, and each DataSource points to a Store]
Configuration: File Formats
Each file has a format.
§ BSON (gzipped)
§ JSON (gzipped)
§ Avro (gzipped)
§ CSV/TSV (gzipped)
§ Parquet
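One simple way to decide how to read a file is to look at its extension, treating a trailing `.gz` as a compression wrapper around one of the formats above. The slides do not say how the service actually detects formats, so extension-based detection, the `FORMATS` table, and `detect_format` are all assumptions made for this sketch.

```python
import os
from typing import Tuple

# Extension-to-format table built from the formats listed above (assumed mapping).
FORMATS = {
    ".bson": "BSON",
    ".json": "JSON",
    ".avro": "Avro",
    ".csv": "CSV",
    ".tsv": "TSV",
    ".parquet": "Parquet",
}


def detect_format(key: str) -> Tuple[str, bool]:
    """Return (format, is_gzipped) for an object key, judged by extension only."""
    base, ext = os.path.splitext(key.lower())
    gzipped = ext == ".gz"
    if gzipped:
        # Strip the compression suffix and look at the extension underneath.
        base, ext = os.path.splitext(base)
    if ext not in FORMATS:
        raise ValueError(f"unrecognized file format: {key}")
    return FORMATS[ext], gzipped


archived = detect_format("/archive/invoices/2017.json.gz")
columnar = detect_format("/archive/invoices/2019/1.parquet")
```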
Configuration (S3 Bucket): ent-archive
/archive/customers
  - a-m.json
  - n-z.json
/archive/invoices
  - 2019
    - 1.parquet
    - 2.parquet
  - 2018
    - 1.parquet
  - 2017.json.gz
  - 2016.json.gz
Configuration: Store
s3: {
  name: "ent-archive",
  bucket: "ent-archive",
  region: "us-east-1",
  prefix: "/archive/"
}
Configuration: Data
history: {
  customers: [{
    store: "ent-archive",
    definition: "/customers/*"
  }],
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }]
}
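The `definition` strings act as path templates: `*` is a wildcard and `{year int}` names a value embedded in the path. A minimal sketch of how such a template could be matched against object keys (relative to the store prefix) follows. The exact template grammar is not spelled out in the slides, so the supported type names, the rule that `*` stays within one path segment, and the helper names `compile_definition` and `match_key` are assumptions.

```python
import re
from typing import Optional

_PLACEHOLDER = re.compile(r"\{(\w+) (\w+)\}")
# Assumed type names; only "int" appears in the slides.
_TYPES = {"int": (r"\d+", int), "string": (r"[^/]+", str)}


def _literal(text: str) -> str:
    # Escape literal text, but let '*' match any run of characters inside one path segment.
    return re.escape(text).replace(r"\*", "[^/]*")


def compile_definition(definition: str):
    """Turn a definition such as '/invoices/{year int}/*' into (regex, converters)."""
    pattern, converters, pos = "", {}, 0
    for m in _PLACEHOLDER.finditer(definition):
        pattern += _literal(definition[pos:m.start()])
        name, type_name = m.group(1), m.group(2)
        regex, convert = _TYPES[type_name]
        pattern += f"(?P<{name}>{regex})"
        converters[name] = convert
        pos = m.end()
    pattern += _literal(definition[pos:])
    return re.compile(pattern + r"\Z"), converters


def match_key(definition: str, key: str) -> Optional[dict]:
    """Return the values extracted from key, or None if key does not match."""
    regex, converters = compile_definition(definition)
    m = regex.match(key)
    if m is None:
        return None
    return {name: convert(m.group(name)) for name, convert in converters.items()}


by_folder = match_key("/invoices/{year int}/*", "/invoices/2019/1.parquet")
by_file = match_key("/invoices/{year int}.json.gz", "/invoices/2017.json.gz")
```

With the year recovered from the path, a filter such as `{ year: { $gt: 2000 } }` could in principle skip non-matching files without opening them; whether the service does this is not shown on the slides.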
Configuration: Data (Future)
history: {
  invoices: [{
    store: "ent-archive",
    definition: "/invoices/{year int}/*"
  }, {
    store: "ent-archive",
    definition: "/invoices/{year int}.json.gz"
  }, {
    store: "atlas",
    cluster: "my-cluster",
    db: "customers",
    collection: "invoices"
  }]
}
Queries
Processing
MQL → Distributed MQL
§ Parse
§ Parallelize
§ Distribute
Architecture
[Diagram: Atlas Control Plane, Compute Plane, and Data Plane; three groups, each with Load Balancers, a DataLake Frontend, and a DataLake Agent]
Architecture
[Diagram: Atlas Control Plane, Compute Plane, and Data Plane; one DataLake Frontend and nine DataLake Agents]
Query Example: $limit
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Map:
{ $match: { year: { $gt: 2000 } } }
{ $limit: 10 }
Reduce:
{ $limit: 10 }
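The map/reduce split for `$limit` can be simulated in a few lines. Each partition filters and truncates its own data, so no partition returns more than 10 documents; the reduce step concatenates the partial results and truncates once more. This is a plain-Python illustration, not the service's code; `map_limit`, `reduce_limit`, and the sample data are invented for the example.

```python
from itertools import islice
from typing import Dict, Iterable, List

Doc = Dict[str, int]


def map_limit(partition: Iterable[Doc], limit: int = 10) -> List[Doc]:
    """Per-partition work: $match year > 2000, then $limit."""
    matching = (doc for doc in partition if doc["year"] > 2000)
    return list(islice(matching, limit))


def reduce_limit(partials: Iterable[List[Doc]], limit: int = 10) -> List[Doc]:
    """Merge the partial results and apply $limit again."""
    merged = (doc for part in partials for doc in part)
    return list(islice(merged, limit))


# Three partitions of made-up documents.
partitions = [
    [{"year": y, "amount": y % 7} for y in range(1990, 2020)],
    [{"year": y, "amount": y % 5} for y in range(1995, 2010)],
    [{"year": 1999, "amount": 1}],
]
partials = [map_limit(p) for p in partitions]
result = reduce_limit(partials)
```

The final `$limit` is needed because several partitions may each return up to 10 documents; without a sort stage, which 10 documents survive depends on the order in which partial results arrive.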
Query Example: $group
{ $group: { _id: "$year", totalAvg: { $avg: "$amount" } } }
Map:
{ $group: { _id: "$year",
    totalAvg_sum: { $sum: "$amount" },
    totalAvg_count: { $sum: 1 }
} }
Reduce:
{ $group: { _id: "$_id",
    totalAvg_sum: { $sum: "$totalAvg_sum" },
    totalAvg_count: { $sum: "$totalAvg_count" }
} }
Finalize:
{ $project: { _id: "$_id", totalAvg: { $divide: ["$totalAvg_sum", "$totalAvg_count"] } } }
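An average cannot be merged directly across partitions, because averaging per-partition averages weights small partitions too heavily. Splitting it into a sum and a count, merging those, and dividing at the end gives the exact result. The sketch below simulates the three phases on the year and amount fields in plain Python; the function names and sample documents are invented for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Partial = Dict[int, Tuple[float, int]]  # year -> (sum of amount, document count)


def map_avg(partition: Iterable[dict]) -> Partial:
    """Map: per-partition $group producing a sum and a count for each year."""
    acc = defaultdict(lambda: (0.0, 0))
    for doc in partition:
        s, c = acc[doc["year"]]
        acc[doc["year"]] = (s + doc["amount"], c + 1)
    return dict(acc)


def reduce_avg(partials: Iterable[Partial]) -> Partial:
    """Reduce: add up the partial sums and counts for each year."""
    acc = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for year, (s, c) in partial.items():
            total_s, total_c = acc[year]
            acc[year] = (total_s + s, total_c + c)
    return dict(acc)


def finalize_avg(merged: Partial) -> Dict[int, float]:
    """Finalize: divide sum by count to obtain the average."""
    return {year: s / c for year, (s, c) in merged.items()}


partitions: List[List[dict]] = [
    [{"year": 2019, "amount": 10}, {"year": 2019, "amount": 20}, {"year": 2019, "amount": 30}],
    [{"year": 2019, "amount": 100}, {"year": 2018, "amount": 7}],
]
result = finalize_avg(reduce_avg(map_avg(p) for p in partitions))
```

For 2019 the partitions hold averages of 20 and 100; averaging those would give 60, while the sum/count decomposition yields 160 / 4 = 40, the same value a single-node `$avg` would produce.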
Future
More supported MongoDB operators.
§ $out
§ $merge
§ $graphLookup
§ Geo operators
§ Full Text Search
Future
Optimizations!
§ Indexes
§ Statistics
Future
More File Formats!
§ ORC
§ Excel
§ PDF
Future
Integrations!
§ Atlas
§ Microsoft Azure
§ Google Cloud
Hiring
Lots to do!
§ mongodb.com/careers
MongoDB World 2019: MongoDB Atlas Data Lake Technical Deep Dive
