Transactions and Durability: Putting the “D” in ACID
Sue LoVerso - Senior Staff Engineer, Storage Team
MongoDB, Inc sue@mongodb.com
Sue LoVerso
Senior Staff Engineer, MongoDB Kernel Storage Team
Transaction Concepts
ACID - Atomicity
• “All or nothing”
• A set of operations is an indivisible unit
• Example: bank transfer
  • Withdraw from Account A
  • Deposit into Account B
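The bank transfer above can be sketched as a single function that either applies both updates or neither. This is a minimal, single-threaded illustration only; `struct account` and `transfer` are hypothetical names, not MongoDB or WiredTiger APIs, and a real engine would rely on transactions with rollback rather than up-front validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory account; not a MongoDB or WiredTiger type. */
struct account {
    long balance_cents;
};

/*
 * Move amount_cents from one account to another as one unit.
 * Validation happens before any update, so a failed transfer
 * leaves both balances untouched.
 */
bool transfer(struct account *from, struct account *to, long amount_cents)
{
    assert(from != NULL && to != NULL);
    if (amount_cents <= 0 || from->balance_cents < amount_cents)
        return false;                       /* "nothing": no partial update */
    from->balance_cents -= amount_cents;    /* withdraw from Account A */
    to->balance_cents += amount_cents;      /* deposit into Account B */
    return true;                            /* "all": both updates applied */
}
```

A caller checks the return value and treats `false` as "the transfer did not happen at all".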
ACID - Consistency
• Database constraints aren’t violated (“constraints” individually defined)
• Most common in relational databases: constraints, triggers, or cascades
• Example:
• [EndDate] >= [StartDate]
• [Value] < 100
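The two example constraints can be written as a predicate that a database would evaluate before accepting a change. This is only a sketch; `struct record` and its field names are assumptions chosen to mirror the slide, with dates stored as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record carrying the fields named in the example. */
struct record {
    long start_date;    /* e.g. days since some epoch */
    long end_date;
    int value;
};

/* True only if both example constraints hold for the record. */
bool record_is_consistent(const struct record *r)
{
    assert(r != NULL);
    return r->end_date >= r->start_date && r->value < 100;
}
```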
ACID - Isolation
• Data visibility (or not) by other concurrently running transactions
• Isolation levels: what data can a transaction read or write and when
• WiredTiger offers visibility choices with tradeoffs
ACID - Durability
• Provides guarantee that committed transactions survive permanently
• Common implementation is write-ahead log (aka journal)
• WiredTiger gives durability-related choices
• This is not the replication oplog. This is per-node in a cluster
• These are not user-level transactions
ACID - Durability Modes
• In-memory mode: WT memory
• write-no-sync mode: WT memory → write(2) → file system buffer cache
• full-sync mode: WT memory → write(2) → file system buffer cache → Disk
ACID - Durability
[Chart: Durability Guarantee vs. Performance for the Full Sync, Write-no-Sync, and In-memory modes]
ACID - Durability
Synchronization Level         | Process Crash           | System Crash
in-memory (“sync=disabled”)   | Potential data loss     | Potential data loss
write-no-sync (“sync=off”)    | Data always recoverable | Potential data loss
full sync (“sync=on”)         | Data always recoverable | Data always recoverable
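The three levels in the table can be approximated with plain POSIX calls: copy into a buffer, hand the buffer to the OS with write(2), and optionally wait for fsync(2). The sketch below is not WiredTiger's implementation; `log_open`, `log_append`, `struct log_buffer`, and the `SYNC_*` names are all hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names for the three levels; not WiredTiger's API. */
enum sync_level { SYNC_IN_MEMORY, SYNC_WRITE_NO_SYNC, SYNC_FULL };

/* Hypothetical fixed-size buffer standing in for "WT memory". */
struct log_buffer {
    char data[4096];
    size_t used;
};

/* Open (or create) an append-only log file; returns -1 on error. */
int log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/*
 * Append one record at the requested level.
 * Returns 0 on success, -1 on I/O error or if the record does not fit.
 */
int log_append(struct log_buffer *buf, int fd, const char *rec, size_t len,
    enum sync_level level)
{
    size_t off = 0;

    assert(buf != NULL && rec != NULL);
    if (len > sizeof(buf->data) - buf->used)
        return -1;
    memcpy(buf->data + buf->used, rec, len);
    buf->used += len;
    if (level == SYNC_IN_MEMORY)
        return 0;       /* lost on a process crash or a system crash */

    /* write-no-sync: data reaches the file system buffer cache. */
    while (off < buf->used) {
        ssize_t n = write(fd, buf->data + off, buf->used - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    buf->used = 0;
    if (level == SYNC_WRITE_NO_SYNC)
        return 0;       /* survives a process crash, not a system crash */

    /* full sync: block until the OS reports the data is on the device. */
    return fsync(fd) == 0 ? 0 : -1;
}
```

Note that a record appended at a stronger level also carries along anything buffered earlier at a weaker level, which is why a later sync can make earlier in-memory records durable.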
Why Do We Care?
What keeps me up at night?
• For storage engineers - “Losing people’s data”
How I sleep better
• Durability - keep data stable and consistent
• If we say it’s stable, it really is written and synced
• Crash: we can recover your durably written data
How I sleep better
• Round-the-clock testing
• Randomized
• Kill process
• Power cycle
• Both standalone WiredTiger and via MongoDB
• Any failure is top-priority work (and I mean absolute top priority)
How I sleep better
• Constant testing gives confidence in code
• We think long and hard about durability guarantees
• We pay close attention to OS and system call issues
• We think long and hard about feature impacts on durability
• We take the responsibility of keeping your data seriously
Durability All The Way Down
Layers All The Way Down
[Diagram, top to bottom]
• Applications: Content Repo, IoT Sensor Backend, Ad Service, Customer Analytics, Archive
• MongoDB Query Language (MQL) + Native Drivers
• MongoDB Document Data Model
• Storage engines: MMAP V1, WiredTiger, RocksDB, Mobile, In-memory (labeled “Example Future State”)
• Alongside all layers: Management, Security
WiredTiger Architecture
[Diagram: the MongoDB Storage Engine Layer sits on the WiredTiger C API. Components: Schema & Cursors, Transactions, Row Storage, Column Storage, Snapshots, Cache, Page read/write, Block Manager, Logging. On disk: Database Files, Log Files]
Layers of Trust
Layers of Trust - User
• User wants assured stability of some data
• User gives write concern: { j : true }
[Diagram: “j:true turtles” stack: User]
Layers of Trust
• User gives write concern: { j : true }
• https://docs.mongodb.com/manual/reference/write-concern/:
• Write concern describes the level of acknowledgement requested from MongoDB for write operations …
• If j: true, [the user] requests acknowledgement that the mongod instances ... have written to the on-disk journal.
Layers of Trust - MongoDB
src/mongo/db/write_concern.cpp:
Status waitForWriteConcern(...)
{
    ...
    case WriteConcernOptions::SyncMode::JOURNAL:
        opCtx->recoveryUnit()->waitUntilDurable();
    ...
}
[Diagram: “j:true turtles” stack: User, MDB]
Layers of Trust - MongoDB
void WiredTigerSessionCache::waitUntilDurable(...)
{
    ...
    // Use the journal when available, or a checkpoint otherwise.
    if (_engine->isDurable()) {
        invariantWTOK(_Session->log_flush(_Session, "sync=on"));
        ...
    } else {
        invariantWTOK(_Session->checkpoint(_Session, NULL));
        ...
    }
}
[Diagram: “j:true turtles” stack: User, MDB]
Layers of Trust - MongoDB
• MongoDB calls WT: _Session->log_flush(_Session, "sync=on")
• http://source.wiredtiger.com/3.0.0/struct_w_t___s_e_s_s_i_o_n.html:
log_flush:
• forcibly flush the log and wait for it to achieve the synchronization level specified. The on setting forces log records to be written to the storage device.
Layers of Trust - WiredTiger
int __session_log_flush(...)
{
    ...
    if (WT_STRING_MATCH("on", cval.str, cval.len))
        flags = WT_LOG_FSYNC;
    ret = __wt_log_flush(session, flags);
    ...
}
int __wt_log_flush(...)
{
    ...
    if (FLAG_ISSET(WT_LOG_FSYNC))
        return (__wt_log_force_sync(...));
}
[Diagram: “j:true turtles” stack: User, MDB, WT]
Layers of Trust - WiredTiger
int __wt_log_force_sync()
{
    ...
    ret = __wt_fsync(session, log_fh, true);
}
int __posix_sync()
{
    ...
#ifdef …
    WT_SYSCALL(fdatasync(fd), ret);
#else
    WT_SYSCALL(fsync(fd), ret);
#endif
    ...
}
[Diagram: “j:true turtles” stack: User, MDB, WT]
Layers of Trust - WiredTiger
• WiredTiger calls: fsync(fd) or fdatasync(fd)
• http://man7.org/linux/man-pages/man2/fdatasync.2.html
• fsync() transfers ("flushes") all modified ... data of the file to the disk device ... so that all changed information can be retrieved even if the system crashes or is rebooted. ... The call blocks until the device reports that the transfer has completed.
Layers of Trust - Operating System
https://elixir.bootlin.com/linux/v4.17-rc4/source/
fs/sync.c:
SYSCALL_DEFINE1(fsync, unsigned int, fd)
{
    ...
    ret = vfs_fsync_range(...);
    ...
}
int vfs_fsync_range(...)
{
    ...
    return (file->f_op->fsync(...));
}
fs/xfs/xfs_file.c:
int xfs_file_fsync(...)
{
    ...
    blkdev_issue_flush(...);
    ...
}
[Diagram: “j:true turtles” stack: User, MDB, WT, OS]
Layers of Trust - Operating System
block/blk-flush.c:
int blkdev_issue_flush(...)
{
    ...
    bio_set_dev(...);
    bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
    ret = submit_bio_wait(...);
}
Layers of Trust - Device driver
• Device driver sends some kind of flush operation.
• Code is device specific.
• Hardware is the recipient.
[Diagram: “j:true turtles” stack: User, MDB, WT, OS, Dev]
Layers of Trust - Hardware
• Device driver is trusting the hardware.
• If the hardware lies, data could be lost in a crash.
• May need to disable device write-caching:
• sudo hdparm -i /dev/sda
• /etc/hdparm.conf
• #write-cache=off
[Diagram: “j:true turtles” stack: User, MDB, WT, OS, Dev, HW]
Layers of Trust - Summary
• User wants assured stability of some data
• User trusts MongoDB using { j : true }
• MongoDB trusts WiredTiger log_flush("sync=on")
• WiredTiger trusts the OS using fsync(fd)
• Operating System trusts device driver
• The device driver trusts the hardware
[Diagram: stack of trust: User, MDB, WT, OS, Dev, HW]
WiredTiger Durability
What Does Durability Mean?
• Writing any data to stable storage for later retrieval
• In WiredTiger:
○ Checkpoints - coarse-grained durability
○ Write-ahead logging - fine-grained durability
Checkpoints
• Who:
• Called via WT_SESSION::checkpoint() API (MongoDB)
• What:
• Writes all modified collections, all visible dirty data
• When:
• On an API call, periodically from an internal thread, and on close
• How:
• Checkpoint opens a snapshot transaction, writes data
Checkpoints
• Create Table:Collection (Collection.wt on disk, WT Cache in memory)
• Insert some data: Apple, Banana, Cherry are in the WT Cache (Memory)
• Call WT_SESSION::checkpoint(): Apple, Banana, Cherry are written to Collection.wt (Disk)
• Insert more data: Orange, Pineapple are in the WT Cache (Memory) only
• Crash
Crashing and Recovery
• Open collection from most recent checkpoint
• Any data after checkpoint is lost
Checkpoints
• Restart: Collection.wt (Disk) holds Apple, Banana, Cherry
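The checkpoint-only walkthrough above can be modeled in a few lines. This is a toy simulation under stated assumptions, not WiredTiger code: `struct store`, `store_insert`, `store_checkpoint`, and `store_crash_and_restart` are hypothetical names, and "disk" is just a second in-memory copy.

```c
#include <assert.h>
#include <string.h>

#define MAX_ITEMS 16

/* Hypothetical model: a table is a short list of strings. */
struct table {
    const char *items[MAX_ITEMS];
    int count;
};

struct store {
    struct table cache;     /* "WT Cache (Memory)": lost in a crash */
    struct table disk;      /* "Collection.wt (Disk)": survives a crash */
};

/* Insert into the cache only; nothing reaches disk yet. */
void store_insert(struct store *s, const char *item)
{
    assert(s->cache.count < MAX_ITEMS);
    s->cache.items[s->cache.count++] = item;
}

/* Checkpoint: write everything currently in the cache to disk. */
void store_checkpoint(struct store *s)
{
    s->disk = s->cache;
}

/* Crash followed by restart: reopen from the most recent checkpoint. */
void store_crash_and_restart(struct store *s)
{
    s->cache = s->disk;
}

/* Returns 1 if the item is visible after the last (re)open, else 0. */
int store_contains(const struct store *s, const char *item)
{
    for (int i = 0; i < s->cache.count; i++)
        if (strcmp(s->cache.items[i], item) == 0)
            return 1;
    return 0;
}
```

Inserting Apple, Banana, Cherry, checkpointing, inserting Orange and Pineapple, and then calling `store_crash_and_restart` leaves only the first three items visible, matching the slides.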
Write-ahead log (aka journal)
• Who:
• Enabled/disabled when first creating the database
• What:
• Logical recording of all transaction modifications
• When:
• Every time a transaction commits
Write-ahead log (aka journal)
• Create Table:Collection (Collection.wt on disk, WT Cache in memory, WT Log on disk)
• Insert some data: Apple, Banana, Cherry go into the WT Cache (Memory); Commit(Apple), Commit(Banana), Commit(Cherry) go into the WT Log (Disk)
• Insert some more data: Orange, Pineapple go into the WT Cache (Memory); Commit(Orange), Commit(Pineapple) go into the WT Log (Disk)
• Crash
Crashing and Recovery
• First, the table is opened from the most recent checkpoint, if one exists
• Recovery replays the entire applicable write-ahead log
Write-ahead log (aka journal)
• Restart and recovery: the WT Log (Disk) still holds Commit(Apple), Commit(Banana), Commit(Cherry), Commit(Orange), Commit(Pineapple)
• Recovery replays the operations: Apple, Banana, Cherry, Orange, Pineapple are back in the WT Cache (Memory)
• Recovery checkpoints on completion: all five items are written to Collection.wt (Disk)
Checkpoints + Logging =
• Faster Recovery
• Less disk usage
• Best of both worlds
Checkpoint + Logging
• Create Table:Collection (Collection.wt on disk, WT Cache in memory, WT Log on disk)
• Insert some data: Apple, Banana, Cherry go into the WT Cache (Memory); Commit(Apple), Commit(Banana), Commit(Cherry) go into the WT Log (Disk)
• WT_SESSION::checkpoint(): Apple, Banana, Cherry are written to Collection.wt (Disk); a Checkpoint record goes into the WT Log
• Insert some more data: Orange, Pineapple go into the WT Cache (Memory); Commit(Orange), Commit(Pineapple) go into the WT Log after the Checkpoint record
• Crash
• Restart and Recovery: Collection.wt holds Apple, Banana, Cherry; the WT Log holds all five commits and the Checkpoint record; recovery starts at the Checkpoint record
• Recovery replays the operations: Orange and Pineapple are restored to the WT Cache (Memory)
• Recovery checkpoints on completion: Orange and Pineapple are written to Collection.wt (Disk) and a second Checkpoint record goes into the WT Log
Checkpoints + Logging =
• Faster Recovery
• Checkpoints bound recovery time. Recovery starts at previous checkpoint
• Less disk usage
• Checkpoints allow archival of earlier write-ahead log files
• Archiving happens on whole-file basis
• Fine-grained durability
• Logging allows recovery of intermediary data
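The combination can be sketched as a toy model in which recovery loads the last checkpoint and replays only the log records written after it. All names here (`struct engine`, `engine_commit`, `engine_checkpoint`, `engine_crash`, `engine_recover`) are hypothetical, and the "disk" structures are ordinary memory that the simulated crash leaves untouched.

```c
#include <assert.h>
#include <string.h>

#define MAX_ITEMS 16

struct table {
    const char *items[MAX_ITEMS];
    int count;
};

struct engine {
    struct table cache;             /* memory: lost in a crash */
    struct table disk;              /* checkpointed table: survives */
    const char *log[MAX_ITEMS];     /* write-ahead log on disk: survives */
    int log_count;
    int log_start;                  /* first record after the last checkpoint */
};

/* Commit: record the change in the log, then apply it to the cache. */
void engine_commit(struct engine *e, const char *item)
{
    assert(e->log_count < MAX_ITEMS && e->cache.count < MAX_ITEMS);
    e->log[e->log_count++] = item;
    e->cache.items[e->cache.count++] = item;
}

/* Checkpoint: persist the cache; earlier log records could now be archived. */
void engine_checkpoint(struct engine *e)
{
    e->disk = e->cache;
    e->log_start = e->log_count;
}

/* Crash: everything held only in memory disappears. */
void engine_crash(struct engine *e)
{
    memset(&e->cache, 0, sizeof(e->cache));
}

/*
 * Recovery: start from the last checkpoint, replay later log records,
 * then checkpoint. Returns the number of records replayed.
 */
int engine_recover(struct engine *e)
{
    int replayed = 0;

    e->cache = e->disk;
    for (int i = e->log_start; i < e->log_count; i++) {
        assert(e->cache.count < MAX_ITEMS);
        e->cache.items[e->cache.count++] = e->log[i];
        replayed++;
    }
    engine_checkpoint(e);
    return replayed;
}
```

With three commits, a checkpoint, two more commits, and a crash, `engine_recover` replays two records rather than five, which is the bounded recovery time the slide describes.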
Transactions and Logging
Transactions + Logging
• Who:
• WT_SESSION::commit_transaction()
• WT_SESSION::checkpoint()
• Internal operations
• What:
• Logical recording of all transaction modifications
• When:
• Every time a transaction commits, before making visible
Write-Ahead Modes
[Diagram: WT Log (Disk) holding Commit(Apple), Checkpoint, Commit(Banana), Commit(Cherry), Commit(Orange), Commit(Pineapple), Checkpoint]
• in-memory mode: WT memory
• write-no-sync mode: WT memory → write(2) → file system buffer cache
• full-sync mode: WT memory → write(2) → file system buffer cache → Disk
Write-Ahead Modes
• Guarantees on return from commit transaction:
• in-memory durability (default/MongoDB): none
• write-no-sync: recoverable after process crash
• full-sync: recoverable from process crash or system crash
Synchronization Level | Process Crash           | System Crash
in-memory             | Potential data loss     | Potential data loss
write-no-sync         | Data always recoverable | Potential data loss
full sync             | Data always recoverable | Data always recoverable
Write-ahead In-memory Mode
• Why have it? Why use it?
• When does it become durable?
  • When the memory buffer fills.
  • When any later log record requires greater durability.
  • When changing to a new log file.
  • On an idle system, the log memory buffer is written every 50 msecs.
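The four triggers listed above can be expressed as one predicate that decides whether the log memory buffer should be written out. This is a sketch of the decision only, not WiredTiger's logic; `struct log_state`, its fields, and `log_buffer_should_write` are hypothetical, and only the 50 msec interval comes from the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IDLE_WRITE_INTERVAL_MS 50   /* interval stated on the slide */

/* Hypothetical snapshot of the log state; field names are illustrative. */
struct log_state {
    size_t buffered_bytes;          /* bytes waiting in the memory buffer */
    size_t buffer_capacity;
    bool pending_stronger_sync;     /* a later record needs more durability */
    bool switching_log_file;        /* about to change to a new log file */
    unsigned ms_since_last_write;
};

/* True if the buffer should be written before accepting the next record. */
bool log_buffer_should_write(const struct log_state *s, size_t next_record_len)
{
    assert(s != NULL && s->buffered_bytes <= s->buffer_capacity);
    if (next_record_len > s->buffer_capacity - s->buffered_bytes)
        return true;                /* the memory buffer is full */
    if (s->pending_stronger_sync)
        return true;                /* greater durability was requested */
    if (s->switching_log_file)
        return true;                /* changing to a new log file */
    /* Idle system: write whatever is buffered every 50 msecs. */
    return s->buffered_bytes > 0 &&
        s->ms_since_last_write >= IDLE_WRITE_INTERVAL_MS;
}
```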
WT Transactions + Logging + MongoDB
• MongoDB uses least persistent sync mode (in-memory).
• Most performant (by a lot).
• MongoDB can generate multiple transactions at a time.
• WT_SESSION::log_flush() API available.
• MongoDB calls it just before returning the guarantee to the user.
Conclusion
Conclusions
• This is cool stuff
• I hope you enjoyed learning about it
• We take the responsibility of keeping your data seriously
How do you sleep at night?

MongoDB World 2018: Transactions and Durability: Putting the “D” in ACID

  • 1. Transactions and Durability: Putting the “D” in ACID Sue LoVerso - Senior Staff Engineer, Storage Team MongoDB, Inc sue@mongodb.com
  • 2. Sue LoVerso Senior Staff Engineer, MongoDB Kernel Storage Team
  • 4. ACID - Atomicity • “All or nothing” • Set of operations are an indivisible unit • Example: Bank transfer: • Withdraw from Account A • Deposit into Account B
  • 5. ACID - Consistency • Database constraints aren’t violated (“constraints” individually defined) • Most common in Relational Databases: constraints, triggers or cascades • Example: • [EndDate] >= [StartDate] • [Value] < 100
  • 6. ACID - Isolation • Data visibility (or not) by other concurrently running transactions • Isolation levels: what data can a transaction read or write and when • WiredTiger offers visibility choices with tradeoffs
  • 7. ACID - Durability • Provides guarantee that committed transactions survive permanently • Common implementation is write-ahead log (aka journal) • WiredTiger gives durability-related choices • This is not the replication oplog. This is per-node in a cluster • These are not user-level transactions
  • 8. ACID - Durability Modes In-memory mode WT memory
  • 9. ACID - Durability Modes In-memory mode write-no-sync mode WT memory WT memory write(2) file system buffer cache
  • 10. ACID - Durability Modes Disk In-memory mode write-no-sync mode full-sync mode WT memory write(2) file system buffer cache WT memory WT memory write(2) file system buffer cache
  • 11. ACID - Durability Durability Guarantee Performance Full Sync In-memory Write-no-Sync
  • 12. ACID - Durability Synchronization Level Process Crash System Crash in-memory (“sync=disabled”) Potential data loss Potential data loss write-no-sync (“sync=off”) Data always recoverable Potential data loss full sync (“sync=on”) Data always recoverable Data always recoverable
  • 13. Why Do We Care?
  • 14. What keeps me up at night?
  • 15. What keeps me up at night?
  • 16. What keeps me up at night? • For storage engineers - “Losing people’s data”
  • 17. How I sleep better • Durability - keep data stable and consistent
  • 18. How I sleep better • Durability - keep data stable and consistent • If we say it’s stable, it really is written and synced
  • 19. How I sleep better • Durability - keep data stable and consistent • If we say it’s stable, it really is written and synced • Crash: we can recover your durably written data
  • 20. How I sleep better • Round the clock testing 24/7 • Randomized • Kill process • Power cycle • Both WiredTiger only and via MongoDB • Any failure is top priority work (and I mean absolute, top priority)
  • 21. How I sleep better • Constant testing gives confidence in code • We think long and hard about durability guarantees • We pay close attention to OS and system call issues • We think long and hard about feature impacts on durability • We take the responsibility of keeping your data seriously
  • 23. Layers All The Way Down Content Repo IoT Sensor Backend Ad Service Customer Analytics Archive Example Future State
  • 24. Layers All The Way Down Content Repo IoT Sensor Backend Ad Service Customer Analytics Archive MongoDB Query Language (MQL) + Native Drivers Example Future State
  • 25. Layers All The Way Down Content Repo IoT Sensor Backend Ad Service Customer Analytics Archive MongoDB Query Language (MQL) + Native Drivers MongoDB Document Data Model Management Security Example Future State
  • 26. Layers All The Way Down Content Repo IoT Sensor Backend Ad Service Customer Analytics Archive MongoDB Query Language (MQL) + Native Drivers MongoDB Document Data Model MMAP V1 WiredTiger RocksDB Mobile Management Security Example Future State In-memory
  • 27. WiredTiger Architecture Database Files Log Files WiredTiger C API Schema & Cursors Transactions Row Storage Column Storage Snapshots Cache Page read/write Block Manager Logging MongoDB Storage Engine Layer
  • 28. WiredTiger Architecture Database Files Log Files WiredTiger C API Schema & Cursors Transactions Row Storage Column Storage Snapshots Cache Page read/write Block Manager Logging
  • 30. Layers of Trust - User • User wants assured stability of some data User
  • 31. Layers of Trust - User • User wants assured stability of some data • User gives write concern: { j : true } j:true turtles User
  • 32. Layers of Trust • User gives write concern: { j : true } • https://docs.mongodb.com/manual/reference/write-concern/: • Write concern describes the level of acknowledgement requested from MongoDB for write operations … • If j: true, [the user] requests acknowledgement that the mongod instances ... have written to the on-disk journal.
  • 33. Layers of Trust - MongoDB src/mongo/db/write_concern.cpp:
        Status waitForWriteConcern(...) {
            ...
            case WriteConcernOptions::SyncMode::JOURNAL:
                opCtx->recoveryUnit()->waitUntilDurable();
            ...
        }
    [Diagram labels: j:true, turtles, User, MDB]
  • 34. Layers of Trust - MongoDB
        void WiredTigerSessionCache::waitUntilDurable(...) {
            ...
            // Use the journal when available, or a checkpoint otherwise.
            if (_engine->isDurable()) {
                invariantWTOK(_Session->log_flush(_Session, "sync=on"));
                ...
            } else {
                invariantWTOK(_Session->checkpoint(_Session, NULL));
                ...
            }
        }
    [Diagram labels: j:true, turtles, User, MDB]
  • 35. Layers of Trust - MongoDB • MongoDB calls WT: _Session->log_flush(_Session, "sync=on")
  • 36. Layers of Trust - MongoDB • MongoDB calls: _Session->log_flush(_Session, "sync=on") • http://source.wiredtiger.com/3.0.0/struct_w_t___s_e_s_s_i_o_n.html: log_flush: • forcibly flush the log and wait for it to achieve the synchronization level specified. The on setting forces log records to be written to the storage device.
  • 37. Layers of Trust - WiredTiger
        int __session_log_flush(...) {
            ...
            if (WT_STRING_MATCH("on", cval.str, cval.len))
                flags = WT_LOG_FSYNC;
            ret = __wt_log_flush(session, flags);
            ...
        }
        int __wt_log_flush(...) {
            ...
            if (FLAG_ISSET(WT_LOG_FSYNC))
                return (__wt_log_force_sync(...));
        }
    [Diagram labels: j:true, turtles, User, MDB, WT]
  • 38. Layers of Trust - WiredTiger
        int __wt_log_force_sync() {
            ...
            ret = __wt_fsync(session, log_fh, true);
        }
        int __posix_sync() {
            ...
        #ifdef …
            WT_SYSCALL(fdatasync(fd), ret);
        #else
            WT_SYSCALL(fsync(fd), ret);
        #endif
            ...
        }
    [Diagram labels: j:true, turtles, User, MDB, WT]
  • 39. Layers of Trust - WiredTiger • WiredTiger calls: fsync(fd) or fdatasync(fd) • http://man7.org/linux/man-pages/man2/fdatasync.2.html • fsync() transfers ("flushes") all modified ... data of the file to the disk device ... so that all changed information can be retrieved even if the system crashes or is rebooted. ... The call blocks until the device reports that the transfer has completed.
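The write-then-sync step at this layer can be sketched with plain POSIX calls. This is a minimal illustration, not WiredTiger code: `durable_append` is a hypothetical helper, and it always uses `fsync()`, whereas the slide shows WiredTiger choosing between `fdatasync()` and `fsync()` at compile time.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of WiredTiger): append len bytes to the
 * file at path, then block until fsync() reports the transfer is complete.
 * Returns 0 on success, -1 on any error.
 */
int durable_append(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* write(2) may be partial or interrupted, so loop until all bytes are handed over. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Until fsync() returns, the bytes may exist only in the file system buffer cache. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would treat a zero return from something like `durable_append("journal.log", rec, rec_len)` as the point at which the record is expected to survive a system crash, subject to the lower layers honoring the flush.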
  • 40. Layers of Trust - Operating System https://elixir.bootlin.com/linux/v4.17-rc4/source/
        fs/sync.c:
        SYSCALL_DEFINE1(fsync, unsigned int, fd) {
            ...
            ret = vfs_fsync_range(...);
        int vfs_fsync_range(...)
            ...
            return (file->f_op->fsync(...));
        fs/xfs/xfs_file.c:
        int xfs_file_fsync(...)
            ...
            blkdev_issue_flush(...);
    [Diagram labels: j:true, turtles, User, MDB, WT, OS]
  • 41. Layers of Trust - Operating System block/blk-flush.c
        int blkdev_issue_flush(...) {
            ...
            bio_set_dev(...);
            bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
            ret = submit_bio_wait(...);
        }
  • 42. Layers of Trust - Device driver • Device driver sends some kind of flush operation. • Code is device specific. • Hardware is the recipient. [Diagram labels: j:true, turtles, User, MDB, WT, OS, Dev]
  • 43. Layers of Trust - Hardware • Device driver is trusting the hardware. • If the hardware lies, data could be lost in a crash. • May need to disable device write-caching: • sudo hdparm -i /dev/sda • /etc/hdparm.conf • #write-cache=off [Diagram labels: j:true, turtles, User, MDB, WT, OS, Dev, HW]
  • 50. Layers of Trust - Summary • User wants assured stability of some data • User trusts MongoDB using { j : true } • MongoDB trusts WiredTiger log_flush("sync=on") • WiredTiger trusts the OS using fsync(fd) • Operating System trusts device driver • The device driver trusts the hardware [Diagram labels: User, MDB, WT, OS, Dev, HW]
  • 52. What Does Durability Mean? • Writing any data to stable storage for later retrieval • In WiredTiger: ○ Checkpoints - coarse-grained durability ○ Write-ahead logging - fine-grained durability
  • 53. WiredTiger Architecture Database Files Log Files WiredTiger C API Schema & Cursors Transactions Row Storage Column Storage Snapshots Cache Page read/write Block Manager Logging
  • 54. Checkpoints • Who: • Called via WT_SESSION::checkpoint() API (MongoDB) • What: • Writes all modified collections, all visible dirty data • When: • API call, internal thread period, on close • How: • Checkpoint opens a snapshot transaction, writes data
  • 57. Checkpoints • Insert some data Collection.wt (Disk) WT Cache (Memory) Apple
  • 58. Checkpoints • Insert some data Collection.wt (Disk) WT Cache (Memory) Apple Banana
  • 59. Checkpoints • Insert some data Collection.wt (Disk) WT Cache (Memory) Apple Banana Cherry
  • 60. Checkpoints • Call WT_SESSION::checkpoint() Collection.wt (Disk) WT Cache (Memory) Apple Banana Cherry
  • 62. Checkpoints • Insert more data Collection.wt (Disk) WT Cache (Memory) Apple Banana Cherry Orange Pineapple
  • 63. Crash
  • 64. Crashing and Recovery • Open collection from most recent checkpoint • Any data after checkpoint is lost
  • 65. Checkpoints • Restart Collection.wt (Disk) WT Cache (Memory) Apple Banana Cherry
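The checkpoint-only walkthrough above can be modeled in a few lines. This is a toy sketch under stated assumptions: `struct store` and its functions are hypothetical names, the "disk" is just a second array, and nothing here reflects WiredTiger's actual on-disk format.

```c
#include <assert.h>
#include <string.h>

#define MAX_ITEMS 16

/* Toy model: "cache" is volatile memory, "disk" is what survives a crash. */
struct store {
    const char *cache[MAX_ITEMS];
    int ncache;
    const char *disk[MAX_ITEMS];
    int ndisk;
};

/* Insert: the new item lives only in the cache until the next checkpoint. */
void store_insert(struct store *s, const char *item)
{
    assert(s->ncache < MAX_ITEMS);
    s->cache[s->ncache++] = item;
}

/* Checkpoint: copy everything currently in the cache to the disk image. */
void store_checkpoint(struct store *s)
{
    memcpy(s->disk, s->cache, sizeof(s->cache[0]) * (size_t)s->ncache);
    s->ndisk = s->ncache;
}

/* Crash and restart: the cache is rebuilt from the last checkpoint only. */
void store_crash_and_restart(struct store *s)
{
    memcpy(s->cache, s->disk, sizeof(s->disk[0]) * (size_t)s->ndisk);
    s->ncache = s->ndisk;
}
```

Inserting Apple, Banana and Cherry, checkpointing, inserting Orange and Pineapple, and then calling `store_crash_and_restart` leaves only the first three items, matching the slides.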
  • 66. WiredTiger Architecture Database Files Log Files WiredTiger C API Schema & Cursors Transactions Row Storage Column Storage Snapshots Cache Page read/write Block Manager Logging
  • 67. Write-ahead log (aka journal) • Who: • Enabled/disabled when first creating the database • What: • Logical recording of all transaction modifications • When: • Every time a transaction commits
  • 68. Write-ahead log (aka journal) • Create Table:Collection Collection.wt (Disk) WT Cache (Memory) WT Log (Disk)
  • 69. Write-ahead log (aka journal) • Insert some data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Commit(Apple)
  • 70. Write-ahead log (aka journal) • Insert some data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana)
  • 71. Write-ahead log (aka journal) • Insert some data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry)
  • 72. Write-ahead log (aka journal) • Insert some more data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange)
  • 73. Write-ahead log (aka journal) • Insert some more data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange) Pineapple Commit(Pineapple)
  • 74. Crash
  • 75. Crashing and Recovery • On first table open, the most recent checkpoint is used if one exists • Recovery replays the entire applicable write-ahead log
  • 76. Write-ahead log (aka journal) • Restart and recovery Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Commit(Apple) Commit(Banana) Commit(Cherry) Commit(Orange) Commit(Pineapple)
  • 77. Write-ahead log (aka journal) • Recovery replays the operations Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange) Pineapple Commit(Pineapple)
  • 78. Write-ahead log (aka journal) • Recovery checkpoints on completion Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange) Pineapple Commit(Pineapple)
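The log-only walkthrough can be sketched the same way. In this toy model, which assumes the log array is durable and the cache is not, every commit appends a record before the change becomes visible, and recovery replays the whole log. The names `wal_store`, `wal_commit`, `wal_crash` and `wal_recover` are hypothetical and do not correspond to WiredTiger functions.

```c
#include <assert.h>

#define MAX_RECS 16

struct wal_store {
    const char *cache[MAX_RECS]; /* volatile: lost in a crash */
    int ncache;
    const char *log[MAX_RECS];   /* assumed durable: one record per commit */
    int nlog;
};

/* Commit: append the log record first, then make the change visible in the cache. */
void wal_commit(struct wal_store *s, const char *item)
{
    assert(s->nlog < MAX_RECS && s->ncache < MAX_RECS);
    s->log[s->nlog++] = item;
    s->cache[s->ncache++] = item;
}

/* Crash: everything in the cache disappears; the log is untouched. */
void wal_crash(struct wal_store *s)
{
    s->ncache = 0;
}

/* Recovery: replay every log record in order. Returns the number replayed. */
int wal_recover(struct wal_store *s)
{
    s->ncache = 0;
    for (int i = 0; i < s->nlog; i++)
        s->cache[s->ncache++] = s->log[i];
    return s->nlog;
}
```

With no checkpoint to start from, recovery cost grows with the full length of the log, which is the motivation for combining the two mechanisms.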
  • 79. Checkpoints + Logging = • Faster Recovery • Less disk usage • Best of both worlds
  • 80. Checkpoint + Logging • Create Table:Collection Collection.wt (Disk) WT Cache (Memory) WT Log (Disk)
  • 81. Checkpoint + Logging • Insert some data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Commit(Apple)
  • 82. Checkpoint + Logging • Insert some data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana)
  • 83. Checkpoint + Logging • Insert some data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry)
  • 84. Checkpoint + Logging • WT_SESSION::checkpoint() Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Checkpoint
  • 85. Checkpoint + Logging • Insert some more data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange) Checkpoint
  • 86. Checkpoint + Logging • Insert some more data Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange) Pineapple Commit(Pineapple) Checkpoint
  • 87. Crash
  • 88. Checkpoint + Logging • Restart and Recovery Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Commit(Orange) Commit(Pineapple) Checkpoint
  • 89. Checkpoint + Logging • Restart and Recovery Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Commit(Orange) Commit(Pineapple) Checkpoint Recovery starts here
  • 90. Checkpoint + Logging • Recovery replays the operations Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange) Pineapple Commit(Pineapple) Checkpoint
  • 91. Checkpoint + Logging • Recovery checkpoints on completion Collection.wt (Disk) WT Cache (Memory) WT Log (Disk) Apple Banana Commit(Apple) Commit(Banana) Cherry Commit(Cherry) Orange Commit(Orange) Pineapple Commit(Pineapple) Checkpoint Checkpoint
  • 92. Checkpoints + Logging = • Faster Recovery • Checkpoints bound recovery time. Recovery starts at previous checkpoint • Less disk usage • Checkpoints allow archival of earlier write-ahead log files • Archiving happens on whole-file basis • Fine-grained durability • Logging allows recovery of intermediary data
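How a checkpoint bounds recovery can be illustrated with a log that mixes commit records and checkpoint markers. This is a simplified sketch: the record layout and the functions `recovery_start` and `records_to_replay` are assumptions made for illustration, and a real engine would locate the checkpoint through recorded log positions rather than by scanning.

```c
#include <assert.h>
#include <stddef.h>

enum rec_type { REC_COMMIT, REC_CHECKPOINT };

struct log_rec {
    enum rec_type type;
    const char *item; /* NULL for checkpoint records */
};

/*
 * Index of the first record recovery must replay: one past the most
 * recent checkpoint record, or 0 if the log holds no checkpoint.
 */
size_t recovery_start(const struct log_rec *log, size_t n)
{
    assert(log != NULL || n == 0);
    size_t start = 0;
    for (size_t i = 0; i < n; i++)
        if (log[i].type == REC_CHECKPOINT)
            start = i + 1;
    return start;
}

/* Number of commit records that recovery replays after the last checkpoint. */
size_t records_to_replay(const struct log_rec *log, size_t n)
{
    size_t count = 0;
    for (size_t i = recovery_start(log, n); i < n; i++)
        if (log[i].type == REC_COMMIT)
            count++;
    return count;
}
```

In this model, records before the index returned by `recovery_start` are no longer needed for recovery. That is the property that lets earlier log files be archived, on a whole-file basis as the slide notes.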
  • 94. Transactions + Logging • Who: • WT_SESSION::commit_transaction() • WT_SESSION::checkpoint() • Internal operations • What: • Logical recording of all transaction modifications • When: • Every time a transaction commits, before making visible
  • 95. Write-Ahead Modes WT Log (Disk) Commit(Apple) Checkpoint Commit(Banana) Commit(Cherry) Commit(Orange) Commit(Pineapple) Checkpoint
  • 96. Write-Ahead Modes Diskfull-sync mode WT memory write(2) file system buffer cache WT memory WT memory write-no-sync mode in-memory mode write(2) file system buffer cache WT Log (Disk) Commit(Apple) Checkpoint Commit(Banana) Commit(Cherry) Commit(Orange) Commit(Pineapple) Checkpoint
  • 97. Write-Ahead Modes • Guarantees on return from commit transaction: • in-memory durability (default/MongoDB): none • write-no-sync: recoverable after process crash • full-sync: recoverable from process crash or system crash
        Synchronization Level | Process Crash           | System Crash
        in-memory             | Potential data loss     | Potential data loss
        write-no-sync         | Data always recoverable | Potential data loss
        full sync             | Data always recoverable | Data always recoverable
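The three modes differ only in how far a log record has travelled when commit returns. The sketch below expresses that with POSIX calls. It is an illustration, not WiredTiger's implementation: `enum sync_mode`, `struct log_buf`, `log_open` and `log_commit` are all hypothetical names, and the buffer handling is deliberately minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

enum sync_mode { SYNC_IN_MEMORY, SYNC_WRITE_NO_SYNC, SYNC_FULL };

#define LOG_BUF_SIZE 4096u

struct log_buf {
    char data[LOG_BUF_SIZE]; /* process memory: lost on any crash */
    size_t used;
};

/* Open (or create) a log file for appending. Returns an fd, or -1 on error. */
int log_open(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Record a commit with the requested guarantee. Returns 0 on success, -1 on error. */
int log_commit(int fd, struct log_buf *b, const char *rec, size_t len,
               enum sync_mode mode)
{
    assert(b != NULL && rec != NULL && b->used <= LOG_BUF_SIZE);
    if (len > LOG_BUF_SIZE - b->used)
        return -1; /* toy model: the caller must flush before the buffer overflows */
    memcpy(b->data + b->used, rec, len);
    b->used += len;

    /* in-memory: return at once; the record survives neither kind of crash. */
    if (mode == SYNC_IN_MEMORY)
        return 0;

    /* write-no-sync: hand the bytes to the kernel; they survive a process crash. */
    size_t done = 0;
    while (done < b->used) {
        ssize_t n = write(fd, b->data + done, b->used - done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    b->used = 0;

    /* full-sync: also wait for the device, so the record survives a system crash. */
    if (mode == SYNC_FULL && fsync(fd) != 0)
        return -1;
    return 0;
}
```

Note that a stronger commit also carries earlier buffered records with it: a `SYNC_FULL` commit issued after several `SYNC_IN_MEMORY` commits writes and syncs all of them in one pass.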
  • 98. Write-ahead In-memory Mode • Why have it? Why use it? • When does it become durable?
  • 99. Write-ahead In-memory Mode • Why have it? Why use it? • When does it become durable? • When the memory buffer fills. • When any later log record requires greater durability. • When changing to a new log file. • On an idle system, the log memory buffer is written every 50 msecs.
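The four conditions listed on this slide can be written as a single predicate. This is a sketch only: `struct log_state` and `must_flush` are hypothetical, and the field names are assumptions chosen to mirror the bullets rather than WiredTiger's internal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IDLE_FLUSH_MS 50u /* idle write interval quoted on the slide */

struct log_state {
    size_t buf_used;               /* bytes currently buffered in memory */
    size_t buf_size;               /* capacity of the memory buffer */
    bool pending_sync_request;     /* a later record asked for greater durability */
    bool switching_file;           /* the log is moving to a new file */
    unsigned ms_since_last_write;  /* time since the buffer was last written */
};

/* True when buffered in-memory log records must be written out. */
bool must_flush(const struct log_state *s, size_t next_rec_len)
{
    assert(s != NULL && s->buf_used <= s->buf_size);

    if (next_rec_len > s->buf_size - s->buf_used)
        return true; /* the next record would overfill the memory buffer */
    if (s->pending_sync_request)
        return true; /* a later record requires greater durability */
    if (s->switching_file)
        return true; /* changing to a new log file */
    if (s->buf_used > 0 && s->ms_since_last_write >= IDLE_FLUSH_MS)
        return true; /* idle system: periodic write of the buffer */
    return false;
}
```

The idle check only fires when something is buffered, so an empty buffer on a quiet system generates no writes in this model.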
  • 100. WT Transactions + Logging + MongoDB • MongoDB uses the least persistent sync mode (in-memory). • Most performant (by a lot). • MongoDB can generate multiple transactions at a time. • WT_SESSION::log_flush() API available. • MongoDB calls it just before returning the durability guarantee to the user.
  • 102. Conclusions • This is cool stuff • I hope you enjoyed learning about it • We take the responsibility of keeping your data seriously
  • 103. How do you sleep at night? Z Z Z Z Z Z Z Z Z
  • 104. Transactions and Durability: Putting the “D” in ACID Sue LoVerso - Senior Staff Engineer, Storage Team MongoDB, Inc sue@mongodb.com