Redo log

The following is intended to outline our general product direction. It is
intended for information purposes only, and may not be incorporated
into any contract. It is not a commitment to deliver any material,
code, or functionality, and should not be relied upon in making
purchasing decisions. The development, release, timing, and pricing of
any features or functionality described for Oracle's products may
change and remains at the sole discretion of Oracle Corporation.
Safe Harbor Slide

Mini transactions
When are they used?
InnoDB stores all data in pages (16 kB by default), and every change to these pages goes
through a mini transaction. This means that mini transactions are used extremely often.
A single user transaction consists of multiple mini transactions. Committing the transaction
itself requires a new mini transaction (which modifies undo log pages).
What are they for?
• Making atomic changes to multiple pages
• Postponing writes of re-modified pages to disk
• Writing only a log of the changes applied to pages

Mini transaction commit in MySQL 5.7
1. Reserve place and space in the redo log
2. Write log records to the log buffer
3. Mark modified pages, add them to flush lists and release latches
ACQUIRE / DO WORK / RELEASE
Mutex exchange: log_sys → log_flush_order caused a performance issue when the first
thread started to wait for the log_flush_order mutex while holding the log_sys mutex.

New design in 8.0.5+
1. Reserve place and space in the redo log
2a. Write log records to the log buffer
2b. Report written
3a. Mark modified pages, add them to flush lists and release latches
3b. Report done

Comparison of mtr_commit in 5.7 vs in 8.0.5

Tracking concurrent operations
1. The LSN sequence defines the timeline for recovery.
2. Stages of mini transaction commit are executed concurrently and threads may interleave.
3. Threads concurrently report finished operations to a new lock-free data structure.
4. The data structure tracks up to which LSN all operations are reported as finished (per stage).

Tracking concurrent operations
[Figure labels: Limited window for pending operations (L); Pending tasks (in progress); Wait (unlikely); All past tasks done]
1. The window of pending operations is limited to L bytes of the LSN sequence (1 MB).
2. Before adding a dirty page to a flush list, wait until its oldest_lsn fits in the current window.
3. This guarantees that checkpoint_lsn can be written at oldest_lsn - L.
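The arithmetic behind point 3 can be sketched as follows. This is only an illustration, not InnoDB code: `safe_checkpoint_lsn` and `kWindow` are hypothetical names, the 1 MB window is the value quoted on the slide, and the LSN is treated as a plain byte counter.

```cpp
#include <cstdint>

// Hypothetical constant: the window L of pending operations (1 MB, as quoted above).
constexpr std::uint64_t kWindow = 1024 * 1024;

// Given the oldest_lsn of the first page in a flush list, return an LSN at
// which a checkpoint is still safe. Pages may be added out of order, but only
// within the window, so a page that is still missing from the list cannot have
// an oldest_lsn below (first_oldest_lsn - window).
constexpr std::uint64_t safe_checkpoint_lsn(std::uint64_t first_oldest_lsn,
                                            std::uint64_t window = kWindow) {
  // Clamp at 0 to avoid unsigned underflow for very small LSNs.
  return first_oldest_lsn > window ? first_oldest_lsn - window : 0;
}
```

For example, with the first page at oldest_lsn 5,000,000 the checkpoint could be placed at 5,000,000 - 1,048,576 = 3,951,424. The price of the relaxed ordering is that the checkpoint lags by at most L bytes.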

Relaxed order of pages in flush list

Generalized algorithm (extracted)
/* Create a new "light task". */
your_start_time = time_sequence.next_time(planned_time_interval);
/* Wait until it's permitted to start the execution (unlikely to wait). */
tasks_done.wait_until_in_current_window(your_start_time);
/* Do your work. */
foo();
/* Report it's done. */
tasks_done.report_task_done(your_start_time,
                            your_start_time + planned_time_interval);
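To make the pseudocode concrete, here is a minimal runnable sketch in C++. It is an assumption-laden illustration, not the InnoDB implementation: the classes `TimeSequence` and `TasksDone` only borrow the names used in the pseudocode, the window of 64 units and the helper `run_demo` are hypothetical, and default (sequentially consistent) atomics are used for simplicity rather than tuned memory orders. Each finished task stores a link "start → end" in a ring of slots, and anyone may follow the contiguous links to advance the "all past tasks done" point.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hands out non-overlapping ranges of the "time" (LSN-like) sequence.
class TimeSequence {
 public:
  std::uint64_t next_time(std::uint64_t interval) {
    return next_.fetch_add(interval);
  }

 private:
  std::atomic<std::uint64_t> next_{0};
};

// Tracks up to which point all tasks have been reported as done.
class TasksDone {
 public:
  explicit TasksDone(std::size_t window) : links_(window) {
    for (auto &link : links_) link.store(0);
  }

  // Spin until `start` lies within `window` units of the done-up-to point.
  void wait_until_in_current_window(std::uint64_t start) {
    while (start - advance() >= links_.size()) std::this_thread::yield();
  }

  // Record the link from -> to. Requires from < to and a prior successful
  // wait_until_in_current_window(from), which keeps slots from colliding.
  void report_task_done(std::uint64_t from, std::uint64_t to) {
    links_[from % links_.size()].store(to);
  }

  // Follow links from the current tail for as long as they are contiguous.
  std::uint64_t advance() {
    if (busy_.exchange(true)) return tail_.load();  // another thread advances
    std::uint64_t pos = tail_.load();
    for (;;) {
      std::atomic<std::uint64_t> &link = links_[pos % links_.size()];
      const std::uint64_t next = link.load();
      if (next == 0) break;  // the task starting at pos is not reported yet
      link.store(0);         // free the slot before publishing the new tail
      pos = next;
      tail_.store(pos);
    }
    busy_.store(false);
    return pos;
  }

 private:
  std::vector<std::atomic<std::uint64_t>> links_;
  std::atomic<std::uint64_t> tail_{0};
  std::atomic<bool> busy_{false};
};

// Hypothetical driver: `threads` workers each run `tasks` light tasks.
std::uint64_t run_demo(int threads, int tasks, std::uint64_t interval) {
  TimeSequence time_sequence;
  TasksDone tasks_done(64);
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < tasks; ++i) {
        const std::uint64_t start = time_sequence.next_time(interval);
        tasks_done.wait_until_in_current_window(start);
        /* foo() would do the real work here */
        tasks_done.report_task_done(start, start + interval);
      }
    });
  }
  for (auto &worker : workers) worker.join();
  return tasks_done.advance();  // everything is reported, so this reaches the end
}
```

Calling `run_demo(4, 1000, 8)` should return 32000, the total length of the reserved sequence, because every range has been linked by the time the workers are joined. Reporting is a single atomic store; only the thread that advances the tail walks the links.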

Sharded RW-latch for mtr_commit
[Figure: mtr_commit steps 1, 2, 3 and an additional step S]
Step S exists only to provide an option to "stop the world", which is very uncommon.
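One common way to build a sharded RW-latch is sketched below, under the assumption that committing threads take the latch in shared mode and that the rare "stop the world" caller takes it in exclusive mode. The class `ShardedRWLatch`, its method names and the shard selection by thread-id hash are all hypothetical; the slide does not show the actual InnoDB code. It requires C++17 for `std::shared_mutex`.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <shared_mutex>
#include <thread>

template <std::size_t N>
class ShardedRWLatch {
 public:
  // Shared mode touches a single shard, so concurrent committers that land
  // on different shards do not contend on the same cache line.
  std::size_t s_lock() {
    const std::size_t shard =
        std::hash<std::thread::id>{}(std::this_thread::get_id()) % N;
    shards_[shard].latch.lock_shared();
    return shard;  // the caller passes this back to s_unlock()
  }

  void s_unlock(std::size_t shard) { shards_[shard].latch.unlock_shared(); }

  // Exclusive mode ("stop the world") must take every shard. Locking in a
  // fixed order keeps two exclusive callers from deadlocking each other.
  void x_lock() {
    for (auto &shard : shards_) shard.latch.lock();
  }

  void x_unlock() {
    for (auto &shard : shards_) shard.latch.unlock();
  }

 private:
  // One latch per cache line (64 bytes is an assumption about the target CPU).
  struct alignas(64) Shard {
    std::shared_mutex latch;
  };
  std::array<Shard, N> shards_;
};
```

The trade-off is deliberate: the frequent shared path gets cheaper, while the exclusive path costs N lock operations, which is acceptable because it is very uncommon.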

Redo threads
New strategy for writing to disk:
1. The sooner the log is written, the sooner a transaction's commit can finish.
2. We keep an eager loop of writes to the OS buffer.
3. We keep an eager loop of fsyncs.
However:
4. We avoid rewriting log blocks: we write only full log blocks unless none is ready.
5. We preserve the write-ahead strategy to avoid the read-on-write issue.
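The rule about full log blocks (point 4) can be sketched as a small function that picks the end of the next write. This is a simplified illustration: `next_write_end` and `kLogBlockSize` are hypothetical names, the 512-byte block size is an assumption, and the LSN is treated as a plain byte offset.

```cpp
#include <cassert>
#include <cstdint>

// Assumed log block size in bytes.
constexpr std::uint64_t kLogBlockSize = 512;

// write_lsn: up to where the log has already been written.
// ready_lsn: up to where the log buffer holds contiguous, complete data.
// Returns the LSN at which the next write should end.
std::uint64_t next_write_end(std::uint64_t write_lsn, std::uint64_t ready_lsn) {
  assert(ready_lsn >= write_lsn);
  // End of the last completely filled block at or below ready_lsn.
  const std::uint64_t last_full = ready_lsn - ready_lsn % kLogBlockSize;
  // Prefer stopping at a block boundary, so the trailing partial block is
  // not written now and rewritten again later.
  if (last_full > write_lsn) return last_full;
  // No full block is ready: write the partial block rather than stall.
  return ready_lsn;
}
```

For instance, with 1000 bytes written and 2100 bytes ready, the write stops at 2048. Once 2048 is written and only 2100 is ready, the partial block is written anyway, so that waiting commits are not delayed.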

Waiting for redo written / flushed
New strategy to wait for redo written / flushed
• Select a finer-grained event (in 5.7 there was only 1 event for that)
• Granularity adjusted to the expected granularity of writes (per log block)
• Optionally use spin delay first (if CPU is not busy)
• Users waiting in a block whose write started when it was only partially filled
could experience false wake-ups.
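The per-block events can be sketched as an array of condition variables indexed by log block number. This is a hedged illustration only: `WriteEvents`, its methods, the 512-byte block size and the single-writer assumption are not taken from the slides. The waiter re-checks its condition after every wake-up, which is what makes the false wake-ups mentioned above harmless.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

constexpr std::uint64_t kLogBlockSize = 512;  // assumed block size in bytes

class WriteEvents {
 public:
  explicit WriteEvents(std::size_t n_events) : slots_(n_events) {}

  // Waiters for LSNs in the same log block share one event slot.
  std::size_t slot_index(std::uint64_t lsn) const {
    return (lsn / kLogBlockSize) % slots_.size();
  }

  // Block until the log is written up to `lsn`. A wake-up caused by a write
  // of a partially filled block just leads to another wait.
  void wait_for(std::uint64_t lsn) {
    Slot &slot = slots_[slot_index(lsn)];
    std::unique_lock<std::mutex> guard(slot.mutex);
    slot.cv.wait(guard, [&] { return written_.load() >= lsn; });
  }

  // Called by the (single) writer after the log is written up to new_lsn.
  // Only the events of the blocks touched by this write are signalled.
  void notify_written(std::uint64_t new_lsn) {
    const std::uint64_t old_lsn = written_.exchange(new_lsn);
    const std::uint64_t first = old_lsn / kLogBlockSize;
    const std::uint64_t last = new_lsn / kLogBlockSize;
    for (std::uint64_t b = first, n = 0; b <= last && n < slots_.size();
         ++b, ++n) {
      Slot &slot = slots_[b % slots_.size()];
      std::lock_guard<std::mutex> guard(slot.mutex);
      slot.cv.notify_all();
    }
  }

  std::uint64_t written() const { return written_.load(); }

 private:
  struct Slot {
    std::mutex mutex;
    std::condition_variable cv;
  };
  std::vector<Slot> slots_;
  std::atomic<std::uint64_t> written_{0};
};
```

Compared with a single global event, a write here wakes only the threads waiting in the affected blocks. An optional spin phase before `wait_for` would sit in front of the condition-variable wait.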

Consuming unused CPU to improve TPS
1. CPU usage is monitored so that spin delay is not used when the server is almost idle,
nor when there is not enough CPU power left for useful work.
2. The average time between consecutive requests to write or flush redo is monitored to
detect situations in which requests are infrequent and spin delay is not required.
In such cases we also start sleeps with a higher timeout. This helps to avoid wasting
CPU in cases where log threads don't need to be so eager.
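The monitoring described above boils down to a small decision function. The sketch below is purely illustrative: `SpinPolicy`, `should_spin` and all threshold values are hypothetical and are not the server's actual settings.

```cpp
#include <chrono>

// Hypothetical thresholds; the real server exposes its own tunables.
struct SpinPolicy {
  double cpu_idle_below_pct;  // below this the server is "almost idle"
  double cpu_busy_above_pct;  // above this the CPU is needed for useful work
  std::chrono::microseconds max_avg_request_gap;  // beyond this, requests are rare
};

// Decide whether a waiting thread should spin before going to sleep.
bool should_spin(const SpinPolicy &policy, double cpu_usage_pct,
                 std::chrono::microseconds avg_request_gap) {
  if (cpu_usage_pct < policy.cpu_idle_below_pct) return false;  // almost idle
  if (cpu_usage_pct > policy.cpu_busy_above_pct) return false;  // no spare CPU
  if (avg_request_gap > policy.max_avg_request_gap) return false;  // rare requests
  return true;
}
```

With a policy of `{10.0, 90.0, 400us}`, a server at 50% CPU with requests every 100 microseconds would spin, while the same server at 5% or 95% CPU, or with requests a millisecond apart, would go straight to sleeping.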

Changes to redo log coming soon
1. A dedicated solution (5.7-alike) for low-concurrency workloads, to avoid the need for
spinning and consuming CPU while still delivering top TPS for that number of connections.
2. Changes to the redo format. Dynamic resize of the redo log on disk, no more wrapping
within a single file. Checkpoints stored within each log file. logfile0 is no longer special.

Thank You
Paweł Olchawa
Senior Software Developer
Oracle / MySQL / InnoDB
