Postgres Vision 2018: The Promise of zheap

© 2013 EDB All rights reserved.
zheap: Why is EnterpriseDB
developing a new storage format for
PostgreSQL?
• Robert Haas | 2018-06-05

• New storage format being developed by EnterpriseDB
• Work in progress is already released on GitHub under
the PostgreSQL license
• Basics work, but much remains to be done
• Goal is to get it integrated into PostgreSQL
zheap: What is it?

• Original motivation for zheap: In some workloads,
PostgreSQL tables tend to bloat, and when they do, it’s
hard to get rid of the bloat.
• Bloat occurs when the table and indexes grow even
though the amount of real data being stored has not
increased.
• Bloat is caused mainly by updates, because we must
keep both the old and new row versions for a period of
time.
• Bloat can be a concern because of increased disk
consumption, but typically a bigger concern is
performance loss – if a table is twice as big as it
“should be”, scanning it takes twice as long.
Bloat: Motivation and Definition
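
To make the definition concrete, here is a toy Python model of a table that keeps both the old and the new row version after every update. It is only a sketch: `ToyHeap` and its methods are invented for illustration and do not correspond to PostgreSQL internals, and no cleanup pass is modeled.

```python
class ToyHeap:
    """Toy model: every UPDATE appends a new row version to the table."""

    def __init__(self):
        self.versions = []   # all row versions physically stored
        self.live = {}       # key -> position of the current version

    def insert(self, key, value):
        self.versions.append((key, value))
        self.live[key] = len(self.versions) - 1

    def update(self, key, value):
        # The old version stays in the table; only the pointer moves.
        self.insert(key, value)

    def bloat_ratio(self):
        # Physical row versions divided by rows that hold real data.
        return len(self.versions) / len(self.live)


heap = ToyHeap()
for k in range(100):
    heap.insert(k, 0)
for k in range(100):
    heap.update(k, 1)        # one update per row

print(heap.bloat_ratio())    # 2.0
```

The amount of real data is unchanged (100 rows), yet a full scan now has to read twice as many row versions.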

• All systems that use MVCC must deal with multiple row
versions, but they store them in different places.
– PostgreSQL and Firebird put all row versions in the
table.
– Oracle and MySQL put old row versions in the
undo log.
– SQL Server puts old row versions in tempdb.
• Leaving old row versions in the table makes cleanup
harder – sometimes need to use CLUSTER or
VACUUM FULL.
• Improving VACUUM helps contain bloat, but can’t
prevent it completely.
Bloat: Why a new storage format?

• Whenever a transaction performs an operation on a
tuple, the operation is also recorded in an undo log.
• If the transaction aborts, the undo log can be used to
reverse all of the operations performed by the
transaction.
• The undo log also contains most of the data we need
for MVCC purposes, so little transactional data needs
to be stored in the table itself. This, and a reduction in
alignment padding, mean that zheap is smaller on disk.
• We avoid dirtying the page except when the data has
been modified, or after an abort. No VACUUM, no
freezing, no “routine maintenance” at all!
zheap: How does it work?
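
The abort path described above can be sketched in a few lines of Python. This is a simplified illustration, not zheap's actual record format: `ToyUndoTable` is a hypothetical name, stored values are assumed never to be `None`, and concurrency is ignored.

```python
class ToyUndoTable:
    """Toy model: each change records how to reverse it in an undo log."""

    def __init__(self):
        self.rows = {}   # key -> current value
        self.undo = []   # (xid, key, previous value, or None if the key was new)

    def write(self, xid, key, value):
        # Record the prior state first, then modify the row.
        self.undo.append((xid, key, self.rows.get(key)))
        self.rows[key] = value

    def abort(self, xid):
        # Apply this transaction's undo records newest-first.
        for rec_xid, key, old in reversed(self.undo):
            if rec_xid != xid:
                continue
            if old is None:
                self.rows.pop(key, None)   # reverse an insert
            else:
                self.rows[key] = old       # reverse an update
        self.undo = [r for r in self.undo if r[0] != xid]


t = ToyUndoTable()
t.write(1, "a", 10)   # transaction 1 inserts a
t.write(2, "a", 20)   # transaction 2 updates a
t.write(2, "b", 5)    # transaction 2 inserts b
t.abort(2)
print(t.rows)         # {'a': 10}
```

After the abort, nothing written by transaction 2 remains in the table, so there is nothing left for a later cleanup pass to remove.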

• INSERT: Same as current heap, but in case of an
abort, dead row versions will be removed immediately
by undo, not at a later time by VACUUM.
• DELETE: Same as current heap … mostly.
• UPDATE: Very different! Whenever possible, we want
to update the tuple “in place,” without storing a second
version in the heap.
– The old version of the tuple will be stored in the
undo log.
– In-place updates prevent bloat! The undo log may
bloat, but it will shrink as soon as the relevant
transactions are no longer running.
zheap: Basic Operations
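
A rough Python sketch of an in-place update follows. It shows the table staying the same size while older readers find the previous version in the undo log, and the undo being discarded once no running transaction can need it. The names are hypothetical, and visibility is reduced to comparing transaction IDs (assuming transactions commit in ID order), which is far simpler than real MVCC rules.

```python
class InPlaceTable:
    def __init__(self):
        self.rows = {}   # key -> (value, xid that wrote it)
        self.undo = []   # (key, old value, old xid, xid that overwrote it)

    def update(self, xid, key, value):
        if key in self.rows:
            old_value, old_xid = self.rows[key]
            self.undo.append((key, old_value, old_xid, xid))
        self.rows[key] = (value, xid)   # overwrite in place: the table does not grow

    def read(self, key, snapshot_xid):
        # Walk back through undo until a version visible to the snapshot is found.
        value, xid = self.rows[key]
        for k, old_value, old_xid, _ in reversed(self.undo):
            if xid <= snapshot_xid:
                break
            if k == key:
                value, xid = old_value, old_xid
        return value

    def discard_undo(self, oldest_active_xid):
        # Undo overwritten before every active snapshot began is no longer needed.
        self.undo = [u for u in self.undo if u[3] >= oldest_active_xid]


t = InPlaceTable()
t.update(1, "a", "v1")
t.update(5, "a", "v2")
t.update(9, "a", "v3")
print(len(t.rows), t.read("a", 6), t.read("a", 10))   # 1 v2 v3
t.discard_undo(7)      # oldest running transaction has xid 7
print(len(t.undo))     # 1
```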

• In the existing heap, every update is either HOT (no
indexes updated) or non-HOT (insert into every index).
• In zheap, every update is either in-place or not. At
present, like a HOT update, an in-place update cannot
modify any indexed columns.
• An “in-place” update is significantly better than a HOT
update because it does not require that the page
contain adequate space for the entire new tuple.
– We don’t need any extra space at all unless the
new version of the tuple is wider than the old one.
• In the future, we will also be able to perform in-place
updates when indexed columns have been modified.
Updates: “In Place” or not?
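
The difference between the two space requirements can be written as a pair of predicates. This is a sketch under simplifying assumptions: the function names and byte sizes are made up, and the real checks in PostgreSQL involve more conditions than the two modeled here.

```python
def heap_update_is_hot(indexed_cols_changed: bool,
                       new_tuple_size: int, page_free: int) -> bool:
    # HOT: no indexed column changes, and the page must hold the whole new tuple.
    return not indexed_cols_changed and new_tuple_size <= page_free


def zheap_update_is_in_place(indexed_cols_changed: bool, old_tuple_size: int,
                             new_tuple_size: int, page_free: int) -> bool:
    # In-place (as described at present): no indexed column changes,
    # and extra space is needed only for the growth of the tuple.
    extra_needed = max(0, new_tuple_size - old_tuple_size)
    return not indexed_cols_changed and extra_needed <= page_free


# A 100-byte tuple grows to 104 bytes on a page with 20 bytes free.
print(heap_update_is_hot(False, 104, 20))              # False
print(zheap_update_is_in_place(False, 100, 104, 20))   # True
```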

• Current version of zheap works without any changes to
index access methods.
• We plan to continue supporting the use of unmodified
index access methods with zheap.
• However, if indexes are modified to support “delete-
marking,” we could do in-place updates even when
indexed columns are modified.
• When performing an in-place update, mark the old
index entry as possibly-deleted, and insert a new one.
No changes to indexes where the corresponding
columns aren’t modified!
zheap: Index Support

• In the current heap, each non-HOT update incurs one
insert to every index.
• With zheap, each in-place update will incur one insert
and one delete-marking operation for each index, but
only for indexes where the indexed columns are
modified.
• By removing the restriction that no indexed columns
can be modified, we will be able to perform nearly all
updates in place!
• Only updates that expand the row so that it no longer
fits on the page will need to be performed as not-in-
place.
zheap: Index Support (2)
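
The index cost comparison on the last two slides comes down to simple counting. The helper names below are hypothetical, and each index insert or delete-mark is counted as one operation of equal cost, which is an assumption.

```python
def heap_index_ops(num_indexes: int, is_hot: bool) -> int:
    # Current heap: a non-HOT update inserts into every index.
    return 0 if is_hot else num_indexes


def zheap_index_ops(modified_indexes: int) -> int:
    # zheap with delete-marking: one insert plus one delete-mark,
    # but only for indexes whose columns changed.
    return 2 * modified_indexes


# Table with 5 indexes; the update changes a column covered by 1 of them.
print(heap_index_ops(5, is_hot=False))   # 5
print(zheap_index_ops(1))                # 2

# Same table; the update changes columns covered by all 5 indexes.
print(zheap_index_ops(5))                # 10
```

The second case shows the flip side: when every index is affected, delete-marking does more index work than the current heap.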

• If all indexes on the table support delete-marking,
maybe we don’t need VACUUM any more.
• Remember, zheap pages don’t need to be hinted,
frozen, etc. If there are leftover tuples, we can remove
them when we want to reuse the space, rather than in
advance.
• Delete-marked index tuples can be removed “lazily” –
perhaps when they are scanned, or when they are
evicted from shared buffers. Index pages that are
never accessed again might be bloated, but that
doesn’t have much impact on performance.
Eliminating VACUUM

• If we don’t VACUUM, we can’t ever “lose” free space.
This will require changes to free space tracking.
– UPDATE can create free space if the new row
version is narrower than the old one, or if the
update is not-in-place.
– DELETE always creates free space.
– In either case, the free space can’t be used until
the transaction commits.
• VACUUM could still be an option for users wanting to
clean up more aggressively.
Eliminating VACUUM (2)
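
One way to picture the free space rule is a tracker that holds freed bytes as pending until the freeing transaction commits. This is an illustrative sketch only; `ToyFreeSpace` is an invented name and says nothing about how zheap's free space tracking is actually implemented.

```python
class ToyFreeSpace:
    def __init__(self):
        self.usable = 0     # bytes any transaction may reuse
        self.pending = {}   # xid -> bytes freed but not yet reusable

    def freed_by(self, xid, nbytes):
        self.pending[xid] = self.pending.get(xid, 0) + nbytes

    def commit(self, xid):
        # Only now does the space become available for reuse.
        self.usable += self.pending.pop(xid, 0)

    def abort(self, xid):
        # Undo restores the old rows, so the space was never really free.
        self.pending.pop(xid, None)


fs = ToyFreeSpace()
fs.freed_by(7, 120)   # DELETE of a 120-byte row
fs.freed_by(7, 16)    # UPDATE that made a row 16 bytes narrower
print(fs.usable)      # 0
fs.commit(7)
print(fs.usable)      # 136
```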

• pgbench, scale factor 1000
• Simple-update test (1 select, 1 insert, 1 update)
• 64-bit Linux, x86, 2 sockets, 14 cores/socket, 64GB
• shared_buffers=32GB, min_wal_size=15GB,
max_wal_size=20GB, checkpoint_timeout=1200,
maintenance_work_mem=1GB,
checkpoint_completion_target=0.9,
synchronous_commit=off
Performance Data – Test Setup

• Initial size of the accounts table is 13GB in heap – only 11GB in zheap.
• heap grows to 19GB in the 8-client test and 26GB in the 64-client test. zheap
stays at 11GB!
• All the undo generated during the test is discarded within a few seconds after
the open transaction ends.
• TPS for zheap is ~40% higher than heap in the above tests at 8 clients. On some
other high-end machines, we have seen up to ~100% improvement for a similar
test.
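
The size figures above can be turned into growth percentages with a few lines of Python. The only inputs are the numbers quoted on the slide; treating 13GB as the starting size for both the 8-client and the 64-client run is an assumption.

```python
heap_initial_gb, zheap_gb = 13, 11
heap_after_gb = {8: 19, 64: 26}   # clients -> heap size after the run

# zheap starts out smaller than heap.
initial_saving = 1 - zheap_gb / heap_initial_gb
print(f"initial saving: {initial_saving:.0%}")

# Growth of the heap relative to its initial size, and size relative to zheap.
heap_growth = {c: s / heap_initial_gb - 1 for c, s in heap_after_gb.items()}
heap_vs_zheap = {c: s / zheap_gb for c, s in heap_after_gb.items()}
for clients in heap_after_gb:
    print(f"{clients} clients: heap grew {heap_growth[clients]:.0%}, "
          f"{heap_vs_zheap[clients]:.1f}x the size of zheap")
```

At 64 clients the heap has doubled, while zheap is the same size it started at.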

• Because zheap is smaller on disk, we get a small
performance boost.
• No worries about VACUUM kicking in unexpectedly.
• Undo bloat is self-healing – good for cloud or other
“unattended” workloads.
• In workloads where the heap bloats and zheap only
bloats the undo, we get a massive performance boost.
• Discarding undo happens in the background and is
cheaper than HOT pruning; that helps, too!
Benefits

• Transaction abort will be more expensive.
• Deletes might not perform as well.
• Could be slow if most/all indexed columns are updated
at the same time.
• Huge amount of development work.
Drawbacks

• Allow PostgreSQL to support pluggable storage
formats.
• Allows innovation – major changes to the heap are
impossible because everyone relies on it. Can’t go
backwards for any use case!
• Allows for user choice – if there are multiple storage
formats available, pick the one that is best for your use
case.
• Hope to see this in PostgreSQL 12 (Fall 2019).
Pluggable Storage: Plan

• Columnar storage
– Most queries don’t need all columns.
• Write-once read-many (WORM)
– No support for UPDATE, DELETE, or SELECT FOR UPDATE/SHARE.
• Index-organized storage
– One index is more important than all of the others.
• In-memory storage
– No need to spill to disk.
• Non-transactional storage
– No MVCC.
Pluggable Storage: Examples
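
The idea of choosing among storage formats can be sketched as a small interface with interchangeable implementations. Everything here is hypothetical: `StorageFormat`, `WormStorage` and `InMemoryStorage` are illustrative names in Python, not the PostgreSQL interface, which was still being designed at the time of this talk.

```python
from abc import ABC, abstractmethod


class StorageFormat(ABC):
    """Hypothetical interface that each storage format would implement."""

    @abstractmethod
    def insert(self, key, row): ...

    @abstractmethod
    def scan(self): ...

    def update(self, key, row):
        # Formats that cannot update simply keep this default.
        raise NotImplementedError(f"{type(self).__name__} does not support UPDATE")


class WormStorage(StorageFormat):
    """Write-once read-many: inserts and scans only."""

    def __init__(self):
        self._rows = []

    def insert(self, key, row):
        self._rows.append((key, row))

    def scan(self):
        return iter(self._rows)


class InMemoryStorage(StorageFormat):
    """Keeps everything in a dict; supports UPDATE."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        self._rows[key] = row

    def update(self, key, row):
        self._rows[key] = row

    def scan(self):
        return iter(self._rows.items())


for table in (WormStorage(), InMemoryStorage()):
    table.insert(1, "first")
    try:
        table.update(1, "second")
    except NotImplementedError as exc:
        print(exc)
    print(type(table).__name__, list(table.scan()))
```

The calling code is the same for both formats; only the trade-offs behind the interface differ.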

• PostgreSQL 12 or 13
• There will still be much more to do for “v2”.
When?

zheap
• Amit Kapila (development lead)
• Dilip Kumar
• Kuntal Ghosh
• Mithun CY
• Ashutosh Sharma
• Rafia Sabih
• Beena Emerson
• Amit Khandekar
• Thomas Munro
Pluggable Storage
• Haribabu Kommi (Fujitsu)
• Alexander Korotkov (Postgres Pro)
• Andres Freund
• Ashutosh Bapat
Who?

• Any Questions?
Thanks
