Scaling WAL Performance
Eliminate replication lag and reduce startup times with pg_prefaulter
What is WAL?
Write Ahead Log (WAL)
Where is WAL?
% tree -ld $PGDATA/
├── base
│   ├── 1
│   ├── 12668
│   └── 12669
├── global
├── pg_clog
├── pg_commit_ts
├── pg_dynshmem
├── pg_logical
│   ├── mappings
│   └── snapshots
├── pg_multixact
│   ├── members
│   └── offsets
├── pg_notify
├── pg_replslot
├── pg_serial
├── pg_snapshots
├── pg_stat
├── pg_stat_tmp
├── pg_subtrans
├── pg_tblspc
├── pg_twophase
└── pg_xlog
    └── archive_status
base/: the "heap" (a.k.a. your data)
pg_xlog/: WAL files
pg_xlog/
% ls -lA $PGDATA/pg_xlog/
-rw------- 1 seanc staff 16777216 May 31 12:02 $PGDATA/pg_xlog/000000010000000000000001
-rw------- 1 seanc staff 16777216 May 31 12:02 $PGDATA/pg_xlog/000000010000000000000002
-rw------- 1 seanc staff 16777216 May 31 12:02 $PGDATA/pg_xlog/000000010000000000000003
-rw------- 1 seanc staff 16777216 May 31 12:02 $PGDATA/pg_xlog/000000010000000000000004
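Each file name in the listing above is 24 hex digits: 8 for the timeline, 8 for the high 32 bits of the WAL position (the "log" number), and 8 for the segment within that log. The Go sketch below decodes a name under the assumption of the default 16 MiB segment size, which matches the 16777216-byte files shown. The names `walSegment`, `parseWALName`, and `startLSN` are made up for illustration and are not taken from pg_prefaulter.

```go
package main

import (
	"fmt"
	"strconv"
)

// walSegmentSize is the default WAL segment size (16 MiB). This is an
// assumption; it matches the 16777216-byte files in the listing.
const walSegmentSize = 16 * 1024 * 1024

// walSegment holds the three fields encoded in a WAL segment file name.
type walSegment struct {
	Timeline uint32
	Log      uint32
	Seg      uint32
}

// parseWALName splits a 24-character WAL file name into its three
// 8-hex-digit fields: timeline, log number, and segment number.
func parseWALName(name string) (walSegment, error) {
	if len(name) != 24 {
		return walSegment{}, fmt.Errorf("WAL name %q: want 24 hex digits", name)
	}
	var fields [3]uint32
	for i := range fields {
		v, err := strconv.ParseUint(name[i*8:(i+1)*8], 16, 32)
		if err != nil {
			return walSegment{}, fmt.Errorf("WAL name %q: %v", name, err)
		}
		fields[i] = uint32(v)
	}
	return walSegment{Timeline: fields[0], Log: fields[1], Seg: fields[2]}, nil
}

// startLSN returns the byte position in the WAL stream at which the
// segment begins.
func (s walSegment) startLSN() uint64 {
	return uint64(s.Log)<<32 + uint64(s.Seg)*walSegmentSize
}

func main() {
	seg, err := parseWALName("000000010000000000000004")
	if err != nil {
		panic(err)
	}
	lsn := seg.startLSN()
	// Print the start position in the usual high/low LSN notation.
	fmt.Printf("timeline=%d log=%d seg=%d start=%X/%08X\n",
		seg.Timeline, seg.Log, seg.Seg, lsn>>32, uint32(lsn))
}
```

With this decoding, an LSN such as 0/01007750 falls inside segment 000000010000000000000001, because 0x01007750 divided by the segment size is 1.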
Heaps of SQL
postgres@[local]:5432/postgres# CREATE DATABASE test;
CREATE DATABASE
Time: 358.395 ms
^Z
% tree -ld $PGDATA/base
├── 1
├── 12668
├── 12669
└── 16387
4 directories
Creates	new	DB
New	directory
Table Data as Files
postgres@[local]:5432/postgres# \c test
You are now connected to database "test" as user "postgres".
postgres@[local]:5432/test# CREATE TABLE t1 (i INT);
CREATE TABLE
Time: 2.273 ms
postgres@[local]:5432/test# SELECT pg_relation_filepath('t1');
pg_relation_filepath
----------------------
base/16387/16388
(1 row)
Time: 1.160 ms
^Z
% stat -f "%Sp %z %N" $PGDATA/base/16387/16388
-rw------- 0 $PGDATA/base/16387/16388
Empty	file
Physical Storage of Data
postgres@[local]:5432/test# INSERT INTO t1 VALUES (1);
INSERT 0 1
Time: 0.581 ms
^Z
% stat -f "%Sp %z %N" $PGDATA/base/16387/16388
-rw------- 8192 $PGDATA/base/16387/16388
% fg
postgres@[local]:5432/test# INSERT INTO t1 VALUES (2);
INSERT 0 1
Time: 5.985 ms
^Z
% stat -f "%Sp %z %N" $PGDATA/base/16387/16388
-rw------- 8192 $PGDATA/base/16387/16388
PG	Page	Size	(8K)
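Because the heap grows in whole pages, a block number maps directly to a byte offset in the relation file. The Go sketch below shows that arithmetic, assuming the default 8 KiB page size seen above and the default 1 GiB relation segment size (larger relations continue in files with a numeric suffix). The function names are illustrative only.

```go
package main

import "fmt"

const (
	// pageSize is the default PostgreSQL page size, as seen in the
	// 8192-byte file above.
	pageSize = 8192
	// pagesPerSegment assumes the default 1 GiB relation segment size.
	pagesPerSegment = (1 << 30) / pageSize
)

// locateBlock maps a block number within a relation to the segment file
// that holds it (0 means no suffix, 1 means ".1", and so on) and the
// byte offset of the page inside that file.
func locateBlock(blk uint32) (segment uint32, offset int64) {
	segment = blk / pagesPerSegment
	offset = int64(blk%pagesPerSegment) * pageSize
	return segment, offset
}

// pagesInFile returns how many whole pages a file of the given size holds.
func pagesInFile(size int64) int64 {
	return size / pageSize
}

func main() {
	fmt.Println("pages in an 8192-byte file:", pagesInFile(8192))
	for _, blk := range []uint32{0, 1, 131072} {
		seg, off := locateBlock(blk)
		fmt.Printf("block %d -> segment %d, offset %d\n", blk, seg, off)
	}
}
```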
How does the WAL relate to the heap?
1. Modifications to the heap are appended to the WAL first
2. Committed transactions in the WAL are applied in the heap during a CHECKPOINT
3. Crash recovery walks backwards through the WAL to the last completed CHECKPOINT (then rolls forward through committed transactions to prevent data loss)
Things to keep in mind
1. The WAL receives sequential append operations
2. WAL can be read forward and backwards
3. Recently written transaction data exists only in memory and in WAL
4. WAL is probably your performance friend (deferred random IO against the heap)
Tuples, Pages, Relations, and you!
https://momjian.us/main/writings/pgsql/internalpics.pdf
https://momjian.us/main/writings/pgsql/mvcc.pdf
https://www.postgresql.org/docs/current/static/wal.html
© 2018 Joyent. All rights reserved. Joyent Confidential !11
synchronous_commit="remote_write"
Why do you care about apply lag?
Manta is an HTTP Frontend to ZFS
• Files distributed across different ZFS storage servers
• Metadata stored in PostgreSQL
[Diagram: LB, Frontend, PGprimary, PGfollower, PGasync, ZFS]
Caution: shapes in the diagram may appear more simple than they actually are
PostgreSQL	Replication	is	Awesome
PGprimary PGfollower
PGasync
synchronous_commit="XXX"
???
???
ez-mode	HA	Durability	FTW
PGprimary PGfollower
PGasync
synchronous_commit="XXX"
remote_write
on
Hardware	fails	right	on	time,	every	time	
PGprimary PGfollower
PGasync
synchronous_commit="XXX"
remote_write
on
CAP:	Can	haz	A?
This	isn't	a	hardware	problem
PGprimary PGfollower
PGasync
synchronous_commit="XXX"
remote_write
on
It's	gunna	be	a	while,	m'kay?
HINT:	That's	19hrs	of	apply	lag
...oh
How did we get into this mess?
Cloudy with a chance of single threaded execution
Context is everything
PGprimary PGfollower
INSERT INTO...
WAL	Stream
-50K DKP
PGprimary
INSERT INTO...
"Many	[INSERTS],	handle	it!"
Context is everything?
PGprimary PGfollower
INSERT INTO...
WAL	Stream
OH HAI!
PGprimary PGfollower
INSERT INTO...
WAL	Stream	
WAL	Sender
pg pg pg pg pg pg
WAL	Receiver
If we're lucky...
Userspace:

WAL	Receiver
Filesystem	Cache Disk	IO
WAL	Page
But we're not because EREALITY
Userspace:

WAL	Receiver
Filesystem	Cache Disk	IO
WAL	Page
And I lied to you. This:
Userspace:

WAL	Receiver
Filesystem	Cache Disk	IO
WAL	Page
...is actually this.
Userspace:

WAL	Receiver
Filesystem	Cache Disk	IO
WAL	Page
~5-10µs
~5-10µs
~5-10µs
~5-10µs
~5-10µs
And this isn't drawn to scale...
Userspace:

WAL	Receiver
Filesystem	Cache Disk	IO
WAL	Page
~5-10µs
~10-30ms
~10-30ms
~5-10µs
~5-10µs
Pixel Correct Timeline
Userspace:

WAL	Receiver
Filesystem	Cache Disk	IO
WAL	Page
15ms	==	300pt
5µs	==	0.1pt
And that RAID array you have? It's idle.
• Storage math: 150 IOPS/disk * 16 disks = ~2400 IOPS (if perfectly scheduled)
• Single WAL Receiver process issuing pread(2)
• Max 150 IOPS or ~6% utilization of disks
• Busy primaries will overrun followers, permanently
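The storage math above can be checked with a few lines of Go. This is a back-of-the-envelope sketch: it assumes each reader keeps at most one request outstanding and is therefore capped at roughly one disk's worth of IOPS. The function names are illustrative.

```go
package main

import "fmt"

// arrayIOPS returns the best-case random-read IOPS of an array,
// assuming requests are spread perfectly across all disks.
func arrayIOPS(iopsPerDisk, disks int) int {
	return iopsPerDisk * disks
}

// utilization returns the fraction of the array's capacity that the
// given number of serial readers can use. Each reader is assumed to
// have one outstanding request at a time, so it cannot exceed the
// IOPS of a single disk.
func utilization(readers, iopsPerDisk, disks int) float64 {
	total := arrayIOPS(iopsPerDisk, disks)
	used := readers * iopsPerDisk
	if used > total {
		used = total
	}
	return float64(used) / float64(total)
}

func main() {
	const iopsPerDisk, disks = 150, 16
	fmt.Printf("array capacity: %d IOPS\n", arrayIOPS(iopsPerDisk, disks))
	fmt.Printf("one serial reader: %.2f%% of the array\n",
		100*utilization(1, iopsPerDisk, disks))
	fmt.Printf("sixteen parallel readers: %.2f%% of the array\n",
		100*utilization(16, iopsPerDisk, disks))
}
```

Under these assumptions a single reader uses 150 of 2400 IOPS, which is where the ~6% figure comes from. Issuing reads in parallel is what lets the remaining disks do work.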
It's	gunna	be	a	while,	m'kay?
Fixed It
Installation
1. Install Go
2. go get github.com/joyent/pg_prefaulter
3. Configure
4. Run
Configuration
[log]
# level can be set to "DEBUG", "INFO", "WARN", "ERROR", or "FATAL"
#level = "INFO"
[postgresql]
#pgdata = "pgdata"
#database = "postgres"
#host = "/tmp"
#password = ""
#port = 5432
#user = "postgres"
[postgresql.xlog]
#pg_xlogdump-path = "/usr/local/bin/pg_xlogdump"
Run: Primary
% env PGPASSWORD=`cat .pwfile` ./pg_prefaulter run --config pg_prefaulter-primary.toml

2018-05-31T11:59:01.413991821-04:00 |DEBU| <nil> config-file=pg_prefaulter-primary.toml
2018-05-31T11:59:01.414189771-04:00 |DEBU| args: []
2018-05-31T11:59:01.414315299-04:00 |DEBU| starting gops(1) agent
2018-05-31T11:59:01.414475394-04:00 |DEBU| starting pprof endpoing agent pprof-port=4242
2018-05-31T11:59:01.414439447-04:00 |DEBU| flags postgresql.host=/tmp postgresql.pgdata=/Users/seanc/go/src/github.com/
joyent/pg_prefaulter/.pgdata_primary/ postgresql.poll-interval=1000 postgresql.port=5432 postgresql.user=postgres pos
tgresql.xlog.mode=pg postgresql.xlog.pg_xlogdump-path=/opt/local//lib/postgresql96/bin/pg_xlogdump
2018-05-31T11:59:01.415005542-04:00 |INFO| Starting pg_prefaulter pid=39865
2018-05-31T11:59:01.417634192-04:00 |DEBU| filehandle cache initialized filehandle-cache-size=2000 filehandle-cache-
ttl=300000 rlimit-nofile=7168
2018-05-31T11:59:01.426437960-04:00 |INFO| started IO worker threads io-worker-threads=3600
2018-05-31T11:59:01.454895027-04:00 |INFO| started WAL worker threads wal-worker-threads=4
2018-05-31T11:59:01.455209806-04:00 |DEBU| Starting wait
2018-05-31T11:59:01.455269901-04:00 |INFO| Starting pg_prefaulter agent commit=none date=unknown tag= version=dev
2018-05-31T11:59:01.498278613-04:00 |DEBU| established DB connection backend-pid=39867 version="PostgreSQL 9.6.3 on x86_64-
apple-darwin16.5.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit"
2018-05-31T11:59:01.500484662-04:00 |DEBU| found redo WAL segment from DB type=redo walfile=000000010000000000000001
2018-05-31T11:59:01.513085485-04:00 |INFO| skipping REDO record for database database=0 input="rmgr: Heap len (rec/
tot): 14/ 469, tx: 4, lsn: 0/01007750, prev 0/01007728, desc: HOT_UPDATE off 1 xmax 4 ; new off 3 x
max 0, blkref #0: rel 1664/0/1260 blk 0 FPW"
2018-05-31T11:59:01.513213488-04:00 |INFO| skipping REDO record for database database=0 input="rmgr: Heap len (rec/
tot): 2/ 337, tx: 0, lsn: 0/01007988, prev 0/01007950, desc: INPLACE off 1, blkref #0: rel 1664/0/
1262 blk 0 FPW"
2018-05-31T11:59:01.558219381-04:00 |INFO| skipping REDO record for database database=0 input="rmgr: Heap len (rec/
tot): 3/ 80, tx: 22, lsn: 0/0116B050, prev 0/0116B028, desc: INSERT+INIT off 1, blkref #0: rel 16$
4/0/1214 blk 0"
Run: Followers
% env PGPASSWORD=Kdr6zmvYOgWTKnol7HcULw91o15KhA6c ./pg_prefaulter run --config pg_prefaulter-follower.toml
--pprof-port=4243
2018-05-31T12:02:15.364191007-04:00 |DEBU| <nil> config-file=pg_prefaulter-follower.toml
2018-05-31T12:02:15.364357715-04:00 |DEBU| args: []
2018-05-31T12:02:15.364448823-04:00 |DEBU| starting gops(1) agent
2018-05-31T12:02:15.364508931-04:00 |DEBU| starting pprof endpoing agent pprof-port=4243
2018-05-31T12:02:15.364556820-04:00 |DEBU| flags postgresql.host=/tmp postgresql.pgdata=/Users/seanc/go/
src/github.com/joyent/pg_prefaulter/.pgdata_follower/ postgresql.poll-interval=1000 postgresql.port=5433
postgresql.user=postgres postgresql.xlog.mode=pg postgresql.xlog.pg_xlogdump-path=/opt/local/lib/
postgresql96/bin/pg_xlogdump
2018-05-31T12:02:15.365189238-04:00 |INFO| Starting pg_prefaulter pid=40018
2018-05-31T12:02:15.367508589-04:00 |DEBU| filehandle cache initialized filehandle-cache-size=2000
filehandle-cache-ttl=300000 rlimit-nofile=7168
2018-05-31T12:02:15.376917068-04:00 |INFO| started IO worker threads io-worker-threads=3600
2018-05-31T12:02:15.377022308-04:00 |INFO| started WAL worker threads wal-worker-threads=4
2018-05-31T12:02:15.377063872-04:00 |DEBU| Starting wait
2018-05-31T12:02:15.377104519-04:00 |INFO| Starting pg_prefaulter agent commit=none date=unknown tag=
version=dev
2018-05-31T12:02:15.413981503-04:00 |DEBU| established DB connection backend-pid=40019 version="PostgreSQL
9.6.3 on x86_64-apple-darwin16.5.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit"
2018-05-31T12:02:15.414627296-04:00 |DEBU| found redo WAL segment from DB type=redo
walfile=000000010000000000000004
What's the voodoo?
pg_prefaulter(1) Design
1. Find WAL files
2. Process WAL files using pg_xlogdump(1)
3. Read the text output from pg_xlogdump(1)
4. Translate output into offsets into relations (i.e. tables/indexes)
5. Dispatch pread(2) calls in parallel
6. Warm the OS cache before the WAL apply process faults a page in by itself
7. Dump all internal caches if process notices primary/follower change
8. Profit (or at least, fail less hard on failover or startup)
Finding WAL Files
1. Connect to PostgreSQL
2. Search for hints in process titles
:heart: pg_xlogdump(1)
• Platform and WAL file version agnostic way of extracting WAL information
• Eliminated the need to write a custom WAL parser
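Reading the text output comes down to picking block references out of each record line. The Go sketch below does this with a regular expression written against the `blkref #N: rel A/B/C blk D` form visible in the run logs earlier in the deck. The `blockRef` type and `parseBlockRefs` function are hypothetical, and the pattern deliberately skips references that carry a `fork` qualifier, so treat it as a starting point and not as the tool's real parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// blockRef identifies one page touched by a WAL record (hypothetical type).
type blockRef struct {
	Tablespace uint32
	Database   uint32
	Relation   uint32
	Block      uint32
}

// blkrefRE matches main-fork block references such as
// "blkref #0: rel 1664/0/1260 blk 0".
var blkrefRE = regexp.MustCompile(`blkref #\d+: rel (\d+)/(\d+)/(\d+) blk (\d+)`)

// parseBlockRefs returns every block reference found in one line of
// pg_xlogdump text output. Values that do not fit in 32 bits are skipped.
func parseBlockRefs(line string) []blockRef {
	var refs []blockRef
	for _, m := range blkrefRE.FindAllStringSubmatch(line, -1) {
		var v [4]uint32
		valid := true
		for i := range v {
			n, err := strconv.ParseUint(m[i+1], 10, 32)
			if err != nil {
				valid = false
				break
			}
			v[i] = uint32(n)
		}
		if valid {
			refs = append(refs, blockRef{
				Tablespace: v[0], Database: v[1], Relation: v[2], Block: v[3],
			})
		}
	}
	return refs
}

func main() {
	// Sample line taken from the primary's log output shown earlier.
	line := "rmgr: Heap len (rec/tot): 14/ 469, tx: 4, lsn: 0/01007750, " +
		"prev 0/01007728, desc: HOT_UPDATE off 1 xmax 4 ; new off 3 xmax 0, " +
		"blkref #0: rel 1664/0/1260 blk 0 FPW"
	for _, ref := range parseBlockRefs(line) {
		fmt.Printf("%+v\n", ref)
	}
}
```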
pg_prefaulter(1) Architecture
[Diagram: PostgreSQL (WAL files, system catalogs, process titles, WAL Receiver); pg_prefaulter(1) (WAL File Scanner, pg_xlogdump(1), WAL Filename Cache, IO Request Cache, FD Cache, IO Threads 1..N); OS, reached via pread(2)]
Requirements
1. PostgreSQL 9.6 (an update to support 10 and 11 is coming soon)
2. Go compiler to build the binary
3. pg_xlogdump(1)
Where to use pg_prefaulter(1)
1. On the primary
2. On all followers
3. Useful at startup for primaries and followers
4. Useful for promotion of followers
5. Useful on standalone PostgreSQL instances not using replication
6. Any database that you want to see start faster or where you care about availability (i.e. everywhere, on all PG instances)
7. Any PostgreSQL database that replicates and VACUUMs or pg_repack(1)s - i.e. generates lots of WAL activity
Don't be laggin' like this...
Be prefaultin' like this!
pg_prefaulter deployed
Recovery Visualized
Falling behind at 0.8s/s
Recovering at -0.6s/s
Falling behind at 0.2s/s
pg_prefaulter deployed
Fully Recovered
Steady As She Goes
Thank you!
https://github.com/joyent/pg_prefaulter
@SeanChittenden
seanc@joyent.com
seanc@FreeBSD.org
sean@chittenden.org
We're Hiring!
