Distributed Point-in-Time Recovery with Postgres
Eren Başak
Cloud Software Engineer
Citus Data
PGConf.Russia 2018
• What is Point-in-Time Recovery
• How to do point-in-time recovery
• Distributed Point-in-Time Recovery
• Citus Cloud way of doing PITR
Overview
2 Eren Başak | Citus Data | PGConf.Russia 2018 | February 2018
• Point-in-the-past copy of an existing database
• “I want a copy of my database as of 02:45 pm yesterday”
• A fork
Point-in-Time Recovery
[Diagram: a regular database and the PITR database produced from it by PITR.]
• DB Admin mistakes (DROP a wrong column)
• User deletes data by mistake
“I ran our unit tests against the production database”
• Want an independent copy of the production database
• Playground for data analysts
• Understand the impact of bigger changes (a new index)
Point-in-Time Recovery
Benefits and Use Cases
• Periodic base backups
Run pg_basebackup and archive the backups
• WAL archiving
archive_command = 'cp %p "/somewhere/reliable/%f"'
How to?
Prerequisites
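As a rough illustration of the WAL archiving prerequisite, the sketch below does the same job as the `cp` command above, but refuses to overwrite a segment that is already in the archive, which the PostgreSQL documentation recommends for archive commands. The helper name `archive_wal` and the idea of calling it from a wrapper script are assumptions made for this example, not something the talk describes.

```python
import os
import shutil
import tempfile


def archive_wal(wal_path: str, archive_dir: str) -> bool:
    """Copy one WAL segment into archive_dir; return True on success.

    Hypothetical helper: returns False instead of overwriting, so a
    wrapper script can exit non-zero and PostgreSQL will retry later.
    """
    target = os.path.join(archive_dir, os.path.basename(wal_path))
    if os.path.exists(target):
        return False
    # Copy under a temporary name first, then rename, so a crash never
    # leaves a half-written segment under the final name.
    fd, tmp = tempfile.mkstemp(dir=archive_dir)
    os.close(fd)
    shutil.copyfile(wal_path, tmp)
    os.rename(tmp, target)
    return True
```

A wrapper would map the return value to the process exit code, since PostgreSQL treats only a zero exit status as a successful archive.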
1. Determine a PITR target
Timestamp or named restore points
2. Restore a proper backup
3. Prepare recovery.conf
How to?
Recovery Steps
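Recovery replays WAL forward from a base backup, so a "proper" backup in step 2 is one that finished before the PITR target. A minimal sketch of that choice, assuming the backup completion times are already known as Python `datetime` values (the function name `pick_base_backup` is made up for this example):

```python
from datetime import datetime
from typing import Optional, Sequence


def pick_base_backup(backup_end_times: Sequence[datetime],
                     target: datetime) -> Optional[datetime]:
    """Return the most recent backup that finished at or before target."""
    candidates = [t for t in backup_end_times if t <= target]
    # No candidate means the target is older than every backup we hold.
    return max(candidates) if candidates else None
```

Picking the latest eligible backup keeps the amount of WAL to replay as small as possible.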
1. Restore Command to fetch necessary WAL files
restore_command = 'cp "/somewhere/reliable/%f" %p'
2. Recovery Target
a. Named Restore Point: recovery_target_name = 'my-restore-point'
b. Time: recovery_target_time = '2018-01-24 06:37:00 +0300'
3. Other Settings
a. recovery_target_inclusive = true|false
b. standby_mode = true|false
c. recovery_target_action = shutdown|pause|promote
How to?
recovery.conf
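The settings on this slide can be assembled into a file programmatically. The sketch below is one possible way to do it, assuming exactly one recovery target is given; `render_recovery_conf` is a hypothetical helper, and it does not escape quotes inside its arguments. From PostgreSQL 12 onward the same settings live in postgresql.conf rather than recovery.conf, so treat the file name as specific to the versions this talk covers.

```python
from typing import Optional


def render_recovery_conf(archive_dir: str,
                         target_name: Optional[str] = None,
                         target_time: Optional[str] = None,
                         target_action: str = "promote") -> str:
    """Build recovery.conf text for one PITR target (name or time)."""
    if (target_name is None) == (target_time is None):
        raise ValueError("give exactly one of target_name or target_time")
    if target_action not in ("shutdown", "pause", "promote"):
        raise ValueError("unknown recovery_target_action")
    lines = [f"restore_command = 'cp \"{archive_dir}/%f\" %p'"]
    if target_name is not None:
        lines.append(f"recovery_target_name = '{target_name}'")
    else:
        lines.append(f"recovery_target_time = '{target_time}'")
    lines.append(f"recovery_target_action = '{target_action}'")
    return "\n".join(lines) + "\n"
```

For example, `render_recovery_conf("/somewhere/reliable", target_name="my-restore-point")` yields the restore command from the slide plus a named recovery target.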
1. recovery.conf is renamed to recovery.done after promotion
2. SELECT pg_is_in_recovery()
3. SELECT pg_last_xact_replay_timestamp()
How to?
Monitoring the progress
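To show how the two queries might be combined into a progress report, here is a small sketch that takes their results as plain Python values. It assumes the caller has already run the queries and that both timestamps use the same time zone handling; `describe_progress` is an illustrative name, not an existing tool.

```python
from datetime import datetime
from typing import Optional


def describe_progress(in_recovery: bool,
                      last_replay: Optional[datetime],
                      target: datetime) -> str:
    """Summarize PITR progress from the two monitoring queries.

    in_recovery -- result of SELECT pg_is_in_recovery()
    last_replay -- result of SELECT pg_last_xact_replay_timestamp(),
                   None when no transaction has been replayed yet
    """
    if not in_recovery:
        return "recovery finished"
    if last_replay is None:
        return "in recovery, no transaction replayed yet"
    # Clamp at zero: a paused server can sit at or past the target.
    remaining = max(0.0, (target - last_replay).total_seconds())
    return f"in recovery, {remaining:.0f}s of history left to replay"
```

The remaining figure measures how much recorded history is left, not wall-clock time, since replay speed depends on the workload.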
1. Multiple PostgreSQL servers working together
a. Citus
b. Postgres-XL
c. Application level sharding
2. PITR of all servers at once
3. May need to update metadata
pg_dist_node for Citus
Distributed PITR
1. Use a suitable target time
All servers should have backups before the selected time
2. PITR all servers to the target time
Distributed PITR
Simple approach
1. Clock differences
A distributed transaction has completed on one node but has not started on another node
2. Ongoing transactions
What if one transaction is aborted while others are still ongoing?
Distributed PITR
Simple approach: Problems
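The clock-difference problem can be made concrete with a toy model. The sketch below assumes each node stamps the commit of the same distributed transaction with its own clock and that time-based PITR keeps a commit only if its local timestamp is at or before the target; the node names and the two-second skew are invented for the example.

```python
from datetime import datetime, timedelta
from typing import Dict


def nodes_with_commit(commit_times: Dict[str, datetime],
                      target: datetime) -> Dict[str, bool]:
    """For each node, is the transaction's commit kept by PITR to target?

    commit_times maps node name -> local commit timestamp of the same
    distributed transaction, as recorded by that node's own clock.
    """
    return {node: ts <= target for node, ts in commit_times.items()}


target = datetime(2018, 1, 24, 6, 37, 0)
# Assumed skew: worker-2's clock runs two seconds ahead of worker-1's.
commits = {
    "worker-1": target - timedelta(seconds=1),
    "worker-2": target + timedelta(seconds=1),
}
result = nodes_with_commit(commits, target)
# True when the nodes disagree about whether the transaction happened.
inconsistent = len(set(result.values())) > 1
```

Here the transaction survives on worker-1 and is missing on worker-2, which is exactly the kind of inconsistency that a purely time-based target cannot rule out.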
1. Periodically create distributed restore points
a. Block all writes / take locks
b. Run pg_create_restore_point() on all servers
c. Store the restore point name somewhere else
2. Pick a suitable distributed restore point
3. Execute PITR with distributed restore point name
Distributed PITR
Distributed Restore Points
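Step 2, picking a suitable distributed restore point, could look roughly like the sketch below. It assumes the restore point names were stored together with their creation times in some external catalog (step 1c); the `catalog` shape and the function name are assumptions for illustration.

```python
from datetime import datetime
from typing import Dict, Optional


def pick_restore_point(catalog: Dict[str, datetime],
                       wanted: datetime) -> Optional[str]:
    """Return the name of the latest restore point created at or before wanted."""
    eligible = {name: ts for name, ts in catalog.items() if ts <= wanted}
    if not eligible:
        return None
    # The newest eligible point is the closest consistent state to `wanted`.
    return max(eligible, key=eligible.get)
```

The returned name is then used as `recovery_target_name` on every server, so all nodes stop at the same logical point regardless of their clocks.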
Citus is an extension to scale-out Postgres.
Citus Cloud: Managed Citus offering from Citus Data
• Nodes are on AWS EC2
• Daily backups of all servers to S3
• WAL archival to S3
• Backups are stored for 7 days
• Using WAL-E (and soon WAL-G)
PITR at Citus Cloud
1. User selects a target time and instance types.
Temporary/non-production PITR clusters don’t need to be as beefy as the production cluster.
2. Citus Cloud creates a new cluster.
3. Restore backups for each server.
4. Update Coordinator Metadata after PITR is complete.
PITR at Citus Cloud
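Step 4 exists because the restored coordinator still lists the production workers. One way to picture the fix is a set of updates against `pg_dist_node` that repoint each worker to its fork counterpart. The sketch below only builds the statements; the host names are hypothetical, the `%s` placeholders assume a psycopg2-style driver, and the talk does not say that Citus Cloud issues these exact statements.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (nodename, nodeport)


def metadata_updates(mapping: Dict[Node, Node]) -> List[Tuple[str, tuple]]:
    """Build parameterized UPDATEs that repoint workers in pg_dist_node.

    mapping maps each production worker to its replacement in the fork.
    """
    sql = ("UPDATE pg_dist_node SET nodename = %s, nodeport = %s "
           "WHERE nodename = %s AND nodeport = %s")
    return [(sql, (new[0], new[1], old[0], old[1]))
            for old, new in mapping.items()]
```

Each returned pair can be passed to a database cursor on the fork's coordinator once recovery has finished there.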
citus_create_restore_point()
1. Open connections from coordinator to workers
2. Send BEGIN commands
3. Block distributed transactions by locking metadata
4. Run pg_create_restore_point() on the coordinator
5. Send pg_create_restore_point() commands
PITR at Citus Cloud
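The five steps of citus_create_restore_point() listed above can be traced with a stand-in connection class that only records statements. This is a sketch of the ordering, not the extension's implementation: `RecordingConnection` is invented for the example, the `LOCK TABLE` statement is an assumption about how the metadata lock might be taken, and commit handling and quoting of the restore point name are left out.

```python
from typing import List


class RecordingConnection:
    """Stand-in for a database connection that only records statements."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.statements: List[str] = []

    def execute(self, sql: str) -> None:
        self.statements.append(sql)


def create_distributed_restore_point(coordinator: RecordingConnection,
                                     workers: List[RecordingConnection],
                                     name: str) -> None:
    """Replay the five steps from the slide against the given connections."""
    # Steps 1-2: connections are already open; start a transaction on each worker.
    for worker in workers:
        worker.execute("BEGIN")
    # Step 3: block distributed transactions (the exact lock is an assumption).
    coordinator.execute("LOCK TABLE pg_dist_node IN EXCLUSIVE MODE")
    # Step 4: create the restore point on the coordinator first.
    coordinator.execute(f"SELECT pg_create_restore_point('{name}')")
    # Step 5: create the same named restore point on every worker.
    for worker in workers:
        worker.execute(f"SELECT pg_create_restore_point('{name}')")
```

Opening the worker transactions before taking the lock keeps the time during which writes are blocked short, since only the quick restore point calls happen while the lock is held.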
PITR at Citus Cloud
[Diagram: Normal State of a Citus Cluster. Production Cluster (Server 1, Server 2, Server 3), wal-e, AWS S3 (backups, WAL).]
PITR at Citus Cloud
[Diagram: During PITR. Production Cluster (Server 1, Server 2, Server 3), wal-e, AWS S3 (backups, WAL), Fork Cluster (Server 1, Server 2, Server 3), wal-e.]
PITR at Citus Cloud
[Diagram: After PITR is completed. Production Cluster (Server 1, Server 2, Server 3), wal-e, AWS S3 (backups, WAL), Fork Cluster (Server 1, Server 2, Server 3), wal-e.]
© 2017 Citus Data. All rights reserved.
eren@citusdata.com
Thank You!
Eren Başak
www.citusdata.com @aamederen
