Distributed Postgres.
XL, XTM, MultiMaster
Stas Kelvich
Started about a year ago.
Konstantin Knizhnik, Constantin Pan, Stas Kelvich
Cluster group in PgPro
Started by playing with Postgres-XC. 2ndQuadrant also had a project
(now finished) to port XC to 9.5.
Fork is painful;
How can we bring the functionality of XC into core?
Cluster group in PgPro
Distributed transactions - nothing in-core;
Distributed planner - fdw, pg_shard, greenplum planner (?);
HA/Autofailover - can be built on top of logical decoding.
Distributed postgres
Achieve proper isolation between tx for multi-node transactions.
Currently, when a write tx starts in Postgres:
Acquire XID;
Get list of running tx’s;
Use that info in visibility checks.
Distributed transactions
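A minimal sketch of the visibility rule those three steps imply, assuming a simplified snapshot (xmin, xmax, and a set of running XIDs) and ignoring XID wraparound, subtransactions and hint bits. The names Snapshot, xid_in_snapshot and is_visible are illustrative and are not PostgreSQL's C API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int            # every XID below this had finished when the snapshot was taken
    xmax: int            # first XID not yet assigned when the snapshot was taken
    running: frozenset   # XIDs in [xmin, xmax) that were still in progress


def xid_in_snapshot(xid: int, snap: Snapshot) -> bool:
    """True if xid counts as 'still running' from the snapshot's point of view."""
    if xid < snap.xmin:
        return False          # finished before the snapshot
    if xid >= snap.xmax:
        return True           # started after the snapshot
    return xid in snap.running


def is_visible(xid: int, snap: Snapshot, committed: set) -> bool:
    """A tx's changes are visible only if it committed and was not
    running (or unborn) when the snapshot was taken."""
    return xid in committed and not xid_in_snapshot(xid, snap)


# Usage: tx 101 committed later, but was running at snapshot time.
snap = Snapshot(xmin=100, xmax=105, running=frozenset({101, 103}))
committed = {99, 100, 101, 102}
visible = [x for x in range(99, 106) if is_visible(x, snap, committed)]
```

On a single node all three inputs come from local shared memory. For a multi-node tx they have to be agreed between nodes, which is the problem the rest of the deck addresses.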
transam/clog.c:
GetTransactionStatus
SetTransactionStatus
transam/varsup.c:
GetNewTransactionId
ipc/procarray.c:
TransactionIdIsInProgress
GetOldestXmin
GetSnapshotData
time/tqual.c:
XidInMVCCSnapshot
XTM API:
vanilla
transam/clog.c:
GetTransactionStatus
SetTransactionStatus
transam/varsup.c:
GetNewTransactionId
ipc/procarray.c:
TransactionIdIsInProgress
GetOldestXmin
GetSnapshotData
time/tqual.c:
XidInMVCCSnapshot
Transaction
Manager
XTM API:
after patch
transam/clog.c:
GetTransactionStatus
SetTransactionStatus
transam/varsup.c:
GetNewTransactionId
ipc/procarray.c:
TransactionIdIsInProgress
GetOldestXmin
GetSnapshotData
time/tqual.c:
XidInMVCCSnapshot
Transaction
Manager
pg_dtm.so
XTM API:
after tm load
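The three slides above suggest that the listed functions are routed through a replaceable transaction manager. A rough sketch of that idea, assuming a table of callables that a loaded module can swap out; TransactionManager, make_tm, Backend and the "fake_dtm" name are hypothetical stand-ins, not the actual XTM structures or pg_dtm code.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable


@dataclass
class TransactionManager:
    """Table of hooks; only two of the listed functions are modelled."""
    name: str
    get_new_transaction_id: Callable[[], int]
    get_transaction_status: Callable[[int], str]


def make_tm(name: str, first_xid: int) -> TransactionManager:
    """Build a toy manager that hands out XIDs starting at first_xid."""
    xids = count(first_xid)
    status = {}

    def get_new_transaction_id() -> int:
        xid = next(xids)
        status[xid] = "in progress"
        return xid

    def get_transaction_status(xid: int) -> str:
        return status.get(xid, "unknown")

    return TransactionManager(name, get_new_transaction_id, get_transaction_status)


class Backend:
    """Callers go through self.tm, so behaviour changes when it is replaced."""

    def __init__(self) -> None:
        self.tm = make_tm("builtin", first_xid=1)      # "after patch"

    def load_tm(self, tm: TransactionManager) -> None:
        self.tm = tm                                   # "after tm load"

    def begin_write_tx(self) -> int:
        return self.tm.get_new_transaction_id()


backend = Backend()
local_xid = backend.begin_write_tx()
backend.load_tm(make_tm("fake_dtm", first_xid=1000))
global_xid = backend.begin_write_tx()
```

The calling code in Backend is the same before and after the swap; only the table it dispatches through changes.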
Acquire XID centrally (DTMd, arbiter);
No local tx possible;
DTMd is a bottleneck.
XTM implementations
GTM or snapshot sharing
Paper from SAP HANA team;
Central daemon is needed, but only for multi-node tx;
Snapshots -> Commit Sequence Number;
DTMd is still a bottleneck.
XTM implementations
Incremental SI
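To illustrate "Snapshots -> Commit Sequence Number": with CSNs a snapshot can be a single number instead of a list of running tx's. The sketch below is a single-process toy under that assumption; CsnLog and its methods are made-up names, and it does not model the central daemon or the multi-node protocol from the paper.

```python
from typing import Dict


class CsnLog:
    """Assigns a monotonically increasing CSN to each committing tx."""

    def __init__(self) -> None:
        self._next_csn = 1
        self._commit_csn: Dict[int, int] = {}   # xid -> CSN assigned at commit

    def commit(self, xid: int) -> int:
        csn = self._next_csn
        self._next_csn += 1
        self._commit_csn[xid] = csn
        return csn

    def take_snapshot(self) -> int:
        # The snapshot is just the highest CSN handed out so far.
        return self._next_csn - 1

    def is_visible(self, xid: int, snapshot_csn: int) -> bool:
        # Visible only if committed, and committed no later than the snapshot.
        csn = self._commit_csn.get(xid)
        return csn is not None and csn <= snapshot_csn


log = CsnLog()
log.commit(10)
snapshot = log.take_snapshot()
log.commit(11)          # commits after the snapshot was taken
```

Here tx 10 is visible to the snapshot, while tx 11 and any uncommitted tx are not.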
XID/CSN are gathered from all nodes that participate in the tx;
No central service;
Local tx are possible;
Possible to reduce communication by using time (Spanner,
CockroachDB).
XTM implementations
Clock-SI or tsDTM
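A simplified sketch of how a commit number can be gathered from the participating nodes only, following the general Clock-SI idea of taking the maximum of the participants' prepare timestamps. Integer counters stand in for physical clocks, and Node and commit_distributed are illustrative names; this is not the tsDTM code and it omits read delays, failures and aborts.

```python
class Node:
    def __init__(self, name: str, clock: int) -> None:
        self.name = name
        self.clock = clock          # stand-in for the node's local time

    def prepare(self) -> int:
        """Vote to commit and report a local prepare timestamp."""
        self.clock += 1
        return self.clock

    def commit(self, csn: int) -> None:
        """Apply the agreed CSN; never let the local clock fall behind it."""
        self.clock = max(self.clock, csn)


def commit_distributed(participants) -> int:
    """Two rounds over the participants only; no central service is contacted."""
    prepare_timestamps = [node.prepare() for node in participants]
    csn = max(prepare_timestamps)
    for node in participants:
        node.commit(csn)
    return csn


a, b, c = Node("a", 10), Node("b", 42), Node("c", 7)
csn = commit_distributed([a, b])    # node c takes no part in this tx
```

Node c is untouched, which is why purely local tx stay cheap in this scheme.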
XTM implementations
tsDTM scalability
More nodes mean a higher probability of failure in the system.
Possible problems with nodes:
Node stopped (and will not be back);
Node was down for a short time (and we should bring it
back into operation);
Network partitions (avoid split-brain).
If we want to survive network partitions, then we can have no more
than ⌈N/2⌉ - 1 failures.
HA/autofailover
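A short worked example of the failure bound, reading the bracket in the formula as a ceiling so that a strict majority of the N nodes always survives. The function names are illustrative.

```python
def max_tolerated_failures(n: int) -> int:
    """ceil(n / 2) - 1, computed with integer arithmetic."""
    if n < 1:
        raise ValueError("cluster needs at least one node")
    return (n + 1) // 2 - 1


def has_majority(n: int, alive: int) -> bool:
    """Only a strict majority may keep accepting writes (avoids split-brain)."""
    return 2 * alive > n


# Bound for cluster sizes 1..5
bounds = {n: max_tolerated_failures(n) for n in range(1, 6)}
```

Going from 3 to 4 nodes does not raise the bound, because two halves of a 4-node cluster cannot both be a majority.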
Possible usage of such system:
Multimaster replication;
Tables with metainformation in sharded databases;
Sharding with redundancy.
HA/autofailover
By multimaster we mean a strongly coupled one that acts as a single
database, with proper isolation and no merge conflicts.
Ways to build:
Global order to XLOG (Postgres-R, MySQL Galera);
Wrap each tx as distributed – allows parallelism while applying
tx.
Multimaster
Our implementation:
Built on top of pg_logical;
Make use of tsDTM;
Pool of workers for tx replay;
Raft-based storage for dealing with failures and distributed
deadlock detection.
Multimaster
Our implementation:
Approximately half the speed of standalone Postgres;
Same speed for reads;
Handles node autorecovery;
Handles network partitions (debugging right now);
Can work as an extension (if the community accepts the XTM API in
core).
Multimaster