Scaling out by distributing and replicating data in Postgres-XC
Ashutosh Bapat
@Postgres Open 2012
Agenda
● What is Postgres-XC
● Postgres-XC architecture overview
● Data distribution in XC
● Effect of data distribution on performance
● Example DBT-1 schema
What is Postgres-XC
● Shared Nothing Cluster
– Multiple collaborating PostgreSQL-like servers
– No resources shared
– Scaling by adding commodity hardware
● Write-scalable
– Write/Read scalable by adding nodes
– Multiple nodes where writes can be issued
● Synchronous
– Writes to one node are reflected on all the nodes
● Transparent
– Applications need not care about the data distribution
Postgres-XC architecture
[Figure: applications send SQL statements through the SQL + libpq interface to the coordinators of the Postgres-XC cluster. Coordinators and datanodes can each be added to scale out. The GTM provides transaction info.]
Distribution strategies
● Replicated tables
– Each row of the table is stored on all the datanodes where
the table is replicated
● Distributed tables
– Each row exists only on a single datanode
– Distribution strategies
● HASH
● MODULO
● ROUNDROBIN
● User defined functions (TBD)
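The three strategies named above can be pictured with a small Python sketch. This is only an illustration of the idea, not Postgres-XC's implementation: the datanode names, the use of `crc32` as a stand-in hash, and the helper names are all assumptions made for the example.

```python
from itertools import count
from zlib import crc32

NODES = ["dn1", "dn2", "dn3"]  # hypothetical datanode names


def by_hash(value, nodes=NODES):
    """HASH: hash the distribution column value, then pick a node."""
    digest = crc32(str(value).encode("utf-8"))  # stand-in hash, not XC's own
    return nodes[digest % len(nodes)]


def by_modulo(value, nodes=NODES):
    """MODULO: use the integer column value itself, modulo the node count."""
    return nodes[value % len(nodes)]


def make_round_robin(nodes=NODES):
    """ROUNDROBIN: ignore the row content and cycle through the nodes."""
    counter = count()

    def place(_row):
        return nodes[next(counter) % len(nodes)]

    return place


round_robin = make_round_robin()
for val in (1, 2, 3, 4):
    print(val, by_hash(val), by_modulo(val), round_robin(val))
```

With HASH and MODULO the target node can be computed from the column value alone, which is what makes point lookups on the distribution column cheap. With ROUNDROBIN the placement depends only on arrival order, so a lookup cannot be routed to one node.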
Replicated Table
[Figure: a write is applied to all three datanodes; a read is served by one of them. Every datanode holds the same rows:]
val  val2
1    2
2    10
3    4
Replicated Tables
● Statement level replication
● Each write needs to be replicated
– writes are costly
● Read can happen on any node (where table is
replicated)
– reads from different coordinators can be routed to different
nodes
● Useful for relatively static tables, with high read load
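A toy model of the trade-off described above, assuming three hypothetical datanodes: every insert is applied once per copy, while a read needs only one copy. The class and attribute names are invented for illustration.

```python
import random


class ReplicatedTable:
    """Toy replicated table: every datanode keeps a full copy of the rows."""

    def __init__(self, node_names):
        self.copies = {name: [] for name in node_names}
        self.writes_applied = 0  # counts per-node write operations

    def insert(self, row):
        # The statement is replayed on every node holding the table.
        for rows in self.copies.values():
            rows.append(row)
            self.writes_applied += 1

    def read(self, rng=random):
        # Any single copy can answer the read.
        node = rng.choice(sorted(self.copies))
        return node, list(self.copies[node])


table = ReplicatedTable(["dn1", "dn2", "dn3"])
for row in [(1, 2), (2, 10), (3, 4)]:
    table.insert(row)

node, rows = table.read()
print(f"3 inserts cost {table.writes_applied} node writes; read served by {node}")
```

Three logical inserts turn into nine node-level writes here, which is why replication suits tables that are read often and written rarely.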
Distributed Tables
[Figure: a write goes to a single datanode; a read fans out to all three datanodes and the results are merged by a combiner. Each datanode holds a different set of rows:]
Datanode  (val, val2) rows
1         (1, 2), (2, 10), (3, 4)
2         (11, 21), (21, 101), (31, 41)
3         (10, 20), (20, 100), (30, 40)
Distributed Tables
● Write to a single row is applied only on the node
where the row resides
– Multiple rows can be written in parallel
● Scanning rows spanning across the nodes (e.g. table
scans) can hamper performance
● Point reads and writes based on the distribution
column value show good performance
– Datanode where the operation happens can be identified by
the distribution column value
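The difference between a point operation and a scan can be sketched as follows. This is a simplified model with assumed names; placement uses a MODULO-style rule on an integer key purely to keep the example short.

```python
class DistributedTable:
    """Toy distributed table: each row lives on exactly one datanode."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.shards = {name: {} for name in self.node_names}

    def node_for(self, key):
        # MODULO-style placement on an integer distribution column (assumption).
        return self.node_names[key % len(self.node_names)]

    def write(self, key, value):
        node = self.node_for(key)
        self.shards[node][key] = value
        return [node]  # nodes touched by the write

    def point_read(self, key):
        node = self.node_for(key)
        return self.shards[node].get(key), [node]

    def scan(self):
        # A table scan has to visit every datanode and combine the results.
        rows = []
        for name in self.node_names:
            rows.extend(self.shards[name].items())
        return sorted(rows), list(self.node_names)


table = DistributedTable(["dn1", "dn2", "dn3"])
for key, value in [(1, 2), (2, 10), (3, 4), (10, 20), (11, 21)]:
    table.write(key, value)

value, touched = table.point_read(10)
all_rows, scanned = table.scan()
print(f"point read touched {touched}, scan touched {scanned}")
```

The point read is answered by one node chosen from the key, while the scan involves all three nodes plus a merge step on the coordinator side.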
Distributed query processing in Postgres-XC
● Coordinator
– Accepts queries and plans them
– Finds the right data-nodes from where to fetch the data
– Frames queries to be sent to these data-nodes
– Gathers data from data-nodes
– Processes it to get the desired result
● Datanode
– Executes queries from coordinator like PostgreSQL
– Has same capabilities as PostgreSQL
Query processing balance
● Coordinator tries to delegate maximum query
processing to data-nodes
– Indexes are located on datanodes
– Materialization of huge results is avoided in case of sorting,
aggregation, grouping, JOINs etc.
– Coordinator is freed to handle large number of connections
● Distributing data wisely helps coordinator to delegate
maximum query processing and improve performance
● Delegation is often termed shipping
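To make the benefit of shipping concrete, here is a rough sketch of an average computed two ways: by pulling every row to the coordinator, and by letting each datanode return a partial (sum, count) pair. The node names and function names are assumptions, and the values are just sample data; a real coordinator plans this from SQL rather than calling functions like these.

```python
# val2 values held by three hypothetical datanodes
datanodes = {
    "dn1": [2, 10, 4],
    "dn2": [21, 101, 41],
    "dn3": [20, 100, 40],
}


def average_without_shipping(nodes):
    """Coordinator fetches every row and aggregates locally."""
    fetched = [value for rows in nodes.values() for value in rows]
    return sum(fetched) / len(fetched), len(fetched)


def average_with_shipping(nodes):
    """Each datanode aggregates its own rows; coordinator combines partials."""
    partials = [(sum(rows), len(rows)) for rows in nodes.values()]
    total = sum(part_sum for part_sum, _ in partials)
    row_count = sum(part_count for _, part_count in partials)
    return total / row_count, len(partials)


plain_avg, rows_moved = average_without_shipping(datanodes)
shipped_avg, partials_moved = average_with_shipping(datanodes)
print(f"same answer: {plain_avg == shipped_avg}")
print(f"items sent to coordinator: {rows_moved} rows vs {partials_moved} partials")
```

Both paths give the same average, but the shipped version moves one small partial result per datanode instead of every row, which is the materialization saving the slide refers to.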
SQL prompt
Deciding the right distribution strategy
Read-write load on tables
● High point reads (based on distribution column)
– Distributed or replicated
● High read activities but no frequent writes
– Better be replicated
● High point writes
– Better be distributed
● High insert-load, but no frequent update/delete/read
– Better be round-robin
Query analysis (Frequently occurring queries)
● Find the relations/columns participating in equi-JOIN
conditions, WHERE clauses etc.
– Distribute on those columns
● Find columns participating in GROUP BY, DISTINCT
clauses
– Distribute on those columns
● Find columns/tables which are part of primary key and
foreign key constraints
– Global constraints are not yet supported in XC
– Distribute on those columns
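Why distributing on the GROUP BY column helps can be shown with a short sketch. When rows are placed by the grouping key, every group lives entirely on one node, so per-node results only need to be concatenated. The MODULO-style placement and all names below are assumptions for illustration.

```python
from collections import Counter


def place(key, node_count):
    # MODULO-style placement on the grouping column (assumption).
    return key % node_count


def group_counts_per_node(rows, node_count):
    """Distribute rows by key, then let each node count its own groups."""
    shards = [[] for _ in range(node_count)]
    for key, payload in rows:
        shards[place(key, node_count)].append((key, payload))
    return [Counter(key for key, _ in shard) for shard in shards]


rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (1, "f")]
per_node = group_counts_per_node(rows, node_count=3)

# No group appears on two nodes, so the coordinator can simply merge.
merged = {}
for node_result in per_node:
    assert not (merged.keys() & node_result.keys())
    merged.update(node_result)

print(merged)
```

If the table were distributed on a different column, the same group could appear on several nodes and the coordinator would have to re-aggregate the partial counts.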
Thumb rules
● Infrequently written tables participating in JOINs with
many other tables (Dimension tables)
– Replicated table
● Frequently written tables participating in JOINs with
replicated tables
– Distributed table
● Frequently written tables participating in JOINs with
each other, with equi-JOINing columns of same data
type
– Distribute both of them by the columns participating in JOIN on
same nodes
● Referenced tables
– Better be replicated
DBT-1 schema
● CUSTOMER: C_ID, C_UNAME, C_PASSWD, C_FNAME, C_LNAME, C_ADDR_ID, C_PHONE, C_EMAIL, C_SINCE, C_LAST_VISIT, C_LOGIN, C_EXPIRATION, C_DISCOUNT, C_BALANCE, C_YTD_PMT, C_BIRTHDATE, C_DATA
● ADDRESS: ADDR_ID, ADDR_STREET1, ADDR_STREET2, ADDR_CITY, ADDR_STATE, ADDR_ZIP, ADDR_CO_ID, ADDR_C_ID
● ORDERS: O_ID, O_C_ID, O_DATE, O_SUB_TOTAL, O_TAX, O_TOTAL, O_SHIP_TYPE, O_BILL_ADDR_ID, O_SHIP_ADDR_ID, O_STATUS
● ORDER_LINE: OL_ID, OL_O_ID, OL_I_ID, OL_QTY, OL_DISCOUNT, OL_COMMENTS, OL_C_ID
● ITEM: I_ID, I_TITLE, I_A_ID, I_PUB_DATE, I_PUBLISHER, I_SUBJECT, I_DESC, I_RELATED1, I_RELATED2, I_RELATED3, I_RELATED4, I_RELATED5, I_THUMBNAIL, I_IMAGE, I_SRP, I_COST, I_AVAIL, I_ISBN, I_PAGE, I_BACKING, I_DIMENSIONS
● CC_XACTS: CX_I_ID, CX_TYPE, CX_NUM, CX_NAME, CX_EXPIRY, CX_AUTH_ID, CX_XACT_AMT, CX_XACT_DATE, CX_CO_ID, CX_C_ID
● AUTHOR: OL_ID, OL_O_ID, OL_I_ID, OL_QTY, OL_DISCOUNT, OL_COMMENTS, OL_C_ID (as printed on the slide, which repeats the ORDER_LINE columns)
● STOCK: ST_I_ID, ST_STOCK
● SHOPPING_CART: SC_ID, SC_C_ID, SC_DATE, SC_SUB_TOTAL, SC_TAX, SC_SHIPPING_COST, SC_TOTAL, SC_C_FNAME, SC_C_LNAME, SC_C_DISCOUNT
● SHOPPING_CART_LINE: SCL_SC_ID, SCL_I_ID, SCL_QTY, SCL_COST, SCL_SRP, SCL_TITLE, SCL_BACKING, SCL_C_ID
● COUNTRY: CO_ID, CO_NAME, CO_EXCHANGE, CO_CURRENCY
Legend: distributed with customer ID; replicated; distributed with item ID; distributed with shopping cart ID
Example DBT-1 (1)
● author, item
– Less frequently written
– Frequently read from
– Author and item are frequently JOINed
● Dimension tables
– Hence replicated on all nodes
Example DBT-1 (2)
● customer, address, orders, order_line, cc_xacts
– Frequently written
● hence distributed
– Participate in JOINs amongst each other with customer_id as
JOIN key
– point SELECTs based on customer_id
● hence distributed by hash on customer_id so that JOINs
are shippable
– Participate in JOINs with item
● Having item replicated helps push JOINs down to the datanodes
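The reason these JOINs are shippable can be illustrated with a small sketch: if customer and orders rows are both placed by the same function of customer_id, matching rows always sit on the same node, so joining node by node gives the same result as a global join. The sample rows, the `crc32` stand-in hash, and the function names are assumptions, not the benchmark's data or XC's hash.

```python
from zlib import crc32

NODE_COUNT = 3  # hypothetical cluster size


def node_of(customer_id):
    # Stand-in hash on customer_id; Postgres-XC uses its own hash function.
    return crc32(str(customer_id).encode("utf-8")) % NODE_COUNT


customers = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]  # (c_id, c_uname)
orders = [(100, 1), (101, 3), (102, 1), (103, 4)]                  # (o_id, o_c_id)


def join(customer_rows, order_rows):
    """Plain equi-join on customer id."""
    return sorted(
        (c_id, name, o_id)
        for c_id, name in customer_rows
        for o_id, o_c_id in order_rows
        if c_id == o_c_id
    )


def local_joins():
    """Join on each node separately, then concatenate the results."""
    result = []
    for node in range(NODE_COUNT):
        local_customers = [row for row in customers if node_of(row[0]) == node]
        local_orders = [row for row in orders if node_of(row[1]) == node]
        result.extend(join(local_customers, local_orders))
    return sorted(result)


print(local_joins() == join(customers, orders))
```

The per-node joins never miss a match because both tables agree on where a given customer_id lives. A replicated table such as item is present on every node, so joining against it is local as well.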
Example DBT-1 (3)
● Shopping_cart, shopping_cart_line
– Frequently written
● Hence distributed
– Point selects based on column shopping_cart_id
● Hence distributed by hash on shopping_cart_id
– JOINs with item
● Having item replicated helps
DBT-1 scale-up
● Old data; we will publish benchmarks for 1.0 soon.
● DBT-1 (TPC-W) benchmark with some minor modifications to the schema
● 1 server = 1 coordinator + 1 datanode on the same machine
● Coordinator is CPU bound
● Datanode is I/O bound
Other scaling tips
Using GTM proxy
● GTM can be a bottleneck
– All nodes get snapshots, transaction IDs etc. from GTM
● GTM-proxy helps reduce the load on GTM
– Runs on each physical server
– Caches information about snapshots, transaction IDs etc.
– Serves logical nodes on that server
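One way a proxy can take load off a central service is to group several local requests into a single round trip. The toy model below shows that effect for transaction ID allocation. It is an assumption-laden sketch, not the GTM or GTM-proxy protocol: `ToyGTM`, `ToyProxy`, and the backend names are invented, and the slide itself only states that the proxy caches information.

```python
class ToyGTM:
    """Toy stand-in for a central transaction ID allocator."""

    def __init__(self):
        self.next_xid = 1
        self.round_trips = 0  # how many requests reached the central service

    def allocate(self, how_many):
        self.round_trips += 1
        first = self.next_xid
        self.next_xid += how_many
        return list(range(first, first + how_many))


class ToyProxy:
    """Toy proxy that serves all backends on one server with one request."""

    def __init__(self, gtm):
        self.gtm = gtm

    def allocate_for(self, backends):
        xids = self.gtm.allocate(len(backends))
        return dict(zip(backends, xids))


# Four backends asking the central service directly: four round trips.
direct = ToyGTM()
for _ in range(4):
    direct.allocate(1)

# The same four backends behind a proxy: one round trip.
proxied = ToyGTM()
assigned = ToyProxy(proxied).allocate_for(["b1", "b2", "b3", "b4"])

print(direct.round_trips, proxied.round_trips, assigned)
```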
Adding coordinator and datanode
● Coordinator
– Scaling connection load
– Too much load on coordinator
– Query processing mostly happens on coordinator
● Datanode
– Data scalability
● Number of tables grows – new nodes for new
tables/databases
● Distributed table sizes grow – new nodes providing space
for additional data
– Redundancy
Impact of transaction management on performance
● 2PC is used when
– More than one node performs write in a transaction
– Explicit 2PC is used
– More than one node performs write during a single statement
● Only nodes performing writes participate in 2PC
● Design transactions such that they span across as
few nodes as possible.
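The conditions above reduce to a simple check, sketched here with hypothetical function names. A statement that writes on several nodes is covered by the same test as a transaction that does, since both leave more than one writer node.

```python
def needs_two_phase_commit(writer_nodes, explicit_2pc=False):
    """True when the commit must be coordinated across nodes with 2PC."""
    return explicit_2pc or len(set(writer_nodes)) > 1


def two_phase_participants(writer_nodes, explicit_2pc=False):
    """Only nodes that performed writes take part; read-only nodes do not."""
    if not needs_two_phase_commit(writer_nodes, explicit_2pc):
        return set()
    return set(writer_nodes)


# A transaction that reads from dn1, dn2, dn3 but writes only on dn1.
print(needs_two_phase_commit(["dn1"]))               # single writer
# A transaction that writes on dn1 and dn2.
print(two_phase_participants(["dn1", "dn2", "dn1"]))
```

Keeping all of a transaction's writes on one node, for example by keying every statement on the same distribution column value, avoids the extra prepare round.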
DBT-2 (sneak peek)
● Like TPC-C
● Early results show 4.3 times scaling with 5 servers
– More details to come ...
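As a quick reading of that figure, scaling efficiency is the observed speedup divided by the number of servers. The helper name below is an assumption; the two numbers come from the slide.

```python
def scaling_efficiency(speedup, servers):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / servers


efficiency = scaling_efficiency(4.3, 5)
print(f"{efficiency:.0%} of linear scaling")
```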
Thank you
ashutosh.bapat@enterprisedb.com
