PostgreSQL
Table Partitioning / Sharding
AmirReza Hashemi
PostgreSQL Database
Why PostgreSQL?
● Open source / cross-platform
● Reliability and stability
● Extensible
● Designed for high-volume environments
● Table inheritance, which most other relational databases lack
● …..
Here’s a classic scenario.
You work on a project that stores data in a relational database.
The application gets deployed to production and early on the performance is great:
selecting data from the database is snappy and insert latency goes unnoticed.
What's the Problem?
Over a period of days, weeks, or months the database grows and queries slow down.
Potential solutions
A Database Administrator (DBA) takes a look and checks that the database is tuned. They may suggest:
- Adding certain indexes
- Moving logging to separate disk partitions
- Adjusting database engine parameters and verifying that the database is healthy
This buys you more time and may resolve the issues to a degree.
At a certain point you realize that the data in the database is the bottleneck.
There are various approaches that can help you
make your application and database run faster.
Let’s take a look at two of them:
- Table partitioning
- Sharding
Table Partitioning
The main idea:
You take one MASTER TABLE and split it into many smaller tables.
These smaller tables are called partitions or child tables.
Table Partitioning
Master Table:
Also referred to as a Master Partition Table, this table is the template that child tables are created from. It is a normal table, but it doesn’t contain any data and it requires a trigger.
Child Table:
These tables inherit their structure from the master table, and each belongs to a single master table. The child tables contain all of the data. They are also referred to as Table Partitions.
Partition Function:
A partition function is a stored function that determines which child table should accept a new record. The master table has a trigger that calls the partition function.
Table Partitioning
Here’s a summary of what should be done:
- Create a master table
- Create a partition function
- Create a table trigger
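The three steps above can be sketched as follows for an inheritance-based setup. This is a minimal sketch, not the author's exact schema: the table name hashvalue_pt and the hashtime column echo the benchmark later in the deck, while the hashval column, the monthly child tables, and the function and trigger names are assumptions made for illustration.

```sql
-- 1. Master table: holds no rows itself, only the shared structure.
CREATE TABLE hashvalue_pt (
    hashval  text,               -- hypothetical payload column
    hashtime date NOT NULL       -- partition key
);

-- Child tables: one per month, each with a CHECK constraint describing its range.
CREATE TABLE hashvalue_pt_2008_07 (
    CHECK (hashtime >= DATE '2008-07-01' AND hashtime < DATE '2008-08-01')
) INHERITS (hashvalue_pt);

CREATE TABLE hashvalue_pt_2008_08 (
    CHECK (hashtime >= DATE '2008-08-01' AND hashtime < DATE '2008-09-01')
) INHERITS (hashvalue_pt);

-- 2. Partition function: routes each new row to the matching child table.
CREATE OR REPLACE FUNCTION hashvalue_pt_insert_router()
RETURNS trigger AS $$
BEGIN
    IF NEW.hashtime >= DATE '2008-07-01' AND NEW.hashtime < DATE '2008-08-01' THEN
        INSERT INTO hashvalue_pt_2008_07 VALUES (NEW.*);
    ELSIF NEW.hashtime >= DATE '2008-08-01' AND NEW.hashtime < DATE '2008-09-01' THEN
        INSERT INTO hashvalue_pt_2008_08 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'hashtime % has no matching partition', NEW.hashtime;
    END IF;
    RETURN NULL;  -- the row is already stored in a child, so skip the master
END;
$$ LANGUAGE plpgsql;

-- 3. Trigger on the master table that calls the partition function.
CREATE TRIGGER hashvalue_pt_insert_trigger
    BEFORE INSERT ON hashvalue_pt
    FOR EACH ROW EXECUTE PROCEDURE hashvalue_pt_insert_router();
```

Returning NULL from the BEFORE INSERT trigger is what keeps the master table empty: the row has already been written to a child table, so the original insert is cancelled.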
Implementation
Constraint exclusion is a query optimization technique that improves performance for partitioned tables: the planner skips child tables whose CHECK constraints rule out the rows a query asks for.
SET constraint_exclusion = partition;
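One way to see constraint exclusion at work is to look at the query plan. The sketch below assumes a hypothetical events table with two monthly children; none of these names come from the deck.

```sql
-- Hypothetical parent with two children that carry non-overlapping CHECK constraints.
CREATE TABLE events (id int, day date NOT NULL);

CREATE TABLE events_2008_07 (
    CHECK (day >= DATE '2008-07-01' AND day < DATE '2008-08-01')
) INHERITS (events);

CREATE TABLE events_2008_08 (
    CHECK (day >= DATE '2008-08-01' AND day < DATE '2008-09-01')
) INHERITS (events);

-- 'partition' applies constraint exclusion to inheritance children.
SET constraint_exclusion = partition;

-- The plan should scan events and events_2008_08 but leave out events_2008_07,
-- because its CHECK constraint cannot match the WHERE clause.
EXPLAIN SELECT * FROM events WHERE day = DATE '2008-08-01';
```

The parent table still appears in the plan because it has no CHECK constraint of its own; it is simply empty.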
Implementation
Performance Testing On Specified Date
-- partitioned table
SELECT * FROM hashvalue_PT
WHERE hashtime = DATE '2008-08-01';
-- non-partitioned table
SELECT * FROM hashvalue
WHERE hashtime = DATE '2008-08-01';
With 200 million rows in each table, a search on a specific date ran about 144.45% faster on the partitioned table than on the non-partitioned table (roughly 2.4 times as fast).
Search on the specified date "2008-08-01":
Records retrieved = 741825
Partitioned table = 359.61 seconds
Non-partitioned table = 879.062 seconds
Sharding
Sharding is like partitioning. The difference is that with traditional partitioning the partitions are stored in the same database, while with sharding the shards (partitions) are stored on different servers.
PostgreSQL does not provide a built-in tool for sharding. We will use Citus, which extends PostgreSQL with sharding and replication.
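The idea of spreading rows across servers by a key can be illustrated in plain PostgreSQL. In the sketch below, mod() is only a readable stand-in for a real hash function, and the two-shard layout is an assumption; Citus itself hashes the distribution column and maps hash ranges to shards.

```sql
-- Simplified stand-in for hash distribution: route each key to one of two shards.
-- mod() replaces the real hash function purely for readability.
SELECT deptno,
       mod(deptno, 2) AS shard_index   -- 0 or 1
FROM generate_series(1, 8) AS deptno;
```

Each row's shard follows from its key alone, so the node that receives a query can tell which server holds a given deptno without asking every server.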
Sharding Installation
DB server1: 192.168.56.10 (Master)
DB server2: 192.168.56.11 (Worker)
- pkg install pg_citus
- root@DB:~ # grep shared_preload_libraries /var/db/postgres/data96/postgresql.conf
shared_preload_libraries = 'citus' # (change requires restart)
- root@DB:~ # grep listen_addresses /var/db/postgres/data96/postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on;
- echo "host all all 192.168.56.0/24 trust" >> /var/db/postgres/data96/pg_hba.conf
- service postgresql restart
- ONLY ON MASTER: root@DB:/var/db/postgres/data96 # cat pg_worker_list.conf
192.168.56.11 5432
- service postgresql reload
- postgres=# create extension citus;
CREATE EXTENSION
Sharding Installation
Verify that the master is ready:
postgres=# SELECT * FROM master_get_active_worker_nodes();
node_name | node_port
---------------+-----------
192.168.56.11 | 5432
(1 row)
Sharding Installation
Everything is fine so far, so we can create the table to be sharded on the master.
CREATE TABLE sales
(deptno int not null,
deptname varchar(20),
total_amount int,
CONSTRAINT pk_sales PRIMARY KEY (deptno));
We need to inform Citus that the data of table sales will be distributed among MASTER and WORKER:
SELECT master_create_distributed_table('sales', 'deptno', 'hash');
Sharding Installation
In our example we are going to create one shard on each worker. We will specify:
the table name: sales
total shard count: 2
replication factor: 1 (no replication)
SELECT master_create_worker_shards('sales', 2, 1);
Sharding is done.
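The master_create_* functions and pg_worker_list.conf belong to older Citus releases. On a recent Citus release, the same setup might look like the sketch below. It assumes the sales table above already exists, reuses the deck's worker address, and is run on the coordinator (the deck's "master"); check the documentation for the installed version before relying on it.

```sql
-- Assumes a recent Citus release; run on the coordinator.
SELECT citus_add_node('192.168.56.11', 5432);   -- register the worker

SET citus.shard_count = 2;                -- total shards for newly distributed tables
SET citus.shard_replication_factor = 1;   -- one copy per shard, no replication

SELECT create_distributed_table('sales', 'deptno');  -- hash-distribute by deptno
```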
Sharding result
insert into sales (deptno,deptname,total_amount) values (1,'french_dept',10000);
insert into sales (deptno,deptname,total_amount) values (2,'german_dept',15000);
insert into sales (deptno,deptname,total_amount) values (3,'china_dept',21000);
insert into sales (deptno,deptname,total_amount) values (4,'gambia_dept',8750);
insert into sales (deptno,deptname,total_amount) values (5,'japan_dept',12010);
insert into sales (deptno,deptname,total_amount) values (6,'china_dept',35000);
insert into sales (deptno,deptname,total_amount) values (7,'nigeria_dept',10000);
insert into sales (deptno,deptname,total_amount) values (8,'senegal_dept',33000);
Sharding Checking
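One possible way to check the result from the master is sketched below. It assumes the sales table has been distributed as above; pg_dist_shard is a Citus metadata table, and the column list is limited to a few of its columns.

```sql
-- Shard metadata for the distributed table: one row per shard with its hash range.
SELECT shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'sales'::regclass;

-- A query against the distributed table is fanned out to the shards and merged.
SELECT count(*) FROM sales;
```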
Conclusion
Note that not all SQL commands are able to work on inheritance hierarchies. Commands that
are used for data querying, data modification, or schema modification (e.g., SELECT, UPDATE,
DELETE, most variants of ALTER TABLE, but not INSERT or ALTER TABLE ... RENAME) typically
default to including child tables and support the ONLY notation to exclude them. Commands
that do database maintenance and tuning (e.g., REINDEX, VACUUM) typically only work on
individual, physical tables and do not support recursing over inheritance hierarchies. The
respective behavior of each individual command is documented in its reference page (Reference
I, SQL Commands).
A serious limitation of the inheritance feature is that indexes (including unique constraints) and
foreign key constraints only apply to single tables, not to their inheritance children. This is true
on both the referencing and referenced sides of a foreign key constraint.
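The limitation on indexes and unique constraints can be seen with a small example. The table names below are hypothetical, and the sketch runs on plain PostgreSQL without any extension.

```sql
-- Parent with a primary key, and a child that inherits its columns.
CREATE TABLE parent_demo (id int PRIMARY KEY, note text);
CREATE TABLE child_demo () INHERITS (parent_demo);

INSERT INTO parent_demo VALUES (1, 'in parent');
-- Succeeds: the primary key index covers parent_demo only, not child_demo.
INSERT INTO child_demo VALUES (1, 'in child');

-- Queries on the parent include child rows by default, so id = 1 appears twice.
SELECT count(*) FROM parent_demo WHERE id = 1;

-- ONLY restricts the query to the parent table itself.
SELECT count(*) FROM ONLY parent_demo WHERE id = 1;
```

If uniqueness across partitions matters, it has to be enforced per child table (for example, with a unique index on each child plus non-overlapping CHECK constraints on the key).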
Conclusion
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
- Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.
- When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.
- Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
- Seldom-used data can be migrated to cheaper and slower storage media.
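The bulk-delete benefit can be sketched as follows. The logs table and its monthly child are hypothetical names chosen for illustration.

```sql
-- Hypothetical parent with one monthly child.
CREATE TABLE logs (id int, logged_on date NOT NULL);

CREATE TABLE logs_2008_07 (
    CHECK (logged_on >= DATE '2008-07-01' AND logged_on < DATE '2008-08-01')
) INHERITS (logs);

-- Detach the child: its rows no longer show up in queries on logs, but the
-- table still exists and can be archived or moved to slower storage.
ALTER TABLE logs_2008_07 NO INHERIT logs;

-- Discard the whole month at once instead of running DELETE ... WHERE.
DROP TABLE logs_2008_07;
```

Both commands work on the table as a whole and leave no dead rows in the remaining partitions for VACUUM to clean up.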
The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning
depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
In PostgreSQL 9.6 and earlier, partitioning is implemented via table inheritance (declarative partitioning was added in PostgreSQL 10). Each partition must be created as a child table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. You should be familiar with inheritance (see Section 5.9 of the PostgreSQL documentation) before attempting to set up partitioning.
END
