www.postgrespro.ru
B-tree - explore the heart
of PostgreSQL
Anastasia Lubennikova
Objectives
● Inspect internals of B-tree index
● Present new features
● Understand difficulties of development
● Clarify our roadmap
Important notes about indexes
● All indexes are secondary
● Index files are divided into standard-size pages
● Page layout is defined by Access Method
● Indexes store TIDs of heap tuples
● There is no visibility information in indexes
● But we have visibility map and LP_DEAD flag
● VACUUM removes dead tuples
● Update = insert
● except HOT-updates
B+ tree
● Tree is balanced
● Root node and inner nodes contain keys and
pointers to lower level nodes
● Leaf nodes contain keys and pointers to the heap
● All keys are sorted inside the node
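Because keys are sorted inside a node, a lookup within one page can use binary search. The sketch below assumes a node is a plain array of integers, whereas real pages hold index tuples behind line pointers. The name `node_lower_bound` is hypothetical and does not come from PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the position of the first key >= target
 * in a sorted array, or nkeys if every key is smaller.
 */
size_t
node_lower_bound(const int *keys, size_t nkeys, int target)
{
    size_t lo = 0;
    size_t hi = nkeys;

    assert(keys != NULL || nkeys == 0);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] < target)
            lo = mid + 1;       /* answer lies strictly to the right */
        else
            hi = mid;           /* keys[mid] is still a candidate */
    }
    return lo;
}
```

With duplicate keys this returns the leftmost match, so a scan can start there and move right.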
Lehman & Yao Algorithm
✔ Right-link pointer to the page's right sibling
✔ "High key" - upper bound on the keys that are
allowed on that page
✔ Assume that we can fit at least three items per
page
✗ Assume that all btree keys are unique
✗ Assume fixed size keys
✗ Assume that in-memory copies of tree pages are
unshared
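The right-link and the high key work together: a reader that lands on a page that has been split concurrently can tell from the high key that its target now lives further right, and can follow right-links until it reaches the correct page. The sketch below shows only that "move right" step. `SketchPage`, `NO_PAGE` and `move_right` are hypothetical names, pages are modeled as an in-memory array indexed by block number, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_PAGE 0   /* assumption: 0 means "no sibling" in this sketch */

typedef struct SketchPage
{
    int      high_key;    /* upper bound on keys allowed on this page */
    bool     rightmost;   /* the rightmost page has no high key */
    unsigned right_link;  /* block number of the right sibling, or NO_PAGE */
} SketchPage;

/* Follow right-links while the target is above the page's high key. */
unsigned
move_right(const SketchPage *pages, unsigned blkno, int target)
{
    assert(pages != NULL);
    while (!pages[blkno].rightmost && target > pages[blkno].high_key)
        blkno = pages[blkno].right_link;
    return blkno;
}
```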
PostgreSQL B-tree implementation
● Left-link pointer to the page's left sibling
• For backward scan purposes
● Nonunique keys
• Use link field to check tuples equality
• Search must descend to the left subtree
• Tuples in b-tree are not ordered by TID
● Variable-size keys
• #define MaxIndexTuplesPerPage \
    ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
    (MAXALIGN(sizeof(IndexTupleData) + 1) + sizeof(ItemIdData))))
● Pages are shared
• Page-level read locking is required
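To make the `MaxIndexTuplesPerPage` bound concrete, the sketch below evaluates the same formula with assumed sizes for a default 64-bit build: an 8192-byte block, a 24-byte page header, an 8-byte `IndexTupleData`, a 4-byte `ItemIdData` and 8-byte maximum alignment. All `SKETCH_*` names are hypothetical; the real values come from the PostgreSQL headers and may differ on other builds.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for a default 64-bit build, not read from real headers. */
enum
{
    SKETCH_BLCKSZ = 8192,           /* block size */
    SKETCH_PAGE_HEADER_SIZE = 24,   /* stands in for SizeOfPageHeaderData */
    SKETCH_INDEX_TUPLE_SIZE = 8,    /* stands in for sizeof(IndexTupleData) */
    SKETCH_ITEM_ID_SIZE = 4,        /* stands in for sizeof(ItemIdData) */
    SKETCH_MAXIMUM_ALIGNOF = 8
};

/* Round len up to the next multiple of the alignment, as MAXALIGN does. */
size_t
sketch_maxalign(size_t len)
{
    return (len + SKETCH_MAXIMUM_ALIGNOF - 1) &
           ~((size_t) SKETCH_MAXIMUM_ALIGNOF - 1);
}

/* Same arithmetic as the macro: usable bytes / smallest possible item. */
int
sketch_max_index_tuples_per_page(void)
{
    size_t per_tuple = sketch_maxalign(SKETCH_INDEX_TUPLE_SIZE + 1) +
                       SKETCH_ITEM_ID_SIZE;

    assert(per_tuple > 0);
    return (int) ((SKETCH_BLCKSZ - SKETCH_PAGE_HEADER_SIZE) / per_tuple);
}
```

Under these assumptions the smallest item costs 16 + 4 = 20 bytes, so the bound is 8168 / 20 = 408 tuples per page.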
Meta page
● Page zero of every btree is a meta-data page
typedef struct BTMetaPageData
{
    uint32      btm_magic;      /* should contain BTREE_MAGIC */
    uint32      btm_version;    /* should contain BTREE_VERSION */
    BlockNumber btm_root;       /* current root location */
    uint32      btm_level;      /* tree level of the root page */
    BlockNumber btm_fastroot;   /* current "fast" root location */
    uint32      btm_fastlevel;  /* tree level of the "fast" root page */
} BTMetaPageData;
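A reader of the meta page typically validates it and then picks the page where a descent should start. The sketch below mirrors the struct with a hypothetical `SketchMetaPage` and assumes that searches begin at the "fast" root and that block 0 doubles as "no page"; the magic value is the one defined as `BTREE_MAGIC` in nbtree.h. Version checking is omitted because the accepted version numbers depend on the release.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BTREE_MAGIC 0x053162   /* value of BTREE_MAGIC in nbtree.h */
#define SKETCH_P_NONE      0          /* assumption: block 0 means "no page" */

typedef struct SketchMetaPage
{
    uint32_t magic;
    uint32_t version;
    uint32_t root;
    uint32_t level;
    uint32_t fastroot;
    uint32_t fastlevel;
} SketchMetaPage;

/*
 * Return the block where a descent should start, or SKETCH_P_NONE when
 * the page does not look like a btree meta page or the index is empty.
 */
uint32_t
sketch_descent_start(const SketchMetaPage *meta)
{
    assert(meta != NULL);
    if (meta->magic != SKETCH_BTREE_MAGIC)
        return SKETCH_P_NONE;
    return meta->fastroot;
}
```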
Page layout
typedef struct BTPageOpaqueData
{
    BlockNumber btpo_prev;
    BlockNumber btpo_next;
    union
    {
        uint32        level;
        TransactionId xact;
    } btpo;
    uint16      btpo_flags;
    BTCycleId   btpo_cycleid;
} BTPageOpaqueData;
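The `btpo_flags` field is a bit mask describing the page. The sketch below turns it into a one-letter page type, loosely following what the pageinspect extension reports ('l' leaf, 'r' root, 'i' internal, 'd' deleted, 'e' half-dead). The `SK_*` constants copy the bit positions of the `BTP_*` flags in nbtree.h as I recall them; treat the exact values and the function name `sketch_page_type` as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the BTP_* bits in nbtree.h. */
#define SK_BTP_LEAF      (1 << 0)
#define SK_BTP_ROOT      (1 << 1)
#define SK_BTP_DELETED   (1 << 2)
#define SK_BTP_META      (1 << 3)
#define SK_BTP_HALF_DEAD (1 << 4)

/* Classify a tree page by its flags; the meta page is not a tree page. */
char
sketch_page_type(uint16_t flags)
{
    assert((flags & SK_BTP_META) == 0);
    if (flags & SK_BTP_DELETED)
        return 'd';
    if (flags & SK_BTP_HALF_DEAD)
        return 'e';
    if (flags & SK_BTP_LEAF)
        return 'l';     /* a single-page index is both root and leaf */
    if (flags & SK_BTP_ROOT)
        return 'r';
    return 'i';
}
```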
Unique and Primary Key
B-tree index
● 6.0 Add UNIQUE index capability (Dan McGuirk)
CREATE UNIQUE INDEX ON tbl (a);
● 6.3 Implement SQL92 PRIMARY KEY and
UNIQUE clauses using indexes (Thomas
Lockhart)
CREATE TABLE tbl (a int, PRIMARY KEY(a));
System and TOAST indexes
● System and TOAST indexes
postgres=# select count(*) from pg_indexes where
schemaname='pg_catalog';
count
-------
103
Fast index build
● Uses tuplesort.c to sort given tuples
● Loads tuples into leaf-level pages
● Builds index from the bottom up
● There is a trick for support of high-key
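The bottom-up build can be illustrated by counting pages level by level: sorted tuples fill the leaf level, each finished page contributes one downlink to the level above, and the process repeats until a single page (the root) remains. This is a simplified sketch with a fixed number of items per page; it ignores fill factor, variable tuple sizes and the high-key handling mentioned above. `sketch_bulk_build_levels` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill pages_per_level[] with the page count of each level, leaf level
 * first, and return the number of levels.
 */
int
sketch_bulk_build_levels(long ntuples, long per_page,
                         long *pages_per_level, int max_levels)
{
    int  nlevels = 0;
    long items = ntuples;

    assert(pages_per_level != NULL);
    assert(ntuples > 0 && per_page >= 2 && max_levels > 0);
    do
    {
        long pages = (items + per_page - 1) / per_page;  /* ceiling */

        assert(nlevels < max_levels);
        pages_per_level[nlevels++] = pages;
        items = pages;      /* one downlink per page on the level above */
    } while (items > 1);
    return nlevels;
}
```

For example, 10001 tuples at 100 per page give 101 leaf pages, 2 internal pages and 1 root page.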
Multicolumn B-tree index
● 6.1 multicolumn btree indexes (Vadim Mikheev)
CREATE INDEX ON tbl (a, b);
Expression B-tree index
CREATE INDEX people_names
ON people ((first_name || ' ' || last_name));
Partial B-tree index
● 7.2 Enable partial indexes (Martijn van Oosterhout)
CREATE INDEX ON tbl (a) WHERE a%2 = 0;
On-the-fly deletion (microvacuum)
● 8.2 Remove dead index entries before B-Tree page
split (Junji Teramoto)
● Also known as microvacuum
● Index tuples are marked as dead (LP_DEAD) when a scan finds
that the heap tuple they point to is dead.
● They are then removed during insertion, before the page is
split.
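The idea can be sketched as a cleanup pass that runs only when a page is full: items already flagged dead are compacted away, and a split is needed only if the page is still full afterwards. The types and function names here are hypothetical, and a page is modeled as a fixed-capacity array rather than a real slotted page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchItem
{
    int  key;
    bool dead;      /* stands in for the LP_DEAD hint on the line pointer */
} SketchItem;

/* Compact the array in place, dropping dead items; return the new count. */
size_t
sketch_remove_dead(SketchItem *items, size_t nitems)
{
    size_t kept = 0;

    assert(items != NULL || nitems == 0);
    for (size_t i = 0; i < nitems; i++)
    {
        if (!items[i].dead)
            items[kept++] = items[i];
    }
    return kept;
}

/* Return true only if the page is still full after removing dead items. */
bool
sketch_needs_split(SketchItem *items, size_t *nitems, size_t capacity)
{
    assert(nitems != NULL);
    if (*nitems < capacity)
        return false;
    *nitems = sketch_remove_dead(items, *nitems);
    return *nitems >= capacity;
}
```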
HOT-updates
● 8.3 Heap-Only Tuples (HOT) accelerate space reuse
for most UPDATEs and DELETEs (Pavan Deolasee,
with ideas from many others)
UPDATE tbl SET id = 16 WHERE id = 15;
UPDATE tbl SET data = K WHERE id = 16;
Index-only scans
● 9.2 Index-only scans: allow queries to retrieve data
only from indexes, avoiding heap access (Robert
Haas, Ibrar Ahmed, Heikki Linnakangas, Tom
Lane)
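Since indexes carry no visibility information, an index-only scan can skip the heap only for heap pages that the visibility map marks as all-visible. The sketch below shows that decision with an assumed layout of one bit per heap page packed into bytes; the real on-disk map layout is different and varies between versions. `sketch_heap_fetch_needed` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when the heap page must still be visited, that is, when
 * its all-visible bit is not set in the (simplified) visibility map.
 */
bool
sketch_heap_fetch_needed(const uint8_t *all_visible_bits,
                         size_t nblocks, size_t heap_blkno)
{
    assert(all_visible_bits != NULL);
    assert(heap_blkno < nblocks);
    return (all_visible_bits[heap_blkno / 8] &
            (1u << (heap_blkno % 8))) == 0;
}
```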
Covering indexes
Index with included columns
CREATE UNIQUE INDEX ON tbl (a) INCLUDING (b);
CREATE INDEX ON tbl (a) INCLUDING (b);
CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 box);
CREATE UNIQUE INDEX tbl_idx_unique ON tbl
USING btree (c1, c2) INCLUDING (c3, c4);
ALTER TABLE tbl ADD UNIQUE USING INDEX
tbl_idx_unique;
CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 box);
ALTER TABLE tbl ADD UNIQUE (c1, c2) INCLUDING (c3, c4);
CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 box,
UNIQUE (c1, c2) INCLUDING (c3, c4));
Catalog changes
CREATE TABLE tbl
(c1 int,c2 int, c3 int, c4 box,
PRIMARY KEY(c1,c2)INCLUDING(c3,c4));
pg_index
indexrelid | indnatts | indnkeyatts | indkey | indclass
------------+----------+-------------+---------+----------
tbl_pkey | 4 | 2 | 1 2 3 4 | 1978 1978
pg_constraint
conname | conkey | conincluding
----------+--------+-------------
tbl_pkey | {1,2} | {3,4}
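The split between `indnatts` and `indnkeyatts` matters for uniqueness: only the key attributes take part in the comparison, while included attributes are carried along as payload. A minimal sketch, assuming every attribute is an integer and using the hypothetical name `sketch_keys_equal`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Compare two index tuples on their first nkeyatts attributes only.
 * Attributes from nkeyatts up to natts are included columns and are ignored.
 */
bool
sketch_keys_equal(const int *a, const int *b, int natts, int nkeyatts)
{
    assert(a != NULL && b != NULL);
    assert(nkeyatts >= 0 && nkeyatts <= natts);
    for (int i = 0; i < nkeyatts; i++)
    {
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```

With `natts = 4` and `nkeyatts = 2`, as in the catalog row above, two tuples that agree on c1 and c2 conflict even if c3 and c4 differ.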
Index with included columns
● Index maintenance overhead is reduced
• Size is smaller
• Inserts are faster
● Index can contain data with no suitable opclass
● Dropping any column of the index drops the whole index
● HOT-updates do not work for indexed data
Effective storage of duplicates
Difficulties of development
● Complicated code
● It's a crucial subsystem that can badly break
everything including system catalog
● Few active developers and experts in this area
● Few tools for developers
pageinspect
postgres=# select bt.blkno, bt.type, bt.live_items,
bt.free_size, bt.btpo_prev, bt.btpo_next
from generate_series(1,pg_relation_size('idx')/8192 - 1) as n,
lateral bt_page_stats('idx', n::int) as bt;
blkno | type | live_items | free_size | btpo_prev | btpo_next
-------+------+------------+-----------+-----------+-----------
1 | l | 367 | 808 | 0 | 2
2 | l | 235 | 3448 | 1 | 0
3 | r | 2 | 8116 | 0 | 0
pageinspect
postgres=# select * from bt_page_items('idx', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+--------+---------+-------+------+-------------------------
1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
2 | (0,2) | 16 | f | f | 01 00 00 00 01 00 00 00
3 | (0,3) | 16 | f | f | 01 00 00 00 02 00 00 00
4 | (0,4) | 16 | f | f | 01 00 00 00 03 00 00 00
5 | (0,5) | 16 | f | f | 01 00 00 00 04 00 00 00
6 | (0,6) | 16 | f | f | 01 00 00 00 05 00 00 00
7 | (0,7) | 16 | f | f | 01 00 00 00 06 00 00 00
8 | (0,8) | 16 | f | f | 01 00 00 00 07 00 00 00
9 | (0,9) | 16 | f | f | 01 00 00 00 08 00 00 00
10 | (0,10) | 16 | f | f | 01 00 00 00 09 00 00 00
11 | (0,11) | 16 | f | f | 01 00 00 00 0a 00 00 00
pageinspect
postgres=# select blkno, bt.itemoffset, bt.ctid, bt.itemlen, bt.data
from generate_series(1,pg_relation_size('idx')/8192 - 1) as blkno,
lateral bt_page_items('idx', blkno::int) as bt where itemoffset <3;
blkno | itemoffset | ctid | itemlen | data
-------+------------+---------+---------+-------------------------
1 | 1 | (1,141) | 16 | 01 00 00 00 6e 01 00 00
1 | 2 | (0,1) | 16 | 01 00 00 00 00 00 00 00
2 | 1 | (1,141) | 16 | 01 00 00 00 6e 01 00 00
2 | 2 | (1,142) | 16 | 01 00 00 00 6f 01 00 00
3 | 1 | (1,1) | 8 |
3 | 2 | (2,1) | 16 | 01 00 00 00 6e 01 00 00
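The `data` column is a raw hex dump of the key. Assuming the index is on two int4 columns and the server runs on a little-endian machine (neither is stated on the slide), each group of four bytes decodes to one integer. The sketch below parses such a dump; `sketch_decode_int4_pair` and `sketch_le32` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Combine four byte values, least significant first, into one int32. */
static int32_t
sketch_le32(const unsigned int *b)
{
    uint32_t v = (uint32_t) b[0] | ((uint32_t) b[1] << 8) |
                 ((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);

    return (int32_t) v;
}

/*
 * Parse eight space-separated hex bytes into two little-endian int4
 * values. Returns 0 on success, -1 if the input is malformed.
 */
int
sketch_decode_int4_pair(const char *hex, int32_t *first, int32_t *second)
{
    unsigned int b[8];

    assert(hex != NULL && first != NULL && second != NULL);
    if (sscanf(hex, "%x %x %x %x %x %x %x %x",
               &b[0], &b[1], &b[2], &b[3],
               &b[4], &b[5], &b[6], &b[7]) != 8)
        return -1;
    for (int i = 0; i < 8; i++)
    {
        if (b[i] > 0xff)
            return -1;      /* not a single byte */
    }
    *first = sketch_le32(&b[0]);
    *second = sketch_le32(&b[4]);
    return 0;
}
```

Under these assumptions `01 00 00 00 6e 01 00 00` decodes to the pair (1, 366).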
amcheck
● bt_index_check(index regclass);
● bt_index_parent_check(index regclass);
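The kind of invariant such a checker verifies can be sketched in a few lines: keys on a page must be in nondecreasing order, and none may exceed the page's high key. This is only an illustration of the idea with integer keys and a hypothetical function name; it is not a description of how amcheck is implemented, and the real extension verifies considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if keys are in nondecreasing order and, when the page has
 * a high key, no key is greater than it.
 */
bool
sketch_page_order_ok(const int *keys, size_t nkeys,
                     bool has_high_key, int high_key)
{
    assert(keys != NULL || nkeys == 0);
    for (size_t i = 0; i < nkeys; i++)
    {
        if (i > 0 && keys[i - 1] > keys[i])
            return false;   /* items out of order */
        if (has_high_key && keys[i] > high_key)
            return false;   /* item violates the page's upper bound */
    }
    return true;
}
```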
What do we need?
● Index compression
• Compression of leading columns
• Page compression
● Index-Organized-Tables (primary indexes)
• Users don't want to store data twice
● KNN for B-tree
• Nice task for beginners
● Batch update of indexes
● Indexes on partitioned tables
• pg_pathman
• Global indexes
• Global Partitioned Indexes
www.postgrespro.ru
Thanks for your attention!
Any questions?
a.lubennikova@postgrespro.ru
