Inside PostgreSQL Shared Memory

                                  BRUCE MOMJIAN,
                                   ENTERPRISEDB

                                       January, 2009




                                Abstract
   PostgreSQL is an open-source, full-featured relational database.
   This presentation gives an overview of the shared memory
   structures used by Postgres.

Creative Commons Attribution License                   http://momjian.us/presentations
Outline



1. File storage format

2. Shared memory creation

3. Shared buffers

4. Row value access

5. Locking

6. Other structures



File System /data

[Figure: three Postgres processes, all accessing the /data directory]

File System /data/base

[Figure: Postgres processes accessing /data, which contains these subdirectories]

    /base
    /global
    /pg_clog
    /pg_multixact
    /pg_subtrans
    /pg_tblspc
    /pg_twophase
    /pg_xlog

File System /data/base/db

[Figure: /data/base contains one subdirectory per database]

    /16385 (production)
    /1 (template1)
    /16821 (test)
    /17982 (devel)
    /21452 (marketing)

File System /data/base/db/table

[Figure: the database directory /data/base/16385 contains one file per table]

    /24692 (customer)
    /27214 (order)
    /25932 (product)
    /25952 (employee)
    /27839 (part)

File System Data Pages

[Figure: the table file /data/base/16385/24692 is a sequence of 8k pages]

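The directory layout on the preceding slides can be sketched in C. This is a minimal illustration, not PostgreSQL code: the `toy_` names are hypothetical, the 8192-byte page size is taken from the "8k" labels in the figures, and the sketch ignores anything the slides do not show (such as how very large tables are split across files).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8192   /* the 8k page size shown in the figures */

/* Build "<datadir>/base/<database number>/<table file number>" into buf.
 * Returns what snprintf returns (the length of the full path). */
static int toy_relation_path(char *buf, size_t buflen, const char *datadir,
                             unsigned db_oid, unsigned table_file)
{
    return snprintf(buf, buflen, "%s/base/%u/%u", datadir, db_oid, table_file);
}

/* Byte offset of the start of page number block_no inside the table file. */
static long long toy_block_offset(unsigned block_no)
{
    return (long long) block_no * TOY_BLOCK_SIZE;
}
```

With the numbers from the figures, `toy_relation_path(buf, sizeof buf, "/data", 16385, 24692)` yields the path of the customer table, and `toy_block_offset(3)` is where its fourth 8k page begins.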
Data Pages

[Figure: each 8K page of /data/base/16385/24692 begins with a Page Header followed by an array of Items; each Item refers to a Tuple stored toward the end of the page, and a Special area sits at the very end]

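The page layout in the figure can be sketched as a toy slotted page. This is a simplified illustration under stated assumptions, not PostgreSQL's page format: the `Toy` types and functions are hypothetical, the header holds only two offsets, there is no Special area, and tuples are not aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192

typedef struct {
    uint16_t lower;   /* end of the item array, which grows forward */
    uint16_t upper;   /* start of the tuple data, which grows backward */
} ToyPageHeader;

typedef struct {
    uint16_t off;     /* byte offset of the tuple within the page */
    uint16_t len;     /* tuple length in bytes */
} ToyItem;

static void toy_page_init(unsigned char *page)
{
    ToyPageHeader h = { sizeof(ToyPageHeader), TOY_PAGE_SIZE };
    memset(page, 0, TOY_PAGE_SIZE);
    memcpy(page, &h, sizeof h);
}

/* Store a tuple at the end of the free space and add an item that points
 * to it. Returns the 0-based item number, or -1 if the page is full. */
static int toy_page_add(unsigned char *page, const void *tuple, uint16_t len)
{
    ToyPageHeader h;
    ToyItem it;
    int item_no;

    memcpy(&h, page, sizeof h);
    if ((size_t) (h.upper - h.lower) < sizeof(ToyItem) + len)
        return -1;
    h.upper = (uint16_t) (h.upper - len);
    memcpy(page + h.upper, tuple, len);
    it.off = h.upper;
    it.len = len;
    memcpy(page + h.lower, &it, sizeof it);
    item_no = (int) ((h.lower - sizeof(ToyPageHeader)) / sizeof(ToyItem));
    h.lower = (uint16_t) (h.lower + sizeof(ToyItem));
    memcpy(page, &h, sizeof h);
    return item_no;
}

/* Follow an item to its tuple; the tuple length is returned through len. */
static const void *toy_page_get(const unsigned char *page, int item_no,
                                uint16_t *len)
{
    ToyItem it;
    memcpy(&it, page + sizeof(ToyPageHeader) + (size_t) item_no * sizeof(ToyItem),
           sizeof it);
    *len = it.len;
    return page + it.off;
}
```

Because items are fixed-size and sit at the front, a tuple can be addressed by (page, item number) even though the tuples themselves have varying lengths.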
File System Block Tuple

[Figure: the same 8K page layout (Page Header, Items, Tuples, Special), with one Tuple singled out from the page]

File System Tuple

[Figure: a Tuple is a Header followed by a series of Values. int4in('9241') converts input text into a stored integer Value, and textout() converts a stored text Value into the output 'Martin']

The tuple Header contains:

    OID - object id of tuple (optional)
    xmin - creation transaction id
    xmax - destruction transaction id
    cmin - creation command id
    cmax - destruction command id
    ctid - tuple id (page / item)
    natts - number of attributes
    infomask - tuple flags
    hoff - length of tuple header
    bits - bit map representing NULLs

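The "bits" field can be illustrated with a small helper. This sketch assumes the convention used by PostgreSQL's att_isnull macro, where a set bit marks a value that is present and a clear bit marks a NULL; the name `toy_att_isnull` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per attribute, eight attributes per byte, lowest bit first.
 * A clear bit means the attribute is NULL and takes no storage. */
static bool toy_att_isnull(int att, const uint8_t *bits)
{
    return !(bits[att >> 3] & (1 << (att & 0x07)));
}
```

For a bitmap byte of 0x05 (binary 00000101), attributes 0 and 2 are present and attribute 1 is NULL.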
Tuple Header C Structures
        typedef struct HeapTupleFields
        {
            TransactionId t_xmin;         /* inserting xact ID */
            TransactionId t_xmax;         /* deleting or locking xact ID */

            union
            {
                CommandId   t_cid;        /* inserting or deleting command ID, or both */
                TransactionId t_xvac;     /* VACUUM FULL xact ID */
            }           t_field3;
        } HeapTupleFields;

        typedef struct HeapTupleHeaderData
        {
            union
            {
                HeapTupleFields t_heap;
                DatumTupleFields t_datum;
            }           t_choice;

            ItemPointerData t_ctid;       /* current TID of this or newer tuple */

            /* Fields below here must match MinimalTupleData! */

            uint16       t_infomask2;     /* number of attributes + various flags */
            uint16       t_infomask;      /* various flag bits, see below */

            uint8        t_hoff;          /* sizeof header incl. bitmap, padding */

            /* ^ - 23 bytes - ^ */

            bits8        t_bits[1];       /* bitmap of NULLs -- VARIABLE LENGTH */
            /* MORE DATA FOLLOWS AT END OF STRUCT */
        } HeapTupleHeaderData;
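The "23 bytes" comment can be checked with a stand-alone mirror of the structure. Everything named `Toy` here is a stand-in: the field widths (32-bit transaction and command ids, a 6-byte item pointer, a 12-byte DatumTupleFields) and the 8-byte padding rule are assumptions made for the sketch, since the slide does not show those definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyTransactionId;   /* assumed width */
typedef uint32_t ToyCommandId;       /* assumed width */
typedef struct { uint16_t bi_hi, bi_lo, ip_posid; } ToyItemPointerData; /* 6 bytes */

typedef struct {
    ToyTransactionId t_xmin;
    ToyTransactionId t_xmax;
    union { ToyCommandId t_cid; ToyTransactionId t_xvac; } t_field3;
} ToyHeapTupleFields;

typedef struct {                     /* assumed: three 4-byte fields */
    int32_t  datum_len;
    int32_t  datum_typmod;
    uint32_t datum_typeid;
} ToyDatumTupleFields;

typedef struct {
    union { ToyHeapTupleFields t_heap; ToyDatumTupleFields t_datum; } t_choice;
    ToyItemPointerData t_ctid;
    uint16_t t_infomask2;
    uint16_t t_infomask;
    uint8_t  t_hoff;
    uint8_t  t_bits[1];              /* NULL bitmap starts here */
} ToyHeapTupleHeaderData;

/* Size of the fixed part of the header, before the NULL bitmap. */
static size_t toy_fixed_header_size(void)
{
    return offsetof(ToyHeapTupleHeaderData, t_bits);
}

/* Header length including the bitmap, padded to an assumed 8-byte boundary.
 * This is the kind of value the t_hoff field records. */
static size_t toy_header_length(int natts, int has_nulls)
{
    size_t len = toy_fixed_header_size();
    if (has_nulls)
        len += (size_t) (natts + 7) / 8;   /* one bit per attribute */
    return (len + 7) & ~(size_t) 7;
}
```

On common platforms the fixed part comes to 12 + 6 + 2 + 2 + 1 = 23 bytes, so a tuple without NULLs would have a 24-byte header under these assumptions.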
Shared Memory Creation

[Figure: the postmaster calls fork() to create each postgres process. The address space of every process is drawn as Program (Text), Data, Shared Memory, and Stack, with the Shared Memory region common to all of them]

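The effect shown in the figure, memory set up before fork() and then visible to both parent and child, can be sketched as follows. This is an illustration only: it uses an anonymous shared mmap() mapping for brevity, which is an assumption of the sketch and not necessarily how the postmaster allocates its segment, and `toy_share_across_fork` is a hypothetical name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes value into shared memory; the parent reads it back.
 * Returns what the parent saw, or -1 on error. */
static int toy_share_across_fork(int value)
{
    int seen;
    pid_t pid;
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        *shared = value;          /* child: write through the inherited mapping */
        _exit(0);
    }

    waitpid(pid, NULL, 0);        /* parent: wait until the child has finished */
    seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

Ordinary Data and Stack memory would not behave this way: after fork() each process has its own private copy, so a write in the child is invisible to the parent.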
Shared Memory



             PROC                  Lightweight Locks    XLOG Buffers
             Proc Array            Lock Hashes          CLOG Buffers
                                   LOCK                 Subtrans Buffers
             Auto Vacuum           PROCLOCK             Two-Phase Structs
             Btree Vacuum                               Multi-XACT Buffers
             Free Space Map        Statistics
             Background Writer     Synchronized Scan    Shared Invalidation


             Buffer Descriptors

                                       Shared Buffers




                                        Semaphores
Shared Buffers

[Figure: Postgres processes read() 8k pages from /data/base/16385/24692 into Shared Buffers and write() them back. Each shared buffer holds one 8K page (Page Header, Items, Tuples, Special) and has a Buffer Descriptor containing:]

    Pin Count - prevent page replacement
    LWLock - for page changes

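The role of the pin count can be sketched with a toy descriptor array. This is a single-process illustration under stated assumptions: the `Toy` names are hypothetical, no locking is shown, and the victim search is a plain linear scan chosen for brevity, not PostgreSQL's replacement policy.

```c
#include <assert.h>

typedef struct {
    int block_no;    /* page of the file cached in this buffer, -1 if none */
    int pin_count;   /* number of users currently relying on the page */
} ToyBufferDesc;

static void toy_pool_init(ToyBufferDesc *descs, int n)
{
    for (int i = 0; i < n; i++) {
        descs[i].block_no = -1;
        descs[i].pin_count = 0;
    }
}

static void toy_pin(ToyBufferDesc *d)
{
    d->pin_count++;
}

static void toy_unpin(ToyBufferDesc *d)
{
    assert(d->pin_count > 0);
    d->pin_count--;
}

/* Choose a buffer whose page may be replaced: only unpinned buffers qualify.
 * Returns the buffer index, or -1 if every buffer is pinned. */
static int toy_find_victim(const ToyBufferDesc *descs, int n)
{
    for (int i = 0; i < n; i++)
        if (descs[i].pin_count == 0)
            return i;
    return -1;
}
```

A process would pin a buffer before following pointers into its page and unpin it afterwards, so the page cannot be swapped out from under it in between.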
HeapTuples

[Figure: a HeapTuple is a C pointer, held by a Postgres process, to a Tuple inside an 8K page in Shared Buffers. The page layout (Page Header, Items, Tuples, Special), the Tuple layout (Header followed by Values, with int4in('9241') and textout() producing 'Martin'), and the header fields (OID, xmin, xmax, cmin, cmax, ctid, natts, infomask, hoff, bits) are the same as on the File System Tuple slide]

Finding A Tuple Value in C
        Datum
        nocachegetattr(HeapTuple tuple,
                       int attnum,
                       TupleDesc tupleDesc,
                       bool *isnull)
        {
            HeapTupleHeader tup = tuple->t_data;
            Form_pg_attribute *att = tupleDesc->attrs;

            /* slide excerpt: the declarations and setup of off, tp, and bp are omitted */
            {
                int         i;

                /*
                 * Note - This loop is a little tricky. For each non-null attribute,
                 * we have to first account for alignment padding before the attr,
                 * then advance over the attr based on its length. Nulls have no
                 * storage and no alignment padding either. We can use/set
                 * attcacheoff until we reach either a null or a var-width attribute.
                 */
                off = 0;
                for (i = 0;; i++)       /* loop exit is at "break" */
                {
                    if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
                        continue;       /* this cannot be the target att */

                    if (att[i]->attlen == -1)
                        off = att_align_pointer(off, att[i]->attalign, -1,
                                                tp + off);
                    else
                        /* not varlena, so safe to use att_align_nominal */
                        off = att_align_nominal(off, att[i]->attalign);

                    if (i == attnum)
                        break;

                    off = att_addlength_pointer(off, att[i]->attlen, tp + off);
                }
            }

            return fetchatt(att[attnum], tp + off);
        }
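The walk performed by the loop above can be reproduced in a self-contained form. This is a simplified sketch, not the PostgreSQL function: it handles fixed-width attributes only (no varlena), the `Toy` types and helpers are hypothetical, and it assumes a NULL bitmap in which a set bit means the value is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int attlen;     /* fixed width in bytes */
    int attalign;   /* required alignment in bytes: 1, 2, 4, or 8 */
} ToyAttr;

/* Round off up to the next multiple of align (align is a power of two). */
static int toy_align(int off, int align)
{
    return (off + align - 1) & ~(align - 1);
}

static bool toy_isnull(int att, const uint8_t *bits)
{
    return !(bits[att >> 3] & (1 << (att & 0x07)));
}

/* Offset of attribute attnum (0-based) within the tuple's data area,
 * or -1 if that attribute is NULL. bits may be NULL when the tuple
 * has no NULL values at all. */
static int toy_attr_offset(const ToyAttr *att, int attnum, const uint8_t *bits)
{
    int off = 0;

    if (bits != NULL && toy_isnull(attnum, bits))
        return -1;

    for (int i = 0;; i++) {
        if (bits != NULL && toy_isnull(i, bits))
            continue;                       /* NULLs take no storage or padding */
        off = toy_align(off, att[i].attalign);
        if (i == attnum)
            break;
        off += att[i].attlen;               /* advance over this attribute */
    }
    return off;
}
```

For a row of (2-byte, 4-byte, 1-byte, 4-byte) columns the offsets are 0, 4, 8, and 12; if the second column is NULL, the third moves up to offset 2 and the fourth to offset 4. This is why the offset of a column cannot be cached once a NULL precedes it.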
Value Access in C

       #define fetch_att(T,attbyval,attlen) \
       ( \
           (attbyval) ? \
           ( \
               (attlen) == (int) sizeof(int32) ? \
                   Int32GetDatum(*((int32 *)(T))) \
               : \
               ( \
                   (attlen) == (int) sizeof(int16) ? \
                       Int16GetDatum(*((int16 *)(T))) \
                   : \
                   ( \
                       AssertMacro((attlen) == 1), \
                       CharGetDatum(*((char *)(T))) \
                   ) \
               ) \
           ) \
           : \
           PointerGetDatum((char *) (T)) \
       )
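The same decision can be written as an ordinary function, which may be easier to follow than the nested conditional. This is a sketch under stated assumptions: `ToyDatum` is a stand-in for Datum (assumed here to be a pointer-sized integer), `toy_fetch_att` is a hypothetical name, and memcpy is used in place of the pointer casts so that the sketch does not depend on the source address being aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t ToyDatum;   /* stand-in for Datum */

static ToyDatum toy_fetch_att(const void *p, bool attbyval, int attlen)
{
    if (!attbyval)
        return (ToyDatum) p;          /* by reference: the Datum is the pointer */

    if (attlen == (int) sizeof(int32_t)) {
        int32_t v;
        memcpy(&v, p, sizeof v);
        return (ToyDatum) v;          /* by value: the Datum holds the integer */
    }
    if (attlen == (int) sizeof(int16_t)) {
        int16_t v;
        memcpy(&v, p, sizeof v);
        return (ToyDatum) v;
    }
    assert(attlen == 1);
    return (ToyDatum) *(const char *) p;
}
```

A by-reference result is still a pointer into the tuple, which is why a value such as 'Martin' can be read straight out of a shared buffer without being copied.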
Test And Set Lock
Can Succeed Or Fail

[Figure: the caller atomically exchanges a 1 with the lock value, which is either 0 or 1]

    Success - lock was 0 on exchange
    Failure - lock was 1 on exchange (lock already taken)

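The exchange in the figure can be sketched portably with C11 atomics. This is an illustration, not PostgreSQL's implementation; the `toy_` names are hypothetical and an `atomic_uchar` stands in for the lock byte.

```c
#include <assert.h>
#include <stdatomic.h>

typedef atomic_uchar toy_slock_t;

/* Atomically store 1 and return the previous value:
 * 0 means we took the lock, 1 means it was already taken. */
static int toy_tas(toy_slock_t *lock)
{
    return (int) atomic_exchange(lock, 1);
}

static void toy_unlock(toy_slock_t *lock)
{
    atomic_store(lock, 0);
}
```

The lock holds 1 after every attempt, whether it succeeded or not; only the returned old value tells the caller which case occurred.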
Test And Set Lock
                                    x86 Assembler

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    /*
     * Use a non-locking test before asserting the bus lock. Note that the
     * extra test appears to be a small loss on some x86 platforms and a small
     * win on others; it's by no means clear that we should keep it.
     */
    __asm__ __volatile__(
        "   cmpb    $0,%1   \n"
        "   jne     1f      \n"
        "   lock            \n"
        "   xchgb   %0,%1   \n"
        "1: \n"
:       "+q"(_res), "+m"(*lock)
:
:       "memory", "cc");
    return (int) _res;
}




Spin Lock
Always Succeeds

[Figure: the same exchange as Test And Set Lock. On Failure (lock was 1 on exchange, lock already taken) the caller retries after a sleep of increasing duration, until the exchange reports Success (lock was 0 on exchange)]

Spinlocks are designed for short-lived locking operations, like access to
control structures. They must not be used to protect code that makes
kernel calls or other heavy operations.

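A spin lock built on test-and-set with a sleep of increasing duration can be sketched as follows. The starting delay, the doubling, and the one-second cap are assumptions chosen for the illustration, and the `toy_` names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for nanosleep */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

typedef atomic_uchar toy_slock_t;

static void toy_sleep_us(long us)
{
    struct timespec ts;
    ts.tv_sec = us / 1000000L;
    ts.tv_nsec = (us % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/* Retry the exchange until it succeeds, sleeping longer after each failure.
 * Returns the number of failed attempts, for illustration only. */
static int toy_spin_lock(toy_slock_t *lock)
{
    long delay_us = 1000;         /* assumed starting delay */
    int failures = 0;

    while (atomic_exchange(lock, 1) != 0) {
        failures++;
        toy_sleep_us(delay_us);
        if (delay_us < 1000000L)
            delay_us *= 2;        /* sleep of increasing duration */
    }
    return failures;
}

static void toy_spin_unlock(toy_slock_t *lock)
{
    atomic_store(lock, 0);
}
```

Because a waiter may sleep and retry many times, the holder should release the lock after only a few instructions, which matches the advice above about keeping kernel calls out of spinlocked sections.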
Light Weight Locks
Sleep On Lock

[Figure: the shared memory layout shown on the Shared Memory slide, repeated]

Lightweight locks attempt to acquire the lock, and go to sleep on a
semaphore if the lock request fails. Spinlocks control access to the
lightweight lock control structure.

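The behavior described above can be sketched with a toy lock: a spinlock guards the control structure, and a failed request sleeps on a semaphore until a release wakes it. This is a simplified illustration, not PostgreSQL's LWLock code. The `Toy` names are hypothetical, the shared and exclusive modes are an assumption of the sketch, one POSIX semaphore per lock is used with all sleepers woken on release, and the semaphore is initialized for threads within one process.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef enum { TOY_LW_SHARED, TOY_LW_EXCLUSIVE } ToyLWMode;

typedef struct {
    atomic_flag mutex;      /* spinlock protecting the fields below */
    bool        exclusive;  /* held in exclusive mode? */
    int         shared;     /* number of shared holders */
    int         waiters;    /* sleepers to wake at the next release */
    sem_t       wakeup;     /* semaphore the sleepers wait on */
} ToyLWLock;

static void toy_lw_init(ToyLWLock *l)
{
    atomic_flag_clear(&l->mutex);
    l->exclusive = false;
    l->shared = 0;
    l->waiters = 0;
    sem_init(&l->wakeup, 0, 0);
}

/* One attempt. If it fails and will_sleep is set, register as a waiter
 * while still holding the spinlock, so that no wakeup can be missed. */
static bool toy_lw_try(ToyLWLock *l, ToyLWMode mode, bool will_sleep)
{
    bool ok;

    while (atomic_flag_test_and_set(&l->mutex))
        ;                                   /* spin: held only briefly */
    if (mode == TOY_LW_EXCLUSIVE)
        ok = !l->exclusive && l->shared == 0;
    else
        ok = !l->exclusive;
    if (ok) {
        if (mode == TOY_LW_EXCLUSIVE)
            l->exclusive = true;
        else
            l->shared++;
    } else if (will_sleep) {
        l->waiters++;
    }
    atomic_flag_clear(&l->mutex);
    return ok;
}

static void toy_lw_acquire(ToyLWLock *l, ToyLWMode mode)
{
    while (!toy_lw_try(l, mode, true))
        sem_wait(&l->wakeup);               /* sleep, then retry */
}

static void toy_lw_release(ToyLWLock *l, ToyLWMode mode)
{
    int to_wake;

    while (atomic_flag_test_and_set(&l->mutex))
        ;
    if (mode == TOY_LW_EXCLUSIVE)
        l->exclusive = false;
    else
        l->shared--;
    to_wake = l->waiters;
    l->waiters = 0;
    atomic_flag_clear(&l->mutex);

    while (to_wake-- > 0)
        sem_post(&l->wakeup);               /* wake every registered sleeper */
}
```

The spinlock is held only for the few instructions that inspect and update the counters; the potentially long wait happens on the semaphore, where the process sleeps without consuming CPU time.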
Database Object Locks



                PROC                    PROCLOCK            LOCK


                                                      Lock Hashes




Proc



                                  PROC

           empty    used     used   empty   used    empty




            Proc Array




Other Shared Memory Structures



             PROC                 Lightweight Locks    XLOG Buffers
             Proc Array           Lock Hashes          CLOG Buffers
                                  LOCK                 Subtrans Buffers
             Auto Vacuum          PROCLOCK             Two-Phase Structs
             Btree Vacuum                              Multi-XACT Buffers
             Free Space Map       Statistics
             Background Writer    Synchronized Scan    Shared Invalidation


             Buffer Descriptors

                                      Shared Buffers




                                       Semaphores
Conclusion




                                               Pink Floyd: Wish You Were Here
