Postgres-XC: Write-Scalable
    PostgreSQL Cluster


              Mason Sharp

           August 7th, 2012

  CC License: Attribution-NonCommercial-ShareAlike
Content Attribution
• Koichi Suzuki
• Michael Paquier
• Ashutosh Bapat
• Pavan Deolasee
• Mason Sharp
• ...?

Aug 7, 2012                         2
Who am I
    ●   Mason Sharp
    ●   Co-organizer of NYC PUG
    ●   Co-founder of StormDB
    ●   Previously worked at EnterpriseDB
    ●   Original architect of Stado (GridSQL)
    ●   One of the original architects of Postgres-XC




Aug 7, 2012                     Postgres-XC             3
PostgreSQL User Groups




        San Francisco                   New York
         616 Members                    502 Members



                                          New:
                                          Philadelphia
                                          Los Angeles
                        Tokyo
                        2000? Members

NYC PUG Meetup Membership




NYC PUG Speakers
    ●   Recent speakers include
         ●    Bruce Momjian
         ●    Greg Smith
         ●    Greg Stark
         ●    Joe Conway
         ●    Joachim Wieland




NYC PUG Speakers
                We want you!




Postgres-XC Talk
●   Background
●   Postgres-XC Introduction & Usage
●   Postgres-XC Components
●   Postgres-XC Details




Background




Data Tier Scaling
    ●   Up versus Out
         ●    More memory, more cores
    ●   Read-only Replicated Slaves
    ●   Caching
         ●    Memcached
    ●   Sharding
    ●   NoSQL
    ●   NewSQL




XC Origins




        Koichi Suzuki, NTT Data            Mason Sharp



PostgreSQL-Related Clustering
             Projects
    ●   pgpool-II
         ●    Read replicated slaves
    ●   PL/Proxy
         ●    Used by Skype, meetme (myYearbook)
         ●    All access is over a stored function
    ●   Postgres-R, PostgresForest
    ●   Stado (GridSQL)
         ●    Parallel Query
         ●    Not write-scalable
    ●   Can we make it write scalable?



Postgres-XC Introduction




Overview
    ●   PostgreSQL-based database cluster
         ●    Same API to Apps as PostgreSQL
               –   Same drivers
         ●    Currently based upon PG 9.1. Soon: 9.2.
    ●   Symmetric Multi-headed Cluster
         ●    No master, no slave
               –   Not just PostgreSQL replication.
               –   Application can read/write to any coordinator server
         ●    Consistent database view for all transactions
               –   Full ACID properties for all transactions in the cluster
    ●   Scales both for Write and Read

Postgres-XC Cluster
    Application can connect to any server to have the same database view and service.

    [Diagram: a row of PG-XC servers, each running a Coordinator and a Data Node. Add PG-XC servers as needed.
     The PG-XC servers communicate among themselves, and a Global Transaction Manager (GTM) serves the cluster.]
Read/Write Scalability
              DBT-1 throughput scalability




I
   Consistency
Is XC right for you?
    ●   I need write scalability
    ●   I like ACID
    ●   I like SQL
    ●   I don't want to rewrite my existing SQL
        applications
    ●   I want to leverage the PostgreSQL community
        for all of their contrib modules



Why XC may not be right for you
    ●   I need MPP parallel query capability
         ● Parallel query in XC is limited
         ● Try Stado: www.stado.us


    ●   I need a solution with built-in HA
    ●   I need massive scale and have loose
        consistency requirements
    ●   I would rather use a NoSQL solution so I can
        put it on my resume


Postgres-XC Components




Coordinator Overview
●
    Based on PostgreSQL 9.1 (9.2 soon)
●
    Accepts connections from clients
●
    Parses and plans requests
●
    Interacts with Global Transaction Manager
●
    Uses pooler for Data Node connections
●
    Sends down XIDs and snapshots to Data
    Nodes
●
    Collects results and returns to client
●
    Uses two phase commit if necessary

Data Node Overview
●
    Based on PostgreSQL 9.1 (9.2 soon)
●
    Where user created data is actually
    stored
●
    Coordinators (not clients) connect to
    Data Nodes
●
    Accepts XID and snapshots from
    Coordinator
●
    The rest is fairly similar to vanilla
    PostgreSQL


Global Transaction Manager



     [Diagram: the GTM supplies the cluster nodes with XIDs, snapshots, timestamps, and sequence values.]



Summary
   ●   Coordinator
        ●     Visible to apps
        ●     SQL analysis, planning, execution
        ●     Connection pooling
   ●   Datanode (or simply “NODE”)
        ●     Actual database store
        ●     Local SQL execution
   ●   Coordinator and Datanode: Postgres-XC core, based upon vanilla PostgreSQL
        ●     Share same binary
        ●     May want to colocate
   ●   GTM (Global Transaction Manager)
        ●     Provides consistent database view to transactions
               –   GXID (Global Transaction ID)
               –   Snapshot (List of active transactions)
               –   Other global values such as SEQUENCE
   ●   GTM Proxy: consolidates server-local transaction requests for performance
   ●   GTM and GTM Proxy: different binaries


Data Distribution


              Distribution Strategies




Distributing the data
 ●   Replicated table
      ●   Each row in the table is replicated to the datanodes
      ●   Statement based replication
 ●   Distributed table
      ●   Each row of the table is stored on one datanode,
          decided by one of the following strategies
              –   Hash
              –   Round Robin
              –   Modulo
              –   Range and user defined function (future)
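
The strategies above can be sketched as a function that maps one row to the datanode(s) storing it. This is an illustrative Python model, not Postgres-XC code: the node names, the `place_row` helper, and the use of CRC32 as a stand-in hash are all assumptions made for the example.

```python
import itertools
import zlib

NODES = ["dn1", "dn2", "dn3"]  # hypothetical datanode names

# Shared counter used by the round robin strategy.
_round_robin = itertools.cycle(range(len(NODES)))


def place_row(strategy, key=None):
    """Return the list of datanodes that would store one row."""
    if strategy == "replication":
        return list(NODES)                      # every node gets a copy
    if strategy == "hash":
        # Stand-in hash for illustration only.
        index = zlib.crc32(str(key).encode()) % len(NODES)
    elif strategy == "modulo":
        index = int(key) % len(NODES)           # integer key modulo node count
    elif strategy == "round robin":
        index = next(_round_robin)              # ignores the row contents
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [NODES[index]]
```

With hash or modulo, the same key always lands on the same node, which is what later makes it possible to route a lookup on the distribution column to a single datanode. Round robin spreads rows evenly but gives the coordinator no way to know where a given row lives.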

Table Distribution and Replication

 ●   Each table can be distributed or replicated
      ●   Strategy based on usage
              –   Transaction tables → Distributed
              –   Static lookup tables → Replicate
              –   Distribute parent-children together
      ●   Join pushdown when possible
      ●   Where clause pushdown
      ●   Simple parallel aggregates



Defining Tables
 ●   Table Distribution/Replication
      ●   CREATE TABLE tab (…) DISTRIBUTE BY
            HASH(col) | MODULO(col) | ROUND ROBIN | REPLICATION




Replicated Tables
 [Diagram: Writes: a write is applied to every copy of the table. Reads: a read is served by a single copy.
  Each of the three datanodes holds the same rows (val, val2): (1, 2), (2, 10), (3, 4).]
Distributed Tables
 [Diagram: Write: a write goes to the one datanode that stores the row. Read: reads go to all three datanodes and a Combiner merges the results.
  Each datanode holds different rows (val, val2):
    first datanode:  (1, 2), (2, 10), (3, 4)
    second datanode: (11, 21), (21, 101), (31, 41)
    third datanode:  (10, 20), (20, 100), (30, 40)]
Join Pushdown
              Hash/Modulo                 Round Robin                 Replicated
              distributed

Hash/Modulo   Inner join with equality    No                          Inner join if replicated
distributed   condition on the                                        table's distribution list
              distribution column with                                is superset of distributed
              same data type and same                                 table's distribution list
              distribution strategy

Round Robin   No                          No                          Inner join if replicated
                                                                      table's distribution list
                                                                      is superset of distributed
                                                                      table's distribution list

Replicated    Inner join if replicated    Inner join if replicated    All kinds of joins
              table's distribution list   table's distribution list
              is superset of distributed  is superset of distributed
              table's distribution list   table's distribution list
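
The pushdown matrix above can be read as a small decision function. The sketch below encodes only the rules stated in the table. The `Table` class, the `can_push_down` name, and its parameters are hypothetical and do not correspond to anything in the Postgres-XC planner.

```python
from dataclasses import dataclass
from typing import Optional

KEYED = {"hash", "modulo"}


@dataclass(frozen=True)
class Table:
    strategy: str                        # "hash", "modulo", "round robin" or "replicated"
    nodes: frozenset                     # datanodes in the table's distribution list
    dist_col_type: Optional[str] = None  # data type of the distribution column


def can_push_down(left, right, join_type="inner", equi_on_dist_cols=False):
    """Return True if the matrix above allows the join to run on the datanodes."""
    strategies = {left.strategy, right.strategy}
    if strategies == {"replicated"}:
        return True                      # replicated x replicated: all kinds of joins
    if "replicated" in strategies:
        rep, dist = (left, right) if left.strategy == "replicated" else (right, left)
        # The replicated table must be present wherever the distributed one is.
        return join_type == "inner" and rep.nodes >= dist.nodes
    if left.strategy in KEYED and right.strategy in KEYED:
        return (join_type == "inner"
                and equi_on_dist_cols
                and left.strategy == right.strategy
                and left.dist_col_type == right.dist_col_type)
    return False                         # any remaining pair involves round robin
```

For example, two tables hash-distributed on integer columns join on the datanodes only when the join is an inner equi-join on those columns, while a round robin table can only be pushed down against a replicated one.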
Constraints
 ●   XC does not support Global constraints – i.e.
     constraints across datanodes
 ●   Constraints within a datanode are supported
 Distribution strategy     Unique, primary key           Foreign key constraints
                           constraints

 Replicated                Supported                      Supported if the referenced
                                                          table is also replicated on
                                                          the same nodes
 Hash/Modulo distributed   Supported if primary OR        Supported if the referenced
                           unique key is distribution key table is replicated on same
                                                          nodes OR it's distributed by
                                                          primary key in the same
                                                          manner and same nodes
 Round Robin               Not supported                  Supported if the referenced
                                                          table is replicated on same
                                                          nodes

Demo




Transaction Management


          Why MVCC is Important for Consistency
               Global Transaction Manager



Multi-version Concurrency Control
          (MVCC) (quick overview)
 ●   Readers do not block writers
 ●   Writers do not block readers
 ●   Transaction Ids (XIDs)
      ●   Every transaction gets an ID
 ●   Snapshots contain a list of running XIDs




Multi-version Concurrency Control
          (MVCC) (quickly discussed)
     Example:
T1 Begin...
T2            Begin; INSERT...; Commit
T3                Begin...
T4                                          Begin; SELECT


 ●   T4's snapshot contains T1 and T3
      ●   T2 already committed
      ●   It can see T2's commits, but not T1's nor T3's
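
The visibility rule in this example can be sketched in a few lines. This is a deliberately simplified model for illustration: the `Snapshot` class and `is_visible` function are hypothetical, XIDs are plain integers (T1 is 1, T2 is 2, and so on), and details such as a transaction seeing its own changes are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    running: frozenset  # XIDs in progress when the snapshot was taken
    next_xid: int       # XIDs at or above this had not started yet


def is_visible(creator_xid, snapshot, committed):
    """Can a reader holding `snapshot` see a row written by `creator_xid`?"""
    if creator_xid >= snapshot.next_xid:
        return False                  # writer started after the snapshot
    if creator_xid in snapshot.running:
        return False                  # writer was still open at snapshot time
    return creator_xid in committed   # finished earlier: visible only if committed


# The slide's example: T1 and T3 are open, T2 has committed, T4 takes a snapshot.
snap_t4 = Snapshot(running=frozenset({1, 3}), next_xid=5)
committed = {2}
```

Note that if T1 commits after T4 took its snapshot, `is_visible(1, snap_t4, {1, 2})` is still false: T1 is listed as running in the snapshot, so T4 keeps a stable view.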

Multi-version Concurrency Control
          (MVCC) on 2 Independent Nodes
     Example:
T1 Begin...
T2            Begin; INSERT..;   Commit;
T3               Begin...
T4                          Begin; SELECT

 ●   Node 1: T2 Commit, T4 SELECT
 ●   Node 2: T4 SELECT, T2 Commit
 ●   T4's SELECT statement returns inconsistent data
      ●   Includes data from Node1, but not Node2.
      ●   C in ACID Fails
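
The anomaly can be reproduced with a toy model, and the same model shows why one cluster-wide snapshot removes it. Everything here is an assumption made for illustration (the `read` helper, the row values, representing a snapshot as a set of running XIDs); it is not how the datanodes are implemented.

```python
def read(node_rows, running_at_snapshot, committed_on_node):
    """Rows a reader sees on one node: the writer has committed there and
    is not listed as running in the reader's snapshot."""
    return [value for xid, value in node_rows
            if xid in committed_on_node and xid not in running_at_snapshot]


T2 = 2
node1_rows = [(T2, "row-a")]  # T2 inserted one row on each node
node2_rows = [(T2, "row-b")]

# Independent nodes: each builds its own snapshot when T4's SELECT arrives.
# Node 1 sees the SELECT after T2's commit, node 2 sees it before.
local_node1 = read(node1_rows, running_at_snapshot=set(), committed_on_node={T2})
local_node2 = read(node2_rows, running_at_snapshot={T2}, committed_on_node=set())

# One global snapshot, taken while T2 was still running, sent to both nodes.
# Node 1 has already committed T2 locally, but the snapshot still hides it.
global_snapshot = {T2}
global_node1 = read(node1_rows, global_snapshot, committed_on_node={T2})
global_node2 = read(node2_rows, global_snapshot, committed_on_node=set())
```

With local snapshots T4 sees `row-a` but not `row-b`, which is half of T2's work. With the shared snapshot both nodes give the same answer.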

Global Transaction Manager
          (GTM)
   ●   Provides Global Transaction Consistency



     [Diagram: the GTM supplies the cluster nodes with XIDs, snapshots, timestamps, and sequence values.]



Transaction Management
●   2PC is used to guarantee transactional consistency
    across nodes
    ●   When more than one node is involved OR
    ●   When there are explicit 2PC transactions
●   Only those nodes where write activity has happened
    participate in 2PC
●   In PostgreSQL, 2PC cannot be applied if temporary
    tables are involved. The same restriction applies in
    Postgres-XC
●   When a single coordinator command needs multiple
    datanode commands, they are wrapped in a transaction
    block
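
The commit rules above can be sketched as follows. `PREPARE TRANSACTION`, `COMMIT PREPARED`, and `ROLLBACK PREPARED` are standard PostgreSQL commands, but the `FakeNode` class and the `finish` function are hypothetical stand-ins that only record what a coordinator might send. Real error handling and recovery are out of scope.

```python
class FakeNode:
    """Stand-in for a datanode connection; records the commands it receives."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.log = []

    def send(self, command):
        self.log.append(command)
        # In this sketch only PREPARE can fail.
        return self.can_prepare or not command.startswith("PREPARE")


def finish(write_nodes, gid, explicit_2pc=False):
    """Commit a transaction on the nodes that saw write activity."""
    if len(write_nodes) <= 1 and not explicit_2pc:
        for node in write_nodes:
            node.send("COMMIT")              # single writer: plain commit
        return "committed"
    prepared = []
    for node in write_nodes:                 # phase 1: prepare everywhere
        if node.send(f"PREPARE TRANSACTION '{gid}'"):
            prepared.append(node)
            continue
        for done in prepared:                # one refusal undoes everything
            done.send(f"ROLLBACK PREPARED '{gid}'")
        for other in write_nodes:
            if other not in prepared:
                other.send("ROLLBACK")
        return "aborted"
    for node in prepared:                    # phase 2: commit everywhere
        node.send(f"COMMIT PREPARED '{gid}'")
    return "committed"
```

Read-only participants are simply left out of `write_nodes`, which mirrors the point that only nodes with write activity take part in 2PC.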
Postgres-XC Considerations




Can GTM be a Performance Bottleneck?
 • Depending on implementation
      – Current implementation
           [Diagram: on the GTM, the GTM Main Thread creates and terminates GTM Threads, which lock the shared Snapshot Data. On each Coordinator, the Coordinator Backend calls the GTM through the Client Library over an Internet domain socket.]
           Applicable up to five PG-XC servers (DBT-1)
      – Large snapshot size and number
      – Too many interactions between GTM and Coordinators
Can GTM be a Performance Bottleneck?
 • Proxy implementation
      [Diagram: on the GTM, the GTM Main Thread creates and terminates GTM Worker Threads, which lock the shared Snapshot Data. A GTM Snapshot Handler and a GTM Server Scanner are also shown. On the Coordinator side, the Proxy Main Thread creates and terminates GTM Proxy Threads and handles connection assignment. The proxy contains a Server Protocol Handler, a Command Backend Handler, and a Response Backend Handler. The proxy reaches the GTM over an Internet domain socket, and Coordinator Backends call the proxy through the Client Library over a Unix domain socket.]

•Request/Response grouping
•Single representative snapshot applied to multiple transactions
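
The two ideas above, grouping requests and sharing one snapshot, can be sketched with a toy proxy. All class and method names here (`FakeGTM`, `Proxy`, `handle_batch`, `request_begin`, `flush`) are hypothetical and exist only for this example. The real GTM protocol is not modelled.

```python
class FakeGTM:
    """Toy transaction manager that counts how often it is contacted."""

    def __init__(self, first_gxid=100):
        self.next_gxid = first_gxid
        self.running = set()
        self.round_trips = 0

    def handle_batch(self, count):
        """One round trip: allocate `count` GXIDs and build one snapshot."""
        self.round_trips += 1
        gxids = list(range(self.next_gxid, self.next_gxid + count))
        self.next_gxid += count
        self.running.update(gxids)
        return gxids, frozenset(self.running)


class Proxy:
    """Collects begin requests from local backends and forwards them together."""

    def __init__(self, gtm):
        self.gtm = gtm
        self.pending = []

    def request_begin(self, backend_id):
        self.pending.append(backend_id)

    def flush(self):
        if not self.pending:
            return {}
        gxids, snapshot = self.gtm.handle_batch(len(self.pending))
        # Every transaction in the group receives the same snapshot.
        replies = {b: (x, snapshot) for b, x in zip(self.pending, gxids)}
        self.pending = []
        return replies
```

Three backends starting transactions at about the same time cost one GTM round trip instead of three, and the GTM builds one snapshot instead of three.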


Can GTM be a SPOF?
• Implement GTM Standby
     – The GTM Master checkpoints the next starting point (GXID and Sequence) to the GTM Standby
     – The standby can take over from the master without referring to GTM master information
Parallel Query
 ●   OK for simple queries
     ●   Also when all joins can be pushed down
              –   Star schema with replicated dimensions
 ●   Even aggregates
     ●   SELECT SUM(col1) FROM tab1
 ●   Performs poorly if a cross-node join is needed
     ●   Data on one node needs to join with another
     ●   Ships all data to coordinator for joining
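
The `SUM` case can be sketched as a two-level computation: each datanode reduces its own rows and the coordinator only combines the partial results. The node names, sample values, and function name below are made up for illustration, and threads stand in for datanodes working in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical col1 values of tab1, split across three datanodes.
NODE_ROWS = {
    "dn1": [2, 10, 4],
    "dn2": [21, 101, 41],
    "dn3": [20, 100, 40],
}


def coordinator_sum(node_rows):
    """SELECT SUM(col1) FROM tab1, pushed down as one partial sum per node."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(sum, node_rows.values()))  # runs on each "node"
    return sum(partials), partials                          # combine step


total, partials = coordinator_sum(NODE_ROWS)
```

Only three numbers travel to the coordinator here, one per node. A cross-node join has no such shortcut in this model: the coordinator would need the rows themselves, which is the "ships all data" case above.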



High Availability
 ●   GTM-standby provides basic HA
 ●   No native HA for nodes
      ●   Use HA middleware such as Pacemaker
 ●   Each data node should be configured with
     synchronous replication




Status



              Settings and options



Present Status
 ●   Project/Developer site
      ●   http://postgres-xc.sourceforge.net/
      ●   http://sourceforge.net/projects/postgres-xc/
 ●   Version 1.0 available
      ●   Base PostgreSQL version: 9.1
      ●   Soon, PostgreSQL 9.2!
              –   Group commit: even more write scalability
              –   “Index-only Scans”
 ●   Get Involved
      ●   Even as just a tester
Easy way of trying it out?
 ●   www.stormdb.com
      ●   Not Postgres-XC, but similar
      ●   Nothing to install, cloud hosted
      ●   Free beta




Thank You


              mason@stormdb.com
              Twitter: mason_db




More Related Content

PDF
Postgres-XC as a Key Value Store Compared To MongoDB
PDF
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
PDF
Introduction to Postrges-XC
PDF
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
PDF
Postgres-XC: Symmetric PostgreSQL Cluster
PDF
Geographically Distributed PostgreSQL
PDF
Bn 1016 demo postgre sql-online-training
PDF
Postgres clusters
Postgres-XC as a Key Value Store Compared To MongoDB
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Introduction to Postrges-XC
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Postgres-XC: Symmetric PostgreSQL Cluster
Geographically Distributed PostgreSQL
Bn 1016 demo postgre sql-online-training
Postgres clusters

What's hot (20)

PDF
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
PDF
Managing terabytes: When Postgres gets big
PPTX
Overview of some popular distributed databases
PDF
Red Hat Gluster Storage - Direction, Roadmap and Use-Cases
PDF
GlusterFS And Big Data
PDF
Gluster.community.day.2013
ODP
Gluster Data Tiering
PDF
Red Hat Storage - Introduction to GlusterFS
PDF
The Future of GlusterFS and Gluster.org
PDF
Storage as a Service with Gluster
ODP
Glusterfs and Hadoop
PPTX
Gluster Storage
ODP
Tiering barcelona
PPTX
Getting started with postgresql
ODP
Sdc challenges-2012
PDF
Gluster overview & future directions vault 2015
PDF
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
PDF
Introduction to GlusterFS Webinar - September 2011
PDF
SQL, NoSQL, NewSQL? What's a developer to do?
PPTX
CockroachDB
PGConf.ASIA 2019 Bali - Keynote Speech 2 - Ivan Pachenko
Managing terabytes: When Postgres gets big
Overview of some popular distributed databases
Red Hat Gluster Storage - Direction, Roadmap and Use-Cases
GlusterFS And Big Data
Gluster.community.day.2013
Gluster Data Tiering
Red Hat Storage - Introduction to GlusterFS
The Future of GlusterFS and Gluster.org
Storage as a Service with Gluster
Glusterfs and Hadoop
Gluster Storage
Tiering barcelona
Getting started with postgresql
Sdc challenges-2012
Gluster overview & future directions vault 2015
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Introduction to GlusterFS Webinar - September 2011
SQL, NoSQL, NewSQL? What's a developer to do?
CockroachDB
Ad

Viewers also liked (20)

PDF
Distributed Postgres
PDF
Postgres in Amazon RDS
PDF
Scaling postgres
PDF
Pgxc scalability pg_open2012
ODP
Replicação PostgreSQL com RepManager
PDF
Flexible Indexing with Postgres
 
PDF
How the Postgres Query Optimizer Works
 
ODP
An Introduction to Postgresql
PDF
Overview of Postgres 9.5
 
PDF
1 introduction
PDF
4 the sql_standard
PDF
6 relational schema_design
PPT
Best Practices for Database Schema Design
PDF
5 data storage_and_indexing
PPTX
Managing your tech career
PDF
Multimaster
PPTX
Webinar: Build an Application Series - Session 2 - Getting Started
PDF
3 relational model
Distributed Postgres
Postgres in Amazon RDS
Scaling postgres
Pgxc scalability pg_open2012
Replicação PostgreSQL com RepManager
Flexible Indexing with Postgres
 
How the Postgres Query Optimizer Works
 
An Introduction to Postgresql
Overview of Postgres 9.5
 
1 introduction
4 the sql_standard
6 relational schema_design
Best Practices for Database Schema Design
5 data storage_and_indexing
Managing your tech career
Multimaster
Webinar: Build an Application Series - Session 2 - Getting Started
3 relational model
Ad

Similar to Postgres-XC Write Scalable PostgreSQL Cluster (20)

PDF
JDG 7 & Spark Integration
PDF
Clusters With Glusterfs
ODP
Geospatial web services using little-known GDAL features and modern Perl midd...
PDF
The Accidental DBA
PDF
Elephant Roads: a tour of Postgres forks
PDF
Elephant Roads: PostgreSQL Patches and Variants
PDF
NoSQL solutions
PDF
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
PDF
NoSQL Databases
PDF
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
PDF
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
PDF
Secrets of Spark's success - Deenar Toraskar, Think Reactive
PDF
kranonit S06E01 Игорь Цинько: High load
PDF
MySQL X protocol - Talking to MySQL Directly over the Wire
PDF
Linuxtag.ceph.talk
PDF
DevEx | there’s no place like k3s
PDF
Blackray @ SAPO CodeBits 2009
PDF
High performance json- postgre sql vs. mongodb
PDF
Node.js scaling in highload
PDF
PUGS Meetup Presentation - 11062015
JDG 7 & Spark Integration
Clusters With Glusterfs
Geospatial web services using little-known GDAL features and modern Perl midd...
The Accidental DBA
Elephant Roads: a tour of Postgres forks
Elephant Roads: PostgreSQL Patches and Variants
NoSQL solutions
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
NoSQL Databases
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Secrets of Spark's success - Deenar Toraskar, Think Reactive
kranonit S06E01 Игорь Цинько: High load
MySQL X protocol - Talking to MySQL Directly over the Wire
Linuxtag.ceph.talk
DevEx | there’s no place like k3s
Blackray @ SAPO CodeBits 2009
High performance json- postgre sql vs. mongodb
Node.js scaling in highload
PUGS Meetup Presentation - 11062015

Recently uploaded (20)

PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Approach and Philosophy of On baking technology
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Modernizing your data center with Dell and AMD
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
cuic standard and advanced reporting.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
NewMind AI Monthly Chronicles - July 2025
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Empathic Computing: Creating Shared Understanding
PPTX
MYSQL Presentation for SQL database connectivity
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Encapsulation theory and applications.pdf
PPTX
Big Data Technologies - Introduction.pptx
PPTX
Understanding_Digital_Forensics_Presentation.pptx
The Rise and Fall of 3GPP – Time for a Sabbatical?
Approach and Philosophy of On baking technology
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Modernizing your data center with Dell and AMD
Advanced methodologies resolving dimensionality complications for autism neur...
Building Integrated photovoltaic BIPV_UPV.pdf
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
20250228 LYD VKU AI Blended-Learning.pptx
cuic standard and advanced reporting.pdf
The AUB Centre for AI in Media Proposal.docx
NewMind AI Monthly Chronicles - July 2025
Chapter 3 Spatial Domain Image Processing.pdf
Empathic Computing: Creating Shared Understanding
MYSQL Presentation for SQL database connectivity
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Encapsulation theory and applications.pdf
Big Data Technologies - Introduction.pptx
Understanding_Digital_Forensics_Presentation.pptx

Postgres-XC Write Scalable PostgreSQL Cluster

  • 1. Postgres-XC: Write-Scalable PostgreSQL Cluster Mason Sharp August 7th, 2012 CC License: Attribution-NonCommercial-ShareAlike
  • 2. Content Attribution • Koichi Suzuki • Michael Paquier • Ashutosh Bapat • Pavan Deolasee • Mason Sharp • ...? Aug 7, 2012 2
  • 3. Who am I ● Mason Sharp ● Co-organizer of NYC PUG ● Co-founder of StormDB ● Previously worked at EnterpriseDB ● Original architect of Stado (GridSQL) ● One of the original architects of Postgres-XC Aug 7, 2012 Postgres-XC 3
  • 4. PostgreSQL User Groups San Francisco New York 616 Members 502 Members New: Philadelphia Los Angeles Tokyo 2000? Members Aug 7, 2012 Postgres-XC 4
  • 5. NYC PUG Meetup Membership Aug 7, 2012 Postgres-XC 5
  • 6. NYC PUG Speakers ● Recent speakers include ● Bruce Momjian ● Greg Smith ● Greg Stark ● Joe Conway ● Joachim Wieland Aug 7, 2012 Postgres-XC 6
  • 7. NYC PUG Speakers We want you! Aug 7, 2012 Postgres-XC 7
  • 8. Postges-XC Talk ● Background ● Postgres-XC Introduction & Usage ● Postgres-XC Components ● Postgres-XC Details 8
  • 9. Background
  • 10. Data Tier Scaling ● Up versus Out ● More memory, more cores ● Read-only Replicated Slaves ● Caching ● Memcached ● Sharding ● NoSQL ● NewSQL
  • 11. XC Origins Koichi Suzuki, NTT Data Mason Sharp
  • 12. PostgreSQL-Related Clustering Projects ● pgpool-II ● Read replicated slaves ● PL/Proxy ● Used by Skype, meetme (myYearbook) ● All access is over a stored function ● Postgres-R, PostgresForest ● Stado (GridSQL) ● Parallel Query ● Not write-scalable ● Can we make it write scalable?
  • 13. Postgres-XC Introduction
  • 14. Overview ● PostgreSQL-based database cluster ● Same API to Apps as PostgreSQL – Same drivers ● Currently based upon PG 9.1. Soon: 9.2. ● Symmetric Multi-headed Cluster ● No master, no slave – Not just PostgreSQL replication. – Application can read/write to any coordinator server ● Consistent database view to all the transactions – Complete ACID property to all the transactions in the cluster ● Scales both for Write and Read
  • 15. Postgres-XC Cluster ● Application can connect to any server to have the same database view and service ● [Figure: several PG-XC servers, each labeled Coordinator and Data Node, with communication among PG-XC servers and a Global Transaction Manager (GTM)] ● Add PG-XC servers as needed
  • 16. Read/Write Scalability DBT-1 throughput scalability
  • 17. I Consistency
  • 18. Is XC right for you? ● I need write scalability ● I like ACID ● I like SQL ● I don't want to rewrite my existing SQL applications ● I want to leverage the PostgreSQL community for all of their contrib modules
  • 19. Why XC may not be right for you ● I need MPP parallel query capability ● Parallel Query in XC Limited ● Try Stado: www.stado.us ● I need a solution with built-in HA ● I need massive scale and have loose consistency requirements ● I would rather use a NoSQL solution so I can put it on my resume
  • 20. Postgres-XC Components
  • 21. Aug 7, 2012 Postgres-XC 21
  • 22. Coordinator Overview ● Based on PostgreSQL 9.1 (9.2 soon) ● Accepts connections from clients ● Parses and plans requests ● Interacts with Global Transaction Manager ● Uses pooler for Data Node connections ● Sends down XIDs and snapshots to Data Nodes ● Collects results and returns to client ● Uses two phase commit if necessary
  • 23. Data Node Overview ● Based on PostgreSQL 9.1 (9.2 soon) ● Where user created data is actually stored ● Coordinators (not clients) connect to Data Nodes ● Accepts XID and snapshots from Coordinator ● The rest is fairly similar to vanilla PostgreSQL
  • 24. Global Transaction Manager GTM Cluster nodes XID Snapshot Timestamp Sequence values
  • 25. Summary ● Coordinator ● Visible to apps ● SQL analysis, planning, execution ● Connection pooling ● Datanode (or simply “NODE”) ● Actual database store ● Local SQL execution ● Coordinator and Datanode: Postgres-XC core, based upon vanilla PostgreSQL; share same binary; may want to colocate ● GTM (Global Transaction Manager) ● Provides consistent database view to transactions – GXID (Global Transaction ID) – Snapshot (List of active transactions) – Other global values such as SEQUENCE ● GTM Proxy, integrates server-local transaction requirement for performance ● GTM and GTM Proxy: different binaries
  • 26. Data Distribution Distribution Strategies
  • 27. Distributing the data ● Replicated table ● Each row in the table is replicated to the datanodes ● Statement based replication ● Distributed table ● Each row of the table is stored on one datanode, decided by one of the following strategies – Hash – Round Robin – Modulo – Range and user defined function (future)
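The distribution strategies on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Postgres-XC's implementation: the node names are made up, and `zlib.crc32` is an assumed stand-in for whatever hash function the server really uses.

```python
import zlib
from itertools import count

# Hypothetical datanode names; a real cluster defines its own.
NODES = ["dn1", "dn2", "dn3"]


def modulo_node(key: int, nodes=NODES) -> str:
    """Modulo distribution: integer key modulo the number of nodes."""
    return nodes[key % len(nodes)]


def hash_node(key, nodes=NODES) -> str:
    """Hash distribution: hash the key, then take the modulo.

    zlib.crc32 is an assumption used only to get a stable hash here.
    """
    digest = zlib.crc32(repr(key).encode("utf-8"))
    return nodes[digest % len(nodes)]


def round_robin(nodes=NODES):
    """Round robin: rows go to nodes in turn, regardless of their content."""
    counter = count()

    def next_node() -> str:
        return nodes[next(counter) % len(nodes)]

    return next_node


def replicated_nodes(nodes=NODES) -> list:
    """Replicated table: every row is written to every datanode."""
    return list(nodes)
```

Hash and modulo placement depend only on the key, so a lookup by key can go to a single node. Round robin spreads rows evenly but gives the planner no way to locate a row by value.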
  • 28. Table Distribution and Replication ● Each table can be distributed or replicated ● Strategy based on usage – Transaction tables → Distributed – Static lookup tables → Replicate – Distribute parent-children together ● Join pushdown when possible ● Where clause pushdown ● Simple parallel aggregates
  • 29. Defining Tables ● Table Distribution/Replication ● CREATE TABLE tab (…) DISTRIBUTE BY HASH(col) | MODULO(col) | ROUND ROBIN | REPLICATION
  • 30. Replicated Tables [Figure: Reads and Writes panels over three copies of a table; a read uses one copy, a write goes to every copy. Each copy holds the same (val, val2) rows: (1, 2), (2, 10), (3, 4)]
  • 31. Distributed Tables [Figure: Write and Read panels over three nodes; a write goes to one node, a read goes to all nodes and the results pass through a Combiner. Each node holds a different set of (val, val2) rows]
  • 32. Join Pushdown ● Hash/Modulo distributed with Hash/Modulo distributed: Inner join with equality condition on the distribution column with same data type and same distribution strategy ● Hash/Modulo distributed with Round Robin: No ● Hash/Modulo distributed with Replicated: Inner join if replicated table's distribution list is superset of distributed table's distribution list ● Round Robin with Round Robin: No ● Round Robin with Replicated: Inner join if replicated table's distribution list is superset of distributed table's distribution list ● Replicated with Replicated: All kinds of joins
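The join pushdown matrix can be read as a small decision function. The sketch below encodes only what the slide states; the `Table` class, its field names, and `can_push_down` are hypothetical names introduced for illustration, and "distribution list" is assumed to mean the set of nodes a table lives on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical description of how a table is spread over datanodes."""
    strategy: str            # "hash", "modulo", "round_robin" or "replicated"
    nodes: frozenset         # the table's distribution list (node names)
    dist_col_type: str = ""  # data type of the distribution column, if any


def can_push_down(left: Table, right: Table, join_type: str = "inner",
                  equi_on_dist_cols: bool = False) -> bool:
    """Return True if the slide's matrix allows the join to run on the datanodes."""
    strategies = {left.strategy, right.strategy}

    # Replicated with replicated: all kinds of joins.
    if strategies == {"replicated"}:
        return True

    # Every other supported combination is limited to inner joins.
    if join_type != "inner":
        return False

    # Replicated with distributed: the replicated table must cover
    # every node that holds part of the distributed table.
    if "replicated" in strategies:
        rep, dist = (left, right) if left.strategy == "replicated" else (right, left)
        return rep.nodes >= dist.nodes

    # Round robin with any other distributed table: no pushdown.
    if "round_robin" in strategies:
        return False

    # Hash/modulo with hash/modulo: equality join on the distribution
    # column, same data type, same distribution strategy.
    return (equi_on_dist_cols
            and left.strategy == right.strategy
            and left.dist_col_type == right.dist_col_type)
```

For example, two tables hashed on an `int` column can be joined on the datanodes only when the join condition equates their distribution columns; otherwise the rows have to meet at the coordinator.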
  • 33. Constraints ● XC does not support Global constraints – i.e. constraints across datanodes ● Constraints within a datanode are supported ● Replicated – Unique, primary key constraints: Supported – Foreign key constraints: Supported if the referenced table is also replicated on the same nodes ● Hash/Modulo distributed – Unique, primary key constraints: Supported if primary OR unique key is distribution key – Foreign key constraints: Supported if the referenced table is replicated on same nodes OR it's distributed by primary key in the same manner and same nodes ● Round Robin – Unique, primary key constraints: Not supported – Foreign key constraints: Supported if the referenced table is replicated on same nodes
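The constraint rules follow from one idea: a constraint is enforceable only if a single datanode can check it locally. A rough Python sketch of the table, with `TableDef` and both function names being hypothetical helpers rather than anything in Postgres-XC:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableDef:
    """Hypothetical table description used only for this sketch."""
    strategy: str            # "hash", "modulo", "round_robin" or "replicated"
    nodes: frozenset         # nodes the table is stored on
    dist_col: str = ""       # distribution column, if any
    primary_key: tuple = ()  # primary key columns


def unique_supported(table: TableDef, key_cols) -> bool:
    """Unique / primary key rule from the slide."""
    if table.strategy == "replicated":
        return True
    if table.strategy in ("hash", "modulo"):
        # Supported if the key is the distribution key.
        return tuple(key_cols) == (table.dist_col,)
    return False  # round robin: not supported


def foreign_key_supported(child: TableDef, parent: TableDef) -> bool:
    """Foreign key rule from the slide; `parent` is the referenced table."""
    same_nodes = child.nodes == parent.nodes
    if parent.strategy == "replicated":
        return same_nodes
    if child.strategy in ("hash", "modulo"):
        # Referenced table distributed by its primary key,
        # in the same manner and on the same nodes.
        return (parent.strategy == child.strategy
                and same_nodes
                and parent.primary_key == (parent.dist_col,))
    return False
```

With hash distribution on the key column, two rows with the same key always land on the same node, so that node's local unique index is enough. Round robin gives no such guarantee, which is why uniqueness cannot be enforced there.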
  • 34. Demo
  • 35. Transaction Management Why MVCC is Important for Consistency Global Transaction Manager
  • 36. Multi-version Concurrency Control (MVCC) (quick overview) ● Readers do not block writers ● Writers do not block readers ● Transaction Ids (XIDs) ● Every transaction gets an ID ● Snapshots contain a list of running XIDs
  • 37. Multi-version Concurrency Control (MVCC) (quickly discussed) Example: T1 Begin... T2 Begin; INSERT...; Commit T3 Begin... T4 Begin; SELECT ● T4's snapshot contains T1 and T3 ● T2 already committed ● It can see T2's commits, but not T1's nor T3's
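The T1 to T4 example can be replayed with a deliberately simplified visibility check. Real PostgreSQL snapshots also carry xmin/xmax bounds and handle sub-transactions, which this sketch ignores; `is_visible` is a hypothetical helper, not a PostgreSQL function.

```python
def is_visible(writer_xid: int, running_at_snapshot: frozenset, committed: set) -> bool:
    """A row is visible if its writer committed and was not still running
    when the reader took its snapshot (simplified rule)."""
    return writer_xid in committed and writer_xid not in running_at_snapshot


# State when T4 runs its SELECT: T1 and T3 are still open, T2 has committed.
committed = {2}
t4_snapshot = frozenset({1, 3})  # the list of running XIDs in T4's snapshot

for xid in (1, 2, 3):
    print(f"T4 sees rows written by T{xid}:", is_visible(xid, t4_snapshot, committed))
```

Only T2's rows pass the check, which matches the slide: T4 can see T2's commits but not the work of T1 or T3.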
  • 38. Multi-version Concurrency Control (MVCC) on 2 Independent Nodes Example: T1 Begin... T2 Begin; INSERT..; Commit; T3 Begin... T4 Begin; SELECT ● Node 1: T2 Commit, T4 SELECT ● Node 2: T4 SELECT, T2 Commit ● T4's SELECT statement returns inconsistent data ● Includes data from Node1, but not Node2. ● C in ACID Fails
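The two-node anomaly can be reproduced with a toy model. The `Node` class and `run` function below are hypothetical; the sketch assumes that, with a global snapshot, T2 stays in the running list until its commit has finished on every node, so T4 treats it as in progress everywhere.

```python
class Node:
    """Toy datanode with its own local view of running/committed XIDs."""

    def __init__(self):
        self.rows = []          # (writer_xid, value)
        self.running = set()
        self.committed = set()

    def begin(self, xid):
        self.running.add(xid)

    def insert(self, xid, value):
        self.rows.append((xid, value))

    def commit(self, xid):
        self.running.discard(xid)
        self.committed.add(xid)

    def local_snapshot(self):
        return frozenset(self.running)

    def select(self, snapshot_running):
        return [value for xid, value in self.rows
                if xid in self.committed and xid not in snapshot_running]


def run(global_snapshot=None):
    """T2 inserts on both nodes; T4 reads while T2's commit is in flight."""
    node1, node2 = Node(), Node()
    for node, value in ((node1, "row on node 1"), (node2, "row on node 2")):
        node.begin(2)
        node.insert(2, value)

    node1.commit(2)  # Node 1: T2 Commit, then T4 SELECT
    snap1 = global_snapshot if global_snapshot is not None else node1.local_snapshot()
    seen1 = node1.select(snap1)

    snap2 = global_snapshot if global_snapshot is not None else node2.local_snapshot()
    seen2 = node2.select(snap2)  # Node 2: T4 SELECT, then T2 Commit
    node2.commit(2)
    return seen1, seen2


print("independent snapshots:", run())
print("one global snapshot:  ", run(global_snapshot=frozenset({2})))
```

With independent snapshots T4 gets the row from node 1 only. With one snapshot shared by both nodes it sees T2's work on neither node, so the result is at least consistent.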
  • 39. Global Transaction Manager (GTM) ● Provides Global Transaction Consistency GTM Cluster nodes XID Snapshot Timestamp Sequence values
  • 40. Transaction Management ● 2PC is used to guarantee transactional consistency across nodes ● When more than one node is involved OR ● When there are explicit 2PC transactions ● Only those nodes where write activity has happened participate in 2PC ● In PostgreSQL, 2PC cannot be applied if temporary tables are involved. The same restriction applies in Postgres-XC ● When a single coordinator command needs multiple datanode commands, we encase those in a transaction block
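The commit rule on this slide can be sketched as follows. `DataNode` and `commit_transaction` are hypothetical names; the method names only echo PostgreSQL's PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED statements, and error handling is reduced to a flag that makes a prepare fail.

```python
class DataNode:
    """Toy datanode that records the commit protocol steps it receives."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.log = []

    def prepare(self, gxid) -> bool:
        if self.fail_prepare:
            self.log.append(("abort", gxid))
            return False
        self.log.append(("prepare", gxid))
        return True

    def commit_prepared(self, gxid):
        self.log.append(("commit prepared", gxid))

    def rollback_prepared(self, gxid):
        self.log.append(("rollback prepared", gxid))

    def commit(self, gxid):
        self.log.append(("commit", gxid))


def commit_transaction(gxid, written_nodes, explicit_2pc=False) -> str:
    """Commit on the nodes that were written to; read-only nodes are left out."""
    # A single written node needs no 2PC unless the client asked for it.
    if len(written_nodes) <= 1 and not explicit_2pc:
        for node in written_nodes:
            node.commit(gxid)
        return "committed"

    # Phase 1: every participant must prepare successfully.
    results = [(node, node.prepare(gxid)) for node in written_nodes]
    if all(ok for _, ok in results):
        # Phase 2: commit everywhere.
        for node, _ in results:
            node.commit_prepared(gxid)
        return "committed"

    # Any failure: undo the nodes that did prepare.
    for node, ok in results:
        if ok:
            node.rollback_prepared(gxid)
    return "aborted"
```

Skipping the prepare phase for single-node writes avoids an extra round trip, which matters because distributing tables well makes single-node transactions the common case.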
  • 41. Postgres-XC Considerations
  • 42. Can GTM be a Performance Bottleneck? • Depending on implementation – Current Implementation [Figure labels: Coordinators, Coordinator Backend, Client Library, GTM, GTM Threads, GTM Main Thread (create/terminate), Snapshot Data, Lock] – Applicable up to five PG-XC servers (DBT-1) – Large snapshot size and number – Too many interactions between GTM and Coordinators July 12th, 2012
  • 43. Can GTM be a Performance Bottleneck? Proxy Implementation [Figure labels: Coordinators, Coordinator Backend, Client Library, Unix Domain Socket, GTM Proxy Thread, Proxy Main Thread, GTM, GTM Main Thread, GTM Worker Threads, Snapshot Data, Lock] • Request/Response grouping • Single representative snapshot applied to multiple transactions
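The two proxy ideas, request grouping and one representative snapshot for several transactions, can be sketched with a toy model. `Gtm`, `GtmProxy` and their methods are hypothetical names for illustration; the real GTM protocol is not shown on the slides.

```python
class Gtm:
    """Toy GTM that counts how many requests reach it."""

    def __init__(self):
        self.running = set()  # globally running transaction IDs
        self.requests = 0

    def get_snapshot(self) -> frozenset:
        self.requests += 1
        return frozenset(self.running)


class GtmProxy:
    """Collects snapshot requests from local backends and answers them
    all with a single snapshot fetched from the GTM."""

    def __init__(self, gtm: Gtm):
        self.gtm = gtm
        self.pending = []

    def request_snapshot(self, backend_id):
        self.pending.append(backend_id)

    def flush(self) -> dict:
        if not self.pending:
            return {}
        snapshot = self.gtm.get_snapshot()  # one round trip for the whole group
        replies = {backend: snapshot for backend in self.pending}
        self.pending.clear()
        return replies


gtm = Gtm()
gtm.running = {7, 9}
proxy = GtmProxy(gtm)
for backend in ("backend-a", "backend-b", "backend-c"):
    proxy.request_snapshot(backend)
replies = proxy.flush()
print(len(replies), "backends served with", gtm.requests, "GTM request")
```

Three backends are served with one GTM request here, which is the reduction in GTM traffic the slide is after.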
  • 44. Can GTM be a SPOF? • Implement GTM Standby • GTM Master sends the checkpoint next starting point (GXID and Sequence) to GTM Standby • Standby can take over from the master without referring to GTM master information.
  • 45. Parallel Query ● OK for simple queries ● Also when all joins can be pushed down – Star schema with replicated dimensions ● Even aggregates ● SELECT SUM(col1) FROM tab1 ● Performs poorly if a cross-node join is needed ● Data on one node needs to join with another ● Ships all data to coordinator for joining
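The simple aggregate case works because SUM decomposes: each datanode sums its own rows and the coordinator adds the partial results. A minimal sketch, with made-up node names and data, and hypothetical function names:

```python
def partial_sum(rows):
    """Runs on a datanode: SUM over local rows, or None (SQL NULL) if there are none."""
    return sum(rows) if rows else None


def combine_sums(partials):
    """Runs on the coordinator: add the per-node results, skipping NULLs."""
    present = [p for p in partials if p is not None]
    return sum(present) if present else None


# col1 values of tab1 as they might be spread over three datanodes.
tab1 = {"dn1": [1, 2, 3], "dn2": [10, 20], "dn3": []}

total = combine_sums(partial_sum(rows) for rows in tab1.values())
print(total)  # 36
```

Only one number per node crosses the network. A cross-node join has no such shortcut, since the matching rows themselves must be shipped to the coordinator.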
  • 46. High Availability ● GTM-standby provides basic HA ● No native HA for nodes ● Use HA middleware such as Pacemaker ● Each data node should be configured with synchronous replication
  • 47. Status Settings and options
  • 48. Present Status ● Project/Developer site ● http://postgres-xc.sourceforge.net/ ● http://sourceforge.net/projects/postgres-xc/ ● Version 1.0 available ● Base PostgreSQL version: 9.1 ● Soon, PostgreSQL 9.2! – Group commit: even more write scalability – “Index-only Scans” ● Get Involved ● Even as just a tester
  • 49. Easy way of trying it out? ● www.stormdb.com ● Not Postgres-XC, but similar ● Nothing to install, cloud hosted ● Free beta
  • 50. Thank You mason@stormdb.com Twitter: mason_db