Scaling PostgreSQL with Stado
Who Am I?
• Jim Mlodgenski
  – Founder of Cirrus Technologies
  – Former Chief Architect of EnterpriseDB
  – Co-organizer of NYCPUG
Agenda
• What is Stado?
• Architecture
• Query Flow
• Scaling
• Limitations
What is Stado?
• Continuation of GridSQL
• “Shared-Nothing”, distributed data architecture.
   – Leverage the power of multiple commodity
     servers while appearing as a single database
     to the application
• Essentially... an open source
  Greenplum, Netezza or Teradata
Stado Details
• Designed for Parallel Querying
• Not just “Read-Only”: can also execute
  UPDATE and DELETE
• Data Loader for parallel loading
• Standard connectivity via PostgreSQL
  compatible connectors: JDBC, ODBC,
  ADO.NET, libpq (psql)
What Stado Is Not
• A replication solution like Slony or Bucardo
• A high availability solution like Synchronous
  Replication in PostgreSQL 9.1
• A scalable transactional solution like Postgres-XC
• An elastic, eventually consistent NoSQL database
Architecture
• Loosely coupled, shared-nothing architecture
• Data repositories
   – Metadata database
   – Stado database
• Stado processes
   – Central coordinator
   – Agents
Configuration
• Can be configured for multiple logical “nodes” per
  physical server
  – Take advantage of multi-core processors
• Tables may be either replicated or partitioned
  – Replicated tables for static lookup data or
    dimensions
  – Partitioned tables for large fact tables
Partitioning
• Tables may combine Stado partitioning with
  PostgreSQL constraint exclusion partitioning
  – Large queries scan a much smaller subset of
    data by using subtables
  – Since each subtable is also partitioned
    across nodes, the subtables are scanned in parallel
  – Queries execute much faster
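As a rough illustration of why the two kinds of partitioning compound, the sketch below estimates how many rows each node has to scan. It assumes rows are spread evenly over subtables and nodes; the function name and every number in the example are hypothetical and not taken from the slides.

```python
def rows_scanned_per_node(total_rows, num_subtables, subtables_hit, num_nodes):
    """Estimate rows scanned on each node for one query.

    Assumes (for this sketch only) that rows are spread evenly across
    subtables and that each subtable is spread evenly across nodes.
    """
    if not 0 <= subtables_hit <= num_subtables:
        raise ValueError("subtables_hit must be between 0 and num_subtables")
    rows_per_subtable = total_rows / num_subtables
    # Constraint exclusion skips the subtables the query cannot match;
    # the remaining rows are divided among the nodes and scanned in parallel.
    return rows_per_subtable * subtables_hit / num_nodes


if __name__ == "__main__":
    # Hypothetical table: 120M rows, 12 monthly subtables, 4 nodes.
    full_scan = rows_scanned_per_node(120_000_000, 12, 12, 4)
    one_month = rows_scanned_per_node(120_000_000, 12, 1, 4)
    print(f"no pruning:  {full_scan:,.0f} rows per node")
    print(f"one month:   {one_month:,.0f} rows per node")
```

Real data is rarely this evenly distributed, so treat the result as an upper-level estimate rather than a prediction.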
Creating Tables
• Tables can be partitioned or
  replicated
CREATE TABLE STATE_CODES (
     STATE_CD varchar(2) PRIMARY KEY,
     USPS_CD varchar(2),
     NAME varchar(100),
     GNISIS varchar(8)) REPLICATED;
Creating Tables

CREATE TABLE roads (
  gid integer NOT NULL,
  statefp character varying(2),
  countyfp character varying(3),
  linearid character varying(22),
  fullname character varying(100),
  rttyp character varying(1),
  mtfcc character varying(5),
  the_geom geometry)
PARTITIONING KEY gid ON ALL;
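To make the idea of a partitioning key concrete, here is a minimal Python sketch of hashing gid values to nodes. The slides do not describe Stado's actual hash function, so CRC32, the 4-node cluster size, and the function names are all assumptions made for illustration.

```python
import zlib
from collections import Counter

NUM_NODES = 4  # assumed cluster size, not from the slides


def node_for_key(gid, num_nodes=NUM_NODES):
    """Map a partitioning-key value to a 1-based node number.

    CRC32 is a stand-in hash chosen for this sketch; Stado's real
    hash function is not described in the slides.
    """
    digest = zlib.crc32(str(gid).encode("utf-8"))
    return digest % num_nodes + 1


def distribute(gids, num_nodes=NUM_NODES):
    """Count how many rows land on each node."""
    return Counter(node_for_key(g, num_nodes) for g in gids)


if __name__ == "__main__":
    counts = distribute(range(1, 100_001))
    for node in sorted(counts):
        print(f"node {node}: {counts[node]} rows")
```

The same key always maps to the same node, which is what lets a lookup or join on gid be sent to a single node instead of all of them.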
Query Optimization
• Cost Based Optimizer
   – Takes into account Row Shipping
     (expensive)
• Looks for joins with replicated tables
   – Can be done locally
• Looks for joins between tables on
  partitioned columns
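The rule of thumb on this slide can be sketched as a small check. This is not Stado's optimizer code: the Table class and join_is_local function are hypothetical, and the sketch assumes both partitioned tables use the same hash over the same set of nodes. The partitioning key given for census_tract is also an assumption, since the slides do not show that table's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str] = None  # None means the table is replicated

    @property
    def replicated(self):
        return self.partition_key is None


def join_is_local(left, left_col, right, right_col):
    """Return True if the join needs no row shipping between nodes."""
    # A replicated table has a full copy on every node.
    if left.replicated or right.replicated:
        return True
    # Two partitioned tables join locally only when each side is joined
    # on its own partitioning column, so matching rows share a node.
    return left.partition_key == left_col and right.partition_key == right_col


if __name__ == "__main__":
    roads = Table("roads", partition_key="gid")
    state_codes = Table("state_codes")  # REPLICATED
    census_tract = Table("census_tract", partition_key="tract_id")  # assumed key

    print(join_is_local(roads, "statefp", state_codes, "state_cd"))
    print(join_is_local(roads, "statefp", census_tract, "state_cd"))
```

The second join is the expensive case: neither side is joined on its partitioning column, so rows have to be shipped between nodes.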
Two Phase Aggregation
• SUM
  – SUM(stat1)
  – SUM2(SUM(stat1))
• AVG
  – SUM(stat1) / COUNT(stat1)
  – SUM2 (SUM(stat1)) / SUM2 (COUNT(stat1))
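The two phases can be sketched in Python as follows. Each node computes a partial SUM and COUNT over its own rows, and the coordinator combines the partials (the SUM2 step on the slide). The class and function names are hypothetical, and the per-node data is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    total: float  # SUM(stat1) on one node
    count: int    # COUNT(stat1) on one node


def node_phase(values):
    """Phase 1: runs on each node over its local rows (NULLs skipped)."""
    present = [v for v in values if v is not None]
    return Partial(total=sum(present), count=len(present))


def coordinator_sum(partials):
    """Phase 2 for SUM: SUM2(SUM(stat1))."""
    return sum(p.total for p in partials)


def coordinator_avg(partials):
    """Phase 2 for AVG: SUM2(SUM(stat1)) / SUM2(COUNT(stat1))."""
    count = sum(p.count for p in partials)
    if count == 0:
        return None  # no non-NULL rows anywhere
    return sum(p.total for p in partials) / count


if __name__ == "__main__":
    rows_by_node = [[1, 2, 3], [4, 5], [6]]  # made-up data on three nodes
    partials = [node_phase(rows) for rows in rows_by_node]
    print("SUM:", coordinator_sum(partials))
    print("AVG:", coordinator_avg(partials))
```

AVG is carried as a sum and a count rather than as per-node averages because averaging the averages would weight a node with one row the same as a node with three.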
Query 1
SELECT sum(st_length_spheroid(the_geom,
         'SPHEROID["GRS_1980",6378137,298.257222101]'))/1609.344
        as interstate_miles
 FROM roads
 WHERE rttyp = 'I';

 interstate_miles
------------------
 84588.5425986619
(1 row)
Query 1: Results

 Nodes | Actual (sec)
-------+--------------
     1 |  101.2080566
     4 |   25.6410708
     8 |   14.3321144
    12 |    5.4738612
    16 |    4.8214672

[Chart: Time (seconds) vs. Nodes, comparing Linear and Actual]
Query 2
SELECT s.name as state, c.name as county, a.population, b.road_length,
       a.population/b.road_length as person_per_km
  FROM (SELECT state_cd, county_cd, sum(population) as population
          FROM census_tract
         GROUP BY 1, 2) a,
       (SELECT statefp, countyfp,
               sum(st_length_spheroid(the_geom,
'SPHEROID["GRS_1980",6378137,298.257222101]'))/1000 as road_length
          FROM roads
         GROUP BY 1, 2) b,
       state_codes s, county_codes c
 WHERE a.state_cd = b.statefp
   AND a.county_cd = b.countyfp
   AND a.state_cd = c.state_cd
   AND a.county_cd = c.county_cd
   AND c.state_cd = s.state_cd
 ORDER BY 5 DESC
 LIMIT 20;
        state         |     county      | population |   road_length    |  person_per_km
----------------------+-----------------+------------+------------------+------------------
 New York             | New York        |    1537195 | 1465.35561969273 | 1049.02521909483
 New York             | Kings           |    2465326 | 2785.37685011507 | 885.096032839562
 New York             | Bronx           |    1332650 | 1638.47925579201 | 813.345665066614
 New York             | Queens          |    2229379 | 4343.78066667893 | 513.234707521383
 New Jersey           | Hudson          |     608975 | 1474.86512729116 | 412.902162191933
 California           | San Francisco   |     776733 | 2125.05706617179 |  365.51159607175
 Pennsylvania         | Philadelphia    |    1517550 | 5067.19918355051 | 299.484970894054
 District of Columbia | Washington      |     572059 | 2191.33029860109 | 261.055579054054
 New York             | Richmond        |     443728 | 1758.77468237864 | 252.293829588156
 Massachusetts        | Suffolk         |     689807 | 2805.37242915611 | 245.887851762877
 New Jersey           | Essex           |     793633 | 3359.22581976629 | 236.254733257324
 Virginia             | Alexandria City |     128283 |  577.98117468444 | 221.950135434841
 Puerto Rico          | San Juan        |     434374 | 1994.26820504899 | 217.811224638829
 Virginia             | Arlington       |     189453 | 967.505165121908 | 195.816008874876
 New Jersey           | Union           |     522541 | 2827.74655887522 | 184.790605919029
 Maryland             | Baltimore City  |     651154 | 3707.01218958787 | 175.654669231717
 Puerto Rico          | Catano          |      30071 | 174.765650431886 | 172.064704509654
 Hawaii               | Honolulu        |     876156 |  5098.8482067881 | 171.834101441493
 Puerto Rico          | Toa Baja        |      94085 | 558.532996996738 | 168.450208861249
 Puerto Rico          | Carolina        |     186076 | 1122.20560229076 | 165.812752690026
(20 rows)
Query 2: Results

 Nodes | Actual (sec)
-------+--------------
     1 | 3983.1002548
     4 | 1007.1235182
     8 |  563.6259202
    12 |   365.152858
    16 |  282.7345952

[Chart: Time (seconds) vs. Nodes, comparing Linear and Actual]
Scalability
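The timings on the two results slides can be checked against ideal linear scaling with a little arithmetic. The sketch below computes speedup (single-node time divided by N-node time) and efficiency (speedup divided by N) from the measured numbers; the function and variable names are just for illustration.

```python
# Measured times in seconds, copied from the Query 1 and Query 2 results.
QUERY1 = {1: 101.2080566, 4: 25.6410708, 8: 14.3321144,
          12: 5.4738612, 16: 4.8214672}
QUERY2 = {1: 3983.1002548, 4: 1007.1235182, 8: 563.6259202,
          12: 365.152858, 16: 282.7345952}


def scaling_report(timings):
    """Return (nodes, speedup, efficiency) tuples, smallest cluster first.

    Speedup is relative to the 1-node time; an efficiency of 1.0
    corresponds to perfectly linear scaling.
    """
    base = timings[1]
    report = []
    for nodes in sorted(timings):
        speedup = base / timings[nodes]
        report.append((nodes, speedup, speedup / nodes))
    return report


if __name__ == "__main__":
    for name, timings in (("Query 1", QUERY1), ("Query 2", QUERY2)):
        print(name)
        for nodes, speedup, efficiency in scaling_report(timings):
            print(f"  {nodes:>2} nodes: {speedup:6.2f}x speedup, "
                  f"{efficiency:.2f} efficiency")
```

On these numbers, Query 1 runs roughly 21 times faster on 16 nodes than on one, which is better than linear, while Query 2 runs roughly 14 times faster, which is somewhat below linear.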
Limitations
• SQL Support
  – Uses its own parser and optimizer
    so:
     • No Window Functions
     • No Stored Procedures
     • No Full Text Search
Transaction Performance
• Single-row INSERT, UPDATE, or DELETE statements are slow
  compared to a single PostgreSQL instance
   – The data must make an additional network trip to be
     committed
   – All partitioned rows must be hashed to be mapped to
     the proper node
   – All replicated rows must be committed to all nodes
• Use “gs-loader” for bulk loading for better performance
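The write costs listed above can be sketched as a routing function that returns which nodes must commit a single-row write. The function name is hypothetical, and the modulo on an integer key is a stand-in for whatever hash Stado actually uses, which the slides do not specify.

```python
from typing import List


def nodes_to_commit(replicated: bool, key: int, num_nodes: int) -> List[int]:
    """Return the 1-based node numbers that must commit one row.

    The modulo below is a placeholder hash, assumed for this sketch.
    """
    if replicated:
        # A replicated row is written to every node.
        return list(range(1, num_nodes + 1))
    # A partitioned row is hashed and sent to exactly one node.
    return [key % num_nodes + 1]


if __name__ == "__main__":
    print("partitioned row:", nodes_to_commit(False, key=1234, num_nodes=8))
    print("replicated row: ", nodes_to_commit(True, key=1234, num_nodes=8))
```

Either way, the row passes through the coordinator first, so even the single-node case costs one more network trip than writing to a standalone PostgreSQL instance.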
High Availability
• No heartbeat or fail-over control in the coordinator
  – High Availability for each PostgreSQL node must be
    configured separately
  – Streaming replication can be ideal for this
• Getting a consistent backup of the entire Stado
  database is difficult
  – Must ensure no transactions are occurring
  – Back up each node separately
Adding Nodes
• Requires Downtime
  – Data must be manually reloaded to partition
    the data to the new node
• With planning, the process can be fast, with no
  remapping of data
  – Run multiple PostgreSQL instances on each
    physical server and move the PostgreSQL
    instances to new hardware as needed
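The planning trick can be sketched like this: fix the number of logical nodes up front, so the key-to-node mapping never changes, and only change which physical server hosts each logical node. The server names, the count of 16 logical nodes, and the modulo hash are all assumptions made for illustration.

```python
LOGICAL_NODES = 16  # assumed; fixed when the cluster is first created


def logical_node(key):
    """Map a key to a logical node; independent of how many servers exist."""
    return key % LOGICAL_NODES + 1  # placeholder hash for this sketch


def place_on_hosts(hosts):
    """Assign logical nodes to physical servers round-robin."""
    return {node: hosts[(node - 1) % len(hosts)]
            for node in range(1, LOGICAL_NODES + 1)}


if __name__ == "__main__":
    old_hosts = [f"server{i}" for i in range(1, 5)]  # 4 servers, 4 nodes each
    new_hosts = [f"server{i}" for i in range(1, 9)]  # 8 servers, 2 nodes each
    before = place_on_hosts(old_hosts)
    after = place_on_hosts(new_hosts)

    node = logical_node(12345)
    print(f"key 12345 -> logical node {node}")
    print(f"  hosted on {before[node]} before, {after[node]} after")
```

Because rows stay on the same logical node, growing the cluster means moving whole PostgreSQL instances to new hardware rather than rehashing and reloading the data.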
Summary
• Stado can tremendously improve
  query performance
• Stado can scale linearly as more nodes
  are added
• Stado is open source so if the
  limitations are an issue,
  submit a patch
Download Stado at:
http://stado.us


Jim Mlodgenski
 Email:     jim@cirrusql.com
 Twitter:   @jim_mlodgenski


 NYC PostgreSQL User Group
 http://nycpug.org
