Scaling PostgreSQL with Stado

Who Am I?
• Jim Mlodgenski
– Founder of Cirrus Technologies
– Former Chief Architect of EnterpriseDB
– Co-organizer of NYCPUG

Agenda
• What is Stado?
• Architecture
• Query Flow
• Scaling
• Limitations

What is Stado?
• Continuation of GridSQL
• “Shared-Nothing”, distributed data architecture.
– Leverage the power of multiple commodity
servers while appearing as a single database
to the application
• Essentially an open source counterpart to
Greenplum, Netezza or Teradata

Stado Details
• Designed for Parallel Querying
• Not just read-only: can also execute
UPDATE and DELETE
• Data Loader for parallel loading
• Standard connectivity via PostgreSQL
compatible connectors: JDBC, ODBC,
ADO.NET, libpq (psql)

What is Stado not?
• A replication solution like Slony or Bucardo
• A high availability solution like Synchronous
Replication in PostgreSQL 9.1
• A scalable transactional solution like Postgres-XC
• An elastic, eventually consistent NoSQL database

Architecture
• Loosely coupled, shared-nothing architecture
• Data repositories
– Metadata database
– Stado database
• Stado processes
– Central coordinator
– Agents

Configuration
• Can be configured for multiple logical “nodes” per
physical server
– Take advantage of multi-core processors
• Tables may be either replicated or partitioned
– Replicated tables for static lookup data or
dimensions
– Partitioned tables for large fact tables

Partitioning
• Tables may simultaneously use Stado
Partitioning with Constraint Exclusion
Partitioning
– Large queries scan a much smaller subset of
data by using subtables
– Since each subtable is also partitioned
across nodes, they are scanned in parallel
– Queries execute much faster

Creating Tables
• Tables can be partitioned or
replicated
CREATE TABLE STATE_CODES (
STATE_CD varchar(2) PRIMARY KEY,
USPS_CD varchar(2),
NAME varchar(100),
GNISIS varchar(8)) REPLICATED;

Creating Tables

CREATE TABLE roads (
gid integer NOT NULL,
statefp character varying(2),
countyfp character varying(3),
linearid character varying(22),
fullname character varying(100),
rttyp character varying(1),
mtfcc character varying(5),
the_geom geometry)
PARTITIONING KEY gid ON ALL;
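
As a rough illustration of what PARTITIONING KEY gid ON ALL implies, the sketch below assigns each row to exactly one node by hashing its key. The hash function (CRC32 modulo the node count) and the names node_for_key and distribute are assumptions made for illustration; these slides do not describe Stado's actual hash function or node mapping.

```python
import zlib
from collections import Counter


def node_for_key(gid: int, node_count: int) -> int:
    """Hypothetical mapping: hash the partitioning key, then reduce it modulo the node count."""
    digest = zlib.crc32(str(gid).encode("ascii"))
    return digest % node_count + 1  # nodes are numbered from 1 for readability


def distribute(gids, node_count: int) -> Counter:
    """Count how many rows each node would store under this mapping."""
    return Counter(node_for_key(g, node_count) for g in gids)


if __name__ == "__main__":
    # 100,000 made-up gid values spread across 4 nodes.
    placement = distribute(range(1, 100001), node_count=4)
    for node in sorted(placement):
        print(f"node {node}: {placement[node]} rows")
```

Because the target node depends only on the key and the node count, any process can locate a row without a lookup table, but changing the node count changes where rows belong.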

Query Optimization
• Cost Based Optimizer
– Takes into account Row Shipping
(expensive)
• Looks for joins with replicated tables
– Can be done locally
• Looks for joins between tables on
partitioned columns
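
The locality rules above can be sketched as a small classifier. This is not Stado's optimizer: the Table class, the join_strategy function, and the assumption that two tables partitioned on their join columns place matching keys on the same node are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str] = None  # None means the table is replicated

    @property
    def replicated(self) -> bool:
        return self.partition_key is None


def join_strategy(left: Table, left_col: str, right: Table, right_col: str) -> str:
    """Classify an equi-join as 'local' or 'row shipping' (hypothetical rule set)."""
    if left.replicated or right.replicated:
        return "local"  # every node already holds the whole replicated table
    if left.partition_key == left_col and right.partition_key == right_col:
        return "local"  # assumed: equal keys hash to the same node
    return "row shipping"  # rows must move between nodes before they can be joined


if __name__ == "__main__":
    roads = Table("roads", partition_key="gid")
    state_codes = Table("state_codes")  # replicated, as in the earlier slide
    # The partition key of census_tract is not given in the slides; "tract_id" is made up.
    census_tract = Table("census_tract", partition_key="tract_id")
    print(join_strategy(roads, "statefp", state_codes, "state_cd"))
    print(join_strategy(roads, "statefp", census_tract, "state_cd"))
```

A cost-based optimizer would weigh the volume of shipped rows rather than return a fixed label; the sketch only shows which joins can avoid shipping altogether.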

Two Phase Aggregation
• SUM
– SUM(stat1)
– SUM2(SUM(stat1))
• AVG
– SUM(stat1) / COUNT(stat1)
– SUM2 (SUM(stat1)) / SUM2 (COUNT(stat1))
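
A minimal sketch of the two phases, assuming each node computes a partial SUM and COUNT over its own rows and a second step combines them (the role SUM2 plays above). The function names and sample data are illustrative, not Stado internals.

```python
def node_partials(values):
    """Phase 1, run on each node: partial SUM and COUNT over the local rows."""
    return sum(values), len(values)


def combine_sum(partials):
    """Phase 2 for SUM: add up the per-node sums."""
    return sum(s for s, _ in partials)


def combine_avg(partials):
    """Phase 2 for AVG: total of sums divided by total of counts."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None  # SQL AVG over zero rows is NULL


if __name__ == "__main__":
    # Made-up stat1 values held by three nodes.
    node_rows = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    partials = [node_partials(rows) for rows in node_rows]
    print(combine_sum(partials))
    print(combine_avg(partials))
```

AVG has to carry SUM and COUNT separately because averaging the per-node averages would weight every node equally regardless of how many rows it holds.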

Query 1
SELECT sum(st_length_spheroid(the_geom,
'SPHEROID["GRS_1980",6378137,298.257222101]'))/1609.344
as interstate_miles
FROM roads
WHERE rttyp = 'I';

interstate_miles
------------------
84588.5425986619
(1 row)

Query 1: Results

 Nodes | Actual (sec)
-------+--------------
     1 | 101.2080566
     4 |  25.6410708
     8 |  14.3321144
    12 |   5.4738612
    16 |   4.8214672

[Chart: time (seconds) vs. nodes, linear vs. actual]
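
To compare the measured times with ideal linear scaling, one can compute speedup (single-node time divided by n-node time) and parallel efficiency (speedup divided by n). The sketch below uses the Query 1 timings reported on this slide; the function names are illustrative.

```python
# Query 1 wall-clock times in seconds, keyed by node count (from the slide).
QUERY1_SECONDS = {
    1: 101.2080566,
    4: 25.6410708,
    8: 14.3321144,
    12: 5.4738612,
    16: 4.8214672,
}


def speedup(timings, nodes):
    """How many times faster the query ran than on a single node."""
    return timings[1] / timings[nodes]


def efficiency(timings, nodes):
    """Speedup per node; 1.0 corresponds to perfectly linear scaling."""
    return speedup(timings, nodes) / nodes


if __name__ == "__main__":
    for n in sorted(QUERY1_SECONDS):
        s = speedup(QUERY1_SECONDS, n)
        e = efficiency(QUERY1_SECONDS, n)
        print(f"{n:>2} nodes: speedup {s:6.2f}, efficiency {e:5.2f}")
```

An efficiency above 1.0 means the measured time beat the linear projection for that node count.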

Query 2
SELECT s.name as state, c.name as county, a.population, b.road_length,
a.population/b.road_length as person_per_km
FROM (SELECT state_cd, county_cd, sum(population) as population
FROM census_tract
GROUP BY 1, 2) a,
(SELECT statefp, countyfp,
sum(st_length_spheroid(the_geom,
'SPHEROID["GRS_1980",6378137,298.257222101]'))/1000 as road_length
FROM roads
GROUP BY 1, 2) b,
state_codes s, county_codes c
WHERE a.state_cd = b.statefp
AND a.county_cd = b.countyfp
AND a.state_cd = c.state_cd
AND a.county_cd = c.county_cd
AND c.state_cd = s.state_cd
ORDER BY 5 DESC
LIMIT 20;

state | county | population | road_length | person_per_km
----------------------+-----------------+------------+------------------+------------------
New York | New York | 1537195 | 1465.35561969273 | 1049.02521909483
New York | Kings | 2465326 | 2785.37685011507 | 885.096032839562
New York | Bronx | 1332650 | 1638.47925579201 | 813.345665066614
New York | Queens | 2229379 | 4343.78066667893 | 513.234707521383
New Jersey | Hudson | 608975 | 1474.86512729116 | 412.902162191933
California | San Francisco | 776733 | 2125.05706617179 | 365.51159607175
Pennsylvania | Philadelphia | 1517550 | 5067.19918355051 | 299.484970894054
District of Columbia | Washington | 572059 | 2191.33029860109 | 261.055579054054
New York | Richmond | 443728 | 1758.77468237864 | 252.293829588156
Massachusetts | Suffolk | 689807 | 2805.37242915611 | 245.887851762877
New Jersey | Essex | 793633 | 3359.22581976629 | 236.254733257324
Virginia | Alexandria City | 128283 | 577.98117468444 | 221.950135434841
Puerto Rico | San Juan | 434374 | 1994.26820504899 | 217.811224638829
Virginia | Arlington | 189453 | 967.505165121908 | 195.816008874876
New Jersey | Union | 522541 | 2827.74655887522 | 184.790605919029
Maryland | Baltimore City | 651154 | 3707.01218958787 | 175.654669231717
Puerto Rico | Catano | 30071 | 174.765650431886 | 172.064704509654
Hawaii | Honolulu | 876156 | 5098.8482067881 | 171.834101441493
Puerto Rico | Toa Baja | 94085 | 558.532996996738 | 168.450208861249
Puerto Rico | Carolina | 186076 | 1122.20560229076 | 165.812752690026
(20 rows)

Query 2: Results

 Nodes | Actual (sec)
-------+--------------
     1 | 3983.1002548
     4 | 1007.1235182
     8 |  563.6259202
    12 |  365.152858
    16 |  282.7345952

[Chart: time (seconds) vs. nodes, linear vs. actual]

Limitations
• SQL Support
– Uses its own parser and optimizer, so:
• No Window Functions
• No Stored Procedures
• No Full Text Search

Transaction Performance
• Single-row INSERT, UPDATE, or DELETE statements are slow
compared to a single PostgreSQL instance
– The data must make an additional network trip to be
committed
– All partitioned rows must be hashed to be mapped to
the proper node
– All replicated rows must be committed to all nodes
• Use “gs-loader” for bulk loading for better performance

High Availability
• No heartbeat or fail-over control in the coordinator
– High Availability for each PostgreSQL node must be
configured separately
– Streaming replication can be ideal for this
• Getting a consistent backup of the entire Stado
database is difficult
– Must ensure no transactions are occurring
– Backup each node separately

Adding Nodes
• Requires Downtime
– Data must be manually reloaded to partition
the data to the new node
• With planning, the process can be fast, with no
remapping of data
– Run multiple PostgreSQL instances on each
physical server and move the PostgreSQL
instances to new hardware as needed
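
The idea of over-provisioning logical nodes can be sketched as follows. The modulo hash, the host names, and the move_node helper are assumptions for illustration; the point is only that the row-to-node mapping depends on the number of logical nodes, not on where they run.

```python
def node_for_key(key: int, logical_nodes: int) -> int:
    """Placeholder hash: the row-to-node mapping depends only on the node count."""
    return key % logical_nodes


def move_node(placement: dict, node: int, new_host: str) -> dict:
    """Return a new node-to-host placement with one logical node relocated."""
    if node not in placement:
        raise KeyError(f"unknown logical node {node}")
    updated = dict(placement)
    updated[node] = new_host
    return updated


if __name__ == "__main__":
    # Four logical nodes start on one physical server (host names are made up).
    before = {0: "server-a", 1: "server-a", 2: "server-a", 3: "server-a"}
    after = move_node(move_node(before, 2, "server-b"), 3, "server-b")

    keys = range(1, 9)
    rows_before = {k: node_for_key(k, len(before)) for k in keys}
    rows_after = {k: node_for_key(k, len(after)) for k in keys}
    print(rows_before == rows_after)  # row-to-node mapping is unchanged by the move
    print(after)
```

Since the number of logical nodes stays at four, no row needs to be rehashed; only the PostgreSQL instances themselves move to the new hardware.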

Summary
• Stado can tremendously improve
query performance
• Stado can scale linearly as more nodes
are added
• Stado is open source so if the
limitations are an issue,
submit a patch

Download Stado at:
http://stado.us

Jim Mlodgenski
Email: jim@cirrusql.com
Twitter: @jim_mlodgenski

NYC PostgreSQL User Group
http://nycpug.org
