Citus 5.0
Extending PostgreSQL to Build a Distributed Database
Ozgun Erdogan
on behalf of Citus Data team
  
Talk Outline
1.  Introduction
2.  Citus 5.0 and its use of extension APIs
3.  Distributed query planning
4.  Different distributed executors for different workloads
•  Three technical lightning talks in one
  
What is Citus?
•  Citus extends PostgreSQL (not a fork) to provide it with distributed functionality.
•  Citus scales out Postgres across servers using sharding and replication. Its query engine parallelizes SQL queries across many servers.
•  Citus 5.0 is open source: https://github.com/citusdata/citus
  
Citus 5.0 Architecture Diagram
[Figure: the Citus coordinator (PostgreSQL + Citus extension) holds the distributed table "Events" as metadata. Citus workers 1 to N (each PostgreSQL + Citus extension) hold its shards as regular tables (1 shard = 1 Postgres table): worker 1 holds E1 and E3’, worker 2 holds E2 and E1’, worker N holds E3 and E2’.]
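
The shard layout in the diagram can be reproduced with a simple round-robin placement. This is a minimal sketch, assuming a replication factor of 2 and assuming that the primed names (E1’, E2’, E3’) denote replicas; `place_shards` and the worker names are hypothetical and are not Citus functions.

```python
def place_shards(shards, workers, replication_factor=2):
    """Assign each shard to `replication_factor` consecutive workers, round-robin."""
    placements = {worker: [] for worker in workers}
    for index, shard in enumerate(shards):
        for copy in range(replication_factor):
            worker = workers[(index + copy) % len(workers)]
            # Mark replicas with a prime, as in the diagram.
            placements[worker].append(shard + "'" * copy)
    return placements


layout = place_shards(["E1", "E2", "E3"], ["worker1", "worker2", "workerN"])
for worker, tables in layout.items():
    print(worker, sorted(tables))
```

Under these assumptions each worker ends up with one primary and one replica of a different shard, so losing a single worker still leaves a copy of every shard.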
  
When is Citus a good fit?
•  Scaling a multi-tenant (B2B) database to 100K+ tenants
•  Sub-second OLAP queries on data as it arrives
•  Powering real-time analytic dashboards
•  Exploratory queries on events as they arrive
•  Who is using Citus?
  •  CloudFlare uses Citus to power their analytic dashboards
  •  Neustar builds ad-tech infrastructure with HyperLogLog
  •  Heap powers funnel, segmentation, and cohort queries
  
SQL, Scaling out, and What’s Unique About PostgreSQL?
  
“SQL doesn’t Scale”
1.  Scaling out is hard. Scaling data, compared to scaling computations, is even harder.
2.  SQL means different things to different people: transactional workloads, short reads/writes, real-time analytics, data warehousing, or triggers.
3.  SQL doesn’t have the notion of “distribution” built into the language. It can be added on top, but it is not there in SQL itself.
  
Query Languages: An Example
  
SQL Routing / Replication
•  Simple INSERT routing and replication
1.  Parse plain text SQL query
2.  Check column values and types against table schema
3.  Apply optimizations, such as constant folding
4.  Determine “billgates” is the distribution key
5.  Only then can you route and replicate INSERT
•  What about my SELECT queries?
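
The last two steps above can be sketched as follows, once the query has been parsed and checked. This is only an illustration: the shard count, the placement map, the column names, and the use of `zlib.crc32` as the hash are all assumptions made for the example, not what Citus does internally.

```python
import zlib

SHARD_COUNT = 4  # hypothetical number of shards

# Hypothetical metadata: shard id -> workers that hold a copy of that shard.
PLACEMENTS = {
    s: [f"worker{s % 3 + 1}", f"worker{(s + 1) % 3 + 1}"]
    for s in range(SHARD_COUNT)
}


def shard_for(key):
    """Map a distribution key value to a shard id with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % SHARD_COUNT


def route_insert(row, distribution_column):
    """Return the shard and the workers that must all receive the INSERT."""
    shard = shard_for(row[distribution_column])
    return shard, PLACEMENTS[shard]


shard, targets = route_insert({"username": "billgates", "action": "login"}, "username")
print(shard, targets)
```

The point of the sketch is the ordering: the key value "billgates" is only known after parsing and type checking, which is why routing logic cannot sit entirely outside the part that understands the query.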
  
Takeaway
When you’re scaling out a SQL query, your “query distribution” logic needs to work together with the part that understands the query.
  
How to overcome this?
1.  Application level sharding
2.  Build a distributed database from scratch
3.  Extend on core for agreed upon use-case
  •  Multi-master for replication and HA; partitioning
  •  Build middleware for open source database
4.  Fork an open source database
  
	
  
PostgreSQL Extension APIs
•  CREATE EXTENSION citus;
•  Metadata stored in Postgres tables
•  User-defined functions to extend SQL syntax
•  Hooks: Planner, executor, and utility hooks
•  Similar to interceptors in Java frameworks
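
The interceptor analogy can be sketched in a few lines. In PostgreSQL the hooks are function pointers that an extension sets when it is loaded; the Python below only mimics that pattern. All names here (`planner`, `distributed_planner`, `DISTRIBUTED_TABLES`) are hypothetical stand-ins, not the PostgreSQL or Citus C API.

```python
def standard_planner(query):
    """Stand-in for the database's built-in planner."""
    return {"plan": "local", "query": query}


planner_hook = None  # an extension may install a function here


def planner(query):
    """Entry point: call the installed hook if any, else the built-in planner."""
    if planner_hook is not None:
        return planner_hook(query)
    return standard_planner(query)


DISTRIBUTED_TABLES = {"events"}  # hypothetical metadata lookup


def distributed_planner(query):
    """Hypothetical extension hook: intercept queries on distributed tables."""
    if query["table"] in DISTRIBUTED_TABLES:
        return {"plan": "distributed", "query": query}
    # Otherwise defer to whatever was installed before, or the built-in planner.
    return previous_hook(query) if previous_hook else standard_planner(query)


# What an extension would do at load time: remember the earlier hook, then chain.
previous_hook = planner_hook
planner_hook = distributed_planner

print(planner({"table": "events"})["plan"])
print(planner({"table": "users"})["plan"])
```

Queries on regular tables fall through to the normal planner untouched, which is what lets an extension add distributed behavior without forking the database.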
  
Citus Planner Example
  
Summary
•  PostgreSQL’s extensible architecture puts it in a unique place to scale out SQL and also adapt to evolving hardware trends.
•  It could just be that the monolithic SQL database is dying. If so, long live Postgres!
  
Why is distributed query planning (SELECTs) hard?
  	
  	
  
Past Experiences
•  Built a similar distributed data processing engine at Amazon called CSPIT
•  Led by a visionary architect and built by an extremely talented team
•  Scaled to (at best) a dozen machines. Nicely distributed basic computations across machines
•  Then the dream met reality
  
Why did it fail?
•  You can solve all distributed systems problems in one of two ways:
1.  Bring your data to the computation
2.  Push your computation to the data
  
Bringing data to computation (1)

Bringing computation to data (2)
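
The difference between the two approaches shows up in how much has to cross the network. A minimal sketch for a row count, with made-up data and hypothetical worker names:

```python
# Hypothetical per-worker data: each worker holds one shard of prices.
WORKER_SHARDS = {
    "worker1": [10, 20, 30],
    "worker2": [40, 50],
    "worker3": [60],
}


def count_by_pulling_data(shards):
    """(1) Ship every row to the coordinator, then count there."""
    rows = [row for shard in shards.values() for row in shard]
    return len(rows), len(rows)  # (result, values sent over the network)


def count_by_pushing_computation(shards):
    """(2) Each worker counts locally; only one number per worker is shipped."""
    partial_counts = [len(shard) for shard in shards.values()]
    return sum(partial_counts), len(partial_counts)


pulled = count_by_pulling_data(WORKER_SHARDS)
pushed = count_by_pushing_computation(WORKER_SHARDS)
print(pulled, pushed)
```

Both return the same count, but the first ships one value per row while the second ships one value per worker.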
  
Slightly more complex queries
•  Sum(price): sum(price) on worker nodes and then sum() intermediate results
•  Avg(price): Can you avg(price) on worker nodes and then avg() intermediate results?
•  Why not?
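
A small worked example of the "why not": averaging per-worker averages gives the wrong answer whenever shards hold different numbers of rows. The data and worker names below are made up for illustration.

```python
WORKER_PRICES = {"worker1": [10, 20, 30], "worker2": [100]}  # made-up data


def avg(values):
    return sum(values) / len(values)


all_prices = [p for prices in WORKER_PRICES.values() for p in prices]
true_avg = avg(all_prices)

# Naive pushdown: average the per-worker averages. Wrong when shard sizes differ.
avg_of_avgs = avg([avg(prices) for prices in WORKER_PRICES.values()])

# Correct pushdown: workers return (sum, count); the coordinator combines them.
partials = [(sum(prices), len(prices)) for prices in WORKER_PRICES.values()]
combined_avg = sum(s for s, _ in partials) / sum(c for _, c in partials)

print(true_avg, avg_of_avgs, combined_avg)
```

Rewriting avg(price) as sum(price) / count(price) works because both sum and count can be combined across workers in any order.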
  
Commutative Computations
•  If you can transform your computations into their commutative form, then you can push them down.
•  (a + b = b + a ; a / b ≠ b / a)  (*)
•  Associative and distributive property for other operations (We also knew about this)
  
How does this help me?
•  Commutative, associative, and distributive properties hold for any query language
•  We pick SQL as an example language
•  SQL uses Relational Algebra to express a query
•  If a query has a WHERE clause in it, that’s a FILTER node in the relational algebra tree
  
Simple SQL query

Distributed Logical Plan (unoptimized)

Distributed Logical Plan (optimized)
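
The plans shown on these slides are not reproduced in the text, so the following is only a sketch of the kind of rewrite the titles describe: a FILTER that sits above the node collecting rows from the workers is pushed below it, because the two commute. The node names (`Table`, `Collect`, `Filter`), the table name, and the predicate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Collect:  # gathers rows from all shards onto the coordinator
    child: object


@dataclass(frozen=True)
class Filter:  # the WHERE clause
    predicate: str
    child: object


def push_down_filters(node):
    """Rewrite Filter(Collect(x)) into Collect(Filter(x)), recursively."""
    if isinstance(node, Filter):
        child = push_down_filters(node.child)
        if isinstance(child, Collect):
            # Filter commutes with Collect, so it can run on the workers.
            return Collect(push_down_filters(Filter(node.predicate, child.child)))
        return Filter(node.predicate, child)
    if isinstance(node, Collect):
        return Collect(push_down_filters(node.child))
    return node


unoptimized = Filter("price > 100", Collect(Table("orders")))
optimized = push_down_filters(unoptimized)
print(optimized)
```

In the rewritten tree the workers discard non-matching rows before anything is sent to the coordinator.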
  
Takeaway
In the land of distributed systems, the commutative (and distributive) property is king! Transform your queries with respect to the king, and they will scale!
  
One example doesn’t make a proof
•  Can you prove this model is complete?
•  Relational Algebra has 10 operators
•  What about optimizing more complex plans with joins, subselects, and other constructs?
  
Multi-Relational Algebra
•  Correctness of Query Execution Strategies in Distributed Databases, Ceri and Pelagatti, 1983
•  A Distributed Database paper from a more civilized age
•  Models each relational algebra operator as a distributed operator and extends it
  
Commutative Property Rules

Distributive Property Rules

Factorization Rules
  
Two important notes (1)
Logical plan ≠ Physical plan
•  “Join” is a logical operator. HashJoin or MergeJoin is a physical operator.
•  It’s easier to reason about logical operators’ mathematical properties than those of physical operators.
•  Distributed databases that start from a “database” usually extend physical operators. (Greenplum, Redshift)
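
To make the logical/physical distinction concrete, here is a sketch of two physical operators that implement the same logical join. The function names, tables, and rows are made up for the example; real database operators are far more involved.

```python
def hash_join(left, right, key):
    """Physical operator: build a hash table on `right`, probe with `left`."""
    table = {}
    for row in right:
        table.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def merge_join(left, right, key):
    """Physical operator: sort both inputs, then merge matching runs."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    result, start = [], 0
    for l in left:
        # Skip right rows whose key is smaller than the current left key.
        while start < len(right) and right[start][key] < l[key]:
            start += 1
        j = start
        while j < len(right) and right[j][key] == l[key]:
            result.append({**l, **right[j]})
            j += 1
    return result


def as_set(rows):
    """Order-insensitive view of a result, for comparing the two operators."""
    return {tuple(sorted(row.items())) for row in rows}


users = [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
orders = [{"id": 1, "total": 5}, {"id": 2, "total": 7}, {"id": 1, "total": 9}]

same_result = as_set(hash_join(users, orders, "id")) == as_set(merge_join(users, orders, "id"))
print(same_result)
```

Both produce the same set of rows, so rules such as commutativity can be stated once for the logical "Join" instead of separately for each algorithm.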
  
	
  
Two important notes (2)
Multi-relational Algebra offers a complete foundation for distributing SQL queries.
•  Citus is adding more SQL functionality with each release.
•  From a use-case standpoint, think of Citus not as a replacement for your data warehouse, but as extending it with real-time capabilities.
  
Summary
•  To scale out, you need to transform your computations into their commutative and distributive form.
•  Correctness of Query Execution Strategies in Distributed Databases (1983) offers a framework to do this for relational algebra.
  
Distributed Query Execution across Different Workloads
  
Different Workloads
1.  Simple Insert / Update / Delete / Select commands
  •  High throughput and low latency
2.  Real-time Select queries that get parallelized to hundreds of shards (<300ms)
3.  Long running Select queries that join large tables
  •  You can’t restart a Select query just because one task (or one machine) in 1M tasks failed
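
The last point can be illustrated with a task loop that retries only the failed task rather than the whole query. This is a sketch of the idea, not of how any Citus executor is implemented; `run_tasks`, `flaky_execute`, and the retry limit are assumptions for the example.

```python
def run_tasks(tasks, execute, max_attempts=3):
    """Run every task; retry only the tasks that fail, not the whole query."""
    results = {}
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            try:
                results[task] = execute(task)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up only after this task exhausted its retries
    return results


calls = []


def flaky_execute(task):
    """Hypothetical task runner: task 2 fails once, then succeeds."""
    calls.append(task)
    if task == 2 and calls.count(2) == 1:
        raise RuntimeError("worker unavailable")
    return task * 10


results = run_tasks([1, 2, 3], flaky_execute)
print(results, calls)
```

Tasks 1 and 3 run exactly once; only task 2 is repeated. In a replicated setup the retry could also be sent to another worker that holds a copy of the shard.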
  
	
  
	
  
Different Executors
1.  Router Executor: Simple Insert / Update / Delete / Select commands
2.  Real-time Executor: Real-time Select queries that touch 100s of shards (<300ms)
3.  Task-tracker Executor: Longer running queries that need to scale out to 10K-1M tasks
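
The mapping from workload to executor can be summarized as a small decision function. This is purely illustrative: the function name and the 10,000-task cutoff are assumptions taken from the ranges on the slide, and Citus may choose or configure executors differently.

```python
def choose_executor(command, task_count):
    """Pick an executor name for a query; the thresholds are illustrative only."""
    if task_count == 1:
        # Single-shard INSERT / UPDATE / DELETE / SELECT: route it directly.
        return "router"
    if command != "SELECT":
        raise ValueError("this sketch only handles multi-shard SELECTs")
    if task_count < 10_000:
        # Low-latency SELECT fanned out to (hundreds of) shards.
        return "real-time"
    # Long-running SELECT with 10K-1M tasks.
    return "task-tracker"


print(choose_executor("INSERT", 1))
print(choose_executor("SELECT", 300))
print(choose_executor("SELECT", 1_000_000))
```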
  
	
  
	
  
Conclusions
•  Distributed relational databases are hard
•  PostgreSQL and its extension APIs are unique
•  Citus targets real-time data ingest and querying
•  Citus 5.0 is open source: https://github.com/citusdata/citus
  
Questions
https://citusdata.com
Forums: groups.google.com/forum/#!forum/citus-users
  

