Citus Architecture: Extending Postgres to Build a Distributed Database

Citus 5.0

Extending PostgreSQL to Build a Distributed Database

Ozgun Erdogan, on behalf of the Citus Data team

Talk Outline

1.  Introduction
2.  Citus 5.0 and its use of extension APIs
3.  Distributed query planning
4.  Different distributed executors for different workloads

•  Three technical lightning talks in one
What is Citus?

•  Citus extends PostgreSQL (not a fork) to provide it with distributed functionality.
•  Citus scales out Postgres across servers using sharding and replication. Its query engine parallelizes SQL queries across many servers.
•  Citus 5.0 is open source: https://github.com/citusdata/citus
Citus 5.0 Architecture Diagram

[Diagram: a Citus coordinator (PostgreSQL + Citus extension) holds the distributed table "Events" as metadata. Citus workers 1 through N (each PostgreSQL + Citus extension) store its shards as regular tables (1 shard = 1 Postgres table): worker 1 holds E1 and E3’, worker 2 holds E2 and E1’, worker N holds E3 and E2’.]

When is Citus a good fit?

•  Scaling a multi-tenant (B2B) database to 100K+ tenants
•  Sub-second OLAP queries on data as it arrives
•  Powering real-time analytic dashboards
•  Exploratory queries on events as they arrive

•  Who is using Citus?
•  CloudFlare uses Citus to power their analytic dashboards
•  Neustar builds ad-tech infrastructure with HyperLogLog
•  Heap powers funnel, segmentation, and cohort queries

SQL, Scaling out, and What’s Unique About PostgreSQL?

“SQL doesn’t Scale”

1.  Scaling out is hard. Scaling data, compared to scaling computations, is even harder.
2.  SQL means different things to different people: transactional workloads, short reads/writes, real-time analytics, data warehousing, or triggers.
3.  SQL doesn’t have the notion of “distribution” built into the language. It can be added on top, but it is not there in SQL itself.

Query Languages: An Example

SQL Routing / Replication

•  Simple INSERT routing and replication

1.  Parse plain text SQL query
2.  Check column values and types against table schema
3.  Apply optimizations, such as constant folding
4.  Determine that “billgates” is the distribution key
5.  Only then can you route and replicate the INSERT

•  What about my SELECT queries?
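
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not Citus's implementation: the `Shard` record, the `build_shards` and `route_insert` helpers, the `username` column, the worker names, and CRC32 as the hash function are all assumptions made for the example. Steps 1 to 3 (parsing, type checking, constant folding) are assumed to have already produced a row of plain values; the sketch only shows why the distribution key value must be known before the statement can be routed and replicated.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    shard_id: int
    min_hash: int      # inclusive lower bound of the hash range
    max_hash: int      # inclusive upper bound of the hash range
    placements: tuple  # workers holding a copy of this shard


def build_shards(shard_count, workers, replication_factor):
    """Split the 32-bit hash space into equal ranges; place copies round-robin."""
    span = 2**32 // shard_count
    shards = []
    for i in range(shard_count):
        lo = i * span
        hi = 2**32 - 1 if i == shard_count - 1 else lo + span - 1
        placements = tuple(
            workers[(i + r) % len(workers)] for r in range(replication_factor)
        )
        shards.append(Shard(i, lo, hi, placements))
    return shards


def route_insert(row, distribution_column, shards):
    """Step 4: read the distribution key value. Step 5: find its shard."""
    key = row[distribution_column]
    key_hash = zlib.crc32(str(key).encode("utf-8"))  # stand-in hash, 0..2**32-1
    for shard in shards:
        if shard.min_hash <= key_hash <= shard.max_hash:
            return shard
    raise LookupError(f"no shard covers hash {key_hash}")


shards = build_shards(
    shard_count=4,
    workers=["worker-1", "worker-2", "worker-3"],
    replication_factor=2,
)
target = route_insert({"username": "billgates", "event": "login"}, "username", shards)
# The same INSERT would be sent to every placement of the chosen shard.
print(target.shard_id, target.placements)
```

The same key always hashes to the same shard, which is what lets a later single-key SELECT be routed the same way.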

Takeaway

When you’re scaling out a SQL query, your “query distribution” logic needs to work together with the part that understands the query.

How to overcome this?

1.  Application level sharding
2.  Build a distributed database from scratch
3.  Extend on core for agreed upon use-case
•  Multi-master for replication and HA; partitioning
•  Build middleware for open source database
4.  Fork an open source database

PostgreSQL Extension APIs

•  CREATE EXTENSION citus;
•  Metadata stored in Postgres tables
•  User-defined functions to extend SQL syntax
•  Hooks: Planner, executor, and utility hooks
•  Similar to interceptors in Java frameworks
Citus Planner Example

Summary

•  PostgreSQL’s extensible architecture puts it in a unique place to scale out SQL and also adapt to evolving hardware trends.
•  It could just be that the monolithic SQL database is dying. If so, long live Postgres!

Why is distributed query planning (SELECTs) hard?

Past Experiences

•  Built a similar distributed data processing engine at Amazon called CSPIT
•  Led by a visionary architect and built by an extremely talented team
•  Scaled to (at best) a dozen machines. Nicely distributed basic computations across machines
•  Then the dream met reality

Why did it fail?

•  You can solve all distributed systems problems in one of two ways:

1.  Bring your data to the computation
2.  Push your computation to the data

Bringing data to computation (1)

Bringing computation to data (2)

Slightly more complex queries

•  Sum(price): sum(price) on worker nodes and then sum() intermediate results
•  Avg(price): Can you avg(price) on worker nodes and then avg() intermediate results?
•  Why not?
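
A small worked example makes the answer to "Why not?" concrete. The sketch below uses made-up prices on two hypothetical shards: averaging the per-shard averages gives the wrong answer when shards hold different numbers of rows, while pushing down sum(price) and count(price) and combining those on the coordinator gives the right one.

```python
# Made-up data: two shards with different row counts.
shards = [
    [10.0, 20.0, 30.0],  # shard on worker 1
    [100.0],             # shard on worker 2
]


def avg_of_avgs(shards):
    """Naive pushdown: avg() on each worker, then avg() of the results."""
    per_shard = [sum(prices) / len(prices) for prices in shards]
    return sum(per_shard) / len(per_shard)


def distributed_avg(shards):
    """Correct pushdown: each worker returns (sum, count); the coordinator combines."""
    partials = [(sum(prices), len(prices)) for prices in shards]  # worker side
    total = sum(s for s, _ in partials)                           # coordinator side
    count = sum(c for _, c in partials)
    return total / count


print(avg_of_avgs(shards))      # 60.0, skewed by the single-row shard
print(distributed_avg(shards))  # 40.0, matches avg over all four rows
```

Rewriting avg(price) as sum(price) / count(price) works because both sum and count can be combined across shards in any order.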

Commutative Computations

•  If you can transform your computations into their commutative form, then you can push them down.
•  (a + b = b + a ; a / b ≠ b / a) (*)
•  Associative and distributive property for other operations (We also knew about this)

How does this help me?

•  Commutative, associative, and distributive properties hold for any query language
•  We pick SQL as an example language
•  SQL uses Relational Algebra to express a query
•  If a query has a WHERE clause in it, that’s a FILTER node in the relational algebra tree
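
As one illustration of what these properties buy, the sketch below treats a distributed table as a union of shards and shows that a FILTER distributes over that union: filtering after collecting all rows gives the same result as filtering on each shard and collecting only the survivors. The helper names, the rows, and the `price > 10` predicate are assumptions made for the example.

```python
def collect(shards):
    """Concatenate per-shard results on the coordinator (a union of fragments)."""
    return [row for shard in shards for row in shard]


def filter_rows(rows, predicate):
    """A FILTER node: keep the rows that satisfy the WHERE predicate."""
    return [row for row in rows if predicate(row)]


shards = [
    [{"id": 1, "price": 5}, {"id": 2, "price": 50}],
    [{"id": 3, "price": 70}],
    [{"id": 4, "price": 8}],
]


def expensive(row):
    return row["price"] > 10  # stands in for: WHERE price > 10


# Filter above the collect: every row travels to the coordinator first.
pulled_up = filter_rows(collect(shards), expensive)
rows_shipped_before = sum(len(shard) for shard in shards)

# Filter pushed below the collect: workers filter, only matches travel.
pushed_down = collect([filter_rows(shard, expensive) for shard in shards])
rows_shipped_after = len(pushed_down)

assert pulled_up == pushed_down
print(rows_shipped_before, rows_shipped_after)
```

Both plans return the same rows; the pushed-down plan simply moves less data over the network.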

Distributed Logical Plan (unoptimized)

Distributed Logical Plan (optimized)

Takeaway

In the land of distributed systems, the commutative (and distributive) property is king!

Transform your queries with respect to the king, and they will scale!

One example doesn’t make a proof

•  Can you prove this model is complete?
•  Relational Algebra has 10 operators
•  What about optimizing more complex plans with joins, subselects, and other constructs?

Multi-Relational Algebra

•  Correctness of Query Execution Strategies in Distributed Databases, Ceri and Pelagatti, 1983
•  A Distributed Database paper from a more civilized age
•  Models each relational algebra operator as a distributed operator and extends it

Commutative Property Rules

Distributive Property Rules

Two important notes (1)

Logical plan ≠ Physical plan

•  “Join” is a logical operator. HashJoin or MergeJoin is a physical operator.
•  It’s easier to reason about logical operators’ mathematical properties than those of physical operators.
•  Distributed databases that start from a “database” usually extend physical operators. (Greenplum, Redshift)

Two important notes (2)

Multi-relational Algebra offers a complete foundation for distributing SQL queries.

•  Citus is adding more SQL functionality with each release.
•  From a use-case standpoint, think of Citus not as a replacement for your data warehouse, but as a way to extend it with real-time capabilities.

Summary

•  To scale out, you need to transform your computations into their commutative and distributive form.
•  Correctness of Query Execution Strategies in Distributed Databases (1983) offers a framework to do this for relational algebra.

Distributed Query Execution across Different Workloads

Different Workloads

1.  Simple Insert / Update / Delete / Select commands
•  High throughput and low latency
2.  Real-time Select queries that get parallelized to hundreds of shards (<300ms)
3.  Long running Select queries that join large tables
•  You can’t restart a Select query just because one task (or one machine) in 1M tasks failed
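
The last point, not restarting a whole query when one task fails, can be sketched as task-level retry. The code below is a hedged illustration and not Citus's executor: `run_tasks`, `fake_run`, the worker names, and the use of `ConnectionError` to signal a failed machine are all hypothetical. Each task lists the workers that hold a copy of its shard, and a failure moves only that task to the next copy.

```python
def run_tasks(tasks, run_on):
    """Run each task once, retrying only the failed task on its other placements."""
    results = {}
    for task_id, placements in tasks.items():
        last_error = None
        for worker in placements:
            try:
                results[task_id] = run_on(worker, task_id)
                break  # this task is done; earlier results are kept
            except ConnectionError as exc:
                last_error = exc  # try the next placement for this task only
        else:
            raise RuntimeError(f"task {task_id} failed on all placements") from last_error
    return results


down = {"worker-2"}  # pretend this machine is unreachable


def fake_run(worker, task_id):
    """Stand-in for sending a task to a worker; returns which worker ran it."""
    if worker in down:
        raise ConnectionError(f"{worker} is unreachable")
    return worker


tasks = {
    1: ["worker-1", "worker-2"],
    2: ["worker-2", "worker-3"],  # first placement is down, second succeeds
}
results = run_tasks(tasks, fake_run)
print(results)
```

Task 1 is never re-run when task 2 hits the failed machine, which is the property that matters when a query fans out to very many tasks.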

Different Executors

1.  Router Executor: Simple Insert / Update / Delete / Select commands
2.  Real-time Executor: Real-time Select queries that touch 100s of shards (<300ms)
3.  Task-tracker Executor: Longer running queries that need to scale out to 10K-1M tasks

Conclusions

•  Distributed relational databases are hard
•  PostgreSQL and its extension APIs are unique
•  Citus targets real-time data ingest and querying
•  Citus 5.0 is open source: https://github.com/citusdata/citus

Questions

https://citusdata.com

Forums: groups.google.com/forum/#!forum/citus-users
