Learning to build distributed systems the hard way

LEARNING TO BUILD
DISTRIBUTED SYSTEMS
THE HARD WAY
@iconara

LEARNING TO BUILD
DISTRIBUTED SYSTEMS
THE HARD WAY
BIG DATA
@iconara

speakerdeck.com/u/iconara
(real time!)

let’s make online advertising a great experience

30K REQUESTS
PER SECOND
more than a billion requests per day,
over 1 TB raw data

ONE VISIT CAN
CHANGE UP TO
100K COUNTERS
hundreds of millions of individual counters per day,
plus counting uniques and visitor histories

IN REAL TIME
or near real time, if you want to be pedantic
×

START WITH TWO
OF EVERYTHING
going from one to two is the hardest,
solve the scaling problem up front

START WITH TWO
OF EVERYTHING
you’ll solve the scaling problem,
and need less overcapacity
THREE

GIVE A LOT OF
THOUGHT TO
KEYS AND IDS
and think about your queries ﬁrst

MEIHO0 JME57Z
monotonically increasing,
sorts nicely
a timestamp
something random

JME57Z MEIHO0
uniformly distributed,
works nicely with sharding
something random
a timestamp

CONSISTENCY IS
OVERRATED
don’t fear R + W < N

PRECOMPUTE
ALL THE THINGS
your users most likely don’t know what they want,
so why let them do ad hoc queries?

SEPARATE
PROCESSING
FROM STORAGE
that way you can scale each independently

PLAN HOW TO GET
RID OF YOUR DATA
deleting stuff is harder than you might think
×
×
×
×
×
×
×

DIVIDE THE LOAD
big data systems are all about
routing and partitioning

RANDOM
when you have no interdependencies
between things it’s easy to scale out

CONSISTENT
when there are interdependencies you need
to route using some property of the objects,
but make sure you get a uniform distribution

A DIVERSION ABOUT
COUNTING TO 60
the reason why there’s 60 seconds to a minute,
and 360 degrees to a circle
××

3 SEGMENTS
ON EACH FINGER
= 12

3 SEGMENTS
ON EACH FINGER
= 12
FIVE FINGERS
ON OTHER HAND
= 60

12, 60, 120, 360
superior highly composite numbers

use multiples of 12 to scale
without always having to double

BLAH BLAH BLAH
use multiples of 12 to scale
without always having to double

log2(366) ≈ 31
six characters 0-9, A-Z can represent 31 bits,
which is kind of almost very close to four bytes

MEIHO0
a timestamp
Time.now.to_i.to_s(36).upcase

YOU CAN’T SCALE
TO REAL TIME
and don’t trust code that doesn’t run continuously
×

DO YOU REALLY
NEED A BACKUP?
if you got 3x replication over multiple
availability zones, is that backup really worth it?

PRODUCTION IS THE
ONLY REAL TEST
ENVIRONMENT
when thousands of things happen every second,
new, weird and unforeseen things happen all the time,
your tests can only cover the foreseeable
=

GÖTEBORG,
DISTRIBUTED
@gbgdistr

KTHXBAI
@iconara
github.com/iconara
architecturalatrocities.com
burtcorp.com

Learning to build distributed systems the hard way

More Related Content

Similar to Learning to build distributed systems the hard way (20)

More from Theo Hultberg (8)

Recently uploaded (20)

Learning to build distributed systems the hard way