Fabian Hueske – Juggling with Bits and Bytes

Juggling with Bits and Bytes
How Apache Flink operates on binary data

Fabian Hueske
fhueske@apache.org
@fhueske

Big Data frameworks on JVMs

•  Many (open source) Big Data frameworks run on JVMs
    –  Hadoop, Drill, Spark, Hive, Pig, and ...
    –  Flink as well
•  Common challenge: How to organize data in-memory?
    –  In-memory processing (sorting, joining, aggregating)
    –  In-memory caching of intermediate results
•  Memory management of a system influences
    –  Reliability
    –  Resource efficiency, performance & performance predictability
    –  Ease of configuration

The straight-forward approach

Store and process data as objects on the heap
•  Put objects in an array and sort it

A few notable drawbacks
•  Predicting memory consumption is hard
    –  If you fail, an OutOfMemoryError will kill you!
•  High garbage collection overhead
    –  Easily 50% of time spent on GC
•  Objects have considerable space overhead
    –  At least 8 bytes for each (nested) object! (Depends on arch)

Flink adopts DBMS technology

•  Allocates fixed number of memory segments upfront
•  Data objects are serialized into memory segments
•  DBMS-style algorithms work on binary representation
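
The first bullet can be illustrated with a minimal Java sketch. The class `SegmentPool` and its methods are hypothetical names chosen for this example and are not Flink's actual API. It only shows the idea of allocating a fixed number of equally sized segments once and handing them out, so that used and free memory can be counted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size pool of byte-array segments (illustration only, not Flink's API). */
final class SegmentPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int segmentSize;

    SegmentPool(int numSegments, int segmentSize) {
        this.segmentSize = segmentSize;
        // All memory is allocated once, upfront.
        for (int i = 0; i < numSegments; i++) {
            free.push(new byte[segmentSize]);
        }
    }

    /** Returns a free segment, or null if the pool is exhausted (the caller has to spill or wait). */
    byte[] request() {
        return free.poll();
    }

    /** Hands a segment back for reuse; segments are recycled, not deallocated. */
    void release(byte[] segment) {
        if (segment.length != segmentSize) {
            throw new IllegalArgumentException("segment does not belong to this pool");
        }
        free.push(segment);
    }

    /** Number of segments currently available. */
    int available() {
        return free.size();
    }
}
```

Because the pool never grows, an algorithm that asks for more segments than exist gets `null` back instead of triggering an OutOfMemoryError, which is the point where a real operator would start spilling.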

Why is that good?

•  Memory-safe execution
    –  Used and available memory segments are easy to count
    –  No parameter tuning for reliable operations!
•  Efficient out-of-core algorithms
    –  Memory segments can be efficiently written to disk
•  Reduced GC pressure
    –  Memory segments are off-heap or never deallocated
    –  Data objects are short-lived or reused
•  Space-efficient data representation
•  Efficient operations on binary data

What does it cost?

•  Significant implementation investment
    –  Using java.util.HashMap
       vs.
    –  Implementing a spillable hash table backed by byte arrays and a custom serialization stack
•  Other systems use similar techniques
    –  Apache Drill, Apache AsterixDB (incubating)
•  Apache Spark is evolving in a similar direction

Memory segments

•  Unit of memory distribution in Flink
    –  Fixed number allocated when worker starts
•  Backed by a regular byte array (default 32KB)
•  On-heap or off-heap allocation
•  R/W access through Java’s efficient unsafe methods
•  Multiple memory segments can be logically concatenated to a larger chunk of memory
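
The last bullet, logical concatenation, can be sketched as follows. `SegmentChain` is a hypothetical class made up for this example. It uses plain array indexing rather than the unsafe methods the slide mentions, and it fixes big-endian byte order as an assumption. A global position is mapped to a segment index and an offset inside that segment, so a value may straddle a segment boundary.

```java
/** Hypothetical view that treats several fixed-size segments as one address space. */
final class SegmentChain {
    private final byte[][] segments;
    private final int segmentSize;

    SegmentChain(int numSegments, int segmentSize) {
        this.segmentSize = segmentSize;
        this.segments = new byte[numSegments][segmentSize];
    }

    /** Writes one byte at a global position. */
    void put(long position, byte value) {
        segments[(int) (position / segmentSize)][(int) (position % segmentSize)] = value;
    }

    /** Reads one byte at a global position. */
    byte get(long position) {
        return segments[(int) (position / segmentSize)][(int) (position % segmentSize)];
    }

    /** Writes a big-endian int; its four bytes may span two segments. */
    void putInt(long position, int value) {
        for (int i = 0; i < 4; i++) {
            put(position + i, (byte) (value >>> (24 - 8 * i)));
        }
    }

    /** Reads a big-endian int written by putInt. */
    int getInt(long position) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            value = (value << 8) | (get(position + i) & 0xFF);
        }
        return value;
    }
}
```

Writing an int two bytes before the end of the first 32 KB segment places half of it in the second segment, and `getInt` at the same position reads it back unchanged.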

On-heap memory allocation

Off-heap memory allocation

On-heap vs. Off-heap

•  No significant performance difference in micro-benchmarks
•  Garbage Collection
    –  Smaller heap -> faster GC
•  Faster start-up time
    –  A multi-GB JVM heap takes time to allocate
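
The two allocation modes can be tried out with the JDK's `java.nio.ByteBuffer`, as in the sketch below. This is only an illustration with standard JDK calls. The class name `HeapVsDirect` is made up, and the code is not how Flink itself implements its segments.

```java
import java.nio.ByteBuffer;

/** Illustration of on-heap versus off-heap buffers using only the JDK. */
final class HeapVsDirect {
    static final int SEGMENT_SIZE = 32 * 1024; // default segment size named on the slides

    /** On-heap: backed by a byte[] that lives inside the garbage-collected heap. */
    static ByteBuffer onHeapSegment() {
        return ByteBuffer.allocate(SEGMENT_SIZE);
    }

    /** Off-heap: a direct buffer whose memory lies outside the Java heap. */
    static ByteBuffer offHeapSegment() {
        return ByteBuffer.allocateDirect(SEGMENT_SIZE);
    }

    /** The same read/write code works for both kinds of buffer. */
    static long roundTrip(ByteBuffer segment, long value) {
        segment.putLong(0, value);
        return segment.getLong(0);
    }
}
```

Code written against the common `ByteBuffer` interface does not need to know where the memory lives, which is one way to keep the choice between on-heap and off-heap a configuration matter.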

DATA SERIALIZATION

Custom de/serialization stack

•  Many alternatives for Java object serialization
    –  Dynamic: Kryo
    –  Schema-dependent: Apache Avro, Apache Thrift, Protobufs
•  But Flink has its own serialization stack
    –  Operating on serialized data requires knowledge of layout
    –  Control over layout can improve efficiency of operations
    –  Data types are known before execution

Rich & extensible type system

•  Serialization framework requires knowledge of types
•  Flink analyzes return types of functions
    –  Java: Reflection-based type analyzer
    –  Scala: Compiler information + CodeGen via Macros
•  Rich type system
    –  Atomics: Primitives, Writables, Generic types, …
    –  Composites: Tuples, Pojos, CaseClasses
    –  Extensible by custom types

Serializing a Tuple3<Integer, Double, Person>
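
The diagram for this slide is not part of the text, so the following is only a sketch of what a field-by-field binary layout for such a tuple could look like. The `Person` fields (`id`, `name`), the length-prefixed UTF-8 string encoding, and the class name `TupleLayoutSketch` are assumptions made for this example. Flink's real serializers may use a different layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical layout: int | double | person.id | name length | name bytes. */
final class TupleLayoutSketch {
    /** Assumed Person type; the slide text does not list its fields. */
    static final class Person {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Writes the three tuple fields back to back at the buffer's current position. */
    static void serialize(int f0, double f1, Person f2, ByteBuffer target) {
        target.putInt(f0);
        target.putDouble(f1);
        target.putInt(f2.id);
        byte[] name = f2.name.getBytes(StandardCharsets.UTF_8);
        target.putInt(name.length);
        target.put(name);
    }

    /** Reads only the first field at its known offset, without creating any object. */
    static int readFirstField(ByteBuffer source, int recordStart) {
        return source.getInt(recordStart);
    }

    /** Rebuilds the Person by skipping the fixed-length int (4 bytes) and double (8 bytes). */
    static Person readPerson(ByteBuffer source, int recordStart) {
        int pos = recordStart + 4 + 8;
        int id = source.getInt(pos);
        int length = source.getInt(pos + 4);
        byte[] name = new byte[length];
        for (int i = 0; i < length; i++) {
            name[i] = source.get(pos + 8 + i);
        }
        return new Person(id, new String(name, StandardCharsets.UTF_8));
    }
}
```

Because the layout is known, `readFirstField` can fetch the Integer directly from the bytes. This is the kind of access that operating on serialized data depends on.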

OPERATING ON BINARY DATA

Data processing algorithms

•  Flink’s algorithms are based on RDBMS technology
    –  External Merge Sort, Hybrid Hash Join, Sort Merge Join, …
•  Algorithms receive a budget of memory segments
    –  Automatic decision about budget size
    –  No fine-tuning of operator memory!
•  Operate in-memory as long as data fits into budget
    –  And gracefully spill to disk if data exceeds memory

In-memory sort – Fill the sort buffer

In-memory sort – Sort the buffer

In-memory sort – Read sorted buffer
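
The three phases named in these slide titles (fill, sort, read) can be sketched as below. This is a simplified illustration, not Flink's sorter. The class `SortBufferSketch`, the record layout, and the packing of an Integer key and a pointer into one `long` are assumptions made for this example. The index array is also allowed to grow on the heap here for brevity, whereas a real sort buffer would live inside its memory budget.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sort buffer: one region for records, one for (key, pointer) entries. */
final class SortBufferSketch {
    private final ByteBuffer records;     // region 1: serialized records
    private long[] index = new long[16];  // region 2: key in high 32 bits, pointer in low 32 bits
    private int count = 0;

    SortBufferSketch(int capacityBytes) {
        this.records = ByteBuffer.allocate(capacityBytes);
    }

    /** Phase 1, fill: serialize the record and remember its key and start offset. */
    void add(int key, String value) {
        int pointer = records.position();
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        records.putInt(key).putInt(bytes.length).put(bytes);
        if (count == index.length) {
            index = Arrays.copyOf(index, count * 2);
        }
        index[count++] = ((long) key << 32) | (pointer & 0xFFFFFFFFL);
    }

    /** Phase 2, sort: only the fixed-length index entries move, the records stay in place. */
    void sort() {
        Arrays.sort(index, 0, count);
    }

    /** Phase 3, read: follow the pointers in index order and deserialize each value. */
    List<String> readSorted() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int pointer = (int) index[i];
            int length = records.getInt(pointer + 4);
            byte[] bytes = new byte[length];
            for (int j = 0; j < length; j++) {
                bytes[j] = records.get(pointer + 8 + j);
            }
            out.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

Sorting compares plain `long` values, so no record is deserialized during the sort itself. A key that does not fit entirely, such as a String represented by a prefix, would need a fallback comparison on the full record when two prefixes are equal.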

SHOW ME NUMBERS!

Sort benchmark

•  Task: Sort 10 million Tuple2<Integer, String> records
    –  String length 12 chars
        •  Tuple has 16 Bytes of raw data
        •  ~152 MB raw data
    –  Integers uniformly, Strings long-tail distributed
    –  Sort on Integer field and on String field
•  Generated input provided as mutable object iterator
•  Use JVM with 900 MB heap size
    –  Minimum size to reliably run the benchmark

Sorting methods

1.  Objects-on-Heap:
    –  Put cloned data objects in ArrayList and use Java’s Collection sort.
    –  ArrayList is initialized with right size.
2.  Flink-serialized (on-heap):
    –  Using Flink’s custom serializers.
    –  Integer with full binary sorting key, String with 8 byte prefix key.
3.  Kryo-serialized (on-heap):
    –  Serialize fields with Kryo.
    –  No binary sorting keys, objects are deserialized for comparison.

•  All implementations use a single thread
•  Average execution time of 10 runs reported
•  GC triggered between runs (does not go into reported time)

Garbage collection and heap usage

(Figure panels: Objects-on-heap, Flink-serialized)

Memory usage

•  Breakdown: Flink serialized - Sort Integer
    –  4 bytes Integer
    –  12 bytes String
    –  4 bytes String length
    –  4 bytes pointer
    –  4 bytes Integer sorting key
    –  28 bytes * 10M records = 267 MB
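
The arithmetic behind the breakdown can be checked with a few lines of Java. The class name `MemoryEstimate` is made up for this example. The calculation assumes that "MB" on the slides means 2^20 bytes, which is what makes 28 bytes times 10 million records come out at 267, and it rounds down.

```java
/** Reproduces the per-record size arithmetic from the slides. */
final class MemoryEstimate {
    static final long RECORDS = 10_000_000L;
    static final long MIB = 1024L * 1024L;

    /** Total size in MiB, rounded down, for a given number of bytes per record. */
    static long totalMiB(long bytesPerRecord) {
        return bytesPerRecord * RECORDS / MIB;
    }

    public static void main(String[] args) {
        long raw = 4 + 12;                       // Integer + 12-char String
        long flinkSortInt = 4 + 12 + 4 + 4 + 4;  // + String length, pointer, Integer sorting key
        System.out.println(totalMiB(raw));          // prints 152
        System.out.println(totalMiB(flinkSortInt)); // prints 267
    }
}
```

The same function also reproduces the "~152 MB raw data" figure from the benchmark description: 16 bytes per tuple times 10 million records.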

               Object-on-heap    Flink-serialized    Kryo-serialized
Sort Integer   Approx. 700 MB    277 MB              266 MB
Sort String    Approx. 700 MB    315 MB              266 MB

Going out-of-core

•  Single thread HashJoin with 4GB memory budget
•  Build side varies, Probe side 64GB

We’re not done yet!

•  Serialization layouts tailored towards operations
    –  More efficient operations on binary data
•  Table API provides full semantics for execution
    –  Use code generation to operate fully on binary data
•  …

Summary

•  Active memory management avoids OOMErrors
•  Highly efficient data serialization stack
    –  Facilitates operations on binary data
    –  Makes more data fit into memory
•  DBMS-style operators operate on binary data
    –  High performance in-memory processing
    –  Graceful destaging to disk if necessary
•  Read Flink’s blog:
    –  http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
    –  http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
    –  http://flink.apache.org/news/2015/09/16/off-heap-memory.html

http://flink.apache.org
@ApacheFlink
Apache Flink
