Juggling with Bits and Bytes
How Apache Flink operates on binary data

Fabian Hueske
fhueske@apache.org
@fhueske
  
Big Data frameworks on JVMs
•  Many (open source) Big Data frameworks run on JVMs
   –  Hadoop, Drill, Spark, Hive, Pig, and ...
   –  Flink as well
•  Common challenge: How to organize data in memory?
   –  In-memory processing (sorting, joining, aggregating)
   –  In-memory caching of intermediate results
•  Memory management of a system influences
   –  Reliability
   –  Resource efficiency, performance & performance predictability
   –  Ease of configuration
  
The straightforward approach
Store and process data as objects on the heap
•  Put objects in an array and sort it

A few notable drawbacks
•  Predicting memory consumption is hard
   –  If you fail, an OutOfMemoryError will kill you!
•  High garbage collection overhead
   –  Easily 50% of time spent on GC
•  Objects have considerable space overhead
   –  At least 8 bytes for each (nested) object! (Depends on the architecture)
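The straightforward approach above can be sketched in a few lines. This is an illustrative baseline rather than code from the talk; the `HeapRecord` class and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record type. Each instance is a separate heap object with
// its own object header, plus a reference to another object (the String).
final class HeapRecord {
    final int key;
    final String value;

    HeapRecord(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

final class ObjectsOnHeapSort {
    // Copies the input into a list of the right size and sorts it by key.
    // Every comparison has to follow object references on the heap.
    static List<HeapRecord> sortByKey(List<HeapRecord> input) {
        List<HeapRecord> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparingInt(r -> r.key));
        return copy;
    }
}
```

The code is short, which is the appeal of this approach; the drawbacks listed above only show up once the list holds millions of objects.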
  
FLINK’S APPROACH

Flink adopts DBMS technology
•  Allocates a fixed number of memory segments upfront
•  Data objects are serialized into memory segments
•  DBMS-style algorithms work on the binary representation
  
Why is that good?
•  Memory-safe execution
   –  Used and available memory segments are easy to count
   –  No parameter tuning for reliable operations!
•  Efficient out-of-core algorithms
   –  Memory segments can be efficiently written to disk
•  Reduced GC pressure
   –  Memory segments are off-heap or never deallocated
   –  Data objects are short-lived or reused
•  Space-efficient data representation
•  Efficient operations on binary data
  
What does it cost?
•  Significant implementation investment
   –  Using java.util.HashMap
      vs.
   –  Implementing a spillable hash table backed by byte arrays and a custom serialization stack
•  Other systems use similar techniques
   –  Apache Drill, Apache AsterixDB (incubating)
•  Apache Spark is evolving in a similar direction
  
MEMORY ALLOCATION

Memory segments
•  Unit of memory distribution in Flink
   –  Fixed number allocated when worker starts
•  Backed by a regular byte array (default 32 KB)
•  On-heap or off-heap allocation
•  R/W access through Java’s efficient unsafe methods
•  Multiple memory segments can be logically concatenated to a larger chunk of memory
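A minimal sketch of the idea of a fixed pool of equally sized segments follows. It is not Flink's implementation: the class and method names are hypothetical, and `java.nio.ByteBuffer` stands in for the unsafe-based access mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size pool of equally sized memory segments.
final class SegmentPoolSketch {
    static final int SEGMENT_SIZE = 32 * 1024; // 32 KB, the default named above

    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    // All segments are allocated once, up front; nothing is allocated later.
    SegmentPoolSketch(int numSegments, boolean offHeap) {
        for (int i = 0; i < numSegments; i++) {
            free.push(offHeap
                    ? ByteBuffer.allocateDirect(SEGMENT_SIZE) // outside the JVM heap
                    : ByteBuffer.allocate(SEGMENT_SIZE));     // backed by a byte[]
        }
    }

    // Returns a free segment, or null when the pool is exhausted. A caller
    // would then spill data to disk instead of allocating more memory.
    ByteBuffer request() {
        return free.poll();
    }

    // Hands a segment back for reuse; it is never deallocated.
    void release(ByteBuffer segment) {
        segment.clear();
        free.push(segment);
    }

    // Counting free segments is all it takes to know the remaining memory.
    int available() {
        return free.size();
    }
}
```

Because the pool never grows, running out of segments is an ordinary condition the caller can react to, not an OutOfMemoryError.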
  
[Figure: On-heap memory allocation]

[Figure: Off-heap memory allocation]

On-heap vs. off-heap
•  No significant performance difference in micro-benchmarks
•  Garbage collection
   –  Smaller heap -> faster GC
•  Faster start-up time
   –  A multi-GB JVM heap takes time to allocate
  
DATA SERIALIZATION

Custom de/serialization stack
•  Many alternatives for Java object serialization
   –  Dynamic: Kryo
   –  Schema-dependent: Apache Avro, Apache Thrift, Protobufs
•  But Flink has its own serialization stack
   –  Operating on serialized data requires knowledge of layout
   –  Control over layout can improve efficiency of operations
   –  Data types are known before execution
  
Rich & extensible type system
•  Serialization framework requires knowledge of types
•  Flink analyzes return types of functions
   –  Java: Reflection-based type analyzer
   –  Scala: Compiler information + CodeGen via Macros
•  Rich type system
   –  Atomics: Primitives, Writables, Generic types, …
   –  Composites: Tuples, Pojos, CaseClasses
   –  Extensible by custom types
  
[Figure: Serializing a Tuple3<Integer, Double, Person>]
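Since the figure itself did not survive, here is a hedged sketch of what serializing such a tuple field by field could look like. The fields of `Person` and the byte layout are assumptions made for illustration; this is not Flink's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical POJO; the slide title does not list Person's fields.
final class Person {
    final int id;
    final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class Tuple3LayoutSketch {
    // Writes the fields back to back in an assumed layout:
    // [f0 int: 4 bytes][f1 double: 8 bytes][Person.id: 4 bytes]
    // [name length: 4 bytes][name: UTF-8 bytes]
    static void serialize(int f0, double f1, Person f2, ByteBuffer target) {
        byte[] name = f2.name.getBytes(StandardCharsets.UTF_8);
        target.putInt(f0);
        target.putDouble(f1);
        target.putInt(f2.id);
        target.putInt(name.length);
        target.put(name);
    }

    // Because the layout is known, a single field can be read at a fixed
    // offset without deserializing the whole tuple.
    static String readPersonName(ByteBuffer source, int recordStart) {
        int lengthPos = recordStart + 4 + 8 + 4;
        int length = source.getInt(lengthPos);
        byte[] name = new byte[length];
        for (int i = 0; i < length; i++) {
            name[i] = source.get(lengthPos + 4 + i);
        }
        return new String(name, StandardCharsets.UTF_8);
    }
}
```

The second method is the point of controlling the layout: an operator that only needs one field can skip the rest of the record.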
  
OPERATING ON BINARY DATA

Data processing algorithms
•  Flink’s algorithms are based on RDBMS technology
   –  External Merge Sort, Hybrid Hash Join, Sort Merge Join, …
•  Algorithms receive a budget of memory segments
   –  Automatic decision about budget size
   –  No fine-tuning of operator memory!
•  Operate in-memory as long as data fits into budget
   –  And gracefully spill to disk if data exceeds memory
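The budget-and-spill behaviour described above can be sketched as follows. This is a simplified illustration, not one of Flink's operators: the class name, the int-only payload and the temp-file handling are assumptions, and the segment size is assumed to be at least 4 bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical buffer that works within a fixed budget of segments.
final class SpillingBufferSketch implements AutoCloseable {
    private final ByteBuffer[] budget; // the only memory this buffer ever uses
    private Path spillFile;
    private FileChannel channel;
    private int current = 0;           // index of the segment being filled
    private long spilledBytes = 0;

    SpillingBufferSketch(int numSegments, int segmentSize) {
        budget = new ByteBuffer[numSegments];
        for (int i = 0; i < numSegments; i++) {
            budget[i] = ByteBuffer.allocate(segmentSize);
        }
        try {
            spillFile = Files.createTempFile("spill-sketch", ".bin");
            channel = FileChannel.open(spillFile, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends one int. When the last segment is full, the whole budget is
    // written to disk and the segments are reused.
    void add(int value) {
        if (budget[current].remaining() < Integer.BYTES) {
            if (current + 1 < budget.length) {
                current++;
            } else {
                spill();
            }
        }
        budget[current].putInt(value);
    }

    // Segments already hold binary data, so they are written out as-is.
    private void spill() {
        try {
            for (ByteBuffer segment : budget) {
                segment.flip();
                while (segment.hasRemaining()) {
                    spilledBytes += channel.write(segment);
                }
                segment.clear();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        current = 0;
    }

    long spilledBytes() {
        return spilledBytes;
    }

    @Override
    public void close() {
        try {
            channel.close();
            Files.deleteIfExists(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As long as the data fits into the budget nothing touches the disk; once it does not, memory use stays constant and only the spill file grows.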
  
[Figure: In-memory sort – Fill the sort buffer]

[Figure: In-memory sort – Sort the buffer]

[Figure: In-memory sort – Read sorted buffer]
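The three steps named in the figure titles (fill, sort, read) can be illustrated with a small sort buffer that keeps full records in one region and fixed-length key-plus-pointer entries in another. This is a sketch under assumptions: the class is hypothetical, records are (int, String) pairs, and `Arrays.sort` on packed `long` entries stands in for an in-place sort over memory segments.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sort buffer for (int key, String value) records.
final class SortBufferSketch {
    private final ByteBuffer records; // region 1: full serialized records
    private final long[] index;       // region 2: fixed-length (key, pointer) entries
    private int count = 0;

    SortBufferSketch(int recordBytes, int maxRecords) {
        records = ByteBuffer.allocate(recordBytes);
        index = new long[maxRecords];
    }

    // Fill: append the serialized record and add an index entry for it.
    // Returns false when the buffer is full.
    boolean write(int key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (count == index.length || records.remaining() < 8 + bytes.length) {
            return false;
        }
        int pointer = records.position();
        records.putInt(key).putInt(bytes.length).put(bytes);
        // High 32 bits: the sort key. Low 32 bits: offset of the record.
        index[count++] = ((long) key << 32) | pointer;
        return true;
    }

    // Sort: only the small fixed-length entries are compared and moved;
    // the serialized records stay where they are.
    void sort() {
        Arrays.sort(index, 0, count);
    }

    // Read: follow the pointers in index order and deserialize each value.
    List<String> readSortedValues() {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int pointer = (int) index[i]; // low 32 bits
            int length = records.getInt(pointer + 4);
            byte[] bytes = new byte[length];
            for (int b = 0; b < length; b++) {
                bytes[b] = records.get(pointer + 8 + b);
            }
            out.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

Packing the key into the upper half of a signed `long` keeps the numeric order of the int keys, so sorting the entries sorts the records without touching or deserializing them.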
  
SHOW ME NUMBERS!

Sort benchmark
•  Task: Sort 10 million Tuple2<Integer, String> records
   –  String length 12 chars
      •  Tuple has 16 bytes of raw data
      •  ~152 MB raw data
   –  Integers uniformly, Strings long-tail distributed
   –  Sort on Integer field and on String field
•  Generated input provided as mutable object iterator
•  Use JVM with 900 MB heap size
   –  Minimum size to reliably run the benchmark
  
Sorting methods
1.  Objects-on-Heap:
   –  Put cloned data objects in ArrayList and use Java’s collection sort.
   –  ArrayList is initialized with right size.
2.  Flink-serialized (on-heap):
   –  Using Flink’s custom serializers.
   –  Integer with full binary sorting key, String with 8-byte prefix key.
3.  Kryo-serialized (on-heap):
   –  Serialize fields with Kryo.
   –  No binary sorting keys, objects are deserialized for comparison.
•  All implementations use a single thread
•  Average execution time of 10 runs reported
•  GC triggered between runs (does not go into reported time)
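To make the idea of an 8-byte prefix key for Strings concrete, here is a hedged sketch. It assumes ASCII strings, so that byte order agrees with `String.compareTo`; the class and method names are hypothetical and this is not Flink's normalized-key code.

```java
import java.nio.charset.StandardCharsets;

final class PrefixKeySketch {
    static final int PREFIX_BYTES = 8;

    // Packs the first 8 bytes of the string into a long, most significant
    // byte first; shorter strings are padded with zero bytes.
    static long prefixKey(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        long key = 0;
        for (int i = 0; i < PREFIX_BYTES; i++) {
            int b = i < bytes.length ? (bytes[i] & 0xFF) : 0;
            key = (key << 8) | b;
        }
        return key;
    }

    // Compares by prefix key first. Only when the prefixes tie does it fall
    // back to the full values, which in a serialized setting would mean
    // deserializing the records.
    static int compare(String a, long keyA, String b, long keyB) {
        int byPrefix = Long.compareUnsigned(keyA, keyB);
        return byPrefix != 0 ? byPrefix : a.compareTo(b);
    }
}
```

An Integer key fits entirely into its binary sorting key, so it never needs the fallback; a String key needs it whenever two values share their first 8 bytes.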
  
[Figure: Execution time]

[Figure: Garbage collection and heap usage. Panels: Objects-on-heap, Flink-serialized]
  
Memory usage
•  Breakdown: Flink serialized - Sort Integer
   –  4 bytes Integer
   –  12 bytes String
   –  4 bytes String length
   –  4 bytes pointer
   –  4 bytes Integer sorting key
   –  28 bytes * 10M records = 267 MB

               Object-on-heap    Flink-serialized   Kryo-serialized
Sort Integer   Approx. 700 MB    277 MB             266 MB
Sort String    Approx. 700 MB    315 MB             266 MB
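The breakdown arithmetic can be checked with a few lines. The sketch assumes that "MB" means 2^20 bytes, which is what makes 28 bytes per record come out at about 267 MB; the class name is hypothetical.

```java
final class MemoryUsageSketch {
    static final long RECORDS = 10_000_000L;
    static final double BYTES_PER_MB = 1024.0 * 1024.0; // assumption: MB = 2^20 bytes

    // Bytes per record times the number of records, expressed in MB.
    static double megabytes(int bytesPerRecord) {
        return bytesPerRecord * RECORDS / BYTES_PER_MB;
    }
}
```

With this, `MemoryUsageSketch.megabytes(4 + 12 + 4 + 4 + 4)` gives roughly 267.0 for the serialized layout, and `MemoryUsageSketch.megabytes(4 + 12)` gives roughly 152.6 for the raw data of the benchmark.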
  
Going out-of-core
•  Single thread HashJoin with 4 GB memory budget
•  Build side varies, Probe side 64 GB

WHAT’S NEXT?
  
We’re not done yet!
•  Serialization layouts tailored towards operations
   –  More efficient operations on binary data
•  Table API provides full semantics for execution
   –  Use code generation to operate fully on binary data
•  …
  
Summary
•  Active memory management avoids OOMErrors
•  Highly efficient data serialization stack
   –  Facilitates operations on binary data
   –  Makes more data fit into memory
•  DBMS-style operators operate on binary data
   –  High performance in-memory processing
   –  Graceful destaging to disk if necessary
•  Read Flink’s blog:
   –  http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
   –  http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
   –  http://flink.apache.org/news/2015/09/16/off-heap-memory.html
  
Apache Flink
http://flink.apache.org    @ApacheFlink
  
