Running Apache Flink® Everywhere
Stephan Ewen (@StephanEwen)
How is Flink deployed?
A two-minute search on the mailing list reveals:
• Standalone Cluster
• Embedded Service (OSGi)
• YARN Sessions
• YARN Jobs
• Standalone Cloud
• Docker on Mesos
• Docker/Kubernetes
• YARN->Myriad->Mesos
Mesos Sessions
Mesos Jobs
(soon!)
Users run mostly isolated jobs or multi-job sessions
Resource Management
Resources controlled by the framework or another service.
More dimensions coming up…
Dynamic Resources
• Number of TaskManagers changes over job lifetime
"Trusted" processes
• Run under superuser credential and dispatch jobs
No blocking on any process type
• YARN job needs to continue while ApplicationMaster is down
Uniform vs. Heterogeneous Resources
• Run different functions in different size containers
• E.g., simple mapper in small container, heavy window operator in large container
Avoiding "Job Submit" step
Reworking the Flink Process Model
Flink Improvement Proposal 6
Core Idea
• Creating composable building blocks
• Create different compositions for different scenarios
FLIP-6 design document:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
Recap: Current status (Standalone)
Standalone Flink Cluster: Client, JobManager, TaskManager (×3)
(1) Register
(2) Submit Job
(3) Deploy Tasks
Recap: Current status (YARN)
YARN Cluster: Client, YARN ResourceManager, Application Master (JobManager), TaskManager (×3)
(1) Submit YARN App. (FLINK)
(2) Spawn AppMaster
(3) Poll status
(4) Start TaskManagers
(5) Register
(6) All TaskManagers started
(7) Submit Job
(8) Deploy Tasks
The Building Blocks
ResourceManager
• ClusterManager-specific
• May live across jobs
• Manages available Containers/TaskManagers
• Used to acquire / release resources
TaskManager
• Registers at ResourceManager
• Gets tasks from one or more JobManagers
JobManager
• Single job only, started per job
• Thinks in terms of "task slots"
• Deploys and monitors job/task execution
Dispatcher
• Lives across jobs
• Touch-point for job submissions
• Spawns JobManagers
• May spawn ResourceManager
The Building Blocks
Components: ResourceManager, JobManager, TaskManager
(1) Request slots
(2) Start TaskManager
(3) Register
(4) Deploy Tasks
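The four-step interplay between the building blocks can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and method names (`request_slots`, `free_slots`, `run`) and the `slots_per_task_manager` parameter are hypothetical and are not Flink's actual API. A real ResourceManager would ask the cluster manager for containers asynchronously instead of creating TaskManagers in-process.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    """Worker process offering a fixed number of task slots (toy model)."""
    name: str
    slots: int
    tasks: list = field(default_factory=list)


class ResourceManager:
    """Hypothetical stand-in for the cluster-manager-specific component."""

    def __init__(self, slots_per_task_manager=2):
        self.slots_per_task_manager = slots_per_task_manager
        self.task_managers = []

    def free_slots(self):
        # A slot is free when no task occupies it.
        return sum(tm.slots - len(tm.tasks) for tm in self.task_managers)

    def request_slots(self, needed):
        # (1) A JobManager requests slots.
        # (2) Start TaskManagers until enough free slots exist.
        while self.free_slots() < needed:
            tm = TaskManager(f"tm-{len(self.task_managers)}",
                             self.slots_per_task_manager)
            # (3) The new TaskManager registers at the ResourceManager.
            self.task_managers.append(tm)
        return self.task_managers


class JobManager:
    """Per-job master that thinks in terms of task slots (toy model)."""

    def __init__(self, job_name, resource_manager):
        self.job_name = job_name
        self.resource_manager = resource_manager

    def run(self, tasks):
        task_managers = self.resource_manager.request_slots(len(tasks))
        pending = list(tasks)
        # (4) Deploy tasks into free slots, one task per slot.
        for tm in task_managers:
            while pending and len(tm.tasks) < tm.slots:
                tm.tasks.append((self.job_name, pending.pop(0)))
        return task_managers


rm = ResourceManager(slots_per_task_manager=2)
jm = JobManager("job-a", rm)
used = jm.run(["source", "map", "window"])
```

With three tasks and two slots per TaskManager, the model starts two TaskManagers and leaves one slot free, which mirrors the idea that resources are acquired per request rather than up front.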
Building Flink-on-YARN
YARN Cluster: Client, YARN ResourceManager, Application Master (Flink-YARN ResourceManager, JobManager), TaskManager (×3)
(1) Submit YARN App. (JobGraph / JARs)
(2) Spawn AppMaster
(3) Request slots
(4) Start TaskManagers
(5) Register
(6) Deploy Tasks
Building Flink-on-YARN
Main differences from current YARN mode
• All containers are started with JARs and config files in the classpath
• Credentials & secrets are strictly bound to a single job
• Slots are allocated when needed and released when freed
  • Basic building block for elastic resource usage
• Client disconnects after submitting the job and does not need to wait until TaskManagers are up
Building Flink-on-YARN (separate RM)
YARN Cluster: Client, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, TaskManager (×3)
(1) Submit YARN App. (JobGraph / JARs)
(2) Spawn AppMaster
(3) Start JobManager
(4) Request slots
(4) Start TaskManagers
(5) Register
(6) Deploy Tasks
Building Flink-on-YARN (w/ dispatcher)
YARN Cluster: Client, Flink YARN Dispatcher, YARN ResourceManager, Application Master (Flink-YARN ResourceManager, JobManager), TaskManager (×3)
(1) HTTP POST JobGraph/Jars
(2) Submit YARN App. (JobGraph / JARs)
(3) Spawn AppMaster
(4) Request slots
(5) Start TaskManagers
(6) Register
(7) Deploy Tasks
Building Flink-on-Mesos
Mesos Cluster: Client, Flink Mesos Dispatcher, Mesos Master, Flink Master Process (Flink Mesos ResourceManager, JobManager), TaskManager (×3)
(1) HTTP POST JobGraph/Jars
(2) Allocate container for Flink master
(3) Start Process (and supervise)
(4) Request slots
(5) Start TaskManagers
(6) Register
(7) Deploy Tasks
Building Standalone
Standalone Cluster: Client, Flink Master Process (Dispatcher, Standalone ResourceManager, JobManager ×2), Standby Master Process (×2), TaskManager (×3)
(1) Register
(1) Submit JobGraph/Jars
(2) Start JobManager
(3) Request slots
(7) Deploy Tasks
Building Flink-on-Docker/K8S
Master Container: Flink Master Process (Flink-Container ResourceManager, JobManager, Program Runner)
Worker Container (×3): TaskManager
(1) Container framework starts Master & Worker Containers
(2) Run & Start
(3) Register
(4) Deploy Tasks
Building Flink-on-Docker/K8S
• This is a blueprint for all setups where external services control resources and start new TaskManagers
  • For example AWS EC2 Flink image with auto-scaling groups
• Can be extended to have N equal containers, out of which one becomes master, remainder workers
• With upcoming dynamic-scaling feature (see Till's talk), JobManager scales job to use all available resources
Multi-Job Sessions
Example: YARN session
YARN Cluster: Client, YARN ResourceManager, ApplicationMaster (Dispatcher, Flink-YARN ResourceManager, JobManager (A), JobManager (B)), TaskManager (×3)
(1) Submit YARN App. (FLINK – session)
(2) Spawn AppMaster
(3) Submit Job A
(4) Start JobManager
(5) Request slots
(6) Start TaskManagers
(7) Register
(8) Deploy Tasks
(9) Submit Job B
(10) Start JobManager
(11) Request slots
(12) Deploy Tasks
Sessions vs. Jobs
• For each job submitted, the session spawns a dedicated JobManager
• All jobs run under session-user credentials
• ResourceManager holds on to containers for a certain time
  • Jobs quickly following one another reuse containers (quicker response)
• Internally, sessions build on the dispatcher component
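The session behavior described here (one JobManager per job, containers kept for a while so that follow-up jobs can reuse them) can be sketched as follows. All names (`SessionResourceManager`, `Dispatcher`, `idle_timeout`, `acquire`, `release`) are hypothetical and chosen for illustration only, and time is passed in explicitly as `now` so the example stays deterministic.

```python
class SessionResourceManager:
    """Toy model: keeps released containers for `idle_timeout` seconds."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.idle = []      # (container_id, released_at) pairs
        self.started = 0    # containers requested from the cluster manager

    def acquire(self, now):
        # Give back containers that have been idle for too long.
        self.idle = [(c, t) for c, t in self.idle
                     if now - t <= self.idle_timeout]
        if self.idle:
            container, _ = self.idle.pop()
            return container            # reuse, no new container needed
        self.started += 1               # otherwise start a fresh container
        return f"container-{self.started}"

    def release(self, container, now):
        self.idle.append((container, now))


class Dispatcher:
    """Toy session entry point: spawns one JobManager per submitted job."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager
        self.job_managers = {}

    def submit(self, job_name, now):
        # Each job gets its own JobManager (represented by a label here).
        self.job_managers[job_name] = f"JobManager({job_name})"
        return self.resource_manager.acquire(now)


rm = SessionResourceManager(idle_timeout=60)
dispatcher = Dispatcher(rm)
first = dispatcher.submit("A", now=0)
rm.release(first, now=10)
second = dispatcher.submit("B", now=30)   # within the timeout: reuses `first`
```

Job B arrives 20 seconds after job A released its container, so it reuses that container. A job arriving after the timeout would trigger a new container request instead.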
Wrap-up
More stuff
• Dynamically acquire/release resources
  • Slots are allocated/released from the ResourceManager as needed
  • ResourceManager allocates/releases containers over time
  • Strong interplay with "Dynamic Scaling" (cf. talk by Till yesterday)
• Resource Profiles: Containers of different size
  • Requests can pass a "profile" (CPU / memory / disk), or simply use the "default profile"
  • The YARN and Mesos ResourceManagers can allocate matching containers
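A minimal sketch of how a resource profile request might be matched to a container size. The `ResourceProfile` fields follow the CPU / memory / disk triple named above, but the class, the `DEFAULT_PROFILE` values, and the smallest-fit selection rule are assumptions made for this example, not the design chosen in FLIP-6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical profile: CPU cores, memory and disk in megabytes."""
    cpu_cores: float
    memory_mb: int
    disk_mb: int

    def fits_into(self, other):
        # True if every requested dimension is covered by `other`.
        return (self.cpu_cores <= other.cpu_cores
                and self.memory_mb <= other.memory_mb
                and self.disk_mb <= other.disk_mb)


# Assumed default values, used when a request carries no profile.
DEFAULT_PROFILE = ResourceProfile(cpu_cores=1.0, memory_mb=1024, disk_mb=0)


def choose_container(request: Optional[ResourceProfile], container_sizes):
    """Pick the smallest container size that satisfies the request."""
    profile = DEFAULT_PROFILE if request is None else request
    candidates = [c for c in container_sizes if profile.fits_into(c)]
    if not candidates:
        return None  # no container size is large enough
    return min(candidates,
               key=lambda c: (c.memory_mb, c.cpu_cores, c.disk_mb))


small = ResourceProfile(cpu_cores=1.0, memory_mb=1024, disk_mb=1024)
large = ResourceProfile(cpu_cores=4.0, memory_mb=8192, disk_mb=10240)

mapper = choose_container(None, [large, small])
window = choose_container(ResourceProfile(2.0, 4096, 0), [large, small])
```

Here the simple mapper falls back to the default profile and lands in the small container, while the heavier window operator is placed in the large one.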
Wrapping it up
• It’s a zoo of cluster managers out there
  • Following different paradigms
• Usage patterns vary because of Flink's broad use cases
  • Isolated long-running jobs vs. many short-lived jobs
  • Shared clusters vs. per-user authenticated resources
• We are making "jobs" and "sessions" explicit constructs
• Flexible building blocks, composed in various ways to accommodate different scenarios
Appendix
Flink Streaming cornerstones
Performant Streaming
• Low latency
• High throughput
• Well-behaved flow control (back pressure)
Event Time
• Make more sense of data
• Works on real-time and historic data
Stateful Streaming
• Globally consistent savepoints
• Exactly-once semantics for fault tolerance
• Windows & user-defined state
APIs, Libraries
• Flexible windows (time, count, session, roll-your-own)
• Complex Event Processing
