Multi-Tenant Flink-as-a-Service on YARN
Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO @ Logical Clocks AB
Slides by Jim Dowling, Theofilos Kakantousis
Berlin, 13th September 2016
www.hops.io
@hopshadoop
A Polyglot
Polyglot Data Parallel Processing
•Stream Processing
- Beam/Flink, Spark
•ETL/Batch Processing
- Spark, MapReduce
•SQL-on-Hadoop
- Hive, Presto, SparkSQL
•Distributed ML
- SparkML, FlinkML
•Deep Learning
- Distributed TensorFlow
Flink Standalone good enough for some
•Enterprises are polyglot due to economies of scale
•Standalone Flink works great for enterprises
- Dedicate some servers
- Dedicate some SREs
Polyglot Data Parallel Processing In Context
Data Processing: Spark, MR, Flink, Presto, TensorFlow
Storage: HDFS, MapR, S3, WAS
Resource Management: YARN, Mesos, Kubernetes
Metadata: Hive, Parquet, Authorization, Search
Flink for the Little Guy
•Flink-as-a-Service on Hops Hadoop
- Fully UI Driven, Easy to Install
•Project-Based Multi-tenancy
Flink-as-a-Service running on hops.site
SICS ICE: A datacenter research and test environment
Purpose: Increase knowledge, strengthen universities, companies and researchers
HopsFS Architecture
[Architecture diagram. Labels: NameNodes, Leader, NDB, HDFS Client, DataNodes]
Hops-YARN Architecture
[Architecture diagram. Labels: ResourceMgrs, Scheduler, Resource Trackers, NDB, YARN Client, NodeManagers, Heartbeats (70-95%), AM Reqs (5-30%)]
HopsFS Throughput (Spotify Workload)
NDB Setup: 8 nodes with Xeon E5-2620 2.40GHz processors and 10GbE.
NameNodes: machines with Xeon E5-2620 2.40GHz processors and 10GbE.
HopsFS Metadata Scaleout
Assuming 256MB Block Size, 100 GB JVM Heap for Apache Hadoop
Hopsworks
Hopsworks – Project-Based Multi-Tenancy
•A project is a collection of
- Users with Roles
- HDFS DataSets
- Kafka Topics
- Notebooks, Jobs
•Per-Project quotas
- Storage in HDFS
- CPU in YARN
•Uber-style Pricing
•Sharing across Projects
- Datasets/Topics
[Diagram: a project containing dataset 1 ... dataset N in HDFS and Topic 1 ... Topic N in Kafka]
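The project abstraction on this slide can be pictured as a small data structure. The following is only a sketch under stated assumptions: the class, the role names, the `project/dataset` naming and the quota accounting are all hypothetical and are not taken from the Hopsworks code base.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a project: members with roles, datasets, a storage quota,
// and datasets shared in from other projects.
final class Project {
    // Role names are invented for illustration; the slides only say "Users with Roles".
    enum Role { OWNER, MEMBER }

    private final String name;
    private final Map<String, Role> members = new HashMap<>();
    private final Set<String> datasets = new HashSet<>();
    private final Set<String> sharedIn = new HashSet<>(); // entries look like "otherProject/dataset"
    private final long storageQuotaBytes;
    private long storageUsedBytes;

    Project(String name, long storageQuotaBytes) {
        this.name = name;
        this.storageQuotaBytes = storageQuotaBytes;
    }

    void addMember(String user, Role role) { members.put(user, role); }

    void addDataset(String dataset) { datasets.add(dataset); }

    // Share one of this project's datasets with another project.
    void shareDataset(String dataset, Project target) {
        if (!datasets.contains(dataset)) {
            throw new IllegalArgumentException("unknown dataset: " + dataset);
        }
        target.sharedIn.add(name + "/" + dataset);
    }

    // A user reaches a dataset only through membership of a project that owns it
    // or that has had it shared in.
    boolean canRead(String user, String qualifiedDataset) {
        if (!members.containsKey(user)) {
            return false;
        }
        String prefix = name + "/";
        boolean owned = qualifiedDataset.startsWith(prefix)
                && datasets.contains(qualifiedDataset.substring(prefix.length()));
        return owned || sharedIn.contains(qualifiedDataset);
    }

    // Charge storage against the per-project quota; refuse requests that would exceed it.
    boolean tryUseStorage(long bytes) {
        if (bytes < 0 || bytes > storageQuotaBytes - storageUsedBytes) {
            return false;
        }
        storageUsedBytes += bytes;
        return true;
    }

    public static void main(String[] args) {
        Project nsa = new Project("NSA", 100);
        Project users = new Project("Users", 50);
        nsa.addMember("Alice", Role.OWNER);
        users.addMember("Bob", Role.MEMBER);
        nsa.addDataset("logs");

        System.out.println(users.canRead("Bob", "NSA/logs")); // false: not shared yet
        nsa.shareDataset("logs", users);
        System.out.println(users.canRead("Bob", "NSA/logs")); // true: shared into Users
        System.out.println(nsa.tryUseStorage(80));            // true: within quota
        System.out.println(nsa.tryUseStorage(30));            // false: would exceed quota
    }
}
```

Keeping the quota and the sharing list on the project, rather than on the user, mirrors the slide's point that access and accounting are per project.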
Hopsworks – Dynamic Roles
[Diagram. Labels: Alice@gmail.com, Authenticate, Glassfish, Projects, NSA__Alice, Users__Alice, Secure Impersonation, HopsFS, HopsYARN, X.509 Certificates, Kafka]
Look Ma, No Kerberos!
•For each project, a user is issued an X.509 certificate containing the project-specific userID.
•Services are also issued X.509 certificates.
- Both user and service certs are signed by the same CA.
- Services extract the userID from RPCs to identify the caller.
•Netflix's BLESS system is a similar model, with short-lived certificates.
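To make the identification step concrete, here is a minimal sketch of how a service might recover the project and user from a certificate subject. It assumes, hypothetically, that the project-specific userID is stored in the subject's CN and follows the `project__user` pattern seen in names such as NSA__Alice; the slides do not say which certificate field is used. In a real service the DN string would come from `X509Certificate.getSubjectX500Principal().getName()`.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

// Hypothetical helper: splits a project-specific userID such as "NSA__Alice".
final class ProjectUser {
    final String project;
    final String user;

    private ProjectUser(String project, String user) {
        this.project = project;
        this.user = user;
    }

    // Parse "project__user"; the double underscore separator is inferred from the slides.
    static ProjectUser fromCommonName(String cn) {
        int sep = cn.indexOf("__");
        if (sep <= 0 || sep + 2 >= cn.length()) {
            throw new IllegalArgumentException("not a project-specific userID: " + cn);
        }
        return new ProjectUser(cn.substring(0, sep), cn.substring(sep + 2));
    }

    // Find the CN in an RFC 2253 subject DN and parse it (assumes the CN carries the userID).
    static ProjectUser fromSubjectDn(String dn) {
        try {
            for (Rdn rdn : new LdapName(dn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return fromCommonName(rdn.getValue().toString());
                }
            }
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("malformed subject DN: " + dn, e);
        }
        throw new IllegalArgumentException("no CN in subject DN: " + dn);
    }

    public static void main(String[] args) {
        ProjectUser pu = fromSubjectDn("CN=NSA__Alice,O=Example,C=SE");
        System.out.println(pu.project + " / " + pu.user); // prints "NSA / Alice"
    }
}
```

Because the identity is bound to the project, the same person presents a different certificate, and therefore a different userID, in each project.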
X.509 Certificate Per Project-Specific User
[Diagram. Labels: Alice@gmail.com, Authenticate, Add/Del Users, Distributed Database, Insert/Remove Certs, Project Mgr, Root CA, Cert Signing Requests, Services (Hadoop, Spark, Kafka, etc.)]
Flink on YARN
•Two modes: detached or blocking
•Hopsworks supports detached mode
- Client started locally, then exits after the job is submitted to YARN
- No accumulator results or exceptions from the ExecutionEnvironment.execute() call
- Can only kill the YARN job, not the Flink session, which leaves cleanup issues.
•New Architecture proposal for a Flink Dispatcher
A Flink/Kafka Job on YARN with Hopsworks
[Diagram. Labels: Alice@gmail.com, Hopsworks, Distributed Database, YARN Private LocalResources, Flink/Kafka Streaming App, KafkaUtil]
1. Launch Flink Job
2. Get certs, service endpoints
3. YARN Job + config
4. Materialize certs
5. Read Certs
6. Get Schema
7. Consume/Produce
Flink Stream Producer in Secure Kafka
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
String topic = parameterTool.get("topic");
// 1. Discover: Schema Registry and Kafka Broker Endpoints
// 2. Create: Kafka Properties file with certs and broker details
// 3. Create: producer using Kafka Properties
// 4. Distribute: X.509 certs to all hosts on the cluster
// 5. Download: the Schema for the Topic from the Schema Registry
// 6. Do this all securely
DataStream<…> messageStream = env.addSource(…);
messageStream.addSink(producer);
env.execute("Write to Kafka");
Developer
Operations
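Step 2 on this slide, creating the Kafka properties with certs and broker details, might look roughly like the sketch below. The property keys are the standard Kafka client SSL settings; the class name, method, broker address, store paths and password are placeholders chosen for illustration, not values used by Hopsworks.

```java
import java.util.Properties;

// Hypothetical helper that assembles SSL client settings for a Kafka producer or consumer.
final class SecureKafkaProps {

    static Properties build(String brokers, String keyStorePath,
                            String trustStorePath, String storePassword) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        // Use TLS with client authentication via the user's X.509 certificate.
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.keystore.location", keyStorePath);
        props.setProperty("ssl.keystore.password", storePassword);
        props.setProperty("ssl.key.password", storePassword);
        // The truststore holds the CA certificate used to verify the brokers.
        props.setProperty("ssl.truststore.location", trustStorePath);
        props.setProperty("ssl.truststore.password", storePassword);
        return props;
    }

    public static void main(String[] args) {
        // All arguments are placeholders.
        Properties props = build("broker.example.com:9093",
                "/path/to/keystore.jks", "/path/to/truststore.jks", "changeit");
        System.out.println(props.getProperty("security.protocol")); // prints "SSL"
    }
}
```

A producer or consumer would additionally need serializer or deserializer settings, and the keystore and truststore files must already exist on every host that runs a task, which is what step 4 on the slide refers to.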
Flink/Kafka Stream Producer in Hopsworks
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
String topic = parameterTool.get("topic");
FlinkProducer producer = KafkaUtil.getFlinkProducer(topic);
DataStream<…> messageStream = env.addSource(…);
messageStream.addSink(producer);
env.execute("Write to Kafka");
https://github.com/hopshadoop/hops-kafka-examples
Flink/Kafka Stream Consumer in Hopsworks
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
String topic = parameterTool.get("topic");
FlinkConsumer consumer = KafkaUtil.getFlinkConsumer(topic);
DataStream<…> messageStream = env.addSource(consumer);
RollingSink<String> rollingSink = ... // HDFS path
messageStream.addSink(rollingSink);
env.execute("Read from Kafka, write to HDFS");
https://github.com/hopshadoop/hops-kafka-examples
Zeppelin Support for Flink
Karamel/Chef for Automated Installation
Google Compute Engine, BareMetal
Demo
Summary
•Hopsworks provides first-class support for Flink-as-a-Service
- Streaming or Batch Jobs
- Zeppelin Notebooks
•Hopsworks simplifies secure use of Kafka in Flink on YARN
•YARN support for Flink is still a work in progress
Hops Team
Active: Jim Dowling, Seif Haridi, Tor Björn Minde,
Gautier Berthou, Salman Niazi, Mahmoud Ismail,
Theofilos Kakantousis, Johan Svedlund Nordström,
Konstantin Popov, Antonios Kouzoupis,
Ermias Gebremeskel, Daniel Bekele.
Alumni: Vasileios Giannokostas, Misganu Dessalegn,
Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca,
K “Sri” Srijeyanthan, Steffen Grohsschmiedt,
Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems,
Stig Viaene, Hooman Peiro, Evangelos Savvidis,
Jude D’Souza, Qi Qi, Gayana Chandrasekara,
Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos,
Peter Buechler, Pushparaj Motamari, Hamid Afzali,
Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops
[Hadoop For Humans]
Join us!
http://github.com/hopshadoop




Editor's Notes

  • #6: Stream Processing: Flink. ETL workflow (batch) processing: Spark. SQL-on-Hadoop: Presto, SparkSQL, Hive. Deep Learning: Distributed TensorFlow.
  • #14: Privileges: upload/download data, run analysis jobs. Like an RBAC solution. All access is via Hopsworks.
  • #18:
    - Netty dependency conflict with our app in blocking mode
    - Impacts: application size; main class run on our multi-tenant application (System.exit(), logs are written locally)
    - No accumulator results or exceptions from the ExecutionEnvironment.execute() call
    - Can only kill YARN job, not Flink session: cleanup issues
    Flink Dispatcher:
    - The client directly starts the Job in YARN, rather than bootstrapping a cluster and after that submitting the job to that cluster. The client can hence disconnect immediately after the job was submitted.
    - All user code libraries and config files are directly in the Application Classpath, rather than in the dynamic user code class loader.
    - Containers are requested as needed and will be released when not used any more.
    - The "as needed" allocation of containers allows for different profiles of containers (CPU / memory) to be used for different operators.
  • #20:
    public class HopsKafkaUtil implements Serializable {
      KAFKA_BROKERADDR_ENV_VAR = "kafka.brokeraddress";
      KAFKA_RESTENDPOINT = "kafka.restendpoint";
      KAFKA_SESSIONID_ENV_VAR = "kafka.sessionid";
      KAFKA_PROJECTID_ENV_VAR = "kafka.projectid";
      KAFKA_K_CERTIFICATE_ENV_VAR = "kafka_k_certificate";
      KAFKA_T_CERTIFICATE_ENV_VAR = "kafka_t_certificate";
      String getHopsConsumer(String topic) {…}
      String getHopsProducer(String topic) {…}
      String getHopsFlinkKafkaConsumer(String topic) {…}
      String getHopsFlinkKafkaProducer(String topic) {…}
      String getSchema(String topicName, int versionId) {..}
      Map<String, String> getKafkaProps(String propsStr) {…}
    }
  • #21: HopsKafkaProperties.defaultProps())
  • #22: HopsKafkaProperties.defaultProps())
  • #23: https://gist.github.com/rawkintrevo/ad206879753733f5a536
  • #28: I need some sound-effects to go with that.