End-to-End ML pipelines with Beam,
Flink, TensorFlow, and Hopsworks
Theofilos Kakantousis
Software Engineer & COO
@theofiloskak
3rd Apache Beam meetup, Stockholm, July 2019
Agenda
1. End-to-end ML pipelines
2. What is Hopsworks
3. Beam Portable Runner with Flink in Hopsworks
4. ML Pipelines with Beam and TensorFlow Extended
5. Demo
ML Pipelines
End-to-end ML Pipeline
[Diagram: pipeline stages Data Ingest, Data Prep, Train, Serve, Online Monitor; running on Distributed Storage (Raw Data, Data Lake) and a Resource Manager]
Typical Feature Store pipeline
Hopsworks Timeline
“If you’re working with big data and Hadoop, this one paper could repay your investment in the Morning Paper many times over.... HopsFS is a huge win.”
- Adrian Colyer, The Morning Paper
Milestones (timeline axis: 2017, 2018, 2019):
● World’s first Hadoop platform to support GPUs-as-a-Resource
● World’s fastest Hadoop, published at USENIX FAST with Oracle and Spotify
● World’s first open source Feature Store for Machine Learning
● World’s first distributed filesystem to store small files in metadata on NVMe disks
● Winner of IEEE Scale Challenge 2017 with HopsFS: 1.2M ops/sec
● World’s most scalable filesystem with multi data center availability
● World’s first open source platform to support TensorFlow Extended (TFX) on Beam
What is Hopsworks
True Project-based multi-tenancy
[Diagram labels: Proj-X, Project-42, Project-All, Kafka Topic, Resources, /Projs/My/Data, Experiments, Models]
Hopsworks REST API
● Manage Hopsworks resources via the REST API
○ Projects
○ Datasets
○ Jobs
○ FeatureStore
○ Experiments
○ ModelServing
○ Kafka
○ ...
● Documented with Swagger and hosted on SwaggerHub
○ https://app.swaggerhub.com/apis-docs/logicalclocks/hopsworks-api/0.10.0
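As a rough illustration of driving these resources over REST, the sketch below only builds an HTTP request and does not send it. The base URL, the "project" resource name, and the "ApiKey" authorization scheme are hypothetical placeholders, not taken from the slides; the real paths and authentication are in the Swagger documentation linked above.

```python
import json
import urllib.request

# Hypothetical values for illustration only; check the Swagger docs for real ones.
BASE_URL = "https://hopsworks.example.com/hopsworks-api/api"  # assumed base path
API_KEY = "REPLACE_ME"  # assumed auth scheme and credential


def build_request(resource, method="GET", payload=None):
    """Build (but do not send) an HTTP request for a REST resource."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url=f"{BASE_URL}/{resource.strip('/')}",
        data=data,
        method=method,
    )
    req.add_header("Authorization", f"ApiKey {API_KEY}")
    req.add_header("Content-Type", "application/json")
    return req


# "project" is an illustrative resource name, not a confirmed endpoint.
req = build_request("project")
# To actually send it: urllib.request.urlopen(req)
```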
Beam on Hopsworks
Beam Portable Runner
[Diagram: SDKs (Beam Java, Beam Python, Other Languages); Beam Model: Pipeline Construction; Beam Model: Fn Runners; runners (Apache Flink, Apache Spark, Cloud Dataflow); Execution]
1. End users: who want to write pipelines in a language that’s familiar.
2. SDK writers: who want to make Beam concepts available in new languages.
3. Runner writers: who have a distributed processing environment and want to support Beam pipelines.
https://s.apache.org/apache-beam-project-overview
Beam-as-a-Service in Hopsworks
● Develop Beam pipelines in Python from Jupyter notebooks
● Tooling to simplify deployment and execution
● Manage the lifecycle of the Beam Portability JobService (JobServer)
● Logging and monitoring of Beam jobs
● SDK Workers (harness) with conda env
● Scalable execution on Flink/Spark clusters
Hopsworks API
● hops-util-py (Python) and HopsUtil (Java)
● Simplifies development:
○ Sets security config
○ Discovers cluster services
○ Helper methods for the Hopsworks REST API
○ ML Experiments
● Manage Beam Runners and Job Service
https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
Beam Portability - Process vs Docker
● Docker:
○ Build image with all your dependencies
○ Update or modify? Build new containers
○ Additional infrastructure components
● Process:
○ Install dependencies on all servers
○ Management of dependencies?
○ Easy to update and modify libraries
○ Challenge? Multi-tenancy & keeping servers in sync
● SDK Worker: SDK-provided program responsible for executing user code
● How to manage the user’s dependencies, libraries, … ?
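The Docker and Process options correspond to Beam's portability flags --environment_type and --environment_config. The sketch below builds those flags for either case; the image name and the boot executable path in the usage lines are hypothetical placeholders, and the helper function itself is illustrative rather than part of Beam or Hopsworks.

```python
import json


def environment_flags(environment_type, config):
    """Return Beam portability flags selecting how SDK workers are launched.

    environment_type: "DOCKER" or "PROCESS".
    config: for DOCKER, the container image name; for PROCESS, the path
            to the SDK worker boot executable.
    """
    if environment_type == "DOCKER":
        environment_config = config
    elif environment_type == "PROCESS":
        # PROCESS expects a JSON object whose "command" is the boot executable.
        environment_config = json.dumps({"command": config})
    else:
        raise ValueError(f"unsupported environment type: {environment_type}")
    return [
        f"--environment_type={environment_type}",
        f"--environment_config={environment_config}",
    ]


# Hypothetical image name and boot path, for illustration only.
docker_flags = environment_flags("DOCKER", "registry.example.com/beam-python-sdk")
process_flags = environment_flags("PROCESS", "/path/to/conda/env/bin/boot")
```

With Process mode the dependencies live in an environment on the servers (conda, in the Hopsworks case), so changing a library does not require building a new image.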
First class Python: Conda in the Cluster
[Diagram: Conda Repo, Hopsworks Cluster]
No need to write Dockerfiles
Jupyter dashboard in Hopsworks
● Manage notebook settings from dashboard
Jupyter dashboard in Hopsworks
● Execute a Beam Python pipeline
● With the Python kernel either in a Docker container managed by Kubernetes or as a local Python process.
● In a PySpark executor in the cluster.
Notebooks as Beam jobs in ML pipelines
Beam portability architecture in Hopsworks
https://www.slideshare.net/ThomasWeise/python-streaming-pipelines-on-flink-beam-meetup-at-lyft-2019
[Diagram labels: hops-util.py, Hopsworks, HopsFS, Local/YARN/K8s, Session cluster on YARN; note: "Compiled and shipped with HopsFS dependencies"]
# Creates and starts the runner
# Localizes the Job Service jar file from HopsFS
# Provides arguments (ports, artifacts_dir, etc.)
# Starts the Job Service and returns host, port
# Job Service automatically shuts down when the Python pipeline shuts down
host, port = start_runner()
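The host and port returned here are what a Beam Python pipeline needs in order to submit to the Job Service. A minimal sketch of turning them into pipeline flags follows; the helper name and the stand-in host and port values are illustrative, while --runner=PortableRunner and --job_endpoint are standard Beam options.

```python
def portable_runner_args(host, port):
    """Flags that point a Beam Python pipeline at a running Job Service."""
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={host}:{port}",
    ]


# Stand-in values; on Hopsworks these would come from start_runner().
host, port = "localhost", 8099
args = portable_runner_args(host, port)

# With Apache Beam installed, the flags can be passed on as:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(args)
```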
The Python conda env and Hopsworks env variables are set for the SDK Worker
Hopsworks API
https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
def start_runner(
    runner="flink",
    runner_name="session",
    runner_config=config)
def start_jobservice(
    runner="Resources",
    artifacts_dir="Resources",
    job_server_path="hdfs:///user/flink/",
    job_server_jar="beam-runners-flink-1.8-job-server-2.13.0.jar",
    sdk_worker_parallelism=1)
hops.beam.start_runner()
hops.beam.start_jobservice()
Logging
● Flink JobManager and TaskManager
● Beam Job service
○ Local mode - logs in project’s Jupyter staging dir
○ Cluster - logs in the PySpark container where the process is running.
● SDK Worker
○ Logs are in the Flink TaskManager container
● Collect and visualize with the ELK stack
○ Logs are accessible only by project members
Secure Beam with TLS certificates
TensorFlow Extended (TFX)
Hidden Technical Debt in Machine Learning Systems
[Diagram labels: Data Collection, Data Validation, Feature Engineering, Hardware Management, Distributed Training, HyperParameter Tuning, Model Serving, A/B Testing, Monitoring, Pipeline Management; Data, Model, Prediction, φ(x)]
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
TensorFlow Extended (TFX)
https://www.tensorflow.org/tfx
TFX on a Flink Cluster with Portable Runner
Distributed Deep Learning in Hopsworks
[Diagram: Driver; Executor 1 ... Executor N; HopsFS (HDFS); TensorBoard; Model Serving]
Experiments - TensorBoard
● Repeatable experiments
● Manage experiment metadata
● Integration with TensorBoard
Orchestration
Apache Airflow-as-a-Service
● Airflow available as a multi-tenant service in Hopsworks
● Develop pipelines with Hopsworks operators and sensors
Apache Airflow-as-a-Service - TFX pipeline
Putting it all together
Horizontally Scalable ML Pipelines
[Diagram labels: Raw Data, Event Data, Ingest, Data Prep, Experiment / Train, Deploy, Serving, Monitor, logs, HopsFS, Feature Store / TFX Transform, FeatureStore, Metadata Store, Model Analysis, External]
Compatibility...
● Hopsworks-1.0
● Beam 2.13.0
● Flink 1.8.0
● TensorFlow 1.14.0
● TFX 0.13
● TensorFlow Model Analysis 0.13.2
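For the Python-side components, the versions above can be captured as pip-style pins. This is only a sketch: the PyPI package names are the commonly used ones, not taken from the slides; TFX is pinned as 0.13.* because the slide gives only major.minor; and Hopsworks and Flink are omitted since they are not installed this way.

```python
# Versions as listed on the slide; package names are an assumption.
PINNED = {
    "apache-beam": "2.13.0",
    "tensorflow": "1.14.0",
    "tfx": "0.13.*",  # slide gives only "0.13"
    "tensorflow-model-analysis": "0.13.2",
}


def requirement_lines(pins):
    """Render a name-to-version mapping as sorted requirements.txt lines."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


for line in requirement_lines(PINNED):
    print(line)
```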
Demo
Conclusions & Future Work
● Summary
○ Hopsworks v1.0 is the first on-prem open source horizontally scalable platform to support the Beam Portable Runner with the Flink runner
○ Develop and manage the lifecycle of horizontally scalable End-to-End ML Pipelines with Beam and TFX
● Future Work
○ Add support for Spark Runner
○ Export metrics for Flink runner to InfluxDB and visualize with Grafana
Contributors
Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias
Gebremeskel, Fabio Buso, Antonios Kouzoupis, Kim Hammar, Steffen Grohsschmiedt, Alex Ormenisan,
Robin Andersson, Moritz Meister, Kajetan Maliszewski, Netsanet Gebretsadkan Kidane, Sina Sheikholeslami,
Joel Stenkvist, August Bonds, Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer,
Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre
Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Qi Qi, ...
How to get started with Hopsworks?
@hopsworks
Register for a free account at: www.hops.site
Images available for AWS, GCE, VirtualBox.
https://www.logicalclocks.com/
https://github.com/logicalclocks/hopsworks
https://www.meetup.com/HopsML-Stockholm
Reach us
@logicalclocks
