Techniques for Distributed TensorFlow on Hops
Jim Dowling
CEO, Logical Clocks AB
Assoc Prof, KTH Stockholm
Senior Researcher, RISE SICS
jim_dowling
Europe 2018
©2018 Logical Clocks AB. All Rights Reserved
AI Hierarchy of Needs
DDL (Distributed Deep Learning)
Deep Learning, RL, Automated ML
A/B Testing, Experimentation, ML
B.I. Analytics, Metrics, Aggregates, Features, Training/Test Data
Reliable Data Pipelines, ETL, Unstructured and Structured Data Storage, Real-Time Data Ingestion
Data Engineers
Data Scientists
Data Science-Engineers
Distributed Deep Learning is Important
How to improve the state of the art in deep learning?*
“We have three main lines of attack:
1. We can search for improved model architectures.
2. We can scale computation.
3. We can create larger training data sets.”
*https://blog.acolyer.org/2018/03/28/deep-learning-scaling-is-predictable-empirically/
More Data means Better Predictions
[Figure: Prediction Performance (y-axis) vs. Amount of Labelled Data (x-axis), with curves for Traditional ML and Deep Neural Nets; annotation: "Hand-crafted can outperform".]
How much Labelled Data do we Need?*
*https://arxiv.org/pdf/1712.00409.pdf
More Data and Prediction Improvement*
*https://blog.acolyer.org/2018/03/28/deep-learning-scaling-is-predictable-empirically/
[Figure: Generalization Error plot]
Better Model and Prediction Improvement*
*https://blog.acolyer.org/2018/03/28/deep-learning-scaling-is-predictable-empirically/
[Figure: Generalization Error plot]
Get Better Models with More Compute
“Methods that scale with computation are the future of AI”*
- Rich Sutton (A Founding Father of Reinforcement Learning)
*https://www.youtube.com/watch?v=EeMCEQa85tw
Parallel Experiments to Find Better Models
• Model Architecture Search*
- Explore on smaller datasets, then scale to larger datasets => enables more searches.
• SOTA on CIFAR10 (2.13% top-1 error)
• SOTA on ImageNet (3.8% top-5 error)
- 450 GPU / 7 days
- 900 TPU / 5 days
*https://arxiv.org/abs/1802.01548
Parallel Experiments
1/4
Time
The Outer Loop (hyperparameters):
“I have to run a hundred experiments to find the
best model,” he complained, as he showed me
his Jupyter notebooks. “That takes time. Every
experiment takes a lot of programming, because
there are so many different parameters.”
[Rants of a Data Scientist]
Hops
Need for a Distributed Filesystem
[Diagram: a Driver; Experiment 1 to Experiment N; TensorBoard; Distributed FS]
Distributed FS: training/test data, evaluation results, experiment configurations, etc.
More on Why we need a Distributed Filesystem
• Datasets are getting larger
• Model checkpointing
• Model-architecture search
• Hyperparameter search
• Hierarchical Filesystems (fast)
  - HDFS / HopsFS
  - Ceph, GlusterFS
• Object Stores (slow)
  - S3, GCS, WFS
PLUG for HopsFS*
*http://www.logicalclocks.com/fixing-the-small-files-problem-in-hdfs/
What about Distributed Training?
More Compute should mean Faster Training
[Figure: Training Performance (y-axis) vs. Available Compute (x-axis), with curves for Single-Host and Distributed training.]
Distributed Training
2/4
Weeks
Time
The Inner Loop (training):
“All these experiments took a lot of computation
— we used hundreds of GPUs/TPUs for days.
Much like a single modern computer can
outperform thousands of decades-old machines,
we hope that in the future these experiments will
become household.”
[Google SoTA ImageNet, Cifar-10, March18]
Mins
Hops
Reduce DNN Training Time
In 2017, Facebook reduced training time on ImageNet for a CNN from 2 weeks to 1 hour by scaling out to 256 GPUs using Ring-AllReduce on Caffe2.
https://arxiv.org/abs/1706.02677
Distributed Training: Theory and Practice
Image from @hardmaru on Twitter.
Asynchronous vs Synchronous SGD
• Synchronous Stochastic Gradient Descent (SGD) is now dominant
“Revisiting Synchronous SGD”, Chen et al., ICLR 2016
https://research.google.com/pubs/pub45187.html
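As a rough illustration of what "synchronous" means here (a toy sketch, not code from the talk): every worker computes a gradient on its own data shard, the gradients are averaged, and all workers apply the same update before the next step starts. The 1-D least-squares problem, the function names and the four-way split below are all assumptions made for the example.

```python
# Toy synchronous data-parallel SGD on a 1-D least-squares problem.
# All names are illustrative; nothing here is taken from the talk's code.

def gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def sync_sgd_step(w, shards, lr):
    """One synchronous step: all workers contribute before anyone updates."""
    grads = [gradient(w, shard) for shard in shards]  # runs in parallel in practice
    avg_grad = sum(grads) / len(grads)                # the aggregation barrier
    return w - lr * avg_grad


# Data generated from y = 3x, split evenly across 4 hypothetical workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = sync_sgd_step(w, shards, lr=0.01)
```

With equally sized shards the averaged gradient equals the full-batch gradient, so the run behaves like single-host SGD on the whole dataset. An asynchronous variant would let each worker apply its gradient without waiting for the others.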
Synchronous Distributed SGD Algorithms not all Equal
[Figure: Training Performance (y-axis) vs. Available Compute (x-axis), with curves for Parameter Servers and AllReduce.]
Ring-AllReduce vs Parameter Server
[Diagram: Ring-AllReduce, with GPU 0 to GPU 3 linked in a ring by send/recv pairs, next to a Parameter Server layout, with GPU 1 to GPU 4 connected to the Param Server(s).]
Network Bandwidth is the Bottleneck for Distributed Training
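To make the ring topology concrete, here is a minimal single-process simulation of a sum ring-allreduce, written for this transcript rather than taken from Horovod or the talk. It assumes each worker's vector length is divisible by the number of workers. The `traffic_per_step` helper uses a simplified cost model (one parameter server, no compression) that is also an assumption of this sketch.

```python
def ring_allreduce(vectors):
    """Simulate a sum ring-allreduce over n workers.

    vectors: n equal-length lists (length divisible by n), one per worker.
    Returns the n final buffers; each equals the element-wise sum.
    """
    n = len(vectors)
    size = len(vectors[0]) // n            # elements per chunk
    bufs = [list(v) for v in vectors]      # private buffer per worker

    def exchange(chunk_of, combine):
        # Every worker i sends one chunk to its right neighbour (i + 1) % n.
        # Payloads are snapshotted first so all sends happen "simultaneously".
        sends = []
        for i in range(n):
            c = chunk_of(i)
            sends.append((c, bufs[i][c * size:(c + 1) * size]))
        for i, (c, payload) in enumerate(sends):
            dst = bufs[(i + 1) % n]
            for offset, value in enumerate(payload):
                k = c * size + offset
                dst[k] = combine(dst[k], value)

    # Phase 1 (scatter-reduce): after n - 1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        exchange(lambda i: (i - step) % n, lambda old, new: old + new)

    # Phase 2 (all-gather): completed chunks travel once around the ring.
    for step in range(n - 1):
        exchange(lambda i: (i + 1 - step) % n, lambda old, new: new)

    return bufs


def traffic_per_step(n_workers, model_size):
    """Data sent per step: (by each ring worker, by a single parameter server)."""
    ring_per_worker = 2 * (n_workers - 1) * model_size / n_workers
    single_param_server = n_workers * model_size   # parameters pushed to n workers
    return ring_per_worker, single_param_server


grads = [[1, 2, 3, 4], [10, 20, 30, 40],
         [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
reduced = ring_allreduce(grads)   # every worker ends with [1111, 2222, 3333, 4444]
```

Under this cost model each ring worker sends a little under twice the model size however many workers take part, while the single parameter server's outgoing traffic grows linearly with the worker count. That is one way to read the bandwidth-bottleneck point on the slide.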
AllReduce outperforms Parameter Servers
16 servers with 4 P100 GPUs each (64 GPUs in total), connected by a RoCE-capable 25 Gbit/s network (synthetic data). Speed is measured in images processed per second.*
For Bigger Models, Parameter Servers don’t scale
*https://github.com/uber/horovod
ML in Production: Machine Learning Pipelines
A Machine Learning Pipeline with TensorFlow
[Diagram: Data Collection; Data Transformation & Verification; Feature Extraction; Experimentation; Training; Test; Serving. Tools: Spark, TensorFlow, TfServing. Platform: Distributed FS, Message Queue, Resource Manager with GPU support, Kubernetes. Teams: Data Engineering, Data Science, Ops.]
Hops Small Data ML Pipeline
[Diagram: Data Collection; Data Transformation & Verification; Feature Extraction; Experimentation; Training; Test; Serving. Tools: TensorFlow, TfServing. Platform: Hops (Kafka/HopsFS/Spark/TensorFlow/Kubernetes). Project Teams (Data Engineers/Scientists).]
Hops Big Data ML Pipeline
[Diagram: Data Collection; Data Transformation & Verification; Feature Extraction; Experimentation; Training; Test; Serving. Tools: PySpark, TensorFlow, TfServing. Platform: Hops (Kafka/HopsFS/Spark/TensorFlow/Kubernetes). Project Teams (Data Engineers/Scientists).]
Why not Kubeflow?
• Operational Reasons
  - No Integrated Enterprise Security Framework
    • Encryption-in-Transit, Encryption-at-Rest
  - Stateful services not designed for Kubernetes
    • Distributed Storage, Kafka, Databases
• Usability Reasons
  - Not a Fully Managed Platform
    • Write YAML files and restart just to install a new Python library
  - Slow startup times for applications/notebooks
Machine Learning Pipelines in Code
Small Data Preparation with tf.data API
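import tensorflow as tf

# IMAGES_DIR (a file pattern for the TFRecord files) and parser_fn (decodes
# one record) are assumed to be defined elsewhere.
def input_fn(batch_size):
    files = tf.data.Dataset.list_files(IMAGES_DIR)

    def tfrecord_dataset(filename):
        return tf.data.TFRecordDataset(filename,
            num_parallel_reads=32, buffer_size=8*1024*1024)

    # In TF 1.x these two transformations live in tf.contrib.data; later
    # releases expose them under tf.data.experimental.
    dataset = files.apply(tf.contrib.data.parallel_interleave(
        tfrecord_dataset, cycle_length=32, sloppy=True))
    dataset = dataset.apply(tf.contrib.data.map_and_batch(parser_fn, batch_size,
        num_parallel_batches=4))
    dataset = dataset.prefetch(4)
    return dataset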
Pipeline stages: Data Acquisition, Clean/Transform Data, Feature Extraction, Experimentation, Training, Test + Serve
Big Data Preparation with PySpark
from mmlspark import ImageTransformer

# spark (a SparkSession) and IMAGE_PATH are assumed to be defined elsewhere.
images = spark.readImages(IMAGE_PATH, recursive = True,
                          sampleRatio = 0.1).cache()
tr = (ImageTransformer().setOutputCol("transformed")
      .resize(height = 200, width = 200)
      .crop(0, 0, height = 180, width = 180) )
smallImages = tr.transform(images).select("transformed")
Hyperparam Opt. with Tf/Spark on Hops
def model_fn(learning_rate, dropout):
    import tensorflow as tf
    from hops import tensorboard, hdfs, devices
    # [TensorFlow code here]

from hops import experiment
args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}
experiment.launch(spark, model_fn, args_dict)
Launch TF jobs in Spark Executors
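The slide does not show how the values in args_dict are turned into individual runs. The sketch below assumes a full Cartesian grid, which is an assumption of this example and not a statement about the hops library. `expand_grid` is a hypothetical helper, not part of the Hops API.

```python
from itertools import product


def expand_grid(args_dict):
    """Return one {name: value} dict per combination of hyperparameter values."""
    names = sorted(args_dict)
    combos = product(*(args_dict[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}
trials = expand_grid(args_dict)

# 3 learning rates x 2 dropout values = 6 independent trials; each one could
# be handed to a separate Spark executor as model_fn(**trial).
for trial in trials:
    print(trial)
```

Because the trials do not depend on each other, they can run in parallel with no communication beyond writing their results to the shared filesystem.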
HyperParam Opt. Visualization on TensorBoard
[Figure: Hyperparam Opt Results Visualization]
Distributed Training with Horovod on Hops
import horovod.tensorflow as hvd

# "….." marks code that is elided on the slide.
def conv_model(feature, target, mode):
    …..
    hvd.init()
    opt = hvd.DistributedOptimizer(opt)
    if hvd.local_rank()==0:
        hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
        …..
    else:
        hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
        …..

from hops import allreduce
allreduce.launch(spark, 'hdfs:///Projects/…/all_reduce.ipynb')
“Pure” TensorFlow code
Hops API
• Python (also Java/Scala)
  - Manage TensorBoard, load/save models in HDFS
  - Horovod, TensorFlowOnSpark
  - Parallel experiments
    • Gridsearch
    • Model Architecture Search with Genetic Algorithms
  - Secure Streaming Analytics with Kafka/Spark/Flink
    • SSL/TLS certs, Avro Schema, Endpoints for Kafka/Zookeeper/etc
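The deck lists genetic algorithms as a search strategy but does not show one. The toy sketch below illustrates the general idea (keep the fittest configurations, mutate them to produce the next generation). It is not the Hops implementation: the search space, the `fitness` stand-in (which in practice would be a trained model's validation score) and every function name are assumptions of this example.

```python
import random

# Hypothetical search space for illustration only.
SEARCH_SPACE = {
    'learning_rate': [0.001, 0.005, 0.01, 0.05],
    'dropout': [0.3, 0.4, 0.5, 0.6],
}


def fitness(cfg):
    """Stand-in for a validation score; higher is better, 0 is the best."""
    return -abs(cfg['learning_rate'] - 0.005) * 100 - abs(cfg['dropout'] - 0.5)


def mutate(cfg, rng):
    """Copy a configuration and resample one randomly chosen hyperparameter."""
    child = dict(cfg)
    key = rng.choice(sorted(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def evolve(generations=20, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [{k: rng.choice(v) for k, v in sorted(SEARCH_SPACE.items())}
                  for _ in range(population_size)]
    for _ in range(generations):
        # Each generation's evaluations are independent and could run in parallel.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]      # survivors (elitism)
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
```

Because the best half of each generation always survives, the best score found never decreases from one generation to the next.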
TensorFlow Model Serving
Hops Data Platform
Hops: Next Generation Hadoop*
Faster: 16x Throughput
Bigger: 37x Number of files
Scale Challenge Winner (2017)
GPUs in YARN
*https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
Project Model for Sensitive Data/GDPR
[Diagram labels: Engineering; Kafka Topic; Project-X; Project-42; Shared DB; Topic; Project-All; CompanyDB; FX Project; FX Topic; FX DB; FX Data Stream; Shared Analytics; FX team]
Ismail et al., Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata, ICDCS 2017
Hopsworks Data Platform
A Project is a Grouping of Users and Data
[Diagram labels: Projects sandbox Private Data; Proj-42; Proj-X; Proj-All; CompanyDB; Shared Topic; Topic; /Projs/My/Data]
Ismail et al., Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata, ICDCS 2017
How are Projects used?
[Diagram labels: Engineering; Kafka Topic; FX Project; FX Topic; FX DB; FX Data Stream; Shared Interactive Analytics; FX team]
Python in the Cluster: Per-Project Conda Envs
Python libraries are usable by Spark/TensorFlow
TensorFlow in Hopsworks
[Diagram labels: HopsFS; YARN; FeatureStore; TensorFlow Serving; TensorBoard; Experiments; Kafka; Hive; Public Cloud or On-Premise]
Summary
• The future of Deep Learning is Distributed
https://www.oreilly.com/ideas/distributed-tensorflow
• Hops is a new Data Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs
“It is starting to look like deep learning workflows of the future feature autotuned architectures running with autotuned compute schedules across arbitrary backends.”*
Andrej Karpathy - Head of AI @ Tesla
*https://twitter.com/karpathy/status/972701240017633281
The Team
Active:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds, Filotas Siskos, Mahmoud Hamed.
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson, Roberto Bampi, Roshan Sedar.
www.hops.io
@hopshadoop
Spark Scikit-learn integration
from sklearn import svm, datasets
from spark_sklearn import GridSearchCV

# sc is assumed to be an existing SparkContext.
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svr = svm.SVC()
clf = GridSearchCV(sc, svr, parameters)
clf.fit(iris.data, iris.target)

Distributed TensorFlow on Hops (Papis London, April 2018)

  • 1. Techniques for Distributed TensorFlow on Hops Jim Dowling CEO, Logical Clocks AB Assoc Prof, KTH Stockholm Senior Researcher, RISE SICS jim_dowling Europe 2018
  • 2. ©2018 Logical Clocks AB. All Rights Reserved AI Hierarchy of Needs 2 DDL (Distributed Deep Learning) Deep Learning, RL, Automated ML A/B Testing, Experimentation, ML B.I. Analytics, Metrics, Aggregates, Features, Training/Test Data Reliable Data Pipelines, ETL, Unstructured and Structured Data Storage, Real-Time Data Ingestion Data Engineers Data Scientists Data Science- Engineers
  • 3. ©2018 Logical Clocks AB. All Rights Reserved Distributed Deep Learning is Important 3 “We have three main lines of attack: 1.We can search for improved model architectures. 2.We can scale computation. 3.We can create larger training data sets.” *https://blog.acolyer.org/2018/03/28/deep-learning-scaling-is-predictable-empirically/ How to improve the state of the art in deep learning? *
  • 4. ©2018 Logical Clocks AB. All Rights Reserved More Data means Better Predictions Prediction Performance Traditional ML Deep Neural Nets Amount Labelled Data Hand-crafted can outperform 4/45
  • 5. ©2018 Logical Clocks AB. All Rights Reserved How much Labelled Data do we Need?* 5 *https://arxiv.org/pdf/1712.00409.pdf
  • 6. ©2018 Logical Clocks AB. All Rights Reserved More Data and Prediction Improvement* 6 *https://blog.acolyer.org/2018/03/28/deep-learning-scaling-is-predictable-empirically/ Generalization Error
  • 7. ©2018 Logical Clocks AB. All Rights Reserved Better Model and Prediction Improvement* 7 *https://blog.acolyer.org/2018/03/28/deep-learning-scaling-is-predictable-empirically/ Generalization Error
  • 8. ©2018 Logical Clocks AB. All Rights Reserved Get Better Models with More Compute “Methods that scale with computation are the future of AI”* - Rich Sutton (A Founding Father of Reinforcement Learning) * https://www.youtube.com/watch?v=EeMCEQa85tw 8/45
  • 9. • Model Architecture Search* - Explore on smaller datasets, then scale to larger datasets => enables more searches. • SOTA on CIFAR10 (2.13% top 1) SOTA on ImageNet (3.8% top 5) - 450 GPU / 7 days - 900 TPU / 5 days Parallel Experiments to Find Better Models *https://arxiv.org/abs/1802.01548 9/45
  • 10. ©2018 Logical Clocks AB. All Rights Reserved Parallel Experiments 1/4 Time The Outer Loop (hyperparameters): “I have to run a hundred experiments to find the best model,” he complained, as he showed me his Jupyter notebooks. “That takes time. Every experiment takes a lot of programming, because there are so many different parameters. [Rants of a Data Scientist]Hops
  • 11. ©2018 Logical Clocks AB. All Rights Reserved Need for a Distributed Filesystem 11 Experiment 1 Experiment N Driver Distributed FSTensorBoard Training/test data, evaluation results, experiment configurations, etc
  • 12. •Datasets are getting larger •Model checkpointing •Model-architecture search •Hyperparameter search •Hierarchical Filesystems (fast) - HDFS / HopsFS - Ceph, GlusterFS •Object Stores (slow) - S3, GCS, WFS More on Why we need a Distributed Filesystem 12/45 *http://www.logicalclocks.com/fixing-the-small-files-problem-in-hdfs/ PLUG for HopsFS
  • 13. What about Distributed Training? 13
  • 14. ©2018 Logical Clocks AB. All Rights Reserved More Compute should mean Faster Training Training Performance Single-Host Distributed Available Compute 14/45
  • 15. ©2018 Logical Clocks AB. All Rights Reserved Distributed Training 2/4 Weeks Time The Inner Loop (training): “ All these experiments took a lot of computation — we used hundreds of GPUs/TPUs for days. Much like a single modern computer can outperform thousands of decades-old machines, we hope that in the future these experiments will become household.” [Google SoTA ImageNet, Cifar-10, March18] Mins Hops
  • 16. ©2018 Logical Clocks AB. All Rights Reserved Reduce DNN Training Time In 2017, Facebook reduced training time on ImageNet for a CNN from 2 weeks to 1 hour by scaling out to 256 GPUs using Ring-AllReduce on Caffe2. https://arxiv.org/abs/1706.02677 16/45
  • 17. ©2018 Logical Clocks AB. All Rights Reserved Distributed Training: Theory and Practice 17 17/45 Image from @hardmaru on Twitter.
  • 18. Asynchronous vs Synchronous SGD •Synchronous Stochastic Gradient Descent (SGD) now dominant “Revisiting Synchronous SGD”, Chen et al, ICLR 2016 https://research.google.com/pubs/pub45187.html 18
  • 19. Synchronous Distributed SGD Algorithms not all Equal Training Performance Parameter Servers AllReduce Available Compute 19/45
  • 20. ©2018 Logical Clocks AB. All Rights Reserved Ring-AllReduce vs Parameter Server GPU 0 GPU 1 GPU 2 GPU 3 send send send send recv recv recv recv GPU 1 GPU 2 GPU 3 GPU 4 Param Server(s) Network Bandwidth is the Bottleneck for Distributed Training 20/45
  • 21. ©2018 Logical Clocks AB. All Rights Reserved AllReduce outperforms Parameter Servers 21/45 *https://github.com/uber/horovod 16 servers with 4 P100 GPUs (64 GPUs) each connected by ROCE-capable 25 Gbit/s network (synthetic data). Speed below is images processed per second.* For Bigger Models, Parameter Servers don’t scale
  • 22. ML in Production: Machine Learning Pipelines
  • 23. ©2018 Logical Clocks AB. All Rights Reserved A Machine Learning Pipeline with TensorFlow 23/45 Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification TfServingTensorFlowSpark Distributed FS Message Queue Resource Manager with GPU support Test Kubernetes Data Engineering Data Science Ops
  • 24. ©2018 Logical Clocks AB. All Rights Reserved Hops Small Data ML Pipeline 24/45 Hops (Kafka/HopsFS/Spark/TensorFlow/Kubernetes) Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test Project Teams (Data Engineers/Scientists) TfServingTensorFlow
  • 25. ©2018 Logical Clocks AB. All Rights Reserved PySpark Hops Big Data ML Pipeline 25/45 Hops (Kafka/HopsFS/Spark/TensorFlow/Kubernetes) Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test Project Teams (Data Engineers/Scientists) TfServingTensorFlow
  • 26. Why not Kubeflow? •Operational Reasons -No Integrated Enterprise Security Framework • Encryption-in-Transit, Encryption-at-Rest -Stateful services not designed for Kubernetes • Distributed Storage, Kafka, Databases •Usability Reasons -Not a Fully Managed Platform • Write YML files and restart just to install a new Python library -Slow startup times for applications/notebooks 26/45
  • 28. ©2018 Logical Clocks AB. All Rights Reserved 28/45 Small Data Preparation with tf.data API def input_fn(batch_size): files = tf.data.Dataset.list_files(IMAGES_DIR) def tfrecord_dataset(filename): return tf.data.TFRecordDataset(filename, num_parallel_reads=32, buffer_size=8*1024*1024) dataset = files.apply(tf.data.parallel_interleave (tfrecord_dataset, cycle_length=32, sloppy=True) dataset = dataset.apply(tf.data.map_and_batch(parser_fn, batch_size, num_parallel_batches=4)) dataset = dataset.prefetch(4) return dataset Feature Extraction Experimentation Training Test + Serve Data Acquisition Clean/Transform Data
  • 29. ©2018 Logical Clocks AB. All Rights Reserved Big Data Preparation with PySpark from mmlspark import ImageTransformer images = spark.readImages(IMAGE_PATH, recursive = True, sampleRatio = 0.1).cache() tr = (ImageTransformer().setOutputCol(“transformed”) .resize(height = 200, width = 200) .crop(0, 0, height = 180, width = 180) ) smallImages = tr.transform(images).select(“transformed”) 29/45 Feature Extraction Experimentation Training Test + Serve Data Acquisition Clean/Transform Data
  • 30. ©2018 Logical Clocks AB. All Rights Reserved Hyperparam Opt. with Tf/Spark on Hops def model_fn(learning_rate, dropout): import tensorflow as tf from hops import tensorboard, hdfs, devices [TensorFlow Code here] from hops import experiment args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]} experiment.launch(spark, model_fn, args_dict) Launch TF jobs in Spark Executors 30/45 Feature Extraction Experimentation Training Test + Serve Data Acquisition Clean/Transform Data
  • 31. ©2018 Logical Clocks AB. All Rights Reserved HyperParam Opt. Visualization on TensorBoard 31/45 Hyperparam Opt Results Visualization
  • 32. ©2018 Logical Clocks AB. All Rights Reserved Distributed Training with Horovod on Hops def conv_model(feature, target, mode) ….. hvd.init() opt = hvd.DistributedOptimizer(opt) if hvd.local_rank()==0: hooks = [hvd.BroadcastGlobalVariablesHook(0), ..] ….. else: hooks = [hvd.BroadcastGlobalVariablesHook(0), ..] ….. from hops import allreduce allreduce.launch(spark, 'hdfs:///Projects/…/all_reduce.ipynb') “Pure” TensorFlow code 32/45 Feature Extraction Experimentation Training Test + Serve Data Acquisition Clean/Transform Data
  • 33. Hops API •Python (also Java/Scala) -Manage tensorboard, Load/save models in HDFS -Horovod, TensorFlowOnSpark -Parallel experiments • Gridsearch • Model Architecture Search with Genetic Algorithms -Secure Streaming Analytics with Kafka/Spark/Flink • SSL/TLS certs, Avro Schema, Endpoints for Kafka/Zookeeper/etc 33/45 Feature Extraction Experimentation Training Test + Serve Data Acquisition Clean/Transform Data
  • 34. TensorFlow Model Serving
  • 36. Hops: Next Generation Hadoop*
      • Faster: 16x Throughput
      • Bigger: 37x Number of files
      • Scale Challenge Winner (2017)
      • GPUs in YARN
      *https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
  • 37. Project Model for Sensitive Data/GDPR
      [Diagram labels: Engineering, Kafka Topic, Project-X, Project-42, Shared DB, Topic, Project-All, CompanyDB, FX Project, FX Topic, FX DB, FX Data Stream, Shared Analytics, FX team]
      Ismail et al, Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata, ICDCS 2017
  • 38. Hopsworks Data Platform
  • 39. A Project is a Grouping of Users and Data
      [Diagram labels: Projects, sandbox, Private Data, Proj-42, Proj-X, Proj-All, Shared Topic, Topic, /Projs/My/Data, CompanyDB]
      Ismail et al, Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata, ICDCS 2017
  • 40. How are Projects used?
      [Diagram labels: Engineering, Kafka Topic, FX Project, FX Topic, FX DB, FX Data Stream, Shared Interactive Analytics, FX team]
  • 41. Python in the Cluster: Per-Project Conda Envs
      Python libraries are usable by Spark/TensorFlow.
  • 42. TensorFlow in Hopsworks
      [Diagram labels: Experiments, TensorBoard, TensorFlow Serving, FeatureStore, Kafka, Hive, HopsFS, YARN, Public Cloud or On-Premise]
  • 44. Summary
      • The future of Deep Learning is Distributed https://www.oreilly.com/ideas/distributed-tensorflow
      • Hops is a new Data Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs
      “It is starting to look like deep learning workflows of the future feature autotuned architectures running with autotuned compute schedules across arbitrary backends.” Andrej Karpathy - Head of AI @ Tesla*
      *https://twitter.com/karpathy/status/972701240017633281
  • 45. The Team
      Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds, Filotas Siskos, Mahmoud Hamed.
      Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson, Roberto Bampi, Roshan Sedar.
      www.hops.io @hopshadoop
  • 46. Spark Scikit-learn integration
      from sklearn import svm, datasets
      from spark_sklearn import GridSearchCV
      iris = datasets.load_iris()
      parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
      svr = svm.SVC()
      # sc is the active SparkContext; the grid of fits is distributed over it
      clf = GridSearchCV(sc, svr, parameters)
      clf.fit(iris.data, iris.target)