Wangda Tan (Hadoop PMC member @Hortonworks)
Sunil Govind (Hadoop PMC member @Hortonworks)
Deep learning on YARN: running TensorFlow, etc. on Hadoop clusters
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
• Machine Learning Basics
• Machine Learning in Production
• How YARN Helps
• Example: Running distributed TensorFlow on YARN
3
Machine learning basics
4
Basics: Machine Learning
• Cat classifier
(Diagram: labeled training data, cats and non-cats, is fed into training; the resulting model is saved and then used to predict on a new image, e.g. Cat (80%), Non-Cat (20%).)
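The Feed → Save → Predict flow of the cat classifier can be sketched with a tiny logistic-regression model in plain Python. This is a toy illustration, not the authors' setup: the two numeric features per image, the training data and the helper names (`train`, `save_model`, `load_model`, `predict_proba`) are all hypothetical.

```python
import json
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.5, epochs=2000):
    """Feed: fit logistic-regression weights and bias with batch gradient descent."""
    n_features = len(samples[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log-loss w.r.t. the logit
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        m = len(samples)
        w = [wi - lr * gi / m for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / m
    return {"weights": w, "bias": b}


def save_model(model, path):
    """Save: persist the learned parameters as JSON."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    with open(path) as f:
        return json.load(f)


def predict_proba(model, x):
    """Predict: return P(cat) for one feature vector."""
    z = sum(wi * xi for wi, xi in zip(model["weights"], x)) + model["bias"]
    return sigmoid(z)


if __name__ == "__main__":
    # Hypothetical features per image, e.g. (ear pointiness, whisker score); 1 = cat.
    X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
    y = [1, 1, 1, 0, 0, 0]
    save_model(train(X, y), "cat_model.json")
    p = predict_proba(load_model("cat_model.json"), [0.75, 0.7])
    print(f"Cat ({p:.0%}) Non-Cat ({1 - p:.0%})")
```

Real image classifiers learn from pixels with far richer models; the point here is only the separation between training, persisting the model, and serving predictions from the saved copy.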
5
Basics: Model Training
(Diagram: Model Training → Model Evaluation → Model Validation → Model Staging, then back to Model Training.)
• Traditional machine learning models
  – Logistic regression
  – Gradient boosting trees
  – Recommendation/ALS
  – LDA
• Libraries
  – Apache Spark MLlib
  – XGBoost
• Deep learning models
  – DNN
  – CNN
  – RNN
  – LSTM
• Libraries
  – TensorFlow
  – Apache MXNet
  – PyTorch
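The Training → Evaluation → Validation → Staging cycle can be sketched as a simple promotion gate. The metric (accuracy on a held-out set), the `min_accuracy` threshold and the rule "stage the candidate only if it beats the currently staged model" are illustrative assumptions, not a process prescribed by the slides.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], int]


@dataclass
class StagedModel:
    name: str
    accuracy: float


def accuracy(predict: Callable[[Sequence[float]], int], holdout: Sequence[Example]) -> float:
    """Model evaluation: fraction of held-out examples predicted correctly."""
    correct = sum(1 for x, y in holdout if predict(x) == y)
    return correct / len(holdout)


def maybe_promote(
    name: str,
    predict: Callable[[Sequence[float]], int],
    holdout: Sequence[Example],
    current: Optional[StagedModel],
    min_accuracy: float = 0.8,  # assumed validation threshold
) -> Optional[StagedModel]:
    """Model validation: return the model that should be staged after this round."""
    acc = accuracy(predict, holdout)
    if acc < min_accuracy:
        return current  # fails validation; keep whatever is staged
    if current is not None and acc <= current.accuracy:
        return current  # no improvement over the staged model
    return StagedModel(name, acc)


if __name__ == "__main__":
    holdout = [([0.9], 1), ([0.2], 0), ([0.7], 1), ([0.4], 0)]

    def model_v1(x):  # hypothetical trained model: threshold on one feature
        return int(x[0] > 0.5)

    print(maybe_promote("model-v1", model_v1, holdout, current=None))
```

A rejected candidate simply sends the loop back to Model Training, which is the arrow closing the cycle in the diagram.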
6
Basics: Why GPU?
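• GPU: many cores that handle massive numbers of (simple) computation tasks simultaneously.
(Diagram: GPU vs. CPU; the computation-intensive part of the workload runs on the GPU, the rest on the CPU.)
• Without GPU support, training jobs take so long that researchers/engineers can hardly afford to wait for them to finish.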
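One way to see why many simple cores help: in a matrix multiplication, which dominates deep learning training, every output element is an independent dot product, so all of them can be computed at the same time. The sketch below only demonstrates that independence using a CPU thread pool (Python threads give no real speedup here); it is not how TensorFlow or CUDA actually schedule work on a GPU.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def matmul_cell(a, b, i, j):
    """One small, independent task: dot product of row i of a and column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def parallel_matmul(a, b, workers=4):
    rows, cols = len(a), len(b[0])
    cells = list(product(range(rows), range(cols)))
    # No cell depends on any other cell, so all of them can be dispatched at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda ij: matmul_cell(a, b, *ij), cells))
    out = [[0] * cols for _ in range(rows)]
    for (i, j), v in zip(cells, values):
        out[i][j] = v
    return out


if __name__ == "__main__":
    print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A GPU applies the same idea with thousands of hardware threads, each handling one (or a few) of these cells.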
7
Machine learning in production
8
Machine Learning in a tutorial
$ nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu
Then open http://localhost:8888/ in your browser.
9
Machine Learning in a Unified Platform
“Hidden Technical Debt in Machine Learning Systems”, Google
10
Training Hierarchical Models
(Diagram: a Yelp review with the text "Burger is great. however onion rings were over cooked" and a food photo; a Word Embedding Model and a Food picture classifier Model feed their outputs into an Ensemble Model. Image/photo from Yelp.)
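The hierarchical setup above can be sketched as an ensemble that combines the outputs of the two sub-models. Both sub-models below are hypothetical stand-ins (a crude keyword lookup instead of a word-embedding model, a constant instead of a picture classifier), and the weighted average with weights 0.6/0.4 is an assumed combination rule; the slide does not say how the Ensemble Model combines its inputs, and in practice it may itself be a trained model.

```python
from typing import Callable, Tuple


def ensemble_score(
    review_text: str,
    photo: bytes,
    text_model: Callable[[str], float],
    image_model: Callable[[bytes], float],
    weights: Tuple[float, float] = (0.6, 0.4),  # assumed weighting
) -> float:
    """Combine the two sub-model scores (each in [0, 1]) with a weighted average."""
    w_text, w_image = weights
    combined = w_text * text_model(review_text) + w_image * image_model(photo)
    return combined / (w_text + w_image)


def toy_text_model(text: str) -> float:
    """Hypothetical stand-in for a word-embedding-based model: keyword lookup."""
    positive, negative = {"great", "good", "tasty"}, {"over", "cold", "bad"}
    words = [w.strip(".,!").lower() for w in text.split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return 0.5 if pos + neg == 0 else pos / (pos + neg)


def toy_image_model(photo: bytes) -> float:
    """Hypothetical stand-in for the food picture classifier."""
    return 0.9 if photo else 0.5


if __name__ == "__main__":
    review = "Burger is great. however onion rings were over cooked"
    print(ensemble_score(review, b"<photo bytes>", toy_text_model, toy_image_model))
```

Each sub-model can be trained (and retrained) separately, which is why such pipelines need a platform that can run several training jobs over shared data.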
11
Data pipelines for Machine Learning (Big Data)
(Diagram: ETL, data exploration, join / sampling / feature extraction, splitting train and test data sets, etc.)
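The train/test split step is often made deterministic so that every rerun of the pipeline puts each record in the same split. A minimal sketch, assuming each record carries a stable ID; hashing the ID with SHA-256 is one common technique, not necessarily what the authors used, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def bucket(record_id: str) -> float:
    """Map an ID to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact in a double and never rounds up to 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def train_test_split(
    records: Iterable[Dict], test_fraction: float = 0.2, id_field: str = "id"
) -> Tuple[List[Dict], List[Dict]]:
    train, test = [], []
    for rec in records:
        (test if bucket(str(rec[id_field])) < test_fraction else train).append(rec)
    return train, test


if __name__ == "__main__":
    records = [{"id": i, "label": i % 2} for i in range(1000)]
    train, test = train_test_split(records)
    print(len(train), len(test))
```

Unlike random shuffling, this split stays stable when new records are appended, because existing IDs always hash to the same bucket.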
12
How YARN helps
13
Running on YARN: All about data and sharing
(Diagram: Hadoop YARN shares one cluster among Zeppelin / Jupyter; Spark MLlib, XGBoost and TensorFlow; Hive/LLAP and Spark SQL; over data in HDFS, AWS S3 and RDBMS, on CPU, GPU and SSD hardware.)
14
Why run everything under YARN?
(Diagram: what a normal cluster user needs, and what YARN provides for each need.)
• SLA! Capacity planning, preemption, reservation system.
• Monitoring! Timeline service, Grafana, etc.
• Quotas! Queue / user quotas, user access control.
• Isolation! CPU / memory / GPU / FPGA; network / disk isolation is work in progress.
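To make the quota point concrete: with YARN's CapacityScheduler, each queue gets a capacity expressed as a percentage, and its guaranteed share of every resource type follows from that percentage. A simplified sketch of the arithmetic; the queue names and cluster size are hypothetical, and real schedulers also handle hierarchical queues, maximum capacity, user limits and preemption, all omitted here.

```python
from typing import Dict

Resources = Dict[str, float]  # e.g. {"memory_gb": ..., "vcores": ..., "gpus": ...}


def guaranteed(cluster: Resources, capacity_percent: Dict[str, float]) -> Dict[str, Resources]:
    """Guaranteed resources per leaf queue, given capacities (percent) that sum to 100."""
    total = sum(capacity_percent.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    return {
        queue: {res: amount * pct / 100.0 for res, amount in cluster.items()}
        for queue, pct in capacity_percent.items()
    }


if __name__ == "__main__":
    cluster = {"memory_gb": 1024.0, "vcores": 256.0, "gpus": 8.0}  # hypothetical cluster
    for queue, share in guaranteed(cluster, {"llap": 50.0, "ml": 30.0, "default": 20.0}).items():
        print(queue, share)
```

Because the guarantee covers GPUs as well as memory and vcores, a deep learning queue can be promised a fixed number of GPUs without starving the other workloads on the same cluster.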
15
All running on the same YARN platform
(Diagram: LLAP daemons and GPU workloads sharing the same YARN cluster of 128 G nodes.)
16
Recent work in YARN to support ML workloads like TensorFlow
• GPU isolation/scheduling support
• Native Service: easily define and run any custom service
• All of the above is available in Apache Hadoop 3.1.0
17
GPU support on YARN (Apache Hadoop 3.1.0)
• Why is isolation needed?
  – When multiple processes share a single GPU, they:
    • are serialized, and
    • easily run out of memory (OOM).
• GPU isolation on YARN:
  – Granularity is per GPU device.
  – Cgroups / Docker are used to enforce the isolation.
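Per-device granularity means a container asks for a whole number of GPUs and receives specific device IDs exclusively. The toy allocator below models only that bookkeeping on one node; the class and method names are hypothetical, and in YARN the actual enforcement is done by cgroups (device access control) or Docker, not by code like this. Rendering the assigned IDs as a `CUDA_VISIBLE_DEVICES` value, which CUDA applications honor, is shown purely as an illustration.

```python
from typing import Dict, List


class GpuAllocator:
    """Tracks which GPU devices on one node are assigned to which container."""

    def __init__(self, device_ids: List[int]):
        self.free = sorted(device_ids)
        self.assigned: Dict[str, List[int]] = {}

    def allocate(self, container_id: str, num_gpus: int) -> List[int]:
        # Whole devices only: no fractional GPUs, so no two containers share a device.
        if not isinstance(num_gpus, int) or num_gpus < 1:
            raise ValueError("GPUs are allocated as whole devices (integer >= 1)")
        if num_gpus > len(self.free):
            raise RuntimeError(f"only {len(self.free)} free GPU(s) left")
        devices, self.free = self.free[:num_gpus], self.free[num_gpus:]
        self.assigned[container_id] = devices
        return devices

    def release(self, container_id: str) -> None:
        self.free = sorted(self.free + self.assigned.pop(container_id))


def cuda_visible_devices(devices: List[int]) -> str:
    return ",".join(str(d) for d in devices)


if __name__ == "__main__":
    node = GpuAllocator([0, 1, 2, 3])
    print(cuda_visible_devices(node.allocate("container_01", 2)))
    print(cuda_visible_devices(node.allocate("container_02", 1)))
```

Refusing fractional requests is what prevents the serialization and OOM problems listed above: a device is never visible to two containers at once.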
18
Docker + GPU support on YARN (Apache Hadoop 3.1.0)
• Most machine learning platforms have Python/R/cuDNN/CUDA dependencies.
• Docker solves messy dependency issues
  – but it may introduce problems for GPU base libraries.
• nvidia-docker-plugin mounts the Nvidia driver, etc. when a container is launched.
• YARN supports Docker as well as nvidia-docker-plugin.
(Diagram: two containers running TensorFlow 1.2 and CUDA Library 5.0 on an Ubuntu 14.04 image. Left: the host OS's GPU Base Lib v1 is volume-mounted into the container. Right: the image ships its own GPU Base Lib v2, which does not match the host's GPU Base Lib v1, so the container fails.)
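The failure in the right-hand container comes from the user-space GPU base library inside the image not matching the kernel driver on the host. A toy model of that rule, assuming as a simplification that the two versions must be identical; the function names and version strings are hypothetical. Mounting the host's library into the container, which is what nvidia-docker-plugin does, makes the versions match by construction.

```python
from typing import Optional


def effective_gpu_lib(image_lib: Optional[str], host_version: str, mount_host_lib: bool) -> str:
    """Which GPU base library version the container process ends up loading."""
    if mount_host_lib or image_lib is None:
        return host_version  # host library volume-mounted in (nvidia-docker-plugin style)
    return image_lib  # library baked into the image


def can_launch(image_lib: Optional[str], host_version: str, mount_host_lib: bool) -> bool:
    # Simplification: the user-space library must match the host kernel driver exactly.
    return effective_gpu_lib(image_lib, host_version, mount_host_lib) == host_version


if __name__ == "__main__":
    print(can_launch(None, "v1", mount_host_lib=True))   # left container: True
    print(can_launch("v2", "v1", mount_host_lib=False))  # right container: False
```

This is why GPU images are usually built without driver libraries: the same image then runs on hosts with different driver versions.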
19
Running Distributed TensorFlow on YARN
20
Why distributed?
Reference: https://www.tensorflow.org/performance/benchmarks
21
How does distributed TF work?
• Distributed TF architecture (diagram)
• How to make it work?
  – Set the following environment variable: TF_CONFIG
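TF_CONFIG is a JSON string that tells each TensorFlow process the full cluster layout and its own role in it. A minimal sketch that builds the value for every task of a parameter-server/worker job; the host names and ports are hypothetical placeholders. Depending on the TensorFlow version and API, a `chief` task type may also be expected in `cluster`, so check the documentation for the version you run.

```python
import json
from typing import Dict, List


def tf_config(cluster: Dict[str, List[str]], task_type: str, task_index: int) -> str:
    """Return the TF_CONFIG value for one task of a distributed TensorFlow job."""
    if task_index >= len(cluster[task_type]):
        raise ValueError(f"{task_type} has no task with index {task_index}")
    return json.dumps({
        "cluster": cluster,                                  # identical for every task
        "task": {"type": task_type, "index": task_index},   # differs per process
    })


if __name__ == "__main__":
    cluster = {
        "ps": ["ps-0.example.com:2222"],
        "worker": ["worker-0.example.com:2222", "worker-1.example.com:2222"],
    }
    for task_type, hosts in cluster.items():
        for index in range(len(hosts)):
            print(f"TF_CONFIG='{tf_config(cluster, task_type, index)}'")
```

The hard part in practice is knowing every host and port before the job starts, which is what running the job as a YARN service with stable host names addresses.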
22
Using YARN to run distributed TensorFlow
• What you need to do:
  – Write a YARN service spec with the proper TF_CONFIG as a parameter.
  – Run the job with:
    yarn app -launch ${SERVICE_NAME} ${PATH_TO_SERVICE_SPEC}
• What happens under the hood (diagram)
23
Write a service spec to run distributed TensorFlow
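The spec on this slide is not reproduced in the text. The sketch below builds a comparable spec as a Python dict and writes it as JSON, using the field names of the Hadoop 3.1 YARN service API as we understand them (`name`, `version`, `components`, `number_of_containers`, `artifact`, `launch_command`, `resource`, `configuration.env`). Treat it as an assumption to verify against your Hadoop version's YARN service documentation: the Docker image, launch command, port, host-name pattern and the `${SERVICE_NAME}` / `${USER}` / `${DOMAIN}` / `${COMPONENT_NAME}` / `${COMPONENT_ID}` placeholders (assumed to be substituted by YARN service at launch) are illustrative, not the authors' actual spec.

```python
import json
from typing import Dict, List

IMAGE = "tf-gpu-image:latest"  # hypothetical Docker image
PORT = 8000                    # hypothetical port for the TF servers
LAUNCH = "python train.py"     # hypothetical training script


def host(comp: str, index: int) -> str:
    # Assumed registry-DNS naming: <component>-<index>.<service>.<user>.<domain>
    return f"{comp}-{index}.${{SERVICE_NAME}}.${{USER}}.${{DOMAIN}}:{PORT}"


def component(name: str, count: int, gpus: int, cluster: Dict[str, List[str]]) -> Dict:
    tf_config = json.dumps({
        "cluster": cluster,
        "task": {"type": "${COMPONENT_NAME}", "index": "${COMPONENT_ID}"},
    })
    # TensorFlow expects an integer index, so leave that placeholder unquoted.
    tf_config = tf_config.replace('"${COMPONENT_ID}"', "${COMPONENT_ID}")
    resource = {"cpus": 2, "memory": "8192"}  # memory in MB, given as a string
    if gpus:
        resource["additional"] = {"yarn.io/gpu": {"value": gpus}}
    return {
        "name": name,
        "number_of_containers": count,
        "artifact": {"id": IMAGE, "type": "DOCKER"},
        "launch_command": LAUNCH,
        "resource": resource,
        "configuration": {"env": {"TF_CONFIG": tf_config}},
    }


def service_spec(service_name: str, num_ps: int, num_workers: int, gpus_per_worker: int) -> Dict:
    cluster = {
        "ps": [host("ps", i) for i in range(num_ps)],
        "worker": [host("worker", i) for i in range(num_workers)],
    }
    return {
        "name": service_name,
        "version": "1.0",
        "components": [
            component("ps", num_ps, 0, cluster),
            component("worker", num_workers, gpus_per_worker, cluster),
        ],
    }


if __name__ == "__main__":
    spec = service_spec("tf-job-001", num_ps=1, num_workers=2, gpus_per_worker=1)
    with open("tf-job.json", "w") as f:
        json.dump(spec, f, indent=2)
```

The generated file would then be submitted as described above, e.g. `yarn app -launch tf-job-001 tf-job.json`. Generating the spec from code keeps the `cluster` section consistent with the number of containers requested for each component.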
24
Write a service spec to serve a TensorFlow model
• Note:
  – Uses simple_tensorflow_serving (github.com/tobegit3hub/simple_tensorflow_serving).
  – Access the serving REST endpoint at http://serving.serving-job-001.<domain-name>:port
  – Still feels complicated? We're working on a wrapper to simplify this!
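A client for the serving endpoint can be sketched with the Python standard library. The host follows the pattern on the slide, with the domain name and port left to you; the JSON payload shape (`{"data": {...}}` keyed by the model's input names) is an assumption based on the simple_tensorflow_serving README and should be checked against the version you deploy. The function only builds the request, so nothing is sent until it is passed to `urlopen`.

```python
import json
import urllib.request
from typing import Dict, List


def build_predict_request(domain: str, port: int, inputs: Dict[str, List]) -> urllib.request.Request:
    url = f"http://serving.serving-job-001.{domain}:{port}"
    body = json.dumps({"data": inputs}).encode("utf-8")  # assumed payload schema
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical domain, port and input name.
    req = build_predict_request("example.com", 8500, {"features": [[1.0, 2.0, 3.0]]})
    print(req.get_method(), req.full_url)
    # To actually call the service:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(json.loads(resp.read()))
```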
25
Write a service spec to run MXNet: fine-tuning a model
• Note:
  – Fine-tuning means training with parameters partially initialized from a pre-trained model.
  – First prepare the caltech256 dataset, then fine-tune the pre-trained imagenet11k-resnet-152 model on it.
  – The dependencies feature of YARN Native Service runs the prepare component first; once it has completed, the real training starts on the prepared dataset.
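The dependencies feature can be pictured as a directed graph over components: a component is started only after the components it depends on. A sketch that computes a valid launch order with Python's `graphlib` (3.9+); the `dependencies` key mirrors the YARN service spec field of that name, but the component names and the ordering code are illustrative, not YARN's implementation.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def launch_order(components: List[Dict]) -> List[str]:
    """Order components so that each one comes after everything in its 'dependencies'."""
    graph = {c["name"]: set(c.get("dependencies", [])) for c in components}
    # Raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    components = [
        {"name": "fine-tune", "dependencies": ["prepare-caltech256"]},  # hypothetical names
        {"name": "prepare-caltech256"},
    ]
    print(launch_order(components))  # ['prepare-caltech256', 'fine-tune']
```

Expressing the ordering in the spec, rather than in a wrapper script, lets YARN enforce it for every run of the job.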
26
-- Related Session --
Accelerating XGBoost applications with GPU and Spark
Yanbo Liang & Mingjie Tang
2:50 PM, Room I, Wed April 18th
https://dataworkssummit.com/berlin-2018/session/accelerating-xgboost-applications-with-gpu-and-spark/
27
Demo
28
Questions?

Editor's Notes

  • #6: Model training is the most important step of the whole pipeline.
  • #9: As the workflow shows, only a tiny fraction of the code is actually devoted to model learning. A machine learning workflow usually needs a lot of support from the big data platform, such as collecting data from different data sources, feature extraction, feature transformation, and so on. Let's look at how big data infrastructure can help machine learning, step by step.
  • #10: As the workflow shows, only a tiny fraction of the code is actually devoted to model learning. A machine learning workflow usually needs a lot of support from the big data platform, such as collecting data from different data sources, feature extraction, feature transformation, and so on. Let's look at how big data infrastructure can help machine learning, step by step.
  • #11: To do.
  • #12: To do: add Oozie/Azkaban to control the workflow.
  • #18: Even though TF provides options to use less GPU memory than the whole device offers, we cannot enforce this from outside the process.
  • #19: Even though TF provides options to use less GPU memory than the whole device offers, we cannot enforce this from outside the process.