Bringing Deep Learning into production
Brief introduction
• CTO & co-founder of Agile Lab
• Data & tech addict
• Contributor to Spark Notebook
• Spark early adopter
• Certified Cassandra Architect
• DeepLearning enthusiast
Who is Agile Lab ?
GO BIG (data) or GO HOME
http://www.meetup.com/it-IT/Torino-Scala-Programming-Big-Data-Meetup/
What we do
Applications
High scalability
Decision Support
Systems
data engineering, data mining and data
«meaning»
Big Data Strategies
Training
Reactive, NoSQL, Big Data, Machine
learning
Why Deep Learning
Deep Learning is trending
What is Deep Learning
• Deep learning is just another name for artificial neural networks
• An algorithm is deep if the input is passed through several non-linearities
before being output
• Deep learning is discovering the features that best represent the
problem, rather than just a way to combine them
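To make the "several non-linearities" point concrete, here is a minimal sketch in plain Python of an input passing through two stacked layers, each followed by a non-linear activation. The weights are hand-picked for illustration (an assumption, not a trained model), and the names `dense`, `forward` and `LAYERS` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector, Callable[[Vector], Vector]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine transform: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    """Non-linearity 1: clip negative values to zero."""
    return [max(0.0, a) for a in v]


def sigmoid(v: Vector) -> Vector:
    """Non-linearity 2: squash each value into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def forward(x: Vector, layers: List[Layer]) -> Vector:
    """Pass the input through every layer: affine step, then activation."""
    for weights, bias, activation in layers:
        x = activation(dense(x, weights, bias))
    return x


# Hand-picked (not learned) weights: the hidden layer computes
# relu(a - b) and relu(b - a), the output layer combines them.
LAYERS: List[Layer] = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-0.5], sigmoid),
]

if __name__ == "__main__":
    for point in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
        print(point, round(forward(point, LAYERS)[0], 3))
```

With these weights the output is above 0.5 only when the two inputs differ, an XOR-like pattern that a single linear layer cannot represent. In a real network the weights would be learned from data rather than written by hand.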
Deep Learning: Use cases
Do you want to start with Deep
Learning?
Let’s choose the right tools!!
Deep Learning Frameworks
• Deeplearning4J
• TensorFlow
• Caffe
• Theano
• Torch
• Spark ML MultilayerPerceptrons
• H2O
• CNTK
• MatLab
• maxDNN
And many others
How to choose
Background
Target Environment
Vision
Background
Productivity !!
• Scala
• Java
Big Data
Engineer
• Java
• Python
Math
Engineer
• R
• Python
Statistician
Target Environment
Trained model should be deployable!!
Trained Model
Dev Env
Prod Env
Target Environment
Prod Env
Dev Env
Training
Data Cleaning
ETL
Scheduling
ML Pipeline
- Track model performance over time
- Care about SLA
- Continuous tweaks
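One way to read "track model performance over time" is a rolling window over recent predictions that raises a flag when accuracy drops. The sketch below assumes labelled outcomes arrive after each prediction; the class name `RollingAccuracy`, the window size and the threshold are all hypothetical choices, not something the slides prescribe.

```python
from collections import deque
from typing import Deque, Optional


class RollingAccuracy:
    """Keeps the last `window` prediction outcomes and reports accuracy."""

    def __init__(self, window: int, threshold: float) -> None:
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted: object, actual: object) -> None:
        """Store whether the latest prediction matched the observed label."""
        self.outcomes.append(predicted == actual)

    @property
    def accuracy(self) -> Optional[float]:
        """Share of correct predictions in the window, None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """True once the window is full and accuracy is below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        acc = self.accuracy
        return window_full and acc is not None and acc < self.threshold


if __name__ == "__main__":
    monitor = RollingAccuracy(window=4, threshold=0.75)
    for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
        monitor.record(predicted, actual)
    print(monitor.accuracy, monitor.needs_retraining())
```

In a scheduled pipeline a check like `needs_retraining()` could trigger the "continuous tweaks" step, while the same counters feed whatever SLA reporting the production environment uses.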
Enterprise Architecture
HADOOP
Online
DataStore
Enterprise Service BUS
Data Integration Layer
External
Sources
ANALYTICS
VALUE
ADDED
SERVICES
API
SERVICES
Internal
Business
Sources
Internal
System
Sources
DeepLearning
Easy Wins
Training pipeline should run
on Spark or Hadoop
Trained Model should be
represented in Java objects
Vision: keep in mind Scaling
High Level dynamic languages
are incredibly productive for
prototyping and data exploration
Scaling on larger data sets
quickly runs into performance
limitations
Keep in mind scaling
requirements from beginning
Vision: simplify the pipeline
Copy & Sample data from Dev Env to Data
Scientist Env
Prototype in Python or R
Train model
Predict on validation Data
Translate model to match Prod Env (Java, MapReduce, Spark)
Deploy training pipeline and model
Easy Wins
Data scientists should work
directly on a distributed
environment
Data scientists and big data
engineers should co-operate
on the same platform
SWOT Analysis
TensorFlow
Strengths:
- Powered By Google
- Nice UI
Weaknesses:
- Powered By Google
- No support for “inline” matrix operations, so it is slow
Opportunities:
- Awesome community
Threats:
- No Scala or Java integration
- No commercial support
Theano
Strengths:
- Grand Daddy of deep learning
- RNN and CNN
- Computational graph abstraction
- Python
Weaknesses:
- No support for Hadoop or Spark
- No plug & play nets
Opportunities:
- Great community
Threats:
- No Scala or Java integration
- No commercial support
Torch
Strengths:
- GPU support
- Lots of pretrained models and packages
- Easy to use
Weaknesses:
- Lua language
Opportunities:
- Backed by DeepMind and Facebook
Threats:
- No Scala or Java integration
- No commercial support
Caffe
Strengths:
- C++ & Python
- Good Performance
- GPU Support
Weaknesses:
- Focused on image processing
Opportunities:
- Backed by Yahoo for Spark integration
- GPU clustering
Threats:
- No commercial support
Deeplearning4j
Strengths:
- GPU support
- Java and Scala
- Full DNN set
- Support Hadoop, Spark & Akka
Weaknesses:
- Not for dummies
Opportunities:
- Commercial support - SkyMind
Threats:
- Not so sexy for data scientists because of
Java/Scala
H2O
• Easy to use Web UI
• Multi language API
• Run directly on HDFS or S3
• Model is a Java POJO
• Big Data Ready
• Really Fast
• Compressed data
• Regularization
• Grid Search
• GPU is still on roadmap
• CNN and RNN too
H2O - Flow
H2O – Sparkling Water
• Python, R and Scala API
• Best Kagglers use H2O
• Tons of tools for profiling and tuning
• Spark leverage
• Best in class algorithms – battle
tested
• Regularization
• Grid search
H2O – Sparkling Water
Workflow
POJO Java
Training Set
Embeddable in:
• J2EE App
• Spark Job
• MR Job
• DWH as UDF
training
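The workflow above exports the trained model as a plain object that can be embedded anywhere. The slides target a Java POJO; the sketch below is only a Python analogy of that idea, not H2O's actual export. `LogisticModel`, its parameters, `score_batch` and `score_udf` are hypothetical names with made-up values.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class LogisticModel:
    """Stand-in for an exported model: frozen parameters plus one scoring
    method, with no dependency on the framework that trained it."""

    weights: Tuple[float, ...]
    bias: float

    def predict(self, features: Sequence[float]) -> float:
        """Return the probability of the positive class."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))


# Made-up parameters; a real export would come out of the training step.
MODEL = LogisticModel(weights=(2.0, -1.0), bias=0.0)


def score_batch(rows: Iterable[Sequence[float]]) -> List[float]:
    """Batch-job style embedding: score every row of a data set."""
    return [MODEL.predict(row) for row in rows]


def score_udf(a: float, b: float) -> float:
    """UDF style embedding: one call per record, scalar columns in."""
    return MODEL.predict((a, b))


if __name__ == "__main__":
    print(score_batch([(1.0, 0.0), (0.0, 1.0)]))
    print(score_udf(1.0, 2.0))
```

The design point is that the same object serves both embeddings unchanged, which is what makes a dependency-free model easy to move from the training environment into an application, a batch job or a warehouse function.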
Spark as middleware
Using Spark as middleware, you can leverage:
• Deeplearning4j
• H2O
• TensorFlow (Arimo extension)
• Caffe (Yahoo extension)
• ML MultilayerPerceptrons and future implementations
NO tech provider Lock-in
Our Stack for Enterprise
• Ready for Enterprise and Hadoop World
• Deployable into Java Env
• Notebook ( Flow )
• H2O for out of the box algorithms
• Deeplearning4j for advanced DNN and
n-dimensional array manipulation
• Good usability for both DataScientists and
Big Data Engineers
• Enterprise Support along the whole stack
Thanks!
We are hiring !
paolo.platter@agilelab.it
