3 DAYS ON JANUARY 16th, 17th & 18th
2015
www.globalbigdataconference.com
Twitter : @bigdataconf
Presenter: Josh Patterson
Past
Research in Swarm Algorithms
TVA / NERC
Cloudera
Today
Patterson Consulting
Skymind (Advisor)
josh@pattersonconsultingtn.com / @jpatanooga
• What is Deep Learning?
• What is DL4J?
• Enterprise Grade Deep Learning Workflows
WHAT IS DEEP LEARNING?
“Cooper: [When Cooper tries to reconfigure TARS] Humour 75%.
TARS: 75%. Self destruct sequence in T minus 10, 9, 8...
Cooper: Let's make it 65%.
TARS: Knock, knock.”
--- Interstellar
We Want to Be Able to Recognize Handwriting
This is a Hard Problem
Automated Feature Engineering
• Deep Learning can be thought of as workflows for
automated feature construction
– Where previously we’d consider each stage in the
workflow as a unique technique
• Many of the techniques have been around for
years
– But now are being chained together in a way that
automates exotic feature engineering
• As LeCun says:
– “machines that learn to represent the world”
Enterprise Deep Learning with DL4J
These are the features learned at each neuron in a Restricted Boltzmann Machine
(RBM).
These features are passed to higher levels of RBMs to learn more complicated things.
Part of the
“7” digit
Learning Progressive Layers
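The idea of passing features from one RBM up to the next can be sketched in a few lines. This is a minimal illustration, not DL4J code: it assumes binary units with a sigmoid activation (as in the JSON model config later in the deck), and the function names and toy weights are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for the hidden-unit activation."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """Return p(h_j = 1 | v) for every hidden unit j.

    weights[i][j] connects visible unit i to hidden unit j.
    """
    probs = []
    for j, bias in enumerate(hidden_bias):
        activation = bias + sum(v * weights[i][j] for i, v in enumerate(visible))
        probs.append(sigmoid(activation))
    return probs


def stack_forward(visible, layers):
    """Feed the input through a stack of (weights, hidden_bias) layers.

    The hidden probabilities of one layer become the input of the next,
    which is how higher layers get to build on lower-level features.
    """
    signal = visible
    for weights, hidden_bias in layers:
        signal = hidden_probabilities(signal, weights, hidden_bias)
    return signal


if __name__ == "__main__":
    # Toy values only: 3 visible units -> 2 hidden units -> 1 hidden unit.
    layer1 = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1])
    layer2 = ([[1.0], [-1.0]], [0.0])
    print(stack_forward([1, 0, 1], [layer1, layer2]))
```

Training the weights (for example with contrastive divergence) is deliberately left out; the sketch only shows the forward pass through stacked layers.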
Deep Learning Architectures
• Deep Belief Networks
– Most common architecture
• Convolutional Neural Networks
– State of the art in image classification
• Recurrent Networks
– Time series
• Recursive Networks
– Text / image
– Can break down scenes
Next Generation Deep Learning with DL4J
What is DL4J?
• “The Hadoop of Deep Learning”
– Command line driven
– Java, Scala, and Python APIs
• Apache 2.0 Licensed
• Java implementation
– Parallelization / GPU support
• Runtime Neutral
– Local
– Hadoop / YARN
– Spark
– AWS
• https://github.com/deeplearning4j/deeplearning4j
DL4J and Parallelization
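[Diagram: Traditional Serial Training (Training Data feeds a single Model) versus Modern Parallel Engine (Hadoop / Spark). In the parallel case the training data is divided into Split 1, Split 2, Split 3, …; Worker 1, Worker 2, … Worker N each train a Partial Model on one split; the Master combines the partial models into the Global Model.]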
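The worker/master pattern in the diagram can be sketched as follows. The slides do not say how the master combines partial models, so this sketch assumes simple parameter averaging over a toy linear model; all function names are hypothetical and none of this is DL4J's API.

```python
def train_partial(weights, split, lr=0.1):
    """One SGD pass of a linear model (squared error) over a single data split.

    Each record in the split is (features, target). Returns the worker's
    partial model without mutating the weights it was given.
    """
    w = list(weights)
    for features, target in split:
        prediction = sum(wi * xi for wi, xi in zip(w, features))
        error = prediction - target
        w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w


def average_models(partials):
    """Master step (assumed): average the partial models element-wise."""
    n = len(partials)
    return [sum(values) / n for values in zip(*partials)]


def parallel_round(global_weights, splits):
    """One round: every worker starts from the global model, then the master averages."""
    partials = [train_partial(global_weights, split) for split in splits]
    return average_models(partials)


if __name__ == "__main__":
    # Toy data for y = 2 * x, divided into three splits (one per worker).
    splits = [
        [([1.0], 2.0), ([2.0], 4.0)],
        [([3.0], 6.0)],
        [([0.5], 1.0), ([1.5], 3.0)],
    ]
    model = [0.0]
    for _ in range(20):
        model = parallel_round(model, splits)
    print(model)
```

Here the workers run sequentially in one process; on Hadoop or Spark each call to `train_partial` would run on a separate node against its own split.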
DL4J Suite of Tools
• DL4J
– Main library for deep learning
• Canova
– Vectorization library
• ND4J
– Linear Algebra framework
– Swappable backends (JBLAS, GPUs)
• Arbiter
– Model evaluation and testing platform
Canova for Command Line Vectorization
• Library of tools to take
– Audio
– Video
– Image
– Text
– CSV data
• And convert the input data into vectors in a standardized format
– Adaptable with custom input/output formats
• Open Source, Apache 2.0 Licensed
– https://github.com/deeplearning4j/Canova
– Part of DL4J suite
ENTERPRISE GRADE DEEP LEARNING WORKFLOWS
Building Deep Learning Workflows
• Our terminology in data science has gotten more exotic
– But it’s still about gathering, cleaning, visualizing, and
constructing features from data
• We need to get data from a raw format into a baseline raw
vector
– Which is why Canova exists
• Deep Learning is not just classification
– But an automated feature construction pipeline
• Together, DL4J and Canova give us the base workflow
Modeling UCI Data: Iris
• We need to vectorize the data
– Possibly with some per column transformations
– Which is why Canova exists
• To feed raw data into a form DL4J can consume
• We then need to build a deep learning model
over the data
– We’ll use the DL4J lib to do this
Vectorization with Canova
• Set up the configuration file
• Set up the schema transforms for the input
CSV data
• Generate the SVMLight vector data as the
output
– with the command line interface
Canova Configuration
input.header.skip=false
input.statistics.debug.print=false
input.format=org.canova.api.formats.input.impl.LineInputFormat
input.directory=src/test/resources/csv/data/uci_iris_sample.txt
input.vector.schema=src/test/resources/csv/schemas/uci/iris.txt
output.directory=/tmp/iris_unit_test_sample.txt
output.format=org.canova.api.formats.output.impl.SVMLightOutputFormat
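The configuration above is a flat list of key=value pairs. As a rough illustration of how such a file can be read, here is a small parser; it is an assumption-laden sketch (Canova itself is Java and its real parsing code is not shown in the slides), and `parse_properties` is a hypothetical helper.

```python
def parse_properties(text):
    """Parse key=value lines into a dict, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# The configuration from the slide, verbatim.
CANOVA_CONF = """\
input.header.skip=false
input.statistics.debug.print=false
input.format=org.canova.api.formats.input.impl.LineInputFormat
input.directory=src/test/resources/csv/data/uci_iris_sample.txt
input.vector.schema=src/test/resources/csv/schemas/uci/iris.txt
output.directory=/tmp/iris_unit_test_sample.txt
output.format=org.canova.api.formats.output.impl.SVMLightOutputFormat
"""

if __name__ == "__main__":
    for key, value in parse_properties(CANOVA_CONF).items():
        print(f"{key:32s} {value}")
```

The same key=value layout is used by the DL4J configuration file later in the deck, so one reader can serve both.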
Canova Vector Schema
@RELATION UCIIrisDataset
@DELIMITER ,
@ATTRIBUTE sepallength NUMERIC !NORMALIZE
@ATTRIBUTE sepalwidth NUMERIC !NORMALIZE
@ATTRIBUTE petallength NUMERIC !NORMALIZE
@ATTRIBUTE petalwidth NUMERIC !NORMALIZE
@ATTRIBUTE class STRING !LABEL
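To make the schema concrete, the sketch below applies comparable transforms by hand and emits SVMLight-style lines. It rests on two assumptions that the slides do not confirm: that `!NORMALIZE` means min-max scaling to [0, 1], and that `!LABEL` maps each class string to an integer id in first-seen order. The `vectorize` function is hypothetical and is not Canova's implementation.

```python
def vectorize(rows):
    """Turn CSV rows (numeric columns, then a class string) into SVMLight lines.

    Returns (lines, labels) where labels maps class name -> integer id.
    Assumes min-max normalization per column and first-seen label ids.
    """
    numeric = [[float(value) for value in row[:-1]] for row in rows]
    columns = list(zip(*numeric))
    mins = [min(column) for column in columns]
    maxs = [max(column) for column in columns]

    labels = {}
    lines = []
    for values, row in zip(numeric, rows):
        label_id = labels.setdefault(row[-1], len(labels))
        scaled = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(values, mins, maxs)
        ]
        # SVMLight feature indices are 1-based: "<label> 1:<v1> 2:<v2> ..."
        features = " ".join(f"{i}:{x:.4f}" for i, x in enumerate(scaled, start=1))
        lines.append(f"{label_id} {features}")
    return lines, labels


if __name__ == "__main__":
    sample = [
        "5.1,3.5,1.4,0.2,Iris-setosa",
        "7.0,3.2,4.7,1.4,Iris-versicolor",
        "6.3,3.3,6.0,2.5,Iris-virginica",
    ]
    lines, labels = vectorize([record.split(",") for record in sample])
    print("\n".join(lines))
    print(labels)
```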
Canova CLI
./bin/canova vectorize -conf /tmp/iris_conf.txt
File path already exists, deleting the old file before proceeding...
Output vectors written to: /tmp/iris_svmlight.txt
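The vectors written by the CLI are in SVMLight format, where each line is a label followed by `index:value` pairs. A small reader, sketched here with a hypothetical `parse_svmlight_line` helper, is a convenient way to spot-check the output before training; it is not part of Canova or DL4J.

```python
def parse_svmlight_line(line):
    """Parse one SVMLight line into (label, {feature_index: value}).

    Anything after '#' is treated as a trailing comment and ignored.
    """
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


if __name__ == "__main__":
    # Illustrative line, not taken from the actual Iris output file.
    label, features = parse_svmlight_line("1 1:0.25 2:0.5 4:1.0 # example")
    print(label, features)
```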
DL4J from the Command Line
bin/dl4j
Usage:
dl4j [command] [params]
Commands:
train build a deep learning model
test test a deep learning model
predict score new records against a deep learning model
DL4J Configuration File
input.format=org.canova.api.formats.input.impl.SVMLightInputFormat
input.directory=src/test/resources/data/irisSvmLight.txt
model.config=src/test/resources/model_dbn.json
output.directory=/tmp/
JSON Model Config
{"sparsity":0.0,"useAdaGrad":true,"lr":0.10000000149011612,"corruptionLevel":0.30000001192092896,
"numIterations":1000,"momentum":0.5,"l2":0.0,"useRegularization":false,"momentumAfter":null,
"resetAdaGradIterations":-1,"numLineSearchIterations":100,"dropOut":0.0,"applySparsity":false,
"weightInit":"VI","optimizationAlgo":"CONJUGATE_GRADIENT","lossFunction":"RECONSTRUCTION_CROSSENTROPY",
"concatBiases":false,"constrainGradientToUnitNorm":false,"seed":123,"nIn":0,"nOut":0,
"activationFunction":"sigmoid","visibleUnit":"BINARY","hiddenUnit":"BINARY","k":1,"weightShape":[0,0],
"filterSize":[2,2,2,2],"numFeatureMaps":2,"featureMapSize":[2,2],"stride":[2,2],"kernel":5,
"batchSize":0,"minimize":false,"rng":"org.apache.commons.math3.random.MersenneTwister",
"layerFactory":"org.deeplearning4j.nn.layers.factory.PretrainLayerFactory,org.deeplearning4j.models.featuredetectors.rbm.RBM",
"stepFunction":"org.deeplearning4j.optimize.stepfunctions.DefaultStepFunction","renderWeightIterations":-1,
"dist":"org.apache.commons.math3.distribution.NormalDistributiont{SQRT2=1.4142135623730951, mean=0.0, solverAbsoluteAccuracy=1.0E-9, DEFAULT_INVERSE_ABSOLUTE_ACCURACY=1.0E-9, standardDeviation=0.1, logStandardDeviationPlusHalfLog2Pi=-1.3836465597893728, serialVersionUID=8589540077390120676}",
"listeners":["org.deeplearning4j.optimize.listeners.ScoreIterationListenert{printIterations=10, log=Logger[org.deeplearning4j.optimize.listeners.ScoreIterationListener]},"]}
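A model config like this is easier to review once the main hyperparameters are pulled out. The sketch below loads a hand-picked subset of the keys from the slide (the subset and the `summarize` helper are choices made for illustration) and also shows why values such as 0.10000000149011612 appear: that is what a 32-bit float 0.1 looks like when printed as a 64-bit double.

```python
import json
import struct

# A subset of the keys from the slide's JSON model config, values unchanged.
MODEL_JSON = """
{"useAdaGrad": true, "lr": 0.10000000149011612,
 "corruptionLevel": 0.30000001192092896, "numIterations": 1000,
 "momentum": 0.5, "optimizationAlgo": "CONJUGATE_GRADIENT",
 "lossFunction": "RECONSTRUCTION_CROSSENTROPY",
 "activationFunction": "sigmoid", "visibleUnit": "BINARY",
 "hiddenUnit": "BINARY", "k": 1, "seed": 123}
"""

KEYS_OF_INTEREST = ("lr", "momentum", "numIterations", "optimizationAlgo",
                    "lossFunction", "activationFunction", "k")


def summarize(config_text, keys=KEYS_OF_INTEREST):
    """Parse the JSON text and keep only the requested keys that are present."""
    config = json.loads(config_text)
    return {key: config[key] for key in keys if key in config}


def as_float32(value):
    """Round a Python float to 32-bit precision and widen it back to a double."""
    return struct.unpack("f", struct.pack("f", value))[0]


if __name__ == "__main__":
    for key, value in summarize(MODEL_JSON).items():
        print(f"{key:20s} {value}")
    # The learning rate in the config is simply float32(0.1).
    print(as_float32(0.1) == 0.10000000149011612)
```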
DL4J CLI Training
./bin/dl4j train -conf /tmp/iris_conf.txt
Skymind as DL4J Distribution
• Just as Red Hat was to Linux
– A distribution of Linux with enterprise grade packaging
• Just as Cloudera was to Hadoop
– A distribution of Apache Hadoop with enterprise grade
packaging
• Skymind is to DL4J
– A distribution of DL4J (+tool suite) with enterprise grade
packaging
Thank you for your time and attention
“Deep Learning: A Practitioner’s Approach”
(O'Reilly, October 2015)


Editor's Notes

  • #8, #9, #10: We plot the learned filter for each hidden neuron, one per column of W. Each filter has the same dimension as the input data, so it is most useful to visualize the filters the same way the input data is visualized. For image patches, we show each filter as an image patch.
  • #15: POLR: Parallel Online Logistic Regression. Talking points: we wanted to start with a tool known to the Hadoop community, with expected characteristics. Mahout’s SGD is well known, so we used it as a base point.
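The filter plots described in the notes for the RBM slides amount to taking each column of W and reshaping it to the input's shape. A minimal sketch, assuming W is stored as a list of rows with `weights[i][j]` linking visible unit i to hidden unit j, and with a hypothetical `filters_from_weights` helper:

```python
def filters_from_weights(weights, height, width):
    """Return one height x width grid per hidden unit (one per column of W).

    weights[i][j] connects visible unit i (a pixel, in row-major order)
    to hidden unit j.
    """
    n_visible = len(weights)
    if n_visible != height * width:
        raise ValueError("height * width must equal the number of visible units")
    n_hidden = len(weights[0])

    filters = []
    for j in range(n_hidden):
        column = [weights[i][j] for i in range(n_visible)]
        grid = [column[r * width:(r + 1) * width] for r in range(height)]
        filters.append(grid)
    return filters


if __name__ == "__main__":
    # Toy W: 4 visible units (a 2x2 patch) and 2 hidden units.
    W = [[1, 10], [2, 20], [3, 30], [4, 40]]
    for grid in filters_from_weights(W, 2, 2):
        print(grid)
```

Each returned grid can then be handed to any image-plotting tool to render the filter as a patch.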