Deep Learning and Recurrent Neural Networks in the Enterprise

StampedeCon
St. Louis 2016
Josh Patterson, Skymind

Presenter: Josh Patterson
Past
Research in Swarm Algorithms: Real-time optimization techniques in
mesh sensor networks
TVA / NERC: Smartgrid, Sensor Collection, and Big Data
Cloudera: Principal SA, Working with Fortune 500
Patterson Consulting: Working with Fortune 500 on Big Data, ML
Today
Skymind, Director Field Engineering
josh@skymind.io / @jpatanooga
DL4J Co-creator
Co-Author on Upcoming O’Reilly Book
“Deep Learning: A Practitioner’s Approach”

Topics
• What is Deep Learning?
• DL4J
• Recurrent Neural Network Applications

Defining Deep Learning
• Higher neuron counts than in previous
generation neural networks
• Different and evolved ways to connect layers
inside neural networks
• More computing power to train
• Automated Feature Learning

Automated Feature Learning
• Deep Learning can be thought of as workflows
for automated feature construction
– From “feature construction” to “feature learning”
• As Yann LeCun says:
– “machines that learn to represent the world”

These are the features learned at each neuron in a Restricted Boltzmann Machine
(RBM).
These features are passed to higher levels of RBMs to learn more complicated things.
[Figure label: part of the “7” digit]

Unreasonable Effectiveness:
Benchmark Records
1. Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
2. Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
3. Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
4. Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
5. Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
6. English to French translation (Sutskever et al., Google, NIPS 2014)
7. Audio onset detection (Marchi et al., ICASSP 2014)
8. Social signal classification (Brueckner & Schuller, ICASSP 2014)
9. Arabic handwriting recognition (Bluche et al., DAS 2014)
10. TIMIT phoneme recognition (Graves et al., ICASSP 2013)
11. Optical character recognition (Breuel et al., ICDAR 2013)
12. Image caption generation (Vinyals et al., Google, 2014)
13. Video to textual description (Donahue et al., 2014)
14. Syntactic parsing for Natural Language Processing (Vinyals et al., Google, 2014)
15. Photo-real talking heads (Soong and Wang, Microsoft, 2014).

Four Major Architectures
• Deep Belief Networks
• Convolutional Neural Networks
• Recurrent Neural Networks
• Recursive Neural Networks

Quick Usage Guide
• If I have Timeseries or Audio Input
– I should use a Recurrent Neural Network
– Examples: Fraud Detection, Anomaly Detection
• If I have Image input
– I should use a Convolutional Neural Network
• If I have Video input
– I should use a hybrid Convolutional + Recurrent
Architecture!

The More Things Change…
• Deep Learning is still trying to answer the
same fundamental questions such as:
– “is this image a face?”
• The difference is Deep Learning makes hard
questions easier to answer with better
architectures and more computing power
– We do this by matching the correct architecture
with the right problem

Building Deep Neural Networks with
DL4J

DL4J
• “The Hadoop of Deep Learning”
– Java, Scala, and Python APIs
– Apache 2.0 Licensed
• Java implementation
– Parallelization (YARN + Spark)
– GPU support
• Also Supports multi-GPU per host
• Runtime Neutral
– Local
– Hadoop / YARN + Spark
• https://github.com/deeplearning4j/deeplearning4j

DL4J Workflow Toolchain
• ETL (DataVec)
• Vectorization (DataVec)
• Modeling (DL4J)
• Evaluation (Arbiter)
• Execution Platforms: Spark, Single Machine
• ND4J - Linear Algebra Runtime: CPU, GPU

ND4J: The Need for Speed
• JavaCPP (Cython for Java)
– Auto-generates JNI bindings for C++ by parsing classes
– Allows for easy maintenance and deployment of C++
binaries in Java
• CPU backends
– OpenMP (multithreading within native operations)
– OpenBLAS or MKL (BLAS operations)
– SIMD extensions
• GPU backends
– DL4J supports CUDA 7.5 at the moment, and will support
8.0 as soon as it comes out.
– Leverages cuDNN as well

Prepping Data is Time Consuming
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75

Preparing Data for Modeling is Hard

DataVec
• DataVec is a tool for machine learning ETL
(Extract, Transform, Load) operations.
– Spark-Enabled and focused on Supporting DL4J
• Also performs vectorization
– Image, CSV, Sequences (timeseries), more
• Open Source, Apache 2.0 Licensed
– https://github.com/deeplearning4j/DataVec
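
To make "vectorization" concrete, here is a minimal sketch in plain Python of turning CSV records into numeric feature vectors with one-hot labels. It does not use the DataVec API; the column names, sample values, and the `vectorize` helper are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical sensor readings: two numeric columns and a string label.
RAW_CSV = """voltage,frequency,label
1.02,59.98,normal
0.97,60.03,normal
0.61,58.40,fault
"""

def vectorize(csv_text):
    """Turn CSV rows into (feature vector, one-hot label) pairs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sorted so the label-to-index mapping is stable across runs.
    classes = sorted({row["label"] for row in rows})
    examples = []
    for row in rows:
        features = [float(row["voltage"]), float(row["frequency"])]
        one_hot = [1.0 if row["label"] == c else 0.0 for c in classes]
        examples.append((features, one_hot))
    return classes, examples

classes, examples = vectorize(RAW_CSV)
for features, one_hot in examples:
    print(features, one_hot)
```

The same idea (parse, convert to numbers, encode labels) applies to images and sequences; only the parsing step changes.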

Using DL4J for
RECURRENT NEURAL NETWORK
APPLICATIONS

Source: IDC White Paper - sponsored by EMC.
As the Economy Contracts, the Digital Universe Expands. May 2009.
Transactional Data Explosion
• 2,500 exabytes of new information in 2012 with Internet as primary driver
• Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2
“zettabytes” this year
[Figure labels: Relational; Transactional (Logs, Sensors); (You)]

NERC Sensor Data Collection
openPDC PMU Data Collection circa 2009
• 120 Sensors
• 30 samples/second
• 4.3B Samples/day
• Housed in Hadoop

Sensor Timeseries Classification with RNNs
• Recurrent Neural Networks have the ability to
model change of input over time
• Older techniques (mostly) do not retain the time
domain
– Hidden Markov Models do…
• but are more limited
• Key Takeaway:
– For working with Timeseries data, RNNs will often be
more accurate than models that ignore sample order
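
The point about retaining the time domain can be sketched with a one-unit recurrent cell in plain Python. This is an illustration only: the weights are hand-picked rather than trained, and the function names are hypothetical. Two sequences with the same values in a different order look identical to an order-blind summary, but leave the recurrent cell in different states.

```python
import math

def rnn_final_state(sequence, w_in=1.0, w_rec=0.5):
    """Run a one-unit recurrent cell over the sequence; return the last hidden state."""
    h = 0.0
    for x in sequence:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_in * x + w_rec * h)
    return h

def order_blind_summary(sequence):
    """A time-agnostic feature: the mean ignores the order of the samples."""
    return sum(sequence) / len(sequence)

rising = [0.0, 0.5, 1.0]
falling = [1.0, 0.5, 0.0]

# Same mean, so an order-blind feature cannot tell the two apart.
print(order_blind_summary(rising) == order_blind_summary(falling))
# Different final hidden states, so the recurrent cell can.
print(rnn_final_state(rising), rnn_final_state(falling))
```

A classifier placed on top of the hidden state therefore has access to information about how the signal changed, not just which values occurred.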

RNN Architectures
• Standard supervised learning
• Image captioning
• Sentiment analysis
• Video captioning, Natural language translation
• Part of speech tagging
• Generative mode for text

Anomaly Detection
• Model the normal patterns in the data
• Autoencoders give us the ability to look at
data that they haven’t seen before
– Find anomalous patterns in sequences
– Can also use RNNs for pattern classification
• Interesting Industry Applications
– Telecom
– Financial Services

Audio Applications
• Text-to-Speech
• Recognize specific songs / audio
• Enables natural language interfaces

“Google is living a few years in the
future and sending the rest of us
messages”
-- Doug Cutting in 2013
• However
– Most organizations are not built like Google
• (and Jeff Dean does not work at your company…)
• Anyone building Next-Gen infrastructure has
to consider these things

Certified on Two Hadoop Distributions
• Running Spark on Hadoop via YARN gives us
– Sharing cluster resources between heterogeneous
workloads concurrently
– Access to the YARN scheduler capabilities
– Better control of executors in Spark
– Kerberos support for security
• Certified on CDH 5.4
• Certified on HDP 2.4
– [ Coming later this month ]

Questions?
Thank you for your time and attention
“Deep Learning: A Practitioner’s Approach”
(O’Reilly, October 2016)

Running DL4J Workflows on Spark
• DataVec is built to scale out via Spark RDDs
– RDD<LabeledPoint>
– RDD<DataSet>
• DL4J Uses same MultiLayerConfiguration as
single host version
– Uses SparkDl4jMultiLayer to drive the training on Spark
– Performs Parameter Averaging
spark-submit --class io.skymind.spark.dl4j.datavec.BasicDataVecExample \
  --master yarn \
  --num-executors 1 \
  --properties-file ./spark_extra.props \
  ./Skymind_spark-1.0-SNAPSHOT.jar
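
Parameter averaging itself can be sketched in a few lines of plain Python. This is not the SparkDl4jMultiLayer implementation: the one-weight model, the data, and the function names are hypothetical, and the "workers" run sequentially here. Each worker starts from the same parameters, trains on its own shard, and the results are averaged before the next round.

```python
def local_sgd(weight, shard, learning_rate=0.1, epochs=20):
    """Fit y = weight * x on one worker's shard with plain gradient descent."""
    for _ in range(epochs):
        for x, y in shard:
            gradient = 2.0 * (weight * x - y) * x
            weight -= learning_rate * gradient
    return weight

def parameter_averaging_round(weight, shards):
    """Every worker starts from the same weight, trains locally, then results are averaged."""
    local_weights = [local_sgd(weight, shard) for shard in shards]
    return sum(local_weights) / len(local_weights)

# Hypothetical data drawn from y = 3x, split across two workers.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]

weight = 0.0
for _ in range(3):
    weight = parameter_averaging_round(weight, shards)
print(round(weight, 3))
```

After a few rounds the averaged weight settles close to 3, the slope used to generate the data. In a cluster setting, the local training step runs in parallel on each executor and only the parameters travel over the network.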
