Deep recurrent neural networks for Sequence Learning in Spark

www.thalesgroup.com
OPEN
Deep recurrent neural networks for
Sequence Learning in Spark
YVES MABIALA

This document may not be reproduced, modified, adapted, published, translated, in any way, in whole or in part or disclosed to a third party without the prior written consent of Thales - © Thales 2015 All rights reserved.
Outline
▌Thales & Big Data
▌On the difficulty of Sequence Learning
▌Deep Learning for Sequence Learning
▌Spark implementation of Deep Learning
▌Use cases
Predictive maintenance
NLP

Thales & Big Data
Thales systems produce a huge quantity of data
Transportation systems (ticketing, supervision, …)
Security (radar traces, network logs, …)
Satellite (photos, videos, …)
which is often
Massive
Heterogeneous
Extremely dynamic
and where understanding the dynamics of the monitored phenomena is mandatory: hence Sequence Learning

What is sequence learning?
Sequence learning refers to a set of ML tasks where a model has to either deal with sequences as input, produce sequences as output, or both
Goal: understand the dynamics of a sequence to
Classify
Predict
Model
Typical applications
Text
- Classify texts (sentiment analysis)
- Generate textual description of images (image captioning)
Video
- Video classification
Speech
- Speech to text

How is it typically handled?
Taking the dynamics into account is difficult
Often people do not bother
- E.g. text analysis using bag of words (one-hot encoding)
– Problem for certain tasks such as sentiment classification (the order of the words is important)
Or use popular statistical approaches
- (Hidden) Markov model for prediction (and classification)
– Short-term dependency (order 1): $P(X_k = x_k \mid X_{k-1} = x_{k-1}, \dots, X_{k-n} = x_{k-n}) = P(X_k = x_k \mid X_{k-1} = x_{k-1})$
- Autoregressive approaches for time series forecasting
Bag-of-words encoding of three example sentences:
                        The  is  chair  red  young  cat  on  a
The chair is red        1    1   1      1    0      0    0   0
The cat is young        1    1   0      0    1      1    0   0
The cat is on a chair   1    1   1      0    0      1    1   1
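The bag-of-words encoding above can be sketched in a few lines of Python. This is only an illustration: the function name `bag_of_words` and the lower-casing of words are assumptions, and the vocabulary order follows the column order of the example.

```python
# Vocabulary in the column order used in the example (lower-cased here, an assumption).
VOCAB = ["the", "is", "chair", "red", "young", "cat", "on", "a"]

def bag_of_words(sentence, vocab=VOCAB):
    """Return a binary vector: 1 if the vocabulary word occurs in the sentence, else 0."""
    words = set(sentence.lower().split())
    return [1 if w in words else 0 for w in vocab]

sentences = ["The chair is red", "The cat is young", "The cat is on a chair"]
vectors = {s: bag_of_words(s) for s in sentences}

# Word order is discarded: a shuffled sentence gets exactly the same vector.
order_is_lost = bag_of_words("red is the chair") == vectors["The chair is red"]
```

Because `order_is_lost` is true for any permutation of the words, a classifier fed these vectors cannot use word order, which is the limitation noted for sentiment classification.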

Link with artificial neural networks?
Artificial neural networks are statistical models inspired by the brain
They transform the input by applying (non-linear) functions at each layer
More layers mean more capabilities (≥ 2 hidden layers: Deep Learning)
Set of transformation and activation operations
Affine: $Y = W^T X + b$; sigmoid activation: $Y = \frac{1}{1 + \exp(-X)}$; tanh activation: $Y = \tanh(X)$
Convolutional: apply a spatial convolution on the 1D/2D input (signal, image, …): $Y = \mathrm{conv}(X, W) + b$
- Learns spatial features used for classification or prediction (mostly on images/videos)
Recurrent : Learn dependencies between successive observations (features related to the dynamic)
Objective
Find the best weights W to minimize the difference between the predicted output and the desired
one (using back-propagation algorithm)
[Figure: feed-forward network with an input layer, hidden layers and an output layer]
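As a rough illustration of the affine and activation operations listed on this slide, the forward pass of a small network might look like the sketch below. The 2-3-1 layer sizes and all weight values are made up for the example, and the helper names (`affine`, `forward`) are hypothetical.

```python
import math

def affine(W, x, b):
    """Compute y = W^T x + b, with W stored as len(x) rows of len(b) columns."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j] for j in range(len(b))]

def sigmoid(v):
    """Element-wise logistic function 1 / (1 + exp(-x))."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    """Element-wise hyperbolic tangent."""
    return [math.tanh(a) for a in v]

def forward(x, layers):
    """Apply each (W, b, activation) layer in turn to the input vector x."""
    for W, b, activation in layers:
        x = activation(affine(W, x, b))
    return x

# Hypothetical network: 2 inputs -> 3 hidden units (tanh) -> 1 output (sigmoid).
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1, -0.1], tanh),
    ([[1.0], [-1.0], [0.5]], [0.0], sigmoid),
]
y = forward([1.0, 2.0], layers)
```

Training would then adjust every `W` and `b` by back-propagation so that `y` moves toward the desired output; that step is left out of this sketch.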

Recurrent Neural Network basics
Able to cope with varying-size sequences either at the input or at the output
One to many (fixed size input, sequence output), e.g. Image captioning
Many to many (sequence input to sequence output), e.g. Speech to text
Many to one (sequence input to fixed size output), e.g. Text classification
Artificial neural networks with one or more recurrent layers
Classical neural network: $Y_k = f(W^T X_k)$
Recurrent neural network: $Y_k = f(W^T X_k + H Y_{k-1})$
[Figure: the recurrent network unrolled through time, mapping inputs $X_{k-3}, X_{k-2}, X_{k-1}, X_k$ to outputs $Y_{k-3}, Y_{k-2}, Y_{k-1}, Y_k$]
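The recurrence $Y_k = f(W^T X_k + H Y_{k-1})$ unrolled through time can be sketched as a simple loop. This is a minimal illustration, assuming $f = \tanh$ and an all-zero initial state (neither is stated on the slide); the weights are made-up values.

```python
import math

def matvec_t(W, x):
    """Compute W^T x, with W stored as len(x) rows of hidden-size columns."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(W[0]))]

def matvec(H, y):
    """Compute H y for a square matrix H stored as a list of rows."""
    return [sum(row[j] * y[j] for j in range(len(y))) for row in H]

def rnn_forward(xs, W, H, f=math.tanh):
    """Return the list of outputs Y_k for a sequence xs of any length."""
    y = [0.0] * len(H)  # assumed initial state Y_0 = 0
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec_t(W, x), matvec(H, y))]
        y = [f(p) for p in pre]
        outputs.append(y)
    return outputs

# Hypothetical weights: 2 input features, 2 recurrent units.
W = [[0.5, -0.3], [0.2, 0.8]]
H = [[0.1, 0.0], [0.0, 0.1]]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs = rnn_forward(sequence, W, H)   # many to many: one output per time step
last_output = outputs[-1]               # many to one: keep only the final output
```

The same function accepts sequences of any length, which is what lets a recurrent layer cope with varying-size inputs.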

On the difficulty of training recurrent networks
RNNs are (were) known to be difficult to train
More weights and more computational steps
- More computationally expensive (accelerator needed for matrix ops: BLAS or GPU)
- More data needed to converge (scalability over Big Data architectures: Spark)
– Theano, TensorFlow, Caffe do not have distributed versions
Unable to learn long-range dependencies (Graves et al., 2014)
- At a given time $t$, the RNN does not remember the observations before $X_{t-n}$
→ New RNN architectures with memory preservation (more context): LSTM, GRU
GRU update equations:
$Z_k = f(W_z^T X_k + H_z Y_{k-1})$
$R_k = f(W_r^T X_k + H_r Y_{k-1})$
$\tilde{H}_k = \tanh(W_{\tilde{h}}^T X_k + U (Y_{k-1} \circ R_k))$
$Y_k = (1 - Z_k) \circ Y_{k-1} + Z_k \circ \tilde{H}_k$
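One step of the gated update defined by the $Z_k$, $R_k$, $H_k$ and $Y_k$ equations above might be sketched as follows. The slide only names the gate activation $f$; taking it to be the logistic sigmoid is an assumption, as are the dictionary layout of the parameters and the function name `gru_step`.

```python
import math

def sigmoid(a):
    """Logistic function, assumed here to be the gate activation f."""
    return 1.0 / (1.0 + math.exp(-a))

def matvec_t(W, x):
    """Compute W^T x, with W stored as len(x) rows of hidden-size columns."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(W[0]))]

def matvec(H, y):
    """Compute H y for a square matrix H stored as a list of rows."""
    return [sum(row[j] * y[j] for j in range(len(y))) for row in H]

def gru_step(x, y_prev, p):
    """Compute Y_k from X_k and Y_{k-1}; p maps parameter names to matrices."""
    # Update gate Z_k and reset gate R_k
    z = [sigmoid(a + b) for a, b in zip(matvec_t(p["Wz"], x), matvec(p["Hz"], y_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec_t(p["Wr"], x), matvec(p["Hr"], y_prev))]
    # Candidate state: the reset gate scales the previous output element-wise
    gated = [yi * ri for yi, ri in zip(y_prev, r)]
    h = [math.tanh(a + b) for a, b in zip(matvec_t(p["Wh"], x), matvec(p["U"], gated))]
    # Blend the previous output and the candidate with the update gate
    return [(1.0 - zi) * yi + zi * hi for zi, yi, hi in zip(z, y_prev, h)]

# Hypothetical all-zero parameters: 2 input features, 2 recurrent units.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("Wz", "Hz", "Wr", "Hr", "Wh", "U")}
y_next = gru_step([1.0, 1.0], [1.0, -2.0], params)
```

When an update gate value is close to 0 the previous output is carried over almost unchanged, which is how this architecture preserves context over more time steps than the simple recurrence.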

Recurrent neural networks in Spark
Spark implementation of DL algorithms (data parallel)
All the needed blocks
- Affine, convolutional, recurrent layers (Simple and GRU)
- SGD, RMSProp, AdaDelta optimizers
- Sigmoid, tanh, ReLU activations
CPU (and GPU backend)
Fully compatible with the existing DL library in Spark ML
Performance
On a 6-node cluster (CPU)
- 5.46× average speedup (some communication overhead)
– About the same speedup as the MLP in Spark ML
[Figure: data-parallel training. (1) The driver broadcasts the model to Workers 1, 2 and 3; (2) the workers send the resulting gradients back to the driver.]
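The broadcast-then-aggregate pattern can be simulated in plain Python. The sketch below does not use the Spark API and is not the implementation presented here: the linear model, the squared loss, the averaging of gradients on the driver and all names are assumptions chosen to keep the example small.

```python
def local_gradient(w, partition):
    """Worker side: sum the gradients of 0.5 * (w.x - y)^2 over one data partition."""
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return grad, len(partition)

def train(partitions, w, lr=0.1, steps=500):
    """Driver side: repeat (1) model broadcast and (2) gradient aggregation."""
    for _ in range(steps):
        # (1) every simulated worker receives the same current weights
        results = [local_gradient(w, part) for part in partitions]
        # (2) the driver averages the resulting gradients and updates the model
        n = sum(count for _, count in results)
        avg = [sum(g[i] for g, _ in results) / n for i in range(len(w))]
        w = [wi - lr * gi for wi, gi in zip(w, avg)]
    return w

# Toy data generated from y = 2*x0 - 1*x1, split across three simulated workers.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0),
        ([2.0, 1.0], 3.0), ([1.0, 2.0], 0.0), ([2.0, 2.0], 2.0)]
partitions = [data[0:2], data[2:4], data[4:6]]
weights = train(partitions, [0.0, 0.0])
```

Each step requires one broadcast and one aggregation, which is where the communication overhead in a real cluster comes from.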

Use case 1 : predictive maintenance (1)
Context
Thales and its clients build systems in different domains
- Transportation (ticketing, controlling), Defense (radar), Satellites
Need better and more accurate maintenance services
- From planned maintenance (every x days) to alert-based maintenance
- From expert detection to automatic failure prediction
- From whole-subsystem changes to more localized repairs
Goal
Detect early signs of a (sub)system failure using data coming from sensors
monitoring the health of a system (HUMS)

Example on a real system
20 sensors (20 values every 5 minutes), label (failure or not)
Take 3 hours of data and predict the probability of failure in the next hour (fully customizable)
Learning using MLlib
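With one reading every 5 minutes, 3 hours of history is 36 time steps and the next hour is 12 time steps. One possible way to turn the raw readings into training samples is sketched below; the labelling rule (a window is positive if any failure occurs within the horizon) and the function name `make_windows` are assumptions, not details given in the slides.

```python
STEP_MINUTES = 5
HISTORY_STEPS = 3 * 60 // STEP_MINUTES   # 36 readings = 3 hours of data
HORIZON_STEPS = 60 // STEP_MINUTES       # 12 readings = the next hour

def make_windows(readings, failures, history=HISTORY_STEPS, horizon=HORIZON_STEPS):
    """Build (window, label) samples from a time-ordered series.

    readings: list of sensor vectors, one per time step
    failures: list of 0/1 flags, one per time step
    """
    samples = []
    for end in range(history, len(readings) - horizon + 1):
        window = readings[end - history:end]               # past 3 hours
        label = int(any(failures[end:end + horizon]))      # failure in the next hour?
        samples.append((window, label))
    return samples

# Synthetic example: 60 time steps of 20 sensor values, one failure at step 50.
readings = [[float(t)] * 20 for t in range(60)]
failures = [0] * 60
failures[50] = 1
samples = make_windows(readings, failures)
```

Both the history length and the horizon are plain parameters, which matches the "fully customizable" remark.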

Recurrent net learning
Impact of recurrent nets
Logistic regression
- 70% detection with 70% accuracy
Recurrent Neural Network
- 85% detection with 75% accuracy

Use case 2 : Sentiment analysis (1)
Context
Social network analysis application developed at Thales (Twitter, Facebook,
blogs, forums)
- Analyze both the content of the texts and the relations (texts, actors)
Multiple (big data) analysis
- Actor community detection
- Text clustering (themes)
- …
Focus on
Sentiment analysis on the collected texts
- Classify texts based on their sentiment

Learning dataset
Sentiment140 + Kaggle challenge (1.5M labeled tweets)
50% positives, 50% negatives
Compare bag of words + traditional classifiers (Naïve Bayes, SVM, logistic regression) versus RNN

Results
Training set size   NB     SVM    Log Reg   Neural Net (perceptron)   RNN (GRU)
100                 61.4   58.4   58.4      55.6                      NA
1 000               70.6   70.6   70.6      70.8                      68.1
10 000              75.4   75.1   75.4      76.1                      72.3
100 000             78.1   76.6   76.9      78.5                      79.2
700 000             80     78.3   78.3      80                        84.1
[Figure: the same results plotted per classifier (NB, SVM, LogReg, NeuralNet, RNN (GRU)), vertical axis from 40 to 90]

The end…
THANK YOU !
