@ODSC
Thomas Delteil
https://www.linkedin.com/in/thomasdelteil
Miguel Fierro
@miguelgfierro
https://miguelgfierro.com
ODSC 2016 London – Thomas Delteil linkedin.com/in/thomasdelteil & Miguel Fierro @miguelgfierro
NLP
NLP with CNN
Operationalization
Deep Learning for NLP
Interaction between computers
and human language
NLP
Machine translation
OCR
Q&A
Sentiment Analysis
Speech Recognition
T2S
Topic Modelling
Information Retrieval
Natural Language Understanding
Document Classification
£1.3T
value of company data
source: IDC, 2014
10%
of organizations expect to
commercialise their data by 2020
source: Gartner, 2016
8.4PB
of information per second
as of 2020
source: business2community, 2016
70%
of companies
use customer feedback
source: business2community, 2016
TOPIC 1: Spaghetti, Milk, Eating, Broccoli
TOPIC 2: Kitten, Puppy, Hamster, Eating
… my favourite dish is spaghetti …
… the cute hamster is eating broccoli …
… I love kittens …
Generative models: joint distribution
source: https://en.wikipedia.org/wiki/Hidden_Markov_model
Conditional models: conditional distribution
source: John Lafferty, Andrew McCallum, Fernando C.N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
ICML, 2001.
source: https://en.wikipedia.org/wiki/Tf%E2%80%93idf
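The TF-IDF weighting referenced above can be sketched in a few lines. This is a minimal illustration, assuming the plain variant (term frequency times the log of inverse document frequency) and whitespace tokenisation; the function name `tf_idf` and the toy documents are made up for the example and are not taken from the talk's code.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # term frequency in this document * inverse document frequency
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "my favourite dish is spaghetti".split(),
    "the cute hamster is eating broccoli".split(),
    "i love kittens".split(),
]
scores = tf_idf(docs)
```

With these toy documents, "spaghetti" (present in one document) gets a higher weight than "is" (present in two), which is the behaviour TF-IDF is meant to provide.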
Bag of n-grams instead of bag of words
source: A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, 2016
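A bag of n-grams keeps some local word order that a bag of words throws away. The sketch below is only an illustration of the idea, assuming whitespace tokens and n-grams up to length 2; `bag_of_ngrams` is a hypothetical helper, not the fastText implementation.

```python
from collections import Counter

def bag_of_ngrams(tokens, n_max=2):
    """Count every contiguous n-gram of length 1..n_max."""
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag

bag = bag_of_ngrams("the cute hamster is eating broccoli".split())
```

The resulting bag contains "cute hamster" but not "hamster cute", so two sentences with the same words in a different order no longer map to the same features.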
Needs GPUs and lots of data
Great performance
Feature generation
wait, wait, wait…
What makes deep learning
deep?
input hidden output
input hidden hidden hidden output
…
…
…
source: R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996
source: https://en.wikipedia.org/wiki/Maxima_and_minima
[Figure: input, hidden and output layers, with hidden states at time steps t_i, t_i+1, t_i+2, t_i+3]
number of layers
source: https://en.wikipedia.org/wiki/Long_short-term_memory
Input image → Convolution → Pooling → Convolution → Pooling → Fully connected → Fully connected → Output predictions (7)
Sharpening filter
Laplacian filter
Sobel x-axis filter
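Each of the three filters above is a small kernel slid over the image. The sketch below assumes the common textbook 3x3 values for the sharpening, Laplacian and Sobel x-axis kernels (the slide itself does not list the numbers) and uses a made-up 4x4 image; `filter2d` is a hypothetical helper.

```python
def filter2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the operation CNN layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Assumed textbook kernels, not values read from the slide.
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# A 4x4 image with a vertical edge between the second and third columns.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = filter2d(image, SOBEL_X)
```

The Sobel x-axis kernel responds strongly to the vertical edge and returns zero on a flat region. In a CNN the kernel values are learned rather than fixed by hand.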
Max pooling with 2x2 kernel and stride of 2x2
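As a rough illustration of max pooling with a 2x2 kernel and a 2x2 stride, the sketch below keeps the largest value of each non-overlapping 2x2 block. The feature map values are invented for the example and `max_pool2d` is a hypothetical helper.

```python
def max_pool2d(image, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(image[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(feature_map)  # 4x4 input becomes a 2x2 output
```

Each spatial dimension is halved, which reduces the amount of computation in later layers while keeping the strongest activations.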
input hidden output
Softmax, ReLU, tanh
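The three activation functions named above can be written directly from their definitions. This is a minimal sketch on plain Python floats; the max-shift inside `softmax` is a standard numerical-stability choice added here, not something stated on the slide.

```python
import math

def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)

def tanh(x):
    """Hyperbolic tangent: squashes inputs into (-1, 1)."""
    return math.tanh(x)

def softmax(z):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(z)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

ReLU and tanh are applied element by element inside the network, while softmax is applied once to the final vector of class scores.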
Deep Learning for NLP
When I read some of the rules
for speaking the English
language correctly, I think any
fool can make a rule, and every
fool will mind it
Henry David Thoreau
?
122 122 112 90 5 10 21
121 122 112 11 6 11 21
120 118 6 10 11 12 23
118 4 6 5 23 23 23
4 6 1 23 23 21 23
4 5 20 24 23 21 23
source: Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional Neural Networks for
Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014
O D S C - U K N L P
space 0 0 0 0 0 0 0 1 0 0 0
- 0 0 0 0 1 0 0 0 0 0 0
. 0 0 0 0 0 0 0 0 0 0 0
A 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0
C 0 0 0 1 0 0 0 0 0 0 0
D 0 1 0 0 0 0 0 0 0 0 0
E 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 1 0 0 0 0
L 0 0 0 0 0 0 0 0 0 1 0
M 0 0 0 0 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 0 1 0 0
O 1 0 0 0 0 0 0 0 0 0 0
P 0 0 0 0 0 0 0 0 0 0 1
Q 0 0 0 0 0 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 0 0 0
S 0 0 1 0 0 0 0 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0 0
U 0 0 0 0 0 1 0 0 0 0 0
V 0 0 0 0 0 0 0 0 0 0 0
W 0 0 0 0 0 0 0 0 0 0 0
X 0 0 0 0 0 0 0 0 0 0 0
Y 0 0 0 0 0 0 0 0 0 0 0
Z 0 0 0 0 0 0 0 0 0 0 0
One-hot encoding over a
vocabulary of characters.
Encoding:
Text = “ODSC-UK NLP”
Vocab:
[ ‘ ‘, ‘-’, ‘.’, ‘A’, ‘B’, ‘C’, …, ‘Z’ ]
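The encoding above can be reproduced with a short sketch. It assumes the 29-character vocabulary shown on the slide (space, hyphen, full stop, A to Z) and leaves any character outside the vocabulary as an all-zero column; `one_hot` and `VOCAB` are names chosen for this example.

```python
import string

# Vocabulary from the slide: ' ', '-', '.', then 'A'..'Z'.
VOCAB = [" ", "-", "."] + list(string.ascii_uppercase)

def one_hot(text, vocab=VOCAB):
    """Return a len(vocab) x len(text) matrix of 0/1 values.

    Row = vocabulary character, column = position in the text.
    Characters outside the vocabulary produce an all-zero column.
    """
    index = {ch: row for row, ch in enumerate(vocab)}
    matrix = [[0] * len(text) for _ in vocab]
    for col, ch in enumerate(text):
        row = index.get(ch)
        if row is not None:
            matrix[row][col] = 1
    return matrix

encoded = one_hot("ODSC-UK NLP")  # 29 rows, 11 columns
```

Each column contains a single 1, in the row of the character found at that position, which matches the table on the slide.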
For images:
For text:
Humans to rephrase the examples
Synonyms
Similar semantic meaning
source: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS 2015
O D S C - U K N L P … 1013
space 0 0 0 0 0 0 0 1 0 0 0 … …
- 0 0 0 0 1 0 0 0 0 0 0 … …
. 0 0 0 0 0 0 0 0 0 0 0 … …
A 0 0 0 0 0 0 0 0 0 0 0 … …
B 0 0 0 0 0 0 0 0 0 0 0 … …
C 0 0 0 1 0 0 0 0 0 0 0 … …
D 0 1 0 0 0 0 0 0 0 0 0 … …
E 0 0 0 0 0 0 0 0 0 0 0 … …
F 0 0 0 0 0 0 0 0 0 0 0 … …
G 0 0 0 0 0 0 0 0 0 0 0 … …
H 0 0 0 0 0 0 0 0 0 0 0 … …
I 0 0 0 0 0 0 0 0 0 0 0 … …
J 0 0 0 0 0 0 0 0 0 0 0 … …
K 0 0 0 0 0 0 1 0 0 0 0 … …
L 0 0 0 0 0 0 0 0 0 1 0 … …
M 0 0 0 0 0 0 0 0 0 0 0 … …
N 0 0 0 0 0 0 0 0 1 0 0 … …
O 1 0 0 0 0 0 0 0 0 0 0 … …
P 0 0 0 0 0 0 0 0 0 0 1 … …
Q 0 0 0 0 0 0 0 0 0 0 0 … …
R 0 0 0 0 0 0 0 0 0 0 0 … …
S 0 0 1 0 0 0 0 0 0 0 0 … …
T 0 0 0 0 0 0 0 0 0 0 0 … …
U 0 0 0 0 0 1 0 0 0 0 0 … …
V 0 0 0 0 0 0 0 0 0 0 0 … …
W 0 0 0 0 0 0 0 0 0 0 0 … …
X 0 0 0 0 0 0 0 0 0 0 0 … …
Y 0 0 0 0 0 0 0 0 0 0 0 … …
Z 0 0 0 0 0 0 0 0 0 0 0 … …
0 1 2 3 4 … 1007
0 6.4 1.1 3.2 0.1 -0.4 … 3.1
… … … … … … … …
255 1.2 3.4 -1 1.2 3.2 … -1
x 256
69x1014x1
1x1008x256
x 1008
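The step from a 1014-character input to 1008 positions is what a width-7 convolution with stride 1 and no padding produces (1014 - 7 + 1 = 1008). The sketch below applies one such filter along the text axis; the toy sizes, the filter values and the helper name `conv1d` are assumptions for illustration, and a real layer would apply 256 filters of this kind.

```python
def conv1d(columns, kernel, bias=0.0):
    """Slide one filter along the text axis (stride 1, no padding).

    columns: list of L per-character vectors, each of size V
    kernel:  list of K weight vectors, each of size V
    returns: list of L - K + 1 activations
    """
    k = len(kernel)
    return [
        bias + sum(
            w * x
            for offset in range(k)
            for w, x in zip(kernel[offset], columns[start + offset])
        )
        for start in range(len(columns) - k + 1)
    ]

# Toy sizes: 3-character vocabulary, 10 characters of text, kernel width 7.
text = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0],
        [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Hypothetical filter that counts occurrences of the first vocabulary character.
kernel = [[1.0, 0.0, 0.0]] * 7
activations = conv1d(text, kernel)  # 10 - 7 + 1 = 4 values
```

Because the filter spans the whole vocabulary dimension, the 69 rows of the one-hot input collapse to a single row per filter, which is consistent with the 1x1008x256 shape on the slide.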
0 1 2 3 4 … 1007
0 6.4 1.1 3.2 0.1 -0.4 … 3.1
… … … … … … … …
255 1.2 3.4 -1 1.2 3.2 … -1
0 1 2 3 4 … 1007
0 6.4 1.1 3.2 0.1 0 … 3.1
… … … … … … … …
255 1.2 3.4 0 1.2 3.2 … 0
1x1008x256
1x1008x256
0 1 2 3 4 … … … 1007
0 6.4 1.1 3.2 0.1 0 … … … 3.1
… … … … … … … … … …
255 1.2 3.4 0 1.2 3.2 … … … 0
0 1 … 335
0 6.4 0.1 … …
… … … … …
255 3.4 3.2
1x1008x256
1x336x256
x 336
x 256
0 1 2 3 4 5 6 7 8 … 335
0 6.4 0.1 … … … … … … … … …
… … … … … … … … … … … …
255 3.4 3.2 … … … … … … … … …
0 1 2 3 4 5 6 … 329
0 -2.4 3.2 … … … … … … …
… … … … … … … … … …
255 … … … … … … … … …
1x330x256
1x336x256
x 256
x 330
1x330x256 <- after 2 convolutions (7x1/1) and 1 max-pooling (3x1/3)
1x110x256 <- 1 max-pooling (3x1/3)
1x102x256 <- 4 convolutions (3x1/1)
1x34x256 <- 1 max-pooling (3x1/3)
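The sequence lengths listed above follow from one formula, output = (input - kernel) // stride + 1, applied layer by layer. The sketch below traces them, assuming no padding and the layer order implied by the slides (convolution 7, pool 3/3, convolution 7, pool 3/3, four convolutions of width 3, pool 3/3); `out_len` and `LAYERS` are names made up for the example.

```python
def out_len(length, kernel, stride):
    """Output length of a convolution or pooling layer without padding."""
    return (length - kernel) // stride + 1

# (layer type, kernel width, stride), in the order assumed from the slides.
LAYERS = [
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 7, 1), ("pool", 3, 3),
    ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1), ("conv", 3, 1),
    ("pool", 3, 3),
]

trace = [1014]  # input length in characters
for _, kernel, stride in LAYERS:
    trace.append(out_len(trace[-1], kernel, stride))

flattened = trace[-1] * 256  # 256 feature maps per position
```

The trace passes through 1008, 336, 330, 110, 102 and 34, and the final 34 positions times 256 feature maps give the 8704-element vector fed to the fully connected layers.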
0 1 2 3 4 5 6 7 8 … 33
0 6.4 0.1 … … … … … … … … …
1 2.1 24.9 … … … … … … … … …
… … … … … … … … … … … …
255 … … … … … … … … … … 9.9
0
0 6.4
1 0.1
… …
35 2.1
36 24.9
… …
… …
… …
… …
8703 9.9
8704x1x1
1x34x256
x 256
0
0 6.4
1 0.1
… …
… …
… …
… …
… …
… …
… …
8703 9.9
8704x1x1
0
…
k
1023
x 1024
1024x1x1
$f_k(X) = \sum_{i=0}^{8703} w_{ki} x_i + b_k$
0
0 8.7
1 -2.1
… …
… …
… …
… …
… …
… …
… …
1023 32.1
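The fully connected layer computes, for every output unit k, a weighted sum of all inputs plus a bias. The sketch below writes that out with toy sizes (8 inputs, 4 outputs) instead of the 8704 inputs and 1024 outputs on the slide; the random weights and the helper name `dense` are assumptions for illustration.

```python
import random

def dense(x, weights, biases):
    """Compute f_k(X) = sum_i w_ki * x_i + b_k for every output unit k."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]

random.seed(0)
n_in, n_out = 8, 4  # toy sizes; the slide uses 8704 and 1024
x = [random.uniform(-1, 1) for _ in range(n_in)]
weights = [[random.gauss(0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
biases = [0.0] * n_out
y = dense(x, weights, biases)  # one value per output unit
```

With the real sizes the weight matrix alone holds 8704 x 1024 values, which is why this layer carries most of the model's parameters.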
0
0 6.4
1 0.1
… …
… …
… …
… …
… …
… …
… …
1023 9.9
1024x1x1
0
…
k
1023
x 1024
1024x1x1
$f_k(X) = \sum_{i=0}^{1023} w_{ki} x_i + b_k$
0
0 8.7
1 -2.1
… …
… …
… …
… …
… …
… …
… …
1023 32.1
ignored
0
0 6.4
1 0.1
… …
… …
… …
… …
… …
… …
… …
1023 9.9
1024x1x1
0
…
N
x N
Nx1x1
0
0 2.7
1 0.1
… …
… …
N-1 12.5
ignored
Softmax
0
0 0.1
1 0.01
… …
… …
N-1 0.8
Nx1x1
$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=0}^{N-1} e^{z_j}}$
• MXNet using Python bindings
• Training on Azure N-Series, on Tesla K80 GPU
• 3 days of training on 2.5M examples for sentiment polarity
Amazon Review Polarity dataset (1.8M training, 200k testing):
- Crepe model + thesaurus augmentation: 95.07%
- TF-IDF + n-grams: 91.64%
AG's news corpus dataset (4 classes, 120k training, 7.6k testing):
- Crepe model + thesaurus augmentation: 85.20%
- TF-IDF + n-grams: 92.36%
→ CNNs are no silver bullet, but they perform best on very large datasets
source: Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann LeCun. Very Deep Convolutional Networks
for Natural Language Processing, 2016
source: Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015.
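As a rough sketch of what batch normalization does at training time, the code below normalises one feature across a mini-batch to zero mean and unit variance, then applies a scale (gamma) and shift (beta). The default values, the sample numbers and the helper name `batch_norm` are assumptions for illustration; at inference time implementations typically use running averages of the mean and variance instead of the batch statistics.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    # eps keeps the division stable when the variance is close to zero
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

normalised = batch_norm([6.4, 1.1, 3.2, 0.1])
```

In a network, gamma and beta are learned per feature, so the layer can undo the normalisation where that helps.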
6.4 1.1 3.2 0.1 0 5 3.1 10 21 3.1 0.2 1.8 0 1
6.4 3.2 5 10 21
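The two rows above look like k-max pooling with k = 5: the shorter row holds the five largest values of the input sequence, kept in their original order. That reading is an inference from the numbers, not something the slide states. A minimal sketch under that assumption, with `k_max_pool` as a hypothetical helper:

```python
def k_max_pool(values, k):
    """Keep the k largest values, preserving their original order."""
    # Indices of the k largest values (ties resolved by earliest position).
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]

sequence = [6.4, 1.1, 3.2, 0.1, 0, 5, 3.1, 10, 21, 3.1, 0.2, 1.8, 0, 1]
pooled = k_max_pool(sequence, 5)
```

Unlike fixed-window max pooling, the output length is always k, whatever the length of the input text.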
source: A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification. 2016
Deep Learning for NLP
NLP APIs from major cloud providers and market places
- Language detection
- Sentiment Analysis
- Topic detection
- Translation
- Content moderation
- Text to speech
- Speech to text
- Intent modelling
+
Scalable
Managed
Pay per use pricing
Documentation and sample code
-
Generic solutions
Limited customizability
Performance
Latency
Limited batch processing
Single Machine
Training Data Testing Data
Sample Production
Data
Model
Development
Data pipeline?
Retraining?
Scalability?
Real-time / batch scoring?
Multiple teams / frameworks?
Production
Training
instance(s)
(GPU)
Scoring
instance
(CPU)
Scoring
instance
(CPU)
Scoring
instance
(CPU)
Scoring
instance
(CPU)
Training
Data
Serialized
model
Serialized
model
Training
instance(s)
(GPU)
Orchestration Layer (CI/CD / Job scheduling / Monitoring)
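The split between training and scoring instances can be sketched as two steps that only share a serialized model file. Everything below is a stand-in: the word-weight "model", the JSON format, the file name and the function names are hypothetical and chosen to keep the example runnable, while the talk's own application uses MXNet.

```python
import json
import os
import tempfile

def train(examples):
    """Stand-in for the GPU training job: learn one weight per word."""
    weights = {}
    for text, label in examples:
        for word in text.lower().split():
            weights[word] = weights.get(word, 0) + (1 if label == "positive" else -1)
    return weights

def save_model(weights, path):
    """Serialize the trained model so scoring instances can pick it up."""
    with open(path, "w") as fh:
        json.dump(weights, fh)

def load_model(path):
    """Deserialize the model on a scoring instance."""
    with open(path) as fh:
        return json.load(fh)

def score(weights, text):
    """Stand-in for CPU scoring: sum the word weights and threshold at 0."""
    total = sum(weights.get(word, 0) for word in text.lower().split())
    return "positive" if total >= 0 else "negative"

# Training instance: fit and serialize.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(train([("it worked straight away", "positive"),
                  ("broken and late", "negative")]), model_path)

# Scoring instance: load the serialized model and serve predictions.
model = load_model(model_path)
prediction = score(model, "arrived broken")
```

Keeping the serialized model as the only contract between the two sides lets an orchestration layer retrain and redeploy without changing the scoring code.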
+
Auto-scale and load balancing
Managed
Domain specific training data
Latency
-
Pricing less flexible
Deployment pipeline to monitor
Performance
The code of this application is published at:
https://github.com/ilkarman/Bangalore_Sentiment
Part of our code is based on:
https://github.com/zhangxiangxiao/Crepe
Attribution of some images:
• http://morguefile.com
• https://unsplash.com
• Ana Corrales Photography
• http://wikipedia.org
Amazon dataset citation:
• J. McAuley, C. Targett, J. Shi, A. van den
Hengel. Image-based recommendations
on styles and substitutes. SIGIR, 2015.
• J. McAuley, R. Pandey, J. Leskovec.
Inferring networks of substitutable and
complementary products. Knowledge
Discovery and Data Mining, 2015
Open Data Science Conference London,
8 & 9 October, 2016
© 2016 Microsoft Corporation. All rights reserved


Editor's Notes

  • #5: Turing test; John McCarthy, the term "AI" and LISP
  • #9: https://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/ https://www.quora.com/What-is-a-good-explanation-of-Latent-Dirichlet-Allocation
  • #10: Part-of-speech tagging. To define a joint probability over observation and label sequences, a generative model needs to enumerate all possible observation sequences, typically requiring a representation in which observations are task-appropriate atomic entities, such as words or nucleotides. In particular, it is not practical to represent multiple interacting features or long-range dependencies of the observations, since the inference problem for such models is intractable.
  • #11: http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf http://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers source: John Lafferty, Andrew McCallum, Fernando C.N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001.
  • #13: https://github.com/facebookresearch/fastText
  • #17: http://www.nature.com/nature/journal/v323/n6088/pdf/323533a0.pdf (1986) source: R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996
  • #30: Size of vocab varies (up to 500); could use all characters, could use an "other character". Word pieces, non-Latin languages. No preprocessing for text featurizer.
  • #31: In terms of texts, it is not reasonable to augment the data using signal transformations as done in image or speech recognition
  • #34: - Operate in batches
  • #35: - regularize, expressibility conserved, reduce complexity
  • #37: Hierarchical representations of information
  • #45: In general, in the same sentence, we may be faced with local and long-range dependencies
  • #46: Whitening, but at every step. Lab41 has a good example.
  • #47: In general, in the same sentence, we may be faced with local and long-range dependencies
  • #48: Link to paper
  • #54: Live chat +
  • #55: Increasingly, data scientists need to do the work of data engineers, pipeline engineers and software engineers
  • #56: Author + movie It was a breeze to configure and worked straight away It arrived as expected. No complaint.
  • #57: Live chat +
  • #58: Live chat +
  • #59: Live chat +