Brief Overview of Deep Networks
Monireh Ebrahimi
Semantic Cognitive Perceptual Computing Course, July 2016.
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis),
Wright State University, USA
1
What is deep learning?
2
• “Representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.”
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
What is deep learning? Learning Hierarchical Representations
3
Successive model layers learn deeper intermediate representations.
Lee, Honglak. "Tutorial on deep learning and applications." NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning. 2010.
Socher, Richard, Yoshua Bengio, and Chris Manning. "Deep learning for NLP." Tutorial at the Association for Computational Linguistics (ACL), 2012, and the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.
What is deep learning? Learning Hierarchical Representations
4
• Image recognition: Pixel → edge → texton → motif → part → object
• Text: Character → word → word group → clause → sentence → story
• Speech: Sample → spectral band → sound → … → phone → phoneme → word
LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International Conference on Machine Learning (ICML’13). 2013.
Why go deep?
5
• Requires no manual feature engineering
• Deep architectures work well (vision, audio, NLP, etc.)!
– Speech Recognition (2009)
– Computer Vision (2012)
• In early 2015, a machine surpassed human-level performance at an object-recognition challenge for the first time in the history of AI.
– Machine Translation (2014)
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
Biologically inspired: how does the cortex learn perception?
6
• Loosely inspired by biological neural networks (the central nervous system of animals), particularly the brain
“Let's be inspired by nature, but not too much”
7
• Which details are important?
• For airplanes, feathers and wing flapping weren't crucial.
• What is the equivalent of aerodynamics for understanding intelligence?
LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International Conference on Machine Learning (ICML’13). 2013.
Biologically Inspired: The Mammalian Visual Cortex is Hierarchical
8
• Retina → LGN → V1 → V2 → V4 → PIT → AIT
• Lots of intermediate representations
[picture from Simon Thorpe]
LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International Conference on Machine Learning (ICML’13). 2013.
All models are WRONG, but some are USEFUL. (George Box)
9
Neural Networks
10
Why now?
11
• The vanishing gradient problem has been solved (the 2006-2007 breakthroughs; see the next slide)
• Lots of data
• GPUs
RBM (Restricted Boltzmann Machine)
12
• A solution to the vanishing gradient problem
• Reconstructs the input and learns the features in the process (sketch below)
https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
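To make the reconstruct-and-learn idea concrete, here is a minimal numpy sketch of an RBM trained with one step of contrastive divergence (CD-1). The layer sizes, learning rate, and toy data are assumptions for illustration, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0, 0.1, (n_visible, n_hidden))
b_v = np.zeros(n_visible)            # visible bias
b_h = np.zeros(n_hidden)             # hidden bias

# Toy binary data: two repeated patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

for epoch in range(500):
    v0 = data
    # Forward: encode the input into hidden feature activations.
    h0 = sigmoid(v0 @ W + b_h)
    h0_sample = (rng.random(h0.shape) < h0).astype(float)
    # Backward: reconstruct the visible units from the hidden code.
    v1 = sigmoid(h0_sample @ W.T + b_v)
    h1 = sigmoid(v1 @ W + b_h)
    # CD-1 update: pull reconstructions toward the data.
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (h0 - h1).mean(axis=0)

print(np.round(sigmoid(data[:2] @ W + b_h), 2))  # learned features per pattern
```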
Autoencoders
13
https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
• A kind of feature-extracting neural net
• Detects inherent patterns in data
• Unsupervised
• Good for real-world problems
• Can be shallow or deep (a shallow sketch follows below)
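A minimal sketch of a shallow autoencoder, assuming tf.keras is available; the sizes and toy data are illustrative. The net is trained to reproduce its own unlabeled input, and the bottleneck layer is then reused as the extracted features.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 20).astype("float32")                 # unlabeled toy inputs

inp = tf.keras.Input(shape=(20,))
code = tf.keras.layers.Dense(4, activation="relu")(inp)        # bottleneck: learned features
out = tf.keras.layers.Dense(20, activation="sigmoid")(code)    # reconstruction
autoencoder = tf.keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=5, verbose=0)   # target == input: reconstruct as accurately as possible

encoder = tf.keras.Model(inp, code)          # reuse the bottleneck as a feature extractor
print(encoder.predict(x[:2], verbose=0))
```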
Deep Learning for NLP
14
• Use of vectors
– Dense low-dimensional real-valued vectors
• Continuous Bag of Words
• Skip-gram model
• Two popular tools: Word2Vec, GloVe
– One-hot vectors
• Size of the entire vocabulary
• Very large and sparse (contrasted with dense vectors in the sketch below)
https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
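A small numpy illustration of the contrast above, with a toy five-word vocabulary (the words and dimensions are assumptions): a one-hot vector is as long as the vocabulary and almost all zeros, while an embedding is a short dense row of a learned matrix.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
V, d = len(vocab), 3                     # vocabulary size vs. embedding size

def one_hot(word):
    v = np.zeros(V)                      # one slot per vocabulary word, almost all zeros
    v[vocab.index(word)] = 1.0
    return v

E = np.random.default_rng(0).normal(size=(V, d))   # embedding matrix (learned in practice)

print(one_hot("cat"))                    # sparse vector of length V
print(one_hot("cat") @ E)                # dense d-dimensional vector: a row lookup in E
```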
Continuous Bag of Words
15
[Diagram: context words → target word]
Skip Gram Model
16
[Diagram: target word → context words]
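The two slides above differ only in the direction of prediction. This sketch (toy sentence and window size assumed) shows how the same text yields CBOW examples (context → target) and skip-gram examples (target → context).

```python
tokens = "the cat sat on the mat".split()
window = 2                               # assumed context-window size

for i, target in enumerate(tokens):
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    context = [tokens[j] for j in range(lo, hi) if j != i]
    print("CBOW     :", context, "->", target)   # context words predict the target
    for c in context:
        print("skip-gram:", target, "->", c)     # target predicts each context word
```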
Deep Belief Net
17
• Stack of RBMs
• Identical to an MLP in terms of network structure
• Different training (recipe sketched below):
– Pre-training (unsupervised)
– Fine-tuning (supervised)
• Needs only a small labeled dataset
• Reasonable training time
• Very accurate
• Used for image recognition
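A runnable approximation of the recipe above using scikit-learn, with toy data: greedy unsupervised pre-training of stacked BernoulliRBMs, then a supervised classifier on the learned features. Full DBN fine-tuning would also backpropagate through the pre-trained layers; sklearn's RBM does not support that, so this sketch stops at feature extraction.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = (rng.random((500, 64)) > 0.5).astype(float)   # toy binary data
y = (X[:, :32].sum(axis=1) > X[:, 32:].sum(axis=1)).astype(int)

h = X
for n_hidden in (32, 16):                 # pre-training: one RBM per layer
    rbm = BernoulliRBM(n_components=n_hidden, n_iter=10, random_state=0)
    h = rbm.fit_transform(h)              # hidden activations feed the next RBM

clf = LogisticRegression().fit(h, y)      # supervised stage on a small labeled set
print("train accuracy:", clf.score(h, y))
```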
Convolutional Neural Networks
18
1. Convolutional layer
2. ReLU layer
3. Pooling layer
4. Fully connected layer
• Supervised
• Needs a large amount of labeled data for training (model sketch below)
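The four layer types above in order, as a tiny tf.keras model; the 28×28 grayscale input shape and filter counts are assumptions for illustration.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),   # 1. convolution + 2. ReLU
    tf.keras.layers.MaxPooling2D(2),                    # 3. pooling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),    # 4. fully connected
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```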
Convolutional Neural Networks
19
– CNNs perform quite well on NLP problems,
• although we lack the nice intuition we have for image recognition
– Text processing (sentiment analysis and text categorization; a word-level sketch follows below)
• Word-level
• Character-level:
– Very attractive for user-generated content with typos and new vocabulary
– Models can be fine-tuned from a source task with a large corpus to a more targeted task with a smaller corpus
– Learning directly from character-level input (needs millions of examples)
– Learning from pre-trained character embeddings
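A word-level text-CNN sketch in tf.keras for the sentiment setting above; the vocabulary size, sequence length, and filter width are assumptions. A character-level variant would embed characters instead of word ids.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),                       # 100 token ids per document
    tf.keras.layers.Embedding(10000, 128),              # pre-trained word vectors could be loaded here
    tf.keras.layers.Conv1D(64, 5, activation="relu"),   # filters act like learned n-gram detectors
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),     # e.g. positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```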
Recurrent Neural Nets
20
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• Not feedforward: connections form cycles
• Takes a sequence of values as input
• Produces a sequence of values as output
• RNNs can be stacked on top of each other (see the sketch below)
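A minimal numpy sketch of the recurrence (sizes are toy assumptions): the same weights are applied at every step, and the hidden state carries information forward, which is what makes the net not feedforward.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W_xh = rng.normal(0, 0.1, (d_in, d_h))   # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (d_h, d_h))    # hidden-to-hidden: the recurrent loop

xs = rng.normal(size=(10, d_in))         # a sequence of values as input
h = np.zeros(d_h)
outputs = []
for x in xs:                             # the same weights are reused at every step
    h = np.tanh(x @ W_xh + h @ W_hh)     # new state depends on input AND previous state
    outputs.append(h.copy())             # a sequence of values as output
print(len(outputs), outputs[-1].shape)
```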
Recurrent Neural Nets
21
• Extremely difficult to train
– Vanishing gradient problem (gradients shrink exponentially across time steps)
• An RNN unrolled over n time steps is as deep as an n-layer MLP
– Solution:
• LSTM/GRU: help the net decide when to forget the current input and when to remember it for future time steps (sketch below)
• Good for:
– Time-series analysis (forecasting)
– Machine translation
– Text processing (parsing, NER, sentiment analysis)
• Word-level
• Character-level
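A sketch of the LSTM remedy in tf.keras, shaped here for the time-series forecasting use case on the slide; the layer widths are assumptions.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),                    # variable-length sequence of values
    tf.keras.layers.LSTM(32, return_sequences=True),    # gates learn when to forget / remember
    tf.keras.layers.LSTM(32),                           # stacked recurrent layers
    tf.keras.layers.Dense(1),                           # e.g. the next value in the series
])
model.compile(optimizer="adam", loss="mse")
```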
Recursive Neural Tensor Network
22
• Leaf nodes: input (word vectors)
• Root node: class and score (toy sketch below)
Socher, Richard, et al. "Recursive deep models for semantic compositionality
over a sentiment treebank." Proceedings of the conference on empirical
methods in natural language processing (EMNLP). Vol. 1631. 2013.
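A toy numpy sketch of the recursive idea (the tensor term of the full RNTN is omitted): leaves hold word vectors, each internal node composes its children with shared weights, and the root vector is scored into classes. The dimensions and tree are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(0, 0.1, (2 * d, d))       # shared composition weights
W_cls = rng.normal(0, 0.1, (d, 3))       # e.g. 3 sentiment classes, scored at the root

def compose(node):
    if isinstance(node, np.ndarray):     # leaf: a word vector (the input)
        return node
    left, right = node
    merged = np.concatenate([compose(left), compose(right)])
    return np.tanh(merged @ W)           # parent vector built from its two children

# Toy tree for a three-word phrase: (w1, (w2, w3)).
tree = (rng.normal(size=d), (rng.normal(size=d), rng.normal(size=d)))
root = compose(tree)
print(root @ W_cls)                      # class scores at the root
```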
References
23
1. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature
521.7553 (2015): 436-444.
2. Socher, Richard, Yoshua Bengio, and Chris Manning. "Deep learning for NLP." Tutorial at the Association for Computational Linguistics (ACL), 2012, and the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.
3. Lee, Honglak. "Tutorial on deep learning and applications." NIPS 2010 Workshop
on Deep Learning and Unsupervised Feature Learning. 2010.
4. LeCun, Yann, and M. Ranzato. "Deep learning tutorial." Tutorials in International
Conference on Machine Learning (ICML’13). 2013.
5. Socher, Richard, et al. "Recursive deep models for semantic compositionality over
a sentiment treebank." Proceedings of the conference on empirical methods in natural
language processing (EMNLP). Vol. 1631. 2013.
6. https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
7. https://www.udacity.com/course/deep-learning--ud730
8. http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Thank you!
Thank you, and please visit us at http://knoesis.org
monireh@knoesis.org
24
Editor's Notes
  • #3: With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations.
  • #8: It's nice to imitate nature, but which details are merely the result of evolution and the constraints of biochemistry?
  • #12: The vanishing gradient problem was one of the reasons neural networks were not very successful before. It was addressed in 2006-2007 by three papers from Bengio, LeCun, and Hinton, a breakthrough that brought neural networks back as deep learning.
  • #13: 1) Forward pass: an RBM takes the inputs and translates them into a set of numbers that encode them. 2) Backward pass: it takes this set of numbers and translates them back to form the reconstructed inputs. 3) At the visible layer, the reconstruction is compared against the original input.
  • #14: Deep autoencoders are extremely useful tools for dimensionality reduction. An autoencoder is a neural net that takes a set of typically unlabeled inputs and, after encoding and decoding them, tries to reconstruct them as accurately as possible. As a result, the net must decide which of the data features are the most important, essentially acting as a feature-extraction engine.
  • #15: The fundamental difference between deep learning and traditional NLP methods is the use of dense vectors.
  • #16: word2vec maps each word to a 1D vector of some empirically chosen fixed size N, which is also the number of nodes in the hidden layer. After training the neural network, each input word has a learned 1×N vector of weights to the hidden layer; this is its vector representation (the real-valued, dense, low-dimensional 1×N representation of that word). The network takes as input a one-hot vector of size V×1, so in each iteration only one input word is 1. In that iteration the network updates all the output vectors (the 1×N weight vectors between the hidden layer and the output layer) so that output words that can co-occur with the input word become more similar to it, and words that cannot appear in its context become more dissimilar. Likewise, the input vector of the activated word (its input-to-hidden weights) is updated to become more similar to its context words. After training there are two choices for a word's representation: the 1×N weights from the input layer to the hidden layer, or the 1×N weights from the output layer to the hidden layer. Empirically the first is used, so a word's word2vec representation is simply its 1×N vector of input-to-hidden weights.
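To make the note concrete, a tiny numpy illustration (V and N are toy sizes, and the matrices are random stand-ins for trained weights): the one-hot input simply selects one row of the input-to-hidden weight matrix, and that 1×N row is the word's vector.

```python
import numpy as np

V, N = 5, 3                                        # vocabulary size, hidden-layer size
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, N))                     # input -> hidden weights (one row per word)
W_out = rng.normal(size=(N, V))                    # hidden -> output weights (one column per word)

one_hot = np.zeros(V)
one_hot[2] = 1.0                                   # activate word number 2
word_vec = one_hot @ W_in                          # selects row 2: the word's 1xN representation
print(word_vec)
```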
  • #18: We do not start backpropagation until we already have sensible weights that do well at the task, so the initial gradients are sensible and backprop only needs to perform a local search. [https://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf] https://www.youtube.com/channel/UC9OeZkIwhzfv-_Cb7fCikLQ
  • #19: ReLU: mitigates the vanishing gradient problem. Pooling layer: dimensionality reduction.
  • #22: Words in the source language: input. Words in the target language: output.