Long Short Term Memory
(LSTM)
1
Mehrnaz Faraz
Faculty of Electrical Engineering
K. N. Toosi University of Technology
Milad Abbasi
Faculty of Electrical Engineering
Sharif University of Technology
Contents
• Introduction
• Vanishing/Exploding Gradient Problem
• Long Short Term Memory
• LSTM Variations
• CNN-LSTM
• BiLSTM
• Fuzzy-LSTM
2
Introduction
• LSTM is a kind of RNN.
• LSTM is capable of learning long-term dependencies.
3
An unrolled recurrent neural network
Introduction
• RNN is unable to learn to connect information across large gaps.
• LSTM does not have this large-gap problem.
4
– The clouds are in the sky.
– I grew up in France. … I speak fluent French.
Introduction
• Using LSTM:
– Robot control
– Time series prediction
– Speech recognition
– Rhythm learning
– Music composition
– Grammar learning
– Handwriting recognition
– Human action recognition
– End-to-end translation
5
Introduction
• Using LSTM:
– Google
• Speech recognition on the smartphone
• Smart assistant Allo
– Amazon
• Amazon Alexa
6
Introduction
– Apple
• QuickType function on the iPhone and Siri
– Microsoft
• End-to-end speech translation
7
Speech-to-speech translation pipeline: automatic speech recognition produces the text "See you later", machine translation converts it to the true text "回头见", and text-to-speech renders the translated speech.
Vanishing Gradient
• RNN:
• Shares the same parameters at all time steps
• Occurs in time series with long-term dependencies
8
Feed-forward pass unrolled through time: the weights $W_{in}$, $W_{rec}$ and $W_{out}$ are shared at every time step.
Vanishing Gradient
9
Back-propagation through time: the error $E_3$ at step 3 flows back through the shared recurrent weight $w_{rec}$.

$$\frac{\partial E_3}{\partial w_{rec}} = \frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial s_3}\,\frac{\partial s_3}{\partial w_{rec}},\qquad s_3 = \tanh\!\left(w_{in}\,x_3 + w_{rec}\,s_2\right)$$

Because $s_3$ depends on $w_{rec}$ through all earlier states, the full gradient sums over them:

$$\frac{\partial E_3}{\partial w_{rec}} = \sum_{k=0}^{3}\frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial s_3}\,\frac{\partial s_3}{\partial s_k}\,\frac{\partial s_k}{\partial w_{rec}}$$
Vanishing Gradient
10
$$\frac{\partial E_3}{\partial w_{rec}} = \sum_{k=0}^{3}\frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial s_3}\,\frac{\partial s_3}{\partial s_k}\,\frac{\partial s_k}{\partial w_{rec}},\qquad s_3 = \tanh\!\left(w_{in}\,x_3 + w_{rec}\,s_2\right)$$

Each factor $\frac{\partial s_3}{\partial s_k}$ is itself a chain of one-step derivatives, e.g. $\frac{\partial s_3}{\partial s_0} = \frac{\partial s_3}{\partial s_2}\,\frac{\partial s_2}{\partial s_1}\,\frac{\partial s_1}{\partial s_0}$, so the gradient can be written as

$$\frac{\partial E_3}{\partial w_{rec}} = \sum_{k=0}^{3}\frac{\partial E_3}{\partial \hat{y}_3}\,\frac{\partial \hat{y}_3}{\partial s_3}\left(\prod_{j=k+1}^{3}\frac{\partial s_j}{\partial s_{j-1}}\right)\frac{\partial s_k}{\partial w_{rec}}$$

Since $\tanh'(\cdot) \in (0,1]$, the one-step derivatives are typically well below one in magnitude, and the long products belonging to distant time steps shrink toward zero: the gradient vanishes.
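A minimal NumPy sketch (not from the slides; the scalar weights are illustrative assumptions) of this product: each one-step factor is $\tanh'(a_t)\,w_{rec}$, so the backpropagated quantity decays quickly as it moves toward earlier time steps.

```python
# Minimal sketch: gradient of the last state w.r.t. earlier states in a scalar RNN.
# Weight values are assumed for illustration only.
import numpy as np

np.random.seed(0)
T = 50                       # number of time steps
w_in, w_rec = 0.5, 0.9       # illustrative scalar weights
x = np.random.randn(T)

# Forward pass: s_t = tanh(w_in * x_t + w_rec * s_{t-1})
s = np.zeros(T)
for t in range(1, T):
    s[t] = np.tanh(w_in * x[t] + w_rec * s[t - 1])

# Backward pass: ds_{T-1}/ds_k = prod_{j=k+1}^{T-1} tanh'(a_j) * w_rec
grad = 1.0
for t in range(T - 1, 0, -1):
    a_t = w_in * x[t] + w_rec * s[t - 1]        # pre-activation of s_t
    grad *= (1.0 - np.tanh(a_t) ** 2) * w_rec   # ds_t/ds_{t-1} = tanh'(a_t) * w_rec
    if (t - 1) % 10 == 0:
        print(f"|ds_{T-1}/ds_{t-1}| ~ {abs(grad):.2e}")
```

Running it shows the magnitude dropping by orders of magnitude as the gap grows, which is exactly the vanishing-gradient effect derived above.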
Exploding Gradient
• Increasing weights
• Large errors
• Instability in the network
11
Long Short Term Memory
12
– Vanilla RNN:

$$h_t = f_W\!\left(h_{t-1},\,x_t\right)$$

Example:

$$h_t = \tanh\!\left(w_{hh}\,h_{t-1} + w_{xh}\,x_t\right)$$
Long Short Term Memory
• Difference between RNN and LSTM:
– RNN: single layer (tanh)
– LSTM: four interacting layers
13
Long Short Term Memory
• Vanilla LSTM:
14
The weights are the same at every time step; only the inputs change.
Long Short Term Memory
• Cell state:
– Like a conveyor belt
– It runs straight down the entire chain
– LSTM can remove or add information to the cell state
15
Long memory
Long Short Term Memory
• Gates:
– A way to optionally let information through
– They are composed of:
• A sigmoid neural net layer
• A pointwise multiplication operation
16
Long Short Term Memory
• An LSTM has three of these gates, to protect and control
the cell state:
– Forget gate layer
– Input gate layer
– Output gate layer
17
Keep gate
Write gate
Read gate
Long Short Term Memory
• Forget information:
– Decide what information to throw away from the cell state
– Forget gate layer (see the equation below):
• Output a number between 0 and 1
18
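For reference, the forget gate in the standard LSTM formulation (notation follows common LSTM write-ups; $\sigma$ is the logistic sigmoid):

$$f_t = \sigma\!\left(W_f\,[h_{t-1},\,x_t] + b_f\right)$$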
Long Short Term Memory
• Add new information:
– Decide what new information to store in the cell state
– Input gate layer:
• Decides which values we'll update
– Tanh layer:
• Creates a vector of new candidate values, $\tilde{C}_t$ (equations below)
19
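The corresponding standard equations for the input gate layer and the tanh layer (same notation as above):

$$i_t = \sigma\!\left(W_i\,[h_{t-1},\,x_t] + b_i\right),\qquad \tilde{C}_t = \tanh\!\left(W_C\,[h_{t-1},\,x_t] + b_C\right)$$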
Long Short Term Memory
• Update cell state:
– Forgetting the things we decided to forget earlier: $f_t \odot C_{t-1}$
– Adding the new information we decided to add: $i_t \odot \tilde{C}_t$
20
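Combining the two terms gives the standard cell-state update:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$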
Long Short Term Memory
• Create output:
– Decide what we’re going to output
– Output gate layer:
• Decides what parts of the cell state we’re going to output
– Tanh layer:
• Pushes the values to between −1 and +1
21
Shadow state/ Short memory
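In the standard formulation, the output gate and the new hidden (short-term) state are:

$$o_t = \sigma\!\left(W_o\,[h_{t-1},\,x_t] + b_o\right),\qquad h_t = o_t \odot \tanh(C_t)$$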
Long Short Term Memory
• Conclusion:
– Step 1: Forget gate layer
– Step 2: Input gate layer
– Step 3: Combine step 1 & 2
– Step 4: Output the cell state (see the code sketch below)
22
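A minimal NumPy sketch (not from the slides) of one LSTM cell step following steps 1–4; the weight names and layer sizes are illustrative assumptions:

```python
# One LSTM cell step: forget, input, cell-state update, output.
# Weight matrices W_f, W_i, W_c, W_o act on the concatenation [h_{t-1}, x_t].
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # Step 1: forget gate
    i_t = sigmoid(W_i @ z + b_i)               # Step 2: input gate
    c_tilde = np.tanh(W_c @ z + b_c)           #         candidate values
    c_t = f_t * c_prev + i_t * c_tilde         # Step 3: update cell state
    o_t = sigmoid(W_o @ z + b_o)               # Step 4: output gate
    h_t = o_t * np.tanh(c_t)                   #         new hidden state
    return h_t, c_t

# Example with hidden size 4 and input size 3 (random weights, shapes only)
n_h, n_x = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_h, n_h + n_x)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.standard_normal(n_x), h, c,
                 W["f"], W["i"], W["c"], W["o"], b["f"], b["i"], b["c"], b["o"])
print(h.shape, c.shape)   # (4,) (4,)
```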
LSTM Variations (1)
• Peephole:
– Let the gate layers look at the cell state (entirely or partially)
23
LSTM Variations (2)
• Coupled forget and input gates:
– Not deciding separately
24
LSTM Variations (3)
• Gated Recurrent Unit (GRU):
– Combine the forget and input layer into a single “update
gate”
– Merge the cell state and the hidden state
– Simpler and popular (equations below)
25
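For reference, the GRU equations in their common form (biases omitted; $z_t$ is the update gate, $r_t$ the reset gate):

$$z_t = \sigma\!\left(W_z\,[h_{t-1},\,x_t]\right),\qquad r_t = \sigma\!\left(W_r\,[h_{t-1},\,x_t]\right)$$

$$\tilde{h}_t = \tanh\!\left(W\,[r_t \odot h_{t-1},\,x_t]\right),\qquad h_t = (1 - z_t)\odot h_{t-1} + z_t \odot \tilde{h}_t$$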
LSTM Variations Comparison
• They’re all about the same in performance
• We can reduce the number of parameters and the
computational cost by:
– Coupling the input and forget gates (GRU, Variation #2)
– Removing peephole connections (Vanilla LSTM)
26
Greff, K., et al. (2017). LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems.
CNN-LSTM
• An LSTM architecture designed for sequence prediction problems with spatial inputs, such as images or videos (see the sketch below).
27
Feature Extraction Sequence Prediction
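A minimal Keras sketch (assuming TensorFlow 2.x; frame count, image size, and layer widths are illustrative) of this idea: the same CNN extracts features from every frame, and an LSTM models the resulting sequence.

```python
# CNN-LSTM: per-frame feature extraction (CNN) followed by sequence prediction (LSTM).
import tensorflow as tf
from tensorflow.keras import layers

num_frames, height, width, channels, num_classes = 10, 64, 64, 3, 5

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_frames, height, width, channels)),
    # Feature extraction: the same CNN is applied to every frame
    layers.TimeDistributed(layers.Conv2D(16, (3, 3), activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
    layers.TimeDistributed(layers.Flatten())),
    # Sequence prediction: an LSTM reads the per-frame feature vectors
    layers.LSTM(64),
    layers.Dense(num_classes, activation="softmax"),
])
model.summary()
```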
CNN-LSTM
• Using CNN-LSTM:
– Activity recognition
– Image description
– Video description
28
Bidirectional LSTM
• Training information travels in both the forward and backward directions
• Remembers complex long-term dependencies better (see the sketch below)
• Using BiLSTM:
29
BiLSTM unrolled through time: the inputs $x_{t-1}, x_t, x_{t+1}, \ldots, x_T$ feed a forward layer and a backward layer, whose states are combined by an activation layer to produce the outputs $y_{t-1}, y_t, y_{t+1}, \ldots, y_T$.
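A minimal Keras sketch (assuming TensorFlow 2.x; sizes are illustrative) of a bidirectional LSTM: the Bidirectional wrapper runs one LSTM forward and one backward over the sequence and combines their outputs at each step.

```python
# BiLSTM: forward and backward LSTM layers over the same input sequence.
import tensorflow as tf
from tensorflow.keras import layers

timesteps, num_features, num_classes = 20, 8, 3

model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, num_features)),
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),            # forward + backward layers
    layers.TimeDistributed(layers.Dense(num_classes, activation="softmax")), # one output per time step
])
model.summary()
```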
Fuzzy-LSTM
•
30
31
Thank you