LSTM
(Long Short-Term Memory)
Cheng Zhan
Data Scientist
Human Learning
• “We are the sum total of our experiences. None of us are the same as we were yesterday, nor will be tomorrow.” (B.J. Neblett)
• Memorization: every time we gain new information, we store it for future reference.
• Combination: not all tasks are the same, so we couple our analytical skills with a combination of our memorized, previous experiences to reason about the world.
Outline
• Review of RNN
• SimpleRNN
• LSTM (forget, input and output gates)
• GRU (reset and update gates)
• LSTM
• Motivation
• Introduction
• Code example
Review of RNN
• Jack Ma is a Chinese business magnate, and his native language is ____
Long Short Term Memory
Time series problem
Summary
• The hidden state can be viewed as “memory” that tries to capture previous information (sketched in code below).
• The output is determined by the current input and all of this “memory”.
• The hidden state cannot capture all of the information.
• An RNN shares the same parameters W across all time steps (parameter sharing over time, rather than over space as in a CNN).
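To make the “memory” idea concrete, here is a minimal NumPy sketch (not from the slides; all names and sizes are illustrative) of one SimpleRNN step: the new hidden state mixes the previous hidden state with the current input through shared weights, and the output is read off that hidden state.

```python
import numpy as np

def simple_rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One step of a vanilla RNN: the hidden state is the 'memory'."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # same W_xh, W_hh reused at every step
    y_t = W_hy @ h_t + b_y                            # output depends on the current memory
    return h_t, y_t

# Illustrative sizes: 4-dim input, 8-dim hidden state, 2-dim output
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(2, 8))
b_h, b_y = np.zeros(8), np.zeros(2)

h = np.zeros(8)                      # initial memory
for x in rng.normal(size=(5, 4)):    # a sequence of 5 inputs
    h, y = simple_rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)
```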
Vanishing gradient example
Vanishing gradient plot
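As a rough numeric illustration (my own toy sketch, not the slide's example): backpropagating through many tanh steps multiplies the gradient by a factor smaller than 1 at each step, so the contribution of early time steps decays roughly geometrically.

```python
import numpy as np

# Toy scalar illustration: the gradient w.r.t. a hidden state T steps back is a
# product of per-step factors tanh'(a_t) * w. With |factor| < 1 the product vanishes.
w = 0.9
grad = 1.0
for t in range(1, 51):
    a_t = np.random.default_rng(t).normal()   # some pre-activation at step t
    grad *= w * (1.0 - np.tanh(a_t) ** 2)     # chain rule through tanh and w
    if t % 10 == 0:
        print(f"after {t:2d} steps: gradient magnitude ~ {abs(grad):.2e}")
```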
Solutions
• Adding Skip Connections through Time
• Removing Connections
• Changing Activation Functions
• LSTMs
Long-term dependencies
Long Short Term Memory
Cell state and gates
Step 1: forget gate — decide what information to throw away from the cell state
Step 2: input gate — decide what new information to store in the cell state
Step 3: output gate — decide what to output based on the updated cell state (see the NumPy sketch below)
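A minimal NumPy sketch of one LSTM step following the three stages above (the standard formulation from the referenced Colah post; weight names and sizes here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. Each W[k] maps [h_prev, x_t] to one gate's pre-activation."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])     # Step 1: forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])     # Step 2: input gate ...
    c_hat = np.tanh(W["c"] @ z + b["c"])   # ... and candidate cell state
    c_t = f_t * c_prev + i_t * c_hat       # update the cell state
    o_t = sigmoid(W["o"] @ z + b["o"])     # Step 3: output gate
    h_t = o_t * np.tanh(c_t)               # new hidden state
    return h_t, c_t

# Illustrative sizes: 3-dim input, 5-dim hidden/cell state
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(5, 8)) for k in "fico"}   # 8 = hidden (5) + input (3)
b = {k: np.zeros(5) for k in "fico"}
h, c = np.zeros(5), np.zeros(5)
for x in rng.normal(size=(4, 3)):                  # a short input sequence
    h, c = lstm_step(x, h, c, W, b)
```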
• Samples refer to individual training examples. The “batch_size” variable is therefore the number of samples you send to the neural network at once, i.e., how many different examples you feed to the network in one step.
• Time steps are ticks of time: how long in time each of your samples is. For example, a sample can contain 128 time steps, where each time step could be 1/30th of a second for signal processing. In Natural Language Processing (NLP), a time step may correspond to a character, a word, or a sentence, depending on the setup.
• Features are the number of dimensions fed at each time step. For example, in NLP a word could be represented by 300 features using word2vec. In signal processing, suppose your signal is 3D: you have an X, a Y, and a Z signal, such as an accelerometer’s measurements on each axis. That means 3 features are sent at each time step for each sample (see the Keras sketch below).
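Putting the three terms together, here is a minimal Keras sketch in the spirit of the machinelearningmastery reference (the data and layer sizes are made up), where the LSTM input is shaped as (samples, time steps, features):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Hypothetical data: 1000 samples, 128 time steps, 3 features (e.g. X/Y/Z accelerometer)
X = np.random.rand(1000, 128, 3)
y = np.random.rand(1000, 1)

model = Sequential([
    LSTM(32, input_shape=(128, 3)),  # (time_steps, features); the batch dimension is implicit
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, batch_size=64, epochs=2)  # batch_size = samples per gradient update
```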
• http://harinisuresh.com/2016/10/09/lstms/
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture10.pdf
• https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
• https://www.youtube.com/watch?v=xCGidAeyS4M
• https://www.youtube.com/watch?v=rTqmWlnwz_0