Sequence Models
Roozbeh Sanaei
https://towardsdatascience.com/an-overview-for-text-representations-in-nlp-311253730af1
One-Hot Word Representations
https://en.wikipedia.org/wiki/File:Recurrent_neural_network_unfold.svg
Recurrent Neural Networks
https://www.semanticscholar.org/paper/Backpropagation-through-time-and-the-brain-Lillicrap-Santoro/42ce761c85bdb0d422917b03751ab9cbc72a3417/figure/0
Backpropagation through time
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
Different types of RNNs
One to one · One to many · Many to one · Many to many (same size) · Many to many (different size)
https://lorenlugosch.github.io/posts/2019/02/seq2seq/
Sequence models as Markov Decision Processes
p_θ(y | x) = p_θ(y_1, y_2, y_3, …, y_N | x) = ∏_{n=1}^{N} p_θ(y_n | y_{n−1}, y_{n−2}, …, x)
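The chain-rule factorization above can be sketched in a few lines of numpy. The per-step distributions and token ids here are toy numbers; in a real model each row would come from the decoder at that step.

```python
import numpy as np

def sequence_log_prob(step_log_probs, targets):
    """Chain rule: log p(y | x) = sum_n log p(y_n | y_{n-1}, ..., x).

    step_log_probs: (N, V) array; row n holds log p(. | y_<n, x).
    targets: the N target token ids y_1 ... y_N.
    """
    return sum(step_log_probs[n, y] for n, y in enumerate(targets))

# Toy example: vocabulary of 3 tokens, sequence of length 2.
logp = np.log(np.array([[0.5, 0.25, 0.25],
                        [0.1, 0.8, 0.1]]))
total = sequence_log_prob(logp, [0, 1])   # log(0.5) + log(0.8) = log(0.4)
```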
http://www.sunlab.org/teaching/cse6250/fall2222/lab/dl-rnn/
Vanishing vs exploding gradient problem
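A minimal numpy illustration of why this happens: backpropagation through time multiplies the gradient by the recurrent Jacobian once per step, so its norm shrinks or grows geometrically with sequence length. The diagonal weight matrix below is a toy stand-in, not a trained model.

```python
import numpy as np

def grad_norm_after(T, scale):
    """Norm of a backpropagated gradient after T steps through a toy
    recurrent Jacobian with spectral radius `scale`."""
    W = scale * np.eye(4)
    g = np.ones(4)
    for _ in range(T):
        g = W.T @ g          # one step of backpropagation through time
    return np.linalg.norm(g)

vanish = grad_norm_after(50, 0.9)    # shrinks like 0.9**50: vanishing gradient
explode = grad_norm_after(50, 1.1)   # grows like 1.1**50: exploding gradient
```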
https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be
Idea behind GRU and LSTM
• The cell state is kind of like a conveyor belt.
• It runs straight down the entire chain, with only some minor linear
interactions. It’s very easy for information to just flow along it
unchanged.
• Gates are a way to optionally let information through.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM
Forget gate: decide which information to throw away from the cell state.
Input gate: decide which information to store in the cell state.
Update: update the cell state, scaled by the input and forget gates.
Output: output based on the updated cell state.
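The four steps above can be sketched as a single numpy LSTM cell. The stacked weight layout and the sizes are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev; x] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])        # forget gate: what to throw away from c_prev
    i = sigmoid(z[H:2*H])      # input gate: what new information to store
    g = np.tanh(z[2*H:3*H])    # candidate values for the cell state
    o = sigmoid(z[3*H:4*H])    # output gate
    c = f * c_prev + i * g     # update the cell state, scaled by the gates
    h = o * np.tanh(c)         # output based on the updated cell state
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.normal(size=(4 * H, H + D))
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, np.zeros(4 * H))
```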
https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be
GRU
Update gate · Reset gate · New info
The update gate decides how much of the past information needs to be passed along to the future.
The reset gate is used by the model to decide how much of the past information to forget.
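The two gates and the "new info" term map onto a short numpy GRU cell. Biases are omitted for brevity, and note that which of z / (1 − z) weights the past varies between references; this sketch assumes one common convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh):
    """One GRU step (biases omitted for brevity)."""
    v = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ v)    # update gate: how much past information to pass along
    r = sigmoid(Wr @ v)    # reset gate: how much past information to forget
    h_new = np.tanh(Wh @ np.concatenate([r * h_prev, x]))  # new info
    return (1 - z) * h_prev + z * h_new

rng = np.random.default_rng(1)
H, D = 4, 3
Wz, Wr, Wh = (rng.normal(size=(H, H + D)) for _ in range(3))
h = gru_step(rng.normal(size=D), np.zeros(H), Wz, Wr, Wh)
```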
Bidirectional RNN
• Two independent RNNs run over the sequence in opposite directions
• Their outputs are concatenated at each time step
• Allows the network to use both backward and forward information
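The bullets above map directly onto a small numpy sketch; a plain tanh RNN is assumed for both directions.

```python
import numpy as np

def rnn(xs, W, U, h0):
    """Plain tanh RNN; returns the hidden state at every time step."""
    hs, h = [], h0
    for x in xs:
        h = np.tanh(W @ h + U @ x)
        hs.append(h)
    return hs

def birnn(xs, fwd, bwd, h0):
    hf = rnn(xs, *fwd, h0)                 # independent forward RNN
    hb = rnn(xs[::-1], *bwd, h0)[::-1]     # independent backward RNN, re-aligned
    # concatenate the two directions' outputs at each time step
    return [np.concatenate(pair) for pair in zip(hf, hb)]

rng = np.random.default_rng(2)
H, D, T = 3, 2, 5
params = lambda: (rng.normal(size=(H, H)), rng.normal(size=(H, D)))
out = birnn([rng.normal(size=D) for _ in range(T)], params(), params(), np.zeros(H))
```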
https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
Deep RNNs
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
Featurized Word Representations
https://dzone.com/articles/introduction-to-word-vectors
Properties of word embeddings
https://towardsdatascience.com/word-embeddings-for-nlp-5b72991e01d4
Embedding Matrix
https://neuro.cs.ut.ee/the-use-of-embeddings-in-openai-five/
Embedding matrix × one-hot word vector → word embedding
Word representations can be learned from large corpora and then reused (or fine-tuned) on new tasks.
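The slide's point, numerically: multiplying a one-hot word vector by the embedding matrix just selects one row, so in practice the "multiplication" is implemented as a table lookup. Toy sizes assumed.

```python
import numpy as np

V, d = 5, 3                                       # vocabulary size, embedding dim
E = np.arange(V * d, dtype=float).reshape(V, d)   # embedding matrix: one row per word

word_id = 2
one_hot = np.zeros(V)
one_hot[word_id] = 1.0

embedding = one_hot @ E     # equals E[word_id]: that word's embedding row
```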
Neural Language Model
https://x-wei.github.io/notes/xcs224n-lecture6.html
Word2Vec Sampling
http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
Negative Sampling
https://jalammar.github.io/illustrated-word2vec/
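As a reminder of what the figure illustrates: negative sampling replaces the full-vocabulary softmax with one true (center, context) pair pushed up and k sampled "negative" words pushed down. A minimal numpy sketch of the per-pair loss (the 2-d vectors are toy values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_loss(v_center, u_context, u_negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    -log sigmoid(u_ctx . v) - sum_k log sigmoid(-u_neg_k . v)."""
    pos = -np.log(sigmoid(u_context @ v_center))
    neg = -sum(np.log(sigmoid(-u @ v_center)) for u in u_negatives)
    return pos + neg

v = np.array([1.0, 0.0])
# loss is low when the true context aligns with the center word...
good = sgns_loss(v, np.array([1.0, 0.0]), [np.array([-1.0, 0.0])])
# ...and high when the negative word aligns instead
bad = sgns_loss(v, np.array([-1.0, 0.0]), [np.array([1.0, 0.0])])
```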
GloVe: Global Vectors for Word Representations
http://building-babylon.net/tag/glove/
Greedy Search vs Beam Search
Length normalization = 1 / L^α
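Because token log-probabilities are negative, raw beam scores penalize longer hypotheses; multiplying the summed log-likelihood by 1 / L^α counteracts this. A sketch (the α value and toy probabilities are illustrative):

```python
import numpy as np

def normalized_score(token_log_probs, alpha=1.0):
    """Length-normalized beam score: (1 / L**alpha) * sum of token log-probs."""
    L = len(token_log_probs)
    return sum(token_log_probs) / (L ** alpha)

short = [np.log(0.5)] * 2      # 2 tokens, each with probability 0.5
long_ = [np.log(0.5)] * 4      # 4 tokens, same per-token probability

raw_prefers_short = sum(short) > sum(long_)   # raw score always favors the short one
equal_after_norm = np.isclose(normalized_score(short), normalized_score(long_))
```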
Beam Search Failure Analysis
Ground-truth sentence likelihood turns out to be higher than the decoded sentence's → beam search is at fault.
Ground-truth sentence likelihood turns out to be lower → the RNN model is at fault.
BLEU Score
R1: but thou shalt love thy neighbor as thyself
R2: but have love for your neighbor as for yourself
R3: but love your neighbors as you love yourself
D: but love other love friend for love yourself
D(but)=1
D(love)=3
D(other)=1
D(friend)=1
D(for)=1
D(yourself)=1
R(but)=1
R(love)=2 [appears twice in R3]
R(other)=0
R(friend)=0
R(for)=2 [appears twice in R2]
R(yourself)=1
MIN(D(but), R(but))=MIN(1, 1)=1
MIN(D(love), R(love))=MIN(3, 2)=2
MIN(D(other), R(other))=MIN(1, 0)=0
MIN(D(friend), R(friend))=MIN(1,0)=0
MIN(D(for), R(for))=MIN(1, 2)=1
MIN(D(yourself), R(yourself))=MIN(1,1)=1
Total clipped count = 1 + 2 + 0 + 0 + 1 + 1 = 5; candidate length = 8
BLEU (modified unigram precision) = 5/8
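The worked example above can be reproduced in a few lines. This computes unigram precision only; full BLEU also combines higher-order n-grams and a brevity penalty.

```python
from collections import Counter

def clipped_unigram_precision(candidate, references):
    """Clip each candidate word's count by its max count in any one reference."""
    cand = Counter(candidate.split())
    max_ref = Counter()
    for ref in references:
        for w, c in Counter(ref.split()).items():
            max_ref[w] = max(max_ref[w], c)
    clipped = sum(min(c, max_ref[w]) for w, c in cand.items())
    return clipped, sum(cand.values())

refs = ["but thou shalt love thy neighbor as thyself",
        "but have love for your neighbor as for yourself",
        "but love your neighbors as you love yourself"]
num, den = clipped_unigram_precision("but love other love friend for love yourself", refs)
# num, den == 5, 8, matching the slide's hand computation
```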
https://towardsdatascience.com/nlp-metrics-made-simple-the-bleu-score-b06b14fbdbc1
Attention
https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
Self-Attention
https://towardsai.net/p/nlp/getting-meaning-from-text-self-attention-step-by-step-video
https://jalammar.github.io/illustrated-transformer/
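A single self-attention head boils down to a few matrix products. Numpy sketch with toy sizes; real implementations batch this, add masking, and learn the projection matrices.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over token vectors X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities, scaled
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over key positions
    return w @ V                                     # weighted mixture of value vectors

rng = np.random.default_rng(3)
T, d = 4, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

Multi-head attention simply runs several such heads with separate projections and concatenates their outputs.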
Multi-Head Self-Attention
https://towardsai.net/p/nlp/getting-meaning-from-text-self-attention-step-by-step-video
https://jalammar.github.io/illustrated-transformer/
Attention benefits
https://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf
• Constant ‘path length’ between any two positions.
• Unbounded memory.
• Trivial to parallelize (per layer).
• Models Self-Similarity.
• Relative attention provides expressive timing and equivariance, and extends naturally to graphs.
Transformer
https://towardsai.net/p/nlp/getting-meaning-from-text-self-attention-step-by-step-video
• Transforms one sequence into another, one word at a time, based on previous elements.
• During training, each word is predicted from the ground-truth words that precede it in the sentence.
• At test time, each word is predicted from the previously predicted words.
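The train/test asymmetry in the last two bullets (feeding back ground truth vs. feeding back predictions) can be made concrete with a test-time greedy decoding loop; `toy_step` below is a hypothetical stand-in for the trained model.

```python
import numpy as np

def greedy_decode(step_fn, bos, eos, max_len=10):
    """Test time: each word is predicted from the previously *predicted* words.
    step_fn(prefix) returns a score vector over the vocabulary for the next token."""
    ys = [bos]
    for _ in range(max_len):
        y = int(np.argmax(step_fn(ys)))
        ys.append(y)
        if y == eos:
            break
    return ys[1:]

def toy_step(prefix):
    """Hypothetical model: always favors (last token + 1) mod 5; token 4 is <eos>."""
    scores = np.full(5, -10.0)
    scores[(prefix[-1] + 1) % 5] = 0.0
    return scores

decoded = greedy_decode(toy_step, bos=0, eos=4)   # -> [1, 2, 3, 4]
```

During training, by contrast, the known prefix is fed in directly, so all positions can be predicted in parallel.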