[course site]
Day 4 Lecture 2
Advanced Neural
Machine Translation
Marta R. Costa-jussà
Acknowledgments
Kyunghyun Cho, NVIDIA BLOGS:
https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/
From previous lecture...
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Attention-based
mechanism
Read the whole sentence, then produce the translated words one at a
time, each time focusing on a different part of the input sentence
Encoder with attention: context vector
GOAL: Encode a source sentence into a set of
context vectors
http://www.deeplearningbook.org/contents/applications.html
Composing the context vector: bidirectional
RNN
Composing the context vector: bidirectional
RNN
Decoder with attention
● Each context vector now concatenates the forward and reverse encoding vectors
● The decoder generates one symbol at a time based on
this new context set
To compute the new decoder memory state, we must get
one vector out of all context vectors.
Compute the context vector
At each time step i, ONE context vector (c_i) is computed based on (1) the previous hidden state of the decoder (z_(i-1)), (2) the previously decoded symbol (u_(i-1)), and (3) the whole context set (C)
Score each context vector based on how relevant it is
for translating the next target word
Each context vector h_j, j=1...T_x, is scored based on the previous memory state, the previously generated target word, and the j-th context vector itself: e_(i,j) = f_score(z_(i-1), u_(i-1), h_j)
Score each context vector based on how relevant it is
for translating the next target word
f_score is usually a simple single-layer feedforward network. The resulting relevance score measures how relevant the j-th context vector of the source sentence is for deciding the next symbol in the translation
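A minimal sketch of such a scorer in PyTorch (the parameter names W, U, V, v and all dimensions below are illustrative assumptions, not the lecture's code):

```python
import torch

def f_score(z_prev, u_prev, H, W, U, V, v):
    """Single-layer feedforward (additive) scorer:
    e_(i,j) = v . tanh(h_j W + z_(i-1) U + u_(i-1) V), for all j at once.

    z_prev: (d_z,)     previous decoder memory state z_(i-1)
    u_prev: (d_u,)     embedding of the previously generated word u_(i-1)
    H:      (T_x, d_h) context set, one vector h_j per source position
    """
    hidden = torch.tanh(H @ W + z_prev @ U + u_prev @ V)  # (T_x, d_a)
    return hidden @ v                                     # (T_x,) scores

# Toy usage with made-up sizes:
T_x, d_h, d_z, d_u, d_a = 5, 4, 6, 3, 8
e = f_score(torch.randn(d_z), torch.randn(d_u), torch.randn(T_x, d_h),
            torch.randn(d_h, d_a), torch.randn(d_z, d_a),
            torch.randn(d_u, d_a), torch.randn(d_a))
print(e.shape)  # torch.Size([5]): one relevance score per context vector
```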
Normalize relevance scores = attention weights
The scores are normalized with a softmax: a_(i,j) = exp(e_(i,j)) / Σ_k exp(e_(i,k)). These attention weights correspond to how much the decoder attends to each of the context vectors.
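A one-line softmax does the normalization; a toy sketch in PyTorch (the scores are made up):

```python
import torch

# a_(i,j) = exp(e_(i,j)) / sum_k exp(e_(i,k)): positive weights summing to 1
e = torch.tensor([2.0, -1.0, 0.5])   # toy relevance scores, T_x = 3
alpha = torch.softmax(e, dim=0)      # attention weights a_(i,j)
print(alpha, alpha.sum())            # the weights and their sum (1.0)
```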
Obtain the context vector c_i
as the weighted sum of the context vectors, with the weights being the attention weights: c_i = Σ_j a_(i,j) h_j
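A shape-level sketch of this weighted sum in PyTorch (random toy values, for illustration only):

```python
import torch

T_x, d_h = 3, 4
H = torch.randn(T_x, d_h)                       # context set, one h_j per row
alpha = torch.softmax(torch.randn(T_x), dim=0)  # attention weights a_(i,j)
c_i = alpha @ H                                 # c_i = sum_j a_(i,j) h_j, (d_h,)
```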
Update the decoder’s hidden state
The new decoder state is computed as z_i = f(z_(i-1), u_(i-1), c_i). (The initial hidden state is initialized from the last hidden state of the reverse RNN.)
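A sketch of one such update, assuming a GRU cell as the recurrent function f (the lecture does not fix a particular cell; all sizes are illustrative):

```python
import torch

d_u, d_h, d_z = 8, 4, 16
cell = torch.nn.GRUCell(input_size=d_u + d_h, hidden_size=d_z)

u_prev = torch.randn(1, d_u)   # embedding of the previously decoded symbol
c_i = torch.randn(1, d_h)      # context vector from the attention step
z_prev = torch.randn(1, d_z)   # previous state (for the first step, it would
                               # come from the reverse encoder RNN's last state)
z_i = cell(torch.cat([u_prev, c_i], dim=1), z_prev)  # new decoder state z_i
```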
Decoder
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
The RNN’s internal state z_i (the NEW INTERNAL STATE) depends on: the summary vector h_t, the previous output word u_(i-1), and the previous internal state z_(i-1). (From the previous session.)
Translation performance comparison
English-to-French WMT 2014 task

Model                     BLEU
Simple Encoder-Decoder    17.82
+Attention-based          37.19
Phrase-based              37.03
What attention learns… WORD ALIGNMENT
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
What attention learns… WORD ALIGNMENT
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Neural MT is better than phrase-based
“A Neural Network for Machine Translation, at Production Scale” (Google Research Blog, 2016)
Results in WMT 2016 international evaluation
What Next?
Character-based Neural Machine
Translation: Motivation
■Word embeddings have been shown to boost performance in many NLP tasks, including machine translation.
■However, the standard look-up based embeddings are limited to a
finite-size vocabulary for both computational and sparsity reasons.
■The orthographic representation of the words is completely ignored.
■The standard learning process is blind to the presence of stems, prefixes,
suffixes and any other kind of affixes in words.
Character-based Neural MT:
Proposal (Step 1)
■The computation of the representation of each word starts with a character-based embedding layer that associates each word (a sequence of characters) with a sequence of vectors.
■This sequence of vectors is then processed with a set of 1D convolution filters of different lengths, followed by a max pooling layer.
■For each convolutional filter, we keep only the output with the maximum value. The concatenation of these max values already gives us a representation of each word as a fixed-length vector whose length equals the total number of convolutional kernels (see the sketch below).
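A minimal PyTorch sketch of Step 1; the character-embedding size, filter widths and filter counts are illustrative assumptions, loosely in the spirit of Kim et al. (2016):

```python
import torch
import torch.nn as nn

n_chars, d_char = 100, 15            # assumed character vocabulary / embedding size
char_emb = nn.Embedding(n_chars, d_char)
convs = nn.ModuleList([
    nn.Conv1d(d_char, n_filters, kernel_size=w)
    for w, n_filters in [(1, 50), (2, 100), (3, 150)]   # widths and counts
])

word = torch.randint(0, n_chars, (1, 7))        # one word of 7 characters
x = char_emb(word).transpose(1, 2)              # (1, d_char, 7) for Conv1d
pooled = [conv(x).max(dim=2).values for conv in convs]  # max over time
word_vec = torch.cat(pooled, dim=1)             # (1, 50+100+150): one fixed-
                                                # length vector per word
```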
Character-based Neural MT:
Proposal (Step 2)
■The addition of two highway layers was
shown to improve the quality of the
language model in (Kim et al., 2016).
■The output of the second Highway layer
will give us the final vector representation
of each source word, replacing the
standard source word embedding in the
neural machine translation system.
(Highway layers are an architecture designed to ease gradient-based training of deep networks; a minimal sketch follows.)
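A sketch of one highway layer in PyTorch, after Kim et al. (2016); the ReLU nonlinearity, the 300-dimensional size, and the gate-bias initialization are assumptions:

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """y = t * g(W_H x) + (1 - t) * x, with transform gate t = sigmoid(W_T x).
    The gate lets gradients flow through the identity path, easing training."""

    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.gate.bias.data.fill_(-2.0)  # start biased towards carrying x through

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))       # how much to transform
        h = torch.relu(self.transform(x))     # candidate transformation
        return t * h + (1.0 - t) * x          # gated mix of transform and input

# Two highway layers on top of the character-CNN word vector:
layers = nn.Sequential(Highway(300), Highway(300))
final_repr = layers(torch.randn(1, 300))  # replaces the source word embedding
```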
Character-based Neural MT:
Integration with NMT
Examples
Multilingual Translation
Kyunghyun Cho, “DL4MT slides” (2015)
Multilingual Translation Approaches
Sharing attention-based mechanism across language pairs
Orhan Firat et al., “Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism” (2016)
Multilingual Translation Approaches
Sharing attention-based mechanism across language pairs
Orhan Firat et al., “Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism” (2016)
Share the encoder, decoder and attention across language pairs
Johnson et al., “Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation” (2016)
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
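At the data level, the mechanism behind Johnson et al. (2016) is remarkably simple: an artificial token prepended to the source sentence tells the single shared model which target language to produce. A sketch (the token format follows the paper's examples; the helper name is made up):

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend the artificial target-language token, e.g. <2es> for Spanish."""
    return f"<2{target_lang}> {source_sentence}"

print(add_target_token("Hello, how are you?", "es"))
# -> <2es> Hello, how are you?
# The shared encoder, decoder and attention are otherwise unchanged, which is
# what enables zero-shot translation between pairs never seen in training.
```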
Is the system learning an Interlingua?
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
Available software on GitHub
DL4MT
NEMATUS
Most publications have open-source code...
Summary
● The attention-based mechanism makes it possible to achieve state-of-the-art results
● Progress in MT includes character-based models, multilinguality...
Learn more
Natural Language Understanding with
Distributed Representation, Kyunghyun Cho,
Chapter 6, 2015 (available on GitHub)
Thanks! Q&A?
https://www.costa-jussa.com
marta.ruiz@upc.edu
