[course site]
Day 4 Lecture 2
Advanced Neural
Machine Translation
Marta R. Costa-jussà
Acknowledgments
Kyunghyun Cho, NVIDIA BLOGS:
https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/
From previous lecture...
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Attention-based
mechanism
Read the whole sentence, then produce the translated words one at a
time, each time focusing on a different part of the input sentence
Encoder with attention: context vector
GOAL: Encode a source sentence into a set of
context vectors
http://www.deeplearningbook.org/contents/applications.html
Composing the context vector: bidirectional
RNN
Composing the context vector: bidirectional
RNN
Decoder with attention
● Each context vector now concatenates the forward and reverse encoding vectors
● The decoder generates one symbol at a time based on
this new context set
To compute the new decoder memory state, we must get
one vector out of all context vectors.
Compute the context vector
At each time step i, ONE context vector (c_i) is computed based on (1) the previous hidden state of the decoder (z_(i-1)), (2) the previously decoded symbol (u_(i-1)), and (3) the whole context set (C)
Score each context vector based on how relevant it is
for translating the next target word
Each context vector h_j, j=1...T_x, is scored based on the previous memory state, the previously generated target word, and the j-th context vector itself: e_(i,j) = f_score(z_(i-1), u_(i-1), h_j)
Score each context vector based on how relevant it is
for translating the next target word
f_score is usually a simple single-layer feedforward network. The resulting relevance score measures how relevant the j-th context vector of the source sentence is for deciding the next symbol in the translation
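A minimal sketch of such a scorer in PyTorch (the parameter names W, U, V, v and all dimensions below are illustrative assumptions, not the lecture's code):

```python
import torch

def f_score(z_prev, u_prev, H, W, U, V, v):
    """Single-layer feedforward (additive) scorer:
    e_(i,j) = v . tanh(h_j W + z_(i-1) U + u_(i-1) V), for all j at once.

    z_prev: (d_z,)     previous decoder memory state z_(i-1)
    u_prev: (d_u,)     embedding of the previously generated word u_(i-1)
    H:      (T_x, d_h) context set, one vector h_j per source position
    """
    hidden = torch.tanh(H @ W + z_prev @ U + u_prev @ V)  # (T_x, d_a)
    return hidden @ v                                     # (T_x,) scores

# Toy usage with made-up sizes:
T_x, d_h, d_z, d_u, d_a = 5, 4, 6, 3, 8
e = f_score(torch.randn(d_z), torch.randn(d_u), torch.randn(T_x, d_h),
            torch.randn(d_h, d_a), torch.randn(d_z, d_a),
            torch.randn(d_u, d_a), torch.randn(d_a))
print(e.shape)  # torch.Size([5]): one relevance score per context vector
```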
Normalize relevance scores = attention weights
The scores are normalized with a softmax: a_(i,j) = exp(e_(i,j)) / Σ_k exp(e_(i,k)). These attention weights correspond to how much the decoder attends to each of the context vectors.
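A one-line softmax does the normalization; a toy sketch in PyTorch (the scores are made up):

```python
import torch

# a_(i,j) = exp(e_(i,j)) / sum_k exp(e_(i,k)): positive weights summing to 1
e = torch.tensor([2.0, -1.0, 0.5])   # toy relevance scores, T_x = 3
alpha = torch.softmax(e, dim=0)      # attention weights a_(i,j)
print(alpha, alpha.sum())            # the weights and their sum (1.0)
```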
Obtain the context vector c_i
as the weighted sum of the context vectors, with the weights being the attention weights: c_i = Σ_j a_(i,j) h_j
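A shape-level sketch of this weighted sum in PyTorch (random toy values, for illustration only):

```python
import torch

T_x, d_h = 3, 4
H = torch.randn(T_x, d_h)                       # context set, one h_j per row
alpha = torch.softmax(torch.randn(T_x), dim=0)  # attention weights a_(i,j)
c_i = alpha @ H                                 # c_i = sum_j a_(i,j) h_j, (d_h,)
```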
Update the decoder’s hidden state
The new decoder state is computed as z_i = f(z_(i-1), u_(i-1), c_i). (The initial hidden state is initialized from the last hidden state of the reverse RNN.)
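A sketch of one such update, assuming a GRU cell as the recurrent function f (the lecture does not fix a particular cell; all sizes are illustrative):

```python
import torch

d_u, d_h, d_z = 8, 4, 16
cell = torch.nn.GRUCell(input_size=d_u + d_h, hidden_size=d_z)

u_prev = torch.randn(1, d_u)   # embedding of the previously decoded symbol
c_i = torch.randn(1, d_h)      # context vector from the attention step
z_prev = torch.randn(1, d_z)   # previous state (for the first step, it would
                               # come from the reverse encoder RNN's last state)
z_i = cell(torch.cat([u_prev, c_i], dim=1), z_prev)  # new decoder state z_i
```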
Decoder
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
The RNN’s internal state z_i (the NEW INTERNAL STATE) depends on: the summary vector h_t, the previous output word u_(i-1), and the previous internal state z_(i-1). (From the previous session.)
Translation performance comparison
English-to-French WMT 2014 task

Model                     BLEU
Simple Encoder-Decoder    17.82
+Attention-based          37.19
Phrase-based              37.03
What attention learns… WORD ALIGNMENT
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
What attention learns… WORD ALIGNMENT
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Neural MT is better than phrase-based
“A Neural Network for Machine Translation, at Production Scale” (Google Research Blog, 2016)
Results in WMT 2016 international evaluation
What Next?
Character-based Neural Machine
Translation: Motivation
■Word embeddings have been shown to boost performance in many NLP tasks, including machine translation.
■However, the standard look-up based embeddings are limited to a
finite-size vocabulary for both computational and sparsity reasons.
■The orthographic representation of the words is completely ignored.
■The standard learning process is blind to the presence of stems, prefixes,
suffixes and any other kind of affixes in words.
Character-based Neural MT:
Proposal (Step 1)
■The computation of the representation of each word starts with a character-based embedding layer that associates each word (a sequence of characters) with a sequence of vectors.
■This sequence of vectors is then processed with a set of 1D convolution filters of different lengths, followed by a max pooling layer.
■For each convolutional filter, we keep only the output with the maximum value. The concatenation of these max values already gives us a representation of each word as a fixed-length vector whose length equals the total number of convolutional kernels (see the sketch below).
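A minimal PyTorch sketch of Step 1; the character-embedding size, filter widths and filter counts are illustrative assumptions, loosely in the spirit of Kim et al. (2016):

```python
import torch
import torch.nn as nn

n_chars, d_char = 100, 15            # assumed character vocabulary / embedding size
char_emb = nn.Embedding(n_chars, d_char)
convs = nn.ModuleList([
    nn.Conv1d(d_char, n_filters, kernel_size=w)
    for w, n_filters in [(1, 50), (2, 100), (3, 150)]   # widths and counts
])

word = torch.randint(0, n_chars, (1, 7))        # one word of 7 characters
x = char_emb(word).transpose(1, 2)              # (1, d_char, 7) for Conv1d
pooled = [conv(x).max(dim=2).values for conv in convs]  # max over time
word_vec = torch.cat(pooled, dim=1)             # (1, 50+100+150): one fixed-
                                                # length vector per word
```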
Character-based Neural MT:
Proposal (Step 2)
■The addition of two highway layers was
shown to improve the quality of the
language model in (Kim et al., 2016).
■The output of the second Highway layer
will give us the final vector representation
of each source word, replacing the
standard source word embedding in the
neural machine translation system.
(Highway layers are an architecture designed to ease gradient-based training of deep networks; a minimal sketch follows.)
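A sketch of one highway layer in PyTorch, after Kim et al. (2016); the ReLU nonlinearity, the 300-dimensional size, and the gate-bias initialization are assumptions:

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """y = t * g(W_H x) + (1 - t) * x, with transform gate t = sigmoid(W_T x).
    The gate lets gradients flow through the identity path, easing training."""

    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.gate.bias.data.fill_(-2.0)  # start biased towards carrying x through

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))       # how much to transform
        h = torch.relu(self.transform(x))     # candidate transformation
        return t * h + (1.0 - t) * x          # gated mix of transform and input

# Two highway layers on top of the character-CNN word vector:
layers = nn.Sequential(Highway(300), Highway(300))
final_repr = layers(torch.randn(1, 300))  # replaces the source word embedding
```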
Character-based Neural MT:
Integration with NMT
Examples
Multilingual Translation
Kyunghyun Cho, “DL4MT slides” (2015)
Multilingual Translation Approaches
Sharing attention-based mechanism across language pairs
Orhan Firat et al., “Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism” (2016)
Multilingual Translation Approaches
Sharing attention-based mechanism across language pairs
Orhan Firat et al., “Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism” (2016)
Share the encoder, decoder and attention across language pairs
Johnson et al., “Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation” (2016)
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
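At the data level, the mechanism behind Johnson et al. (2016) is remarkably simple: an artificial token prepended to the source sentence tells the single shared model which target language to produce. A sketch (the token format follows the paper's examples; the helper name is made up):

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend the artificial target-language token, e.g. <2es> for Spanish."""
    return f"<2{target_lang}> {source_sentence}"

print(add_target_token("Hello, how are you?", "es"))
# -> <2es> Hello, how are you?
# The shared encoder, decoder and attention are otherwise unchanged, which is
# what enables zero-shot translation between pairs never seen in training.
```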
Is the system learning an Interlingua?
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
Available software on GitHub
DL4MT
NEMATUS
Most publications have open-source code...
Summary
● The attention-based mechanism makes it possible to achieve state-of-the-art results
● Progress in MT includes character-based models, multilinguality...
Learn more
Natural Language Understanding with
Distributed Representation, Kyunghyun Cho,
Chapter 6, 2015 (available on GitHub)
Thanks! Q&A?
https://www.costa-jussa.com
marta.ruiz@upc.edu
