Improving lexical choice in neural machine translation

Toan Q. Nguyen and David Chiang
arXiv 2017
Presented by Sekizawa Yuuki, 2017/12/5

Overview
• NMT learns word representations in a continuous space
• NMT translations tend to read naturally in context,
but often do not reflect the content of the source sentence
  • this is attributed to word frequency in the training data
• proposed method: two solutions
1. the standard output layer, which computes the inner product
of a vector representing the context with every possible
output word embedding, is argued to reward frequent words
disproportionately; the fix is to constrain the norms of both
vectors to a constant value
2. a simple lexical module, trained jointly with the rest of
the model, is integrated into the network

Output a word in NMT
• f: source sequence
• e: output word
• W_e: embedding of word e (depends only on e)
• b_e: bias for word e, a scalar (depends only on e)
• h̃: hidden vector, depending only on the source sentence
and the previous output words
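
The symbols above combine into the standard output layer, where the probability of word e is proportional to exp(W_e · h̃ + b_e). The following is a minimal sketch of that computation in plain Python; the function name and the toy values are hypothetical and chosen only for illustration.

```python
import math

def output_distribution(W, b, h):
    """Return p(e) for every word e, with p(e) proportional to exp(W[e] . h + b[e])."""
    logits = [sum(w_i * h_i for w_i, h_i in zip(W_e, h)) + b_e
              for W_e, b_e in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]

# Toy values (hypothetical): 3 output words, 2-dimensional hidden vector.
W = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]  # one embedding W_e per row
b = [0.0, 0.0, 0.0]                       # one scalar bias b_e per word
h = [1.0, 0.5]                            # hidden vector h~
probs = output_distribution(W, b, h)
```

In this toy setup words 0 and 2 point in the same direction, but word 2 has twice the norm and therefore receives the higher probability.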

Proposed method: arguments
1. the inner product W_e · h̃, which measures how well e fits
into the context h̃, favors common words disproportionately;
fixing the norms of both vectors to a constant helps
2. adding a new term that represents a more direct connection
from the source sentence allows the model to better
memorize translations of rare words
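
One way to read the first argument is through the geometric form of the inner product. The angle θ_e between W_e and h̃ is a symbol introduced here for illustration; it does not appear on the slide.

```latex
W_e \cdot \tilde{h} = \lVert W_e \rVert \, \lVert \tilde{h} \rVert \cos\theta_e
\qquad\Longrightarrow\qquad
W_e \cdot \tilde{h} = r^2 \cos\theta_e
\quad \text{when } \lVert W_e \rVert = \lVert \tilde{h} \rVert = r
```

With unconstrained norms, the score of a word can grow simply because its embedding is long, which is one plausible route by which frequent words get over-rewarded. Once both norms are fixed to r, only the angle between the two vectors can change the score.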

Proposed method: Normalization
• the norm of W_e is fixed by projected gradient descent:
after each update, project each W_e back onto the
hypersphere of radius r
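
The projection step can be sketched as follows. This is a minimal illustration assuming a plain SGD update; the slide does not state the optimizer, and the function names, learning rate, and toy values are hypothetical.

```python
import math

def project_to_sphere(v, r):
    """Rescale v so that its Euclidean norm equals r (v must be non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [r * x / norm for x in v]

def sgd_step_with_projection(W, grads, lr, r):
    """One plain SGD update on every row W_e, then project each row back to norm r."""
    updated = []
    for W_e, g_e in zip(W, grads):
        stepped = [w - lr * g for w, g in zip(W_e, g_e)]  # unconstrained update
        updated.append(project_to_sphere(stepped, r))     # back onto the hypersphere
    return updated

# Toy values (hypothetical): two embeddings, radius r = 5.
W = [[3.0, 4.0], [0.0, 5.0]]
grads = [[1.0, 0.0], [0.0, -2.0]]
W_new = sgd_step_with_projection(W, grads, lr=0.5, r=5.0)
norms = [math.sqrt(sum(x * x for x in row)) for row in W_new]
```

After the step every row of `W_new` has norm r again, so the update can change the direction of an embedding but not its length.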

Previous work: integrating a lexicon into NMT
• background
  • the hidden state contains information about
    • the source word(s) corresponding to the current target word
    • the contexts of those source words and the preceding
      context of the target word
  • this could make the model prone to generating a target word
    that fits the context but does not necessarily correspond
    to the source word(s)
• Arthur et al. (2016)
  • tried to alleviate this issue by integrating a count-based
    lexicon into an NMT system
  • however, this lexicon must be trained separately using GIZA++,
    and its parameters form a large, sparse array that can be
    difficult to store in GPU memory

Proposed method: Lexical Translation
• use a simple feedforward neural network (FFNN)
• trained jointly with the rest of the NMT model to generate
a target word based directly on the source word(s)
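
The slide's formula for this module is not reproduced in the text, so the following is only a sketch under stated assumptions: the module takes an attention-weighted average of source word embeddings, passes it through a one-layer feedforward network with a residual connection, and adds the resulting scores to the logits of the main model. All names, shapes, and values here are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]

def lexical_logits(attention, src_embeddings, W_ff, W_lex, b_lex):
    """Score target words directly from attended source word embeddings.

    Assumed form (not given on the slide):
      f_lex  = tanh(sum_s attention[s] * src_embeddings[s])
      h_lex  = tanh(W_ff f_lex) + f_lex
      logits = W_lex h_lex + b_lex
    """
    dim = len(src_embeddings[0])
    avg = [sum(a * emb[i] for a, emb in zip(attention, src_embeddings))
           for i in range(dim)]
    f_lex = [math.tanh(x) for x in avg]
    h_lex = [math.tanh(x) + f for x, f in zip(matvec(W_ff, f_lex), f_lex)]
    return [z + b for z, b in zip(matvec(W_lex, h_lex), b_lex)]

# Toy values (hypothetical): 2 source words, 2-dim embeddings, 3 target words.
attention = [0.9, 0.1]
src_embeddings = [[1.0, 0.0], [0.0, 1.0]]
W_ff = [[0.5, 0.0], [0.0, 0.5]]
W_lex = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b_lex = [0.0, 0.0, 0.0]
context_logits = [0.2, 0.2, 0.2]  # stand-in for the main model's W_e . h~ + b_e

lex = lexical_logits(attention, src_embeddings, W_ff, W_lex, b_lex)
probs = softmax([c + l for c, l in zip(context_logits, lex)])
```

Because the main model's logits are uniform in this toy case, the final distribution is decided entirely by the lexical scores, and the target word tied to the strongly attended source word wins.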

Experiment: Data settings
• Tamil (ta), Urdu (ur), Hausa (ha), Turkish (tu), and
Hungarian (hu) to English (en), using data from the
LORELEI program
• English to Vietnamese (vi), using data from the IWSLT
2015 shared task
• English to Japanese (ja), using the KFTT and BTEC datasets

Experiment: NMT systems
• NMT baselines
  • untied: does not tie the rows of W^o to the target word
    embeddings
  • tied: ties the rows of W^o to the target word embeddings
• other baselines
  • Moses: the Moses phrase-based translation system. Moses used
    the full vocabulary from the training data; unknown words
    were copied to the target sentence.
  • Arthur: the paper authors' reimplementation of the discrete
    lexicon approach of Arthur et al. (2016). Only the auto
    lexicon was tried, built with GIZA++ and integrated using
    the bias approach.
• proposed methods
  • fixnorm: the normalization approach
  • fixnorm+lex: fixnorm plus the lexical translation module

Result: BLEU evaluation
• numbers in parentheses are relative to the tied baseline
• a dagger † indicates an insignificant difference in BLEU (p > 0.01)

Result: Translation example

Result: Alignment

Analysis: Hyperparameter r

Analysis: Training process
• BLEU on the development set

Conclusion
• presented two simple yet effective changes to the
output layer of an NMT model
• both changes improve translation quality substantially
on low-resource language pairs
• the baseline NMT system performs poorly relative to
phrase-based translation, but the proposed system surpasses it
• conclusion: NMT, equipped with the proposed methods,
is a more viable choice for low-resource translation
