Searching for the Best Machine Translation Combination

Matīss Rikters
Tartu, Estonia
22.03.2017

Machine Translation
Hybrid Machine Translation
Methods I used
• A count-based language model for candidate selection from whole translations
• Combining translations of sentence chunks
• Combining translations of linguistically motivated chunks
• A character-level neural language model for candidate selection
A graphical implementation of the methods
Translation of multiword expressions
Other academic activities
Future plans
Contents

• Machine translation (MT) is a sub-field of natural language processing that
investigates the use of computers to translate text from one language to another
• Statistical MT (SMT) consists of subcomponents that are separately engineered
to learn how to translate from vast amounts of translated text
• Rule-based MT (RBMT) is based on linguistic information covering the main
semantic, morphological, and syntactic regularities of source and target languages
• Neural MT (NMT) consists of a large neural network in which weights are trained
jointly to maximize the translation performance
Machine Translation

• One of the first metrics to report high correlation with human judgments
• One of the most popular in the field
• The closer MT is to a professional human translation, the better it is
• Scores a translation on a scale of 0 to 100
Automatic Evaluation of MT: BLEU
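To make the 0 to 100 scale concrete, here is a minimal sketch of a sentence-level BLEU-style score with a single reference. It is a simplified illustration, not the official evaluation script: the function names `ngram_counts` and `simple_bleu`, whitespace tokenization, and the lack of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified BLEU (0 to 100) for one candidate and one reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped overlap: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to the reference scores 100, and a candidate sharing no words with it scores 0. Real evaluations are usually computed over a whole test set rather than per sentence.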

Statistical rule generation
• Rules for RBMT systems are generated from training corpora
Multi-pass
• Process data through RBMT first, and then through SMT
Multi-System hybrid MT
• Multiple MT systems run in parallel
• SMT + RBMT (Ahsan and Kolachina, 2010)
• Confusion Networks (Barrault, 2010)
+ Neural Network Model (Freitag et al., 2015)
• SMT + EBMT + TM + NE (Santanu et al., 2014)
• Recursive sentence decomposition (Mellebeek et al., 2006)
Literature Review: Hybrid Machine Translation

Combining whole translations
• Translate the full input sentence with multiple MT systems
• Choose the best translation as the output
Combining translations of sentence chunks
• Split the sentence into smaller chunks
• The chunks are the top level subtrees of the syntax tree of the sentence
• Translate each chunk with multiple MT systems
• Choose the best translated chunks and combine them
Combining Translations
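The chunk-combination steps on this slide can be sketched as follows. This is only an illustration under stated assumptions: `combine_chunks` is a hypothetical name, each MT system is modelled as a plain function from source text to translation, and `score` stands for any candidate scorer where lower is better (for example, language model perplexity).

```python
def combine_chunks(chunks, systems, score):
    """Translate every chunk with every system and keep the best candidate.

    chunks  -- list of source-language chunk strings
    systems -- list of callables, each mapping a source string to a translation
               (hypothetical stand-ins for online MT APIs)
    score   -- callable returning a number for a translation; lower is better
    """
    best_translations = []
    for chunk in chunks:
        candidates = [translate(chunk) for translate in systems]
        best_translations.append(min(candidates, key=score))
    # Recompose the sentence from the selected chunk translations.
    return " ".join(best_translations)
```

Passing the whole sentence as a single chunk reduces this to whole-translation selection.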

KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:
$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$
where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:
$$2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$
where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by determining how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
Candidate Selection
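The backoff lookup and the perplexity calculation on this slide can be illustrated with a toy model. This is a sketch, not KenLM itself: the dictionaries `probs` and `backoffs`, the `<unk>` fallback, and the function names are assumptions made for the example. Values are base-10 log probabilities, so the product of backoff penalties becomes a sum.

```python
def backoff_logprob(word, history, probs, backoffs):
    """Return log10 p(word | history) using the longest matching n-gram.

    probs    -- dict mapping n-gram tuples to log10 probabilities
    backoffs -- dict mapping context tuples to log10 backoff penalties
    """
    penalty = 0.0
    for start in range(len(history) + 1):
        context = tuple(history[start:])
        ngram = context + (word,)
        if ngram in probs:
            # Longest matching entry found: add the accumulated penalties.
            return probs[ngram] + penalty
        # Entry missing: pay the backoff penalty of this context and shorten it.
        penalty += backoffs.get(context, 0.0)
    # Unknown word: fall back to the <unk> unigram (assumed to exist).
    return probs[("<unk>",)] + penalty


def perplexity(log10_probs):
    """Perplexity of a sample from its per-token log10 probabilities."""
    return 10 ** (-sum(log10_probs) / len(log10_probs))


def select_best(candidates, sentence_perplexity):
    """Pick the candidate translation with the lowest perplexity."""
    return min(candidates, key=sentence_perplexity)
```

Perplexity does not depend on the logarithm base as long as the same base is used for the exponent, so the base-10 form here agrees with the base-2 form of the definition.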

• Sentence tokenization
• Translation with online MT APIs: Google Translate, Bing Translator, LetsMT
• Selection of the best translation
• Output
Whole Translations

• Sentence tokenization
• Syntactic analysis
• Sentence chunking
• Translation with online MT APIs: Google Translate, Bing Translator, LetsMT
• Selection of the best chunks
• Sentence recomposition
• Output
Chunks

An advanced approach to chunking
• Traverse the syntax tree bottom up, from right to left
• Add a word to the current chunk if
• The current chunk is not too long (sentence word count / 4)
• The word is non-alphabetic or only one symbol long
• The word begins with a genitive phrase («of »)
• Otherwise, initialize a new chunk with the word
• When chunking results in too many chunks, repeat the process,
allowing more (than sentence word count / 4) words in a chunk
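The chunking rules above can be sketched roughly as follows. This is a simplification under several assumptions: it walks a flat word list from right to left instead of a syntax tree, it treats the three conditions as alternatives, and `max_chunks` is a hypothetical threshold for "too many chunks", which the slide does not specify.

```python
def chunk_words(words, max_chunks=4):
    """Greedy right-to-left chunking over a flat list of words."""
    if not words:
        return []
    limit = max(1, len(words) // 4)  # initial chunk length: word count / 4
    while True:
        chunks = [[]]
        for word in reversed(words):
            current = chunks[0]
            is_symbol = len(word) == 1 or not word.isalpha()
            is_genitive = word.lower() == "of"
            if len(current) < limit or is_symbol or is_genitive:
                current.insert(0, word)    # extend the current chunk leftwards
            else:
                chunks.insert(0, [word])   # start a new chunk with this word
        if len(chunks) <= max_chunks or limit >= len(words):
            return [" ".join(chunk) for chunk in chunks]
        limit += 1  # too many chunks: allow longer chunks and repeat
```

Joining the returned chunks with spaces always reproduces the input word sequence, so the chunk translations can be recomposed in order.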
Candidate Selection:
12-gram LM trained with
• KenLM
• DGT-Translation Memory corpus (Steinberger et al., 2013): 3.1 million legal domain sentences
• Sentences scored with the query program from KenLM
Test data
• 1581 random sentences from the JRC-Acquis corpus
• ACCURAT balanced evaluation corpus
Linguistically Motivated Chunks
CICLing 2016

Linguistically Motivated Chunks
Simple chunks
• Recently
• there
• has been an increased interest in the automated discovery of equivalent expressions in different languages
• .
Linguistically motivated chunks
• Recently there has been an increased interest

[Figure: perplexity and BLEU-HY, with a linear BLEU-HY trend line, plotted against training epoch]
[Figure: perplexity and BLEU, with a linear BLEU trend line, plotted against training epoch]
Neural Language Models

System                                                           BLEU
Whole translations – G+B (Rikters 2015)                          17.70
Simple Chunks – G+B (Rikters and Skadiņa 2016a)                  17.95
Linguistic Chunks – G+B (Rikters and Skadiņa 2016b)              18.29
Linguistic Chunks – G+B+H+Y (Rikters and Skadiņa 2016b)          19.21
+ Char-RNN Neural Language Model (Rikters 2016d)                 19.51
Some Results
Baselines    BLEU
Bing         17.43
Google       17.63
Hugo.lv      17.14
Yandex       16.04
G = Google, B = Bing, H = Hugo.lv, Y = Yandex

[Screenshots of the graphical interface: start page; translate with online systems; input translations to combine; input translated chunks; settings; translation results; input source sentence]
Interactive MS MT
(Rikters 2016a)

Translation of Multi-Word Expressions (MWEs)
Monolingual MWE extraction and annotation
• Pre-process monolingual texts with TreeTagger
• Extract MWE candidate lists from corpora
• Find and mark MWE candidates in corpora
• Mark MWE candidates in text
MWE alignment
• Find translation equivalents for monolingual MWE candidates with MPAligner
SMT Experiments
• Adding data to the parallel corpora
• Adding a second translation table
• Adding a sixth feature to the translation table
• Using the Jaccard index for translation probabilities
• Using a Levenshtein distance-based similarity metric for translation probabilities
Method                              BLEU
Baseline                            62.23
Baseline + MWE training data        62.10
Baseline + 2nd translation table    62.04
Baseline + 6th feature              62.37
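For reference, the two similarity measures named in the SMT experiments can be computed as in the sketch below. How they were turned into translation probabilities is not stated on the slide, so the function names, the choice of comparing character sets for the Jaccard index, and the normalization by the longer string's length are assumptions made for illustration.

```python
def jaccard_index(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty sets are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Similarity in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

Both measures return values between 0 and 1, which makes them easy to use as an additional score for an MWE pair.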

MWEs in Neural Machine Translation
English-Latvian English-Czech
Training
Validation
2.5M 1xMWE 2.5M 2xMWE 5M 2xMWE 5M
1M 1xMWE 1M 2xMWE 2M 2xMWE 0.5M

• Matīss Rikters
"Multi-system machine translation using online APIs for English-Latvian"
The Fourth Workshop on Hybrid Approaches to Translation (2015)
• Matīss Rikters and Inguna Skadiņa
"Syntax-based multi-system machine translation"
The 10th edition of the Language Resources and Evaluation Conference (2016a)
• Matīss Rikters and Inguna Skadiņa
"Combining machine translated sentence chunks from multiple MT systems"
The 17th International Conference on Computational Linguistics and Intelligent Text Processing (2016b)
• Matīss Rikters
"K-translate – interactive multi-system machine translation"
12th International Baltic Conference on Databases and Information Systems (2016a)
• Matīss Rikters
"Searching for the Best Translation Combination Across All Possible Variants"
The 7th Conference on Human Language Technologies - the Baltic Perspective (2016b)
• Matīss Rikters
"Interactive Multi-System Machine Translation with Neural Language Models"
IOS Press Ebook (2016c)
• Matīss Rikters
"Neural Network Language Models for Candidate Scoring in Hybrid Multi-System Machine Translation"
The Sixth Workshop on Hybrid Approaches to Translation (2016d)
Publications

• Matīss Rikters and Ondřej Bojar
"Handling Multi-Word Expressions in Neural Machine Translation"
Publications in Progress

http://ej.uz/ChunkMT
http://ej.uz/SyMHyT
http://ej.uz/MSMT
http://ej.uz/chunker
http://ej.uz/NeuralLM
Code on GitHub

Teaching
• Supervised multiple course papers, qualification papers, and bachelor's theses
• Average grade 8.67
• Student curator
Attended Summer / Winter Schools
• Machine Translation Marathon 2015
• Deep Learning For Machine Translation 2015
• ParseME 2nd Training School
• Neural Machine Translation Marathon 2016
Other Academic Activities

Future Work
• Complete experiments and inspect results for English-Estonian
• Win WMT17 news translation task
• At least for English-Latvian
• At least beat Tilde
• Perform chunking on the target side
• Get chunks from dependency parses
• Complete PhD thesis draft
• Pass final exams
• Experiment with other types of LMs for candidate selection
• Factored Language Models (POS tag + lemma)
• Convolutional Neural Network Language Models
• Perform candidate selection using MT quality estimation
• QuEst++ (Specia et al., 2015)
• SHEF-NN (Shah et al., 2015)

Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA - The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado (2010).
Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
Rikters, M., and I. Skadiņa. "Syntax-based multi-system machine translation." LREC 2016 (2016a).
Rikters, M., and I. Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016 (2016b).
Santanu, Pal, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006.
Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 2192, 125-132 (2010).
Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
References
