Neural Network Language Models
for Candidate Scoring
in Multi-System Machine Translation
Matīss Rikters
University of Latvia
COLING 2016 6th Workshop on
Hybrid Approaches to Translation
Osaka, Japan
December 11, 2016
Contents
1. Introduction
2. Baseline System
3. Example Sentence
4. Neural Network Language Models
5. Results
6. Related publications
7. Future plans
Baseline System
Chunking
– Parse sentences with Berkeley Parser (Petrov et al., 2006)
– Traverse the syntax tree bottom up, from right to left
– Add a word to the current chunk if
• The current chunk is not too long (sentence word count / 4)
• The word is non-alphabetic or only one symbol long
• The word begins a genitive phrase («of »)
– Otherwise, initialize a new chunk with the word
– If chunking results in too many chunks, repeat the process,
allowing more words in a chunk (than sentence word count / 4); see the sketch below
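The merge logic can be illustrated with a short Python sketch. This is a minimal approximation, assuming the three conditions are alternatives that each license extending the current chunk and that more than sentence word count / 4 chunks counts as "too many"; the actual system applies the rules while traversing the Berkeley Parser tree.

```python
def chunk_sentence(tokens, factor=4):
    # Minimal sketch of the chunking heuristics over a flat token list;
    # the real system walks the constituency tree bottom up, right to left.
    max_len = max(1, len(tokens) // factor)
    while max_len <= len(tokens):
        chunks, current = [], []
        for word in tokens:
            if (not current
                    or len(current) < max_len    # current chunk is not too long
                    or not word.isalpha()        # non-alphabetic token
                    or len(word) == 1            # only one symbol long
                    or word.lower() == "of"):    # begins a genitive phrase
                current.append(word)
            else:
                chunks.append(current)           # otherwise start a new chunk
                current = [word]
        chunks.append(current)
        if len(chunks) <= factor:                # assumed "too many" threshold
            return chunks
        max_len += 1                             # allow more words per chunk, retry
    return [tokens]

chunk_sentence("Recently there has been an increased interest in the automated "
               "discovery of equivalent expressions in different languages .".split())
```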
Translation with online MT systems
– Google Translate; Bing Translator; Yandex.Translate; Hugo.lv
12-gram language model
– DGT-Translation Memory corpus (Steinberger, 2011) – 3.1 million
Latvian legal domain sentences
Baseline System
[Workflow diagram (labels in Latvian and English): sentence tokenization → syntactic analysis → sentence chunking → translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → selection of the best chunks → sentence recomposition → translation output]
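To make the workflow concrete, here is a hypothetical glue-code sketch in Python, reusing chunk_sentence from the sketch above; the engine callables and the scoring function stand in for the online MT APIs and the language model.

```python
def hybrid_translate(sentence, engines, score):
    # engines: callables mapping a source-language string to a translation
    # score:   fluency score for a translated string (e.g. negative perplexity)
    tokens = sentence.split()                    # stand-in for tokenization
    chunks = chunk_sentence(tokens)              # sentence chunking
    best_chunks = []
    for chunk in chunks:
        candidates = [mt(" ".join(chunk)) for mt in engines]
        best_chunks.append(max(candidates, key=score))  # best chunk selection
    return " ".join(best_chunks)                 # sentence recomposition
```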
Sentence Chunking
[chunking example figure]
Choose the best candidate
KenLM (Heafield, 2011) calculates probabilities based on the
observed entry with the longest matching history $w_f^n$:

$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$

where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties
$b(w_i^{n-1})$ are given by an already-estimated language model.
Perplexity is then calculated from this probability: given an
unknown probability distribution $p$ and a proposed probability
model $q$, the model is evaluated by how well it predicts a separate
test sample $x_1, x_2, \ldots, x_N$ drawn from $p$:

$$PP = 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$
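As a concrete illustration of this selection step, a minimal sketch using the kenlm Python module follows; the model file name and the candidate strings are hypothetical.

```python
import kenlm  # Python wrapper for KenLM

# Hypothetical 12-gram binary model trained on the DGT-TM Latvian corpus.
model = kenlm.Model("dgt-lv.12gram.binary")

# Candidate translations of one chunk from different online MT systems.
candidates = [
    "nesen ir palielinājusies interese",
    "pēdējā laikā ir pieaugusi interese",
]

# Keep the candidate with the lowest perplexity under the language model.
best = min(candidates, key=model.perplexity)
print(best)
```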
Example sentence
Recently there has been an increased interest in the automated discovery of equivalent expressions in different languages.
Neural Language Models
• RWTHLM
• CPU only
• Feed-forward, recurrent (RNN) and long short-term
memory (LSTM) NNs
• MemN2N
• CPU or GPU
• End-to-end memory network (RNN with attention)
• Char-RNN
• CPU or GPU
• RNNs, LSTMs and gated recurrent units (GRU)
• Character level
Best Models
• RWTHLM
• one feed-forward input layer with a 3-word
history, followed by one linear layer of 200
neurons with a sigmoid activation function
• MemN2N
• internal state dimension of 150, linear part of
the state 75, number of hops set to six
• Char-RNN
• 2 LSTM layers with 1,024 neurons each,
dropout set to 0.5
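For illustration, a comparable character-level network can be written down in a few lines; the PyTorch sketch below mirrors the best Char-RNN configuration (2 LSTM layers, 1,024 neurons, dropout 0.5), while the embedding size and all other details are assumptions; the original model is built with the Torch/Lua char-rnn code.

```python
import torch.nn as nn

class CharLM(nn.Module):
    # Sketch of a character-level LM comparable to the best Char-RNN setup.
    def __init__(self, vocab_size, hidden=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)     # character embeddings
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2,
                            dropout=0.5, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)          # next-character logits

    def forward(self, chars, state=None):
        h, state = self.lstm(self.embed(chars), state)
        return self.out(h), state
```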
Char-RNN
• A character-level model works better
for highly inflected languages with
less data
• Requires Torch scientific computing
framework + additional packages
• Can run on CPU, NVIDIA GPU or
AMD GPU
• Intended for generating new text,
modified to score new text (see the sketch below)
More in Andrej Karpathy’s blog
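The scoring modification boils down to summing the log-probabilities the trained network assigns to each character of a candidate instead of sampling new characters from the model. A minimal sketch of turning per-character log-probabilities into a perplexity score (how the modified scorer aggregates them is an assumption):

```python
import math

def char_perplexity(logprobs):
    # logprobs: natural-log probabilities the LM assigns to each character
    return math.exp(-sum(logprobs) / len(logprobs))
```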
Experiment Environment
Training
• Baseline KenLM and RWTHLM models
• 8-core CPU with 16GB of RAM
• MemN2N
• GeForce Titan X (12GB, 3,072 CUDA cores)
12-core CPU and 64GB RAM
• Char-RNN
• Radeon HD 7950 (3GB, 1,792 cores)
8-core CPU and 16GB RAM
Translation
• All models
• 4-core CPU with 16GB of RAM
Results
System   | Perplexity | Training corpus size | Trained on | Training time | BLEU
---------|------------|----------------------|------------|---------------|------
KenLM    | 34.67      | 3.1M sentences       | CPU        | 1 hour        | 19.23
RWTHLM   | 136.47     | 3.1M sentences       | CPU        | 7 days        | 18.78
MemN2N   | 25.77      | 3.1M sentences       | GPU        | 4 days        | 18.81
Char-RNN | 24.46      | 1.5M sentences       | GPU        | 2 days        | 19.53
General domain
[Chart: perplexity (15.00–50.00) and BLEU (12.00–17.00) over training epochs 0.11–1.77; series: Perplexity, BLEU-HY, BLEU-BG, with linear trend lines for the two BLEU series]
Legal domain
[Chart: perplexity (15.00–50.00) and BLEU (16.00–25.00) over training epochs 0.11–1.77; series: Perplexity, BLEU-BG, BLEU-HY, with linear trend lines for the two BLEU series]
Related publications
• Matīss Rikters. "Multi-system machine translation using online APIs for English-Latvian." ACL-IJCNLP 2015 4th HyTra Workshop.
• Matīss Rikters and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016.
• Matīss Rikters and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016.
• Matīss Rikters. "K-translate – interactive multi-system machine translation." Baltic DB&IS 2016.
• Matīss Rikters. "Searching for the Best Translation Combination Across All Possible Variants." Baltic HLT 2016.
• Baseline system: http://ej.uz/ChunkMT
• Only the chunker + visualizer: http://ej.uz/chunker
• Interactive browser version: http://ej.uz/KTranslate
• With integrated usage of NN LMs: http://ej.uz/NNLMs
• Code on GitHub: https://github.com/M4t1ss
Future work
More enhancements for the chunking step
– Try dependency parsing instead of constituency parsing
Choose the best translation candidate with MT quality estimation
– QuEst++ (Specia et al., 2015)
– SHEF-NN (Shah et al., 2015)
Add special processing of multi-word expressions (MWEs)
Handle MWEs in neural machine translation systems
References
• Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA – The Ninth Conference of the Association for Machine Translation in the Americas, Denver, Colorado, 2010.
• Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
• Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
• Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
• Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
• Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
• Pal, Santanu, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
• Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
• Rikters, Matīss, and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016.
• Rikters, Matīss, and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016.
• Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 2006.
• Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
• Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 219 (2010): 125-132.
• Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
• Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
• Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
Thank you!
