Left-to-Right Hierarchical Phrase-based Translation System (LR-Hiero)
Maryam Siahbani
Overview
• History of Machine Translation
• Rule-based MT
• Statistical MT
– Training
– Decoding
• Left-to-Right Hierarchical Phrase-based MT
• Using LR-Hiero in Simultaneous Translation
2
History of Machine Translation
• Late 1940’s: Early rule-based systems
– computers would replace human translators within
5 years!
• 1966: ALPAC report cuts research funding
• Early 1970’s: First commercial system (Systran)
• Late 1980’s: IBM developed first statistical
models inspired by speech research
• Late 2000’s: Explosion in MT research
• 2006: First version of Google Translate
3
Rule-based Machine Translation
• Rules hand-written by linguists
• State of the art until early 2000’s
– e.g. Systran
• Expensive to create, maintain, and adapt
4
[Figure: parse-tree transfer example. French: NP → Noun (chat) + Adjective (noir); English: NP → Adjective (black) + Noun (cat)]
Statistical Machine Translation
• Data-driven approaches to MT
• Learn translation from textual data
– Parallel Data
• Language independent
• Normally use probabilistic models
– The best translation = the most probable translation
e* = argmax_e P(e|f), where f is the source sentence and e a candidate translation
• State of the art for most language pairs
– Best systems include rules (hybrid)
5
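The decision rule above amounts to scoring every candidate translation and keeping the argmax; a toy sketch, with made-up log-probabilities (the candidate strings echo the phrase-table examples on a later slide):

```python
import math

# Hypothetical candidates e for one source sentence f, each with a
# made-up model log-probability log P(e|f) (illustrative numbers only).
candidates = {
    "the proposal": math.log(0.62),
    "a proposal": math.log(0.03),
    "the idea": math.log(0.025),
}

# e* = argmax_e P(e|f): the best translation is the most probable one.
best = max(candidates, key=candidates.get)
```

In a real system the log-probability is replaced by a weighted sum of many feature scores, but the argmax over candidates is the same.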
translation
model
Statistical Machine Translation
6
[Diagram: a training pipeline consumes monolingual & bilingual training data and produces the translation model; the decoder uses that model to turn an input sentence into a translation]
Translation Data
Parallel Text:
(Web, United Nations, European/Canadian Parliament,
Wikipedia, etc.)
Statistical Machine Translation (SMT)
8
Aligned Words (En-Zh)
[Figure: word-level alignment between the Chinese sentence 我们 十分 关注 非洲 地区 发生 的 事情 and the English sentence "we are very much concerned with what happens in African region"]
Learn alignment from parallel text
Statistical Machine Translation (SMT)
9
Aligned Words (En-Zh) → Translation rules
[Figure: the same Chinese-English word alignment as the previous slide]
Learn alignment from parallel text
Id Source Target Weight
r1 关注 X_1 concerned with X_1 -5.3
r2 X_1 发生 X_2事情 what happens X_2 X_1 -4.8
r3 非洲 地区 African region -3.1
Learn weighted translation rules from word aligned text
Translation Rules (phrase-pairs)
10
Source Target p(e|f)
den Vorschlag the proposal 0.6227
den Vorschlag ‘s proposal 0.1068
den Vorschlag a proposal 0.0341
den Vorschlag the idea 0.0250
den Vorschlag this proposal 0.0227
den Vorschlag proposal 0.0205
den Vorschlag of the proposal 0.0159
den Vorschlag the proposals 0.0159
* German-English phrase table trained on Europarl
(Millions of translation rules; scores are often stored as log probabilities, e.g. -1.7986)
translation
model
Statistical Machine Translation (SMT)
11
e* = argmax_e P(e|f) = argmax over derivations d with e = yield(d) of Σ_{r ∈ d} w · h(r)
Aligned Words
EnZh
Translation rules
Decoder
[Figure: the same Chinese-English word alignment]
Learn alignment from parallel text
Id Source Target Weight
r1 关注 X_1 concerned with X_1 -5.3
r2 X_1 发生 X_2事情 what happens X_2 X_1 -4.8
r3 非洲 地区 African region -3.1
Learn weighted translation rules from word aligned text
Decoder generates many candidate translations,
scores them and returns the most likely one
Find the translation e for any given input f
Measuring Translation Quality:
BLEU score
• BLEU is a simple but effective scoring metric, shown to
correlate with human judgments of translation quality
• The idea is to measure overlap between the
translation generated by MT system and the
reference translation
• Measure one-word overlaps, two-word overlaps, … (n-grams)
• Compute a precision score for each n-gram order
• Impose a brevity penalty on candidates that are shorter than the reference
12
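The n-gram precisions and brevity penalty described above can be sketched in a few lines. This is a simplified sentence-level BLEU; real BLEU is computed at corpus level and handles zero n-gram counts with smoothing rather than the small floor used here:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, with counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_prec_sum += math.log(max(overlap, 1e-9) / total)
    # Brevity penalty for candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec_sum / max_n)
```

A candidate identical to the reference scores 1.0; dropping words lowers both the n-gram precisions and the brevity penalty.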
Measuring Translation Quality:
BLEU score
• Input:
– Ich war in meinen zwangzigern bevor ich erstmals in
ein kunstmuseum ging .
• Reference translation:
– I was in my twenties before I ever went to an art
museum .
• Low BLEU score (41.1):
– I was twenty I ever went to art .
• High BLEU score (89.0):
– I was in my twenties before I first went to an art
museum .
13
Hierarchical Phrase-based
Translation (Hiero)
SCFG
Hierarchical Phrase-based Translation
Synchronous Context-Free Grammar
15
Aligned Words
EnZh
Translation Rules
X -> <我们十分X_1 / we are very much X_1>
X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>
X -> <事情 / what >
X -> <非洲 地区 / african region >
[Figure: a Hiero derivation over the source 我们 十分 关注 非洲 地区 发生 的 事情 yielding "we are very much concerned with what happens in african region"]
translation
model
Decoder
Hiero Decoder
O(n^3) bottom-up dynamic programming (CKY-style), with LM computation at every step
Source: 我们 十分 关注 非洲 地区 发生 的 事情 。
Output: we are very much concerned with what happens in african regions .
[Figure: chart cells combining X_1 = african region and X_2 = what via the rule X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>, with an LM query at each combination]
16
Left-to-Right Hierarchical
Phrase-based Translation System
Left-to-Right Target Generation
(Watanabe et al. 2006)
18
[Figure: paired source/target derivation trees; the target side is generated strictly left to right, "we are very much" → "concerned with" → "what happens X_2 X_1" → "in african region", mirroring the source-side spans 我们十分, 关注, X_1 发生 X_2 事情, and 非洲 地区]
X -> <我们十分 X_1 / we are very much X_1>
X -> <X_1 发生 X_2事情 / what happens X_2 X_1>
X -> < 关注 X_1 / concerned with X_1>
X -> <X_1 发生 的 X_2 / X_2 happens in X_1>   (non-GNF)
Greibach Normal Form
(GNF)
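A rule is usable by the left-to-right decoder when its target side is prefix-lexicalized (the GNF-like form above): a terminal string followed only by non-terminals. A minimal check, assuming non-terminals are written X_1, X_2, … as in the slides:

```python
def is_gnf(target_side):
    """True if the target side is prefix-lexicalized (GNF-like):
    terminals first, then only non-terminals, e.g. 'what happens X_2 X_1'.
    Non-terminal tokens are assumed to look like 'X_1', 'X_2', ..."""
    toks = target_side.split()
    seen_nonterminal = False
    for tok in toks:
        if tok.startswith("X_"):
            seen_nonterminal = True
        elif seen_nonterminal:
            # A terminal after a non-terminal breaks the GNF form.
            return False
    # The target must also begin with at least one terminal.
    return len(toks) > 0 and not toks[0].startswith("X_")
```

On the slide's examples, "what happens X_2 X_1" passes while "X_2 happens in X_1" (non-terminal first) and "concerned with X_2 happens in X_1" (terminals after a non-terminal) do not.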
• Search for sub-phrases within larger ones
– Smaller phrases are replaced by non-terminal X
• Dynamic programming algorithm to extract rules for LR-Hiero
– Linear time complexity (in the number of rules)
LR-Hiero Rule Extraction
19
<我们十分X_1 / we are very much X_1>
[Figure: the word-aligned sentence pair with the span covered by X_1 boxed, yielding the rule above]
X_1
• Search for sub-phrases within larger ones
– Smaller phrases are replaced by non-terminal X
• A novel dynamic programming algorithm to extract rules
for LR-Hiero
– Linear time complexity vs. exhaustive search
LR-Hiero Rule Extraction
20
<我们十分X_1 / we are very much X_1>
[Figure: the word-aligned sentence pair with two boxed sub-spans X_1 and X_2, yielding the rule below]
< X_1 发生 X_2事情 / what happens X_2 X_1>
• Linear time complexity vs. exhaustive search
• Can easily extract rules with more non-terminals
LR-Hiero Rule Extraction
21
[Chart: effect of the number of non-terminals (1 to 4) on extraction time (0 to 4000 sec.) for the Hiero heuristic vs. the DP extractor]
Expressive Hierarchical Rule Extraction for Left-to-Right Translation. M. Siahbani and A.
Sarkar. AMTA(2014)
Left-to-Right Decoding
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
Rules:
X -> <我们十分 X_1 / we are very much X_1>
X -> < 关注 X_1 / concerned with X_1>
X -> <X_1 发生 X_2事情 / what happens X_2 X_1>
X -> <的 / in >
X -> <非洲 地区 / African region >
Derivation (target prefix + uncovered source spans):
<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region
22
Left-to-Right Decoding
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region
Complexity: O(n^2), vs. typical CKY: O(n^3)
23
t* = argmax over derivations d with t = yield(d) of Σ_{r ∈ d} w · f(r)
Candidate translations are scored by their weighted rule features:
<我们十分 X_1 / we are very much X_1>, -4.7
< 关注 X_1 / concerned with X_1>, -3.8
<X_1 发生 X_2事情 / what happens X_2 X_1>, -3.6
<的 / in >, -1.2
<非洲 地区 / African region >, -2.7
Cumulative hypothesis scores: 0, -3.3, -4.5, -5.9, -7.1, -7.7
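The derivation above can be sketched as a handful of rule applications over (target prefix, uncovered spans, score) triples. The span bookkeeping follows the slide; treating the score as a plain sum of the listed rule weights is my simplification, since the real model also adds LM and other feature scores (which is why the slide's cumulative numbers differ):

```python
# A partial hypothesis: (target prefix, ordered uncovered source spans,
# cumulative score). Weights below are the slide's illustrative rule scores.
def apply_rule(hyp, terminals, child_spans, weight):
    prefix, spans, score = hyp
    # GNF rules always expand the FIRST uncovered span; the rule's
    # non-terminals leave child_spans uncovered, in target order.
    return (prefix + " " + terminals, child_spans + list(spans[1:]), score + weight)

h = ("<s>", [(0, 8)], 0.0)
h = apply_rule(h, "we are very much", [(2, 8)], -4.7)      # <我们十分 X_1 / ...>
h = apply_rule(h, "concerned with", [(3, 8)], -3.8)        # <关注 X_1 / ...>
h = apply_rule(h, "what happens", [(6, 7), (3, 5)], -3.6)  # <X_1 发生 X_2事情 / ...>
h = apply_rule(h, "in", [], -1.2)                          # <的 / in>
h = apply_rule(h, "African region", [], -2.7)              # <非洲 地区 / ...>
```

After the last rule the span list is empty and the prefix is the complete translation.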
LR-Hiero vs. the State-of-the-art
[Chart: BLEU (translation accuracy) vs. LM calls (translation time) for Czech-English, German-English, and Chinese-English]
LR-Hiero Results
3 Times Faster
Comparable Translation Accuracy
Statistical Machine Translation (SMT)
• Available SMT systems:
– Moses (Edinburgh)
– Phrasal (Stanford)
– Jane 2 (Aachen University)
– Joshua (JHU)
– Kriya (SFU)
– CDEC (CMU)
– LR-Hiero
Phrase-Based
Hierarchical
Phrase-Based
(Hiero)
Left-to-Right Hierarchical
Phrase-based
Available: https://github.com/sfu-natlang/lrhiero
• Time efficient
• Can model complex translation
• Generates translation in left-to-right
manner
• Suitable choice for online translation
Simultaneous Translation
Speech to Speech Translation
Karlsruhe (KIT)
Lecture Translator
NICT Speech Translator Skype Translator
Incremental Translation
• Facilitate continuous translation with low
latency
– Latency: time difference between start of source
sentence (speech) and start of target sentence
(speech)
• Ensure acceptable translation accuracy
Good evening, I would like
a taxi to the airport please
Buenas noches. Quiero un
taxi al aeropuerto por favor
6 sec
Good evening, I would
0.7 sec
0.2 sec
0.2 sec
like a taxi
to the airport please
Non-incremental
Buenas noches quiero
como un taxi
al aeropuerto por favor
Incremental
Integrating Segmentation with Translation Process
[Figure: as words stream in ("Good", "Good evening", …), a segmenter repeatedly decides whether to segment; once "Good evening" is segmented it is sent to the translator, which outputs "Buenas noches"]
Incremental Translation Results
(BLEU is the translation accuracy measure)
• Task: English-German TED speech translation
• MT System Training Data: IWSLT 2013 Train data +
Europarl v7 data [Koehn 2005]
BLEU Latency (sec) Segs/Second
Non-incremental 21.08 6.353 0.15
Prosodic 20.88 0.468 2.27
Incremental 20.86 0.311 3.22
Publications
33
• Efficient Left-to-Right Hierarchical Phrase-Based Translation
with Improved Reordering. Siahbani, Maryam and
Sankaran, Baskaran and Sarkar, Anoop. EMNLP (2013)
• Two Improvements to Left-to-Right Decoding for
Hierarchical Phrase-based Machine Translation. Siahbani,
Maryam and Sarkar, Anoop. EMNLP(2014)
• Expressive Hierarchical Rule Extraction for Left-to-Right
Translation. Siahbani, Maryam and Sarkar, Anoop.
AMTA(2014)
• Incremental Translation using a Hierarchical Phrase-based
Translation System. Siahbani, Maryam and Mehdizadeh
Seraj, Ramtin and Sankaran, Baskaran and Sarkar, Anoop. SLT (2014)
Questions?
Partial Hypothesis
<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7][3,5], -7.1
LR-Decoding with Beam Search
• LR-Decoding integrated with beam-search
(Watanabe et al. 2006)
• Stacks: hypotheses with same number of source side
words covered
• Exhaustively generates all possible partial
hypotheses for a given stack
36
Cube pruning
• Each cube: a group of hypotheses and applicable
rules
• Cubes are fed to a priority queue which fills the
current stack
37
• Rows: hypotheses
• Columns: rules
• Rows and columns are sorted based on the scores
• Assumption: The best hypothesis is in the top left
– The next best are the
neighbours of this entry
Cube pruning
38
Example cube (cell scores include the LM; lower is better):
                               made (0.9)   done (1.1)   do (3.2)
students have not yet (10.2):  12.5         12.4         14.3
pupils have not yet (11.5):    12.6         12.8         14.7
student has not (12.7):        13.3         13.5         15.4
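The pop-the-neighbours loop behind cube pruning can be sketched with a priority queue. This is a minimal, hypothetical version that assumes the combined score is simply hypothesis cost plus rule cost, the additive case in which the top-left assumption actually holds (adding an LM term is exactly what breaks it on the later slides):

```python
import heapq

def cube_best_first(hyp_scores, rule_scores, k):
    """Lazily enumerate hypothesis x rule combinations in best-first
    order, assuming combined score = hyp cost + rule cost and both
    input lists are sorted ascending (lower = better)."""
    seen = {(0, 0)}
    # Start at the top-left corner: best hypothesis with best rule.
    frontier = [(hyp_scores[0] + rule_scores[0], 0, 0)]
    out = []
    while frontier and len(out) < k:
        score, i, j = heapq.heappop(frontier)
        out.append(score)
        # Push the down and right neighbours of the popped cell.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(hyp_scores) and nj < len(rule_scores) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(frontier, (hyp_scores[ni] + rule_scores[nj], ni, nj))
    return out
```

With purely additive costs the k scores come out in globally sorted order; once an LM score is added per cell, the popped order can miss better entries, which motivates queue diversity.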
Time Efficiency: avg of LM queries
Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering.
M. Siahbani, B. Sankaran and A. Sarkar. EMNLP(2013) 39
Watanabe et al. (2006)
Reordering Features
• LR-Hiero by (Watanabe et al. 2006) achieves ~2 BLEU
scores less than Hiero
40
Watanabe et al. (2006)
Reordering Features
• Distortion feature (computed as each rule is applied)
• Number of reordering rules (rules whose non-terminals are
reordered between source and target side)
41
r<> = 1 : <X_1 发生 X_2事情 / what happens X_2 X_1>   (non-terminals reordered)
r<> = 0 : <X_1 发生 X_2事情 / what happens X_1 X_2>   (monotone)
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
d =(5-3) + (7-6) + (8-6) + (7-3) + (8-5)
Translation Quality
Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering.
M. Siahbani, B. Sankaran and A. Sarkar. EMNLP(2013) 42
Watanabe et al. (2006)
Search Error in Cube Pruning
43
[Figure: two cubes; in each, rows are hypotheses (scores 6.6/6.7/6.9 and 6.2/6.3/6.5), columns are rules (scores 0.9/1.3/3.2 and 1.0/1.3/1.5), and cell scores include the LM, so the best cells (8.0 and 7.7) are not at the top-left corner]
• Assumption: The best hypothesis is in the top left
– The next best are the neighbours of this entry
• Adding LM score violates the assumption
Search Error in Cube Pruning
44
• Assumption: The best hypothesis is in the top left
– The next best are the neighbours of this entry
• Adding LM score violates the assumption
[Figure: the same two cubes; popping only neighbours of the top-left corner misses the best cells (8.0 and 7.7) unless more candidates per cube are kept in the queue]
Queue
diversity
Queue Diversity
Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine
Translation. M. Siahbani and A. Sarkar. EMNLP(2014) 45
[Charts: Chinese-English BLEU score (23.5 to 26.5) and number of LM calls (0 to 40,000) for LR-Hiero, LR-Hiero+CP, and LR-Hiero+CP (QD=10)]
Lexicalized Reordering Model
• Distortion penalty is weak
– it only penalizes deviation from the monotone translation order
• Learn reordering preferences for each phrase
(with respect to the previous phrase)
– Monotone
– Swap
– Discontinuous
46
[Figure from "Statistical Machine Translation", Koehn 2010: alignment matrix (F x E) illustrating monotone, swap, and discontinuous orientations]
Lexicalized Reordering Model
• Collect orientation information during rule extraction
– Convert each rule to a phrase-pair (possibly discontinuous)
– M (monotone): if there is a phrase-pair adjacent on the top-left
– S (swap): if there is a phrase-pair adjacent on the top-right
– D (discontinuous): otherwise
• Estimation by relative frequency
P_o(orientation | e, f) = count(orientation, e, f) / Σ_o count(o, e, f)
47
[Figure from "Statistical Machine Translation", Koehn 2010: orientation counts collected from the alignment matrix]
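The relative-frequency estimate above is easy to state in code. A minimal sketch, assuming orientations have been collected during rule extraction as (orientation, e, f) triples with orientation in {M, S, D}:

```python
from collections import Counter

def estimate_reordering(observations):
    """Relative-frequency estimate of P(orientation | e, f).
    observations: iterable of (orientation, e, f) triples, where
    orientation is one of 'M' (monotone), 'S' (swap), 'D' (discontinuous)."""
    joint = Counter(observations)      # count(orientation, e, f)
    totals = Counter()                 # sum over orientations: count(o, e, f)
    for (o, e, f), c in joint.items():
        totals[(e, f)] += c
    # Divide each joint count by the total for its phrase-pair.
    return {key: c / totals[key[1:]] for key, c in joint.items()}
```

In practice these counts are smoothed before use, but the estimator has exactly this shape.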
Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Editor's Notes

  • #9: In statistical machine translation we are basically looking for a translation sentence e which maximizes the probability of e given the source sentence f. Statistical approaches to machine translation have achieved impressive performance by leveraging large amounts of parallel corpora. However, such data are available only for a few dozen language pairs in limited domains. Currently we just have parallel data for a few language pairs, like French-English and Arabic-English, but more than 5000 languages are spoken in the world, and we do not have parallel data between most of them.
  • #20: Hiero uses a simple rule extraction algorithm based on word alignments. To avoid excessively large grammars, constraints are applied on the length of phrase-pairs and on rule configuration. Each phrase-pair is assumed to have unit count, which is uniformly distributed as fractional counts over all rules extracted from it.
  • #23: Left-to-right decoding is a potential alternative. It is an Earley-style decoder which generates the target side in left-to-right order. Each partial hypothesis consists of a partial translation and a sequence of uncovered spans on the source side. It is a faster decoder compared to CKY.
  • #29: In incremental translation we need to optimize two criteria: facilitate continuous translation with low latency (latency: the time difference between the start of the source speech and the start of the target speech), and ensure acceptable translation accuracy.