Left-to-Right Hierarchical Phrase-based Translation System (LR-Hiero)
Maryam Siahbani
Overview
• History of Machine Translation
• Rule-based MT
• Statistical MT
– Training
– Decoding
• Left-to-Right Hierarchical Phrase-based MT
• Using LR-Hiero in Simultaneous Translation
2
History of Machine Translation
• Late 1940’s: Early rule-based systems
– computers would replace human translators within
5 years!
• 1966: ALPAC report cuts research funding
• Early 1970’s: First commercial system (Systran)
• Late 1980’s: IBM developed first statistical
models inspired by speech research
• Late 2000’s: Explosion in MT research
• 2006: First version of Google Translate
3
Rule-based Machine Translation
• Rules hand-written by linguists
• State of the art until early 2000’s
– e.g. Systran
• Expensive to create, maintain, and adapt
4
[Figure: parse-tree transfer example. French: NP → Noun (chat) + Adjective (noir); English: NP → Adjective (black) + Noun (cat)]
Statistical Machine Translation
• Data-driven approaches to MT
• Learn translation from textual data
– Parallel Data
• Language independent
• Normally use probabilistic models
– The best translation = the most probable translation
e* = argmax_e P(e|f), where f is the source sentence and e a candidate translation
• State of the art for most language pairs
– Best systems include rules (hybrid)
5
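The decision rule above amounts to scoring every candidate translation and keeping the argmax; a toy sketch, with made-up log-probabilities (the candidate strings echo the phrase-table examples on a later slide):

```python
import math

# Hypothetical candidates e for one source sentence f, each with a
# made-up model log-probability log P(e|f) (illustrative numbers only).
candidates = {
    "the proposal": math.log(0.62),
    "a proposal": math.log(0.03),
    "the idea": math.log(0.025),
}

# e* = argmax_e P(e|f): the best translation is the most probable one.
best = max(candidates, key=candidates.get)
```

In a real system the log-probability is replaced by a weighted sum of many feature scores, but the argmax over candidates is the same.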
translation
model
Statistical Machine Translation
6
[Diagram: a training pipeline consumes monolingual & bilingual training data and produces the translation model; the decoder uses that model to turn an input sentence into a translation]
Translation Data
Parallel Text:
(Web, United Nations, European/Canadian Parliament,
Wikipedia, etc.)
Statistical Machine Translation (SMT)
8
Aligned Words (En-Zh)
[Figure: word-level alignment between the Chinese sentence 我们 十分 关注 非洲 地区 发生 的 事情 and the English sentence "we are very much concerned with what happens in African region"]
Learn alignment from parallel text
Statistical Machine Translation (SMT)
9
Aligned Words (En-Zh) → Translation rules
[Figure: the same Chinese-English word alignment as the previous slide]
Learn alignment from parallel text
Id Source Target Weight
r1 关注 X_1 concerned with X_1 -5.3
r2 X_1 发生 X_2事情 what happens X_2 X_1 -4.8
r3 非洲 地区 African region -3.1
Learn weighted translation rules from word aligned text
Translation Rules (phrase-pairs)
10
Source Target p(e|f)
den Vorschlag the proposal 0.6227
den Vorschlag ‘s proposal 0.1068
den Vorschlag a proposal 0.0341
den Vorschlag the idea 0.0250
den Vorschlag this proposal 0.0227
den Vorschlag proposal 0.0205
den Vorschlag of the proposal 0.0159
den Vorschlag the proposals 0.0159
* German-English phrase table trained on Europarl
(Millions of translation rules; scores are often stored as log probabilities, e.g. -1.7986)
translation
model
Statistical Machine Translation (SMT)
11
e* = argmax_e P(e|f) = argmax over derivations d with e = yield(d) of Σ_{r ∈ d} w · h(r)
Aligned Words
EnZh
Translation rules
Decoder
[Figure: the same Chinese-English word alignment]
Learn alignment from parallel text
Id Source Target Weight
r1 关注 X_1 concerned with X_1 -5.3
r2 X_1 发生 X_2事情 what happens X_2 X_1 -4.8
r3 非洲 地区 African region -3.1
Learn weighted translation rules from word aligned text
Decoder generates many candidate translations,
scores them and returns the most likely one
Find the translation e for any given input f
Measuring Translation Quality:
BLEU score
• BLEU is a simple but effective scoring metric, shown to
correlate with human judgments of translation quality
• The idea is to measure overlap between the
translation generated by MT system and the
reference translation
• Measure one-word overlaps, two-word overlaps, … (n-grams)
• Compute a precision score for each n-gram order
• Impose a brevity penalty on candidates that are shorter than the reference
12
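The n-gram precisions and brevity penalty described above can be sketched in a few lines. This is a simplified sentence-level BLEU; real BLEU is computed at corpus level and handles zero n-gram counts with smoothing rather than the small floor used here:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, with counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_prec_sum += math.log(max(overlap, 1e-9) / total)
    # Brevity penalty for candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec_sum / max_n)
```

A candidate identical to the reference scores 1.0; dropping words lowers both the n-gram precisions and the brevity penalty.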
Measuring Translation Quality:
BLEU score
• Input:
– Ich war in meinen zwangzigern bevor ich erstmals in
ein kunstmuseum ging .
• Reference translation:
– I was in my twenties before I ever went to an art
museum .
• Low BLEU score (41.1):
– I was twenty I ever went to art .
• High BLEU score (89.0):
– I was in my twenties before I first went to an art
museum .
13
Hierarchical Phrase-based
Translation (Hiero)
SCFG
Hierarchical Phrase-based Translation
Synchronous Context-Free Grammar
15
Aligned Words
EnZh
Translation Rules
X -> <我们十分X_1 / we are very much X_1>
X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>
X -> <事情 / what >
X -> <非洲 地区 / african region >
[Figure: a Hiero derivation over the source 我们 十分 关注 非洲 地区 发生 的 事情 yielding "we are very much concerned with what happens in african region"]
translation
model
Decoder
Hiero Decoder
O(n^3) bottom-up dynamic programming (CKY-style), with LM computation at every step
Source: 我们 十分 关注 非洲 地区 发生 的 事情 。
Output: we are very much concerned with what happens in african regions .
[Figure: chart cells combining X_1 = african region and X_2 = what via the rule X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>, with an LM query at each combination]
16
Left-to-Right Hierarchical
Phrase-based Translation System
Left-to-Right Target Generation
(Watanabe et al. 2006)
18
[Figure: paired source/target derivation trees; the target side is generated strictly left to right, "we are very much" → "concerned with" → "what happens X_2 X_1" → "in african region", mirroring the source-side spans 我们十分, 关注, X_1 发生 X_2 事情, and 非洲 地区]
X -> <我们十分 X_1 / we are very much X_1>
X -> <X_1 发生 X_2事情 / what happens X_2 X_1>
X -> < 关注 X_1 / concerned with X_1>
X -> <X_1 发生 的 X_2 / X_2 happens in X_1>   (non-GNF)
Greibach Normal Form
(GNF)
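A rule is usable by the left-to-right decoder when its target side is prefix-lexicalized (the GNF-like form above): a terminal string followed only by non-terminals. A minimal check, assuming non-terminals are written X_1, X_2, … as in the slides:

```python
def is_gnf(target_side):
    """True if the target side is prefix-lexicalized (GNF-like):
    terminals first, then only non-terminals, e.g. 'what happens X_2 X_1'.
    Non-terminal tokens are assumed to look like 'X_1', 'X_2', ..."""
    toks = target_side.split()
    seen_nonterminal = False
    for tok in toks:
        if tok.startswith("X_"):
            seen_nonterminal = True
        elif seen_nonterminal:
            # A terminal after a non-terminal breaks the GNF form.
            return False
    # The target must also begin with at least one terminal.
    return len(toks) > 0 and not toks[0].startswith("X_")
```

On the slide's examples, "what happens X_2 X_1" passes while "X_2 happens in X_1" (non-terminal first) and "concerned with X_2 happens in X_1" (terminals after a non-terminal) do not.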
• Search for sub-phrases within larger ones
– Smaller phrases are replaced by non-terminal X
• Dynamic programming algorithm to extract rules for LR-Hiero
– Linear time complexity (in the number of rules)
LR-Hiero Rule Extraction
19
<我们十分X_1 / we are very much X_1>
[Figure: the word-aligned sentence pair with the span covered by X_1 boxed, yielding the rule above]
X_1
• Search for sub-phrases within larger ones
– Smaller phrases are replaced by non-terminal X
• A novel dynamic programming algorithm to extract rules
for LR-Hiero
– Linear time complexity vs. exhaustive search
LR-Hiero Rule Extraction
20
<我们十分X_1 / we are very much X_1>
[Figure: the word-aligned sentence pair with two boxed sub-spans X_1 and X_2, yielding the rule below]
< X_1 发生 X_2事情 / what happens X_2 X_1>
• Linear time complexity vs. exhaustive search
• Can easily extract rules with more non-terminals
LR-Hiero Rule Extraction
21
[Chart: effect of the number of non-terminals (1 to 4) on extraction time (0 to 4000 sec.) for the Hiero heuristic vs. the DP extractor]
Expressive Hierarchical Rule Extraction for Left-to-Right Translation. M. Siahbani and A.
Sarkar. AMTA(2014)
Left-to-Right Decoding
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
Rules:
X -> <我们十分 X_1 / we are very much X_1>
X -> < 关注 X_1 / concerned with X_1>
X -> <X_1 发生 X_2事情 / what happens X_2 X_1>
X -> <的 / in >
X -> <非洲 地区 / African region >
Derivation (target prefix + uncovered source spans):
<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region
22
Left-to-Right Decoding
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region
Complexity: O(n^2), vs. typical CKY: O(n^3)
23
t* = argmax over derivations d with t = yield(d) of Σ_{r ∈ d} w · f(r)
Candidate translations are scored by their weighted rule features:
<我们十分 X_1 / we are very much X_1>, -4.7
< 关注 X_1 / concerned with X_1>, -3.8
<X_1 发生 X_2事情 / what happens X_2 X_1>, -3.6
<的 / in >, -1.2
<非洲 地区 / African region >, -2.7
Cumulative hypothesis scores: 0, -3.3, -4.5, -5.9, -7.1, -7.7
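The derivation above can be sketched as a handful of rule applications over (target prefix, uncovered spans, score) triples. The span bookkeeping follows the slide; treating the score as a plain sum of the listed rule weights is my simplification, since the real model also adds LM and other feature scores (which is why the slide's cumulative numbers differ):

```python
# A partial hypothesis: (target prefix, ordered uncovered source spans,
# cumulative score). Weights below are the slide's illustrative rule scores.
def apply_rule(hyp, terminals, child_spans, weight):
    prefix, spans, score = hyp
    # GNF rules always expand the FIRST uncovered span; the rule's
    # non-terminals leave child_spans uncovered, in target order.
    return (prefix + " " + terminals, child_spans + list(spans[1:]), score + weight)

h = ("<s>", [(0, 8)], 0.0)
h = apply_rule(h, "we are very much", [(2, 8)], -4.7)      # <我们十分 X_1 / ...>
h = apply_rule(h, "concerned with", [(3, 8)], -3.8)        # <关注 X_1 / ...>
h = apply_rule(h, "what happens", [(6, 7), (3, 5)], -3.6)  # <X_1 发生 X_2事情 / ...>
h = apply_rule(h, "in", [], -1.2)                          # <的 / in>
h = apply_rule(h, "African region", [], -2.7)              # <非洲 地区 / ...>
```

After the last rule the span list is empty and the prefix is the complete translation.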
LR-Hiero vs. the State-of-the-art
[Chart: BLEU (translation accuracy) vs. LM calls (translation time) for Czech-English, German-English, and Chinese-English]
LR-Hiero Results
3 Times Faster
Comparable Translation Accuracy
Statistical Machine Translation (SMT)
• Available SMT systems:
– Moses (Edinburgh)
– Phrasal (Stanford)
– Jane 2 (Aachen University)
– Joshua (JHU)
– Kriya (SFU)
– CDEC (CMU)
– LR-Hiero
Phrase-Based
Hierarchical
Phrase-Based
(Hiero)
Left-to-Right Hierarchical
Phrase-based
Available: https://github.com/sfu-natlang/lrhiero
• Time efficient
• Can model complex translation
• Generates translation in left-to-right
manner
• Suitable choice for online translation
Simultaneous Translation
Speech to Speech Translation
Karlsruhe (KIT)
Lecture Translator
NICT Speech Translator Skype Translator
Incremental Translation
• Facilitate continuous translation with low
latency
– Latency: time difference between start of source
sentence (speech) and start of target sentence
(speech)
• Ensure acceptable translation accuracy
Good evening, I would like
a taxi to the airport please
Buenas noches. Quiero un
taxi al aeropuerto por favor
6 sec
Good evening, I would
0.7 sec
0.2 sec
0.2 sec
like a taxi
to the airport please
Non-incremental
Buenas noches quiero
como un taxi
al aeropuerto por favor
Incremental
Integrating Segmentation with Translation Process
[Figure: as words stream in ("Good", "Good evening", …), a segmenter repeatedly decides whether to segment; once "Good evening" is segmented it is sent to the translator, which outputs "Buenas noches"]
Incremental Translation Results
(BLEU is the translation accuracy measure)
• Task: English-German TED speech translation
• MT System Training Data: IWSLT 2013 Train data +
Europarl v7 data [Koehn 2005]
BLEU Latency (sec) Segs/Second
Non-incremental 21.08 6.353 0.15
Prosodic 20.88 0.468 2.27
Incremental 20.86 0.311 3.22
Publications
33
• Efficient Left-to-Right Hierarchical Phrase-Based Translation
with Improved Reordering. Siahbani, Maryam and
Sankaran, Baskaran and Sarkar, Anoop. EMNLP (2013)
• Two Improvements to Left-to-Right Decoding for
Hierarchical Phrase-based Machine Translation. Siahbani,
Maryam and Sarkar, Anoop. EMNLP(2014)
• Expressive Hierarchical Rule Extraction for Left-to-Right
Translation. Siahbani, Maryam and Sarkar, Anoop.
AMTA(2014)
• Incremental Translation using a Hierarchical Phrase-based
Translation System. Siahbani, Maryam and Mehdizadeh
Seraj, Ramtin and Sankaran, Baskaran and Sarkar, Anoop. SLT (2014)
Questions?
Partial Hypothesis
<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7][3,5], -7.1
LR-Decoding with Beam Search
• LR-Decoding integrated with beam-search
(Watanabe et al. 2006)
• Stacks: hypotheses with same number of source side
words covered
• Exhaustively generates all possible partial
hypotheses for a given stack
36
Cube pruning
• Each cube: a group of hypotheses and applicable
rules
• Cubes are fed to a priority queue which fills the
current stack
37
• Rows: hypotheses
• Columns: rules
• Rows and columns are sorted based on the scores
• Assumption: The best hypothesis is in the top left
– The next best are the
neighbours of this entry
Cube pruning
38
Example cube (cell scores include the LM; lower is better):
                               made (0.9)   done (1.1)   do (3.2)
students have not yet (10.2):  12.5         12.4         14.3
pupils have not yet (11.5):    12.6         12.8         14.7
student has not (12.7):        13.3         13.5         15.4
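The pop-the-neighbours loop behind cube pruning can be sketched with a priority queue. This is a minimal, hypothetical version that assumes the combined score is simply hypothesis cost plus rule cost, the additive case in which the top-left assumption actually holds (adding an LM term is exactly what breaks it on the later slides):

```python
import heapq

def cube_best_first(hyp_scores, rule_scores, k):
    """Lazily enumerate hypothesis x rule combinations in best-first
    order, assuming combined score = hyp cost + rule cost and both
    input lists are sorted ascending (lower = better)."""
    seen = {(0, 0)}
    # Start at the top-left corner: best hypothesis with best rule.
    frontier = [(hyp_scores[0] + rule_scores[0], 0, 0)]
    out = []
    while frontier and len(out) < k:
        score, i, j = heapq.heappop(frontier)
        out.append(score)
        # Push the down and right neighbours of the popped cell.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(hyp_scores) and nj < len(rule_scores) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(frontier, (hyp_scores[ni] + rule_scores[nj], ni, nj))
    return out
```

With purely additive costs the k scores come out in globally sorted order; once an LM score is added per cell, the popped order can miss better entries, which motivates queue diversity.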
Time Efficiency: avg of LM queries
Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering.
M. Siahbani, B. Sankaran and A. Sarkar. EMNLP(2013) 39
Watanabe et al. (2006)
Reordering Features
• LR-Hiero by (Watanabe et al. 2006) achieves ~2 BLEU
scores less than Hiero
40
Watanabe et al. (2006)
Reordering Features
• Distortion feature (computed as each rule is applied)
• Number of reordering rules (rules whose non-terminals are
reordered between source and target side)
41
r<> = 1 : <X_1 发生 X_2事情 / what happens X_2 X_1>   (non-terminals reordered)
r<> = 0 : <X_1 发生 X_2事情 / what happens X_1 X_2>   (monotone)
Source (span indices 0-8): 我们 十分 关注 非洲 地区 发生 的 事情
d =(5-3) + (7-6) + (8-6) + (7-3) + (8-5)
Translation Quality
Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering.
M. Siahbani, B. Sankaran and A. Sarkar. EMNLP(2013) 42
Watanabe et al. (2006)
Search Error in Cube Pruning
43
[Figure: two cubes; in each, rows are hypotheses (scores 6.6/6.7/6.9 and 6.2/6.3/6.5), columns are rules (scores 0.9/1.3/3.2 and 1.0/1.3/1.5), and cell scores include the LM, so the best cells (8.0 and 7.7) are not at the top-left corner]
• Assumption: The best hypothesis is in the top left
– The next best are the neighbours of this entry
• Adding LM score violates the assumption
Search Error in Cube Pruning
44
• Assumption: The best hypothesis is in the top left
– The next best are the neighbours of this entry
• Adding LM score violates the assumption
[Figure: the same two cubes; popping only neighbours of the top-left corner misses the best cells (8.0 and 7.7) unless more candidates per cube are kept in the queue]
Queue
diversity
Queue Diversity
Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine
Translation. M. Siahbani and A. Sarkar. EMNLP(2014) 45
[Charts: Chinese-English BLEU score (23.5 to 26.5) and number of LM calls (0 to 40,000) for LR-Hiero, LR-Hiero+CP, and LR-Hiero+CP (QD=10)]
Lexicalized Reordering Model
• Distortion penalty is weak
– it only penalizes deviation from the monotone translation order
• Learn reordering preferences for each phrase
(with respect to the previous phrase)
– Monotone
– Swap
– Discontinuous
46
[Figure from "Statistical Machine Translation", Koehn 2010: alignment matrix (F x E) illustrating monotone, swap, and discontinuous orientations]
Lexicalized Reordering Model
• Collect orientation information during rule extraction
– Convert each rule to a phrase-pair (possibly discontinuous)
– M (monotone): if there is a phrase-pair adjacent on the top-left
– S (swap): if there is a phrase-pair adjacent on the top-right
– D (discontinuous): otherwise
• Estimation by relative frequency
P_o(orientation | e, f) = count(orientation, e, f) / Σ_o count(o, e, f)
47
[Figure from "Statistical Machine Translation", Koehn 2010: orientation counts collected from the alignment matrix]
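The relative-frequency estimate above is easy to state in code. A minimal sketch, assuming orientations have been collected during rule extraction as (orientation, e, f) triples with orientation in {M, S, D}:

```python
from collections import Counter

def estimate_reordering(observations):
    """Relative-frequency estimate of P(orientation | e, f).
    observations: iterable of (orientation, e, f) triples, where
    orientation is one of 'M' (monotone), 'S' (swap), 'D' (discontinuous)."""
    joint = Counter(observations)      # count(orientation, e, f)
    totals = Counter()                 # sum over orientations: count(o, e, f)
    for (o, e, f), c in joint.items():
        totals[(e, f)] += c
    # Divide each joint count by the total for its phrase-pair.
    return {key: c / totals[key[1:]] for key, c in joint.items()}
```

In practice these counts are smoothed before use, but the estimator has exactly this shape.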
Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Editor's Notes

  • #9: In statistical machine translation we are basically looking for a translation sentence e which maximizes the probability of e given the source sentence f. Statistical approaches to machine translation have achieved impressive performance by leveraging large amounts of parallel corpora. However, such data are available only for a few dozen language pairs in limited domains. Currently we just have parallel data for a few language pairs, like French-English and Arabic-English, but more than 5000 languages are spoken in the world, and we do not have parallel data between most of them.
  • #20: Hiero uses a simple rule extraction algorithm based on word alignments. To avoid excessively large grammars, constraints are applied on the length of phrase-pairs and on rule configuration. Each phrase-pair is assumed to have unit count, which is uniformly distributed as fractional counts over all rules extracted from it.
  • #23: Left-to-right decoding is a potential alternative. It is an Earley-style decoder which generates the target side in left-to-right order. Each partial hypothesis consists of a partial translation and a sequence of uncovered spans on the source side. It is a faster decoder compared to CKY.
  • #29: In incremental translation we need to optimize two criteria: facilitate continuous translation with low latency (latency: the time difference between the start of the source speech and the start of the target speech), and ensure acceptable translation accuracy.