A statistical approach to machine translation

A Statistical Approach to
Machine Translation
Peter F. Brown, John Cocke, Stephen A. Della Pietra,
Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty,
Robert L. Mercer, and Paul S. Roossin

Published in:
Computational Linguistics, Volume 16, Issue 2, June 1990
Pages 79-85
MIT Press, Cambridge, MA, USA

Introduction
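• (S, T): every pair of sentences, S in the source language and T in
the target language, is assigned a probability Pr(T|S)
• Pr(T|S): probability that a translator produces T when presented
with S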
• Given a sentence T in the target language, seek
the sentence S from which the translator
produced T.
• Chance of error is minimized by choosing the
most probable sentence S given T
– In other words, choose S to maximize Pr(S|T)
– Pr(S|T) = Pr(S) Pr(T|S) / Pr(T) : Bayes’ theorem
– Pr(T) does not depend on S, so it suffices to maximize Pr(S) Pr(T|S)
– Pr(S): language model probability of S
– Pr(T|S): translation probability of T given S
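
The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: the two candidate sentences and all probability values are invented, and the names `language_model`, `translation_model`, `posterior`, and `best_source` are hypothetical.

```python
# Toy numbers, invented for illustration only.
language_model = {            # Pr(S)
    "John loves Mary": 0.6,
    "Mary loves John": 0.4,
}
translation_model = {         # Pr(T|S), keyed by (T, S)
    ("Jean aime Marie", "John loves Mary"): 0.5,
    ("Jean aime Marie", "Mary loves John"): 0.1,
}

def posterior(target, candidates):
    """Pr(S|T) for each candidate S, by Bayes' theorem.

    Assumes the candidates are the only possible source sentences,
    so Pr(T) is the sum of Pr(S) Pr(T|S) over them.
    """
    joint = {s: language_model[s] * translation_model[(target, s)]
             for s in candidates}
    pr_t = sum(joint.values())
    return {s: p / pr_t for s, p in joint.items()}

def best_source(target, candidates):
    """argmax over S of Pr(S) Pr(T|S); Pr(T) is the same for every S."""
    return max(candidates,
               key=lambda s: language_model[s] * translation_model[(target, s)])

candidates = list(language_model)
post = posterior("Jean aime Marie", candidates)
best = best_source("Jean aime Marie", candidates)
```

Dividing by Pr(T) rescales every candidate by the same constant, so the sentence that maximizes Pr(S) Pr(T|S) is also the one that maximizes Pr(S|T).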

• A statistical translation system requires:
– A method for computing language model
probabilities
– A method for computing translation probabilities
– A method for searching among possible source
sentences S for the one that gives the greatest
value for Pr(S)Pr(T|S)

The Language Model
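• A sentence is a word string S1, S2, … Sn
• Pr(S1, S2, … Sn)
= Pr(S1) Pr(S2|S1)…Pr(Sn|S1S2…Sn-1)
• Each factor is the probability of the object word Sj given its
history S1S2…Sj-1
• There are too many possible histories to treat each of these
probabilities as a separate parameter
• To reduce the number of parameters:
– Place each history into an equivalence class
– Let the probability of the object word depend on the history only
through its class
– n-gram model: two histories are equivalent if they agree in their
final n-1 words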
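
As one concrete case of grouping histories into classes, a bigram model treats two histories as equivalent when they end in the same word. The sketch below estimates such a model by relative frequency on a three-sentence toy corpus. The corpus, the `<s>` start marker, and the function names are assumptions made for illustration, and no smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams; the class of a history is simply its last word."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()   # "<s>" marks the sentence start
        for prev, word in zip(words, words[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts

def sentence_probability(sentence, context_counts, bigram_counts):
    """Chain rule with each history replaced by its last word."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if context_counts[prev] == 0:
            return 0.0   # unseen history; a real system would smooth
        prob *= bigram_counts[(prev, word)] / context_counts[prev]
    return prob

corpus = ["John loves Mary", "John loves dogs", "Mary loves John"]
context_counts, bigram_counts = train_bigram(corpus)
p = sentence_probability("John loves Mary", context_counts, bigram_counts)
```

With this corpus the three factors are 2/3, 2/2, and 1/3, so `p` is 2/9.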

The Translation Model
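• A word can be translated into more than one word
– Fertility: the number of T (target) words that an S (source) word produces
• Notation for alignment:
– (Jean aime Marie | John(1) loves(2) Mary(3) )
• Written as ( T | S )
• The numbers after each S word are the positions of the T words it produces
• Computing the probability of an alignment, for example:
– (Le chien est battu par Jean | John(6) does beat(3, 4)
the(1) dog(2))
• Here beat has fertility 2, does has fertility 0, and par (position 5)
is not attributed to any of the listed S words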
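
The alignment notation maps directly onto a small data structure. The sketch below encodes the second example as a list of (source word, target positions) pairs and reads off the fertilities. The representation and the function names are assumptions chosen for illustration.

```python
target = "Le chien est battu par Jean".split()

# (source word, positions of the target words it produces), 1-based as in
# the notation John(6) does beat(3, 4) the(1) dog(2).
alignment = [("John", [6]), ("does", []), ("beat", [3, 4]),
             ("the", [1]), ("dog", [2])]

def fertilities(alignment):
    """Fertility of each source word: how many target words it produces."""
    return [(word, len(positions)) for word, positions in alignment]

def produced_words(alignment, target):
    """Target words attributed to each source word."""
    return [(word, [target[i - 1] for i in positions])
            for word, positions in alignment]

def unaligned_positions(alignment, target):
    """Target positions not produced by any listed source word."""
    covered = {i for _, positions in alignment for i in positions}
    return [i for i in range(1, len(target) + 1) if i not in covered]

fert = fertilities(alignment)
words = produced_words(alignment, target)
free = unaligned_positions(alignment, target)
```

Here `fert` gives beat a fertility of 2 and does a fertility of 0, and `free` contains only position 5 (par).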

The Translation Model
• In English, adjectives precede nouns; in French, adjectives often
follow nouns
– Distortion: a T word appears at a position far from that of the
S word that produced it
• Distortion probability:
– Pr(i|j, l)
• i: a target position
• j: a source position
• l: the target length
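
To show how fertility, translation, and distortion probabilities might combine into a score for one alignment, here is a simplified sketch. It multiplies one fertility term per source word and one translation term and one distortion term Pr(i|j, l) per produced target word. It ignores target words not attributed to any source word and any combinatorial factors, all probability values are invented, and `alignment_log_prob` is a hypothetical name.

```python
import math

def alignment_log_prob(target, alignment, fert, trans, dist):
    """Log of the product of fertility, translation and distortion terms.

    alignment: list of (source_word, [target positions]), positions 1-based.
    fert[(s, n)]    ~ Pr(fertility = n | s)
    trans[(t, s)]   ~ Pr(t | s)
    dist[(i, j, l)] ~ Pr(i | j, l)
    """
    l = len(target)                      # target length
    total = 0.0
    for j, (s_word, positions) in enumerate(alignment, start=1):
        total += math.log(fert[(s_word, len(positions))])
        for i in positions:
            total += math.log(trans[(target[i - 1], s_word)])
            total += math.log(dist[(i, j, l)])
    return total

# Invented toy parameters for (Jean aime Marie | John(1) loves(2) Mary(3)).
target = ["Jean", "aime", "Marie"]
alignment = [("John", [1]), ("loves", [2]), ("Mary", [3])]
fert = {("John", 1): 0.9, ("loves", 1): 0.9, ("Mary", 1): 0.9}
trans = {("Jean", "John"): 0.8, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.8}
dist = {(1, 1, 3): 0.7, (2, 2, 3): 0.7, (3, 3, 3): 0.7}

score = alignment_log_prob(target, alignment, fert, trans, dist)
```

Working in log space avoids underflow when many small probabilities are multiplied.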

Searching
• Search for the sentence S that maximizes
Pr(S)Pr(T|S)
• Uses stack search
– Maintains a list of partial alignment hypotheses
– Initial hypothesis: (Jean aime Marie | * )
• *: a placeholder for an unknown sequence of S words
– In each iteration, extends the most promising entries by adding
words to their hypotheses
– Ends when a complete alignment on the list is significantly more
promising than any incomplete alignment
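
A heavily simplified sketch of this kind of search is given below. It is not the decoder from the paper: it assumes a one-to-one, left-to-right alignment (no fertility or distortion), scores hypotheses with a bigram language model and word translation probabilities, and stops as soon as a complete hypothesis is the cheapest entry on the list. All tables and the name `stack_search` are invented for illustration.

```python
import heapq
import math

def stack_search(target, trans, bigram):
    """Best-first search over partial source-sentence hypotheses.

    trans[t]          -> {s: Pr(t|s)} candidate source words for target word t
    bigram[(prev, s)] -> Pr(s|prev)
    Cost is the negative log of Pr(S) Pr(T|S) for the part built so far.
    """
    heap = [(0.0, ())]                       # (cost, source words so far)
    while heap:
        cost, hyp = heapq.heappop(heap)      # most promising entry
        if len(hyp) == len(target):          # complete alignment
            return list(hyp), math.exp(-cost)
        t_word = target[len(hyp)]
        prev = hyp[-1] if hyp else "<s>"
        for s_word, p_trans in trans.get(t_word, {}).items():
            p_lm = bigram.get((prev, s_word), 0.0)
            if p_lm > 0.0 and p_trans > 0.0:
                step = -math.log(p_lm * p_trans)
                heapq.heappush(heap, (cost + step, hyp + (s_word,)))
    return None, 0.0

trans = {
    "Jean": {"John": 0.9, "Jean": 0.1},
    "aime": {"loves": 0.6, "likes": 0.4},
    "Marie": {"Mary": 0.9, "Marie": 0.1},
}
bigram = {
    ("<s>", "John"): 0.5, ("<s>", "Jean"): 0.1,
    ("John", "loves"): 0.4, ("John", "likes"): 0.3,
    ("Jean", "loves"): 0.2, ("Jean", "likes"): 0.2,
    ("loves", "Mary"): 0.5, ("likes", "Mary"): 0.3,
    ("loves", "Marie"): 0.1, ("likes", "Marie"): 0.1,
}

best, prob = stack_search(["Jean", "aime", "Marie"], trans, bigram)
```

Because every extension adds a non-negative cost, the first complete hypothesis removed from the heap is the best one under these toy tables. The stopping rule on the slide is looser: a complete alignment only needs to be significantly more promising than the incomplete ones.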

Parameter Estimation
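• Both the LM and the TM have many parameters
• Estimating them requires a large collection of sentence pairs that
are translations of each other
• For these experiments, the authors used the Hansards, the
proceedings of the Canadian parliament, which are kept in both
English and French
– Parameters of the LM and the TM can be estimated from this data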
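
The original paper reports using the EM algorithm to estimate translation model parameters. The sketch below is a simplified stand-in, not the paper's procedure: it estimates only word translation probabilities Pr(t|s), treats every source word in a pair as an equally likely origin at the start, and ignores fertility and distortion. The three sentence pairs and the name `em_translation_probs` are invented for illustration.

```python
from collections import defaultdict

def em_translation_probs(pairs, iterations=10):
    """Estimate prob[(t_word, s_word)] ~ Pr(t_word | s_word) by EM.

    pairs: list of (source_words, target_words) sentence pairs.
    """
    t_vocab = {t for _, tgt in pairs for t in tgt}
    prob = defaultdict(lambda: 1.0 / len(t_vocab))     # uniform start
    for _ in range(iterations):
        count = defaultdict(float)                     # expected pair counts
        total = defaultdict(float)                     # expected counts per source word
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    frac = prob[(t, s)] / norm         # E-step: share of t owed to s
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalize expected counts per source word
        prob = defaultdict(float,
                           {(t, s): c / total[s] for (t, s), c in count.items()})
    return prob

pairs = [
    (["the", "dog"], ["le", "chien"]),
    (["the", "cat"], ["le", "chat"]),
    (["dog"], ["chien"]),
]
prob = em_translation_probs(pairs)
```

On this toy data the estimate of Pr(chien|dog) grows at the expense of Pr(le|dog), because the third pair pins chien to dog.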

Two Pilot Experiments
• First experiment: estimate parameters of the TM
• Vocabulary: the 9,000 most common words in each of English and French
from the Hansard data

Two Pilot Experiments
• Second experiment:
– French to English
– 1,000 most frequently used English words
– 1,700 most frequently used French words (in translations of sentences
completely covered by the 1,000-word English vocabulary)
– Estimated 17 million parameters of the translation model from
117,000 pairs of sentences
– Estimated a bigram language model from 570,000 sentences from the
English part of the Hansards
– Evaluation:
• Exact: decoded sentence was exactly the same as the Hansard translation
• Alternate: same meaning, slightly different words
• Different: does not convey the same meaning as the Hansard translation
• Wrong: makes sense but cannot be interpreted as a translation of the French
• Ungrammatical: grammatically deficient
