Experiments with Different Models of Statistical Machine Translation

Project Presentation

Submitted by:
Khyati Gupta (14483)
Rakhi Sharma (14514)

Contents
• Problem Statement
• Objective
• About the project
• Flow chart
• Work done
• Conclusion
• Future work
• References

Problem Statement
• Machine translation has been a popular research area since the 1990s.
• Little work has been done on Indian languages, however, and the current
state of the art is weak because data resources are sparse.
• The success of an SMT system depends on the availability of a large parallel
corpus.
• Such data is necessary to estimate translation probabilities reliably.
• We have worked on Hindi-to-English translation.

Objective
The objectives of our thesis are:
• Work on different models of statistical machine translation.
• Report the results obtained.
• The SMT models studied are:
SMT
  String-based: phrase-based
  Tree-based: hierarchical, syntax-based

Introduction
What is translation?
The process of converting text from one language to another so that the
original message is retained in the target language.
Source language = the language whose text is to be translated.
Target language = the language into which the text is translated.
What is machine translation?
Machine translation is automated translation, or "translation carried out by
a computer." It is a task within Natural Language Processing that uses a
bilingual data set and other language assets to build language and phrase
models, which are then used to translate text from the source language into
another language.

About the Project
• Study the basics of SMT.
• Install Moses, IRSTLM and MGIZA.
• Study various models of SMT: phrase-based, syntax-based and hierarchical.
• Create a parallel corpus.
• Experiment with translation from Hindi to English using different
models of SMT.
• Convert the parser's output into Moses format.
• Compare the models on the basis of the scores obtained.
• Determine the best model of SMT for a given corpus.

Bayesian Approach
• We apply the Bayesian approach to translation. It has three components:
• Language model (LM): assigns a probability P(e) to any target string
of words e. It is a probability distribution over strings S that attempts to
reflect how frequently a string S occurs as a sentence.
• Translation model (TM): assigns a probability P(f|e) to any pair of
target and source strings.
• Decoder: determines the translation based on the probabilities of the LM
and TM:
$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)$$
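
The decision rule above can be sketched in a few lines. This is a minimal illustration, not the Moses decoder: the two candidate strings and every probability below are hypothetical values chosen only to show how the LM and TM scores combine.

```python
import math

# Hypothetical language-model probabilities P(e) for two candidate outputs.
lm = {"personal life": 0.01, "life personal": 0.0005}

# Hypothetical translation-model probabilities P(f|e) for one fixed Hindi input f.
tm = {"personal life": 0.3, "life personal": 0.4}

def decode(candidates, lm, tm):
    """Return the candidate e that maximises log P(f|e) + log P(e)."""
    return max(candidates, key=lambda e: math.log(tm[e]) + math.log(lm[e]))

best = decode(list(lm), lm, tm)
print(best)
```

Working in log space avoids numeric underflow when many probabilities are multiplied; a real decoder also has to search over candidates rather than receive them as a list.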

Language Model
• A language model is a simple model of language that computes the probability
of a sentence.
• Goal of the language model: detect good English.
• SMT uses the n-gram approach to compute the LM probability.
• The probability of a sentence is the product of the conditional probabilities
of its component words.
• The probability of each word is conditioned on the preceding words; a bigram
model conditions on the previous word only.
• Likelihood of a sentence:
$$P(S) = P(w_1)\, P(w_2 \mid w_1) \cdots P(w_n \mid w_1, \ldots, w_{n-1}) \approx P(w_1)\, P(w_2 \mid w_1) \cdots P(w_n \mid w_{n-1})$$
• Example illustrating the bigram model:
P(the barking dog) = P(the|<start>) P(barking|the) P(dog|barking)
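
A minimal sketch of a bigram model estimated by maximum likelihood, assuming a three-sentence toy corpus invented for illustration. It uses no smoothing, so any unseen bigram gives the whole sentence probability zero; toolkits such as IRSTLM add smoothing to avoid this.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, padding each sentence with <start>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<start>"] + sentence.split()
        unigrams.update(words[:-1])            # counts of each history word
        bigrams.update(zip(words, words[1:]))  # counts of (previous, current)
    return unigrams, bigrams

def sentence_prob(sentence, unigrams, bigrams):
    """P(S) as a product of maximum-likelihood bigram probabilities."""
    words = ["<start>"] + sentence.split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0  # history never seen in training
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

# Hypothetical toy corpus.
corpus = ["the barking dog", "the dog", "a barking dog"]
uni, bi = train_bigram(corpus)
p = sentence_prob("the barking dog", uni, bi)
print(p)
```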

Translation Models
P(f|e) is called the translation model. It is used to give better scores to
accurate and complete translations. It is trained on bilingual Hindi-English
parallel data.
Approaches to translation modelling are:
1. Phrase-based translation
• The sequences of words are called blocks or phrases, but they are typically not
linguistic phrases; they are phrasemes found in corpora using statistical methods.
2. Hierarchical phrase-based translation
• Hierarchical phrase-based translation combines the strengths of phrase-based and
syntax-based translation.
• It uses synchronous context-free grammar rules, but the grammars may be
constructed by an extension of methods for phrase-based translation, without
reference to linguistically motivated syntactic constituents.
3. Syntax-based model
• The syntax-based model works on the syntactic categories of words and uses a
context-free grammar (CFG).

Decoding
• The task of decoding in machine translation is to find the best-scoring
translation according to these formulae.
• Given a Hindi sentence f, the decoder finds the English yield of the single
best derivation that has Hindi yield f.
• The phrase-based model uses the beam search algorithm.
• Tree-based models use chart decoding.

Data Pre-Processing Flowchart
Sources (PDF) -> Convert PDF to JPEG -> Optical character recognition -> Bilingual text aligner

Data Conversion
PDF -> Convert to JPEG -> JPEG

Bilingual Text Alignment
(using Microsoft Aligner)

Corpus Preparation
To prepare the data for training the translation system, we have to
perform the following steps:
• Tokenisation: spaces are inserted between tokens, for example between
words and punctuation.
• Truecasing: The initial words in each sentence are converted to
their most probable casing. This helps reduce data sparsity.
• Cleaning: Long sentences and empty sentences are removed as
they can cause problems with the training pipeline, and obviously
misaligned sentences are removed.
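
The tokenisation and cleaning steps can be sketched as follows. This is a simplified illustration, not the scripts that ship with Moses: the punctuation set, the `max_len` default of 80 and the function names are assumptions made for this example, and the naive tokenizer would also split decimal numbers such as "3.5".

```python
import re

def tokenize(line):
    """Insert spaces around punctuation, then normalise whitespace."""
    line = re.sub(r'([.,!?;:()"।])', r" \1 ", line)
    return " ".join(line.split())

def clean_corpus(pairs, max_len=80):
    """Drop sentence pairs where either side is empty or longer than max_len tokens."""
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if 1 <= n_src <= max_len and 1 <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept

print(tokenize("Hello, world!"))
print(clean_corpus([("a b", "c"), ("", "x"), ("a", "b c d")], max_len=2))
```

Truecasing is omitted here because it needs casing statistics collected from the training corpus.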

Training in Moses
1. Prepare data
• Training data has to be provided sentence aligned in two files, one
for the foreign sentences, one for the English sentences
• The parallel corpus has to be converted into a format that is suitable
to the GIZA++ toolkit.
• Two vocabulary files are generated and the parallel corpus is
converted into a numberized format.
• The vocabulary files contain words, integer word identifiers and
word count information.
2. Run GIZA++
• GIZA++ is a freely available implementation of the IBM models.
We need it as an initial step to establish word alignments.

[Figure: example word alignment between "मेरे दोस्त के लिए पान दो" and "GIVE A BETEL FOR MY FRIEND"]
3. Align words
• To establish word alignments based on the two GIZA++ alignments, a
number of heuristics may be applied.
4. Get lexical translation table
• Estimate a maximum likelihood lexical translation table. We estimate the
w(e|f) as well as the inverse w(f|e) word translation table.

5. Extract phrases
• All extracted phrases are dumped into one big file.

6. Score phrases
• Estimate the phrase translation probability φ(e|f). Example phrase table entry:
जहानाबाद *दरभंगा ||| darbhanga* navada* ||| 1 1 1 1 ||| 0-0 1-1 ||| 1 1
7. Build lexicalized reordering model
• Moses uses lexicalized reordering models for reordering.
8. Build generation models
• The generation model is built from the target side of the parallel corpus.
9. Create configuration file
• As a final step, a configuration file for the decoder is generated with all
the correct paths for the generated model and a number of default
parameter settings.
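
Step 4 of the training pipeline (the lexical translation table) amounts to relative-frequency counting over aligned word pairs. The sketch below assumes the word alignment has already been produced; the aligned pairs, including the word "buddy", are hypothetical and only illustrate the estimate w(e|f) = count(f, e) / count(f).

```python
from collections import Counter, defaultdict

def lexical_table(aligned_pairs):
    """Estimate w(e|f) = count(f, e) / count(f) from aligned (f, e) word pairs."""
    pair_counts = Counter(aligned_pairs)
    f_counts = Counter(f for f, _ in aligned_pairs)
    table = defaultdict(dict)
    for (f, e), count in pair_counts.items():
        table[f][e] = count / f_counts[f]
    return table

# Hypothetical aligned (Hindi, English) word pairs read off word alignments.
pairs = [("दोस्त", "friend"), ("दोस्त", "friend"), ("दोस्त", "buddy"), ("मेरे", "my")]
w = lexical_table(pairs)
print(w["दोस्त"])
```

Swapping the two elements of each pair before counting gives the inverse table w(f|e).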

Tuning
• Once training is over, the parameters (feature weights) of the log-linear
model have to be tuned on held-out data, to avoid overfitting the training data
and to produce the most desirable translation on any test set. This process is
called tuning.
• The basic assumption behind tuning is that the model must be tuned according
to the evaluation metric.
• That is why the tuning technique is known as Minimum Error Rate Training
(MERT).
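
In the usual log-linear formulation, the decoder scores a translation as a weighted sum of feature functions (language model, translation model, reordering model and so on), and tuning searches for the weights that minimise the error on a development set. The symbols F_dev (development source sentences), R_dev (their reference translations) and Err (the error measure, for example 1 minus BLEU) are notation introduced here for illustration.

```latex
\hat{e}(f; \lambda) = \arg\max_{e} \sum_{i=1}^{n} \lambda_i \, h_i(e, f)
\qquad
\hat{\lambda} = \arg\min_{\lambda} \; \mathrm{Err}\big(\hat{e}(F_{\mathrm{dev}}; \lambda),\; R_{\mathrm{dev}}\big)
```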

1. Working of Phrase-Based Model
• The Hindi sentence is first broken down into phrases based on statistics
drawn from parallel corpora.
• Then these Hindi phrases are translated into English phrases.
• The translated English phrases are reordered.

2. Working of Hierarchical Model
• All the phases performed by Moses in the hierarchical model are the same as
in the phrase-based model, but rule extraction in the hierarchical model differs
from that of phrase-based SMT.
It includes:
• Data preparation
  - Tokenization
  - Truecasing
  - Cleaning
• Training
  - Word alignment
  - Rule extraction
  - Glue rule
  - Extract phrases with the phrase extraction table
  - Reordering model
  - Language modelling
• Decoding
• Tuning
• BLEU score

Advantage of Hierarchical Model
• Hierarchical MT replaces the redundant rules used in phrase-based MT
with a single rule.
• It also overcomes a problem of other models: it does not require
annotated corpora, because it derives the annotation automatically.
• We are working on Hindi-to-English translation. English already has
annotated data, and Hindi is annotated automatically by the hierarchical
model.
• The grammar used is known as a synchronous context-free grammar (SCFG).

Synchronous Context Free Grammar
• An SCFG is a kind of context-free grammar that generates pairs of
strings.
• Example: S -> (I, मैं)
• This rule translates "I" in English to "मैं" in Hindi.
• This rule consists of terminals only, but rules may consist of
terminals and non-terminals, as in the following rule.
• VP -> (V1 NP2, NP2 V1)

Rule Extraction with SCFG
• The hierarchical model not only reduces the size of the grammar; it also
uses the same rules for parsing as well as translation.
Steps performed in rule extraction:
• In the hierarchical model, intervening words can be separated out; they
are replaced by the non-terminal X.
• Synchronization is required between sub-phrases.
• This model does not require a parser on the Hindi side because all phrases
are labelled X.

This allows us to build useful translation rules such as
X -> (X1 का X2, X2 of X1)
• Some examples:
• भारत का प्रधान मंत्री -> Prime Minister of India
• जापान का प्रधान मंत्री -> Prime Minister of Japan
• चीन का वित्त मंत्री -> Finance Minister of China
• भारत का राष्ट्रीय पक्षी -> National bird of India
• The phrase-based model memorises all these phrases, but essentially all of
them have the same structure, i.e. the rule above, where, for the first
example, X1 is "भारत" (India) and X2 is "प्रधान मंत्री" (Prime Minister).
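
A minimal sketch of how the single rule X -> (X1 का X2, X2 of X1) covers all four examples. The toy lexicon is taken from those examples; the function name and the lookup strategy are assumptions made for illustration, not how Moses stores or applies rules.

```python
# Toy lexicon built from the example phrases on this slide.
LEXICON = {
    "भारत": "India",
    "जापान": "Japan",
    "चीन": "China",
    "प्रधान मंत्री": "Prime Minister",
    "वित्त मंत्री": "Finance Minister",
    "राष्ट्रीय पक्षी": "National bird",
}

def translate(hindi):
    """Apply X -> (X1 का X2, X2 of X1) recursively, using the lexicon for X1 and X2."""
    if hindi in LEXICON:
        return LEXICON[hindi]
    left, sep, right = hindi.partition(" का ")
    if not sep:
        raise KeyError(f"no rule or lexicon entry covers: {hindi}")
    # The target side swaps the two sub-phrases and inserts "of".
    return f"{translate(right)} of {translate(left)}"

print(translate("भारत का प्रधान मंत्री"))
```

A phrase-based system would need one phrase-table entry per example, while here six lexicon entries and one reordering rule are enough.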

GLUE RULE
• Glue rules facilitate the concatenation of two trees originating from the
same non-terminal. Here are the two glue rules:
• S -> (S1 X2, S1 X2)
• S -> (X1, X1)
• These two rules in conjunction can be used to concatenate discontiguous
phrases. The input to the system is thus a sentence in Hindi and a set of SCFG
rules extracted from the training set.
• To avoid a rule set of unmanageable size and to reduce decoding
complexity, we typically set limits on the possible rules:
• At most 2 non-terminal symbols
• At least one but at most 5 words per language
• Span of at most 15 words
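
The three limits above can be expressed as a filter over extracted rules. This is a sketch under assumptions: rules are represented as lists of symbols, and a symbol such as X1 or X2 is taken to be a non-terminal; neither convention comes from Moses.

```python
def is_nonterminal(symbol):
    """Treat symbols such as X1 or X2 as non-terminals (an assumed convention)."""
    return len(symbol) > 1 and symbol[0] == "X" and symbol[1:].isdigit()

def rule_allowed(source, target, span_len, max_nt=2, max_words=5, max_span=15):
    """Check one extracted rule against the limits listed above."""
    for side in (source, target):
        words = [s for s in side if not is_nonterminal(s)]
        nts = [s for s in side if is_nonterminal(s)]
        if len(nts) > max_nt or not 1 <= len(words) <= max_words:
            return False
    return span_len <= max_span

print(rule_allowed(["X1", "का", "X2"], ["X2", "of", "X1"], span_len=4))
```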

3. Working of Syntax Model
• Earlier models did not include any linguistic information in the training
data, which produced grammatically incoherent output.
• The persistence of the reordering problem in translated text led to the
development of the syntax-based model. In this model, Moses is trained on
syntactic phrases on the target side.
• Syntactic information includes root word, word class and POS category. In
our work, syntactic parsing is performed on the English side.
ADVANTAGES
• Since Hindi is syntactically divergent from English, this model addresses the
reordering problem faced in the phrase-based and hierarchical models.
• Syntax-based MT performs well for structurally divergent languages:
Hindi follows SOV order, while English follows SVO order.
• This model improves the grammaticality of the resulting sentence.

MODEL
[Figure: the parse tree of "He adores listening to music" transformed into a Hindi sentence in three steps: reordering of child nodes, insertion of Hindi function words, and translation of the leaf words.]

Working
• The string-to-tree model accepts a Hindi string as input, searches across
multiple parsed English trees, and finds the highest-scoring tree.
• Input is a string: व्यक्तिगत जीवन
• Translation rules:
• [SYM][X] personal [NN][X] [FRAG] ||| [SYM][X] व्यक्तिगत [NN][X] [X] ||| 0.0326378 0.6 0.0652757 1 ||| 0-0 1-1 2-2 ||| 0.285714 0.142857 0.142857 |||
• [SYM][X] personal life [FRAG] ||| [SYM][X] व्यक्तिगत जीवन [X] ||| 0.0326378 0.385714 0.0652757 0.6 ||| 0-0 1-1 2-2 ||| 0.285714 0.142857 0.142857 |||
• [SYM][X] personal life [TOP] ||| [SYM][X] व्यक्तिगत जीवन [X] ||| 0.0326378 0.385714 0.0652757 0.6 ||| 0-0 1-1 2-2 ||| 0.285714 0.142857 0.142857 |||
• Decoding by Translation Rules-
• [0..3]: [3..3]=</s> [0..2]=S : S ->S -> S </s> :0-0 : c=0 core=(0,-1,1,0,0,0,0,0,0) 0core=(0,-4,6,-11.5445,-5.99562,-7.46699,-1.60944,1.99979,-16.0431)
• [0..1]: [1..1]= X [0..0]=S : S ->S -> S X :0-0 1-1 : c=0 core=(0,-0,1,0,0,0,0,0.999896,0) 0core=(0,-2,3,-3.35156,-0.916291,-2.43527,0,0.999896,-7.74303)
• [0..0]: [0..0]=<s> : S ->S -> <s> :: c=0 core=(0,-1,1,0,0,0,0,0,0) 0core=(0,-1,1,0,0,0,0,0,0)
• [1..1]: [1..1]=personel : X ->X -> व्यक्तिगत :: c=0 core=(0,-1,1,-3.35156,-0.916291,-2.43527,0,0,0) 0core=(0,-1,1,-3.35156,-0.916291,-2.43527,0,0,-9.44562)

• Output is a string: personal life
• The target tree it produces is:
(TOP <s> (S (NP personal) (NP (NN life))) </s>)
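
The translation rules shown on this slide are plain text with fields separated by "|||". The sketch below splits one such line into its parts. The dictionary keys ("first", "second" and so on) are neutral names chosen for this example; no meaning is assumed for the individual scores beyond what the lines themselves show.

```python
def parse_rule(line):
    """Split one '|||'-delimited rule line into its fields."""
    fields = [field.strip() for field in line.split("|||")]
    first, second, scores, alignment, counts = fields[:5]
    return {
        "first": first.split(),
        "second": second.split(),
        "scores": [float(x) for x in scores.split()],
        # Each alignment point "i-j" links symbol i on one side to symbol j on the other.
        "alignment": [tuple(int(i) for i in pair.split("-")) for pair in alignment.split()],
        "counts": [float(x) for x in counts.split()],
    }

line = ("[SYM][X] personal life [TOP] ||| [SYM][X] व्यक्तिगत जीवन [X] ||| "
        "0.0326378 0.385714 0.0652757 0.6 ||| 0-0 1-1 2-2 ||| "
        "0.285714 0.142857 0.142857 |||")
rule = parse_rule(line)
print(rule["first"], rule["alignment"])
```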

4.Working of Hybrid Translation
• The main disadvantage of statistical machine translation (SMT) is
that it only translates phrases that were seen during training.
• Unseen phrases such as named entities are not translated.
• This leads to a low BLEU score. We can improve the BLEU score by
translating named entities from an external source.
Working
Preprocessing -> Translation by Moses decoder -> Postprocessing

Preprocessing of data:
Moses accepts data in the following format for hybrid translation:
आपको नए <n translation="monastery">आश्रम</n> के निर्माण के लिए कितने धन की आवश्यकता है
Translation by Moses decoder:
We translate normally using the Moses decoder trained on our data. The translation produced by the Moses decoder is:
How much money you need for the construction of the new आश्रम?
Here the word आश्रम is left untranslated.
Postprocessing:
The untranslated word can be translated by referring to the XML tags. The output obtained is:
How much money you need for the construction of the new monastery?
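
The postprocessing step can be sketched as a lookup from the XML markup of the source sentence. This is an illustrative reconstruction, not the project's script: the regular expression, the function names and the choice to accept both quoted and unquoted attribute values are assumptions.

```python
import re

# Matches <n translation="...">word</n>; quotes around the value are optional.
TAG = re.compile(r'<n\s+translation="?([^">]+?)"?\s*>(.*?)</n>')

def xml_dictionary(marked_up_source):
    """Map each tagged source word to the translation given in its tag."""
    return {word.strip(): trans.strip()
            for trans, word in TAG.findall(marked_up_source)}

def postprocess(decoder_output, marked_up_source):
    """Replace source words left untranslated with their tagged translation."""
    for word, trans in xml_dictionary(marked_up_source).items():
        decoder_output = decoder_output.replace(word, trans)
    return decoder_output

source = 'आपको नए <n translation="monastery">आश्रम</n> के निर्माण के लिए कितने धन की आवश्यकता है'
output = "How much money you need for the construction of the new आश्रम?"
fixed = postprocess(output, source)
print(fixed)
```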

Result of Hybrid Translation
• Exclusive: only the XML-specified translation is used for the
input phrase. Any phrases from the phrase table that overlap
with that span are ignored.
• Inclusive: the XML-specified translation competes with all
the phrase table choices for that span.
• Ignore: the XML-specified translation is ignored completely.

Mode            Score
xml-exclusive   7.21
xml-inclusive   7.36
xml-ignore      6.18

Syntax Model Parsing Extended
BERKELEY PARSER
We used the Berkeley parser to parse English in our project. Since we had a
parser for the English language, we trained our system on the string-to-tree
and tree-to-string models.
Input: Economic Services
ENJU PARSER
With a wide-coverage probabilistic HPSG grammar and an efficient parsing
algorithm, this parser can effectively analyze the syntactic/semantic
structures of English sentences and provide the user with phrase structures
and predicate-argument structures.

Motivation
• Moses accepts training data for the syntax model in XML format, for example:
• <tree label="NP"> <tree label="DET"> the </tree> <tree label="NN"> cat </tree> </tree>
• There are a number of parsers available. Each parser has its own
idiosyncratic input and output format, so the output of these parsers has to be
converted into the format Moses expects for the syntax model. Three wrapper
scripts are available in the Moses decoder under scripts/training/wrap-parser
for converting parser output into Moses format:
• parse-en-collins.perl: used with the Collins parser, available from MIT.
• parse-de-bitpar.perl: used with the BitPar parser, available from the
University of Munich.
• parse-de-berkeley.perl: used with the Berkeley parser, available from UC
Berkeley.
• We used the Enju parser for our experiment, which none of these scripts
covers, so we wrote a wrapper script that converts Enju parser output into the
Moses format for syntax trees.

Format Conversion
• We designed a program to convert the XML output of the Enju parser to the
Moses-compatible XML format. However, Enju and the Penn Treebank (PTB) have
different syntactic categories.
• Because the output of Enju is based on HPSG and differs from the annotation
policy of the PTB, tree structures and/or syntactic categories are often
different from those given by PTB-style annotation. However, these mappings
provide a clear image of what Enju expresses. So we mapped Enju categories to
PTB style for our experiment.

Steps-
1. For every <sentence> tag, start an output string by adding <tree label="TOP">.
2. For every <cons> tag:
i. Retrieve its CAT value ($CAT_VALUE).
ii. Retrieve its XCAT value ($XCAT_VALUE).
iii. If the XCAT value of the CONS element is non-empty:
iv. Find the corresponding POS tag by comparing it with the mapping table.
v. Add a new tree tag to the output string by adding <tree label="CONS_POS">,
where CONS_POS is the POS category derived from the mapping table.
3. For every <tok> tag:
i. Retrieve its POS value ($POS_VALUE).
ii. Add a new tree tag to the output string by adding <tree label="POS">, where POS
is the POS category taken from the POS attribute of the tok tag.
4. For every closing </sentence> tag, add a new closing </tree> tag.
5. For every closing </cons> tag, add a new closing </tree> tag.
6. For every closing </tok> tag, add a new closing </tree> tag.
7. All unnecessary attributes are omitted.
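
The steps above can be sketched with Python's standard XML library. This is a simplified illustration, not the wrapper script written for the project: the mapping table is a hypothetical excerpt, the XCAT value is ignored, the attribute names are assumed to be lower-case `cat` and `pos`, and the sample input is a hand-written fragment rather than real Enju output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical excerpt of an Enju-category -> PTB-label mapping table.
CAT_TO_PTB = {"S": "S", "NP": "NP", "VP": "VP", "ADJP": "ADJP", "PP": "PP"}

def convert(node):
    """Recursively rewrite <sentence>/<cons>/<tok> elements as Moses <tree> elements."""
    if node.tag == "sentence":
        label = "TOP"
    elif node.tag == "cons":
        cat = node.get("cat", "X")
        label = CAT_TO_PTB.get(cat, cat)  # fall back to the Enju category itself
    elif node.tag == "tok":
        label = node.get("pos", "X")
    else:
        raise ValueError(f"unexpected element: {node.tag}")
    if node.tag == "tok":
        inner = escape((node.text or "").strip())  # the word itself
    else:
        inner = " ".join(convert(child) for child in node)
    # Only the label attribute is kept; all other attributes are dropped.
    return f"<tree label={quoteattr(label)}> {inner} </tree>"

enju = ('<sentence><cons cat="NP" xcat="">'
        '<cons cat="ADJP" xcat=""><tok pos="JJ" cat="ADJ">Economic</tok></cons>'
        '<cons cat="NP" xcat=""><tok pos="NNS" cat="N">Services</tok></cons>'
        '</cons></sentence>')
moses = convert(ET.fromstring(enju))
print(moses)
```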

Challenges
• The deep syntactic parser we used was Enju (Miyao and Tsujii, 2005),
which is based on HPSG and outputs both (dependency-like) predicate-argument
relations (Miyao, 2007) and phrase structure trees (although these do not
follow the PTB scheme for phrase structure trees) in an XML format.
• The Berkeley parser is a phrase structure grammar parser based on the PTB
grammar.
• The outputs of the two parsers differ in tree structure. Enju's syntactic
representation is richer, but it is also more challenging to convert. The Enju
parser produces strictly binary trees, while the Berkeley parser's trees are
not restricted to binary branching. The trees also differ in the number of
levels and in structure.
• This made the task of converting Enju output to Moses format difficult.
Conclusion
• We trained the syntax model on the converted Enju output. It had no
major effect on the BLEU score.

Result
INTERFACE - Phrase Based Translation
• Input-य क्षत्र यमना पार कहलात ह िै य नई ददल्ली बहत पलों द्िारा
भली भांतत जड हए ह
• Output-it regions are caled yamuna par and they new delhi these are also
joined by many bridges from

Hierarchical Based
• Input-य क्षत्र यमना पार कहलात ह िै य नई ददल्ली बहत पलों द्िारा भली भांतत जड
हए ह
• Output- so these regions are caled yamuna par and they from new delhi पलों by भली भांतत जड
front are

Syntax based
• Input-य क्षत्र यमना पार कहलात ह िै य नई ददल्ली बहत पलों द्िारा भली भांतत जड
हए ह
• Output-it caled yamuna par regions are and it from new delhi of the world the very
popular from पलों by भली bridges from are

Corpus
Type            Source
Gyan Nidhi      Downloaded from Joshua
Miscellaneous   PM speech (July 2015), Budget data (2014), Vigyan Prasar magazine
ACL 2005        Provided by C-DAC, Noida
Agriculture     www.pib.gov.in, Govt. of India

Result of Comparing Models of SMT
Model          Agriculture   ACL 2005   Gyan Nidhi   Misc.
Phrase         3.48          6.18       3.61         3.45
Hierarchical   3.27          13.8       4.3          5.2
Syntax ST      2.93          10.79      3.21         2.9
Syntax TS      1.2           2.3        0.9          1.5

[Chart: Comparison of SMT Models. x-axis: corpus; y-axis: score; series: Phrase, Hierarchical, Syntax ST, Syntax TS]

Conclusion
We are developing a Hindi-to-English translation system and comparing
the results obtained by various models. During the course of this
project, the various translation models were evaluated, and we conclude
that the hierarchical model is the best approach for this task: it
obtained the highest score on three of the four corpora (ACL 2005,
Gyan Nidhi and Miscellaneous). The results were verified on several
Hindi-English sentence corpora. The project was completed and tested
successfully.

Future Work
We need to:
• Perform and compare results of the factored model in Moses.
• Find and replace out-of-vocabulary (OOV) words.
• Compare the effect of replacing OOV words on the BLEU score.
• Transliterate unknown words.
• We propose using word2vec ("word to vec") for hybrid translation, to
automate the process of generating dictionaries and phrase tables.

References
• Koehn, P., Och, F. J. and Marcu, D. Statistical Phrase-Based Translation.
Information Sciences Institute, Department of Computer Science, University of
Southern California.
• Koehn, P. (2004). Statistical significance tests for machine translation
evaluation. In Proceedings of the 2004 Conference on Empirical Methods in
Natural Language Processing (EMNLP).
• Zens, R. and Ney, H. (2004). Improvements in phrase-based statistical
machine translation. In Proceedings of HLT-NAACL 2004.
• Behera, B. Hierarchical Phrase-Based Statistical Machine Translation System.
M.Tech. project dissertation, under the guidance of Prof. Pushpak
Bhattacharyya, Department of Computer Science and Engineering, Indian
Institute of Technology, Bombay.
• Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N.,
Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A.
and Herbst, E. (2007). Moses: open source toolkit for statistical machine
translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive
Poster and Demonstration Sessions, ACL '07, pages 177–180, Stroudsburg, PA, USA.
Association for Computational Linguistics.
• Chiang, D. (2005). A hierarchical phrase-based model for statistical machine
translation. In Proceedings of the 43rd Annual Meeting on Association for
Computational Linguistics, ACL '05, pages 263–270, Stroudsburg, PA, USA.
Association for Computational Linguistics.
• Sinha, R. M. K. and Thakur, A. (2005). Machine translation of bi-lingual
hindi-english (hinglish) text. 10th Machine Translation Summit (MT Summit X),
Phuket, Thailand, pages 149–156.
• Sachdeva, K., Srivastava, R., Jain, S. and Sharma, D. M. Hindi to English
Machine Translation: Using Effective Selection in Multi-Model SMT. Language
Technologies Research Center, International Institute of Information
Technology, Hyderabad.
• Ahmed, A. and Hanneman, G. Syntax-Based Statistical Machine Translation: A
Review.
• Aswani, N. and Gaizauskas, R. (2005). A hybrid approach to align sentences and
words in English–Hindi parallel corpora. In Proceedings of the ACL Workshop on
Building and Using Parallel Texts, pages 57–64, Ann Arbor, Michigan. Association
for Computational Linguistics.
• Jurafsky, D. and Martin, J. H. (2008). Speech and Language Processing (2nd
edition). Prentice Hall.
