Linguistic Evaluation of Support Verb Construction Translations by OpenLogos and Google Translate

LINGUISTIC EVALUATION
OF SUPPORT VERB CONSTRUCTIONS
BY OPENLOGOS AND GOOGLE TRANSLATE
ANABELA BARREIRO
INESC-ID
KUTZ ARRIETA
Oracle
JOHANNA MONTI
University of Sassari
WANG LING
CMU-IST
BRIGITTE ORLIAC
Logos Institute
FERNANDO BATISTA
INESC-ID, ISCTE-IUL
SUSANNE PREUß
Saarland University
ISABEL TRANCOSO
INESC-ID, IST
Language Resources and Evaluation Conference 26-31 May, Reykjavik, Iceland

• Introduction
– Towards Hybrid Machine Translation
– OpenLogos and Google Translate Models
• Evaluation Task
– Corpus and Datasets
– Quantitative Results
– Linguistic Evaluation Details
• Current Work
– Semantico-Syntactic Knowledge Integration into SMT
• Conclusions and Future Work
Outline

• MT GOAL
– researchers aim for robust MT systems that can produce high
quality translations
• CURRENT PROBLEMS
– translations produced by widely used MT systems still show
unfortunate errors that require significant post-editing effort
– there is a lack of periodic qualitative evaluation efforts
involving MT systems of different types
– state-of-the-art quality metrics and quality estimation have
targeted human-factors tasks (post-editing time and effort),
but NOT the diagnosis of fine-grained linguistic errors needed to
improve syntactic structure and meaning
Introduction

• CURRENT TREND
– produce systems that combine linguistic resources and
analysis with statistical techniques, leading to
linguistically enhanced SMT models
• OUR MOTIVATION
– belief that an effective method to advance MT research is to
bring different approaches together, comparing them and
measuring which modules need improvement
– to our knowledge, no major effort has been made to combine
the strengths of different MT approaches with the purpose of
overcoming known weaknesses on the basis of a joint
linguistic evaluation of those weaknesses
Introduction

• OUR GOALS
– advance hybrid MT, starting by understanding different
approaches, their weaknesses and strengths
– perform a systematic fine-grained linguistic analysis of the
performance of individual models
– The first exercise to achieve our goals is to evaluate the
performance of RBMT and SMT when dealing with a very
specific linguistic phenomenon: support verb constructions
Introduction

• A current trend in MT research is the creation of HMT models
that combine linguistic knowledge with statistical techniques
• HMT systems attempt to combine RBMT systems [Scott, 2003]
with data-driven MT systems, such as phrase-based SMT
[Koehn, 2007]
• System combination often leads to improvements in
translation quality, as different systems tend to address
different translation challenges
• it is still not obvious which HMT approach will be the most
efficient one and will lead to higher quality translation in the
long run
Towards Hybrid Machine Translation

• SMT models learn generalizations of the translation process
using parallel corpora
– they tend to perform better than RBMT when parallel corpora
are abundant (English-Mandarin)
– when parallel corpora are scarce (Spanish-Basque), they have
insufficient data to learn generalizations [Labaka, 2007]
• Morphologically rich languages require more data to learn
accurate translations
– SMT models for morphologically rich languages have been
proposed [Chahuneau, 2013]
– RBMT systems with manually-encoded morphology are an
alternative for resource-poor languages

• Some methods to combine RBMT with SMT:
– combine the translations of the same text by two different systems
[Eisele, 2008] [Heafield, 2011]
– use data-driven techniques to improve RBMT systems:
[Eisele, 2008] uses phrase pair extraction from phrase-based SMT to
extract phrasal translations that improve the coverage of an
RBMT system
– a similar method using example-based MT for the same end has
been proposed [Sanchez, 2009]
– use statistical post-editing methods to improve RBMT translation
quality [Elming, 2006] [Simard, 2007] [Dugast, 2007] [Terumasa,
2007]
– use RBMT systems to enhance data-driven approaches: [Shirai,
1997] uses an example-based MT system [Brown, 1996] to create
an initial translation template, and an RBMT system to translate
individual words and phrases according to this template

• OpenLogos is an open source copy of the commercial Logos System
• addresses morphology, syntax, and semantics, has robust
parsers, sets of semantico-syntactic rules, terminology sets
and tools
• pattern-based methodology
– closer in spirit to the SMT approach with the advantage of
including semantic knowledge/understanding
• uses an intermediate language (SAL) to encode linguistic
information and process text
– SAL contributes to the high translation quality of OpenLogos (OL)
and lessens one of the main problems in SMT (the sparseness
of linguistic examples)
• its linguistic knowledge databases have not been further developed
for over 10 years
The OpenLogos Model

• Google Translate is one of the most widely used online MT systems
• this SMT system benefits from the large amount of parallel
data that Google collects from the web
– as of March 2014, it covered 80 language pairs
• translation quality is highly dependent on the language pair,
producing better results for close language pairs (Portuguese
and Spanish) and languages for which large amounts of
parallel data are available
• it is a closed system; no semantic understanding component
is known to exist in Google Translate (GT)
The Google Translate Model

• sentences containing 100 support verb constructions (SVC),
extracted from news and Internet texts
• SVC - multiword or complex predicate formed by a
semantically weak verb and a predicate
noun/adjective/adverb [Barreiro, 2008]
– make a presentation
support verb make + predicate noun presentation
– make it simple
support verb make + predicate adjective simple
Evaluation Task: Corpus

• Why SVC?
– studied systematically within the Lexicon-Grammar Theory
• the scientific study of SVC eliminates subjectivity concerns
for the evaluation task
– occur abundantly in texts
– can be recognized and processed computationally
• in general and specific-purpose corpora
• for several languages
– most MT systems still fail at addressing the compositional
aspect of multiword units
• when translated incorrectly, SVC have a negative impact on
the understandability and quality of translations

• Why SVC?
– SVC can be non-contiguous (the individual elements that
compose the unit are placed apart in the sentence), with a
smaller or greater number of inserts
• An insert is any word between elements of the multiword
other than an article before a predicate noun
• e.g., we are taking a [ADJ-growing] interest in
– non-contiguous SVC are extremely difficult to align in SMT,
remaining one of the key cross-language challenges for MT

Support Verb Construction Types in Our Corpus
Nominal Support Verb Construction (NSVC)
make a presentation
Adjectival Support Verb Construction (ADJSVC)
be meaningful
Non-contiguous nominal (NON-CONT NSVC)
have [ADV+ADJ-particularly good] links
Prepositional nominal (PREPNSVC)
give an illustration of
Non-contiguous prepositional nominal (NON-CONT PREPNSVC)
be the [ADJ-immediate] cause of
Idiomatic nominal (IDIOM NSVC)
set in motion, place at risk, go on strike
Idiomatic prepositional nominal (IDIOM PREPNSVC)
earn an income of
Non-contiguous idiomatic nominal (NON-CONT IDIOM NSVC)
hold [NP-the option] in place, be of [ADJ-practical] value
Non-contiguous idiomatic prepositional nominal (NON-CONT IDIOM PREPNSVC)
give [PRO-us] a [bird’s-eye] view of, be [ADV-clearly] at odds with, open talks [May 14] with

Non-contiguous adjectival (NON-CONT ADJSVC)
be [ADV-extremely] selective
Prepositional adjectival (PREPADJSVC)
be known as; be involved in
Non-contiguous prepositional adjectival (NON-CONT PREPADJSVC)
fall [ADV-so far] short of

• Each SVC was annotated according to the SVC taxonomy
• The SVC corpus was translated into FR, GE, IT, PT and ES, using the
OL and the GT systems
• native linguists evaluated the SVC translation quality for each
target language and classified the translations according to a binary
evaluation metric:
– OK / ERR (agreement, morphology-related or other
problems, such as incorrect prepositions or wrong word order)
• a comprehensive qualitative evaluation of mistranslations
according to the different types of SVC was provided
• neither system was trained for the task; the texts were not
domain-specific
Evaluation Task: Setup

Quantitative Results
Lang. pair  System  OK  ERR  Agreem  Other
EN-FR       GT      64  32   4       -
EN-FR       OL      51  48   1       -
EN-GE       GT      37  46   3       14
EN-GE       OL      60  33   1       6
EN-IT       GT      61  31   -       8
EN-IT       OL      43  52   -       5
EN-PT       GT      68  27   5       -
EN-PT       OL      41  58   1       -
EN-ES       GT      51  41   6       2
EN-ES       OL      25  70   3       2
Results for translation of the 100 SVC in our corpus for FR, GE, IT, PT, and ES with the OL and the GT MT systems
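
The counts in the table can be tabulated programmatically. The following is a minimal Python sketch, not part of the original evaluation: the dictionary layout and the helper names `ok_rate` and `best_system` are assumptions made for illustration, and "-" cells are entered as 0.

```python
# Counts copied from the results table; "-" cells are entered as 0.
# Column order: OK, ERR, Agreem, Other (each row sums to the 100 SVC).
RESULTS = {
    "EN-FR": {"GT": (64, 32, 4, 0), "OL": (51, 48, 1, 0)},
    "EN-GE": {"GT": (37, 46, 3, 14), "OL": (60, 33, 1, 6)},
    "EN-IT": {"GT": (61, 31, 0, 8), "OL": (43, 52, 0, 5)},
    "EN-PT": {"GT": (68, 27, 5, 0), "OL": (41, 58, 1, 0)},
    "EN-ES": {"GT": (51, 41, 6, 2), "OL": (25, 70, 3, 2)},
}


def ok_rate(counts):
    """Share of SVC judged OK, given (OK, ERR, Agreem, Other) counts."""
    return counts[0] / sum(counts)


def best_system(pair):
    """System with the higher OK rate for a language pair."""
    systems = RESULTS[pair]
    return max(systems, key=lambda name: ok_rate(systems[name]))


if __name__ == "__main__":
    for pair, systems in RESULTS.items():
        rates = ", ".join(f"{s}: {ok_rate(c):.0%}" for s, c in systems.items())
        print(f"{pair}  {rates}  best: {best_system(pair)}")
```

Read this way, OL has the higher OK count only for EN-GE, while GT has the higher OK count for the four Romance target languages.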

• OL correctly translates more SVC than GT
• incorrect translations (for both systems) concern:
– word choice, incl. most prepositions - lexical (L)
– word order, incl. incorrect clause segmentation - order (O)
– word form, incl. choice between bare infinitive and
to-infinitive - morphology (M)
– missing word, mainly auxiliary and main verb - ellipsis (E)
• GT has more lexical, morphology and missing-word errors than OL
• GT lexical coverage is poor with respect to contiguous SVC
• GT does not translate the GE verb split well (even after
reordering)
Linguistic Evaluation
EN-GE

• GT correctly translates more SVC than OL
• most translation errors by both systems involved:
– incorrect lexical choice for some or all of the elements of the
SVC (non-translation or literal translation)
– wrong agreement (subject-verb, subject-predicate adjective)
– non-contiguous and idiomatic SVC
– less idiomatic SVC - problems with (i) prepositions; (ii) literal
translation of the support verb and (iii) wrong lexical choice
for the predicate noun
– prepositions and determiner assignment, which require
minor post-editing corrections (e.g., prepositional adjectival
SVC)
Linguistic Evaluation
EN-FR/IT/PT/ES

• In general, SVC problems in GT were more structural, while SVC
problems in OL were more lexical
• OL would easily translate contiguous and non-contiguous SVC
correctly, provided they were added to its dictionary and rule DB
• OL is able to resolve the SVC internal modifiers better than GT,
which drops some of the source meaning in the translation
• OL's use of linguistic knowledge in its structural analysis is a
powerful feature that can make OL's performance for the Romance
languages as satisfactory as that for GE
• Higher quality translation can be achieved if we combine:
– OL's ability to translate different surface structures of a sentence
– GT's rich word selection, powered by sophisticated statistical
methods to extract knowledge from large volumes of parallel data
Linguistic Evaluation: Conclusions

• In the OL system, linguistic elements are represented in a
semantico-syntactic abstraction language (SAL) with
ontological properties
• SAL represents the heart of OL, accounting for its effectiveness
in parsing and semantic understanding
[Scott, 2003] [Barreiro et al., 2011] [Barreiro et al., 2014]
– http://www.l2f.inesc-id.pt/~abarreiro/openlogos-tutorial/INDEX.HTM
• SAL is hierarchical, made up of supersets, sets and subsets
• SAL knowledge is encoded in the lexicon,
both in the dictionary entries and in the rules.
• Bilingual dictionaries with SAL knowledge are available at:
– http://metanet4u.l2f.inesc-id.pt
Proposal for Semantico-Syntactic Knowledge Integration into SMT
[Figure: fragment of the SAL hierarchy. Word class: nouns; superset: concrete; set: functionals; subsets: conduits, barriers, containers, …]
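
As an illustration of the superset/set/subset organization, here is a minimal Python sketch. The nested-dictionary encoding and the function `sal_path` are assumptions for illustration only and do not reflect how OpenLogos stores SAL codes; the category names come from the figure, and their placement in the hierarchy is one reading of it.

```python
# Hypothetical encoding of a fragment of the SAL hierarchy:
# word class -> superset -> set -> subsets.
SAL_FRAGMENT = {
    "nouns": {
        "concrete": {
            "functionals": ["conduits", "barriers", "containers"],
        },
    },
}


def sal_path(subset, tree=SAL_FRAGMENT):
    """Return (word class, superset, set, subset) for a subset, or None."""
    for word_class, supersets in tree.items():
        for superset, sets in supersets.items():
            for set_name, subsets in sets.items():
                if subset in subsets:
                    return (word_class, superset, set_name, subset)
    return None


print(sal_path("containers"))
```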

• In OL, all NL input sentences are converted into SAL patterns,
which represent the semantico-syntactic and morphological
features of each word
• SAL elements interact with semantico-syntactic rules called
SEMTAB rules, which
– represent the meaning of words on the basis of their
association with other words (context)
– disambiguate the meanings of words in the source text by
identifying the syntactic structures underlying each meaning
– provide the target language equivalents of each identified
meaning of a source language word
– are conceptual and encode deep structure relations

• SEMTAB rules are called after dictionary look-up and during the
execution of target transfer rules (TRAN rules) to resolve ambiguity
problems (verb dependencies) and handle multiwords, overriding the
default dictionary transfer
• When a sentence is being parsed by TRAN, OL sends the SAL
patterns to the SEMTAB database to look for a rule match
• If the rule exists for a linguistic string, TRAN uses that rule and
overrides the dictionary transfer for that string
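
The lookup-and-override step can be sketched as follows. This is a toy Python illustration, not the OpenLogos implementation: the tables `DICTIONARY` and `SEMTAB`, the function `transfer`, and the use of plain lemma tuples in place of SAL patterns are all assumptions. The example entry is the "apply paint to" / "pintar" pair used elsewhere in these slides.

```python
# Toy default dictionary transfer (EN -> PT), word by word.
DICTIONARY = {"apply": "aplicar", "paint": "tinta", "to": "a"}

# Toy SEMTAB-like table: a source string (as a tuple of lemmas) -> transfer.
# Real SEMTAB rules match SAL patterns, not raw lemmas.
SEMTAB = {("apply", "paint", "to"): ["pintar"]}


def transfer(lemmas):
    """Use a matching rule if one exists; otherwise fall back to the dictionary."""
    rule = SEMTAB.get(tuple(lemmas))
    if rule is not None:
        return list(rule)  # the rule overrides the dictionary transfer
    return [DICTIONARY.get(lemma, lemma) for lemma in lemmas]


print(transfer(["apply", "paint", "to"]))  # rule match
print(transfer(["apply", "paint"]))        # no rule: dictionary transfer
```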

• A string can maintain the SVC structure or be paraphrased
apply paint to
PT: aplicar tinta a / pintar
• The SEMTAB rule applies to different surface structures of the
SVC and any insert specified in the rule
they applied immediately red paint (immediately) to
PT: aplicaram imediatamente tinta vermelha a
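
One way to picture how a single rule can cover several surface structures is an in-order match that tolerates inserts between the SVC elements. The sketch below is a simplified Python illustration under stated assumptions: `match_svc`, the `max_inserts` limit and token-level matching on lemmas are hypothetical and stand in for SAL-pattern matching, which these slides do not describe in detail.

```python
def match_svc(tokens, elements, max_inserts=3):
    """Find `elements` in order within `tokens`, allowing inserts in between.

    Returns the token indexes of the matched elements, or None if the
    elements do not occur in order or more than `max_inserts` tokens
    intervene between two consecutive elements.
    """
    for start, token in enumerate(tokens):
        if token != elements[0]:
            continue
        positions = [start]
        for element in elements[1:]:
            # Look at the next token plus up to `max_inserts` further tokens.
            window = tokens[positions[-1] + 1 : positions[-1] + 2 + max_inserts]
            if element not in window:
                break
            positions.append(positions[-1] + 1 + window.index(element))
        else:
            return positions
    return None


# Lemmatized version of the example sentence (lemmatization is assumed).
sentence = "they apply immediately red paint to the wall".split()
print(match_svc(sentence, ["apply", "paint", "to"]))
```

The same call matches the contiguous form "apply paint to" as well, which is the point of specifying the SVC once and letting inserts vary.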

• As long as the SEMTAB rule exists in the database, OL can
correctly process and translate all the SVC in our corpus that
were mistranslated (by OL and GT)
• The OL method can overcome the structural problems
presented by SMT, not only for contiguous but also for
non-contiguous SVC, regardless of how far apart their elements
occur in the sentence
• The OL methodology applies to any type of multiword and
allows the translation of other context-sensitive challenges

• Multiwords (SVC) are responsible for most translation errors
– researchers need to develop approach-independent
systematic linguistic quality evaluation metrics with
phased error categorization tasks where specific linguistic
phenomena (such as SVC) can be evaluated individually in
stages by MT expert linguists
• fine-grained error categorization can contribute to more
controlled and systematic evaluation tasks
• evaluation needs to target each group of linguistic errors and
identify which system has more difficulties translating each
type of linguistic challenge (paradigmatic evaluation)
Conclusions and Future Work

• evaluation tasks require the construction of corpora to test
grammatical correctness addressing individual linguistic
phenomena
– different types of multiwords, relative constructions,
passives, pronouns, determiners, locative prepositions, etc.
• TOWARDS HYBRIDIZATION
– the question “how effectively can rule-based and statistical
MT be combined?” can only be answered after linguistic
quality evaluation metrics are developed and validated by
the MT community
• no effective hybridization can take place before linguistic
evaluation of the results provided by different approaches is
successfully accomplished
Conclusions and Future Work

Thank you!
This research was supported by FCT (Fundação para a Ciência e a Tecnologia),
through grant SFRH/BPD/91446/2012 and project PEst-OE/EEI/LA0021/2013.
