Addressing open Machine Translation problems with
Linked Data.
Diego Moussallem
Department of Computer Science, AKSW, SIMBA Research Group
University of Leipzig
Colloquium 29/05/2017
Diego Moussallem (AKSW) Addressing open Machine Translation problems with Linked Data.Colloquium 29/05/2017 1 / 26
Outline
1 MT problems
OOV words
Translation of Named Entities
2 Using BabelNet to Improve OOV Coverage in SMT
Paper
Method
Results
3 How to Configure Statistical Machine Translation with Linked Open
Data Resources
Paper
Method
Results
Background - OOV words problem
Definition
OOV: Out-of-vocabulary words are words that do not appear in the
translation model, i.e., in the translation table.
How and when it happens
Training:
Source text: ...Albert Einstein died on April 16, 1955...
Reference translation: ...Albert Einstein morreu em 16 de Abril de 1955...
Test:
Source text: ...Albert Einstein passed away on April 16, 1955...
MT output: ...Albert Einstein passed away em 16 de Abril de 1955...
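A minimal sketch of how OOV words can be detected, assuming a toy vocabulary built from the single training sentence above. The function name `find_oov` and the whitespace tokenization are illustrative assumptions, not part of the talk or of any SMT toolkit.

```python
def find_oov(sentence, vocabulary):
    """Return the tokens of `sentence` that are missing from `vocabulary`."""
    tokens = sentence.lower().split()
    return [t for t in tokens if t not in vocabulary]


# Toy stand-in for the source side of a translation table (assumption):
# every token seen in the training source sentence.
training_source = "Albert Einstein died on April 16, 1955"
vocabulary = set(training_source.lower().split())

# The test sentence uses "passed away", which was never seen in training.
test_source = "Albert Einstein passed away on April 16, 1955"
oov = find_oov(test_source, vocabulary)
print(oov)  # ['passed', 'away']
```

A decoder that has no entry for these tokens typically copies them to the output unchanged, which is what the MT output in the example shows.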
Entity translation problem
Definition
The problem arises when common words in the source language are named
entities in the target language, or vice versa. It also arises when a word is
highly ambiguous, like Kiwi, which may be a fruit, a person, a computer
program, or a bird depending on the language.
How and when it happens
Training:
Source text: ...MS Paint is a good option...
Reference translation: ...Microsoft Paint ist eine gute Wahl...
Test:
Source text: ...MS Paint is a good option...
MT output: ...Frau Farbe ist eine gute Wahl...
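The failure above can be reproduced with a toy word-by-word translator, and avoided by checking an entity table first. This is only an illustrative sketch, not the method from the talk: `WORD_DICT`, `ENTITY_LABELS`, and `translate` are hypothetical names, and the entity table stands in for labels that would come from a knowledge base.

```python
# Toy word-level dictionary (EN -> DE); illustrative only.
WORD_DICT = {
    "ms": "Frau", "paint": "Farbe", "is": "ist",
    "a": "eine", "good": "gute", "option": "Wahl",
}

# Hypothetical entity table, standing in for knowledge-base labels.
ENTITY_LABELS = {("ms", "paint"): "Microsoft Paint"}


def translate(sentence, use_entities):
    """Translate word by word, optionally matching two-token entities first."""
    tokens = sentence.split()
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if use_entities and pair in ENTITY_LABELS:
            # Emit the entity label as a unit and skip both tokens.
            out.append(ENTITY_LABELS[pair])
            i += 2
            continue
        # Fall back to the word dictionary; unknown words are copied as-is.
        out.append(WORD_DICT.get(tokens[i].lower(), tokens[i]))
        i += 1
    return " ".join(out)


print(translate("MS Paint is a good option", use_entities=False))
# Frau Farbe ist eine gute Wahl
print(translate("MS Paint is a good option", use_entities=True))
# Microsoft Paint ist eine gute Wahl
```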
Paper - LREC 2016
Proposed Method
Strategies for Using BabelNet in SMT
1 Direct Training
2 Domain Adaptation
3 Post-processing
Direct Training
Figure: Direct Training workflow
Domain Adaptation
Divided into three sub-strategies for adapting the domain:
1: CEM, 2: UEM, 3: CED
Figure: Domain Adaptation workflow
Post-processing
Figure: Post-processing workflow
Result - Direct Training
Result - Domain Adaptation
Result - Post-processing
Final Results
Paper - AsLing 2016
Proposed method
Results
Task: EN-DE
WMT 2016 shared task on machine translation of IT domain
Moses baseline
BLEU: 34.0 and TER: 56.1
Moses + Linked Data
BLEU: 34.8 and TER: 53.6
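The size of the change can be read off with a small calculation over the scores reported above. The dictionary and function names are illustrative; the only facts assumed beyond the slide are that higher BLEU is better and lower TER is better.

```python
# Scores as reported on the slide.
baseline = {"BLEU": 34.0, "TER": 56.1}
linked_data = {"BLEU": 34.8, "TER": 53.6}


def deltas(base, system):
    """Difference (system minus baseline) per metric, rounded to one decimal."""
    return {metric: round(system[metric] - base[metric], 1) for metric in base}


print(deltas(baseline, linked_data))  # {'BLEU': 0.8, 'TER': -2.5}
```

BLEU rises by 0.8 points and TER falls by 2.5 points, so both metrics move in the favourable direction for Moses + Linked Data.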
Thank you!