MACHINE TRANSLATION
QUALITY ESTIMATION
A Linguist’s Approach
WHAT IS MT QUALITY ESTIMATION?
Automatically providing a quality indicator for machine
translation output without depending on human reference
translations.
Our objective:
Estimate quality and post-editing effort for eBay listing titles
and descriptions
MT QUALITY ESTIMATION – A LINGUIST’S APPROACH 2
ONE big CHALLENGE
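$$
\min_{W} \sum_{t=1}^{T} \left\lVert W^{(t)} X^{(t)} - Y^{(t)} \right\rVert_2^2 + \lambda_s \lVert S \rVert_1 + \lambda_b \lVert B \rVert_{1,\infty} \quad \text{subject to: } W = S + B
$$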
or
“State-of-the-art QE explores different supervised linear or non-linear learning methods
for regression or classification such as Support Vector Machines (SVM), different types
of Decision Trees, Neural Networks, Elastic-Net, Gaussian Processes, Naive Bayes,
among others”
(Machine Translation Quality Estimation Across Domains, de Souza et al., 2014)
A LINGUIST’S APPROACH
Using linguistic features from 3 dimensions:
• COMPLEXITY
• ADEQUACY
• FLUENCY
FEATURES
Complexity:
• Length
• Polysemy
Adequacy:
• QA
  ◦ Terminology
  ◦ Patterns
  ◦ Blacklist
  ◦ Numbers
• Automated Post-Editing
• (POS)
• (NER)
Fluency:
• Misspellings
• Grammar errors
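A few of the features above can be sketched in code. This is a minimal illustration, not the deck's implementation: the function name `extract_features`, the one-word `BLACKLIST`, and the naive number pattern are all assumptions made for the example.

```python
import re

# Hypothetical blacklist: terms that should never appear in the target text.
BLACKLIST = {"gratis"}

# Naive number pattern (assumption): digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def extract_features(source: str, target: str) -> dict:
    """Count a few simple issues for one source/MT segment pair."""
    source_numbers = sorted(NUMBER.findall(source))
    target_numbers = sorted(NUMBER.findall(target))
    target_words = re.findall(r"\w+", target.lower())
    return {
        # Complexity: source length in whitespace-separated words.
        "length": len(source.split()),
        # Adequacy: 1 if the numbers in source and target differ, else 0.
        "number_mismatch": int(source_numbers != target_numbers),
        # Adequacy: how many target words are on the blacklist.
        "blacklist_hits": sum(word in BLACKLIST for word in target_words),
    }


features = extract_features("Apple iPhone 6 16GB new",
                            "Apple iPhone 6 nuevo gratis")
print(features)
```

The number check compares digit strings literally, so locale-specific formats (decimal comma versus decimal point) would need normalization before a real comparison.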
IMPLEMENTATION
Checkmate + LanguageTool
• Reusable Profile
• Detailed Report
• Score
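The deck does not say how the score is derived from the report. One possible sketch, assuming the score is based on issues per 100 words with every issue type weighted equally (the function `quality_score` and its formula are hypothetical):

```python
def quality_score(issue_counts: dict, word_count: int) -> float:
    """Return a 0-100 score; every issue costs the same (equal weights)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total_issues = sum(issue_counts.values())
    issues_per_100_words = 100.0 * total_issues / word_count
    # Clamp at zero so very noisy files do not produce negative scores.
    return max(0.0, 100.0 - issues_per_100_words)


# Invented counts for a 200-word sample, for illustration only.
report = {"terminology": 3, "numbers": 1, "misspellings": 6}
print(quality_score(report, word_count=200))
```

Normalizing by word count keeps scores comparable between short samples and large files; per-feature weights could replace the plain sum if some issue types matter more than others.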
TESTING
• One Language (es-LA)
• Short samples (~300 words)
• Bigger samples (~1000 words)
• Post-Edited files (~50,000 words)
• pt-BR, ru-RU, zh-CN
RESULTS
MEASURING RESULTS
SAMPLES - SCORE AND TIME ALIGN
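One way to check whether scores and post-editing time align is a correlation coefficient. The sketch below computes Pearson's r by hand; the choice of Pearson is an assumption (the deck does not name a measure), and the sample values are invented for illustration, not taken from the presentation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Invented data: QE score per sample and post-editing minutes per sample.
scores = [92.0, 85.0, 78.0, 70.0]
minutes = [4.0, 7.0, 9.0, 14.0]
print(pearson(scores, minutes))
```

If higher scores really mean less post-editing work, the coefficient should be clearly negative.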
FILES - SCORE AND ED ALIGN
Average edit distance (ED) for es-LA descriptions = 72
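Assuming ED here means the edit distance between the raw MT output and its post-edited version, a standard Levenshtein distance illustrates the idea. The deck does not state the unit or whether the distance is counted per word or per character, so the function below accepts either strings or token lists.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute x with y, or keep it
            ))
        previous = current
    return previous[-1]


# Word-level example with an invented MT output and post-edit.
mt_output = "funda para telefono".split()
post_edited = "funda para teléfono nuevo".split()
print(edit_distance(mt_output, post_edited))  # 2: one substitution, one insertion
```

A larger distance means the post-editor changed more of the MT output, which is why it can serve as a proxy for post-editing effort.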
MT QE OVER TIME
SAMPLES - OTHER LANGUAGES
CHALLENGES
• False positives
• Matching score and post-editing effort
• Same weight for all features
WHAT’S NEXT
• Tracking scores over time
• Adding scores to our post-editing tool
• Adding new languages
• Researching new features
HOW CAN YOU USE THIS?
• Tailor the model to your needs
• Estimate quality at the file/segment level
• Target post-editing, discard bad content
• Estimate post-editing effort/time
• Compare MT systems
• Monitor MT system progress
Q&A
THANK YOU! jrowda@ebay.com



Editor's Notes

  • #2: Introduction