Machine Translation course program

Brief description of the course:
There are two fundamental approaches to machine translation: the rule-based approach (based on formal
models of natural languages, e.g. dependency grammars) and the statistical approach (based on parallel
data). Each approach has its advantages: the rule-based approach is formal and structured, while the
statistical approach makes it possible to build and scale a system without studying the properties of a
natural language in depth. On the other hand, each approach has its problem areas: the rule-based
approach is bound to a given language or family of languages, while the statistical approach does not
allow control over subtle structures and properties of a natural language, such as the generation of
prepositions. Recently, combining these two fundamental approaches has been of special interest to
researchers. The entire machine translation pipeline, from source-language formalization to word
reordering on the target-language side, can be regarded as a testing ground for combining rule-based
and statistical methods. This course introduces students to all sub-tasks of building a machine
translation system using both fundamental approaches: formalization of natural language, translation
dictionaries, phrase translation, machine translation models, decoding and word reordering. The course
also presents formal semantic models of natural languages and their place in the topic. In addition,
machine learning methods (such as structured prediction) will be a focus of the course. The course
material assumes knowledge of general higher mathematics and knowledge of, or interest in, natural
language processing. There will be some hands-on and take-away knowledge sessions, which assume
familiarity with formats, NLP algorithms and libraries.


Course topics
    1.    Introduction to MT. Motivation for its existence
    2.    Short history of MT, main phases. ALPAC report
    3.    MT systems triangle. Direct and indirect MT. Examples of MT systems
    4.    Current MT systems existing in the industry, main players
    5.    Existing software packages for natural language processing and building an MT system
    6.    Two fundamental approaches to MT: statistical and rule-based (classical)
    7.    Methods of MT
    8.    Direct MT system, its features, pros and cons
    9.    Transfer MT system, types of transfer methods, features
    10.   Notion of interlingua. Features of MT based on interlingua, its comparison with transfer
    11.   Statistical MT and its components
    12.   Example-based MT systems
    13.   Theory of statistical MT systems. Fundamental equation (Bayes theorem). Notion of statistical language
          model. MT model
    14.   Model of machine translation in statistical MT
    15.   Task of word alignment
    16.   Features of MT systems
    17.   Existing programming components of statistical MT systems
    18.   Evaluation of MT systems: human evaluation and automatic metrics
    19.   BLEU score
    20.   METEOR score
    21.   NIST score
    22.   Round-trip evaluation method
    23.   Hybrid MT systems
    24.   Task of word reordering in a sentence on the target side. Rule-based and statistical approaches
    25.   Computer semantics of a natural language. MT system based on it
    26.   Pragmatics and context analysis on cross-sentence level
    27.   Practical details of software packages: GIZA++, SRILM, Moses
    28.   Method of structured prediction for learning machine translation models
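
For orientation, the fundamental equation named in topic 13 is usually written in the noisy-channel form
used in paper [1]. The symbols below are an assumption taken from that convention and are not defined in
this program: f is the source sentence, e is a candidate target sentence, and the hat marks the chosen
translation.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\, P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\, P(f \mid e)
```

The denominator P(f) can be dropped because it does not depend on e. In this reading, P(e) is the
statistical language model, P(f | e) is the translation model, and the search for the maximizing e is the
decoding step.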


Seminar topics
   1.   Mathematics of statistical MT, paper [1]
   2.   Hierarchical model of statistical MT, paper [2]
   3.   Phrase-based statistical MT, paper [3]
   4.   Rule-based MT systems, papers [4,5]
   5.   Hybrid MT systems, based on examples, paper [6]
   6.   BLEU score in detail, paper [8]
   7.   Robust large-scale MT systems, based on examples, paper [9]
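
As preparation for seminar topic 6, the core of the BLEU score can be sketched in a few lines of Python.
This is a minimal sketch under stated assumptions, not a reference implementation: it scores a single
sentence (paper [8] defines the metric over a whole corpus), uses uniform n-gram weights up to 4-grams,
applies no smoothing, and expects pre-tokenized input. The function names are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (a simplification)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: use the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)


if __name__ == "__main__":
    refs = ["the cat is on the mat".split(),
            "there is a cat on the mat".split()]
    print(modified_precision(["the"] * 7, refs, 1))  # clipped unigram precision
    print(bleu("the cat is on the mat".split(), refs))
```

Clipping is what keeps a degenerate candidate such as seven repetitions of "the" from receiving a perfect
unigram precision, and the brevity penalty keeps very short candidates from being rewarded for precision
alone.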


Bibliography
[1] Brown P., Della Pietra S., Della Pietra V., Mercer R.: The Mathematics of
Statistical Machine Translation: Parameter Estimation, 1993
[2] Chiang D.: A Hierarchical Phrase-Based Model for Statistical Machine
Translation, 2005
[3] Koehn P., Och F., Marcu D.: Statistical Phrase-Based Machine Translation, 2003
[4] Kaplan R., Netter K., Wedekind J., Zaenen A.: Translation By Structural
Correspondences, 1989
[5] Landsbergen J.: The Rosetta Project, 1989
[6] Groves D., Way A.: Hybrid Example-Based SMT: the Best of Both Worlds?
[7] Athanaselis T., Bakamidis S., Dologlou I.: Words Reordering based on Statistical
Language Model, 2006
[8] Papineni K., Roukos S., Ward T., Zhu W.-J.: BLEU: a Method for Automatic
Evaluation of Machine Translation, 2002
[9] Gough N., Way A.: Robust Large-Scale EBMT with Marker-Based Segmentation,
2004
