Apertium: Free/open-source rule-based machine
translation and language processors
Mikel L. Forcada
Universitat d'Alacant, E-03690 Sant Vicent del Raspeig, Spain
Riga TAUS Roundtable, June 1, 2016
Mikel L. Forcada (Universitat d'Alacant, E-03690 Sant Vicent del Raspeig, Spain)Apertium: Free/open-source rule-based machine translation and language process
Riga TAUS Roundtable, June 1, 2016
/ 13
What is Apertium?
Apertium (since 2005) is
a free/open-source platform for shallow-transfer rule-based machine
translation
which is collaboratively developed
and provides:
A configurable, language-independent machine translation engine,
Data (dictionaries, rules) for more than 40 language pairs (in XML
and text-based formats), and
lots of tools for developers and users.
Pipeline architecture
A pipelined architecture allows for easy customization and diagnostics.
SL text → deformatter → morph. analyser → morph. disambig. → lexical transfer → lexical selection → structural transfer → morph. generator → post-generator → reformatter → TL text
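As a rough illustration of why a pipeline makes customization and diagnostics easy, here is a minimal Python sketch. It is not Apertium's implementation: the stage functions, the two toy dictionaries and the `run_pipeline` helper are all hypothetical, and only four of the stages named on the slide are mimicked.

```python
from typing import Any, Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[Any], Any]]

def run_pipeline(data: Any, stages: List[Stage],
                 trace: Optional[list] = None) -> Any:
    """Feed data through each stage in order, optionally recording every
    intermediate result so a faulty stage can be located."""
    for name, stage in stages:
        data = stage(data)
        if trace is not None:
            trace.append((name, data))
    return data

# Hypothetical toy data: a Spanish lexicon and a Spanish-English dictionary.
TOY_LEXICON = {"gato": ("gato", "n"), "negro": ("negro", "adj")}
TOY_BIDIX = {"gato": "cat", "negro": "black"}

def analyse(text: str) -> list:
    """Toy morphological analysis: map each word to (lemma, tag)."""
    return [TOY_LEXICON.get(word, (word, "unknown")) for word in text.split()]

def lexical_transfer(units: list) -> list:
    """Toy lexical transfer: translate each lemma, keep its tag."""
    return [(TOY_BIDIX.get(lemma, lemma), tag) for lemma, tag in units]

def structural_transfer(units: list) -> list:
    """Toy structural rule: reorder noun + adjective as adjective + noun."""
    out = list(units)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "n" and out[i + 1][1] == "adj":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out

def generate(units: list) -> str:
    """Toy generation: emit the target-language words."""
    return " ".join(lemma for lemma, _ in units)

STAGES: List[Stage] = [
    ("morph. analyser", analyse),
    ("lexical transfer", lexical_transfer),
    ("structural transfer", structural_transfer),
    ("morph. generator", generate),
]

if __name__ == "__main__":
    trace: list = []
    print(run_pipeline("gato negro", STAGES, trace))
    for name, intermediate in trace:
        print(name, "->", intermediate)
```

Because each stage only sees the previous stage's output, a stage can be replaced or removed without touching the others, and the recorded trace shows where an error first appears.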
Languages and language pairs
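[Figure: graph of languages and language pairs. Languages shown: afr, nld, arg, cat, ita, bre, fra, spa, cym, eng, glg, dan, nno, nob, ast, por, ron, epo, eus, hbs, mkd, slv, bul, ind, zsm, isl, swe, kaz, tat, mlt, ara, oci, sme, urd, hin]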
Apertium loves small languages
Some unique MT systems for small languages:
Breton→French
Aragonese↔Spanish
Occitan↔Catalan
Aragonese↔Catalan
Occitan↔Spanish
North Sámi→Norwegian
To love is to give: e.g. provide small languages with
language resources, and
computational-linguistic descriptions of their language.
What is Apertium good for?
Apertium works best when translating between related languages. Some
examples in Apertium:
Spanish ↔ Portuguese
Norwegian Nynorsk ↔ Norwegian Bokmål
Slovenian ↔ Croatian
Tatar ↔ Kazakh
Postediting Apertium output in these cases may save time compared to
translation from scratch.
It is also being used for less-related language pairs in gisting applications.
Apertium is collaboratively developed
Apertium licensing: free/open-source
Apertium language data and code are both licensed under the GNU
General Public License:
a free/open-source license allowing free distribution of unmodified and
modified versions
a copylefted license: it avoids private appropriation and encourages
giving improvements back to the project (a commons) → community
Apertium is collaboratively developed
Very active group of hundreds of developers (freelance developers,
researchers, industrial partners).
Wiki documentation (wiki.apertium.org) in addition to formal
documents.
Help available on IRC channel #apertium at freenode.net
Mailing lists: apertium-stuff@lists.sf.net and other
language-specific lists
Research and business with Apertium
Apertium is already an active research and business platform:
Research: 40+ publications, 2 PhD theses, 4 master's theses
Business: companies (Prompsit, Eleka, Imaxin Software, etc.)
offering services to customers such as Autodesk, the Government of
Catalonia, one of the main Basque banks, the daily newspaper La Voz
de Galicia, etc.
The free/open-source model creates a community which effectively
connects researchers, developers, vendors and users.
Becoming an Apertium user
Professional translators can:
use Apertium offline plugins in the OmegaT free/open-source CAT
environment.
(as with any other system) easily align source and MT to generate
machine translation memories to feed into other CAT systems
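One way to picture the second option is the sketch below, which pairs each source segment with its machine translation and writes the pairs as TMX, a common translation-memory exchange format. This is only an assumption about how such a memory could be produced: `build_tmx`, the `translate` callable, the tool name in the header and the sample sentence are all hypothetical, and any MT system could stand behind `translate`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

# ElementTree serialises this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(segments: Iterable[str], translate: Callable[[str], str],
              src_lang: str, tgt_lang: str) -> str:
    """Pair every source segment with its machine translation and return
    the pairs as a TMX 1.4 document (one translation unit per segment)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "mt-tm-sketch",      # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source in segments:
        unit = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, translate(source))):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

if __name__ == "__main__":
    # Stand-in translator for demonstration only: it just upper-cases.
    fake_mt = lambda sentence: sentence.upper()
    print(build_tmx(["Bon dia."], fake_mt, "ca", "es"))
```

The resulting file can then be loaded as a translation memory by CAT tools that import TMX.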
Muggles can use:
a stand-alone Java application for the desktop: apertium-caffeine
an Android version for handhelds
a stand-alone version (Apertium Simpleton) for Windows and macOS
a plug-in for the OmegaT CAT platform: apertium-omegat
Becoming an Apertium developer
It's easy to become an Apertium developer. It just takes
reasonable computing skills (XML, shell commands, etc.), which are
not too hard to acquire, and
good translation skills.
In no time, developers find themselves contributing to a language pair with
the support of the community.
A nice side effect: monolingual resources
When developing a language pair, monolingual language resources are
developed, such as
morphological dictionaries
morphological disambiguation rules and probabilities
The corresponding monolingual processors are available to help statistical
machine translation deal, for instance, with languages having a challenging
morphology.
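To make the idea of a morphological dictionary concrete, here is a toy Python sketch of dictionary-based analysis. The stems, the single paradigm and the function names are hypothetical, and the output only imitates the `^surface/lemma<tags>$` notation used in Apertium's text stream; it is not how Apertium's analyser is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical paradigm: each entry is (suffix, morphological tags).
PARADIGMS: Dict[str, List[Tuple[str, str]]] = {
    "regular_noun": [("", "<n><sg>"), ("s", "<n><pl>")],
}

# Hypothetical stems, each assigned to one paradigm.
STEMS: Dict[str, str] = {"cat": "regular_noun", "dog": "regular_noun"}

def expand(stems: Dict[str, str],
           paradigms: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Build a surface-form -> analyses table from stems and paradigms."""
    forms: Dict[str, List[str]] = {}
    for stem, paradigm in stems.items():
        for suffix, tags in paradigms[paradigm]:
            forms.setdefault(stem + suffix, []).append(stem + tags)
    return forms

def analyse(word: str, forms: Dict[str, List[str]]) -> str:
    """Return all analyses of a word; unknown words are marked with '*'."""
    analyses = forms.get(word.lower())
    if not analyses:
        return f"^{word}/*{word}$"
    return f"^{word}/" + "/".join(analyses) + "$"

if __name__ == "__main__":
    table = expand(STEMS, PARADIGMS)
    for token in "cats chase dogs".split():
        print(analyse(token, table))
```

A statistical system could, for instance, use such lemma-and-tag output instead of raw inflected forms, so that many surface forms of the same word share one entry.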
Success cases
Apertium is a mature technology which is used:
in Wikimedia Content Translation to generate Wikipedia content in
other languages,
to produce a Catalan edition of the Valencia daily newspaper
Levante-EMV,
by universities in the Catalan-speaking area to help in the generation
of courseware and academic information,
in PLATA, the Spanish government platform for on-the-fly
machine translation of public-service webpages.
