xenotext




              Introduction to
          translation technologies

                            Gerrit Sanders
                           www.xenotext.com




Computer-Assisted Translation
Computer-assisted translation




           Computer-assisted translation (CAT)
           or computer-aided translation is a
           translation process in which a
           human translator uses software to obtain
           a higher degree of precision and efficiency.




Computer-assisted translation


   Typical components of a CAT solution include:

   • Translation memory (TM)
   • Data mining tools: alignment and term extraction
   • Translation editor
   • Quality assurance
   • Termbase
   • Translation management system (TMS)






                   Translation memory
                          (TM)




Translation memory




      A translation memory (TM) is a database that
      stores sentences and their translations for reuse in
      new translation projects.




      This is a sentence.  ->  TM: "This is a sentence." = "Ceci est une phrase."  ->  Ceci est une phrase.




Translation unit


      A record in the translation memory is called a
      translation unit (TU).




      source segment:       This is a sentence.
      target segment:       Ceci est une phrase.
      information fields:   Created on:   18/09/2006
                            Created by:   Gerrit
                            Customer:     ACME
                            Project:      Training
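
A translation unit can be pictured as a small record. The sketch below is one possible in-memory representation, not the storage format of any particular tool; the class name `TranslationUnit` and the `info` key names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TranslationUnit:
    """One TM record: a source segment, its translation, and metadata."""
    source: str
    target: str
    # Free-form information fields (creation date, author, customer, project).
    info: dict = field(default_factory=dict)


# The example unit from the slide (18/09/2006 read as day/month/year).
tu = TranslationUnit(
    source="This is a sentence.",
    target="Ceci est une phrase.",
    info={
        "created_on": date(2006, 9, 18),
        "created_by": "Gerrit",
        "customer": "ACME",
        "project": "Training",
    },
)

# A very small translation memory: a lookup keyed on the source segment.
tm = {tu.source: tu}
print(tm["This is a sentence."].target)
```

Keying the lookup on the exact source text only supports exact matches; fuzzy matching needs a similarity search instead.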




Segmentation




       Segmentation is the process of splitting the new
       source text into logical, reusable units.
       Segmentation can be either sentence-based or
       paragraph-based.

   Paragraph-based segmentation:
   1   Welcome to Brussels
   2   Brussels is the capital of Belgium. It is officially bilingual.

   Sentence-based segmentation:
   1   Welcome to Brussels
   2   Brussels is the capital of Belgium.
   3   It is officially bilingual.
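
The two segmentation modes can be sketched in a few lines. This is a simplified illustration that assumes one paragraph per line and treats '.', '!' or '?' followed by whitespace as a sentence boundary; real tools use richer rules (abbreviations, numbers, language-specific punctuation).

```python
import re


def paragraph_segments(text):
    """One segment per non-empty line (paragraph)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def sentence_segments(text):
    """Split each paragraph after '.', '!' or '?' when whitespace follows."""
    segments = []
    for paragraph in paragraph_segments(text):
        segments.extend(re.split(r"(?<=[.!?])\s+", paragraph))
    return segments


text = (
    "Welcome to Brussels\n"
    "Brussels is the capital of Belgium. It is officially bilingual."
)

print(paragraph_segments(text))  # two segments
print(sentence_segments(text))   # three segments
```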




Match types



   Matches against the translation memory (TM):

   Match score    Match type      Description
   0%             No match        The new source segment is not found in the TM.
   99% or lower   Fuzzy match     The new source segment is similar (but not identical)
                                  to a source segment found in the TM.
   100%           Exact match     The new source segment is identical to a source
                                  segment found in the TM.
   101% ??        Context match   The new source segment is identical to a source
                                  segment found in the TM and they both have the same
                                  context.
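
The match types above can be illustrated with a small lookup function. This is a sketch under stated assumptions: similarity is computed with Python's `difflib.SequenceMatcher`, which is not the scoring used by any specific CAT tool; "context" is taken to be the preceding source segment; and the 0.7 cut-off below which a candidate counts as "no match" is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher


def best_match(segment, context, tm, fuzzy_threshold=0.7):
    """Return (match_type, score, translation) for a new source segment.

    tm is a list of (source, context, target) tuples. Using the preceding
    source segment as "context" is an assumption of this sketch.
    """
    best = ("no match", 0.0, None)
    for source, tm_context, target in tm:
        if segment == source and context == tm_context:
            return ("context match", 1.0, target)
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best[1]:
            kind = "exact match" if segment == source else "fuzzy match"
            best = (kind, score, target)
    if best[1] < fuzzy_threshold:
        # Too dissimilar to be useful: report it as no match.
        return ("no match", best[1], None)
    return best


tm = [("This is a sentence.", "Welcome to Brussels", "Ceci est une phrase.")]

print(best_match("This is a sentence.", "Welcome to Brussels", tm)[0])
print(best_match("This is a sentence.", None, tm)[0])
print(best_match("This is a sentence!", None, tm)[0])
print(best_match("xyz", None, tm)[0])
```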

TMX




   • Most translation memory tools support TMX
     (Translation Memory eXchange), an XML-based
     open standard for the exchange of translation
     memory data.
   • TMX is developed and maintained by LISA
     (www.lisa.org).
   Note: TMX does not ensure 100% compatibility between
         different translation tools: e.g. segmentation or
         formatting may be handled in different ways.
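
To make the exchange format more concrete, here is a minimal sketch of a TMX document and a reader for it, using only Python's standard library. The sample content, the `creationtool` value and the function name `read_tmx` are illustrative assumptions; inline formatting elements inside `<seg>` are ignored here, which is one place where tools can diverge in practice.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal TMX document (illustrative values only).
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>This is a sentence.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ceci est une phrase.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(xml_text):
    """Return one dict per translation unit, mapping language code to text."""
    root = ET.fromstring(xml_text)
    units = []
    for tu in root.iter("tu"):
        # findtext returns only the plain text at the start of <seg>.
        units.append({tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")})
    return units


print(read_tmx(TMX_SAMPLE))
```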



SRX




   • SRX (Segmentation Rules eXchange) is an
     XML-based open standard for the exchange of
     segmentation rules.
   • Without SRX, TMX leverage may be lower than
     expected.
   • SRX is developed and maintained by LISA
     (www.lisa.org).
   Note: SRX is currently not supported by SDL Trados.
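
Why segmentation rules affect TMX leverage can be shown with a toy rule engine. The sketch below does not read SRX files; it only mimics the SRX idea of break and no-break rules, each with a pattern before and after the candidate position, where the first matching rule decides. The rule sets and names (`RULES_A`, `RULES_B`, `segment`) are assumptions made for this example.

```python
import re

# Each rule: (is_break, pattern_before_position, pattern_after_position).
# The first rule that matches at a position decides, as in SRX.
RULES_A = [
    (False, r"\b(?:e\.g|i\.e|etc)\.", r"\s"),  # no break after common abbreviations
    (True, r"[.!?]", r"\s"),                   # break after sentence-final punctuation
]
RULES_B = [
    (True, r"[.!?]", r"\s"),                   # same break rule, no exceptions
]


def segment(text, rules):
    """Split text at every position where the first matching rule is a break rule."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            if re.search(before + r"$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


text = "Tools differ, e.g. in segmentation. TMX helps."
print(segment(text, RULES_A))
print(segment(text, RULES_B))
```

A TM filled with segments produced by one rule set will not contain "Tools differ, e.g." as a unit, so a tool applying the other rule set finds fewer matches for the same text.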








                      Translation editor




Translation editor




   • A translation editor is the translator's working
     environment, offering easy access to source and target
     segments.
   • Translation editors typically include spelling checkers
     in a wide variety of languages, and may enable the
     user to add comments or status indications to
     each translation.
   • File filters convert the source document to a
     translatable (or localizable) format, such as XLIFF.


File filters


   Source document  ->  file filters  ->  XLIFF (translation editor)  ->  file filters  ->  target document

   Source and target formats shown: HTML, DLL, EXE, PowerPoint, InDesign, PHP, SGML,
   FrameMaker, DOCX, PDF, RTF, QuarkXPress, OpenOffice, Excel, TXT, XML, DITA, PageMaker


XLIFF




  • XLIFF (XML Localization Interchange File Format)
    is an XML-based open standard for translatable (or
    localizable) files.
  • XLIFF is developed and maintained by OASIS
    (www.oasis-open.org).
  Note: There are various "flavours" of XLIFF (e.g. SDLXLIFF),
        which in practice complicates the interchange of XLIFF
        data between different tools.
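
As an illustration, the sketch below reads source and target text from a minimal XLIFF 1.2 document with Python's standard library. The sample file content and the function name `read_xliff` are assumptions; tool-specific flavours add their own elements and namespaces, which a generic reader like this one simply does not see.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal XLIFF 1.2 document (illustrative values only).
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.txt" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>This is a sentence.</source>
        <target>Ceci est une phrase.</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_xliff(xml_text):
    """Return (id, source, target) for every trans-unit in the document."""
    root = ET.fromstring(xml_text)
    return [
        (
            unit.get("id"),
            unit.findtext("x:source", namespaces=NS),
            unit.findtext("x:target", namespaces=NS),
        )
        for unit in root.iterfind(".//x:trans-unit", NS)
    ]


print(read_xliff(XLIFF_SAMPLE))
```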




XLIFF




   source  ->  XLIFF (localization data) + skeleton (other data)  ->  target
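
The split between localization data and skeleton can be sketched as a toy file filter. This is a deliberately naive illustration for simple HTML-like markup, not how a production filter works; the `%%n%%` placeholder syntax and the function names `extract` and `merge` are assumptions made for this example.

```python
import re


def extract(markup):
    """Split markup into a skeleton with placeholders and a list of texts."""
    texts = []

    def stash(match):
        texts.append(match.group(1))
        return ">%%{}%%".format(len(texts) - 1)

    # Text between a '>' and the next '<' that contains at least one word character.
    skeleton = re.sub(r">([^<>]*\w[^<>]*)(?=<)", stash, markup)
    return skeleton, texts


def merge(skeleton, translations):
    """Put translated texts back into the skeleton to build the target document."""
    return re.sub(r"%%(\d+)%%", lambda m: translations[int(m.group(1))], skeleton)


skeleton, texts = extract("<p>This is a sentence.</p>")
print(skeleton)  # markup with a placeholder
print(texts)     # translatable text only
print(merge(skeleton, ["Ceci est une phrase."]))
```

The translator only ever sees the extracted texts; the markup stays in the skeleton and is restored unchanged when the target document is generated.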








                                Alignment




Alignment




      Alignment is the process in which specialized
      software compares a source text with its
      translation, matching equivalent segments, e.g. for
      the purpose of creating a translation memory.
      In a semi-automatic alignment process, the
      alignment results are reviewed and misalignments
      are corrected by a human linguist.




Alignment process


   legacy documents (source file + target file)  ->  segmentation + alignment  ->  revision
   ->  export (TMX)  ->  import (translation memory)
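
The "segmentation + alignment" and "revision" steps can be sketched as follows. This is a naive position-based pairing, assumed here purely for illustration; real aligners use more robust methods. Pairs whose lengths differ by more than `max_ratio` (an arbitrary value) are flagged so that a human reviewer can check them.

```python
from itertools import zip_longest


def align(source_segments, target_segments, max_ratio=2.0):
    """Pair segments by position and flag pairs that a reviewer should check."""
    pairs = []
    for src, tgt in zip_longest(source_segments, target_segments, fillvalue=""):
        if not src or not tgt:
            suspect = True  # one side has no counterpart
        else:
            ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
            suspect = ratio > max_ratio  # very different lengths hint at a misalignment
        pairs.append((src, tgt, suspect))
    return pairs


source = [
    "Welcome to Brussels",
    "Brussels is the capital of Belgium.",
    "It is officially bilingual.",
]
target = [
    "Bienvenue à Bruxelles",
    "Bruxelles est la capitale de la Belgique.",
]

for src, tgt, suspect in align(source, target):
    print("REVIEW" if suspect else "ok", "|", src, "|", tgt)
```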








                                Termbase




Example entry structure


      Entry
        Subject
        Note
        English
          Definition
            Source
          Term
            Gender
            Source
          Term
            Gender
            Source
        French
          Definition
            Source
          Term
            Gender
            Source


Concept-oriented termbases




            Your concept may
            look like this




      All terms and synonyms referring to the same concept
      should be stored in the same entry:
      car, motorcar, automobile, voiture, bagnole, ...

      This will ensure that each language in your termbase
      can be used as source or target language.
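
A concept-oriented entry can be sketched as one record per concept with terms grouped by language. The class and function names, the "vehicles" subject and the gender values below are assumptions made for illustration; the point is that the same entry answers lookups in either direction.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str
    gender: str = ""  # e.g. "f"; left empty when not applicable
    source: str = ""  # where the term was found


@dataclass
class Entry:
    """One concept, with all its terms and synonyms grouped per language."""
    subject: str
    terms: dict = field(default_factory=dict)  # language code -> list of Term


def lookup(termbase, text, source_lang, target_lang):
    """Return target-language terms for every concept containing the given term."""
    results = []
    for entry in termbase:
        if any(t.text == text for t in entry.terms.get(source_lang, [])):
            results.extend(t.text for t in entry.terms.get(target_lang, []))
    return results


car = Entry(
    subject="vehicles",  # hypothetical subject field value
    terms={
        "en": [Term("car"), Term("motorcar"), Term("automobile")],
        "fr": [Term("voiture", gender="f"), Term("bagnole", gender="f")],
    },
)

print(lookup([car], "car", "en", "fr"))
print(lookup([car], "bagnole", "fr", "en"))
```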


TBX




     • TBX (TermBase eXchange) is an XML-based
       open standard for exchanging structured
       terminological data.
     • The TBX standard is developed by LISA
       (www.lisa.org) and has also been published as an
       ISO standard.




Term extraction




       Term extraction (or terminology extraction)
       is the process of extracting mono- or bilingual lists
       of potentially interesting terms from a selection of
       electronic texts.




Terminology extraction




        Linguistic term extraction:
        • uses grammatical information to identify
          term candidates (and their translations)
        • language dependent

        Statistical term extraction:
        • looks for repeated sequences of lexical items
        • language independent
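
Statistical extraction in its simplest form is a count of repeated word sequences. The sketch below is a bare-bones illustration: the function name `term_candidates` and its parameters are assumptions, sequences may span sentence boundaries, and no stop-word filtering is applied, so the output is a list of candidates for a human to review rather than a finished term list.

```python
import re
from collections import Counter


def term_candidates(text, max_len=3, min_count=2):
    """Count word sequences of 2..max_len words and keep the repeated ones."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {gram: count for gram, count in counts.items() if count >= min_count}


text = (
    "A translation memory stores segments. "
    "The translation memory is a database."
)
print(term_candidates(text))
```

Nothing in the function depends on the language of the text, which is why this approach is described as language independent.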








                      Quality assurance
                            (QA)








     Quality assurance (QA) tools detect formal errors in
     translations and/or translation memories, and enable
     their correction.
     Traceable errors include omissions, inconsistent
     translations, punctuation differences, formatting
     problems, terminology errors etc.
     Note: QA tools do NOT guarantee a flawless translation!
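
A few of these formal checks can be sketched in code. The function below is an illustrative assumption, not the behaviour of any specific QA tool: it covers only omissions, final punctuation and inconsistent translations of the same source, and it cannot judge whether a translation is actually correct.

```python
def qa_check(pairs):
    """Return (index, message) for formal problems in (source, target) pairs."""
    issues = []
    seen = {}  # source text -> first translation seen for it
    for i, (source, target) in enumerate(pairs):
        if not target.strip():
            issues.append((i, "omission: empty target"))
            continue
        if source and source[-1] in ".!?" and source[-1] != target[-1]:
            issues.append((i, "punctuation: final mark differs"))
        if seen.setdefault(source, target) != target:
            issues.append((i, "inconsistent translation"))
    return issues


pairs = [
    ("This is a sentence.", "Ceci est une phrase."),
    ("This is a sentence.", "C'est une phrase."),
    ("Welcome!", "Bienvenue"),
    ("Hello.", ""),
]
for index, message in qa_check(pairs):
    print(index, message)
```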




The end...




          www.xenotext.com



