TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE


An MT journey: MT in use at
Sybase, an SAP company
14:20-14:40
Monday 4 June


Kerstin Bier
Sybase
An MT journey:
Moses in use at
Sybase, an SAP company

Kerstin Bier
Sybase Technical Publications Solutions
Our way to MT – When we started in 2009...

      Cost pressure
       Traditional TM technology almost fully exploited:
           ca. 80% of costs spent on „new“ words
           only 20% spent on recycling

      Time to market
       No more improvements in turnaround times:
           Average translator productivity 2400 words/day or less (JA) for years

      (S)MT developments
       Encouraging developments in the industry:
           SMT success stories (Microsoft, Autodesk ...)
           Availability of Moses Open Source toolkit




3 – Sybase Confidential – June 11, 2012
Our way to MT – The beginnings
      We were skeptical...
       Costs - Volumes big, but not big enough to justify huge
        up-front investment: Is open source MT a viable option?
       Knowledge - Small team, limited expertise in-house:
        Can we take that on?
       Quality – Can we maintain quality level with post-editing?

      How it all began...
       Joined TDA (TAUS Data Association)
       Participated in Microsoft SMT pilot (see www.tausdata.org)
       Built up relationship with MT partner:
       First contact with Pangeanic in 2009
       Trial project: Surprisingly good results for EN-DE engine

Our way to MT: From trial to production




Our Moses engine
    Training Data: 5 million source words in TMX files
              Small data volume, but we do not have more
              Our own data to have better control
    Moses phrase decoder with add-on for inline markup
     handling (PangeaMT)
    Language pair: EN -> DE
              Other languages? Planned but not started yet
    Setup:
              In-house: Reasonably powerful machine currently
               sufficient (64-bit 8 CPU)
              Ubuntu Linux, Moses decoder, PangeaMT, several pre-
               and post-processing scripts
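
As an illustration of the kind of pre-processing script mentioned above, here is a minimal sketch of turning TMX translation units into the plain-text sentence pairs that Moses training expects. It is not the script used at Sybase: the function names `seg_text` and `tmx_to_parallel`, the sample TMX, and the choice to simply drop inline elements (`<bpt>`, `<ept>`, `<ph>`) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg):
    """Return the text of a <seg>, dropping inline elements and their content.

    Simplification (assumption): only the text directly inside <seg> and the
    text following each inline element is kept; nested text is ignored.
    """
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    # Normalise whitespace so each segment fits on one training line.
    return " ".join("".join(parts).split())


def tmx_to_parallel(tmx_string, src_lang="en", tgt_lang="de"):
    """Extract (source, target) sentence pairs from a TMX document string."""
    root = ET.fromstring(tmx_string)
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # "en-US" and "en-GB" are both treated as "en".
                texts[lang.split("-")[0]] = seg_text(seg)
        if texts.get(src_lang) and texts.get(tgt_lang):
            pairs.append((texts[src_lang], texts[tgt_lang]))
    return pairs


# Hypothetical sample data, not taken from the Sybase translation memories.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu>
<tuv xml:lang="en-US"><seg>Click <bpt i="1">&lt;b&gt;</bpt>OK<ept i="1">&lt;/b&gt;</ept> to continue.</seg></tuv>
<tuv xml:lang="de-DE"><seg>Klicken Sie auf <bpt i="1">&lt;b&gt;</bpt>OK<ept i="1">&lt;/b&gt;</ept>, um fortzufahren.</seg></tuv>
</tu>
</body></tmx>"""

pairs = tmx_to_parallel(SAMPLE_TMX)
```

Dropping the inline elements keeps the training text clean, but it also discards the tag positions, which is why a separate inline markup handler is needed at translation time.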

Our challenges: What Moses does not offer

  Engine training/data issues
            Getting enough and the right data
  Integration into translation workflow
            WorldServer
            Post-editing environment and translator resistance
  Getting the data right: Handling special input and
   output requirements
            Inline XML tags
            „Hybrid“ content: Translations mixed with EN
            New terminology
  Metrics:
            Engine performance: bad output vs. good output
            Post-editing effort/productivity increase
What Moses does not offer:
        Handling special output requirements

      Inline markup content
       Solutions for inline tag handling:
           PangeaMT inline handler
           Scripts to pre-process „notranslate“ tags
           Scripts to resolve UI references

      DNTs (Do Not Translates)
       Solutions for DNT handling:
           Product/component names mostly OK through engine training
           Domain-specific untranslatable terms: pre-processing and Moses XML markup

      Unknown terminology
       Solutions for unknown terminology:
           Pre-processing: OOV analysis in Moses, feed in via XML markup
           Issues: English multi-word compounds and grammar (cases)
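
The DNT and unknown-terminology steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Sybase or PangeaMT scripts: the helper names `find_oov` and `mark_dnt`, the tag name `dnt`, and the sample terms are hypothetical, and the OOV check here is a plain vocabulary lookup rather than Moses' own analysis. The markup relies on the Moses XML input feature, where a `translation` attribute on an element supplies the translation for the enclosed span when the decoder is run with `-xml-input exclusive`.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def find_oov(sentence, vocabulary):
    """Return tokens that do not occur in the (lowercased) training vocabulary."""
    tokens = re.findall(r"\w+", sentence)
    return [t for t in tokens if t.lower() not in vocabulary]


def mark_dnt(sentence, dnt_terms):
    """Wrap each do-not-translate term in XML markup that copies it unchanged."""
    if not dnt_terms:
        return escape(sentence)
    # Longest terms first, so multi-word terms win over their sub-terms.
    alternatives = "|".join(
        re.escape(t) for t in sorted(dnt_terms, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    out, pos = [], 0
    for match in pattern.finditer(sentence):
        # Escape plain text so stray "<" or "&" is not read as markup
        # (assumption: the decoder input uses XML-style escaping).
        out.append(escape(sentence[pos:match.start()]))
        term = match.group(0)
        out.append("<dnt translation=%s>%s</dnt>" % (quoteattr(term), escape(term)))
        pos = match.end()
    out.append(escape(sentence[pos:]))
    return "".join(out)


# Hypothetical example data.
marked = mark_dnt(
    "Run the SELECT statement on SalesOrders.", {"SELECT", "SalesOrders"}
)
oov = find_oov("Enable the geofence trigger", {"enable", "the", "trigger"})
```

The same markup can be used to feed in a target-language term for an OOV word by putting the desired translation in the attribute instead of the source term. As the slide notes, this does not solve compounds or German case endings, because the supplied string is inserted as-is.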

What Moses does not offer:
      Measuring engine performance and productivity increase
  Measuring engine performance is critical:
    to evaluate how engine evolves
    as basis for translator payment

  Option 1: Scoring tools
     Problem with BLEU & friends: Just averages
     Need to find out how much good, average and bad output
      there is, ideally something like TM matching categories?

  Option 2: Human evaluation
     Ask translators to assess quality? We found that highly
      subjective („I had to re-translate everything“)
     We still continue to interview them but treat feedback
      with caution
Custom solution: Measuring engine performance with segment-based scoring

         Developed a workflow to evaluate MT output with METEOR
         Developed scripts to determine percentage of „crap“ the
          engine produces and categorized PE effort
         Basis for payment: Below threshold = full price, over
          threshold = fixed price reduction

      METEOR score range
      70 - 100
      50 - 69
      40 - 49
      30 - 39
      0 - 29                               Retraining
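
The segment-based approach can be sketched as below: a minimal illustration that bins per-segment METEOR scores into the ranges from the slide and applies a threshold-based price. Several details are assumptions, not taken from the talk: scores are taken to be on a 0-100 scale, the threshold is applied per segment, a score exactly at the threshold counts as "over", and the names `band_for`, `band_shares` and `segment_price` as well as all numbers in the usage lines are hypothetical.

```python
# Lower bound and label of each score range, highest first.
BANDS = [(70, "70-100"), (50, "50-69"), (40, "40-49"), (30, "30-39"), (0, "0-29")]


def band_for(score):
    """Return the label of the score range a segment score falls into."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower, label in BANDS:
        if score >= lower:
            return label


def band_shares(scores):
    """Return the percentage of segments in each score range."""
    if not scores:
        raise ValueError("no scores given")
    counts = {label: 0 for _, label in BANDS}
    for score in scores:
        counts[band_for(score)] += 1
    return {label: 100.0 * n / len(scores) for label, n in counts.items()}


def segment_price(score, words, full_rate, threshold, reduction):
    """Price one segment: full rate below the threshold, reduced rate otherwise."""
    if score < threshold:
        return words * full_rate
    return words * full_rate * (1.0 - reduction)


# Hypothetical scores for five segments.
shares = band_shares([85, 72, 55, 45, 10])
low_price = segment_price(25, words=10, full_rate=0.5, threshold=40, reduction=0.5)
high_price = segment_price(80, words=10, full_rate=0.5, threshold=40, reduction=0.5)
```

Unlike a single corpus-level average, the per-range percentages show how much of the output is good, average, or bad, which is closer to the TM match categories translators already know.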

Our results




Moses - Lessons Learned
   Moses out of the box offers:
              State-of-the-art SMT toolkit and lots of tools
   Moses MT output often better than expected
              Often better than translators say
              Output quality depends on your training data AND can be
               influenced greatly by pre- and post-processing
   Moses is still a toolkit „only“
              You will likely need to develop your own process around MT
              You need a partner or you need capable developer(s); you
               do not need to know all the inside stuff
              You definitely need to know your data & output requirements to
               customize the tools around Moses
Moses – Limitations: What we think is needed

   Handling of source formats and tags
              Training supports plain text data only: valuable metadata is lost
              Inline tag handler should be integrated (promising: m4loc!)
   Limited/no support for Asian languages out of the box
              Need for separate tokenizer/word breaker
   Better integration of more useful metrics & confidence values
              BLEU & friends are just averages and can only be applied when
               you have reference content
              Integration of confidence values to filter out bad content: Critical
               to improve translator acceptance of MT

Thank you!
             For questions, feel free to contact Kerstin.Bier@sap.com.




Examples: MT output and PE effort
       Minimal PE effort
       Small PE effort
       Medium PE effort


TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4 June 2012


Editor's Notes

  • #3: Very quick introduction of where I come from: I am a Localization Manager for one of Sybase's product groups, a group focusing on a database for the mobile and embedded market. We have a long history of localization since the mid-90s, with really large-scale documentation. We are a small group with just 3.5 employees and 1/3 manager. We have been using open source MT in production since early 2010. In my presentation, I will provide insights into how our journey began, how it proceeded, which challenges we faced along the way, and which results we got.
  • #4: Back in 2009 when it all started, we were already doing quite well: We had been able to reduce our translation costs year over year through optimization of processes, use of language technology and automation. Along with the cost optimization, we were of course able to optimize turnaround times. Situation: Traditional TM technology was almost fully exploited: ca. 80% of the translation costs went into „new text“ (no matches), shown as 80% of the dark blue bar in 2009; only 20% was spent on recycling (e.g. 100% matches, fuzzy matches). We did not see any more improvement with traditional TM-based translation. Typical translator volume was 2500 „new“ words/day. Additional target languages, with the high investment in fully translated new products, were hard to justify. We felt on the one hand that machine translation was the only way to go; on the other hand...
  • #5: ...we were still skeptical: Our volumes were high enough to put us under cost pressure, but not high enough to justify a huge up-front investment. We were only one product group within Sybase and could not (yet) find other interested groups: they were all busy with other things and were mostly dealing with Asian languages, which we considered too difficult to start with. We were also skeptical that we could handle it in addition to all the work we already had. We are a small team and had only limited resources and MT expertise in-house. We needed to learn, and we likely needed an external partner to get started. Another factor was quality: What we saw was good but did not fully convince us that we could maintain our high quality levels. Still, we decided to move forward: We joined the TAUS Data Association, expecting to learn and also to get access to more training data. We participated in a Microsoft SMT pilot, with really very good results. We then built up a relationship with an external MT partner, Pangeanic, and started with just a trial project with some small-scale PE. The results were very good, so we decided to move forward with a pilot project.
  • #6: We then went ahead and kicked off a pilot project but decided to start small: one product, one language direction, a small engine (2.5 million words). We also started with only a subset of real-world content, believing that the results would not be as good as in the trials, because we consider the usual evaluation on 1000 or 2000 held-back segments not to reflect reality. The tests were successful, we did not have to pull the plug, and with the results it was reasonably easy to get management buy-in for some investment: We had the engine developed by Pangeanic. After several iterations we were able to complete a production engine just in time for a major project: 400,000 words of MT plus post-editing. Results: Successful, but with variations. We saw productivity increases of 5-70%, achieved ROI after half of the project, and ended with a cost saving. Since then the engine has been retrained several times, has been in production for two minor releases, and is currently being used for one major release of a totally new component. Theoretically we are ready to scale up and extend to more products and other languages, but we have not had the resources internally to coordinate this.
  • #7: Stable engines start at 10 million words; we only have 5 million (source words). Still small: in-house Linux machine, reasonably powerful.
  • #8: The training data: According to SMT specialists, stable SMT volumes start at 10 million words of bilingual data (counting the source words, of course). The data needs to be in-domain. Integration into the workflow: Our vendor uses WorldServer; we do not have an in-house installation of WorldServer and want to spend neither time nor money on developing an adaptor for it (yet). We solved that issue with some scripting. Post-editing process: We did not have any experience with post-editing. The resistance among translators to being measured in this way is very strong, in particular among freelance translators. There are fast translators and slow translators; some can cope well with the PE task and some cannot, but our vendor was reasonably experienced and cooperative in working with us on a solution. The output issues: We have content with lots of inline tags (markup for UI elements, syntax, emphasis etc.). Since we are doing full post-editing, the post-editor could fiddle them into the right place, but it is clear that productivity is much better when the inline tags are already in the right place. We have content that I'd call „hybrid“, to put it positively: we need to leave a lot in EN (database keywords, result sets from the sample DB, function names and much more). Human translators often have problems getting this right, so how can a machine make the right decision? We are developing cutting-edge software, often with functionality that is new in the industry, where terminology is brand-new and bilingual data cannot be found. Unlike RBMT, SMT engines do not have a feature to feed in terms via a dictionary. In some cases it is sufficient to have one occurrence of a term. Metrics: Generally, the approach is to measure in BLEU or other scores, which are just an average. What we were interested in: How much really good output, how much average output and how much really bad output does the engine produce? What is the impact on the post-editing effort and the productivity increase, and hence on the cost savings?
  • #13: Moses out of the box offers state-of-the-art SMT technology and lots of tools to get started. However, it is a toolkit only: There is likely no one-size-fits-all process for MT anyway, so with Moses, too, you will likely need to develop your own process around MT. For this you need either a capable partner (we found Pangeanic) or a capable developer or computational linguist. While it probably helps to know the inside stuff, the mathematics behind it, I do not think you absolutely need to. However, what you must know are the special requirements of your process, such as input and output formats, language requirements, and special needs such as tagging, do-not-translates etc. We found it easier to use than many said. Once you have it up and running, it is really surprising how well it works.