SlideShare a Scribd company logo
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 1
Design and Development of a Malayalam to English Translator-
A Transfer Based Approach
Latha R Nair latha5074@gmail.com
Assistant Professor
School of Engineering
Cochin University of Science and Technology
Kochi,Kerala,682022, India
David Peter S davidpeter@cusat.ac.in
Professor
School of Engineering
Cochin University of Science and Technology
Kochi,Kerala,682022,India
Renjith P Ravindran
School of Engineering renjithforever@gmail.com
Cochin University of Science and Technology
Kochi,Kerala,682022,India
Abstract
This paper describes a transfer based scheme for translating Malayalam, a Dravidian language,
to English. The input to the system is a Malayalam sentence and he ouput isits equivalent English
sentence. The system comprises of a preprocessor for splitting the compound words, a
morphological parser for context disambiguation and chunking, a syntactic structure transfer
module and a bilingual dictionary. All the modules are morpheme based to reduce dictionary size.
The system does not rely on a stochastic approach and it is based on a rule-based architecture
along with various linguistic knowledge components of both Malayalam and English. The system
uses two sets of rules: rules for Malayalam morphology and rules for syntactic structure transfer
from Malayalam to English. The system is designed using artificial intelligence techniques and
can easily be modified to build translation systems for other language pairs.
Keywords: Malayalam Language, Transfer Based Approach, Machine Translation,
Morphological Parser.
1. INTRODUCTION
Work in the area of Machine translation in India has been going on for several decades.
Promising translation technology began to emerge by 1970 with the developments in the field of
artificial intelligence and computational linguistics. Machine Translation Systems in certain well-
defined domains have been successfully developed. Translation of gazette notifications, office
memorandums,and circulars has been done successfully by Mantra system developed by centre
for development for advanced computing(CDAC), Pune. Most of the systems developed are for
Hindi, the official language of India. This paper describes a translator for translating sentences in
Malayalam a Dravidian Language to English developed on a rule based architecture combined
with linguistic knowledge components of both Malayalam and English. The system has a
preprocessor for splitting the compound words, morphological parser for context disambiguation
and chunking and a bilingual dictionary. A set of rules for Malayalam morphology and rules for
syntactic structure transfer from Malayalam to English have been incorporated in the system.
Some of the organizations which are involved in the development of translation systems are:
Indian Institute of Technology (Kanpur), Center for Development of Advanced Computing (CDAC)
(Mumbai), CDAC(Pune), Indian Institute of Information Technology (Hyderabad) . They are
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 2
engaged in development of MT systems under projects sponsored by Department of
Electronics, state governments etc. since 1990[1,2]. Research on MT systems between Indian
and foreign languages and also between Indian languages are going on in these institutions.
The two major goals in any translation system development wrk are accuracy of translation and
speed. Accuracy-wise, smart tools for handling transfer grammar and translation standards
including equivalent words, expressions, phrases and styles in the target language are to be
developed. The grammar should be optimized with a view to obtaining a single correct parse and
hence a single translated output. Speed-wise, innovative use of corpus analysis, efficient parsing
algorithm, design of efficient Data Structure and run-time frequency-based rearrangement of the
grammar which substantially reduces the parsing and generation time are required [3]. A fully
automatic Machine translation system should have different modules such as morphological
analyzer, Part of speech tagger, chunker, Named entity recognizer, word sense disambiguator,
syntactic transfer module and target word generator [3]. The different techniques used for
translation differs in the number of modules used and also the way these modules are
implemented. Both rule based and statistical approaches have been tried in the implementation of
each of these modules.
The various approaches used in the MT systems for Indian languages are: Direct machine
translation systems, Rule based systems and Corpus based systems. Rule based systems do not
use any intermediate representation. This is done on a word by word translation using a bilingual
dictionary usually followed by some syntactic arrangement. [4, 5,6] 2) Rule based translation
which produces an intermediate representation, which may be a parse tree or some abstract
representation. The target language text is generated from the intermediate representation. Of
the two rule based methods, Interlingua and transfer based approach, transfer based systems are
more flexible and it can be extended to language pairs in a multilingual environment. The
Interlingua based systems can be used for multilingual translation [7]. The amount of analysis
needed in Interlingua approach is more than that in a transfer based approach. The universal
networking language has been proposed as the Interlingua by the United Nations University for
overcoming the language barrier[8]. Corpus based MT is fully automatic and requires less human
labour than rule based approaches. The disadvantage is that they need sentence aligned parallel
text for each language pair and this method can not be employed where these corpora are not
available [9, 10].
2. PREVIOUS WORK
English to Hindi MT system Mantra, developed by Applied Artificial Intelligence (AAI) group of
CDAC, Bangalore, in 1999 uses transfer based approach. The system translates domain specific
documents in the field of personal administration; specifically gazette notifications, office orders,
office memorandums and circulars. It is based on lexicalized tree adjoining grammar (LTAG) to
represent English and Hindi grammar which are used to parse source English sentences and for
structural transfer from English to Hindi [2]. This system also works well on other language pairs
such as English-Bengali, English-Telugu, English-Gujarati , Hindi-English etc and also between
Indian language pairs such as Hindi-Bengali and Hindi-Punjabi. The Mantra approach is general
but the lexicon and grammar have been limited to the specific domain of personal Administration.
It uses preprocessing tools like phrase marker, named entity recognizer, spell and grammatical
checker. It uses Earley’s style bottom up parsing algorithm for parsing. The system provides
online addition of grammar rule. The system produces multiple translation results in the case of
multiple correct parses.
English to Kannada MT system has been developed at Resource centre for Indian Language
Technology Solutions (RC_ILTS), University of Hyderabad by Dr. K. Narayan Murthy [2]. This
also uses a transfer based approach and it can be applied to the domain of government circulars.
The project is funded by Karnataka government. This system uses Universal Clause Structure
Grammar (UCSG) formalism [15]. The technique is applied to English_ Telugu translation as well.
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 3
Other systems developed using this approach are : Matra- English to Hindi MTS developed by
CDAC, Pune, Sakti- English to Marathi, Hindi and Telugu developed by IISc Bangalore and IIIT
Hyderabad, Anubaad- English to Bengali developed by CDAC, Kolkata, English to Malayalam
MTS developed by Amrita Institute of Technology.
It is found that translation between structurally similar languages like Hindi and Punjabi can be
developed easily than translation systems between Indian languages and English which differ in
the syntactic structure. The proposed translation system translates Malayalam sentences to
English sentences. Since there is a wide difference in English and Malayalam sentences the
system needs an additional modules for parsing and syntactic reordering.
3. DEVELOPMENT AND IMPLEMENTATION OF TRANSFER BASED
MACHINE TRANSLATION SYSTEM
A transfer based MT system has been developed with the following system modules 1. A
preprocessor for splitting the compound words [13] 2. a morphological parser for context
disambiguation and chunking 3. A transfer module which transfers the source language structure
representation to a target language representation. 4. A generation module which generates
target language text using target language structure. Block diagram of the same is shown in fig 1.
The grammar rules for Malayalam and some of the transfer rules for transferring source parse
tree to target parse tree are stored in two separate files. Some of the transfer rules are embedded
in the source code. The sentences stored in a source file are read one by one by the input
module and given to the preprocessor module. The final translated output is stored in another file.
FIGURE 1: Block diagram of a transfer based system
3.1 Compound Word Splitter Module
Morphological variations for words occur in Malayalam due to inflections, derivations and word
compounding. Malayalam is an agglutinative language where words of different syntactic
categories are combined to form a single word. Formation of new words by combining a noun and
a noun, noun and adjective, verb and noun, adverb and verb, adjective and noun and in some
cases all the words of an entire sentence to reflect the semantics of the sentence are very
common. The complexity of compounding in Malayalam language can be understood from the
following example.
The English version being Seetha’s cat ate a rat
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 4
The constituent words in 1 are to be separated before any further processing. Splitting has been
done at morpheme level to reduce dictionary space. The above sentence gets split as shown in 2
morpheme by morpheme translation for the sentence at 2 is :
Seetha ‘s cat a mouse (null) ate
The morpheme sequence will be as in 2 above. The sequence of morphemes is given to the
parser for chunking and word sense disambiguation. The set of inflectional suffixes for nouns and
verbs and derivational suffix for adjectives are based on previous works [11, 12]. Due to the
ambiguity in the splitting rules the system generates multiple splits for the same input sentence
and the split with least number of constituents is fed to parser.
3.2 Parser Module
Parser takes input from the splitter and does the following tasks. It groups the input sequence of
morphemes into chunks [14, 15] and performs word sense disambiguation based on morpheme
tags [16]. The chunking process finds the basic units for tree reordering. The word sense
disambiguation is required as a morpheme can have multiple tags. The parser uses a depth first
approach with backtracking [17]. The output of the parser is a parse tree for the next module. The
parser uses the syntax rules for the morpheme sequences in Malayalam sentences in the regular
expression form. A set syntax rules in the regular expression form are shown below:
1. S-> NP*VP
2. NP-> ADJ*NP | N NA
3. VP ->ADV* V VA| V VA
Rule 1 implies that a simple sentence is a sequence of noun chunks followed by a verb chunk.
Based on the second rule, a noun chunk consists of a set of adjectives followed by a noun and
suffixes like case, gender and number for nouns. According to the third rule a verb chunk consist
of a sequence of adverbs followed by a verb. Only a subset of such rules derived is shown above.
The chunks selected form groups for structural transfer to form target language structure.
A sample sentence and the parse tree generated for the sentence using the grammar rules are
shown below:
Input sentence:
English version: The police thought that the thieves who stole the chain went into forest in the
night.
Output of the splitter:
English version:chain stole theif ‘s night in forest to went that police thought
Output of the parser:
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 5
English version: CS(NC(S(NG(ADJC(S(N (chain ) V(stole) RP) NG(N(theif) PL(‘s))) NG(N(night)
NA(in)) NG(N(forest) NA(to)) V(went )) NCA(that)) S(N(police) V(thought)))
The corresponding parse tree generated is shown in Fig.2
FIGURE 2: Generated Parse tree
3.3 Syntactic Structure Transfer Module
The transfer module transfers the source language structure representation to a target language
representation. This module needs the sub tree rearrangement rules by which the source
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 6
a
b
FIGURE 3: a. Parse tree after structural transfer b. Corresponding parse tree in English
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 7
language sentence syntax tree can be transformed into target language sentence syntax tree.
The system performs most of the commonly needed reordering for Malayalam to English
translation. The tree after reordering for the above Malayalam sentence using the transfer
grammar rules using the transfer rule identified is shown in fig3(a) and its corresponding English
tree is shown in fig3(b).A set of transfer rules used by the system are shown in Table I.
Malayalam
structure
English structure
1 PP: NG P PP: P NG
2 VG: ADV V VG: V ADV
TABLE 1: A set of system transfer rules
According to the first rule the order of case suffix and noun chunk should be interchanged in a
prepositional chunk. The Second rule accords that in verbal chunk the adverb and verb should be
interchanged.
3.4 Target Sentence Generator Module
The generation module generates target language text using target language structure [18]. This
uses inter chunk dependency rules and intra chunk dependency rules. It involves lexical transfer
of verbs, transfer of auxiliary verb for tense, aspect and mood and transfer of gender, number and
person information. A depth first traversal of the target parse tree generates the following English
sentence
Input Malayalam sentence:
Correct English translation: The police thought that the thieves who stole the chain went into the
forest in the night
Sentence generated by the system:
The Police thought that thieves who stole the chain went in night to forest.
3.5 Cross lingual Dictionary
The dictionary includes most of the commonly occurring verbs, nouns, pronouns, adjectives,
inflectional and derivational suffixes, clause suffixes etc. Each entry in the file has three fields: the
root word (morpheme), the morpheme tag and its translation. The verbs in past tense have their
root words stored along with them. Since the system works with morphemes, the space required
for the dictionary is less.
TABLE 2: Lexicon
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 8
Presently the system works for sentences which contain upto two adverbial or adjectival clauses
which is commonly found in Malayalam texts. The system can be modified to handle other
sentences by adding appropriate grammar rules and transfer rules to the rule database. As the
parser is a general parser, it can handle sentences of any depth.
3.6. Implementation and Testing of the System
The system was implemented in Python language and tested with a source file which contains
1000 sentences. The sentences which follow the grammar rules were translated. A group of
results are tabulated in table 3.
4. RESULTS AND DISCUSSION
The system was tested with more than 1000 different kinds of sentences with and without
subordinate clauses which follows the identified morpheme sequences. The system returned
correct meaningful translations in most of the cases. A group of sample input sentences with the
tabulated outputs are shown in table 3 to give a correct picture of the results obtained..In around
20% of sentences the system returned the exact English version of the input sentences. In
balance translations the output sentences were meaningful but had small shortcomings due to
the following reasons:
i) The positioning of articles is not considered.
ii) Many inter chunk and intra chunk dependencies are not considered.
iii) The lexicon stores only the common translation for polysemous words.
The system takes care of word sense disambiguation based on lexical category successfully.
The compound nouns are also not handled by the system as the shallow parser cannot group
them using the current set of rules. The system output can be enhanced including rules which
can take care of the above shortcomings.
5. CONCLUSION
Various MT groups have used different formalisms best suited to their applications. Of them
transfer based systems are more flexible and it can be extended to language pairs in a
multilingual environment. A transfer based MT system has been developed for Malayalam, a
Dravidian Language which comprises of a preprocessor for splitting the compound words, a
morphological parser for context disambiguation and chunking, a syntactic structure transfer
module and a bilingual dictionary. The system was tested successfully for more than 1000
different types of sentences wherein the system returned true results for sentences which contain
two subordinate clauses. Even for sentences with more than two subordinate clauses the system
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 9
TABLE 3: Group of Tabulated results
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 10
returned translated output sentences which could give basic understanding of the input
sentences. More rules can be added to make the system to give exact translation of input
sentences in all cases. Additional modules like finding and replacing collocations, finding and
replacing named entities can also be added to the basic translator. The results obtained are
encouraging. The work can be extended to create a full fledged machine translator from any
Dravidian language to English since they all exhibit structural homogeneity.
6. REFERENCES
[1] P Dubey et al. “Overcoming the Digital Divide through Machine Translation”.
TranslationJournal., Vol.15, 2011, http://translationjournal.net/journal/55mt_india.htm [Dec
12, 2011].
[2] B.K.Murthy, W.R Deshpande ., “Language technology in India: past, present and future”,
1998,
[3] http://www.cicc.or.jp/english/hyoujyunka/mlit3/7-12.html [Dec 11,2011]
[4] S.Lalithadevi, P.Pralayankar , V.Kavitha. ”Translation of Hindi se to Tamil in a MT System”.
Information systems for Indian languages, Berlin Heidelberg: Springer-Verlag, 2011, pp. 246–
249.
[5] V Goyal, G S Lehal. “Advances in Machine Translation Systems”. Language In India, Vol. 9,
No. 11, 2009, pp. 138-150 .
[6] G.S.Josan , G.S. Lehal. “Evaluation of Hindi to Punjabi Machine Translation System”.
International Journal of Computer Science Issues, vol4 no1, 2009, pp 243-257.
[7] V.Goyal, G.S. Lehal . “Web Based Hindi to Punjabi Machine Translation System”. Journal of
Emerging Technologies in Web Intelligence. Vol.2., 2010, pp.148-151.
[8] S.K.Goutam. “The EB-Anubad translator: A hybrid scheme”. Journal of Zhejiang University
Science, Vol.6, 2005, pp.1047-1050.
[9] D.S Parikh P.Bhattacharyya “Interlingua Based English Hindi Machine Translation and
Language Divergence”, Machine Translation , Vol.16, 2001, pp.251-304.
[10]R.M.K. Sinha. “A hybridized EBMT system for Hindi to English Translation”. CSI Journal,
volume 37 no. 4, 2007, pp.3-9.
[11]10. R.M.K. Sinha. “Designing Multi-lingual Machine- Translation System: Some
Perspectives”. International Conference on Machine Learning: Models, Technologies &
Applications (MLMTA 2007), 2007 , pp. 244-249.
[12]11. S..M Idicula, D. S. Peter. “A morphological processor for Malayalam language”.
South Asia Research. vol. 27 (2): 2007, pp.173-186.
[13]12. L. Pandian, T.V.Geetha “Morpheme based Language Model for Tamil Part of Speech
Tagging” Polibits (38) , 2008, pp.19-26 .
[14]13. L.R.Nair, D.S. Peter. “Development of a rule based learning system for splitting
compound words in Malayalam language”. IEEE Recent advances in intelligent and
computational systems(RAICS), 2011, pp.751-755.
Latha R Nair, David Peter & Renjith P Ravindran
International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 11
[15]14. L.R.Nair, D.S. Peter, “Shallow parser for Malayalam Language using finite state
cascades”, 4th
international congress on image and signal processing, China, 2011, pp.2464-
2467.
[16]15. S. Abney. “Partial parsing via finite state cascades”. Journal of Natural Language
Engineering, 2(4), 1996, pp. 337-344.
[17]16. D.Jurafsky ,J.H Martin. Speech and natural language processing. India: Pearson
Education, 2000,pp 657-671.
[18]17. E. Rich, K. Knight, S. B Nair. Artificial Intelligence. New Delhi,India: The Tata McGraw
Hill, 2009 pp 295- 300.
[19]18. S.L.Devi, P.Pralayankar. “Verb Transfer in a Tamil to Hindi Machine Translation
System”. International Conference on Asian Language Processing. Harbin, China, 2010,
pp.261-264.

More Related Content

PDF
Survey on Indian CLIR and MT systems in Marathi Language
PDF
Ac04507168175
PDF
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
PDF
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
PDF
A Novel Approach for Rule Based Translation of English to Marathi
PDF
Cross language information retrieval in indian
PDF
A New Approach to Parts of Speech Tagging in Malayalam
PDF
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Survey on Indian CLIR and MT systems in Marathi Language
Ac04507168175
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A Novel Approach for Rule Based Translation of English to Marathi
Cross language information retrieval in indian
A New Approach to Parts of Speech Tagging in Malayalam
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis

What's hot (19)

PDF
Quality estimation of machine translation outputs through stemming
PDF
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
PDF
Comparison Analysis of Post- Processing Method for Punjabi Font
PDF
Named Entity Recognition System for Hindi Language: A Hybrid Approach
PDF
PDF
Named Entity Recognition using Hidden Markov Model (HMM)
PDF
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE
PDF
A Review on the Cross and Multilingual Information Retrieval
PDF
Tamil-English Document Translation Using Statistical Machine Translation Appr...
PDF
Quality Translation Enhancement Using Sequence Knowledge and Pruning in Stati...
PDF
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
PDF
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
PPT
Tamil Morphological Analysis
PDF
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
PDF
An Improved Approach for Word Ambiguity Removal
PDF
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
PDF
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
PDF
Punjabi Text Classification using Naïve Bayes, Centroid and Hybrid Approach
PDF
Language Identifier for Languages of Pakistan Including Arabic and Persian
Quality estimation of machine translation outputs through stemming
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Comparison Analysis of Post- Processing Method for Punjabi Font
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition using Hidden Markov Model (HMM)
MULTI-WORD TERM EXTRACTION BASED ON NEW HYBRID APPROACH FOR ARABIC LANGUAGE
A Review on the Cross and Multilingual Information Retrieval
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Quality Translation Enhancement Using Sequence Knowledge and Pruning in Stati...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Tamil Morphological Analysis
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
An Improved Approach for Word Ambiguity Removal
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
Punjabi Text Classification using Naïve Bayes, Centroid and Hybrid Approach
Language Identifier for Languages of Pakistan Including Arabic and Persian
Ad

Viewers also liked (19)

PDF
AbuAmanAminNoorah's Lexicon
PDF
Convention Conseil Régional paca / Pôle emploi (avril 2016)
PPTX
2017 january ohio yab meeting
PDF
Recruiter
PPT
2013 pcsao youth panel
PDF
RRXtra1706_LR
PPT
CSCC Resources for Foster and Homeless Youth
PPTX
2016 october ohio yab meeting
PDF
Helping Foster Kids Through the Holidays
PPTX
Caregivers normalcy and rpps power point updated 9 2015-personalized
PPTX
2017 OHIO YAB Officers Retreat
PDF
What’s Involved with Aging-Out of the Foster Care System? The Big Picture: Tr...
PPTX
Fathers day
PPT
2017 junior league presentation
DOCX
Resume - Santi Gong__
PPT
2017 Emotional Resiliency for Teens in Foster Care
PPT
2013 three days on the hill
PPT
2014 three days on the hill training
PDF
Large English dictionary free to download PDF
AbuAmanAminNoorah's Lexicon
Convention Conseil Régional paca / Pôle emploi (avril 2016)
2017 january ohio yab meeting
Recruiter
2013 pcsao youth panel
RRXtra1706_LR
CSCC Resources for Foster and Homeless Youth
2016 october ohio yab meeting
Helping Foster Kids Through the Holidays
Caregivers normalcy and rpps power point updated 9 2015-personalized
2017 OHIO YAB Officers Retreat
What’s Involved with Aging-Out of the Foster Care System? The Big Picture: Tr...
Fathers day
2017 junior league presentation
Resume - Santi Gong__
2017 Emotional Resiliency for Teens in Foster Care
2013 three days on the hill
2014 three days on the hill training
Large English dictionary free to download PDF
Ad

Similar to Design and Development of a Malayalam to English Translator- A Transfer Based Approach (20)

PDF
Cf32516518
PDF
Survey of machine translation systems in india
PDF
A Novel Approach for Rule Based Translation of English to Marathi
PDF
A Novel Approach for Rule Based Translation of English to Marathi
PDF
A Novel Approach for Rule Based Translation of English to Marathi
PDF
International Journal of Engineering and Science Invention (IJESI)
PDF
International Journal of Engineering and Science Invention (IJESI)
PDF
TOP 5 MOST CITED NATURAL LANGUAGE COMPUTING ARTICLES IN 2013
PDF
Building of Database for English-Azerbaijani Machine Translation Expert System
PDF
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGE
PPTX
Multi lingual corpus for machine aided translation
ODT
A tutorial on Machine Translation
PDF
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
PDF
Machine Translation Approaches and Design Aspects
PDF
Machine Transalation.pdf
PDF
Development of Bi-Directional English To Yoruba Translator for Real-Time Mobi...
PPTX
Machine Tanslation
PDF
ReseachPaper
PDF
Ijetcas14 444
PDF
Pxc3898474
Cf32516518
Survey of machine translation systems in india
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
TOP 5 MOST CITED NATURAL LANGUAGE COMPUTING ARTICLES IN 2013
Building of Database for English-Azerbaijani Machine Translation Expert System
MACHINE TRANSLATION WITH SPECIAL REFERENCE TO MALAYALAM LANGUAGE
Multi lingual corpus for machine aided translation
A tutorial on Machine Translation
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Machine Translation Approaches and Design Aspects
Machine Transalation.pdf
Development of Bi-Directional English To Yoruba Translator for Real-Time Mobi...
Machine Tanslation
ReseachPaper
Ijetcas14 444
Pxc3898474

More from Waqas Tariq (20)

PDF
The Use of Java Swing’s Components to Develop a Widget
PDF
3D Human Hand Posture Reconstruction Using a Single 2D Image
PDF
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability...
PDF
A Proposed Web Accessibility Framework for the Arab Disabled
PDF
Real Time Blinking Detection Based on Gabor Filter
PDF
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...
PDF
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...
PDF
Collaborative Learning of Organisational Knolwedge
PDF
A PNML extension for the HCI design
PDF
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 Board
PDF
An overview on Advanced Research Works on Brain-Computer Interface
PDF
Exploring the Relationship Between Mobile Phone and Senior Citizens: A Malays...
PDF
Principles of Good Screen Design in Websites
PDF
Progress of Virtual Teams in Albania
PDF
Cognitive Approach Towards the Maintenance of Web-Sites Through Quality Evalu...
PDF
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua...
PDF
Robot Arm Utilized Having Meal Support System Based on Computer Input by Huma...
PDF
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
PDF
Interface on Usability Testing Indonesia Official Tourism Website
PDF
Monitoring and Visualisation Approach for Collaboration Production Line Envir...
The Use of Java Swing’s Components to Develop a Widget
3D Human Hand Posture Reconstruction Using a Single 2D Image
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability...
A Proposed Web Accessibility Framework for the Arab Disabled
Real Time Blinking Detection Based on Gabor Filter
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...
Collaborative Learning of Organisational Knolwedge
A PNML extension for the HCI design
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 Board
An overview on Advanced Research Works on Brain-Computer Interface
Exploring the Relationship Between Mobile Phone and Senior Citizens: A Malays...
Principles of Good Screen Design in Websites
Progress of Virtual Teams in Albania
Cognitive Approach Towards the Maintenance of Web-Sites Through Quality Evalu...
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua...
Robot Arm Utilized Having Meal Support System Based on Computer Input by Huma...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Interface on Usability Testing Indonesia Official Tourism Website
Monitoring and Visualisation Approach for Collaboration Production Line Envir...

Recently uploaded (20)

PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
master seminar digital applications in india
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Basic Mud Logging Guide for educational purpose
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
Pharma ospi slides which help in ospi learning
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PPTX
Institutional Correction lecture only . . .
PDF
01-Introduction-to-Information-Management.pdf
PPTX
Cell Types and Its function , kingdom of life
VCE English Exam - Section C Student Revision Booklet
Final Presentation General Medicine 03-08-2024.pptx
master seminar digital applications in india
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
O5-L3 Freight Transport Ops (International) V1.pdf
Basic Mud Logging Guide for educational purpose
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Renaissance Architecture: A Journey from Faith to Humanism
Microbial disease of the cardiovascular and lymphatic systems
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Pharma ospi slides which help in ospi learning
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPH.pptx obstetrics and gynecology in nursing
Abdominal Access Techniques with Prof. Dr. R K Mishra
STATICS OF THE RIGID BODIES Hibbelers.pdf
Institutional Correction lecture only . . .
01-Introduction-to-Information-Management.pdf
Cell Types and Its function , kingdom of life

Design and Development of a Malayalam to English Translator- A Transfer Based Approach

  • 1. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 1 Design and Development of a Malayalam to English Translator- A Transfer Based Approach Latha R Nair latha5074@gmail.com Assistant Professor School of Engineering Cochin University of Science and Technology Kochi,Kerala,682022, India David Peter S davidpeter@cusat.ac.in Professor School of Engineering Cochin University of Science and Technology Kochi,Kerala,682022,India Renjith P Ravindran School of Engineering renjithforever@gmail.com Cochin University of Science and Technology Kochi,Kerala,682022,India Abstract This paper describes a transfer based scheme for translating Malayalam, a Dravidian language, to English. The input to the system is a Malayalam sentence and he ouput isits equivalent English sentence. The system comprises of a preprocessor for splitting the compound words, a morphological parser for context disambiguation and chunking, a syntactic structure transfer module and a bilingual dictionary. All the modules are morpheme based to reduce dictionary size. The system does not rely on a stochastic approach and it is based on a rule-based architecture along with various linguistic knowledge components of both Malayalam and English. The system uses two sets of rules: rules for Malayalam morphology and rules for syntactic structure transfer from Malayalam to English. The system is designed using artificial intelligence techniques and can easily be modified to build translation systems for other language pairs. Keywords: Malayalam Language, Transfer Based Approach, Machine Translation, Morphological Parser. 1. INTRODUCTION Work in the area of Machine translation in India has been going on for several decades. Promising translation technology began to emerge by 1970 with the developments in the field of artificial intelligence and computational linguistics. Machine Translation Systems in certain well- defined domains have been successfully developed. Translation of gazette notifications, office memorandums,and circulars has been done successfully by Mantra system developed by centre for development for advanced computing(CDAC), Pune. Most of the systems developed are for Hindi, the official language of India. This paper describes a translator for translating sentences in Malayalam a Dravidian Language to English developed on a rule based architecture combined with linguistic knowledge components of both Malayalam and English. The system has a preprocessor for splitting the compound words, morphological parser for context disambiguation and chunking and a bilingual dictionary. A set of rules for Malayalam morphology and rules for syntactic structure transfer from Malayalam to English have been incorporated in the system. Some of the organizations which are involved in the development of translation systems are: Indian Institute of Technology (Kanpur), Center for Development of Advanced Computing (CDAC) (Mumbai), CDAC(Pune), Indian Institute of Information Technology (Hyderabad) . They are
  • 2. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 2 engaged in development of MT systems under projects sponsored by Department of Electronics, state governments etc. since 1990[1,2]. Research on MT systems between Indian and foreign languages and also between Indian languages are going on in these institutions. The two major goals in any translation system development wrk are accuracy of translation and speed. Accuracy-wise, smart tools for handling transfer grammar and translation standards including equivalent words, expressions, phrases and styles in the target language are to be developed. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. Speed-wise, innovative use of corpus analysis, efficient parsing algorithm, design of efficient Data Structure and run-time frequency-based rearrangement of the grammar which substantially reduces the parsing and generation time are required [3]. A fully automatic Machine translation system should have different modules such as morphological analyzer, Part of speech tagger, chunker, Named entity recognizer, word sense disambiguator, syntactic transfer module and target word generator [3]. The different techniques used for translation differs in the number of modules used and also the way these modules are implemented. Both rule based and statistical approaches have been tried in the implementation of each of these modules. The various approaches used in the MT systems for Indian languages are: Direct machine translation systems, Rule based systems and Corpus based systems. Rule based systems do not use any intermediate representation. This is done on a word by word translation using a bilingual dictionary usually followed by some syntactic arrangement. [4, 5,6] 2) Rule based translation which produces an intermediate representation, which may be a parse tree or some abstract representation. The target language text is generated from the intermediate representation. Of the two rule based methods, Interlingua and transfer based approach, transfer based systems are more flexible and it can be extended to language pairs in a multilingual environment. The Interlingua based systems can be used for multilingual translation [7]. The amount of analysis needed in Interlingua approach is more than that in a transfer based approach. The universal networking language has been proposed as the Interlingua by the United Nations University for overcoming the language barrier[8]. Corpus based MT is fully automatic and requires less human labour than rule based approaches. The disadvantage is that they need sentence aligned parallel text for each language pair and this method can not be employed where these corpora are not available [9, 10]. 2. PREVIOUS WORK English to Hindi MT system Mantra, developed by Applied Artificial Intelligence (AAI) group of CDAC, Bangalore, in 1999 uses transfer based approach. The system translates domain specific documents in the field of personal administration; specifically gazette notifications, office orders, office memorandums and circulars. It is based on lexicalized tree adjoining grammar (LTAG) to represent English and Hindi grammar which are used to parse source English sentences and for structural transfer from English to Hindi [2]. This system also works well on other language pairs such as English-Bengali, English-Telugu, English-Gujarati , Hindi-English etc and also between Indian language pairs such as Hindi-Bengali and Hindi-Punjabi. The Mantra approach is general but the lexicon and grammar have been limited to the specific domain of personal Administration. It uses preprocessing tools like phrase marker, named entity recognizer, spell and grammatical checker. It uses Earley’s style bottom up parsing algorithm for parsing. The system provides online addition of grammar rule. The system produces multiple translation results in the case of multiple correct parses. English to Kannada MT system has been developed at Resource centre for Indian Language Technology Solutions (RC_ILTS), University of Hyderabad by Dr. K. Narayan Murthy [2]. This also uses a transfer based approach and it can be applied to the domain of government circulars. The project is funded by Karnataka government. This system uses Universal Clause Structure Grammar (UCSG) formalism [15]. The technique is applied to English_ Telugu translation as well.
  • 3. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 3 Other systems developed using this approach are : Matra- English to Hindi MTS developed by CDAC, Pune, Sakti- English to Marathi, Hindi and Telugu developed by IISc Bangalore and IIIT Hyderabad, Anubaad- English to Bengali developed by CDAC, Kolkata, English to Malayalam MTS developed by Amrita Institute of Technology. It is found that translation between structurally similar languages like Hindi and Punjabi can be developed easily than translation systems between Indian languages and English which differ in the syntactic structure. The proposed translation system translates Malayalam sentences to English sentences. Since there is a wide difference in English and Malayalam sentences the system needs an additional modules for parsing and syntactic reordering. 3. DEVELOPMENT AND IMPLEMENTATION OF TRANSFER BASED MACHINE TRANSLATION SYSTEM A transfer based MT system has been developed with the following system modules 1. A preprocessor for splitting the compound words [13] 2. a morphological parser for context disambiguation and chunking 3. A transfer module which transfers the source language structure representation to a target language representation. 4. A generation module which generates target language text using target language structure. Block diagram of the same is shown in fig 1. The grammar rules for Malayalam and some of the transfer rules for transferring source parse tree to target parse tree are stored in two separate files. Some of the transfer rules are embedded in the source code. The sentences stored in a source file are read one by one by the input module and given to the preprocessor module. The final translated output is stored in another file. FIGURE 1: Block diagram of a transfer based system 3.1 Compound Word Splitter Module Morphological variations for words occur in Malayalam due to inflections, derivations and word compounding. Malayalam is an agglutinative language where words of different syntactic categories are combined to form a single word. Formation of new words by combining a noun and a noun, noun and adjective, verb and noun, adverb and verb, adjective and noun and in some cases all the words of an entire sentence to reflect the semantics of the sentence are very common. The complexity of compounding in Malayalam language can be understood from the following example. The English version being Seetha’s cat ate a rat
  • 4. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 4 The constituent words in 1 are to be separated before any further processing. Splitting has been done at morpheme level to reduce dictionary space. The above sentence gets split as shown in 2 morpheme by morpheme translation for the sentence at 2 is : Seetha ‘s cat a mouse (null) ate The morpheme sequence will be as in 2 above. The sequence of morphemes is given to the parser for chunking and word sense disambiguation. The set of inflectional suffixes for nouns and verbs and derivational suffix for adjectives are based on previous works [11, 12]. Due to the ambiguity in the splitting rules the system generates multiple splits for the same input sentence and the split with least number of constituents is fed to parser. 3.2 Parser Module Parser takes input from the splitter and does the following tasks. It groups the input sequence of morphemes into chunks [14, 15] and performs word sense disambiguation based on morpheme tags [16]. The chunking process finds the basic units for tree reordering. The word sense disambiguation is required as a morpheme can have multiple tags. The parser uses a depth first approach with backtracking [17]. The output of the parser is a parse tree for the next module. The parser uses the syntax rules for the morpheme sequences in Malayalam sentences in the regular expression form. A set syntax rules in the regular expression form are shown below: 1. S-> NP*VP 2. NP-> ADJ*NP | N NA 3. VP ->ADV* V VA| V VA Rule 1 implies that a simple sentence is a sequence of noun chunks followed by a verb chunk. Based on the second rule, a noun chunk consists of a set of adjectives followed by a noun and suffixes like case, gender and number for nouns. According to the third rule a verb chunk consist of a sequence of adverbs followed by a verb. Only a subset of such rules derived is shown above. The chunks selected form groups for structural transfer to form target language structure. A sample sentence and the parse tree generated for the sentence using the grammar rules are shown below: Input sentence: English version: The police thought that the thieves who stole the chain went into forest in the night. Output of the splitter: English version:chain stole theif ‘s night in forest to went that police thought Output of the parser:
  • 5. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 5 English version: CS(NC(S(NG(ADJC(S(N (chain ) V(stole) RP) NG(N(theif) PL(‘s))) NG(N(night) NA(in)) NG(N(forest) NA(to)) V(went )) NCA(that)) S(N(police) V(thought))) The corresponding parse tree generated is shown in Fig.2 FIGURE 2: Generated Parse tree 3.3 Syntactic Structure Transfer Module The transfer module transfers the source language structure representation to a target language representation. This module needs the sub tree rearrangement rules by which the source
  • 6. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 6 a b FIGURE 3: a. Parse tree after structural transfer b. Corresponding parse tree in English
  • 7. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 7 language sentence syntax tree can be transformed into target language sentence syntax tree. The system performs most of the commonly needed reordering for Malayalam to English translation. The tree after reordering for the above Malayalam sentence using the transfer grammar rules using the transfer rule identified is shown in fig3(a) and its corresponding English tree is shown in fig3(b).A set of transfer rules used by the system are shown in Table I. Malayalam structure English structure 1 PP: NG P PP: P NG 2 VG: ADV V VG: V ADV TABLE 1: A set of system transfer rules According to the first rule the order of case suffix and noun chunk should be interchanged in a prepositional chunk. The Second rule accords that in verbal chunk the adverb and verb should be interchanged. 3.4 Target Sentence Generator Module The generation module generates target language text using target language structure [18]. This uses inter chunk dependency rules and intra chunk dependency rules. It involves lexical transfer of verbs, transfer of auxiliary verb for tense, aspect and mood and transfer of gender, number and person information. A depth first traversal of the target parse tree generates the following English sentence Input Malayalam sentence: Correct English translation: The police thought that the thieves who stole the chain went into the forest in the night Sentence generated by the system: The Police thought that thieves who stole the chain went in night to forest. 3.5 Cross lingual Dictionary The dictionary includes most of the commonly occurring verbs, nouns, pronouns, adjectives, inflectional and derivational suffixes, clause suffixes etc. Each entry in the file has three fields: the root word (morpheme), the morpheme tag and its translation. The verbs in past tense have their root words stored along with them. Since the system works with morphemes, the space required for the dictionary is less. TABLE 2: Lexicon
  • 8. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 8 Presently the system works for sentences which contain upto two adverbial or adjectival clauses which is commonly found in Malayalam texts. The system can be modified to handle other sentences by adding appropriate grammar rules and transfer rules to the rule database. As the parser is a general parser, it can handle sentences of any depth. 3.6. Implementation and Testing of the System The system was implemented in Python language and tested with a source file which contains 1000 sentences. The sentences which follow the grammar rules were translated. A group of results are tabulated in table 3. 4. RESULTS AND DISCUSSION The system was tested with more than 1000 different kinds of sentences with and without subordinate clauses which follows the identified morpheme sequences. The system returned correct meaningful translations in most of the cases. A group of sample input sentences with the tabulated outputs are shown in table 3 to give a correct picture of the results obtained..In around 20% of sentences the system returned the exact English version of the input sentences. In balance translations the output sentences were meaningful but had small shortcomings due to the following reasons: i) The positioning of articles is not considered. ii) Many inter chunk and intra chunk dependencies are not considered. iii) The lexicon stores only the common translation for polysemous words. The system takes care of word sense disambiguation based on lexical category successfully. The compound nouns are also not handled by the system as the shallow parser cannot group them using the current set of rules. The system output can be enhanced including rules which can take care of the above shortcomings. 5. CONCLUSION Various MT groups have used different formalisms best suited to their applications. Of them transfer based systems are more flexible and it can be extended to language pairs in a multilingual environment. A transfer based MT system has been developed for Malayalam, a Dravidian Language which comprises of a preprocessor for splitting the compound words, a morphological parser for context disambiguation and chunking, a syntactic structure transfer module and a bilingual dictionary. The system was tested successfully for more than 1000 different types of sentences wherein the system returned true results for sentences which contain two subordinate clauses. Even for sentences with more than two subordinate clauses the system
  • 9. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 9 TABLE 3: Group of Tabulated results
  • 10. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 10 returned translated output sentences which could give basic understanding of the input sentences. More rules can be added to make the system to give exact translation of input sentences in all cases. Additional modules like finding and replacing collocations, finding and replacing named entities can also be added to the basic translator. The results obtained are encouraging. The work can be extended to create a full fledged machine translator from any Dravidian language to English since they all exhibit structural homogeneity. 6. REFERENCES [1] P Dubey et al. “Overcoming the Digital Divide through Machine Translation”. TranslationJournal., Vol.15, 2011, http://translationjournal.net/journal/55mt_india.htm [Dec 12, 2011]. [2] B.K.Murthy, W.R Deshpande ., “Language technology in India: past, present and future”, 1998, [3] http://www.cicc.or.jp/english/hyoujyunka/mlit3/7-12.html [Dec 11,2011] [4] S.Lalithadevi, P.Pralayankar , V.Kavitha. ”Translation of Hindi se to Tamil in a MT System”. Information systems for Indian languages, Berlin Heidelberg: Springer-Verlag, 2011, pp. 246– 249. [5] V Goyal, G S Lehal. “Advances in Machine Translation Systems”. Language In India, Vol. 9, No. 11, 2009, pp. 138-150 . [6] G.S.Josan , G.S. Lehal. “Evaluation of Hindi to Punjabi Machine Translation System”. International Journal of Computer Science Issues, vol4 no1, 2009, pp 243-257. [7] V.Goyal, G.S. Lehal . “Web Based Hindi to Punjabi Machine Translation System”. Journal of Emerging Technologies in Web Intelligence. Vol.2., 2010, pp.148-151. [8] S.K.Goutam. “The EB-Anubad translator: A hybrid scheme”. Journal of Zhejiang University Science, Vol.6, 2005, pp.1047-1050. [9] D.S Parikh P.Bhattacharyya “Interlingua Based English Hindi Machine Translation and Language Divergence”, Machine Translation , Vol.16, 2001, pp.251-304. [10]R.M.K. Sinha. “A hybridized EBMT system for Hindi to English Translation”. CSI Journal, volume 37 no. 4, 2007, pp.3-9. [11]10. R.M.K. Sinha. “Designing Multi-lingual Machine- Translation System: Some Perspectives”. International Conference on Machine Learning: Models, Technologies & Applications (MLMTA 2007), 2007 , pp. 244-249. [12]11. S..M Idicula, D. S. Peter. “A morphological processor for Malayalam language”. South Asia Research. vol. 27 (2): 2007, pp.173-186. [13]12. L. Pandian, T.V.Geetha “Morpheme based Language Model for Tamil Part of Speech Tagging” Polibits (38) , 2008, pp.19-26 . [14]13. L.R.Nair, D.S. Peter. “Development of a rule based learning system for splitting compound words in Malayalam language”. IEEE Recent advances in intelligent and computational systems(RAICS), 2011, pp.751-755.
  • 11. Latha R Nair, David Peter & Renjith P Ravindran International Journal of Computational Linguistics (IJCL), Volume (3) : Issue (1) : 2012 11 [15]14. L.R.Nair, D.S. Peter, “Shallow parser for Malayalam Language using finite state cascades”, 4th international congress on image and signal processing, China, 2011, pp.2464- 2467. [16]15. S. Abney. “Partial parsing via finite state cascades”. Journal of Natural Language Engineering, 2(4), 1996, pp. 337-344. [17]16. D.Jurafsky ,J.H Martin. Speech and natural language processing. India: Pearson Education, 2000,pp 657-671. [18]17. E. Rich, K. Knight, S. B Nair. Artificial Intelligence. New Delhi,India: The Tata McGraw Hill, 2009 pp 295- 300. [19]18. S.L.Devi, P.Pralayankar. “Verb Transfer in a Tamil to Hindi Machine Translation System”. International Conference on Asian Language Processing. Harbin, China, 2010, pp.261-264.