International Journal of Information Sciences and Techniques (IJIST) Vol.2, No.6, November 2012
DOI : 10.5121/ijist.2012.2604
HINDI NAMED ENTITY RECOGNITION BY
AGGREGATING RULE BASED HEURISTICS AND
HIDDEN MARKOV MODEL
Deepti Chopra, Nusrat Jahan, Sudha Morwal
Department of Computer Engineering, Banasthali Vidyapith Jaipur (Raj.), INDIA
deeptichopra11@yahoo.co.in
nusratkota@gmail.com
sudha_morwal@yahoo.co.in
ABSTRACT
Named entity recognition (NER) is one of the applications of Natural Language Processing and is regarded
as a subtask of information retrieval. NER is the process of detecting Named Entities (NEs) in a document
and categorizing them into Named Entity classes such as the names of organizations, persons,
locations, sports, rivers, cities, countries, quantities etc. A lot of work on NER has been accomplished for
English, but so far much less success has been achieved for NER in the
Indian languages. This paper discusses NER, the various approaches to NER, Performance
Metrics, the challenges of NER in the Indian languages and finally some of the results that have been
achieved by performing NER in Hindi by aggregating approaches such as Rule based heuristics and
Hidden Markov Model (HMM).
KEYWORDS
HMM, Accuracy, NER, Performance Metrics, Named Entities
1. INTRODUCTION
There are numerous applications of Named Entity Recognition (NER). Some of these include
Information Extraction, Question Answering, Information Retrieval, Automatic Summarization,
Machine Translation etc. Named Entities are identified by performing computations on
natural language text. The task of extracting necessary details and retrieving important information
becomes easier and faster if the Named Entities are already known. NER is the process
in which Named Entities are detected in a document and are classified into their respective
Named Entity classes using any of the NER approaches. According to the 8th Schedule of its Constitution,
India has 22 official languages. NER in Indian languages is still considered
a budding topic of research in the field of NLP, and much work remains to be done
in this regard.
Consider an example of NER in Hindi as follows:
“Mohit/PER ne/O mi road/LOC se/O kitab/O khareedi/O |/O”
In the above sentence, the task of an NER system is to extract the named
entities and then classify them into classes. Here, ‘Mohit’ is the name of a person, so it is
shown by a PER tag. ‘mi road’ is the name of a location, so we have allotted a LOC tag to it. The
O tag marks words that are not part of any named entity. The set of named entity tags is not
fixed; it depends on individual choice and
on the content considered for Named Entity Recognition. Table 1 lists some of
the Named Entity tags. Named Entity tags may be of a general type or may be further
divided into sub tags of more specific types. E.g. the location tag (LOC) may be further
classified into continent tag, country tag, city tag, state tag, town tag, street tag etc.
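The word/TAG notation of the Hindi example can be read mechanically. The following is a minimal Python sketch, assuming that a word written without its own tag (such as 'mi' in 'mi road/LOC') belongs to the next tagged word. The names parse_tagged, general_tag and SUB_TAGS are illustrative only and are not part of the system described in this paper.

```python
# Illustrative sketch; parse_tagged, general_tag and SUB_TAGS are hypothetical names.
SUB_TAGS = {"LOC": ["CONTINENT", "COUNTRY", "STATE", "CITY", "TOWN", "STREET"]}


def parse_tagged(sentence):
    """Split 'word/TAG' text into (phrase, tag) pairs.

    A word without a slash is assumed to be part of a multi-word entity
    and is joined to the next word that carries a tag.
    """
    pairs = []
    pending = []
    for chunk in sentence.split():
        word, sep, tag = chunk.rpartition("/")
        if not sep:                # no tag on this word yet
            pending.append(chunk)
            continue
        pending.append(word)
        pairs.append((" ".join(pending), tag))
        pending = []
    return pairs


def general_tag(tag):
    """Map a specific sub tag (e.g. CITY) back to its general tag (e.g. LOC)."""
    for parent, children in SUB_TAGS.items():
        if tag in children:
            return parent
    return tag


pairs = parse_tagged("Mohit/PER ne/O mi road/LOC se/O kitab/O khareedi/O |/O")
entities = [(phrase, tag) for phrase, tag in pairs if tag != "O"]
```

Here `entities` holds only the phrases whose tag differs from O, and `general_tag` shows one way a specific tag could be collapsed back to its general class.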
Figure 1. A single Named Entity tag split into more specific Named Entity tags (LOCATION split into CONTINENT, COUNTRY, STATE, CITY, TOWN and STREET)
Table 1. Various Named Entity (NE) tags
(PER: Name of Person, CO: Country, ORG: Organization, VEH: Vehicle, QTY: Quantity)

NE TAG     EXAMPLE
PER        Deepti, Sudha, Rohit
CITY       Jaipur, Mumbai, Kolkata
CO         India, China, Pakistan
STATE      Rajasthan, Maharashtra
SPORT      Hockey, Badminton
ORG        TCS, Infosys, Accenture
RIVER      Ganga, Krishna, Kaveri
DATE       27-04-2012, 31/01/1989
TIME       10:10
PERCENT    100%
2. METHODOLOGIES FOR NER
Two broad approaches are employed in Named Entity Recognition [5] [1] [18]:
the Rule Based Approach and the Machine Learning Based Approach [11] [6] [16].
2.1. Rule based Approach
It is also known as the handcrafted approach. It is of two types:
2.1.1. List Lookup Approach
In this approach, Gazetteers are used, which consist of lists for the different Named Entity
classes, and a simple lookup operation is performed to decide whether a word is a
Named Entity or not. If a word is found in the list of a Named Entity class, then the Named
Entity tag of that class is allotted to the word. Indian languages, however, lack such
resources.
We can prepare Gazetteers for Indian languages using transliteration, which converts
English Named Entities into Indian languages. Alternatively, some seed values from a domain specific
corpus can be used to learn context patterns, and further Named Entities are then
produced by bootstrapping. [17] This methodology is easy and fast. The
disadvantage of this approach is that it cannot resolve ambiguities.
E.g. in the sentence “Ganga/PER ne/O Ganga/RIVER nadi/O mein/O dupki/O lagayi/O
|/O”, Ganga is a Named Entity, but it can be a person name or a river
name. The ambiguity cannot be resolved by this methodology.
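The lookup operation and its weakness can be sketched in a few lines of Python. This is a minimal illustration, assuming a tiny hand-made gazetteer built from names that appear in this paper; the names GAZETTEER, lookup and tag_sentence are hypothetical.

```python
# Toy gazetteer; the entries are examples taken from this paper, and the
# structure (class name -> set of names) is an assumption for illustration.
GAZETTEER = {
    "PER": {"Ganga", "Mohit", "Deepti"},
    "RIVER": {"Ganga", "Krishna", "Kaveri"},
    "CITY": {"Jaipur", "Mumbai", "Kolkata"},
}


def lookup(word):
    """Return every Named Entity class whose list contains the word."""
    return sorted(tag for tag, names in GAZETTEER.items() if word in names)


def tag_sentence(words):
    """Tag by list lookup only; words found in several lists stay ambiguous."""
    tagged = []
    for word in words:
        classes = lookup(word)
        if len(classes) == 1:
            tagged.append((word, classes[0]))
        elif classes:
            tagged.append((word, "AMBIGUOUS:" + "|".join(classes)))
        else:
            tagged.append((word, "O"))
    return tagged


result = tag_sentence(["Ganga", "ne", "Ganga", "nadi", "mein", "dupki", "lagayi"])
```

Both occurrences of Ganga come back marked as ambiguous between PER and RIVER, because a plain lookup sees the word and not its context.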
2.1.2. Linguistic Approach
In this approach, a linguist who has in-depth knowledge of the grammar of a specific
language constructs rules so that Named Entities can be recognized and classified
easily. [3][20][19] The rules that are constructed are language dependent and cannot be used to
identify Named Entities in another language. [11]
2.2. Machine Learning Based Approach
This approach is also known as the automated approach or the statistical approach. The Machine learning
based approach is used more frequently than the Rule based approach and is generally regarded as more efficient.
2.2.1. Hidden Markov Model (HMM)
HMM is a statistical approach in which the states are hidden or unobserved. For a given sequence of
tokens, the HMM produces the optimal (most probable) state sequence, i.e. the sequence of Named Entity tags.
It is based on the Markov Chain Property, i.e. the probability of occurrence of the next state
depends only on the immediately preceding state. HMM is easy to implement. The disadvantage of this
approach is that it requires a lot of training data in order to get good results and it cannot capture
long-range dependencies. [12]
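Finding the optimal state sequence is usually done with the Viterbi algorithm. The following is a minimal Python sketch under stated assumptions: the two states, all probabilities and the constant `unk` for unseen words are invented for illustration and are not the parameters trained in this work.

```python
def viterbi(observations, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable state sequence for the observed words."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], unk), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Markov property: only the immediately preceding state matters.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, unk), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Toy model (all numbers are made up).
states = ["PER", "O"]
start_p = {"PER": 0.5, "O": 0.5}
trans_p = {"PER": {"PER": 0.2, "O": 0.8}, "O": {"PER": 0.2, "O": 0.8}}
emit_p = {
    "PER": {"Mohit": 0.9},
    "O": {"ne": 0.4, "kitab": 0.3, "khareedi": 0.3},
}

tags = viterbi(["Mohit", "ne", "kitab", "khareedi"], states, start_p, trans_p, emit_p)
```

For real text the products of probabilities become very small, so an implementation would normally work with sums of log-probabilities instead.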
Figure 2: Diagrammatic description of HMM
2.2.2. Maximum Entropy Markov Model (MEMM)
It combines the concepts of the Hidden Markov Model and the Maximum Entropy Model. While training,
this model makes sure that the unknown values in a Markov Chain are connected and are not
conditionally independent of each other.
The long-range dependency problem of HMM is resolved by this model. Also, it has higher recall and
precision as compared to HMM. The disadvantage of this approach is the label bias problem: the
probabilities of the transitions leaving a particular state must sum to one, so MEMM favours those states
that have fewer outgoing transitions. [16]
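The label bias problem follows directly from this per-state normalization. A minimal Python sketch, assuming invented scores for the transitions leaving two hypothetical states A and B (the function name next_state_probs is also an assumption):

```python
import math


def next_state_probs(scores):
    """Softmax over the transitions leaving ONE state, so they sum to one."""
    total = sum(math.exp(v) for v in scores.values())
    return {state: math.exp(v) / total for state, v in scores.items()}


# State A has a single successor X. Whether the observation fits well
# (score 4.0) or badly (score -4.0), the transition gets probability 1.
from_a_good_fit = next_state_probs({"X": 4.0})
from_a_bad_fit = next_state_probs({"X": -4.0})

# State B has two successors, so its probability mass is shared between them.
from_b = next_state_probs({"X": 4.0, "Y": 1.0})
```

Because the only transition out of A always receives probability one, the observation has no influence there, which is why paths through states with few outgoing transitions are favoured.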
Figure 3: Diagrammatic description of MEMM
2.2.3. Conditional Random Field (CRF)
It is an undirected graphical model. Unlike classifiers that label each sample independently, it also takes into
consideration the context information, i.e. the neighbouring samples. It is known as a conditional Random Field
since it computes the conditional probability of the labels of the nodes given the observed values.
This methodology has the same advantages as MEMM. In addition, it resolves the label bias problem
faced by MEMM. [3]
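In contrast to the per-state normalization of MEMM, a linear-chain CRF normalizes once over whole label sequences. The following brute-force Python sketch illustrates this on a two-word example; the label set, the feature scores and the function names are assumptions made for illustration, and a practical implementation would compute the normalizer with dynamic programming rather than by enumeration.

```python
import itertools
import math

LABELS = ["PER", "O"]


def sequence_score(words, labels, emit, trans):
    """Sum of word/label scores and label/label transition scores."""
    score = sum(emit.get((w, l), 0.0) for w, l in zip(words, labels))
    score += sum(trans.get((a, b), 0.0) for a, b in zip(labels, labels[1:]))
    return score


def sequence_prob(words, labels, emit, trans):
    """P(labels | words) with ONE normalizer over all label sequences."""
    z = sum(
        math.exp(sequence_score(words, candidate, emit, trans))
        for candidate in itertools.product(LABELS, repeat=len(words))
    )
    return math.exp(sequence_score(words, labels, emit, trans)) / z


# Invented scores for illustration only.
emit = {("Mohit", "PER"): 2.0, ("ne", "O"): 2.0}
trans = {("PER", "O"): 1.0}
words = ["Mohit", "ne"]

probs = {
    candidate: sequence_prob(words, candidate, emit, trans)
    for candidate in itertools.product(LABELS, repeat=len(words))
}
best = max(probs, key=probs.get)
```

Since the scores of all competing sequences share a single normalizer, a state with few outgoing transitions gains no automatic advantage.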
Figure 4: Diagrammatic description of CRF
[Figure 3 block labels: UNTAGGED TEXT, POS TAGGED TEXT, GAZETTEER, MEMM, BEAM SEARCH, HANDLING UNKNOWN ENTITIES, FINAL OUTPUT]
[Figure 4 block labels: UNTAGGED TEXT, POS TAGGED TEXT, GAZETTEER, CRF, FORWARD VITERBI & BACKWARD A* SEARCH, HANDLING UNKNOWN ENTITIES, FINAL OUTPUT]
2.2.4. Support Vector Machine (SVM)
This methodology was introduced by Vapnik. SVM is a supervised statistical approach. The main
objective of this approach is to decide whether a specific vector belongs to a particular target class
or not. [2] In this approach, the training as well as the testing data belong to a single
vector space.
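Figure 5: Diagrammatic description of SVM (block labels: UNTAGGED TEXT, POS TAGGED TEXT, GAZETTEER, SVM, BEAM SEARCH, HANDLING UNKNOWN ENTITIES, FINAL OUTPUT)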
During training, we generate a hyperplane that separates the members
into two classes (positive and negative) lying on opposite sides of the hyperplane.
This approach also computes the distance of the vectors from the hyperplane; the distance to the nearest vectors is known as the margin.
The main advantage of this approach is that it gives high accuracy for the text categorization
problem. [4]
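The role of the hyperplane can be illustrated with a short Python sketch. It assumes an already trained hyperplane given by a weight vector w and a bias b; the numbers below are invented, and the training step that chooses w and b is not shown.

```python
import math


def decision_value(w, b, x):
    """Signed value w . x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 for the positive class and -1 for the negative class."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance |w . x + b| / ||w|| of a vector from the hyperplane."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0.
w, b = (3.0, 4.0), -5.0
positive_point = (3.0, 4.0)
negative_point = (0.0, 0.0)
```

With these values, `positive_point` lies on the positive side at distance 4.0 and `negative_point` on the negative side at distance 1.0.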
2.2.5. Decision Tree
It is a well known methodology that is used to extract and categorize Named Entities in a given
corpus. In this approach, some recognition rules are applied to the untagged training corpus so
that candidate Named Entities are retrieved. These Named Entities are then matched against the
answer key provided by humans. If a Named Entity agrees with the answer key, it is
referred to as a positive example; otherwise it is a negative example. [7] A decision tree is then
built that classifies the Named Entities in the testing document. [9] A leaf node of the decision tree
gives the resulting classification.
3. PERFORMANCE METRICS
Performance Metrics are very important since they reveal the performance of a Named Entity
Recognition system in terms of Precision, Recall and F-Measure. The output of an NER
system is termed the “response” and the human annotation the “answer key”. We
consider the following terms:
1. Correct: the response is the same as the answer key.
2. Incorrect: the response is not the same as the answer key.
3. Missing: the answer key is tagged but the response is not.
4. Spurious: the response is tagged but the answer key is not. [6]
Hence, we define Precision, Recall and F-Measure as follows:
Precision (P): Correct / (Correct + Incorrect + Spurious)
Recall (R): Correct / (Correct + Incorrect + Missing)
F-Measure: (2*P*R)/(P+R) [5][8]
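These metrics are simple to compute once the four counts are known. The following is a minimal Python sketch, assuming the MUC-style convention in which precision is measured against everything the system tagged (Correct + Incorrect + Spurious) and recall against everything the answer key tagged (Correct + Incorrect + Missing). The function name and the example counts are hypothetical.

```python
def precision_recall_f(correct, incorrect, missing, spurious):
    """Return (precision, recall, F-measure) from the four response counts."""
    tagged_by_system = correct + incorrect + spurious   # size of the response
    tagged_in_key = correct + incorrect + missing       # size of the answer key
    p = correct / tagged_by_system if tagged_by_system else 0.0
    r = correct / tagged_in_key if tagged_in_key else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


# Invented counts: 80 correct, 10 incorrect, 10 missing, 0 spurious.
p, r, f = precision_recall_f(correct=80, incorrect=10, missing=10, spurious=0)
```

With these counts precision is 80/90, recall is 80/100 and the F-Measure is 16/19, which is about 0.842.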
4. ISSUES IN NER IN INDIAN LANGUAGES
Not much work has yet been done on NER in the Indian languages. This is mainly due
to the fact that Indian languages lack resources such as annotated corpora and lexical resources.
There are many challenges related to Named Entity Recognition in the Indian languages.
Some of them are the following: [6]
1. Lack of capitalization: Indian languages have no concept of capitalization, whereas in
English and in many European languages a word whose first letter is capital is usually a
proper noun. NER systems developed for English and the European
languages therefore cannot be used directly to perform named entity recognition in the Indian languages.
Thus there is a need to develop an efficient NER system for the Indian languages. [15]
2. Indian languages are inflectional, morphologically rich and have free word order.
3. Indian languages lack resources. This problem is due to the fact that the web mostly has lists of
Named Entities in English and not in the Indian languages. [17]
4. In the dictionaries of the Indian languages, many common nouns also exist as proper nouns. E.g.
Lata, Suraj, Aakash, Tara etc. are names of persons and common nouns as well. So, we need
to resolve such ambiguities, which is also one of the issues in NER in the Indian languages.
5. RESULTS
We have prepared a general corpus from Hindi newspapers on the web and annotated it
manually. The Named Entity tags that we have used are: PER (Name of Person), LOC (Name of
Location), TIME, MONTH, SPORT, ORG (Name of Organization), VEH (Name of Vehicle),
RIVER and QTY (Quantity). In the first phase, we applied Rule based heuristics, or the
shallow parsing technique, over the corpus: helping words that occur just before or just after
a Named Entity are used to detect it. In the
second phase, we applied the Hidden Markov Model (HMM) to detect the rest of the Named Entities.
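The two-phase procedure can be sketched as follows. This is only an illustration under stated assumptions: the cue words ("nadi" following a river name, "shri" preceding a person name) are hypothetical examples of helping words and are not the rule set used in this work, and the second phase is represented by an arbitrary `fallback` function standing in for the HMM tagger.

```python
# Hypothetical helping words (assumptions, not the actual rules of this paper).
CUES_AFTER_ENTITY = {"nadi": "RIVER"}   # "<name> nadi": the word before is a river
CUES_BEFORE_ENTITY = {"shri": "PER"}    # "shri <name>": the word after is a person


def rule_phase(words):
    """Phase 1: tag words adjacent to a helping word; leave the rest as None."""
    tags = [None] * len(words)
    for i, word in enumerate(words):
        key = word.lower()
        if key in CUES_AFTER_ENTITY and i > 0:
            tags[i - 1] = CUES_AFTER_ENTITY[key]
        if key in CUES_BEFORE_ENTITY and i + 1 < len(words):
            tags[i + 1] = CUES_BEFORE_ENTITY[key]
    return tags


def hybrid_tag(words, fallback):
    """Phase 2: keep rule-phase tags and use `fallback` for everything else."""
    rule_tags = rule_phase(words)
    fallback_tags = fallback(words)
    return [r if r is not None else f for r, f in zip(rule_tags, fallback_tags)]


words = ["Ganga", "ne", "Ganga", "nadi", "mein", "dupki", "lagayi"]


def toy_fallback(ws):
    """Stand-in for a statistical tagger: first word PER, all others O."""
    return ["PER"] + ["O"] * (len(ws) - 1)


phase1 = rule_phase(words)
final = hybrid_tag(words, toy_fallback)
```

In this toy run the rule phase tags only the second Ganga (as RIVER, because it precedes "nadi"), and the stand-in second phase supplies the tags for the remaining words.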
Table 2. Results of Rule based heuristics or shallow parsing technique

NAMED ENTITIES   TOTAL NAMED ENTITIES (NEs)   NAMED ENTITIES (NEs) IDENTIFIED   ACCURACY
LOC              247                          125                               50.60%
PER              56                           29                                51.79%
QTY              79                           40                                50.63%
TIME             67                           34                                50.75%
ORG              135                          68                                50.37%
SPORT            45                           23                                51.11%
RIVER            11                           6                                 54.54%
VEH              25                           0                                 0%
MONTH            22                           0                                 0%

TOTAL NEs = 687    TOTAL NEs DETECTED = 325    TOTAL ACCURACY = 47.5%
Table 3. Results of Hidden Markov Model (HMM)

NAMED ENTITIES   TOTAL NAMED ENTITIES (NEs) UNDETECTED   NAMED ENTITIES (NEs) IDENTIFIED   ACCURACY
LOC              122                                     107                               87.70%
PER              27                                      24                                88.89%
QTY              39                                      34                                87.18%
TIME             33                                      29                                87.88%
ORG              67                                      59                                88.06%
SPORT            22                                      20                                90.90%
RIVER            5                                       5                                 100%
VEH              25                                      25                                100%
MONTH            22                                      22                                100%

TOTAL NEs = 362    TOTAL NEs DETECTED = 325    TOTAL ACCURACY = 89.78%
Table 4. Results of Combination of Approaches or Hybrid Approach

TOTAL NAMED ENTITIES (NEs)   NAMED ENTITIES (NEs) IDENTIFIED   ACCURACY
687                          650                               94.61%
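The totals of the three tables are linked by simple arithmetic, which the following Python sketch makes explicit. It assumes, as the tables suggest, that the second phase is run only on the entities left undetected by the first phase; the variable names are illustrative.

```python
total_nes = 687            # all Named Entities in the corpus (Table 2)
rule_identified = 325      # identified by the Rule based heuristics (Table 2)
hmm_input = total_nes - rule_identified   # 362 entities left for the HMM (Table 3)
hmm_identified = 325       # identified by the HMM among those 362 (Table 3)


def pct(part, whole):
    """Percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


rule_accuracy = pct(rule_identified, total_nes)
hmm_accuracy = pct(hmm_identified, hmm_input)
hybrid_accuracy = pct(rule_identified + hmm_identified, total_nes)
```

The last two ratios reproduce the 89.78% and 94.61% reported in Tables 3 and 4. The first ratio evaluates to about 47.31%, slightly below the 47.5% printed in Table 2.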
Figure 6. Results of using Combined Approach
[Figure content: vertical axis ticks from 90 to 101; legend "NAMED ENTITIES" with LOC, PER, QTY, TIME, ORG, SPORT, RIVER, VEH and MONTH]
6. CONCLUSIONS
We have obtained accuracy of about 94.61% by aggregating the rule based heuristics and the
HMM, as shown in Table 4. Table 2 depicts that if we applied only Rule Based Heuristics, then it
performed very poorly, and the accuracy obtained by this approach was 47.5%. Similarly, Table 3
depicts that if we applied only HMM, then its performance was average, and the accuracy
obtained by this approach was 89.78%. This shows that if we apply hybrid approach or the
combined approach, then it gives very good results in a Named Entity Recognition based system.
ACKNOWLEDGEMENT
I would like to thank all those who helped me in accomplishing this task.
REFERENCES
[1] Animesh Nayan, B. Ravi Kiran Rao, Pawandeep Singh, Sudip Sanyal and Ratna Sanyal. “Named
Entity Recognition for Indian Languages”. In Proceedings of the IJCNLP-08 Workshop on NER for
South and South East Asian Languages, Hyderabad (India), pp. 97–104, 2008.
[2] Asif Ekbal and Sivaji Bandyopadhyay. “Named Entity Recognition using Support Vector Machine: A
Language Independent Approach”. International Journal of Electrical and Electronics Engineering 4:2,
2010.
[3] Asif Ekbal, Rejwanul Haque, Amitava Das, Venkateswarlu Poka and Sivaji Bandyopadhyay.
“Language Independent Named Entity Recognition in Indian Languages”. In Proceedings of the
IJCNLP-08 Workshop on NER for South and South East Asian Languages, pages 33–40, Hyderabad,
India, January 2008.
[4] Asif Ekbal and Sivaji Bandyopadhyay. “Bengali Named Entity Recognition using Support
Vector Machine”. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian
Languages, pages 51–58, Hyderabad, India, January 2008.
[5] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu, Dr. A. Govardhan. “A Survey on Named Entity
Recognition in Indian Languages with particular reference to Telugu”. IJCSI International Journal of
Computer Science Issues, Vol. 8, Issue 2, March 2011.
[6] Darvinder Kaur, Vishal Gupta. “A survey of Named Entity Recognition in English and other Indian
Languages”. IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November
2010.
[7] Georgios Paliouras, Vangelis Karkaletsis, Georgios Petasis and Constantine D.
Spyropoulos. “Learning Decision Trees for Named-Entity Recognition and Classification”.
[8] G.V.S. Raju, B. Srinivasu, Dr. S. Viswanadha Raju, K.S.M.V. Kumar. “Named Entity
Recognition for Telugu Using Maximum Entropy Model”.
[9] Hideki Isozaki. “Japanese Named Entity Recognition based on a Simple Rule Generator and Decision
Tree Learning”. Available at: http://acl.ldc.upenn.edu/acl2001/MAIN/ISOZAKI.PDF
[10] James Mayfield, Paul McNamee and Christine Piatko. “Named Entity Recognition using Hundreds
of Thousands of Features”. Available at: http://acl.ldc.upenn.edu/W/W03/W03-0429.pdf
[11] Kamaldeep Kaur, Vishal Gupta. “Name Entity Recognition for Punjabi Language”. IRACST -
International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN:
2249-9555, Vol. 2, No. 3, June 2012.
[12] Lawrence R. Rabiner. “A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition”. In Proceedings of the IEEE, 77 (2), p. 257-286, February 1989. Available at:
http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
[13] Padmaja Sharma, Utpal Sharma, Jugal Kalita. “Named Entity Recognition: A Survey for the Indian
Languages”. Language in India: Strength for Today and Bright Hope for Tomorrow, Volume
11: 5 May 2011, ISSN 1930-2940. Available at:
http://www.languageinindia.com/may2011/v11i5may2011.pdf
[14] Praveen Kumar P and Ravi Kiran V. “A Hybrid Named Entity Recognition System for South Asian
Languages”. Available at: http://www.aclweb.org/anthology-new/I/I08/I08-5012.pdf
[15] S. Pandian, K. A. Pavithra, and T. Geetha. “Hybrid Three-stage Named Entity Recognizer for Tamil”.
INFOS2008, March, Cairo-Egypt. Available
at: http://infos2008.fci.cu.edu.eg/infos/NLP_08_P045-052.pdf
[16] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari. “Named Entity Recognition System for Hindi
Language: A Hybrid Approach”. International Journal of Computational Linguistics (IJCL), Volume
(2) : Issue (1) : 2011. Available at:
http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
[17] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra. “Gazetteer Preparation for Named Entity
Recognition in Indian Languages”.
[18] Sujan Kumar Saha, Sanjay Chatterji, Sandipan Dandapat. “A Hybrid Approach for Named Entity
Recognition in Indian Languages”.
[19] S. Biswas, M. K. Mishra, Sitanath_biswas, S. Acharya, S. Mohanty. “A Two Stage Language
Independent Named Entity Recognition for Indian Languages”. (IJCSIT) International Journal of
Computer Science and Information Technologies, Vol. 1 (4), 2010, 285-289.
[20] Vishal Gupta, Gurpreet Singh Lehal. “Named Entity Recognition for Punjabi Language Text
Summarization”. International Journal of Computer Applications (0975 – 8887), Vol. 33, No. 3, Nov.
2011.
Authors
Deepti Chopra received the B.Tech degree in Computer Science and Engineering from
Rajasthan College of Engineering for Women, Jaipur, Rajasthan in 2011. Currently she is
pursuing her M.Tech degree in Computer Science and Engineering at Banasthali
University, Rajasthan. Her research interests include Artificial Intelligence, Natural
Language Processing, and Information Retrieval.
Nusrat Jahan received the B.Tech degree in Computer Science and Engineering from R.N.
Modi Engineering College, Kota, Rajasthan in 2010. Currently she is pursuing her
M.Tech degree in Computer Science and Engineering at Banasthali University,
Rajasthan. Her research interests include Artificial Intelligence, Natural Language
Processing, and Information Retrieval.
Sudha Morwal is an active researcher in the field of Natural Language Processing.
She is currently working as Associate Professor in the Department of Computer Science at
Banasthali University (Rajasthan), India. She has done M.Tech (Computer Science),
NET and M.Sc (Computer Science), and her PhD is in progress at Banasthali University
(Rajasthan), India.

More Related Content

PDF
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
PDF
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
PDF
A fuzzy logic based on sentiment
PDF
A study on the approaches of developing a named entity recognition tool
PDF
DBMS Campus crack Question Prepared by Randhir Kumar
PDF
Multitier holistic Approach for urdu Nastaliq Recognition
PDF
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
PDF
Towards Building Semantic Role Labeler for Indian Languages
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
A fuzzy logic based on sentiment
A study on the approaches of developing a named entity recognition tool
DBMS Campus crack Question Prepared by Randhir Kumar
Multitier holistic Approach for urdu Nastaliq Recognition
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Towards Building Semantic Role Labeler for Indian Languages

What's hot (18)

PDF
DOMAIN BASED CHUNKING
PDF
1861 1865
PDF
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
PDF
Semantic based automatic question generation using artificial immune system
PDF
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
PDF
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
PDF
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...
PDF
D3 dhanalakshmi
PDF
Review of research on devnagari character recognition
PPT
An OCR System for recognition of Urdu text in Nastaliq Font
PPTX
OCR for Urdu translation
PDF
MODIFIED PAGE RANK ALGORITHM TO SOLVE AMBIGUITY OF POLYSEMOUS WORDS
PDF
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
PDF
Suitability of naïve bayesian methods for paragraph level text classification...
PDF
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents
PDF
Ijarcet vol-2-issue-4-1363-1367
PDF
Resolving the semantics of vietnamese questions in v news qaict system
PDF
TALASH: A SEMANTIC AND CONTEXT BASED OPTIMIZED HINDI SEARCH ENGINE
DOMAIN BASED CHUNKING
1861 1865
Using Decision Tree for Automatic Identification of Bengali Noun-Noun Compounds
Semantic based automatic question generation using artificial immune system
SEMI-SUPERVISED BOOTSTRAPPING APPROACH FOR NAMED ENTITY RECOGNITION
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
PERFORMANCE EVALUATION OF STATISTICAL CLASSIFIERS USING INDIAN SIGN LANGUAGE ...
D3 dhanalakshmi
Review of research on devnagari character recognition
An OCR System for recognition of Urdu text in Nastaliq Font
OCR for Urdu translation
MODIFIED PAGE RANK ALGORITHM TO SOLVE AMBIGUITY OF POLYSEMOUS WORDS
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
Suitability of naïve bayesian methods for paragraph level text classification...
IRJET - A Survey on Recognition of Strike-Out Texts in Handwritten Documents
Ijarcet vol-2-issue-4-1363-1367
Resolving the semantics of vietnamese questions in v news qaict system
TALASH: A SEMANTIC AND CONTEXT BASED OPTIMIZED HINDI SEARCH ENGINE
Ad

Similar to HINDI NAMED ENTITY RECOGNITION BY AGGREGATING RULE BASED HEURISTICS AND HIDDEN MARKOV MODEL (20)

PDF
A survey of named entity recognition in assamese and other indian languages
PDF
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
PDF
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
PDF
Named Entity Recognition using Hidden Markov Model (HMM)
PDF
Named Entity Recognition using Hidden Markov Model (HMM)
PDF
Named Entity Recognition using Hidden Markov Model (HMM)
PDF
D017422528
PDF
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
PDF
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
PDF
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
PDF
IRJET- Survey for Amazon Fine Food Reviews
PDF
B017441015
PDF
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
PDF
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
PDF
Top 10 cited articles in nlp
PDF
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
PDF
JOB MATCHING USING ARTIFICIAL INTELLIGENCE
PDF
JOB MATCHING USING ARTIFICIAL INTELLIGENCE
PDF
International Journal on Soft Computing, Artificial Intelligence and Applicat...
PDF
JOB MATCHING USING ARTIFICIAL INTELLIGENCE
A survey of named entity recognition in assamese and other indian languages
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
D017422528
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
IRJET- Survey for Amazon Fine Food Reviews
B017441015
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
Top 10 cited articles in nlp
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
JOB MATCHING USING ARTIFICIAL INTELLIGENCE
JOB MATCHING USING ARTIFICIAL INTELLIGENCE
International Journal on Soft Computing, Artificial Intelligence and Applicat...
JOB MATCHING USING ARTIFICIAL INTELLIGENCE
Ad

More from ijistjournal (20)

PDF
MATHEMATICAL EXPLANATION TO SOLUTION FOR EX-NOR PROBLEM USING MLFFN
PPTX
Call for Papers - International Journal of Information Sciences and Technique...
PDF
3rd International Conference on NLP, AI & Information Retrieval (NLAII 2025)
PDF
SURVEY ON LI-FI TECHNOLOGY AND ITS APPLICATIONS
PPTX
Research Article Submission - International Journal of Information Sciences a...
PDF
A BRIEF REVIEW OF SENTIMENT ANALYSIS METHODS
PDF
14th International Conference on Information Technology Convergence and Servi...
PPTX
Online Paper Submission - International Journal of Information Sciences and T...
PDF
New Era of Teaching Learning : 3D Marker Based Augmented Reality
PPTX
Submit Your Research Articles - International Journal of Information Sciences...
PDF
GOOGLE CLOUD MESSAGING (GCM): A LIGHT WEIGHT COMMUNICATION MECHANISM BETWEEN ...
PDF
6th International Conference on Artificial Intelligence and Machine Learning ...
PPTX
Call for Papers - International Journal of Information Sciences and Technique...
PDF
SURVEY OF ANDROID APPS FOR AGRICULTURE SECTOR
PDF
6th International Conference on Machine Learning Techniques and Data Science ...
PDF
International Journal of Information Sciences and Techniques (IJIST)
PPTX
Research Article Submission - International Journal of Information Sciences a...
PDF
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAIN
PDF
International Journal of Information Sciences and Techniques (IJIST)
PPTX
Online Paper Submission - International Journal of Information Sciences and T...
MATHEMATICAL EXPLANATION TO SOLUTION FOR EX-NOR PROBLEM USING MLFFN
Call for Papers - International Journal of Information Sciences and Technique...
3rd International Conference on NLP, AI & Information Retrieval (NLAII 2025)
SURVEY ON LI-FI TECHNOLOGY AND ITS APPLICATIONS
Research Article Submission - International Journal of Information Sciences a...
A BRIEF REVIEW OF SENTIMENT ANALYSIS METHODS
14th International Conference on Information Technology Convergence and Servi...
Online Paper Submission - International Journal of Information Sciences and T...
New Era of Teaching Learning : 3D Marker Based Augmented Reality
Submit Your Research Articles - International Journal of Information Sciences...
GOOGLE CLOUD MESSAGING (GCM): A LIGHT WEIGHT COMMUNICATION MECHANISM BETWEEN ...
6th International Conference on Artificial Intelligence and Machine Learning ...
Call for Papers - International Journal of Information Sciences and Technique...
HINDI NAMED ENTITY RECOGNITION BY AGGREGATING RULE BASED HEURISTICS AND HIDDEN MARKOV MODEL

International Journal of Information Sciences and Techniques (IJIST) Vol.2, No.6, November 2012
DOI : 10.5121/ijist.2012.2604

Deepti Chopra, Nusrat Jahan, Sudha Morwal
Department of Computer Engineering, Banasthali Vidyapith Jaipur (Raj.), INDIA
deeptichopra11@yahoo.co.in
nusratkota@gmail.com
sudha_morwal@yahoo.co.in

ABSTRACT
Named entity recognition (NER) is one of the applications of Natural Language Processing and is regarded as a subtask of information retrieval. NER is the process of detecting Named Entities (NEs) in a document and categorizing them into Named Entity classes such as the name of an organization, person, location, sport, river, city, country, quantity etc. A lot of work on NER has been accomplished for English, but so far much less success has been achieved for NER in the Indian languages. This paper discusses NER, the various approaches to NER, Performance Metrics, the challenges of NER in the Indian languages and, finally, some of the results achieved by performing NER in Hindi by aggregating Rule based heuristics and the Hidden Markov Model (HMM).

KEYWORDS
HMM, Accuracy, NER, Performance Metrics, Named Entities

1. INTRODUCTION
There are numerous applications of Named Entity Recognition (NER). Some of these include Information Extraction, Question Answering, Information Retrieval, Automatic Summarization and Machine Translation. Named Entities can be identified by performing computations on natural language text. The task of extracting necessary details and retrieving important information becomes easier and faster if the Named Entities are already known. NER is the process in which Named Entities are detected in a document and classified into their respective Named Entity classes using any of the NER based approaches.
According to the 8th schedule, India has 22 official languages. NER in Indian languages is still a budding topic of research in the field of NLP, and much work remains to be done in this regard. Consider the following example of NER in Hindi:

“Mohit/PER ne/O mi road/LOC se/O kitab/O khareedi/O |/O”

In the above sentence, the task of a NER based system is to extract the named entities and then classify them into certain classes. Here, we have considered ‘Mohit’ as the name of a person, so it is shown by a PER tag. ‘mi road’ is the name of a location, so we have allotted a LOC tag to it. The named entity tags that we choose may vary every time; they depend on individual choice and on the contents considered for Named Entity Recognition. Table 1 lists some of the Named Entity tags. Named Entity tags may be of a general type or may be further divided into sub tags of specific types. E.g. the location tag (LOC) may be further classified into continent tag, country tag, city tag, state tag, town tag, street tag etc.

Figure 1. A single Named Entity tag (LOCATION) split into more specific Named Entity tags: CONTINENT, COUNTRY, STATE, CITY, TOWN, STREET

Table 1. Various Named Entity Tags. NE Tags: Named Entity Tags; PER: Name of Person, CO: Country, ORG: Organization, VEH: Vehicle and QTY: Quantity

NE TAG     EXAMPLE
PER        Deepti, Sudha, Rohit
CITY       Jaipur, Mumbai, Kolkata
CO         India, China, Pakistan
STATE      Rajasthan, Maharashtra
SPORT      Hockey, Badminton
ORG        TCS, Infosys, Accenture
RIVER      Ganga, Krishna, Kaveri
DATE       27-04-2012, 31/01/1989
TIME       10:10
PERCENT    100%

2. METHODOLOGIES FOR NER
There are basically two approaches that are employed in Named Entity Recognition [5] [1] [18]: the Rule Based Approach and the Machine Learning Based Approach [11] [6] [16].

2.1. Rule based Approach
It is also known as the handcrafted approach. It is of two types:
2.1.1 List Lookup Approach
In this approach, Gazetteers are used that consist of lists for the different Named Entity classes, and a simple lookup operation is performed to decide whether a word is a Named Entity or not. If a particular word is found in the list of a Named Entity class, then the corresponding Named Entity tag is allotted to that word. Indian languages lack resources. Gazetteers for Indian languages can be prepared using transliteration, which converts English Named Entities into Indian languages. Alternatively, some seed values from a domain specific corpus can be used to learn context patterns, and Named Entities are then produced by bootstrapping. [17] This methodology is easy and fast. Its disadvantage is that it cannot overcome the problem of ambiguities. E.g. in the sentence:

“Ganga/PER Ne/O Ganga/RIVER nadi/O mein/O dupki/O Lagayi/O |/O”

Ganga is a Named Entity, but it can be a person name or a river name. This ambiguity cannot be resolved by the list lookup methodology.

2.1.2. Linguistic Approach
In this approach, a linguist who has in-depth knowledge of the grammar of a specific language constructs rules so that the Named Entities can be recognized and classified easily. [3][20][19] The rules constructed in this way are language dependent and cannot be used to identify Named Entities in some other language. [11]

2.2. Machine Learning Based Approach
This approach is also known as the automated approach or the statistical approach. The machine learning based approach is used more frequently than the Rule based approach and is considered more efficient.

2.2.1. Hidden Markov Model (HMM)
HMM is a statistical approach in which the states are hidden or unobserved. Given a sequence of tokens, the HMM produces the optimal state sequence, i.e. the most probable sequence of tags. It is based on the Markov Chain Property, i.e. the probability of occurrence of the next state depends only on the immediately preceding state. HMM is easy to implement. The disadvantage of this approach is that it requires a lot of training data in order to give good results, and it cannot capture long-range dependencies. [12]

Figure 2: Diagrammatic description of HMM
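To make the idea of an "optimal state sequence" concrete, the following is a minimal sketch of Viterbi decoding, the standard dynamic-programming method for finding the most probable state sequence of an HMM. The paper does not state which decoder or which probabilities were used, so the function name `viterbi`, the three-tag set and all probability values below are illustrative assumptions, not the authors' model.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for tokens under a first-order HMM."""
    # best[i][s]: best log-probability of any tag path ending in state s at position i
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s].get(tokens[0], unk))
             for s in states}]
    back = [{}]  # back[i][s]: previous state on that best path
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            # Markov property: only the immediately preceding state matters
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[i][s] = score + math.log(emit_p[s].get(tokens[i], unk))
            back[i][s] = prev
    # Follow the back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy parameters (made-up values, for illustration only)
STATES = ["PER", "LOC", "O"]
START_P = {"PER": 0.4, "LOC": 0.1, "O": 0.5}
TRANS_P = {
    "PER": {"PER": 0.1, "LOC": 0.1, "O": 0.8},
    "LOC": {"PER": 0.1, "LOC": 0.1, "O": 0.8},
    "O":   {"PER": 0.2, "LOC": 0.3, "O": 0.5},
}
EMIT_P = {
    "PER": {"Mohit": 0.9},
    "LOC": {"Jaipur": 0.9},
    "O":   {"ne": 0.5, "se": 0.4},
}

tags = viterbi(["Mohit", "ne", "Jaipur", "se"], STATES, START_P, TRANS_P, EMIT_P)
print(tags)
```

In a real system the start, transition and emission probabilities would be estimated from an annotated corpus rather than written by hand; the small `unk` constant is one simple, assumed way of handling words that were never seen with a given tag.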
2.2.2. Maximum Entropy Markov Model (MEMM)
It combines the concepts of the Hidden Markov Model and the Maximum Entropy Model. While training, this model makes sure that the unknown values in a Markov Chain are connected and are not conditionally independent of each other. The long-range dependency problem of HMM is resolved by this model. Also, it has higher recall and precision as compared to HMM. The disadvantage of this approach is the label bias problem: because the probabilities of the transitions leaving a particular state must sum to one, MEMM favours those states that have fewer outgoing transitions. [16]

Figure 3: Diagrammatic description of MEMM (blocks shown: untagged text, POS tagged text, gazetteer, MEMM, beam search, handling unknown entities, final output)

2.2.3. Conditional Random Field (CRF)
It is an undirected graphical model. Unlike other classifiers, it also takes into consideration the context information or the neighbouring samples. It is called a conditional random field since it computes the conditional probability of the values of the following node given the values of the present node. This methodology has the same advantages as MEMM. It also resolves the label bias problem faced by MEMM. [3]

Figure 4. Diagrammatic description of CRF (blocks shown: untagged text, POS tagged text, gazetteer, CRF, forward Viterbi and backward A* search, handling unknown entities, final output)
2.2.4. Support Vector Machine (SVM)
This methodology was introduced by Vapnik. SVM is a supervised statistical approach. The main objective of this approach is to find whether a specific vector belongs to a particular target class or not. [2] In this approach, the training as well as the testing data belong to the same vector space.

Figure 5: Diagrammatic description of SVM (blocks shown: untagged text, POS tagged text, gazetteer, SVM, beam search, handling unknown entities, final output)

During training, a hyperplane is generated that separates the members into two classes (positive and negative), which lie on opposite sides of the hyperplane. This approach also computes the distance of every vector from the hyperplane, known as the margin. The main advantage of this approach is that it gives high accuracy for the text categorization problem. [4]

2.2.5 Decision Tree
It is a well known methodology that is used to extract and categorize Named Entities in a given corpus. In this approach, some recognition rules are applied to the untagged training corpus so that Named Entities are retrieved. These Named Entities are then matched against the actual answer key provided by humans. If a Named Entity is the same as the answer key, it is referred to as a positive example; otherwise it is a negative example. [7] A decision tree is built that classifies the Named Entities in the testing document. [9] Each leaf node of the decision tree depicts the resultant value of a test.

3. PERFORMANCE METRICS
Performance Metrics are very important since they reveal the performance of a Named Entity Recognition based system in terms of Precision, Recall and F-Measure. The output of a NER system may be termed the “response” and the human interpretation the “answer key”. We consider the following terms:
1. Correct: the response is the same as the answer key.
2. Incorrect: the response is not the same as the answer key.
3. Missing: the answer key is tagged but the response is not tagged.
4. Spurious: the response is tagged but the answer key is not tagged. [6]
Hence, we define Precision, Recall and F-Measure as follows:
Precision (P): Correct / (Correct + Incorrect + Spurious)
Recall (R): Correct / (Correct + Incorrect + Missing)
F-Measure: (2*P*R) / (P+R) [5][8]

4. ISSUES IN NER IN INDIAN LANGUAGES
Not much work has yet been performed on NER in the Indian languages. This is mainly due to the fact that Indian languages lack resources such as annotated corpora and lexical resources. There are many challenges related to Named Entity Recognition in the Indian languages. Some of them include the following: [6]
1. Lack of Capitalization: In Indian languages, the concept of capitalization is absent, whereas in English and in many European languages a word whose first letter is capitalized is typically a proper noun. NER based systems developed for English and the European languages therefore cannot be used directly to perform named entity recognition in the Indian languages. Thus there is a need to develop an efficient NER based system for the Indian languages. [15]
2. Indian languages are inflectional, morphologically rich and have free word order.
3. Indian languages lack resources. This problem is due to the fact that the web mostly has lists of Named Entities in English and not in the Indian languages. [17]
4. In the dictionaries of the Indian languages, many common nouns also exist as proper nouns. E.g. Lata, Suraj, Aakash, Tara etc. are names of persons and common nouns as well. Resolving such ambiguities is therefore also one of the issues in NER in the Indian languages.

5. RESULTS
We have prepared a general corpus from Hindi newspapers on the web and annotated it manually. The Named Entity tags that we have used are: PER (Name of Person), LOC (Name of Location), TIME, MONTH, SPORT, ORG (Name of Organization), VEH (Name of Vehicle), RIVER and QTY (Quantity). In the first phase, we applied Rule based heuristics, or the shallow parsing technique, over the corpus: helping words that occur just before or just after a Named Entity are used to detect it. In the second phase, we applied the Hidden Markov Model (HMM) to detect the rest of the Named Entities.
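The first phase described above could be sketched as follows. The paper does not list its actual rules or helping words, so the cue-word dictionaries `FOLLOWING_CUES` and `PRECEDING_CUES` and the function `tag_with_cues` are hypothetical, chosen only to show the mechanism: a cue word tags the token next to it, and every token left untagged would be passed on to the second (HMM) phase.

```python
# Hypothetical cue words in romanised Hindi; not taken from the paper.
FOLLOWING_CUES = {"nadi": "RIVER"}   # cue appears just AFTER the entity ("Ganga nadi")
PRECEDING_CUES = {"shri": "PER"}     # cue appears just BEFORE the entity ("shri Mohit")

def tag_with_cues(tokens):
    """Tag tokens adjacent to a cue word; leave all other tokens as None."""
    tags = [None] * len(tokens)
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in FOLLOWING_CUES and i > 0 and tags[i - 1] is None:
            tags[i - 1] = FOLLOWING_CUES[word]
        if word in PRECEDING_CUES and i + 1 < len(tokens):
            tags[i + 1] = PRECEDING_CUES[word]
    return tags

river_tags = tag_with_cues("Ganga ne Ganga nadi mein dupki lagayi".split())
person_tags = tag_with_cues("shri Mohit ne kitab khareedi".split())
print(river_tags)
print(person_tags)
```

In the first sentence only the second "Ganga" is tagged (as RIVER, because "nadi" follows it); the first "Ganga", a person name, stays untagged, which illustrates why a second phase is needed for the entities the heuristics miss.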
Table 2. Results of Rule based heuristics or shallow parsing technique

NAMED ENTITY    TOTAL NEs    NEs IDENTIFIED    ACCURACY
LOC             247          125               50.60%
PER             56           29                51.79%
QTY             79           40                50.63%
TIME            67           34                50.75%
ORG             135          68                50.37%
SPORT           45           23                51.11%
RIVER           11           6                 54.54%
VEH             25           0                 0%
MONTH           22           0                 0%

TOTAL NEs = 687, TOTAL NEs DETECTED = 325, TOTAL ACCURACY = 47.5%

Table 3. Results of Hidden Markov Model (HMM)

NAMED ENTITY    TOTAL NEs UNDETECTED    NEs IDENTIFIED    ACCURACY
LOC             122                     107               87.70%
PER             27                      24                88.89%
QTY             39                      34                87.18%
TIME            33                      29                87.88%
ORG             67                      59                88.06%
SPORT           22                      20                90.90%
RIVER           5                       5                 100%
VEH             25                      25                100%
MONTH           22                      22                100%

TOTAL NEs = 362, TOTAL NEs DETECTED = 325, TOTAL ACCURACY = 89.78%

Table 4. Results of Combination of Approaches or Hybrid Approach

TOTAL NEs    NEs IDENTIFIED    ACCURACY
687          650               94.61%
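The arithmetic linking the three tables can be checked with a short script. The counts below are copied from Tables 2 and 3; the variable names are only illustrative. It shows that the HMM phase operates on exactly the entities left undetected by the rules, and that the hybrid figure in Table 4 is the sum of both phases' detections over the full total. Recomputing the rule-based total from the listed counts gives about 47.3%, slightly below the 47.5% reported in Table 2.

```python
# (total NEs, NEs identified by the rule based heuristics), from Table 2
rule_phase = {
    "LOC": (247, 125), "PER": (56, 29), "QTY": (79, 40), "TIME": (67, 34),
    "ORG": (135, 68), "SPORT": (45, 23), "RIVER": (11, 6),
    "VEH": (25, 0), "MONTH": (22, 0),
}
# (NEs left undetected by the rules, NEs identified by the HMM), from Table 3
hmm_phase = {
    "LOC": (122, 107), "PER": (27, 24), "QTY": (39, 34), "TIME": (33, 29),
    "ORG": (67, 59), "SPORT": (22, 20), "RIVER": (5, 5),
    "VEH": (25, 25), "MONTH": (22, 22),
}

total_nes = sum(total for total, _ in rule_phase.values())
rule_hits = sum(hits for _, hits in rule_phase.values())
hmm_input = sum(total for total, _ in hmm_phase.values())
hmm_hits = sum(hits for _, hits in hmm_phase.values())

# The HMM phase sees only what the rule phase missed, class by class
consistent = all(
    rule_phase[tag][0] - rule_phase[tag][1] == hmm_phase[tag][0]
    for tag in rule_phase
)

rule_accuracy = 100 * rule_hits / total_nes
hmm_accuracy = 100 * hmm_hits / hmm_input
hybrid_accuracy = 100 * (rule_hits + hmm_hits) / total_nes

print(f"rules:  {rule_hits}/{total_nes} = {rule_accuracy:.2f}%")
print(f"HMM:    {hmm_hits}/{hmm_input} = {hmm_accuracy:.2f}%")
print(f"hybrid: {rule_hits + hmm_hits}/{total_nes} = {hybrid_accuracy:.2f}%")
```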
Figure 6. Results of using Combined Approach (bar chart of accuracy by Named Entity class: LOC, PER, QTY, TIME, ORG, SPORT, RIVER, VEH, MONTH)

6. CONCLUSIONS
We have obtained an accuracy of about 94.61% by aggregating the rule based heuristics and the HMM, as shown in Table 4. Table 2 depicts that if we applied only Rule Based Heuristics, then it performed very poorly, and the accuracy obtained by this approach was 47.5%. Similarly, Table 3 depicts that if we applied only HMM, then its performance was average, and the accuracy obtained by this approach was 89.78%. This shows that if we apply the hybrid approach or the combined approach, then it gives very good results in a Named Entity Recognition based system.

ACKNOWLEDGEMENT
I would like to thank all those who helped me in accomplishing this task.

REFERENCES
[1] Animesh Nayan, B. Ravi Kiran Rao, Pawandeep Singh, Sudip Sanyal and Ratna Sanyal. “Named Entity Recognition for Indian Languages”. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, Hyderabad (India), pp. 97-104, 2008.
[2] Asif Ekbal and Sivaji Bandyopadhyay. “Named Entity Recognition using Support Vector Machine: A Language Independent Approach”. International Journal of Electrical and Electronics Engineering 4:2, 2010.
[3] Asif Ekbal, Rejwanul Haque, Amitava Das, Venkateswarlu Poka and Sivaji Bandyopadhyay. “Language Independent Named Entity Recognition in Indian Languages”. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pages 33-40, Hyderabad, India, January 2008.
[4] Asif Ekbal and Sivaji Bandyopadhyay. “Bengali Named Entity Recognition using Support Vector Machine”. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pages 51-58, Hyderabad, India, January 2008.
[5] B. Sasidhar, P. M. Yohan, Dr. A. Vinaya Babu, Dr. A. Govardhan. “A Survey on Named Entity Recognition in Indian Languages with particular reference to Telugu”. IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 2, March 2011.
[6] Darvinder Kaur, Vishal Gupta. “A survey of Named Entity Recognition in English and other Indian Languages”. IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November 2010.
[7] Georgios Paliouras, Vangelis Karkaletsis, Georgios Petasis and Constantine D. Spyropoulos. “Learning Decision Trees for Named-Entity Recognition and Classification”.
[8] G.V.S. Raju, B. Srinivasu, Dr. S. Viswanadha Raju, K.S.M.V. Kumar. “Named Entity Recognition for Telugu Using Maximum Entropy Model”.
[9] Hideki Isozaki. “Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning”. Available at: http://acl.ldc.upenn.edu/acl2001/MAIN/ISOZAKI.PDF
[10] James Mayfield, Paul McNamee and Christine Piatko. “Named Entity Recognition using Hundreds of Thousands of Features”. Available at: http://acl.ldc.upenn.edu/W/W03/W03-0429.pdf
[11] Kamaldeep Kaur, Vishal Gupta. “Name Entity Recognition for Punjabi Language”. IRACST - International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN: 2249-9555, Vol. 2, No. 3, June 2012.
[12] Lawrence R. Rabiner. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”. In Proceedings of the IEEE, 77 (2), pp. 257-286, February 1989. Available at: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
[13] Padmaja Sharma, Utpal Sharma, Jugal Kalita. “Named Entity Recognition: A Survey for the Indian Languages”. LANGUAGE IN INDIA: Strength for Today and Bright Hope for Tomorrow, Volume 11: 5, May 2011, ISSN 1930-2940. Available at: http://www.languageinindia.com/may2011/v11i5may2011.pdf
[14] Praveen Kumar P and Ravi Kiran V. “A Hybrid Named Entity Recognition System for South Asian Languages”. Available at: http://www.aclweb.org/anthology-new/I/I08/I08-5012.pdf
[15] S. Pandian, K. A. Pavithra, and T. Geetha. “Hybrid Three-stage Named Entity Recognizer for Tamil”. INFOS2008, March, Cairo-Egypt. Available at: http://infos2008.fci.cu.edu.eg/infos/NLP_08_P045-052.pdf
[16] Shilpi Srivastava, Mukund Sanglikar & D.C Kothari. “Named Entity Recognition System for Hindi Language: A Hybrid Approach”. International Journal of Computational Linguistics (IJCL), Volume (2) : Issue (1) : 2011. Available at: http://cscjournals.org/csc/manuscript/Journals/IJCL/volume2/Issue1/IJCL-19.pdf
[17] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra. “Gazetteer Preparation for Named Entity Recognition in Indian Languages”.
[18] Sujan Kumar Saha, Sanjay Chatterji, Sandipan Dandapat. “A Hybrid Approach for Named Entity Recognition in Indian Languages”.
[19] S. Biswas, M. K. Mishra, Sitanath_biswas, S. Acharya, S. Mohanty. “A Two Stage Language Independent Named Entity Recognition for Indian Languages”. (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 1 (4), 2010, 285-289.
[20] Vishal Gupta, Gurpreet Singh Lehal. “Named Entity Recognition for Punjabi Language Text Summarization”. International Journal of Computer Applications (0975 – 8887), Vol. 33, No. 3, Nov. 2011.
Authors

Deepti Chopra received her B.Tech degree in Computer Science and Engineering from Rajasthan College of Engineering for Women, Jaipur, Rajasthan in 2011. Currently she is pursuing her M.Tech degree in Computer Science and Engineering from Banasthali University, Rajasthan. Her research interests include Artificial Intelligence, Natural Language Processing, and Information Retrieval.

Nusrat Jahan received her B.Tech degree in Computer Science and Engineering from R.N. Modi Engineering College, Kota, Rajasthan in 2010. Currently she is pursuing her M.Tech degree in Computer Science and Engineering from Banasthali University, Rajasthan. Her research interests include Artificial Intelligence, Natural Language Processing, and Information Retrieval.

Sudha Morwal is an active researcher in the field of Natural Language Processing. She is currently working as Associate Professor in the Department of Computer Science at Banasthali University (Rajasthan), India. She has done M.Tech (Computer Science), NET and M.Sc (Computer Science), and her PhD is in progress at Banasthali University (Rajasthan), India.