Novel Scoring System for Identify Accurate Answers for Factoid Questions

International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
Harpreet Kaur (1), Rimpi Kumari (2)
(1) Swami Vivekanand Institute of Engineering and Technology, Banur, Punjab, India
Abstract: Question Answering Systems (QAS) address one of the many challenges of natural language understanding and interfaces. In
this paper we develop a new mathematical scoring model that works on five types of questions. The features of the question text are
first extracted, and a score is computed from its structure with respect to its template structure; the answer score is then calculated
against the question as well as the paragraph. A named entity recognizer and a part-of-speech tagger are applied to each of these words to encode
the necessary information. The text is then processed to finally reach the index of the most probable answer with respect to the question. An
entropy algorithm is used to find the exact answer.
Keywords: Natural language processing, Question answering system, Information retrieval.
1. Introduction
Question answering (QA) systems look for the answer to a
natural-language question in a large collection of documents.
QA systems first select relevant text passages, and the answer
is then extracted from these passages according to criteria
derived from the question analysis. NLP focuses on
communication between computers and natural languages, in
terms of both theoretical results and practical applications,
and on information sharing: information is now exchanged as
never before, and sharing information has become the leading
theme in the domain of NLP systems [2][3]. Automatic
question answering systems support this goal. A question
answering system consists of three distinct phases: question
classification, information retrieval (or document
processing), and answer extraction.
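As a rough illustration of how the three phases fit together, the following Python sketch chains a toy question classifier, a passage retriever and an answer extractor. It is not the system described in this paper: classifying a question by its first question word and ranking passages and sentences by plain word overlap are simplifying assumptions, and all function names (`classify_question`, `retrieve_passage`, `extract_answer`, `answer`) are hypothetical.

```python
import re

# Question words used for the (assumed) first-word classification rule.
QUESTION_WORDS = ("what", "when", "where", "which", "who", "whom", "whose", "why", "how")


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens made of letters and digits only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_question(question: str) -> str:
    """Phase 1: question classification by the first question word found."""
    for token in tokenize(question):
        if token in QUESTION_WORDS:
            return token
    return "other"


def retrieve_passage(question: str, passages: list[str]) -> str:
    """Phase 2: pick the passage sharing the most distinct words with the question."""
    q_tokens = set(tokenize(question))
    return max(passages, key=lambda p: len(q_tokens & set(tokenize(p))))


def extract_answer(question: str, passage: str) -> str:
    """Phase 3: return the passage sentence with the highest word overlap."""
    q_tokens = set(tokenize(question))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    return max(sentences, key=lambda s: len(q_tokens & set(tokenize(s))))


def answer(question: str, passages: list[str]) -> tuple[str, str]:
    """Run the three phases in order and return (question class, answer sentence)."""
    qclass = classify_question(question)
    passage = retrieve_passage(question, passages)
    return qclass, extract_answer(question, passage)
```

With the hypothetical two-passage corpus `["The old library was founded in 1890. It holds rare maps.", "The river flows through the valley. It floods every spring."]`, calling `answer("When was the old library founded?", passages)` returns `("when", "The old library was founded in 1890.")`. Word overlap is only a stand-in; a real extractor would also use the expected answer type.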
The design of a standard QA system assumes that the
question and the text collection to be processed are in the
same language. Research on English QA systems attempts to
deal with a wide range of question types such as WHEN,
WHERE, WHAT, HOW, WHOM, WHY and WHOSE. The aim
of a QA system is to locate the exact answer to a question
in a structured or unstructured collection of texts: QA
systems allow the user to ask questions in natural language
and obtain an exact answer.
In this work, we study the important issues in the field of
Question Answering (QA) systems and examine the
internals of many established QA systems. We consider not
only simple questions but also text problems consisting of
several sentences. Our approach to translating the natural
language question uses an underlying corpus and a
knowledge base to derive meaningful and relevant patterns,
which can then be used to process the questions and capture
their meaning with respect to the underlying knowledge
base. We classify the text based on its subject, verb, object
and preposition to determine the possible types of
questions to be generated. The ability of a QA system to
recognize a large number of answer types is related to its
power to extract correct answers [5] [6] [8].
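Recognizing answer types usually starts by mapping the question word to an expected answer type and discarding candidates of the wrong type. The Python sketch below assumes a hypothetical mapping (`EXPECTED_ANSWER_TYPE`) with labels loosely modelled on common named-entity categories, and assumes each candidate answer already carries an entity type from some named entity recognizer; neither the labels nor the helper names come from this paper.

```python
import re

# Hypothetical question-word -> expected-answer-type table.
# "how" is mapped to MANNER for simplicity; "how many" would really expect a quantity.
EXPECTED_ANSWER_TYPE = {
    "when": "DATE",
    "where": "LOCATION",
    "who": "PERSON",
    "whom": "PERSON",
    "whose": "PERSON",
    "what": "THING",
    "why": "REASON",
    "how": "MANNER",
}


def expected_answer_type(question: str) -> str:
    """Return the answer type implied by the first question word, or UNKNOWN."""
    for token in re.findall(r"[a-z]+", question.lower()):
        if token in EXPECTED_ANSWER_TYPE:
            return EXPECTED_ANSWER_TYPE[token]
    return "UNKNOWN"


def filter_candidates(question: str, candidates: list[tuple[str, str]]) -> list[str]:
    """Keep candidate texts whose (pre-computed) entity type matches the expected type."""
    wanted = expected_answer_type(question)
    return [text for text, entity_type in candidates if entity_type == wanted]
```

For instance, `filter_candidates("Where does the river flow?", [("1890", "DATE"), ("the valley", "LOCATION")])` keeps only `"the valley"`. The more answer types such a table covers, the more candidates can be filtered correctly, which is the point made above.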
2. Previous Work
A survey of different QA techniques has been presented, and
question answering systems for Indian languages such as
Hindi, Telugu, Bengali and Punjabi have been discussed. For
Hindi, QA research attempts to deal with a wide range of
question types such as when, where, what time and how
many [1][3]. The Hindi Question-Answering system uses the
Hindi Shallow Parser, which analyses a sentence in terms of
morphological analysis, POS tagging, chunking, etc.
Bengali is one of the Indo-Aryan languages of South Asia,
with over 200 million native speakers. For Bengali question
answering, a translation based on transliteration and a table
look-up method has been proposed as an interface to the
actual QA task. The implementation thus involves
transliterating a Bangla question into an equivalent Latin-
alphabet (English) version that could be used in an actual QA
task [2]. The Bangla lexicon contains a good number of
"loan-words" from Arabic, Persian, English and other
languages. An approach to transform the Bangla question
could be:
• Tokenizing the transliterated version of the Bangla
question,
• Translating the remaining question by a simple table
look-up method.
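The two steps above can be sketched in Python as tokenization followed by a dictionary look-up. The tiny romanized-Bangla table (`LOOKUP_TABLE`) and the function name are hypothetical and cover only a few question words; the actual lexicon and transliteration scheme of [2] are not reproduced here.

```python
# Toy romanized-Bangla -> English table (hypothetical, for illustration only).
LOOKUP_TABLE = {
    "ke": "who",
    "ki": "what",
    "kobe": "when",
    "kothay": "where",
    "keno": "why",
}


def translate_transliterated(question: str) -> str:
    """Tokenize a transliterated question, then translate it token by token."""
    # Step 1: tokenize, keeping the question mark as its own token.
    tokens = question.lower().replace("?", " ?").split()
    # Step 2: table look-up. Tokens missing from the table, such as English
    # loan-words like "doctor", are passed through unchanged.
    return " ".join(LOOKUP_TABLE.get(tok, tok) for tok in tokens)
```

For example, `translate_transliterated("doctor kothay?")` returns `"doctor where ?"`. Note that a pure table look-up keeps the Bangla word order; any reordering into English syntax would need an extra step.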
3. Methodology
First, we collect a corpus of data (paragraphs) from an
encyclopaedia, from which the questions are made and their
exact answers found, as shown in Fig. 1. The corpus is of two
types: questions and paragraphs. The questions are of several
types: what, when, where/which and who/whose/whom. Using
these question types, we make questions from the paragraph;
the next steps are paragraph chunking and question scoring.
A chunk paragraph is a format of writing which forces the
writer to expand on ideas and explain arguments.
Paper ID: 15091303 294

Figure 1: Flow of question answering
This format also helps in developing writing skills, and the
scores are calculated on the basis of the accuracy of the
answers. After that, the candidate (user) enters a query or
question together with an answer, and the similarity score is
calculated; this loop continues until the best answer is found.
4. Results
4.1 Mean Precision Percentage Values
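Precision is the fraction of the answers retrieved by the
question and answer system that are relevant, i.e. the number
of relevant retrieved answers divided by the total number of
retrieved answers. Mathematically, it is represented as:
$$\text{Precision} = \frac{|\{\text{relevant answers}\} \cap \{\text{retrieved answers}\}|}{|\{\text{retrieved answers}\}|}$$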
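Treating the retrieved and relevant answers as sets, the precision formula can be computed directly. This Python sketch returns a percentage to match Table 1; the function name and the set-of-answer-identifiers representation are assumptions for illustration.

```python
def precision(retrieved, relevant) -> float:
    """Percentage of retrieved answers that are also relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        # Nothing retrieved: define precision as 0 to avoid division by zero.
        return 0.0
    return 100.0 * len(retrieved & relevant) / len(retrieved)
```

For example, if the system retrieves answers `{"a1", "a2", "a3", "a4"}` and the relevant set is `{"a1", "a2", "a3", "a5"}`, three of the four retrieved answers are relevant, so `precision(...)` gives `75.0`.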
The worst-case, average and best-case precision, recall and F-score of each question type are shown in Table 1.
Table 1: Average precision, recall and F-score of each question type (in percentage; worst, average and best case, rounded to two decimal places)

S. No  Question  Precision               Recall                  F-Score
       Type      Worst   Average  Best   Worst   Average  Best   Worst   Average  Best
1      What      70.60   82.60    85.60  47.88   59.88    62.88  29.28   32.28    35.28
2      When      70.57   82.57    85.57  51.57   63.57    66.57  30.46   33.46    36.46
3      Why       68.11   80.11    83.11  46.20   58.20    61.20  28.27   31.27    34.27
4      Who       67.86   79.86    82.86  42.30   54.30    57.30  26.91   29.91    32.91
5      Where     63.70   75.70    78.70  49.20   61.20    64.20  28.37   31.37    34.37
We have also considered the worst-case scenario when
analysing the working of the system for each factoid question
type. We found that when 'what'-type questions are explored
and searched on the input paragraph, the system typically
finds 7 questions correctly in the worst case and 9 questions
correctly in the best case; the situation is similar for the
other factoid types [4] [7].
The graph in Fig. 2 shows the mean precision for each
factoid question type. It reflects how well the system finds
information relevant to the question in order to select the
best answer from the set of candidate answers it predicts.
Scores are calculated on the basis of answer accuracy; a
further level of precision can be added by finding more
common artifacts between the question tokens and the
answer tokens. The results would be more precise if more
common verbs, nouns, adjectives, adverbs and pronouns were
matched in both token sets, and if patterns were matched
using regular expressions.
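The two refinements suggested above, matching more shared content words and adding regular-expression patterns, can be sketched as a combined score. Because this sketch uses no POS tagger, it approximates "verbs, nouns, adjectives, adverbs, pronouns" by removing a small hypothetical stop-word list; the patterns in `ANSWER_PATTERNS` and the `pattern_bonus` weight are likewise assumptions, not values from this paper.

```python
import re

# Small hypothetical stop-word list; a fuller system might instead keep only
# nouns, verbs, adjectives, adverbs and pronouns identified by a POS tagger.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "was", "to", "and", "on", "at", "by"}

# Hypothetical answer patterns keyed by question word.
ANSWER_PATTERNS = {
    "when": re.compile(r"\b1[0-9]{3}\b|\b20[0-9]{2}\b"),  # four-digit years 1000-2099
    "how": re.compile(r"\b\d+\b"),  # any number, for "how many" / "how much"
}


def content_tokens(text: str) -> set[str]:
    """Lower-case tokens with stop words removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}


def answer_score(question: str, candidate: str, pattern_bonus: float = 1.0) -> float:
    """Shared content words, plus a bonus if the candidate matches the expected pattern."""
    score = float(len(content_tokens(question) & content_tokens(candidate)))
    words = re.findall(r"[a-z]+", question.lower())
    qword = next((w for w in words if w in ANSWER_PATTERNS), None)
    if qword is not None and ANSWER_PATTERNS[qword].search(candidate):
        score += pattern_bonus
    return score
```

For the question `"When was the bridge opened?"`, the candidate `"The bridge was opened in 1932."` scores 3.0 (two shared words plus the year bonus), while `"The bridge was opened by the mayor."` scores only 2.0, so the pattern breaks the tie in favour of the candidate that actually contains a date.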
Figure 2: Average mean precision of each question type (in percentage)
4.2 Mean Recall Percentage Values
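Recall is the fraction of the relevant answers that are
retrieved by the question and answer system, i.e. the number
of relevant retrieved answers divided by the total number of
relevant answers that should have been retrieved.
Mathematically, it is represented as:
$$\text{Recall} = \frac{|\{\text{relevant answers}\} \cap \{\text{retrieved answers}\}|}{|\{\text{relevant answers}\}|}$$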
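Recall uses the same intersection as precision but divides by the number of relevant answers instead of the number retrieved. A minimal Python sketch, again assuming answers are represented as sets of identifiers and returning a percentage (the function name is hypothetical):

```python
def recall(retrieved, relevant) -> float:
    """Percentage of relevant answers that were actually retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        # No relevant answers exist: define recall as 0 to avoid division by zero.
        return 0.0
    return 100.0 * len(retrieved & relevant) / len(relevant)
```

If the system retrieves `{"a1", "a2"}` while five answers `{"a1", ..., "a5"}` are relevant, recall is `40.0`, even though precision for the same sets would be 100% (both retrieved answers are relevant). This is why the two measures are reported together.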
The recall percentage for each question type is shown in
Fig. 3. The answer found by the question answering system
can be more or less thorough than the actual answer,
depending on the dataset provided. The number of possible
answers for a query depends on the evaluator and the ground
truth, and the answer expected by the evaluator may differ
depending on the depth of search. As a result, recall is
reasonably high, for the obvious reason that precision is high.
Because of this high recall, the answers found from the
paragraphs are quite similar in number and type, which makes
it difficult to discriminate one set of answer tokens from
another similar set of answer tokens.
Figure 3: Average mean recall of each question type (in percentage)
4.3 F-measure Percentage Values
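The F-measure can be calculated only when the precision and
recall of the system are known. It is the harmonic mean of
precision and recall. Mathematically, it is represented as:
$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$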
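The harmonic-mean formula translates directly into code. This Python sketch (hypothetical function name) takes precision and recall in the same units, for example percentages, and returns the balanced F-measure in those units.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Both measures are zero: define F as 0 to avoid division by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f_measure(75.0, 60.0)` is about 66.67, slightly below the arithmetic mean of 67.5: the harmonic mean is pulled towards the smaller of the two values, so a system cannot score well on F by excelling at only one of precision or recall.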
Figure 4: Average F-score of each question type (in percentage)
5. Conclusion
Through this work, we tried to learn the important issues in
the field of Question Answering (QA) systems. We have
included all the question types considered in this paper. The
scoring system can be used to improve a question answering
system by checking all returned answers; however, it cannot
be used alone to select the correct answer. Question
answering systems have become an important component of
online education platforms. Based on our research findings,
we have proposed a basic framework for a QA task in
English [9]. The goal of a question answering system is to
retrieve answers to questions, rather than full documents or
best-matching passages as most information retrieval systems
do.
6. Future Scope
In this research paper, we have included five factoid question
types: what, when, why, who/whom and where. We used the
dataset and evaluated the performance of our system using
recall and precision. Future work includes adding more
question types and improving the implementation [10] [11].
We hope to carry these ideas forward and develop additional
mechanisms for question generation and answer finding
based on the dependency features of the answers.
References
[1] Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin and
Andrew Ng, "Web Question Answering: Is More Always
Better?"
[2] Nafid Haque and Mike Rosner, "A prototype framework
for a Bangla question answering system using translation
based on transliteration and table look-up as an interface
for the medical domain," University of Malta; Gertjan
Van Noord, University of Groningen.
[3] Ashish Kumar Saxena, Ganesh Viswanath Sambhu, L.
Venkata Subramaniam and Saroj Kaushik, "IITD-IBMIRL
System for Question Answering using Pattern Matching,
Semantic Type and Semantic Category Recognition,"
Oct. 2007.
[4] Boris Katz and Jimmy Lin, "Selectively Using Relations
to Improve Precision in Question Answering," MIT
Artificial Intelligence Laboratory, 200 Technology
Square, Cambridge, MA 02139.
[5] Arnaud Grappy and Brigitte Grau, "Answer type
validation in question answering systems," Le Centre de
Hautes Etudes Internationales d'Informatique
Documentaire, Paris, France, 2010.
[6] S. M. Harabagiu, M. A. Paşca and S. J. Maiorano,
"Experiments with open-domain textual question
answering," in Proceedings of the 18th Conference on
Computational Linguistics, Morristown, NJ, USA:
Association for Computational Linguistics, 2000.
[7] Matthew W. Bilotti and Eric Nyberg, "Improving Text
Retrieval Precision and Answer Accuracy in Question
Answering Systems," Language Technologies Institute,
Carnegie Mellon University, 5000 Forbes Avenue,
Pittsburgh, PA 15213, USA.
[8] E. Hovy, L. Gerber, U. Hermjakob, C.-Y. Lin and D.
Ravichandran, "Toward semantics-based answer
pinpointing," in HLT '01: Proceedings of the First
International Conference on Human Language
Technology Research, Morristown, NJ, USA: Association
for Computational Linguistics, 2001.
[9] Vanitha Guda, Suresh Kumar Sanampudi and I. Lakshmi
Manikyamba, "Approaches for Question Answering
Systems," International Journal of Engineering Science
and Technology (IJEST), ISSN 0975-5462, Vol. 3, 2011,
pp. 990-995.
[10] C. Pinchak and D. Lin, "A Probabilistic Answer Type
Model," in Proceedings of the 11th Conference of the
European Chapter of the Association for Computational
Linguistics, 2006, pp. 393-400.
[11] S. Quarteroni and S. Manandhar, "Designing an
Interactive Open-Domain Question Answering System,"
Journal of Natural Language Engineering, 1, pp. 1-23.
[12] X. Li and D. Roth, "Learning Question Classifiers," in
Proceedings of the 19th International Conference on
Computational Linguistics, Morristown, NJ, USA:
Association for Computational Linguistics, 2002,
pp. 1-7.
Author Profile
Harpreet Kaur is currently pursuing an M.Tech in
Computer Science and Engineering at Swami
Vivekanand Institute of Engineering & Technology,
Banur, Punjab. She holds a B.Tech in Computer Science
and Technology from Baba Banda Singh Bahadur
Engineering and Technology, Fatehgarh Sahib, Punjab.
Er. Rimpi is currently working as an Assistant Professor
in the Computer Science and Engineering Department at
Swami Vivekanand Institute of Engineering and
Technology, Banur. She completed her M.Tech in
Computer Engineering at Guru Nanak Dev University, Amritsar,
Punjab, in 2011. She holds a B.Tech in Computer
Science and Technology from Guru Nanak Dev University,
Amritsar, Punjab (2009).