International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1953
LOCAL AND GLOBAL LEARNING METHOD FOR QUESTION ANSWERING
APPROACH
RADHIKA S M1, SYAMA R2
1SCT college of engineering, papanamcode, Trivandrum
2Assistant Professor, Dept. of computer science and engineering, Sct college of engineering,
Papanamcode, Trivandrum
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - A vocabulary gap between health seekers and health
care experts is prevalent in the health care domain.
Different health seekers describe their questions in different
ways, and answers provided by the experts may contain
non-standardised terminologies. To overcome the vocabulary gap,
a scheme is used that combines two approaches, namely
local mining and global learning. Local mining extracts
medical concepts from medical records and then maps them to
normalised terminologies based on a standardised dictionary.
Local mining suffers from the problem of missing key concepts.
Global learning overcomes this issue by finding
the missing key concepts.
Key Words: Local mining, Global learning, SNOMED-CT,
UMLS.
1. INTRODUCTION
Patients increasingly seek online information about their health,
and question answering services connect patients with doctors
worldwide. Doctors are able to
interact with many patients about a particular issue and
provide instant, trusted answers to complex and
sophisticated problems. Previously, an external dictionary was
used to relate medical data, which was not
sufficient. Here we incorporate corpus-aware
terminology, which relates natural language
medical data to medical terminology and thereby narrows
the gap between health seekers and health providers. For
example, a heart attack can also be described as a myocardial
disorder. A tri-stage framework is used to accomplish the
task.
_ Noun phrase extraction
_ Medical concept detection
_ Medical concept normalization
Because local mining loses information, a global learning approach is used
to complement it. Medical sites are
among the most popular internet sites today, and through them
people can learn more about their health
conditions. The practice of medicine is experiencing a shift
from patients who passively accept their doctors' orders to
patients who actively look for online information
about their health, because doctors are very busy with
many patients and cannot give a detailed explanation
of each health issue to every patient. This is the
reason why health seekers commonly use online medical
sites. Most medical sites, such as Mayo Clinic and Medscape,
are consumer oriented and provide sound advice about
general medical topics. The vocabulary used is readily
comprehensible when health seekers search for more
detailed information about a very specific topic. A
tremendous number of records has accumulated in
their repositories, so in most circumstances users can
directly locate good answers by searching rather than
waiting for experts to answer. However, users with diverse
backgrounds do not necessarily share the same vocabulary, and the
same question may be written in different native languages,
which is difficult for other health seekers to understand. To
bridge this vocabulary gap, corpus-aware terminology is used.
2. RELATED WORKS
A. An Automated System for Conversion of Clinical Notes into
SNOMED Clinical Terminology
SNOMED-CT consists of many medical concepts and
relationships. The system identifies the medical concepts in the text and is
mainly used for medical data retrieval. It
comprises three modules, namely an augmented lexicon, a
term compositor and a negation detector. The augmented lexicon
traces the words that appear in the text and identifies the
concepts that are also in SNOMED-CT. SNOMED-CT
descriptions are first broken into atomic words. The UMLS
SPECIALIST Lexicon performs the normalisation, which
involves the removal of stop words. A token
matching algorithm then identifies the SNOMED-CT
descriptions in the text and retrieves the related
descriptions from the data structure. A matching matrix is
used to identify the sequences. Negation identification
finds the negative terms. A number of negative terms are
present in clinical text, and SNOMED-CT also contains numerous
negation terms. For each such negative term in the text there is a
mapping in SNOMED-CT, performed based on
the SNOMED-CT concept id. To detect other negative
concepts, a simple rule-based negation identifier is used.
It identifies negation terms of the forms "negation phrase
(SNOMED CT phrase)*" and "(SNOMED CT phrase)* negation
phrase" [2]. SNOMED-CT also contains many qualifier values.
A qualifier modifies a medical concept. Qualifier words are
separated from the augmented lexicon during concept matching.
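
The lexicon lookup and token matching described above can be illustrated with a minimal sketch. It assumes a tiny hypothetical lexicon (the ids C001 and C002 are placeholders, not real SNOMED-CT identifiers) and an illustrative stop-word list, and it ignores word order, whereas the system summarised here uses a matching matrix to identify sequences.

```python
import re

# Hypothetical stop words and lexicon; the ids are placeholders,
# not real SNOMED-CT concept identifiers.
STOP_WORDS = {"of", "the", "a", "an", "in", "and"}
LEXICON = {
    "C001": "fracture of the femur",
    "C002": "chest pain",
}

def atomic_words(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def match_concepts(sentence):
    """Return ids of descriptions whose atomic words all occur in the sentence."""
    tokens = set(atomic_words(sentence))
    return sorted(
        cid for cid, description in LEXICON.items()
        if set(atomic_words(description)) <= tokens
    )

matches = match_concepts("Patient has a fracture in the left femur and chest pain.")
```

Here both placeholder concepts are matched, because every atomic word of each description appears in the sentence once stop words are removed.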
B. Meeting Medical Terminology Needs: The Ontology-Enhanced
Medical Concept Mapper
The Concept Mapper maps semantically related terms to
the query and helps users access online medical information.
The system integrates the AZ Noun Phraser, the Unified Medical
Language System (UMLS), WordNet and Concept Space. The AZ
Noun Phraser extracts the noun phrases from free text.
WordNet is an online accessible ontology that consists of sets of
synonyms. For example, in WordNet, "injection" has three
senses: "injection as the forceful insertion of a substance
under pressure (no synonyms), injection as any solution
that is injected (as into the skin) (synonym: injectant), and
injection as the act of putting a liquid into the body by means
of a syringe (synonym: shot)" [3]. The UMLS comprises the
Metathesaurus, the Semantic Net, the SPECIALIST Lexicon
and the online Knowledge Sources. The Metathesaurus is used
for synonyms. The SPECIALIST Lexicon is integrated with the AZ
Noun Phraser. The DSP algorithm is used with the Semantic Net.
The Concept Mapper provides users with synonyms for the
query. The system works in three phases. In the first phase, the AZ Noun Phraser
extracts the medical concepts from the query. In the second
phase, synonyms are obtained from WordNet and the
Metathesaurus based on the query. In the third phase,
related terms are obtained based on Concept Space and the
Semantic Net.
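
The second and third phases can be pictured as a query expansion step. The following is only a sketch under stated assumptions: the two dictionaries are hypothetical stand-ins for WordNet/Metathesaurus synonyms and for Concept Space/Semantic Net related terms, the phrases are assumed to have been extracted already, and the related term "syringe" is purely illustrative.

```python
# Hypothetical stand-in for synonym sources (WordNet, Metathesaurus).
SYNONYMS = {"injection": ["injectant", "shot"]}
# Hypothetical stand-in for related-term sources (Concept Space, Semantic Net).
RELATED = {"injection": ["syringe"]}

def expand_query(phrases):
    """Attach synonyms and related terms to each extracted phrase."""
    expanded = {}
    for phrase in phrases:
        key = phrase.lower()
        expanded[phrase] = {
            "synonyms": SYNONYMS.get(key, []),
            "related": RELATED.get(key, []),
        }
    return expanded

result = expand_query(["injection", "fever"])
```

Phrases that are missing from both tables, such as "fever" here, simply come back with empty lists.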
C. Medical coding classification by leveraging inter-code
relationships
Medical coding converts the information in patient
medical records into codes. ICD-9 is used to code medical
records. A code is assigned when a patient is
provided with a service, and a code is also assigned after
discharge. A multi-label large-margin classifier is
used to learn from the structure among the codes.
D. Fast tagging of medical terms in legal text
Medical terms occur in news, medical and legal text. This
work describes a method that tags medical terms by finding the
longest sequences of words that form medical terms. These
word sequences are then converted into medical term hash keys, which
are used to find the associated concept ids. The
system uses a probabilistic term classifier. A probabilistic
classifier predicts, given a sample
input, a probability distribution over a set of classes rather
than only the most likely class that the sample
should belong to. Probabilistic classifiers thus provide a
classification with a degree of certainty.
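
A minimal sketch of longest-match tagging with hashed term keys follows. It assumes a small hypothetical term table with placeholder concept ids and a maximum term length of three words. The probabilistic classifier is omitted, so this shows only the dictionary lookup part.

```python
def term_key(words):
    """Normalise a word sequence into a lookup key (hashed by the dict)."""
    return " ".join(w.lower() for w in words)

# Hypothetical term table: normalised term -> placeholder concept id.
TERMS = {
    term_key(["heart"]): "C100",
    term_key(["heart", "attack"]): "C101",
    term_key(["diabetes"]): "C200",
}
MAX_LEN = 3  # assumed longest term length, in words

def tag_terms(text):
    """Scan left to right, preferring the longest term starting at each word."""
    words = text.split()
    tags, i = [], 0
    while i < len(words):
        # Try the longest window first so "heart attack" wins over "heart".
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            concept_id = TERMS.get(term_key(words[i:i + n]))
            if concept_id:
                tags.append((" ".join(words[i:i + n]), concept_id))
                i += n
                break
        else:
            i += 1  # no term starts here; move on by one word
    return tags

tags = tag_terms("He suffered a heart attack and has diabetes")
```

Python's dict is a hash table, so the normalised string plays the role of the hash key here. A production system would likely also handle punctuation and inflection.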
E. A joint local-global approach for medical terminology
assignment
In community-generated health services such as HealthTap and
WebMD, there is a vocabulary gap between health seekers
and experts due to the non-standardised terminologies used
by the experts in their answers. A joint local and global
learning approach is used to label question-answer pairs.
Local mining labels each question-answer pair by extracting
medical concepts, but it suffers from missing key
concepts. Global learning enhances local mining.
F. Exploiting medical hierarchies for concept-based
information retrieval
Keyword-based techniques sometimes fail to identify
medical terms. The concept-based method overcomes this
problem. In this method, the text
documents are turned into concepts as arranged in
SNOMED-CT, and a 'bag of concepts' representation of documents
is used. Terms are converted to concepts using a natural
language processing tool.
G. Domain-Specific term extraction and its application in text
classification
Domain-specific terms are identified using a statistical
method. Entropy impurity, a popular impurity measure, is used to
measure the word
distribution across domains. A normalisation step is added
to identify the more specific domain terms. The impurity is
computed over N samples. If the impurity is
low, the samples belong mostly to one category. If the impurity
reaches its maximum value, the categories are equally
represented among the samples.
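
As a worked illustration, entropy impurity over a set of samples is commonly written as i = -sum_j P(w_j) log2 P(w_j), where P(w_j) is the fraction of samples in category w_j. The sketch below assumes this standard definition and uses made-up category labels. It does not include the normalisation step mentioned above.

```python
import math
from collections import Counter

def entropy_impurity(labels):
    """Entropy impurity of a list of category labels, in bits."""
    total = len(labels)
    counts = Counter(labels)
    # Sum of -p * log2(p) over the observed category proportions.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# All samples in one category: the impurity is zero.
pure = entropy_impurity(["medical"] * 4)
# Four categories equally represented: the impurity is log2(4) = 2 bits.
mixed = entropy_impurity(["medical", "sports", "legal", "news"])
```

A term whose occurrences are concentrated in one domain therefore has a low impurity, while a term spread evenly across domains reaches the maximum.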
H. A Simple Algorithm for Identifying Negated Findings and
Diseases in Discharge Summaries
Narrative medical reports contain negated terms. The
NegEx algorithm is used to identify them.
NegEx uses a set of negation phrases, filters out sentences
containing phrases that only appear to be negations, and limits
the scope of the
negation phrases. Indexed clinical findings and diseases are
given as input to the algorithm, and the output is the negated phrases
among those findings. First, the algorithm preprocesses the
sentences. Then the relevant phrases are indexed and the
negation algorithm is applied to them. Finally, the
negation phrases detected by the algorithm are compared with the
negations identified by physicians.
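
The idea of a negation trigger with a limited scope can be sketched as follows. This is a simplified illustration, not the NegEx implementation: the trigger lists are short and illustrative, and the five-token window between trigger and finding is an assumption made for this example.

```python
import re

# Illustrative trigger phrases; real negation lexicons are much larger.
PRE_NEGATION = ["no evidence of", "denies", "without", "no"]
POST_NEGATION = ["is ruled out", "was ruled out"]
WINDOW = 5  # assumed maximum number of tokens between trigger and finding

def is_negated(sentence, finding):
    """Return True if the finding lies within WINDOW tokens of a trigger."""
    s = sentence.lower()
    f = re.escape(finding.lower())
    # Up to WINDOW intervening tokens, then the separator before the next item.
    gap = r"(?:\W+\w+){0,%d}?\W+" % WINDOW
    for trigger in PRE_NEGATION:
        if re.search(r"\b" + re.escape(trigger) + gap + f + r"\b", s):
            return True
    for trigger in POST_NEGATION:
        if re.search(r"\b" + f + gap + re.escape(trigger) + r"\b", s):
            return True
    return False

negated = is_negated("The patient denies chest pain.", "chest pain")
affirmed = is_negated("Chest pain on exertion.", "chest pain")
```

Limiting the window keeps a trigger early in a long sentence from negating a finding that appears much later.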
2. METHODOLOGY
A. LOCAL MINING
Local mining involves a three-stage framework. The first stage
is noun phrase extraction, in which the noun phrases are
extracted. In the second stage, the medical concepts are
detected using concept entropy impurity (CEI). CEI also
measures the specificity of a concept in the particular
domain. Finally, normalisation maps the
medical concepts to an authenticated vocabulary.
Fig -1: Local mining
1) NOUN PHRASE EXTRACTION:
Natural language processing is an emerging field in the area
of text mining. As text is an unstructured source of
information, it is usually transformed
into a structured format to make it a suitable input to an
automatic method of information extraction. Part-of-speech
tagging is one of the
preprocessing steps. Part-of-speech tagging (POS tagging, or
tagging for short), also
called grammatical tagging or word-category
disambiguation, is the process
of assigning a part-of-speech marker to each word in an
input text. Because tags are generally also applied to
punctuation, tokenization is usually performed before, or as
part of, the tagging process: separating commas, quotation
marks, etc., from words and disambiguating end-of-sentence
punctuation (period, question mark, etc.) from part-of-word
punctuation (such as in abbreviations like e.g. and etc.).
Noun phrase extraction takes the part-of-speech tags into
account. In this process, stop words are removed
because they carry little meaning. To extract the
noun phrases, part-of-speech tags are assigned by the Stanford POS
tagger to every word of the medical record given by the user,
and the sequences of words that match a
fixed pattern are pulled out. The input to a tagging algorithm is a string of
words and a specified tagset. The output is a single best tag
for each word. A noun phrase should contain zero or
more adjectives or nouns, followed by an optional group of a
noun and a preposition, followed again by zero or
more adjectives or nouns, followed by a single noun. A
sequence of tags that matches this
pattern makes up a noun phrase. While there are many lists of parts of speech, most
modern language processing on English uses the 45-tag Penn
Treebank tagset.
Fig -2: Part-of-speech tagging
Fig -3: Penn Treebank tagset
Parts of speech are generally represented by placing the tag
after each word, delimited by a slash. For example, the sentence
"Take that book." is tagged using the Penn Treebank tagset as
Take/VB that/DT book/NN.
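
The noun phrase pattern described in this subsection can be sketched as a regular expression over tag classes. The example assumes the text has already been tagged (for instance by a POS tagger such as the one named above) and is supplied as (word, tag) pairs. The single-letter tag classes and the helper names are choices made for this illustration.

```python
import re

def tag_class(tag):
    """Collapse a Penn Treebank tag into a one-letter class."""
    if tag.startswith("JJ"):
        return "A"  # adjective
    if tag.startswith("NN"):
        return "N"  # noun
    if tag == "IN":
        return "P"  # preposition
    return "O"      # anything else

# (adjective|noun)* (noun preposition)? (adjective|noun)* noun
NP_PATTERN = re.compile(r"[AN]*(?:NP)?[AN]*N")

def noun_phrases(tagged):
    """Return the word sequences whose tag classes match the pattern."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in NP_PATTERN.finditer(classes)
    ]

tagged = [("severe", "JJ"), ("pain", "NN"), ("in", "IN"),
          ("lower", "JJ"), ("back", "NN"), ("persists", "VBZ")]
phrases = noun_phrases(tagged)
```

Mapping each tag to one character lets an ordinary regular expression do the sequence matching, so the phrase boundaries can be read directly from the match offsets.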
2) MEDICAL CONCEPT DETECTION:
Medical concept detection detects the medical concepts and
differentiates them from other phrases. Concept entropy
impurity (CEI) is used to analyse the specificity of a medical
concept. A larger CEI value indicates that the concept is more
relevant to the domain.
3) MEDICAL CONCEPT NORMALISATION:
Medical concepts may not be standard terminologies, so it is
necessary to normalise them based on an
authenticated vocabulary. For example, "birth control"
is not a standard terminology, so it is
mapped to "contraception". Authenticated vocabularies include ICD,
SNOMED-CT and UMLS. SNOMED-CT provides the core general
terminologies. Local mining suffers from incompleteness due
to missing key concepts. A second problem is lower
precision, which is due to irrelevant medical concepts in the
records.
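
Normalisation can be pictured as a lookup from detected concepts to preferred terms. The sketch below assumes a tiny hypothetical vocabulary excerpt. A real system would consult an authenticated vocabulary such as SNOMED-CT rather than a hand-written table.

```python
# Hypothetical vocabulary excerpt: preferred term -> known lay variants.
VOCABULARY = {
    "contraception": ["birth control"],
    "myocardial infarction": ["heart attack"],
}

# Invert the table so that every variant points to its preferred term.
VARIANT_TO_PREFERRED = {}
for preferred, variants in VOCABULARY.items():
    VARIANT_TO_PREFERRED[preferred] = preferred
    for variant in variants:
        VARIANT_TO_PREFERRED[variant] = preferred

def normalise(concept):
    """Return the preferred term, or None if the concept is unknown."""
    key = " ".join(concept.lower().split())  # fold case and extra spaces
    return VARIANT_TO_PREFERRED.get(key)

normalised = [normalise(c) for c in ["Birth control", "contraception", "sore elbow"]]
```

Concepts that return None are the ones local mining cannot label on its own, which is the kind of incompleteness noted above.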
B. GLOBAL LEARNING
Global learning is a novel approach built to
enhance the results of local mining.
1) RELATIONSHIP IDENTIFICATION: Inter-terminology
and inter-expert relationships are analysed from the medical
records.
2) INTER-TERMINOLOGY RELATIONSHIP: Terminologies
in SNOMED-CT are arranged in hierarchies. For example,
viral pneumonia is-a infectious pneumonia is-a pneumonia
is-a lung disease. Terminologies may also have multiple
parents. For example, infectious pneumonia is also a child of
infectious disease [1]. This hierarchical representation
improves the coding.
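
A small sketch of how such a hierarchy with multiple parents might be traversed is given below. The parent table contains only the example terms from this subsection, and the function name is hypothetical.

```python
# is-a links taken from the example above: child -> list of parents.
PARENTS = {
    "viral pneumonia": ["infectious pneumonia"],
    "infectious pneumonia": ["pneumonia", "infectious disease"],
    "pneumonia": ["lung disease"],
}

def ancestors(term):
    """Collect every term reachable through is-a links, over all parents."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

result = ancestors("viral pneumonia")
```

Because a term may have several parents, the structure is a directed acyclic graph rather than a tree, which is why the traversal tracks the terms it has already seen.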
3) INTER-EXPERT RELATIONSHIP: The historical
data of experts is analysed to check whether the experts work in the
same or related areas. The Jaccard coefficient is used to measure
the relationship between experts.
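
The Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch follows, assuming each expert is represented by the set of terminologies appearing in their historical answers. That representation and the sample data are assumptions made for this illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

# Hypothetical terminology sets drawn from two experts' past answers.
expert_1 = {"pneumonia", "asthma", "bronchitis"}
expert_2 = {"asthma", "bronchitis", "lung disease"}
similarity = jaccard(expert_1, expert_2)
```

The two experts share two of four distinct terminologies, giving a coefficient of 0.5. Values closer to 1 suggest experts working in the same or a related area.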
Fig -4: medical terminology assignment scheme
3. CONCLUSION
The proposed approach combines
local mining and global learning, and uses
corpus-aware terminology to support
communication between medical support seekers and
medical care providers. Local mining comprises
stemming, noun phrase extraction,
spell checking, medical concept detection and normalisation.
Global learning maps the query against the indexed
documents or keywords that are relevant to the medical records.
The query is mapped against the local database, and the
output is produced based on the
patient's query.
REFERENCES
[1] L. Nie, Y.-L. Zhao, M. Akbari, J. Shen, and T.-S. Chua,
Bridging the vocabulary gap between health seekers and
healthcare knowledge, IEEE Trans. Knowl. Data Eng.,
vol. 27, no. 2, Feb. 2015.
[2] G. Leroy and H. Chen, Meeting medical terminology
needs-the ontology-enhanced medical concept mapper, IEEE
Trans. Inf. Technol. Biomed., vol. 5, no. 4, pp. 261-270, Dec.
2001.
[3] Y. Yan, G. Fung, J. G. Dy, and R. Rosales, Medical coding
classification by leveraging inter-code relationships, in Proc.
ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2012,
pp. 193-202.
[4] S. V. Pakhomov, J. D. Buntrock, and C. G. Chute,
Automating the assignment of diagnosis codes to patient
encounters using example-based and machine learning
techniques, J. Amer. Med. Inf. Assoc., vol. 13, no. 5, pp.
516-525, 2006.
[5] C. Dozier, R. Kondadadi, K. Al-Kofahi, M. Chaudhary, and
X. Guo, Fast tagging of medical terms in legal text, in Proc.
Int. Conf. Artif. Intell. Law, 2007, pp. 253-260.
[6] M.-Y. Kim and R. Goebel, Detection and normalization of
medical terms using domain-specific term frequency and
adaptive ranking, in Proc. IEEE Int. Conf. Inf. Technol. Appl.
Biomed., 2010, pp. 1-5.
[7] S. Hina, E. Atwell, and O. Johnson, Semantic tagging of
medical narratives with top level concepts from SNOMED CT
healthcare data standard, Int. J. Intell. Comput. Res., vol. 2,
pp. 204-210, 2010.
[8] H. Stenzhorn, E. Pacheco, P. Nohama, and S. Schulz,
Automatic mapping of clinical documentation to SNOMED
CT, Studies Health Technol. Inform., vol. 158, pp. 228-232,
2009.
[9] Y. Chen, Z. Chenqing, and K.-Y. Su, A joint model to
identify and align bilingual named entities, Comput.
Linguistics, vol. 39, no. 2, pp. 229-266, 2013.
[10] L. Nie, M. Akbari, T. Li, and T.-S. Chua, A joint local-
global approach for medical terminology assignment, in
Proc. Int. ACM SIGIR Conf., 2014.

More Related Content

PPTX
Aiding the Aid: Computational Early Clinical Diagnosis of Electronic Health R...
PDF
IRJET- A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
PDF
[IJET-V2I3P19] Authors: Priyanka Sharma
PDF
Nlp based retrieval of medical information for diagnosis of human diseases
PDF
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
PDF
Performance analysis on secured data method in natural language steganography
PPTX
Data Mining in Rediology reports
PDF
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
Aiding the Aid: Computational Early Clinical Diagnosis of Electronic Health R...
IRJET- A Privacy Leakage Upper Bound Constraint-Based Approach for Cost-E...
[IJET-V2I3P19] Authors: Priyanka Sharma
Nlp based retrieval of medical information for diagnosis of human diseases
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
Performance analysis on secured data method in natural language steganography
Data Mining in Rediology reports
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS

What's hot (20)

PDF
AUTOMATED PHARMACY
PDF
Controlling informative features for improved accuracy and faster predictions...
PDF
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
PDF
A SURVEY PAPER ON EXTRACTION OF OPINION WORD AND OPINION TARGET FROM ONLINE R...
PDF
Ijarcet vol-3-issue-1-9-11
PDF
Phenoflow 2021
PPTX
2020.04.07 automated molecular design and the bradshaw platform webinar
PDF
Automatic Speech Recognition and Machine Learning for Robotic Arm in Surgery
PDF
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
PDF
Analysis of Opinionated Text for Opinion Mining
PDF
Drug Discovery and Development Using AI
PDF
V34132136
PDF
IRJET - Analysis of Paraphrase Detection using NLP Techniques
PDF
Chen2018.mac missing tag iceberg queries for multi category rfid system
PDF
Visualizing stemming techniques on online news articles text analytics
PDF
Document Classification Using Expectation Maximization with Semi Supervised L...
DOCX
PDF
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
PPTX
Knowledge-driven Implicit Information Extraction
PDF
Classification of Health Forum Messages using Deep Learning
AUTOMATED PHARMACY
Controlling informative features for improved accuracy and faster predictions...
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...
A SURVEY PAPER ON EXTRACTION OF OPINION WORD AND OPINION TARGET FROM ONLINE R...
Ijarcet vol-3-issue-1-9-11
Phenoflow 2021
2020.04.07 automated molecular design and the bradshaw platform webinar
Automatic Speech Recognition and Machine Learning for Robotic Arm in Surgery
A Novel Technique for Name Identification from Homeopathy Diagnosis Discussio...
Analysis of Opinionated Text for Opinion Mining
Drug Discovery and Development Using AI
V34132136
IRJET - Analysis of Paraphrase Detection using NLP Techniques
Chen2018.mac missing tag iceberg queries for multi category rfid system
Visualizing stemming techniques on online news articles text analytics
Document Classification Using Expectation Maximization with Semi Supervised L...
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...
Knowledge-driven Implicit Information Extraction
Classification of Health Forum Messages using Deep Learning
Ad

Similar to Local and Global Learning Method for Question Answering Approach (20)

PDF
Fundamentals of snomed ct
PDF
MIXHS12-Zhe
PDF
Understanding snomed ct
PPTX
Knowledge Discovery And Data Mining Of Free Text Final
PPSX
A Cognitive-Based Semantic Approach to Deep Content Analysis in Search Engines
PDF
Tutoriel ssmt
PDF
Instant Answering By Machine Learning Approach
PPTX
SNOMED CT-A technologist's Perspective
PPTX
Mining Electronic Health Records for Insights
PDF
Natural Language Processing for biomedical text mining - Thierry Hamon
PPTX
Search Tips
PDF
Automated handling CT using Snomed DL
PDF
Ontology oriented concept based clustering
PDF
Ontology oriented concept based clustering
PPTX
I know just what you mean - Ontologies and their uses
PPTX
Information extraction from EHR
PPTX
clear explnation about snomed ct , and different between snomed ct and icd 10
PPTX
0 An Introduction To Snomed Ct1
PDF
Turning Data into Knowledge - Semantic Technologies in Healthcare
PPTX
Mi224 snomed santiago & herber 022715
Fundamentals of snomed ct
MIXHS12-Zhe
Understanding snomed ct
Knowledge Discovery And Data Mining Of Free Text Final
A Cognitive-Based Semantic Approach to Deep Content Analysis in Search Engines
Tutoriel ssmt
Instant Answering By Machine Learning Approach
SNOMED CT-A technologist's Perspective
Mining Electronic Health Records for Insights
Natural Language Processing for biomedical text mining - Thierry Hamon
Search Tips
Automated handling CT using Snomed DL
Ontology oriented concept based clustering
Ontology oriented concept based clustering
I know just what you mean - Ontologies and their uses
Information extraction from EHR
clear explnation about snomed ct , and different between snomed ct and icd 10
0 An Introduction To Snomed Ct1
Turning Data into Knowledge - Semantic Technologies in Healthcare
Mi224 snomed santiago & herber 022715
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PDF
Well-logging-methods_new................
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Construction Project Organization Group 2.pptx
PPTX
Sustainable Sites - Green Building Construction
PPTX
Lecture Notes Electrical Wiring System Components
PDF
PPT on Performance Review to get promotions
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
Welding lecture in detail for understanding
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPT
Project quality management in manufacturing
DOCX
573137875-Attendance-Management-System-original
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Well-logging-methods_new................
bas. eng. economics group 4 presentation 1.pptx
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Construction Project Organization Group 2.pptx
Sustainable Sites - Green Building Construction
Lecture Notes Electrical Wiring System Components
PPT on Performance Review to get promotions
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
CYBER-CRIMES AND SECURITY A guide to understanding
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
UNIT 4 Total Quality Management .pptx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Welding lecture in detail for understanding
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Project quality management in manufacturing
573137875-Attendance-Management-System-original
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Mitigating Risks through Effective Management for Enhancing Organizational Pe...

Local and Global Learning Method for Question Answering Approach

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1953 LOCAL AND GLOBAL LEARNING METHOD FOR QUESTION ANSWERING APPROACH RADHIKA S M1, SYAMA R2 1SCT college of engineering, papanamcode, Trivandrum 2Assistant Professor, Dept. of computer science and engineering, Sct college of engineering, Papanamcode, Trivandrum ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Vocabulary gap between healthseekersandhealth care experts are more prevalent in health care domain. Different health seekers describe their questions in different ways and answers provided by the experts may contain non standardised terminologies. To overcome the vocabularygap, a new scheme is used which combines two approachesnamely local mining and global learning. Local mining extracts medical concepts from medical records and then map them to normalised terminologies based on standardized dictionary. Local mining suffer from problem of missing key concepts. Global learning overcome the issues in local mining by finding the missing key concepts. Key Words: Local mining, Global learning, SNOMED-CT, UMLS. 1.INTRODUCTION Patients seeking online information about their health, connecting patients with doctors world wide to know about their health via question and answering. Doctors able to interact with many patients about particular issue and provides instant trusted answers for complex and sophisticated problems. Previously external dictionary is used to relate medical data which was not that much sufficient enough. Here we incorporate corpus aware terminology which is used to relate the natural language medical data with medical terminology this narrow down the path between health seekers and health providers. 
For example: heart attack can also be said as myocardial disorder. A tri-stage framework is used to accomplish the task. _ Noun phase extraction _ Medical concept detection _ Medical concept normalization Due to loss of information global learning approach is used to complement local mining approach. Medical sites are among the most popular internet sites today through which people can get more knowledge about their health conditions. The practice of medicine is experiencing a shift from patients who passively accept their doctors orders to patients who actively took online information to know briefly about their health becausedoctors arevery busywith many patients and hence they cannot give brief description about their health issue to each and every patient. Thisisthe reasons why health seekers normally use online medical sites. Most of the medical sitessuchasmayoclinic,Medscape are consumer oriented and providetheirsoundadviceabout general medical topics. The vocabulary used is readily comprehensive when health seekers search for more detailed information about a very specific topic. Due to tremendous number of records have been accumulated in their repositories and in most circumstances user may directly locate good answers by searching rather than waiting for experts to answer. However users with diverse background do not necessary share same vocabulary, the same question may be written in different native languages which is difficult for other health seekers to understand to bridge vocabulary gap corpus aware terminology is used. 2. RELATED WORKS A. An Automated System for Conversion of Clinical Noteinto SNOMED Clinical Terminology SNOMED-CT consists of many medical concepts and relationships. Identifies the medical concepts inthetext.It is mainly used for medical data retreival. System mainly comprises of three modules namely Augmented lexicon, term compositor and negation detector. 
Augmented lexicon traces the words that appears in the text and identifies the concepts that are also in the SNOMED-CT. SNOMED-CT descriptions are then made into atomic words. UMLS Specialist lexicon performs the normalisation. Normalisation involves the removal of stop words. A token matching algorithm is used. It identifies the SNOMED-CT description in the text and also retreives the related descriptions from the data structure. Matching matrix is used to identify the sequences. Negation identification is to identify the negative terms. Number of negative terms are present in clinical text . SNOMED-CTalsocontainsnumerous negation terms. For each negative term in the text there is a mapping in the SNOMED-CT. Mappingisperformed based on the SNOMED-CT concept id. To detect other negative concepts a simple rule based negation identifier is used. Identifies negation terms of the form negation phrase (SNOMED CT phrase)* (SNOMED CT phrase)* negation phrase [2] SNOMED-CT consists of many qualifier values. Qualifier modifies the medical concepts. Qualifier words are separated from augmentedlexiconduringconceptmatching.
B. Meeting Medical Terminology Needs: The Ontology-Enhanced Medical Concept Mapper
The Concept Mapper maps semantically related terms to the query and helps users to access online medical information. The system integrates the AZ noun phraser, the Unified Medical Language System (UMLS), WordNet, and Concept Space. The AZ noun phraser extracts the noun phrases from free text. WordNet is an online accessible ontology and consists of sets of synonyms. For example, in WordNet, "injection" has three senses: "injection as the forceful insertion of a substance under pressure (no synonyms), injection as any solution that is injected (as into the skin) (synonym: injectant), and injection as the act of putting a liquid into the body by means of a syringe (synonym: shot)" [3]. The UMLS comprises the Metathesaurus, the Semantic Net, the SPECIALIST Lexicon, and the online Knowledge Sources. The Metathesaurus is used for synonyms. The SPECIALIST Lexicon is integrated with the AZ noun phraser. The DSP algorithm is used with the Semantic Net.

The Concept Mapper provides users with synonyms for the query. The system consists of three phases. In the first phase, the AZ noun phraser extracts the medical concepts from the query. In the second phase, synonyms are obtained from WordNet and the Metathesaurus based on the query. In the third phase, related terms are obtained based on Concept Space and the Semantic Net.

C. Medical coding classification by leveraging inter-code relationships
Medical coding converts the information in patient medical records into codes. ICD-9 is used to code medical records. A code is assigned to a patient when the patient is provided with a service, and a code is also assigned after discharge. A multilabel large margin classifier is used to learn the code structure.

D.
Fast tagging of medical terms in legal text
Medical terms occur in news, medical text, legal text, and so on. This work presents a method that tags the medical terms and finds the longest sequence of words containing the medical terms. These sequences of words are then converted into medical term hash keys, and the concept id associated with each hash key is looked up. The system uses a probabilistic term classifier. A probabilistic classifier is a classifier that is able to predict, given a sample input, a probability distribution over a set of classes, rather than only outputting the most likely class that the sample should belong to. Probabilistic classifiers therefore provide a classification with a degree of certainty.

E. A joint local-global approach for medical terminology assignment
In community-generated health services such as HealthTap and WebMD, there is a vocabulary gap between health seekers and experts due to the non-standardized terminologies used by the experts in their answers. A joint local and global learning approach is used to label question-answer pairs. Local mining labels a question-answer pair by extracting medical concepts, but it suffers from missing key concepts. Global learning enhances local mining.

F. Exploiting medical hierarchies for concept-based information retrieval
Keyword-based techniques sometimes do not identify the medical terms. The concept-based method overcomes this problem of keyword-based methods. Text documents are turned into concepts as arranged in SNOMED-CT, and a 'bag of concepts' representation of documents is used. Terms are converted to concepts using a natural language processing tool.

G. Domain-specific term extraction and its application in text classification
Domain-specific terms are identified using a statistical method. Entropy impurity is used to measure the word distribution in the domain. A normalisation process is added to identify the more specific domain terms.
The impurity is computed over a set of N samples. A low impurity value indicates that the samples belong to one category. The impurity reaches its maximum value when the categories are equally represented among the samples. A popular measure of impurity is entropy impurity.

H. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries
Narrative medical reports contain negative terms. An algorithm called NegEx is used to identify them. NegEx uses a set of phrases to filter out the sentences that contain negative terms, and it also limits the scope of the negation phrases. Indexed clinical findings and diseases are given as input to the algorithm, and the output is the set of negated phrases among the findings. First, the algorithm preprocesses the sentences. Then the relevant phrases are indexed and the negation algorithm is applied to them. Finally, the negations detected by the algorithm are compared with the negations identified by physicians.

2. METHODOLOGY
A. LOCAL MINING
Local mining involves a three-stage framework. The first stage is noun phrase extraction, in which the noun phrases are extracted. In the second stage, the medical concepts are detected using concept entropy impurity (CEI). CEI also measures the specificity of a concept in the particular domain. Finally, normalisation is performed, which normalizes the medical concepts based on an authenticated vocabulary.
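The behaviour of entropy impurity described in the related work (a low value when the samples belong to one category, a maximum when the categories are equally represented) can be checked with a short sketch. It uses the standard definition, the sum over categories of -p log2 p. The exact CEI formula of the approach is not reproduced here; the function name and the example domain labels are assumptions for illustration.

```python
import math
from collections import Counter


def entropy_impurity(labels):
    """Standard entropy impurity of a list of category labels.

    Computes sum_j p_j * log2(1 / p_j), which equals -sum_j p_j * log2(p_j).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical example: the domains of the documents in which a phrase occurs.
only_medical = ["medical", "medical", "medical", "medical"]
spread_evenly = ["medical", "sports", "finance", "travel"]

low = entropy_impurity(only_medical)    # all occurrences in one category
high = entropy_impurity(spread_evenly)  # occurrences spread equally
```

For k equally represented categories the value is log2(k), so the maximum depends on how many categories are considered.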
Fig -1: Local mining

1) NOUN PHRASE EXTRACTION:
Natural language processing is an upcoming field in the area of text mining. As text is an unstructured source of information, it is usually transformed into a structured format to make it a suitable input to an automatic method of information extraction. Part-of-speech tagging is one of the preprocessing steps, and it performs grammatical analysis by assigning a part of speech to each word. Part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging or word-category disambiguation, is the process of assigning a part-of-speech marker to each word in an input text. Because tags are generally also applied to punctuation, tokenization is usually performed before, or as part of, the tagging process: separating commas, quotation marks, etc., from words and disambiguating end-of-sentence punctuation (period, question mark, etc.) from part-of-word punctuation (such as in abbreviations like e.g. and etc.).

Noun phrase extraction takes the part-of-speech types into account. In this process, many unwanted words are filtered out because they carry no meaning of interest. To extract the noun phrases, part-of-speech tags are assigned by the Stanford POS tagger to every word of the medical record given by the user. The sequences of words that match a fixed pattern are then pulled out. The input to a tagging algorithm is a string of words and a specified tagset. The output is a single best tag for each word.
A noun phrase should contain zero or more adjectives or nouns, followed by an optional group of a noun and a preposition, followed again by zero or more adjectives or nouns, followed by a single noun. To make up a noun phrase, sequences of tags are matched against this pattern. While there are many lists of parts of speech, most modern language processing on English uses the 45-tag Penn Treebank tagset.

Fig -2: Part of speech tagging
Fig -3: Penn Treebank tagset

Parts of speech are generally represented by placing the tag after each word, delimited by a slash. For example, "Take that Book" is tagged as Take/VB that/DT Book/NN using the Penn Treebank tagset.

2) MEDICAL CONCEPT DETECTION:
Medical concept detection detects the medical concepts and differentiates them from other phrases. Concept entropy impurity is used to analyse the specificity of a medical concept. A larger CEI value indicates that the concept is more relevant to the domain.

3) MEDICAL CONCEPT NORMALISATION:
Medical concepts may not be standard terminologies, so it is necessary to normalise the concepts based on an authenticated vocabulary. Consider "birth control" as an example: it is not a standard terminology, so it is necessary to map it to "contraception". Authenticated vocabularies include ICD, SNOMED-CT, and UMLS. SNOMED-CT provides the core general terminologies.

Local mining suffers from incompleteness due to missing key concepts. A second problem is lower precision, which is caused by irrelevant medical concepts in the records.
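The noun phrase pattern from the noun phrase extraction step can be sketched as a regular expression over tag classes. The sketch assumes the words have already been tagged with Penn Treebank tags (for example by the Stanford POS tagger, which is not called here). The one-letter encoding and the names tag_symbol and extract_noun_phrases are illustrative choices, not part of the described system.

```python
import re


def tag_symbol(tag):
    """Collapse a Penn Treebank tag into a one-letter class for matching."""
    if tag.startswith("JJ"):
        return "A"  # adjective: JJ, JJR, JJS
    if tag.startswith("NN"):
        return "N"  # noun: NN, NNS, NNP, NNPS
    if tag == "IN":
        return "P"  # preposition (IN also covers subordinating conjunctions)
    return "O"      # any other tag


# (adjective|noun)* (noun preposition)? (adjective|noun)* noun
NP_PATTERN = re.compile(r"[AN]*(?:NP)?[AN]*N")


def extract_noun_phrases(tagged_words):
    """Return the word sequences whose tag sequence matches the pattern."""
    symbols = "".join(tag_symbol(tag) for _, tag in tagged_words)
    phrases = []
    for match in NP_PATTERN.finditer(symbols):
        words = [word for word, _ in tagged_words[match.start():match.end()]]
        phrases.append(" ".join(words))
    return phrases


tagged = [("severe", "JJ"), ("pain", "NN"), ("in", "IN"),
          ("the", "DT"), ("lower", "JJ"), ("back", "NN")]
phrases = extract_noun_phrases(tagged)
```

Because each word maps to exactly one symbol, match positions in the symbol string line up with word positions, which keeps the mapping back to the original words trivial.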
B. GLOBAL LEARNING
An enhanced and novel approach of global learning is built to improve the result of local coding.

1) RELATIONSHIP IDENTIFICATION:
Inter-terminology and inter-expert relationships are analysed from the medical records.

2) INTER-TERMINOLOGY RELATIONSHIP:
Terminologies in SNOMED-CT are arranged in hierarchies. For example, viral pneumonia is-a infectious pneumonia is-a pneumonia is-a lung disease. Terminologies may also have multiple parents. For example, infectious pneumonia is also a child of infectious disease [1]. This hierarchical representation improves the coding.

3) INTER-EXPERT RELATIONSHIP:
This step analyses the historical data of experts and checks whether the experts are in the same or a related area. The Jaccard coefficient is used to analyse the relationship between experts.

Fig -4: Medical terminology assignment scheme

3. CONCLUSION
The proposed approach combines local mining and global learning, and uses corpus-aware terminology to support communication between medical support seekers and medical care providers. Local mining involves stemming, noun phrase extraction, spell checking, medical concept detection, and normalization. Global learning maps the query against the indexed documents or keywords that are relevant to the medical records. The query is mapped against the local database, and the output returned to the health seeker is produced based on the patient's query.
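As an illustration of the two relationships used in global learning, the sketch below walks the is-a hierarchy from the pneumonia example and computes a Jaccard coefficient between two experts. Representing each expert by the set of terminologies found in his or her historical answers is an assumption made here, as are the names IS_A, ancestors, and jaccard and the two example experts.

```python
# is-a edges taken from the example in the text; a term may have several parents.
IS_A = {
    "viral pneumonia": ["infectious pneumonia"],
    "infectious pneumonia": ["pneumonia", "infectious disease"],
    "pneumonia": ["lung disease"],
}


def ancestors(term, is_a=IS_A):
    """Return every terminology reachable from term through is-a edges."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b):
    """Jaccard coefficient |A intersect B| / |A union B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical experts, each described by terminologies from past answers.
expert_1 = {"pneumonia", "lung disease", "asthma"}
expert_2 = {"lung disease", "asthma", "bronchitis"}
similarity = jaccard(expert_1, expert_2)
```

A coefficient close to 1 suggests that two experts work in the same or a related area, while a value of 0 means their historical terminologies do not overlap.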
REFERENCES
[1] Liqiang Nie, Yi-Liang Zhao, Mohammad Akbari, Jialie Shen, and Tat-Seng Chua, Bridging the Vocabulary Gap between Health Seekers and Healthcare Knowledge, IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 2, February 2015.
[2] G. Leroy and H. Chen, Meeting medical terminology needs-the ontology-enhanced medical concept mapper, IEEE Trans. Inf. Technol. Biomed., vol. 5, no. 4, pp. 261-270, Dec. 2001.
[3] Y. Yan, G. Fung, J. G. Dy, and R. Rosales, Medical coding classification by leveraging inter-code relationships, in Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2012, pp. 193-202.
[4] S. V. Pakhomov, J. D. Buntrock, and C. G. Chute, Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques, J. Amer. Med. Inf. Assoc., vol. 13, no. 5, pp. 516-525, 2006.
[5] C. Dozier, R. Kondadadi, K. Al-Kofahi, M. Chaudhary, and X. Guo, Fast tagging of medical terms in legal text, in Proc. Int. Conf. Artif. Intell. Law, 2007, pp. 253-260.
[6] M.-Y. Kim and R. Goebel, Detection and normalization of medical terms using domain-specific term frequency and adaptive ranking, in Proc. IEEE Int. Conf. Inf. Technol. Appl. Biomed., 2010, pp. 1-5.
[7] S. Hina, E. Atwell, and O. Johnson, Semantic tagging of medical narratives with top level concepts from SNOMED CT healthcare data standard, Int. J. Intell. Comput. Res., vol. 2, pp. 204-210, 2010.
[8] H. Stenzhorn, E. Pacheco, P. Nohama, and S. Schulz, Automatic mapping of clinical documentation to SNOMED CT, Studies Health Technol. Inform., vol. 158, pp. 228-232, 2009.
[9] Y. Chen, Z. Chenqing, and K.-Y. Su, A joint model to identify and align bilingual named entities, Comput. Linguistics, vol. 39, no. 2, pp. 229-266, 2013.
[10] L. Nie, M. Akbari, T. Li, and T.-S. Chua, A joint local-global approach for medical terminology assignment, in Proc. Int. ACM SIGIR Conf., 2014.