Relation Extraction from Biological Text
Dialekti Valsamou, Claire Nedellec and the Bibliome Team @ MIG, INRA
dialekti.valsamou@jouy.inra.fr, claire.nedellec@jouy.inra.fr
Introduction
Information Extraction is the extraction of
meaningful structured information from text. It
can be divided into three tasks: a) named entity
recognition (NER), b) anaphora resolution and c)
relation (or event) extraction (RE).
Relation Extraction is the problem of detecting
and classifying relations between entities in text.
Approaches range from simple pattern matching [3][1]
to more sophisticated ones.
Machine Learning seems to be indispensable
for RE, and existing methods employ
kernel-based algorithms [12][6], (logistic)
regression [7][9] or even neural networks [2].
The features used vary as well: sequences or
subsequences [5][4], syntactic parse trees [8], dependency
graphs [6], convolution trees [13] and shallow
parsing [12] are some important examples.
An example for Genic Interactions
In the example above, we are trying to detect a relation of type
’Interaction’. The simplest approach would be to use a bag of words; a
slightly more sophisticated one would look for k-subsequences.
We have adapted two more sophisticated approaches that use syntactic and
semantic information.
[Figure panels: Bag-of-words; Subsequences]
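The two baseline representations mentioned above can be sketched as follows. This is only an illustration: the example sentence, the function names and the choice of lower-casing are assumptions, not taken from the poster.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(tokens):
    """Count how often each lower-cased token occurs, ignoring word order."""
    return Counter(t.lower() for t in tokens)


def k_subsequences(tokens, k):
    """Count all order-preserving, possibly non-contiguous subsequences of length k."""
    return Counter(combinations([t.lower() for t in tokens], k))


# Hypothetical sentence, used only for illustration.
sentence = "ProtA activates geneB transcription".split()
bow = bag_of_words(sentence)
subseqs = k_subsequences(sentence, 2)
```

The bag of words discards order entirely, whereas the 2-subsequences keep the pair ("prota", "geneb") but not its reverse, so they retain some information about which argument comes first.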
Performance on the LLL Corpus
The LLL corpus [10] provides a good benchmark for relation extraction methods. The topic is genic
interaction, just like the examples. We evaluated the two approaches presented here and obtained encouraging results.
The tables below report the F-measure (10-fold cross-validation).
String Kernel
                       Linguistic Annotation
Sem. Classes   none          auto          manual
none           52.2 ± 3.1    64.4 ± 1.8    69.0 ± 2.3
manual         52.4 ± 3.7    68.4 ± 2.3    75.4 ± 2.6

Global Alignment Kernel
                       Linguistic Annotation
Sem. Classes   auto          manual
none           61.0 ± 4.1    77.0 ± 2.4
manual         59.4 ± 5.4    79.1 ± 2.8
Dependency Graphs
Using the parsing information, we can build
a dependency graph for the sentence that contains
the candidate arguments.
The dependency graph for our example:
Goal: discover a connection between the two
arguments, i.e. a path in this graph that connects the
corresponding nodes.
⇒ Use the shortest path in the dependency graph,
treated as we would treat a sequence
Dependency Graphs Kernel Learning
Features: The shortest path between the arguments
in the dependency graph of each sentence.
Algorithm: a Support Vector Machine or any other
kernel method
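A minimal sketch of the feature extraction step, assuming the dependency graph is given as a list of (head, dependent) pairs and that edge direction can be ignored when searching for the path. The edges and the function name are hypothetical; a real parser would also supply dependency labels.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over an undirected view of the dependency graph."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # the two arguments are not connected


# Hypothetical (head, dependent) edges for
# "ProtA activates the transcription of geneB".
edges = [
    ("activates", "ProtA"),
    ("activates", "transcription"),
    ("transcription", "the"),
    ("transcription", "of"),
    ("of", "geneB"),
]
path = shortest_path(edges, "ProtA", "geneB")
```

The resulting list of words can then be fed to any sequence kernel in place of the full sentence; the determiner "the" is off the path and is dropped.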
Global Alignment Kernel
Idea: use the “edit distance” of two sentences as a
kernel function. How?
⇒ Find the global alignment between them:
Similarity score: the optimal alignment score
given a substitution function and a gap penalty.
Substitution cost: minimum (zero) if the elements
belong to the same semantic class (e.g. activate and
control), medium if they share the same POS tag
and high otherwise.
Gap penalty: empirically, lower values
produce better results.
Algorithm: a Support Vector Machine or any other
kernel method
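The alignment score can be sketched with the standard dynamic programme for global alignment (Needleman-Wunsch style), here minimising cost. The numeric costs, the gap value, the zero cost for identical words, and the exp(-gamma * distance) conversion to a similarity are all assumptions made for illustration; the poster does not give these details, and such a similarity is not guaranteed to be positive semi-definite.

```python
import math


def substitution_cost(a, b, medium=1.0, high=2.0):
    """Cost of aligning two tokens given as (word, pos, sem_class) triples."""
    if a[0] == b[0]:
        return 0.0  # assumption: identical words cost nothing
    if a[2] is not None and a[2] == b[2]:
        return 0.0  # same semantic class
    if a[1] == b[1]:
        return medium  # same POS tag
    return high


def alignment_distance(x, y, gap=1.0):
    """Minimal total cost of a global alignment between sequences x and y."""
    n, m = len(x), len(y)
    # d[i][j] holds the minimal cost of aligning x[:i] with y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * gap
    for j in range(1, m + 1):
        d[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + substitution_cost(x[i - 1], y[j - 1]),
                d[i - 1][j] + gap,  # gap in y
                d[i][j - 1] + gap,  # gap in x
            )
    return d[n][m]


def alignment_similarity(x, y, gamma=0.5, gap=1.0):
    """Turn the distance into a similarity (assumed form, see lead-in)."""
    return math.exp(-gamma * alignment_distance(x, y, gap))


# Hypothetical annotated sequences.
a = [("ProtA", "NN", "protein"), ("activates", "VBZ", "interaction"),
     ("geneB", "NN", "gene")]
b = [("ProtC", "NN", "protein"), ("controls", "VBZ", "interaction"),
     ("the", "DT", None), ("geneD", "NN", "gene")]
dist = alignment_distance(a, b)
```

Here the only cost incurred is one gap for the unmatched determiner, since every other pair shares a semantic class.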
What Information to Use?
In recent years the linguistic analysis tools at our
disposal have become more and more efficient,
allowing us to obtain better results by using deeper
analysis. We call the information that we add to
the original text data an annotation; it can be
obtained either manually or, ideally, automatically.
The levels are:
• Lexical (with possible lemmatisation), e.g.
bag-of-words, word n-grams, etc.
• Morpho-syntactic, e.g. Part-of-Speech (POS)
tagging
• Parsing, e.g. dependency or constituency
graphs (paths, trees, etc.)
• Semantic, e.g. the use of semantic classes
In both of the algorithms presented in this poster,
performance improved considerably when using
syntactic and/or semantic information. This was made
possible by the AlvisNLP pipeline developed by
our lab.
Distributed Semantics: Unsupervised learning
of semantically close words from entire document
collections in order to form classes.
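One common way to find semantically close words without supervision is to compare the contexts in which they occur. The sketch below is an assumption about how this could be done (co-occurrence counts within a small window, compared by cosine similarity); it is not the method used to build the semantic classes in this work, and the toy corpus is hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each word to a count of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
        math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus.
corpus = [
    "ProtA activates geneB".split(),
    "ProtA controls geneB".split(),
    "ProtC activates geneD".split(),
    "ProtC controls geneD".split(),
]
vecs = context_vectors(corpus)
sim_verbs = cosine(vecs["activates"], vecs["controls"])
sim_mixed = cosine(vecs["activates"], vecs["geneB"])
```

Words whose similarity exceeds some threshold could then be grouped into one class; in this toy corpus "activates" and "controls" share all their contexts, while "activates" and "geneB" share none.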
String Kernel Comparison
Precision/Recall graph for a
simple string kernel (bag of
words) and the shortest path
on dependency graphs version.
References
[1] E. Agichtein and L. Gravano. Snowball: Extracting relations from
large plain-text collections. 2000.
[2] T. Barnickel, J. Weston, R. Collobert, H. Mewes, and V. Stümpflen.
Large scale application of neural network based semantic role labeling
for automated relation extraction from biomedical texts. 2009.
[3] S. Brin. Extracting patterns and relations from the world wide web.
1999.
[4] R. Bunescu and R. Mooney. Subsequence kernels for relation
extraction. 2006.
[5] A. Culotta, A. McCallum, and J. Betz. Integrating probabilistic
extraction models and data mining to discover relations and patterns in
text. 2006.
[6] A. Culotta and J. Sorensen. Dependency tree kernels for relation
extraction. 2004.
[7] N. Kambhatla. Combining lexical, syntactic, and semantic features
with maximum entropy models for extracting relations. 2004.
[8] Y. Liu, Z. Shi, and A. Sarkar. Exploiting rich syntactic information
for relation extraction from biomedical articles. 2007.
[9] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for
relation extraction without labeled data. 2009.
[10] C. Nédellec. Learning language in logic - genic interaction extraction
challenge. 2005.
[11] S. Riedel, L. Yao, and A. McCallum. Collective cross-document
relation extraction without labelled data. 2010.
[12] D. Zelenko, C. Aone, and A. Richardella. Kernel methods for relation
extraction. 2003.
[13] M. Zhang, J. Zhang, and J. Su. Exploring syntactic features for
relation extraction using a convolution tree kernel. 2006.
Future Work
This work is continuously being improved in all
aspects (linguistic parsing, semantic classes, etc.). We
are also focusing on the fact that, when dealing with
data of a biological nature,
• it is hard to engage experts in the tedious and
time-consuming task of manual annotation
• but there exists an abundance of databases
Distant supervision: Project structured relation
data onto text documents in order to produce
positive and negative examples [11].
⇒ Pre-annotate examples for the experts to
confirm, creating larger datasets that allow for
generalization.
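The projection step can be sketched as a lookup: every pair of known entities that co-occurs in a sentence is labelled positive if the database lists a relation between them and negative otherwise. The entity names, the database contents and the decision to ignore the direction of the relation are assumptions made for illustration.

```python
from itertools import combinations


def distant_labels(sentences, known_pairs, entities):
    """Pre-annotate co-occurring entity pairs using a set of known relations."""
    known = {frozenset(pair) for pair in known_pairs}  # direction ignored
    examples = []
    for tokens in sentences:
        mentioned = [t for t in tokens if t in entities]
        for a, b in combinations(mentioned, 2):
            label = frozenset((a, b)) in known
            examples.append((tokens, a, b, label))
    return examples


# Hypothetical database content and sentences.
known_pairs = {("ProtA", "geneB")}
entities = {"ProtA", "geneB", "geneD"}
sentences = [
    "ProtA activates geneB".split(),
    "ProtA and geneD were studied".split(),
]
examples = distant_labels(sentences, known_pairs, entities)
labels = [label for _, _, _, label in examples]
```

Labels produced this way are noisy, since a sentence can mention two related entities without stating their relation, which is why the examples are passed to experts for confirmation rather than used as is.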
