    Computer-aided summarization · Query user-defined expansion · Post-retrieval clustering · Extractive summarization




    Experiences on integrating explicit knowledge on
    information access tools in the medical domain

                                 Manuel de la Villa
                                 Department of Information Technologies
                                 University of Huelva

    Index

      Brief CV
        Why a research stay? In Wolverhampton?
        Teaching

      Integrating explicit knowledge on information access tools
        Knowledge sources (UMLS & Freebase)
        Automatic Text Summarization
        Information Retrieval




Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011

    Brief CV





    Teaching experience


     Software Engineering
        Process and Methodologies, Metrics,
         Requirements analysis, Design, …
        Software Engineering Lab (UML, NetBeans,
         Subversion, Java, JUnit, Persistence…)

     Multimedia applications development
        Adobe Director, Flash, Photoshop, Premiere
        Sony Sound Forge, Audacity




    Knowledge integration




    Specific Domain Knowledge source. UMLS (I)




 With UMLS: a homogeneous group of terminologies.
 Without it: a saturation of different terminologies (ICD-10, LOINC,
 SNOMED-CT, UK Clinical Terms, MeSH, DSM-IV, Gene Ontology, RxNorm, …).

 UMLS aims to overcome a significant barrier, the variety of
 ways the same concepts are expressed in different
 machine-readable sources.
    Specific Domain Knowledge source. UMLS (II)




    NLM project: Unified Medical Language System (UMLS)

        Aim: to develop tools that help researchers with knowledge
         representation, retrieval and integration of biomedical information.
           UMLS Knowledge Sources
           Software tools
    Three main components:

    SPECIALIST Lexicon: a compilation of lexical entries (>200,000) with grammatical
    information and linguistic variants.

   “Anaesthetic”
  {base=anesthetic
  spelling_variant=anaesthetic
  entry=E0330018 cat=noun
  variants=reg variants=uncount }

   “Anaesthetic”
  {base=anesthetic
  spelling_variant=anaesthetic
  entry=E0330019 cat=adj
  variants=inv position=attrib(3)
  position=pred stative }
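
As a rough illustration of how records like the two above could be read programmatically, the Python sketch below splits a brace-delimited entry into key/value fields. The function name `parse_lexical_record` is hypothetical, and the input follows the abbreviated layout shown on this slide, which may differ from the actual distribution format of the SPECIALIST Lexicon.

```python
from collections import defaultdict


def parse_lexical_record(text):
    """Parse a '{key=value ...}' record as laid out on the slide.

    Returns (fields, flags): fields maps each key to the list of its values
    (keys such as 'variants' can repeat); flags collects bare tokens that
    carry no '=' sign, such as 'stative'.
    """
    body = text.strip().strip("{}")
    fields = defaultdict(list)
    flags = []
    for token in body.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key].append(value)
        else:
            flags.append(token)
    return dict(fields), flags


if __name__ == "__main__":
    noun_record = """{base=anesthetic
    spelling_variant=anaesthetic
    entry=E0330018 cat=noun
    variants=reg variants=uncount }"""
    fields, flags = parse_lexical_record(noun_record)
    print(fields["cat"], fields["variants"], flags)
```

Keeping repeated keys as lists preserves both `variants` values of the noun entry and both `position` values of the adjective entry.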
    Specific Domain Knowledge source. UMLS (III)




     Metathesaurus: a very large, multi-purpose, multilingual
        vocabulary database (compiles more than 100 source
        vocabularies), https://uts.nlm.nih.gov/metathesaurus.html
     Every term (>5M) is associated with a concept (>1.5M), and terms are
        related to each other, e.g. as synonyms (16M relations)
     Each concept is assigned to one or more of the 135 existing
        semantic types

     Different terms… for the same concept… included in a semantic type
    Specific Domain Knowledge source. UMLS (IV)




    https://uts.nlm.nih.gov/semanticnetwork.html

    UMLS Semantic Network: an ontology with 135
       semantic types and 54 types of relationships
       between those types




    General Domain Knowledge Source: Freebase (I)


       Freebase is a large public database that collects three kinds of
       information:
        data;
        texts; and
        media,
       all of which reference entities or topics (≈ 12 million). An entity is a
       single unique person, place, or thing.
          A single concept or real-world thing.
          A topic (also called an entity, resource, element or thing) is the
            fundamental unit in Freebase.
          /common/topic
          Each topic has a GUID, or globally unique ID
             http://www.freebase.com/view/en/barack_obama
             http://www.freebase.com/guid/9202a8c04000641f800000000029c277
    General Domain Knowledge Source: Freebase (II)
     Freebase connects entities together as a graph:
       it defines its data structure as a set of nodes and a set
        of links that establish relationships between the
        nodes.
     Most topics are associated with one or more types (such as
      people, places, books, films, etc.) and may have additional
      properties, like "date of birth" for a person or latitude and
      longitude for a location. These types, properties and related
      concepts are called the Schema.
    General Domain Knowledge Source: Freebase (III)
  The Schema
  Schema (the way Freebase's data is laid out) is expressed through
  Types and Properties. Types are grouped together in Domains.
    General Domain Knowledge Source: Freebase (IV)
  The Schema: Medicine
    General Domain Knowledge Source: Freebase (V)
  How can we use it…

      As a reference or information source
      Create interesting Views and Visualizations and
       share them with others
      Embed Freebase data in your website
      Use the API or Acre, the hosted app development
       platform, to build apps that use Freebase data
      Download the data dumps
      Use Freebase's RDF for Semantic Web applications
    General Domain Knowledge Source: Freebase (VI)
  The Freebase approach
    MQL (Metaweb Query Language)
•  http://api.freebase.com/api/service/mqlread?query={"query":{"type":"/music/artist","name":"U2","album":[]}}
•  http://api.freebase.com/api/service/mqlread?query={"query":[{"type":"/medicine/disease","name":null,"symptoms":{"name":"Nausea"}}]}
•  Query Editor
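
The two request URLs above share one pattern: an MQL query is wrapped in a `{"query": ...}` envelope, serialised as JSON and passed in the `query` parameter of `mqlread`. The Python sketch below only builds such a URL with proper percent-encoding; the helper name `mqlread_url` is hypothetical, and no request is sent, since the Freebase service has been retired.

```python
import json
from urllib.parse import urlencode

# Endpoint as it appears on the slide.
MQLREAD = "http://api.freebase.com/api/service/mqlread"


def mqlread_url(query):
    """Wrap an MQL query in the {"query": ...} envelope and URL-encode it."""
    envelope = {"query": query}
    return MQLREAD + "?" + urlencode({"query": json.dumps(envelope)})


if __name__ == "__main__":
    # Diseases that list "Nausea" among their symptoms (second slide example).
    diseases_with_nausea = [{
        "type": "/medicine/disease",
        "name": None,                      # null in MQL means "fill this in"
        "symptoms": {"name": "Nausea"},
    }]
    print(mqlread_url(diseases_with_nausea))
```

In MQL, the query is a template: fields set to `null` or `[]` are the ones the service was expected to fill in, while fields with values act as constraints.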

    Knowledge integration





    Experiences in Automatic summarization (I)

 We developed a proposal with these main
 characteristics:
             Sentence extraction
             Document representation as a graph
             Centered on biomedical concepts
             Using concept frequency to measure relevance



    Experiences in Automatic summarization (II)

    Phase I: Graph generation
       Identification of sentences and UMLS concepts
    Phase II: Similarity algorithm
       Concept overlap between sentences
       (edges) means “recommendation”
    Phase III: Ranking algorithm
       The weight associated with each edge depends on
       similarity
    Phase IV: Summary building
       The top-ranked sentences are selected
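
The four phases can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method evaluated in the talk: the concept sets are given by hand (in practice a tool such as MetaMap would identify the UMLS concepts), Jaccard overlap is assumed as the similarity measure, and a weighted PageRank-style iteration is assumed as the ranking algorithm. All function names are hypothetical.

```python
def jaccard(a, b):
    """Phase II: concept overlap between two sentences (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(concept_sets):
    """Phase I/II: sentences are nodes, edge weights are concept overlaps."""
    n = len(concept_sets)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = jaccard(concept_sets[i], concept_sets[j])
    return weights


def rank(weights, damping=0.85, iterations=50):
    """Phase III: each sentence 'recommends' its neighbours in proportion
    to the edge weight (weighted PageRank-style power iteration)."""
    n = len(weights)
    totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                scores[j] * weights[j][i] / totals[j]
                for j in range(n) if totals[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, concept_sets, size=2):
    """Phase IV: keep the top-ranked sentences, in document order."""
    if not sentences:
        return []
    scores = rank(build_graph(concept_sets))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:size]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sentences = ["S0", "S1", "S2", "S3"]
    # Hypothetical concept identifiers, one set per sentence.
    concept_sets = [{"C1", "C2"}, {"C1", "C2", "C3"}, {"C3"}, {"C9"}]
    print(summarize(sentences, concept_sets, size=2))
```

A sentence that shares no concept with any other (S3 here) receives no recommendations and keeps only the baseline score, so it is the last candidate for the summary.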

    Automatic Summarization. Evaluation




      Evaluation with ROUGE (based on n-grams) against generic
       summarizers
           Our method obtains good results, especially with small n-grams

      de la Villa, M., Maña, M. “Propuesta y evaluación de un método de generación de
      resúmenes extractivo basado en conceptos en el ámbito biomédico”. XXV edición del
      Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural
      (SEPLN'09), San Sebastián, September 2009.
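
To make "based on n-grams" concrete, the Python sketch below computes a simplified ROUGE-N recall: the fraction of reference n-grams that also occur in the candidate summary. It assumes whitespace tokenisation and a single reference summary; the official ROUGE toolkit offers further options (stemming, stop-word removal, multiple references) that are not reproduced here, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    candidate = "the drug reduces nausea"
    reference = "the drug reduces severe nausea"
    print(rouge_n_recall(candidate, reference, n=1))  # 4 of 5 unigrams match: 0.8
    print(rouge_n_recall(candidate, reference, n=2))  # 2 of 4 bigrams match: 0.5
```

The example also shows why scores tend to drop as n grows: a single missing word breaks every longer n-gram that spans it.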


    Knowledge integration






    Experiences in Computer-aided
    summarization (I)
      Computer-aided summarization (CAS) combines automatic
       and human summarization.
      The CAS system suggests an initial summary by
       selecting relevant sentences.
      The human can change the sentence selection and
       edit the summary manually.
      Purpose: construction of a Gold-Standard building
       assistant.
      Novelty: it considers the distribution of biomedical concepts
       (Reeve et al., 2006)



    Experiences in Computer-aided
    summarization (and II)
      Experience in the design and construction of a
       Gold-Standard building assistant (or computer-aided
       summarization tool), considering the distribution of
       biomedical concepts (Reeve et al., 2006)

      - Client-server app
      - Centralized repository
      - Supports PDF, XML




    Knowledge integration





    Experiences in Information Retrieval
    and Post-retrieval clustering
     Experience in the design and construction of an information
     retrieval system with:
         •  post-retrieval clustering,
         •  orientation to biomedical documents, and
         •  mobile devices




    Search and Information Retrieval: our implementation


    Document sources: BioMed Central (web crawling in progress)
    Text processing: lowercasing, stemming, stop-word removal, …




                                        Lucene for indexing…


    Search and Information Retrieval: our implementation (and II)




                                      … and Lucene for searching
    Clustering: our implementation

    Weka for clustering
          Post-retrieval clustering groups the documents retrieved for a
          query into different subsets according to their similarity
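
The idea of grouping retrieved documents by similarity can be sketched with a small k-means in plain Python. This is a stand-in for illustration, not Weka's SimpleKMeans and not the system described in the talk: documents are assumed to be represented as raw term-frequency vectors, distance is Euclidean, and the initial centroids are taken at evenly spaced positions in the result list so that the run is reproducible. All names are hypothetical.

```python
import math
from collections import Counter


def vectorize(docs):
    """Turn each document into a term-frequency vector over a shared vocabulary."""
    vocab = sorted({term for doc in docs for term in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[term]) for term in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, max_iter=20):
    """Assign each vector to one of k clusters; returns a list of cluster labels."""
    n = len(vectors)
    # Assumed deterministic initialisation: evenly spaced documents.
    centroids = [list(vectors[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:            # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    retrieved = [
        "skin cancer melanoma",
        "melanoma skin lesion",
        "knee ligament injury",
        "ligament tear knee",
    ]
    print(kmeans(vectorize(retrieved), k=2))
```

A real system would normally weight terms (for example with TF-IDF) and apply the same text processing used at indexing time before clustering.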




    Clustering: why Simple-K-Means?



 Clustering algorithm:
  Simple-K-Means vs Expectation Maximization (EM)

    Time taken to perform the grouping, in seconds:

    Query (documents)      Simple-K-Means      EM
    Ligaments (10)                1             2
    Cancer Skin (25)              4            12
    Cancer (46)                   5            26
    Disease (62)                  8            57

    K? It depends on the number of documents retrieved.



    Visualization on Mobile Devices: our interface

    [Screenshots of the mobile interface for the query “Cancer skin”]


    Knowledge integration




    Experiences in Information Retrieval
    and Query user-defined expansion (I)

      Users have problems defining their information needs in a
      query string (Jansen, Spink and Koshman, 2007).
        Queries contain fewer than three terms (75.2%), and the majority of
        queries contain one (18.5%) or two (32.2%) terms.

      Methods to improve (expand) the query:
        Relevance feedback.
        Local analysis or global analysis.
        Natural Language Processing resources.

      Experiments with users show that they prefer to
      keep control over how the query is reformulated (Belkin
      et al., 2001).

    Experiences in Information Retrieval
    and Query user-defined expansion (II)

      Previous experience in using ontologies to assist the definition of the
       search string…




    Experiences in Information Retrieval
    and Query user-defined expansion (II)
    How does it work?
      Pre-retrieval: construction of the graph

    Research: Information Retrieval
    (and III)
      …  or using Ontologies to build an enriched concept graph that
       assist the definition of the search string




  http://www.uhu.es/manuel.villa/viewmed/
  de la Villa, M., Garcia, S., Maña, M.
  “¿De verdad sabes lo que quieres buscar? Expansión guiada visualmente
  de la cadena de búsqueda usando ontologías y grafos de conceptos”.
  XXVII edición del Congreso Anual de la Sociedad Española para el
  Procesamiento del Lenguaje Natural (SEPLN'11), Huelva, September 2011.




    Known tools

      UMLS:
           Metathesaurus, Semantic Network
           Tools: MetaMap, MMTx API, SemRep, UTS Web Services, …
      Freebase
           MQL (Metaweb Query Language)
      Newbie with UIMA & GATE

    Expectations

      I offer my collaboration if you're interested in using
       any of these resources
      I'm open to collaborating on whatever task you
       consider related and…
      … to receiving some guidelines to improve the
       summarization method

    Any questions?
Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011

More Related Content

PPT
Ontology Mapping
PPTX
Interpreting Data Mining Results with Linked Data for Learning Analytics
PDF
Linking Universities - A broader look at the application of linked data and s...
PPT
Linked Data as a new environment for Learning Analytics and education
PPT
Data Integration Ontology Mapping
PDF
Learning ontologies
PDF
The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
PPTX
Semantic Web, Linked Data and Education: A Perfect Fit?
Ontology Mapping
Interpreting Data Mining Results with Linked Data for Learning Analytics
Linking Universities - A broader look at the application of linked data and s...
Linked Data as a new environment for Learning Analytics and education
Data Integration Ontology Mapping
Learning ontologies
The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility
Semantic Web, Linked Data and Education: A Perfect Fit?

Similar to Experiences on integrating explicit knowledge on information access tools in the medical domain (20)

PDF
Tutoriel ssmt
PDF
Six Month
PDF
MVilla IUI 2012 Lisbon
PPT
Literature Based Framework for Semantic Descriptions of e-Science resources
PPTX
DH Tools Workshop #1: Text Analysis
PPT
Text Analytics for Semantic Computing
PPTX
Search, Signals & Sense: An Analytics Fueled Vision
PPT
AAUP 2008: Making XML Work (T. Kerner)
PDF
Ontologies
PDF
Dynamic Potential of Semantic Enrichment
PPTX
Ibn Sina
PPT
PDF
Nlp based retrieval of medical information for diagnosis of human diseases
PDF
Nlp based retrieval of medical information for diagnosis of human diseases
PDF
From Linked Data to Semantic Applications
PDF
Integrating Public and Private Data: Lessons Learned from Unison
PPT
Copy of 10text (2)
PPT
Chapter 10 Data Mining Techniques
KEY
Social media and it's use in disease surveillance
Tutoriel ssmt
Six Month
MVilla IUI 2012 Lisbon
Literature Based Framework for Semantic Descriptions of e-Science resources
DH Tools Workshop #1: Text Analysis
Text Analytics for Semantic Computing
Search, Signals & Sense: An Analytics Fueled Vision
AAUP 2008: Making XML Work (T. Kerner)
Ontologies
Dynamic Potential of Semantic Enrichment
Ibn Sina
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseases
From Linked Data to Semantic Applications
Integrating Public and Private Data: Lessons Learned from Unison
Copy of 10text (2)
Chapter 10 Data Mining Techniques
Social media and it's use in disease surveillance
Ad

More from Manuel de la Villa (15)

PPTX
Presentación TFG Informes de Alta Automáticos
PDF
Presentación programa Social Media UHU
PDF
Marca personal para community managers
PDF
Taller Facebook #SMUHU parte 2
PDF
Taller Facebook #SMUHU parte 1
PDF
Personal branding
PDF
Taller de Presentaciones efectivas
PDF
Presentacion Grado en Ingeniería Informática UHU
PDF
Curso personal branding profesores
PDF
Herramientas web 2.0 parte 2
PDF
Herramientas web 2.0 Parte 1
PPSX
A Biomedical Information Retrieval System based on Clustering for Mobile Dev...
PDF
Deconstructing freebase
PDF
A critical and comparative study about ISO 9001, CMMI and ISO 15504
PDF
Presentación TFG Informes de Alta Automáticos
Presentación programa Social Media UHU
Marca personal para community managers
Taller Facebook #SMUHU parte 2
Taller Facebook #SMUHU parte 1
Personal branding
Taller de Presentaciones efectivas
Presentacion Grado en Ingeniería Informática UHU
Curso personal branding profesores
Herramientas web 2.0 parte 2
Herramientas web 2.0 Parte 1
A Biomedical Information Retrieval System based on Clustering for Mobile Dev...
Deconstructing freebase
A critical and comparative study about ISO 9001, CMMI and ISO 15504
Ad

Experiences on integrating explicit knowledge on information access tools in the medical domain

  • 1. + Computer- Query user- aided defined summarization expansion Post-retrieval Extractive clustering Summarization Experiences on integrating explicit knowledge on information access tools in the medical domain Manuel de la Villa Department of Information Technologies University of Huelva
  • 2. + 2 Index   Brief CV   Why a research stay? In Wolverhampton?   Teaching  Integrating explicit knowledge on information access tools  Knowledge sources (UMLS & Freebase)  Automatic Text Summarization  Information Retrieval Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 3. + 3 Brief CV Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 4. + 6 Teaching experience  Software Engineering  Process and Methodologies, Metrics, Requirements analysis, Design, …  Software Engineering Lab (UML, NetBeans, Subversion, Java, JUnit, Persistence…)  Multimedia applications development  Adobe Director, Flash, Photoshop, Premiere  Sony Sound Forge, Audacity Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 5. + 7 Knowledge integration Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 6. + Specific Domain Knowledge source. UMLS (I) 8 ICD-10 LOINC SNOMED-CT UK-Clinical Terms UMLS MeSH DSM-IV … Gene Ontology RxNorm An homogeneus group of terminologies A saturation of different terminologies UMLS aims to overcome a significant barrier, the variety of ways the same concepts are expressed in different machine-readable sources. Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 7. + Specific Domain Knowledge source. UMLS (II) 9 Project NLM Unified Medical Language System (UMLS):   Aim, to develop tools that help researchers in the knowledge representation, retrieval and integration of biomedical information.   UMLS Knowledge Sources ‫‏‬   Software tools Three main components: SPECIALIST Lexicon: Compilation of lexical elements (>200.000) with grammatical information and linguistic variants. “Anaesthetic” “Anaesthetic” {base=anesthetic {base=anesthetic spelling_variant=anaesthetic spelling_variant=anaesthetic entry=E0330018 cat=noun entry=E0330019 cat=adj variants=reg variants=uncount } variants=inv position=attrib(3) position=pred stative } Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 8. + Specific Domain Knowledge source. UMLS (III) 10   Metathesaurus: very large, multi-purpose, and multi-lingual vocabulary database (compiles more than 100 source vocabularios), https://uts.nlm.nih.gov/metathesaurus.html   every term (>5M) associated with a concept (>1.5M), terms related (e.g., synonyms) (16M relations)   each concept assigned to one or more semantic types of the 135 existing Different terms… for a same concept… Included in a semantic type Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 9. + Specific Domain Knowledge source. UMLS (IV) 11 https://uts.nlm.nih.gov/semanticnetwork.html  UMLS Semantic Network: is an ontology with 135 semantic types and to 54 types of relationships between types Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 10. + General Domain Knowledge Source: Freebase (I)    Freebase is a large public database that collects three kinds of information:  data;  texts ; and  media , that references…   …entities or topics (≈ 12 million). An entity is a unique single person, place, or thing.  A single concept or real-world thing.  A topic could also be called an entity, resource or element or thing, it is a fundamental unit in Freebase.  /common/topic  Each topic has a Guid or globally unique ID  http://www.freebase.com/view/en/barack_obama  http://www.freebase.com/guid/9202a8c04000641f800000000029c277
  • 11. + General Domain Knowledge Source: Freebase (II)   Freebase connects entities together as a graph,  defines its data structure as a set of nodes and a set of links that establish relationships between the nodes.   Most of our topics are associated with one or more types (such as people, places, books, films, etc) and may have additional properties like "date of birth" for a person or latitude and longitude for a location. These types and properties and related concepts are called Schema.
  • 12. + General Domain Knowledge Source: Freebase (III) The Schema Schema (the way Freebase's data is laid out) is expressed through Types and Properties. Types are grouped together in Domains.
  • 13. + General Domain Knowledge Source: Freebase (III) The Schema Schema (the way Freebase's data is laid out) is expressed through Types and Properties. Types are grouped together in Domains.
  • 14. + General Domain Knowledge Source: Freebase (III) The Schema Schema (the way Freebase's data is laid out) is expressed through Types and Properties. Types are grouped together in Domains.
  • 15. + General Domain Knowledge Source: Freebase (III) The Schema Schema (the way Freebase's data is laid out) is expressed through Types and Properties. Types are grouped together in Domains.
  • 16. + General Domain Knowledge Source: Freebase (IV) The Schema: Medicine
  • 17. + General Domain Knowledge Source: Freebase (V) How can we use it…   As a reference or information source   Create interesting Views and Visualizations and share them with others   Embed Freebase data in your website   Use our API or Acre, our hosted app development platform, to build apps that use Freebase data   Download our Data dumps  Use Freebase's RDF for Semantic Web applications
  • 18. + General Domain Knowledge Source: Freebase (IV) The Freebase approach
  • 19. + MQL (Metaweb Query Language) •  http://api.freebase.com/api/service/mqlread?query={"query":{"type":"/ music/artist","name":"U2","album":[]}} •  http://api.freebase.com/api/service/mqlread?query={"query": [{"type":"/medicine/disease", "name":null, "symptoms": {"name":"Nausea"}}]} •  Query Editor
  • 20. + 22 Knowledge integration Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 21. + 23 Experiences in Automatic summarization (I) + We develop a proposal with this main characteristics:   Sentences extraction   Document representation as a graph   Centered on biomedical concepts   Using concept frequency to measure relevance Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
  • 22. + 24 Experiences in Automatic summarization (II) + Phase I: Graph generation Sentences and UMLS concepts identification + Phase II: Similarity algorithm Concepts overlapping between sentences (edges) means “recommendation” + Phase III: Ranking algorithm Weight associated with each edge depends on similarity + Phase IV: Summary building Top ranked sentences are selected Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
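The four phases can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: concept identification is replaced by hand-written concept sets (the slides use UMLS), similarity is assumed to be Jaccard overlap, and ranking is assumed to be a weighted PageRank-style iteration, since the slides do not name the exact formulas.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity: shared concepts divided by all concepts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(concepts_per_sentence):
    """Phases I-II: one node per sentence, edge weight = concept overlap."""
    n = len(concepts_per_sentence)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w = jaccard(concepts_per_sentence[i], concepts_per_sentence[j])
        weights[i][j] = weights[j][i] = w
    return weights


def rank(weights, damping=0.85, iterations=50):
    """Phase III: each sentence 'recommends' its neighbours in proportion to edge weight."""
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, concepts_per_sentence, size=2):
    """Phase IV: keep the top-ranked sentences, in document order."""
    scores = rank(build_graph(concepts_per_sentence))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:size]
    return [sentences[i] for i in sorted(top)]


# Hypothetical input: concept sets would normally come from a UMLS concept mapper.
sentences = ["S0", "S1", "S2", "S3"]
concepts = [{"asthma", "inhaler"}, {"asthma", "inhaler", "steroid"}, {"asthma"}, {"weather"}]
summary = summarize(sentences, concepts, size=2)
print(summary)
```

A sentence that shares no concepts with the rest of the document (`S3` here) receives no recommendations and therefore only the baseline score.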
  • 27. + Automatic Summarization. Evaluation
      - Evaluation with ROUGE (based on n-grams) against generic summarizers
      - Our method obtains good results, especially with small n-grams
      de la Villa, M., Maña, M. “Propuesta y evaluación de un método de generación de resúmenes extractivo basado en conceptos en el ámbito biomédico”. XXV edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2009 (SEPLN´09), San Sebastián (Sept. 2009).
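To make "based on n-grams" concrete, here is a minimal sketch of ROUGE-N recall: the fraction of the reference summary's n-grams that also appear in the candidate summary, with counts clipped. It assumes whitespace tokenization and a single reference; the official ROUGE toolkit adds options (stemming, multiple references, other variants) that are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
candidate = "the cat on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 reference unigrams matched
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 reference bigrams matched
```

Scores typically drop as n grows because longer n-grams are harder to match exactly.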
  • 28. + Knowledge integration
  • 29. + Experiences in Computer-aided summarization (I)
      - Computer-aided summarization (CAS) combines automatic and human summarization.
      - The CAS system suggests an initial summary by selecting relevant sentences.
      - The human can change the sentence selection and manually edit the summary.
      - Purpose: construction of a Gold-Standard building assistant.
      - Novelty: considering the distribution of biomedical concepts (Reeve et al., 2006)
  • 30. + Experiences in Computer-aided summarization (and II)
      Experience in the design and construction of a Gold-Standard building assistant (or computer-aided summarization tool), considering the distribution of biomedical concepts (Reeve et al., 2006)
      - Client-server app
      - Centralized repository
      - Supports PDF, XML
  • 31. + Knowledge integration
  • 32. + Experiences in Information Retrieval and Post-retrieval clustering
      Experience in the design and construction of an information retrieval system with:
      - Post-retrieval clustering,
      - Orientation to biomedical documents and
      - Mobile devices
  • 33. Search and Information Retrieval. Our implementation
      - Document sources: BioMed Central (web crawling in progress)
      - Text processing: lowercasing, stemming, stop-word removal, …
      - Lucene for indexing…
  • 34. Search and Information Retrieval. Our implementation (and II)
      … and Lucene for searching
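The indexing and searching steps can be illustrated without Lucene. The sketch below is a plain-Python stand-in, not Lucene's API: it lowercases, removes stop words, applies a deliberately naive suffix-stripping stemmer (an assumption; a real system would use a proper stemmer such as Porter's), builds an inverted index, and answers conjunctive queries. The stop-word list and sample documents are made up for the example.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "of", "with", "and", "a", "in"}  # toy list, an assumption


def stem(token):
    """Naive suffix stripping, only for illustration."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every query term."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


documents = {
    1: "Skin cancers treated with surgery",
    2: "Ligaments of the knee",
    3: "Treating the cancer of the skin",
}
index = build_index(documents)
print(search(index, "skin cancer"))
```

Running the query through the same `analyze` function as the documents is what lets "cancer" match "cancers".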
  • 35. Clustering. Our implementation
      - Weka for clustering
      - Post-retrieval clustering groups the documents retrieved for a query into subsets according to their similarity.
  • 36. Clustering. Why Simple-K-Means?
      Clustering algorithm: Simple-K-Means vs Expectation Maximization (EM)
      Time taken to perform the grouping, in seconds:

      Query (documents)    Simple-K-Means    EM
      Ligaments (10)       1                 2
      Cancer Skin (25)     4                 12
      Cancer (46)          5                 26
      Disease (62)         8                 57

      K? It depends on the number of documents retrieved.
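For readers unfamiliar with the algorithm, the following is a minimal k-means sketch in plain Python. It is not Weka's SimpleKMeans: the initialization (first k points as centroids) and the tiny two-dimensional vectors are assumptions chosen to keep the example deterministic. In the system described, each retrieved document would be represented by a term-weight vector.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(component) / len(points) for component in zip(*points)]


def k_means(points, k, iterations=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = [list(p) for p in points[:k]]  # assumed initialization
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return assignment, centroids


# Hypothetical document vectors over two terms.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
assignment, centroids = k_means(vectors, k=2)
print(assignment)
```

Each iteration costs roughly (documents × k × vector length) distance operations.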
  • 37. Visualization on Mobile Devices. Our interface
      Query shown: Cancer skin
  • 38. + Knowledge integration
  • 39. + Experiences in Information Retrieval and Query user-defined expansion (I)
      - Users have problems defining their information needs in a query string (Jansen, Spink and Koshman, 2007).
      - Queries contain fewer than three terms (75.2%), and the majority of queries contain one (18.5%) or two (32.2%) terms.
      - Methods to improve (expand) the query:
          - Relevance feedback.
          - Local analysis or global analysis.
          - Natural Language Processing resources.
      - Experiments with users show that they prefer to keep control over how the query is reformulated (Belkin et al., 2001).
  • 40. + Experiences in Information Retrieval and Query user-defined expansion (II)
      Experience in using ontologies to assist the definition of the search string… previously
  • 41. + Experiences in Information Retrieval and Query user-defined expansion (II)
      How does it work?
      - Pre-retrieval
      - Construction of the graph
  • 42. + Research: Information Retrieval (and III)
      … or using ontologies to build an enriched concept graph that assists the definition of the search string
      http://www.uhu.es/manuel.villa/viewmed/
      de la Villa, M., Garcia, S., Maña, M. “¿De verdad sabes lo que quieres buscar? Expansión guiada visualmente de la cadena de búsqueda usando ontologías y grafos de conceptos”. XXVII edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2011 (SEPLN´11), Huelva (Sept. 2011).
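The idea of user-defined expansion over a concept graph can be sketched as follows. Everything here is hypothetical: the mini-ontology, the relation names, and the OR-based query syntax are assumptions for illustration, and the actual system draws its concepts from real ontologies. The point is that the graph is built before retrieval and that only the neighbours the user picks are added to the search string.

```python
# Hypothetical mini-ontology: each concept lists synonyms and narrower concepts.
ONTOLOGY = {
    "skin cancer": {
        "synonyms": ["skin neoplasm"],
        "narrower": ["melanoma", "basal cell carcinoma"],
    },
    "melanoma": {"synonyms": ["malignant melanoma"], "narrower": []},
}


def build_concept_graph(ontology, seed):
    """Return {neighbour: relation} for the concepts directly linked to the seed term."""
    entry = ontology.get(seed, {})
    graph = {}
    for relation in ("synonyms", "narrower"):
        for neighbour in entry.get(relation, []):
            graph[neighbour] = relation
    return graph


def expand_query(seed, selected, graph):
    """Build an OR query from the seed and the neighbours the user selected."""
    chosen = [term for term in selected if term in graph]
    return " OR ".join(f'"{term}"' for term in [seed] + chosen)


graph = build_concept_graph(ONTOLOGY, "skin cancer")
query = expand_query("skin cancer", ["melanoma", "flu"], graph)
print(query)
```

Terms that are not in the graph (`"flu"` above) are ignored, so the user can only expand the query along relations the ontology actually contains.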
  • 43. + Known tools. Expectations.
      Known tools:
      - UMLS: Metathesaurus, Semantic Network
          - Tools: MetaMap, MMTx API, SemRep, UTS Web Services, …
      - Freebase
          - MQL (Metaweb Query Language)
      - Newbie with UIMA & GATE
      Expectations:
      - I offer my collaboration if you're interested in using any of these resources
      - I'm open to collaborating on whatever task you consider related and…
      - … to receiving some guidelines to improve the summarization method
      Any questions?