From Proteins to Hypotheses:
Some Experiments in Semantic Enrichment

Anita de Waard
Principal Researcher, Disruptive Technologies
Elsevier Labs, Amsterdam
Utrecht Institute of Linguistics, University Utrecht
The Netherlands

a.dewaard@elsevier.com
Some Experiments in Semantic Enrichment

Entities and Relationships:
  FEBS Structured Digital Abstracts
  OKKAM Entity Repository

Hypotheses and Evidence:
  Discourse Analysis of Biology Articles
  Hypotheses, Evidence and Relationships

Collaborations:
  Elsevier Grand Challenge
  Future of Research Communications
Entities and Relationships
FEBS Letters Structured Digital Abstracts


  -   With FEBS Letters Editorial Office in Heidelberg and the MINT database in
      Rome
  -   SDA [Gerstein et al.]: ‘machine-readable XML summary of pertinent
      facts’
  -   For each PPI paper: proteins, protein-protein interactions, methods;
      provided by the author in XLS
  -   Started April 9, 2008: links in ScienceDirect to MINT and UniProt
  -   Now 117 articles with SDA: http://www.febsletters.org/content/sda
  -   Being used as Gold Standard for BioCreative Challenge II.5:
      http://www.biocreative.org/tasks/biocreative-ii5/biocreative-ii5-evaluation/
expression of GSG1 stimulates TPAP targeting to the
ER, suggesting that interactions between the two
proteins lead to the redistribution of TPAP from the
cytosol to the ER.

MINT-6168263:
Gsg1 (uniprotkb:Q8R1W2), TPAP
(uniprotkb:Q9WVP6) and Calmegin
(uniprotkb:P52194) colocalize (MI:0403) by
cosedimentation (MI:0027)

MINT-6168204, MINT-6168178:
Gsg1 (uniprotkb:Q8R1W2) and TPAP
(uniprotkb:Q9WVP6) colocalize (MI:0403) by
fluorescence microscopy (MI:0416)

MINT-6167930:
Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
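
The SDA sentences above follow a regular pattern: MINT identifiers, proteins with UniProt accessions, then an interaction type and a detection method, each tagged with an MI term. As a minimal sketch, assuming only the sentence shape visible on this slide (the function name `parse_sda` and the regular expressions are illustrative and not part of the FEBS or MINT tooling), such a sentence could be turned back into a structured record like this:

```python
import re

# One protein mention, e.g. "Gsg1 (uniprotkb:Q8R1W2)".
PARTICIPANT = re.compile(r"([\w-]+)\s*\(uniprotkb:([A-Z0-9]+)\)")
# One MI-annotated term, e.g. "colocalize (MI:0403)"; a leading "by" is skipped.
MI_TERM = re.compile(r"(?:by\s+)?([a-z][a-z ]*?)\s*\(MI:(\d{4})\)")
MINT_ID = re.compile(r"MINT-\d+")


def parse_sda(statement):
    """Turn one SDA sentence into a dict of identifiers, proteins and MI terms."""
    text = " ".join(statement.split())      # undo line wrapping
    header, _, body = text.partition(":")   # MINT ids precede the first colon
    mentions = list(PARTICIPANT.finditer(body))
    # MI terms are only searched for after the last protein mention.
    tail = body[mentions[-1].end():] if mentions else body
    return {
        "mint_ids": MINT_ID.findall(header),
        "participants": [(m.group(1), m.group(2)) for m in mentions],
        "terms": [(label, "MI:" + code) for label, code in MI_TERM.findall(tail)],
    }


record = parse_sda("""MINT-6168204, MINT-6168178:
Gsg1 (uniprotkb:Q8R1W2) and TPAP
(uniprotkb:Q9WVP6) colocalize (MI:0403) by
fluorescence microscopy (MI:0416)""")
print(record)
```

A sentence whose MI term is cut off, like the last one on the slide, simply yields an empty `terms` list rather than a guessed identifier.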
OKKAM Project

FP7 - EU funded project, 13 partners, 2008 - 2010

-   Main goal: create entity identifier repository

-   Create entity-centric architecture:

    -   create new entities + synonyms where needed

    -   offer interface/link where not needed

    -   contain resolver, matcher, repository

-   E.g. http://sig.ma - semantic mashup of entity properties
    http://sig.ma/search?q=p53
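
To make the resolver / matcher / repository split concrete, here is a minimal in-memory sketch. It is not the OKKAM API: the class names, the `entity:N` identifier scheme and the case-insensitive name matching are all assumptions made for illustration. It only shows the idea of reusing an existing entity where one matches and creating a new one only where needed.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Entity:
    identifier: str
    label: str
    synonyms: set = field(default_factory=set)
    links: dict = field(default_factory=dict)   # external source -> external id

    def names(self):
        """All names this entity is known by, lower-cased for matching."""
        return {self.label.casefold()} | {s.casefold() for s in self.synonyms}


class EntityRepository:
    """In-memory stand-in for an entity repository with a matcher and a resolver."""

    def __init__(self):
        self._entities = {}
        self._ids = count(1)

    def match(self, name):
        """Matcher: entities whose label or synonyms equal `name`, ignoring case."""
        key = name.casefold()
        return [e for e in self._entities.values() if key in e.names()]

    def resolve(self, identifier):
        """Resolver: look up an entity by identifier; None if unknown."""
        return self._entities.get(identifier)

    def get_or_create(self, name, synonyms=(), links=None):
        """Reuse a matching entity; mint a new identifier only where needed."""
        hits = self.match(name)
        if hits:
            entity = hits[0]
            entity.synonyms.update(synonyms)
            entity.links.update(links or {})
            return entity
        # Hypothetical identifier scheme, used here for illustration only.
        entity = Entity(f"entity:{next(self._ids)}", name,
                        set(synonyms), dict(links or {}))
        self._entities[entity.identifier] = entity
        return entity


repo = EntityRepository()
tpap = repo.get_or_create("TPAP", links={"uniprotkb": "Q9WVP6"})
same = repo.get_or_create("tpap")   # matched, so no second entity is created
```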
OKKAM Entity-centric authoring tool
Current:

 -   Build MS Word plugin to find proteins in FEBS articles
     (i.e., semi-automating FEBS SDA!)

 -   Now: testing web server interface to Whatizit
     Next: link to BioCreative MetaServer

 -   Add: links to other entities via OKKAM server

 -   Prototype now being tested with FEBS Office
Future:

 -   Create unique Author ID, straddling proprietary (Scopus) and
     other (ResearcherID, OpenID) systems

 -   Allow query and resolution of authors/papers within Word
OKKAM Entity Editor in MS Word
OKKAM plugin vs. UCSD group plugin

UCSD plugin                                         Okkam plugin
Works! Can be downloaded, open source etc.          Prototype, starting to test with FEBS authors
Offline: load ontologies locally                    Online: call out to web services (soon...)
Identify classes of entities (e.g., ‘inhibitor’)    Identify specific entities (e.g., a specific protein)
Works with Word 2007                                Works with other versions of Word (on a PC!)
Stores annotations as XML metadata;                 Stores annotations as Word comments;
  maintains metadata in Word info pane                can generate report of entities linked to identifiers

Work well together!
Hypotheses and Evidence
Issue with BioCreative challenge:
From one of the documents in the training set:
-     In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by cytoplasmic
      polyadenylation element binding protein (CPEB) induces the translation of maternal
      mRNA [5].

-     In mouse testis, another novel member of the CPEB protein family (CPEB2) and a
      homolog of xGLD-2 (mGLD-2) have been identified [7] and [8]

versus:
-     TPAP was present in GSG1 immunoprecipitates (Fig. 2B). The in vivo data suggest that
      TPAP–GSG1 interactions occur in mammalian cells.


    Issue 1: what is experimentally verified content?
    Issue 2: what is new?
Discourse analysis of biological text
-   How can we identify the line of argumentation in biology text?
-   Linguistics says: parse into clauses (anything with a verb)
-   Identify Discourse Segment Purpose
-   Identify verb tense, type, markers for each segment type
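
The steps above can be sketched in a few lines, under heavy simplifying assumptions: the splitter below breaks on sentence boundaries rather than true clauses, no verb tense analysis is done, and the cue patterns and segment labels in `CUES` are illustrative choices based on the example sentences in this deck, not the markers actually used in the study.

```python
import re

# Cue patterns per segment type, checked in order (illustrative assumptions).
CUES = [
    ("Goal", re.compile(r"^to (investigate|test|determine|examine)\b", re.I)),
    ("Implication", re.compile(
        r"\b(these|the) (in vivo )?(results|data) (suggest|show|point to|indicate)",
        re.I)),
    ("Result", re.compile(r"\((Fig\.|Figures?)\s", re.I)),   # points at a figure
    ("KnownFact", re.compile(r"\[\d+\]")),                   # cites earlier work
]


def split_segments(text):
    """Naive splitter: break after sentence-final punctuation before a capital."""
    flat = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", flat) if s]


def classify(segment):
    """Return the first segment type whose cue pattern occurs in the segment."""
    for label, pattern in CUES:
        if pattern.search(segment):
            return label
    return "Other"


text = ("TPAP was present in GSG1 immunoprecipitates (Fig. 2B). "
        "The in vivo data suggest that TPAP–GSG1 interactions occur "
        "in mammalian cells.")
labelled = [(classify(s), s) for s in split_segments(text)]
for label, segment in labelled:
    print(label, "|", segment)
```

Even this crude version separates the cited background sentence ("... maternal mRNA [5].") from the experimentally supported one ("... (Fig. 2B)."), which is the distinction raised as Issue 1 and Issue 2 earlier.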
3 Realms of Science:

Conceptual realm
  (1)  Oncogene-induced senescence is characterized by the appearance of
       cells with a flat morphology that express senescence associated
       (SA)-β-Galactosidase.
  (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents
       RAS^V12-induced growth arrest in primary human cells.

Experimental realm
  (2a) Indeed,
  (2b) control RAS^V12-arrested cells showed relatively high abundance of
       flat cells expressing SA-β-Galactosidase
  (2c) (Figures 2G and 2H).
  (3a) Consistent with the cell growth assay,
  (3b) very few cells showed senescent morphology when transduced with
       either miR-Vec-371&2, miR-Vec-373, or control p53^kd.
  (4a) Altogether, these data show that

Data realm
  (Figures)
Hypotheses ‘erode’ into facts:
Hypothesis + Evidence

Concepts:      KnownFact, KnownFact, Hypothesis, Implication, Fact
Experiment 1:  Goal, Method, Data, Result
Experiment 2:  Goal, Method, Data, Result

Hypothesis:
    To investigate the possibility that miR-372 and miR-373 suppress
    the expression of LATS2, we...

Implication:
    Therefore, these results point to LATS2 as a mediator of the miR-372 and
    miR-373 effects on cell proliferation and tumorigenicity,

Raver-Shapira et al., JMolCell 2007:
    two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in
    testicular germ cell tumors by inhibition of LATS2 expression, which
    suggests that Lats2 is an important tumor suppressor
    (Voorhoeve et al., 2006).

Yabuta, JBioChem 2007:
    miR-372 and miR-373 target the Lats2 tumor suppressor
    (Voorhoeve et al., 2006)
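
One way to hold such a chain as data is a list of statements, each with its source and an epistemic status. This is a sketch only: the status labels are assigned by hand, the label for the Raver-Shapira quote and the attribution of the first two quotes to Voorhoeve et al., 2006 are my reading of the slide rather than something it states, and the names `Statement`, `CERTAINTY` and `erodes` are invented for this example.

```python
from dataclasses import dataclass

# Ordering of statuses from least to most settled (assumed for illustration).
CERTAINTY = {"Hypothesis": 0, "Implication": 1, "Fact": 2}


@dataclass(frozen=True)
class Statement:
    source: str
    text: str
    status: str   # hand-assigned label, not detected automatically

    @property
    def certainty(self):
        return CERTAINTY[self.status]


def erodes(chain):
    """True if certainty never decreases along the chain and it ends as a Fact."""
    ranks = [s.certainty for s in chain]
    return bool(chain) and ranks == sorted(ranks) and chain[-1].status == "Fact"


CHAIN = [
    Statement("Voorhoeve et al., 2006",   # source assumed from the citations
              "To investigate the possibility that miR-372 and miR-373 "
              "suppress the expression of LATS2, we...", "Hypothesis"),
    Statement("Voorhoeve et al., 2006",   # source assumed from the citations
              "Therefore, these results point to LATS2 as a mediator of the "
              "miR-372 and miR-373 effects", "Implication"),
    Statement("Raver-Shapira et al., JMolCell 2007",   # status is an assumption
              "function as potential novel oncogenes [...] which suggests that "
              "Lats2 is an important tumor suppressor", "Implication"),
    Statement("Yabuta, JBioChem 2007",
              "miR-372 and miR-373 target the Lats2 tumor suppressor", "Fact"),
]

for s in CHAIN:
    print(f"{s.status:<12} {s.source}")
```

Keeping the source next to each status makes it possible to ask which experiment, if any, stands behind a statement that later papers cite as plain fact.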
Identification of hypotheses in papers:
SWAN Alzheimer KB
HypER Working Group:
-       Goal: Align and expand existing efforts on detection and
        analysis of Hypotheses, Evidence & Relationships

-       Partners:
    -    Harvard/MGH: SWAN, ARF

    -    Open University: Cohere

    -    Oxford University: CiTO, eLearning/Rhetoric

    -    DERI: SALT, aTags

    -    University of Trento: LiquidPub

    -    Xerox Research: XIP hypothesis identifier

    -    U Tilburg: ML for Science

    -    Elsevier, UUtrecht: Discourse analysis of biology
Example (screenshot overlay):
    Hypothesis 22: Intramembranous Aβ dimer may be toxic
    Derived from: POSTAT_CONTRIBUTION(This essay explo… possibility that a
    fraction of these Abeta peptides never leav…
HypER Activities: http://hyper.wik.is
Current activities:

   -   Aligning discourse ontologies: joint task with W3C HCLSSig

   -   Aligning architectures to exchange hypotheses + evidence

   -   Format for a rhetorical conference paper (SALT + abcde)

   -   Parser comparison of hypothesis identification tools

   -   With NIF+SWAN/SCF: Structured Rhetorical Abstracts for
       Neuroscience publications
Further interests:

   -   Better structure of evidence: MyExperiment, KeFeD, ...

   -   Granularity of annotation/access: entity, hypothesis,
       discussion?

   -   Where/how to annotate: http://saakm2009.semanticauthoring.org/
Collaborations
Scope: Tools and processes to:
-   Improve the process of creating, reviewing and editing
    scientific content
-   Interpret, visualize or connect science knowledge
-   Provide tools/ideas for measuring the impact of these
    improvements.
June 2008: 71 submissions from 15 countries.
August 2008: 10 semi-finalist teams, who:
     -   Had access to 500,000 full text articles
     -   Plus EMTREE, EmBase, Scopus
     -   Created a tool/demo
     -   Presented to the judges
     -   Wrote a paper (accepted for JWeb Semantics)
April 2009: Judges selected 4 finalist teams.
And the winners are:

New ways of communicating science require new ways of communicating
Conference registrants form a community:
- The website is their meeting place and continues as long as users want it to
- The physical conference is a synchronous manifestation of the community
- Others can participate through a virtual environment at another time or place
Acknowledgments
-   Funding:
    FP7-Cordis
    NWO Casimir Project

-   UUtrecht:
    Leen Breure
    Henk Pander Maat
    Ted Sanders

-   ISI:
    Gully Burns
    Ed Hovy

-   Elsevier:
    David Marques
    Darin McBeath
    Stefano Bocconi
    Adriaan Klinkenberg
    Emilie Marcus

-   Challenge judges and participants:
    EMBL Team, CMU Team
    David Shotton
    Alfonso Valencia

HypER:
-   Harvard/MGH:
    Tim Clark
    Elizabeth Wu

-   DERI:
    Siegfried Handschuh
    Tudor Groza
    Matthias Samwald
    Paul Buitelaar

-   Xerox Research:
    Agnes Sandor

-   Open University:
    Simon Buckingham Shum
    David Shotton
    Jack Park
    Clara Mancini

-   Oxford University:
    Annamaria Carusi
    David Shotton

-   Tilburg U:
    Piroska Lendvai

Ismb2009

  • 1. From Proteins to Hypotheses: Some Experiments in Semantic Enrichment Anita de Waard Principal Researcher, Disruptive Technologies Elsevier Labs, Amsterdam Utrecht Institute of Linguistics, University Utrecht The Netherlands a.dewaard@elsevier.com
  • 2. Some Experiments in Semantic Enrichment
  • 3. Some Experiments in Semantic Enrichment Entities and Relationships: FEBS Structured Digital Abstracts OKKAM Entity Repository
  • 4. Some Experiments in Semantic Enrichment Entities and Relationships: FEBS Structured Digital Abstracts OKKAM Entity Repository Hypotheses and Evidence: Discourse Analysis of Biology Articles Hypotheses, Evidence and Relationships
  • 5. Some Experiments in Semantic Enrichment Entities and Relationships: FEBS Structured Digital Abstracts OKKAM Entity Repository Hypotheses and Evidence: Discourse Analysis of Biology Articles Hypotheses, Evidence and Relationships Collaborations: Elsevier Grand Challenge Future of Research Communications
  • 7. FEBS Letters Structured Digital Abstracts - With FEBS Letters Editorial Office in Heidelberg; and MINT database Rome - SDA [Gerstein et. al]: ‘machine-readable XML summary of pertinent facts’ - For each ppi paper: proteins, protein-protein interactions, methods: provided by author in XLS - Started April 9, 2008: in ScienceDirect links to MINT and Uniprot - Now 117 articles with SDA: http://www.febsletters.org/content/sda - Being used as Gold Standard for Biocreative Challenge II.5 http://www.biocreative.org/tasks/biocreative-ii5/biocreative-ii5-evaluation/
  • 8. expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER. MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027) MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416) MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
  • 9. expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER. MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027) MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416) MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
  • 10. expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER. MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027) MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416) MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
  • 11. expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER. MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027) MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416) MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
  • 12. expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER. MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027) MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416) MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
  • 13. OKKAM Project FP7 - EU funded project, 13 partners, 2008 - 2010 - Main goal: create entity identifier repository - Create entity-centric architecture: - create new entities + synonyms where needed - offer interface/link where not needed - contain resolver, matcher, repository - E.g. http://sig.ma - semantic mashup of entity properties
  • 15. OKKAM Entity-centric authoring tool Current: - Build MS Word plugin to find proteins in FEBS articles (i.e., semi-automating FEBS SDA!) - Now: testing web server interface to Whatizit Next: link to Biocreative metaserver - Add: links to other entities via OKKAM server - Prototype now being tested with FEBS Office Future: - Create unique Author ID, straddling proprietary (Scopus) and other (ResearcherID, OpenID) systems - Allow query and resolution of authors/papers within Word
  • 16. OKKAM Entity Editor in MS Word
  • 17. OKKAM Entity Editor in MS Word
  • 18. OKKAM Entity Editor in MS Word
  • 19. OKKAM Entity Editor in MS Word
  • 20. OKKAM plugin vs. UCSD group Plugin UCSD plugin Okkam plugin Works! Can be Prototype, starting to test with downloaded, open source FEBS authors etc. Offline: load ontologies Online: call out to web services locally (soon...) Identify classes of entities Identify specific entities (e.g., a (e.g., ‘inhibitor’) specific protein) Works with other version of Word Works with Word 2007 (on a PC!) Stores annotations as XML Store annotations as Word metadata - maintain comments - can generate report metadata in Word info pane of entitieslinked to identifiers Work well together!
  • 21. OKKAM plugin vs. UCSD group Plugin UCSD plugin Okkam plugin Works! Can be Prototype, starting to test with downloaded, open source FEBS authors etc. Offline: load ontologies Online: call out to web services locally (soon...) Identify classes of entities Identify specific entities (e.g., a (e.g., ‘inhibitor’) specific protein) Works with other version of Word Works with Word 2007 (on a PC!) Stores annotations as XML Store annotations as Word metadata - maintain comments - can generate report metadata in Word info pane of entitieslinked to identifiers Work well together!
  • 23. Issue with Biocreative challenge: From one of the documents in the training set: - In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by cytoplasmic polyadenylation element binding protein (CPEB) induces the translation of maternal mRNA [5]. - In mouse testis, another novel member of the CPEB protein family (CPEB2) and a homolog of xGLD-2 (mGLD-2) have been identified [7] and [8] versus: - TPAP was present in GSG1 immunoprecipitates (Fig. 2B). The in vivo data suggest that TPAP–GSG1 interactions occur in mammalian cells. Issue 1: what is experimentally verified content? Issue 2: what is new?
  • 24. Discourse analysis of biological text
  • 25. Discourse analysis of biological text - How can we identify line of argumentation in biology text?
  • 26. Discourse analysis of biological text - How can we identify line of argumentation in biology text?
  • 27. Discourse analysis of biological text - How can we identify line of argumentation in biology text? - Liguistics says: parse into clauses (anything with a verb)
  • 28. Discourse analysis of biological text - How can we identify line of argumentation in biology text? - Liguistics says: parse into clauses (anything with a verb) - Identify Discourse Segment Purpose
  • 29. Discourse analysis of biological text - How can we identify line of argumentation in biology text? - Liguistics says: parse into clauses (anything with a verb) - Identify Discourse Segment Purpose - Identify verb tense, type, markers for each segment type
  • 30. Discourse analysis of biological text - How can we identify line of argumentation in biology text? - Liguistics says: parse into clauses (anything with a verb) - Identify Discourse Segment Purpose - Identify verb tense, type, markers for each segment type
  • 31. Discourse analysis of biological text - How can we identify line of argumentation in biology text? - Liguistics says: parse into clauses (anything with a verb) - Identify Discourse Segment Purpose - Identify verb tense, type, markers for each segment type
  • 32. 3 Realms of Science: Conceptual realm Experimental realm Data realm
  • 33. 3 Realms of Science: (1) Oncogene-induced senescence is (4b) transduction with either Conceptual characterized by the appearance of miR-Vec-371&2 or miR-Vec- V12 cells with a flat morphology that 373 prevents RAS - realm express senescence associated (SA)- induced growth arrest in -Galactosid a s e . primary human cells. (2a) Indeed, (4a) Altogether, these data show that Experimental realm V12 (2b) control RAS -arrested (3b) very few cells showed cells showed relatively high senescent morphology when (3a) Consistent abundance of flat cells transduced with either miR- with the cell expressing SA- - Vec-371&2, miR-Vec-373, or growth assay, kd Galactosidase control p53 . (2c) (Figures 2G and 2H). Data realm (Figures)
  • 34. 3 Realms of Science: (1) Oncogene-induced senescence is (4b) transduction with either Conceptual characterized by the appearance of miR-Vec-371&2 or miR-Vec- V12 cells with a flat morphology that 373 prevents RAS - realm express senescence associated (SA)- induced growth arrest in -Galactosid a s e . primary human cells. (2a) Indeed, (4a) Altogether, these data show that Experimental realm V12 (2b) control RAS -arrested (3b) very few cells showed cells showed relatively high senescent morphology when (3a) Consistent abundance of flat cells transduced with either miR- with the cell expressing SA- - Vec-371&2, miR-Vec-373, or growth assay, kd Galactosidase control p53 . (2c) (Figures 2G and 2H). Data realm (Figures)
  • 36. Hypotheses ‘erode’ into facts: KnownFact KnownFact Concepts
  • 37. Hypotheses ‘erode’ into facts: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we... KnownFact KnownFact Concepts Hypothesis
  • 38. Hypotheses ‘erode’ into facts: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we... KnownFact KnownFact Concepts Hypothesis Goal Method Result Data Experiment 1
  • 39. Hypotheses ‘erode’ into facts: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we... Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity, KnownFact KnownFact Concepts Hypothesis Implication Goal Method Result Data Experiment 1
  • 40. Hypotheses ‘erode’ into facts: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we... Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity, KnownFact KnownFact Concepts Hypothesis Implication Goal Goal Method Result Method Result Data Data Experiment 1 Experiment 2
  • 41. Hypotheses ‘erode’ into facts: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we... Raver-Shapira et.al, JMolCell 2007 Therefore, these results point to two miRNAs, miRNA-372 and-373, function as LATS2 as a mediator of the miR-372 and potential novel oncogenes in testicular germ cell miR-373 effects on cell proliferation and tumors by inhibition of LATS2 expression, tumorigenicity, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006). KnownFact KnownFact Concepts Hypothesis Implication Fact Goal Goal Method Result Method Result Data Data Experiment 1 Experiment 2
  • 42. Hypotheses ‘erode’ into facts: Yabuta, JBioChem 2007 miR-372 and miR-373 target the Lats2 tumor suppressor To investigate the possibility that (Voorhoeve et al., 2006) miR-372 and miR-373 suppress the expression of LATS2, we... Raver-Shapira et.al, JMolCell 2007 Therefore, these results point to two miRNAs, miRNA-372 and-373, function as LATS2 as a mediator of the miR-372 and potential novel oncogenes in testicular germ cell miR-373 effects on cell proliferation and tumors by inhibition of LATS2 expression, tumorigenicity, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006). KnownFact KnownFact Concepts Hypothesis Implication Fact Goal Goal Method Result Method Result Data Data Experiment 1 Experiment 2
  • 43. Hypotheses ‘erode’ Hypothesis + into facts: Yabuta, JBioChem 2007 miR-372 and miR-373 target the Lats2 tumor suppressor To investigate the possibility that Evidence (Voorhoeve et al., 2006) miR-372 and miR-373 suppress the expression of LATS2, we... Raver-Shapira et.al, JMolCell 2007 Therefore, these results point to two miRNAs, miRNA-372 and-373, function as LATS2 as a mediator of the miR-372 and potential novel oncogenes in testicular germ cell miR-373 effects on cell proliferation and tumors by inhibition of LATS2 expression, tumorigenicity, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006). KnownFact KnownFact Concepts Hypothesis Implication Fact Goal Goal Method Result Method Result Data Data Experiment 1 Experiment 2
  • 44. Identification of hypotheses in papers: SWAN Alzheimer KB
  • 45. Identification of hypotheses in papers: SWAN Alzheimer KB
  • 46. Identification of hypotheses in papers: SWAN Alzheimer KB
  • 47. HypER Working Group: - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships - Partners: - Harvard/MGH: SWAN, ARF - Open University: Cohere - Oxford University: CiTO, eLearning/Rhetoric - DERI: SALT, aTags - University of Trento: LiquidPub - Xerox Research: XIP hypothesis identifier - U Tilburg: ML for Science - Elsevier, UUtrecht: Discourse analysis of biology
  • 48. HypER Working Group: - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships - Partners: - Harvard/MGH: SWAN, ARF - Open University: Cohere - Oxford University: CiTO, eLearning/Rhetoric - DERI: SALT, aTags - University of Trento: LiquidPub - Xerox Research: XIP hypothesis identifier - U Tilburg: ML for Science - Elsevier, UUtrecht: Discourse analysis of biology
  • 49. HypER Working Group: - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships - Partners: - Harvard/MGH: SWAN, ARF - Open University: Cohere - Oxford University: CiTO, eLearning/Rhetoric - DERI: SALT, aTags - University of Trento: LiquidPub - Xerox Research: XIP hypothesis identifier - U Tilburg: ML for Science - Elsevier, UUtrecht: Discourse analysis of biology
  • 50. HypER Working Group: - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships - Partners: - Harvard/MGH: SWAN, ARF - Open University: Cohere - Oxford University: CiTO, eLearning/Rhetoric - DERI: SALT, aTags - University of Trento: LiquidPub - Xerox Research: XIP hypothesis identifier - U Tilburg: ML for Science - Elsevier, UUtrecht: Discourse analysis of biology
  • 51. HypER Working Group: - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships - Partners: - Harvard/MGH: SWAN, ARF - Open University: Cohere - Oxford University: CiTO, eLearning/Rhetoric - DERI: SALT, aTags - University of Trento: LiquidPub - Xerox Research: XIP hypothesis identifier - U Tilburg: ML for Science - Elsevier, UUtrecht: Discourse analysis of biology
  • 52. HypER Working Group: - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships - Partners: - Harvard/MGH: SWAN, ARF - Open University: Cohere - Oxford University: CiTO, eLearning/Rhetoric - DERI: SALT, aTags - University of Trento: LiquidPub - Xerox Research: XIP hypothesis identifier - U Tilburg: ML for Science - Elsevier, UUtrecht: Discourse analysis of biology
  • 53. HypER Working Group: - Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships - Partners: - Harvard/MGH: SWAN, ARF - Open University: Cohere - Oxford University: CiTO, eLearning/Rhetoric - DERI: SALT, aTags - University of Trento: LiquidPub - Xerox Research: XIP hypothesisHypothesis 22: Intramembrenous Aβ dimer may be toxic identifier - U Tilburg: ML for Science Derived from: POSTAT_CONTRIBUTION(This essay explo possibility that a fraction of these Abeta peptides never leav - Elsevier, UUtrecht: Discourse analysis of biology
  • 54. HypER Activities: http://hyper.wik.is Current activities: - Aligning discourse ontologies: joint task with W3C HCLSSig - Aligning architectures to exchange hypotheses + evidence - Format for a rhetorical conference paper (SALT + abcde) - Parser comparison of hypothesis identification tools - With NIF+SWAN/SCF: Structured Rhetorical Abstracts for Neuroscience publications Further interests: - Better structure of evidence: MyExperiment, KeFeD, ... - Granularity of annotation/access: entity, hypothesis, discussion? - Where/how annotate: http://saakm2009.semanticauthoring.org/
  • 56. Scope: Tools and processes to: - Improve the process of creating, reviewing and editing scientific content - Interpret, visualize or connect science knowledge - Provide tools/ideas for measuring the impact of these improvements. June 2008: 71 Submissions from 15 countries. August 2008: 10 Semi-finalists teams, access to: - 500,000 full text articles - Plus EMTREE, EmBase, Scopus - Created tool/demo - Presented to the Judges - Wrote a paper (accepted for JWeb Semantics) April 2009: Judges selected 4 Finalist teams. And the winners are:
  • 57. Scope: Tools and processes to: - Improve the process of creating, reviewing and editing scientific content 2 Related Work - Interpret, visualize or connect science knowledge Using link text for summarisation has been explo - Provide tools/ideas for measuring the impact of these previously by Amitay and Paris (2000). They ide improvements. fied situations when it was possible to generate su June 2008: 71 Submissions from 15 countries. maries of web-pages by recycling human-autho descriptions of links from anchor text. In our wo August 2008: 10 Semi-finalists teams, access to: we use the anchor text as the reading context to p - 500,000 full text articles vide an elaborative summary for the linked do - Plus EMTREE, EmBase, Scopus ment. Our work is similar in domain to that of the 2 - Created tool/demo CLEF WiQA shared task.4 However, in contras - Presented to the Judges our application scenario, the end goal of the sha - Wrote a paper (accepted for JWeb Semantics) task focuses on suggesting editing updates fo April 2009: Judges selected 4 Finalist teams. particular document and not on elaborating on user’s reading context. And the winners are: A related task was explored at the Document
  • 58. Scope: Tools and processes to: - Improve the process of creating, reviewing and editing scientific content 2 Related Work - Interpret, visualize or connect science knowledge Using link text for summarisation has been explo - Provide tools/ideas for measuring the impact of these previously by Amitay and Paris (2000). They ide improvements. fied situations when it was possible to generate su June 2008: 71 Submissions from 15 countries. maries of web-pages by recycling human-autho descriptions of links from anchor text. In our wo August 2008: 10 Semi-finalists teams, access to: we use the anchor text as the reading context to p - 500,000 full text articles vide an elaborative summary for the linked do - Plus EMTREE, EmBase, Scopus ment. Our work is similar in domain to that of the 2 - Created tool/demo CLEF WiQA shared task.4 However, in contras - Presented to the Judges our application scenario, the end goal of the sha - Wrote a paper (accepted for JWeb Semantics) task focuses on suggesting editing updates fo April 2009: Judges selected 4 Finalist teams. particular document and not on elaborating on user’s reading context. And the winners are: A related task was explored at the Document
  • 61. Harvard, Fall 2010
      FoRC: The Future of Research Communication
      Improve ways to:
      - create, review, edit scientific content
      - interpret, visualize, connect scientific knowledge
      - measure the impact of these improvements
      - present new paradigms in publishing
      - serve global academic knowledge communities
      - communicate between collaboratories
      - share research data
      - support complex knowledge ecosystems.
  • 62. Harvard, Fall 2010
      FoRC: The Future of Research Communication
      New ways of communicating science require new ways of communicating.
      Conference registrants form a community:
      - The website is their meeting place and continues for as long as users want it to
      - The physical conference is a synchronous manifestation of the community
      - Others can participate through a virtual environment at another time or place
  • 63. Acknowledgments
      HypER:
      - Harvard/MGH: Tim Clark, Elizabeth Wu
      - DERI: Siegfried Handschuh, Tudor Groza, Matthias Samwald, Paul Buitelaar
      - ISI: Gully Burns, Ed Hovy
      - Open University: Simon Buckingham Shum, Jack Park, Clara Mancini
      - Oxford University: Annamaria Carusi, David Shotton
      - Tilburg U: Piroska Lendvai
      - Funding: FP7-Cordis, NWO Casimir Project
      - UUtrecht: Leen Breure, Henk Pander Maat, Ted Sanders
      - Xerox Research: Agnes Sandor
      - Elsevier: David Marques, Darin McBeath, Stefano Bocconi, Adriaan Klinkenberg, Emilie Marcus
      - Challenge judges and participants: David Shotton, EMBL Team, CMU Team, Alfonso Valencia