Using OWL Domain Models as
       Abstract Workflow Models

                   Or...
Conducting in silico research in the Web
    from hypothesis to publication

                  Mark Wilkinson
Isaac Peral Senior Researcher in Biological Informatics
Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
Adjunct Professor of Medical Genetics, University of British Columbia
Vancouver, BC, Canada.
Context
  “While it took 2,300 years after the first
  report of angina for the condition to be
  commonly taught in medical
  curricula, modern discoveries are
  being disseminated at an increasingly
  rapid pace. Focusing on the last 150
  years, the trend still appears to be
  linear, approaching the axis around
  2025.”


The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam et al.,
The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009

Slide adapted with permission from Joanne Luciano, Presentation
at Health Web Science Workshop 2012, Evanston IL, USA
June 22, 2012.
“The Singularity”




              The X-intercept is where, the moment a discovery is
                   made, it is immediately put into practice

     (not only medical practice, but any research endeavour...)

The technology required
     to achieve this
   does not yet exist
You
                                      Are
                                      Here




Scientific research would have to be conducted
              within a medium that
 immediately interpreted and disseminated
                  the results...
You
                                           Are
                                           Here




...in a form that immediately (actively!) affected the
                  research of others...
You
                                  Are
                                  Here




...without requiring them to be aware
       of these new discoveries.
To achieve this vision


 We must learn how to
do research IN the Web


 Not OVER the Web
How we use
the Web today
I’d like to show you how close
   we now are to this vision




   and how we got there
Web Science 2.0
We wanted to duplicate
a real, peer-reviewed, bioinformatics analysis


   simply by building a model in the Web
       describing what the answer
                  (if one existed)
               would look like
...the machine had to make
     every other decision
         on its own
This is the study we chose:
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
Original Study Simplified




Using what is known about interactions in fly & yeast


         predict new interactions with your
             human protein of interest
Abstracted

Given a protein P in Species X

   Find proteins similar to P in Species Y
   Retrieve interactors in Species Y
   Sequence-compare Y-interactors with Species X genome
        (1)  Keep only those with homologue in X


   Find proteins similar to P in Species Z
   Retrieve interactors in Species Z
   Sequence-compare Z-interactors with (1)



              Putative interactors in Species X
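
The abstracted steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protein names, the SIMILAR and INTERACTS tables, and both function names are hypothetical toy data standing in for the sequence-similarity searches and interaction databases used in the real study.

```python
# Toy sketch of the abstracted protocol. All names and data are hypothetical.

# SIMILAR[(protein, species)] -> proteins in that species judged similar to `protein`
SIMILAR = {
    ("P", "Y"): {"Py"},
    ("P", "Z"): {"Pz"},
    # homologues, back in Species X, of the model-organism interactors
    ("Y1", "X"): {"X1"},
    ("Y2", "X"): {"X2"},
    ("Z1", "X"): {"X1"},
    ("Z3", "X"): {"X3"},
}

# INTERACTS[protein] -> known interactors within the same species
INTERACTS = {
    "Py": {"Y1", "Y2"},
    "Pz": {"Z1", "Z3"},
}


def homologues_of_interactors(protein, target_species, model_species):
    """Find similar proteins in the model species, retrieve their interactors,
    and keep only interactors that have a homologue in the target species."""
    found = set()
    for similar in SIMILAR.get((protein, model_species), set()):
        for interactor in INTERACTS.get(similar, set()):
            found |= SIMILAR.get((interactor, target_species), set())
    return found


def putative_interactors(protein, target_species, model_1, model_2):
    """Keep only target-species proteins supported by both model organisms."""
    set_1 = homologues_of_interactors(protein, target_species, model_1)  # set (1)
    set_2 = homologues_of_interactors(protein, target_species, model_2)
    # In this toy version, comparing the second set "with (1)" is an intersection.
    return set_1 & set_2


result = putative_interactors("P", "X", "Y", "Z")
print(sorted(result))
```

In the toy data only X1 is supported by both Species Y and Species Z, so it is the single putative interactor that survives.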
Modeling the answer...




                                           OWL




               Web Ontology Language (OWL) is the
                   language approved by the W3C
               for representing knowledge in the Web
Modeling the answer...


                   Note that every word in
                   this diagram is, in reality, a
                   URL (because it is OWL)
Modeling the answer...


                   The model of a Potential
                   Interactor is published in
                   The Web

                   It utilizes concepts from
                   other models published in
                   The Web
                   (ours and others’)
                   by referencing their URLs
Modeling the answer...


                   The model of a Potential
                   Interactor is a network of
                   concepts distributed
                   within the Web

                   It will be affected by
                   changes to those concepts

                   We do not “own” all of
                   those concepts!
Modeling the answer...


               ProbableInteractor
                   is homologous to (
                       (Potential Interactor from ModelOrganism1…)
                       and
                       (Potential Interactor from ModelOrganism2…)
                   )




Probable Interactor is defined in OWL as a subclass of Potential Interactor
   that requires homologous pairs of interacting proteins to exist in both
                       comparator model organisms.

                        (Effectively, an intersection)
Publish our OWL model of a Probable Interactor


                  in the Web
Running a Web Science 2.0
          Experiment

                     In a local data-file

         provide the protein we are interested in

  and the two species we wish to use in our comparison




taxon:9606       a      i:OrganismOfInterest . # human
uniprot:Q9UK53   a      i:ProteinOfInterest . # ING1
taxon:4932       a      i:ModelOrganism1 . # yeast
taxon:7227       a      i:ModelOrganism2 . # fly
The tricky bit is...

In the abstract, the search for homology is “generic” – ANY model organism.

But when the machine attempts to do the experiment, it will have to use
several different and specific resources, because our question
specifies two different species:

taxon:4932   a   i:ModelOrganism1 . # yeast
taxon:7227   a   i:ModelOrganism2 . # fly
This is the question we ask:
                  (the query language here is SPARQL)




PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#>

SELECT ?protein
FROM   <file:/local/workflow.input.n3>
WHERE {

              ?protein        a       i:ProbableInteractor .


        }



                       The reference (URL) to our OWL model of the answer
Our system then derives (and executes) the following workflow automatically




                                                  These are different
                                                  Web services!

                                                  ...selected at run-time
                                                  based on the same model
Web Science - ISoLA 2012
There are three very cool things about what you just saw...



              The system was able to
            create a workflow based on
             an OWL model (ontology)
There are three very cool things about what you just saw...



          The system was able to create a
            COMPUTATIONAL workflow
          based on a BIOLOGICAL model
There are three very cool things about what you just saw...



                        The workflow it created
                       (i.e. the services chosen)
                    differed depending on context




taxon:4932   a   i:ModelOrganism1 . # yeast


taxon:7227   a   i:ModelOrganism2 . # fly
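
One way to picture this context-dependence is a registry in which several concrete services advertise the same abstract capability and the input data decides which one is used. The sketch below assumes a hypothetical registry layout, hypothetical service names, and a hypothetical select_service function; it is not the SADI registry API.

```python
# Hypothetical registry: one abstract property, several concrete services.
SERVICES = [
    {"name": "yeast_interaction_lookup", "provides": "interactsWith", "taxon": "taxon:4932"},
    {"name": "fly_interaction_lookup", "provides": "interactsWith", "taxon": "taxon:7227"},
]

# Roles taken from the local input file shown in the slides.
INPUT_ROLES = {
    "ModelOrganism1": "taxon:4932",  # yeast
    "ModelOrganism2": "taxon:7227",  # fly
}


def select_service(prop, taxon):
    """Return the name of the first service that provides `prop` for `taxon`."""
    for service in SERVICES:
        if service["provides"] == prop and service["taxon"] == taxon:
            return service["name"]
    raise LookupError(f"no service provides {prop} for {taxon}")


# The same abstract need ("interactsWith") resolves to different services
# depending on which species fills each role.
plan = {role: select_service("interactsWith", taxon) for role, taxon in INPUT_ROLES.items()}
print(plan)
```

Swapping a different taxon into the input file would change the plan without any change to the model that requested "interactsWith".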
We got the answer

“simply” by designing a model of the answer!
How did we do that?
Design Pattern for
Web Services on the Semantic Web
A Web application that answers
    SPARQL-DL queries

      Query-answering
     Enhanced by SADI
Demos of SADI and SHARE
What is the phenotype of every allele of the
          Antirrhinum majus DEFICIENS gene




SELECT ?allele    ?image     ?desc

WHERE {
      locus:DEF            genetics:hasVariant      ?allele .
        ?allele            info:visualizedByImage   ?image .
         ?image            info:hasDescription      ?desc
}


                       Note that there is no “FROM” clause!
                       We don’t tell it where it should get the information;
                       the machine has to figure that out by itself...
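
A rough way to see how a query with no FROM clause can still be answered: treat each predicate in the query as a request for some service that can generate that property, and resolve the triple patterns one at a time. The sketch below is a simplification under stated assumptions; the registry, the allele and image identifiers, and the descriptions are hypothetical toy data, and the real SHARE engine is not shown in these slides.

```python
# Hypothetical toy data standing in for remote services.
VARIANTS = {"locus:DEF": ["allele:a1", "allele:a2"]}
IMAGES = {"allele:a1": ["image:i1"], "allele:a2": ["image:i2"]}
DESCRIPTIONS = {"image:i1": ["toy description 1"], "image:i2": ["toy description 2"]}

# Hypothetical registry: predicate -> callable that maps a subject to its objects.
REGISTRY = {
    "genetics:hasVariant": lambda s: VARIANTS.get(s, []),
    "info:visualizedByImage": lambda s: IMAGES.get(s, []),
    "info:hasDescription": lambda s: DESCRIPTIONS.get(s, []),
}

# The three triple patterns from the query in the slides.
PATTERNS = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]


def answer(patterns):
    """Resolve patterns in order, extending each partial solution with new bindings.
    Assumes every subject is a constant or a variable bound by an earlier pattern."""
    solutions = [{}]
    for subject, predicate, variable in patterns:
        service = REGISTRY[predicate]  # chosen by predicate, not by a FROM clause
        extended = []
        for binding in solutions:
            subject_value = binding.get(subject, subject)
            for value in service(subject_value):
                extended.append({**binding, variable: value})
        solutions = extended
    return solutions


rows = answer(PATTERNS)
for row in rows:
    print(row["?allele"], row["?image"], row["?desc"])
```

The query itself never names a data source; the predicates alone determine which registry entries are consulted.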
Enter that query into
      SHARE
Click “Submit”...
SHARE examines available SADI Web Services
 ...and in a few seconds you get your answer.
The query results are live hyperlinks
to the respective Database or images
        (the answer is IN the Web!)
What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
        uniprot:P47989   pred:isEncodedBy    ?gene .
        ?gene            ont:isParticipantIn ?pathway .
}



               Note again that there is no “FROM” clause…

              I have not told SHARE where to look for the
                 answer; I am simply asking my question
Enter that query into
      SHARE
Two different providers of gene information (KEGG & NCBI) were found & accessed

Two different providers of pathway information (KEGG and GO) were found & accessed
The results are all links to the original data
                                   (The answer is IN the Web!)
Show me the latest Blood Urea Nitrogen and Creatinine levels
    of patients who appear to be rejecting their transplants
       (I showed you this query at ISoLA 2010… sorry for repeating myself!)




PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
        ?patient rdf:type           patient:LikelyRejecter .
        ?patient l:latestBUN        ?bun .
        ?patient l:latestCreatinine ?creat .
}
Likely Rejecter:

A patient who has creatinine levels
   that are increasing over time


                 - - Mark D Wilkinson’s definition
Likely Rejecter:

  …but there is no “likely rejecter”
 column or table in our database…
only blood chemistry measurements
        at various time-points
Likely Rejecter:

So the data required to answer this question
             DOESN’T EXIST!
?
Enter that query into
      SHARE
SHARE “decomposes” the
        Likely Rejecter OWL class
into its constituent property restrictions
Each property restriction in the Class
is matched with a SADI Service

The matched SADI Service can
generate data that has that property
SHARE chains these SADI services
into a workflow...

...the outputs from that workflow are
Instances (OWL Individuals)
of the Likely Rejecter OWL Class
For example… SHARE utilizes SADI to discover
analytical services on the Web that do linear regression analysis;

required for the “increasing over time” part of the Class definition
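
The “increasing over time” test can be illustrated with an ordinary least-squares slope over each patient's creatinine measurements. This is a minimal sketch under stated assumptions: the patient identifiers, the measurement values, and the rule "slope greater than zero" are hypothetical, and the regression service that SHARE actually discovers is not shown in the slides.

```python
def slope(points):
    """Ordinary least-squares slope of y against x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical data: (time-point, creatinine level) per patient.
MEASUREMENTS = {
    "patient:1": [(0, 1.0), (1, 1.5), (2, 2.0)],  # rising
    "patient:2": [(0, 1.2), (1, 1.1), (2, 1.0)],  # falling
}


def likely_rejecters(measurements):
    """Classify as likely rejecters those patients whose fitted slope is positive.
    The zero threshold is an assumption made for this sketch."""
    return sorted(p for p, series in measurements.items() if slope(series) > 0)


rejecters = likely_rejecters(MEASUREMENTS)
print(rejecters)
```

No "likely rejecter" value is stored anywhere in the toy data; membership in the class is computed from the raw time series, which is the point the slides make about the database.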
VOILA!
SHARE examines the OWL Class

  Gathers, from the Web, the ontologies that are
             referenced by that Class

 then uses those ontological properties to identify
  which data-sources and analytical tools it must
access to create data matching that Class definition
OWL
The way SHARE builds the workflow varies
        depending on the context of the query
(i.e. which data/ontologies it reads – Mine? Yours?)


              and on what part of the query
     it is trying to answer at any given moment
(which ontological concept is relevant to that clause)
And that brings us back to...
Web Science 2.0
Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies
data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
derives and executes the following workflow automatically
    using an OWL ontology that describes the biology
The analytical tools chosen for that
 workflow were determined based on

               context

even though the biological (ontological)
 model driving their selection was the
                 same
i.e.

      The published model is re-usable

In different contexts... by different researchers
Because the model IS the experiment


   the published EXPERIMENT is re-usable!!




Simply point the same query at your own dataset...
The

scientific publication

        is an

executable document!
Every component of the model

Every component of the input data

Every component of the output data

            is a URL


Therefore the model, the question,
 the experiment, and the results

    are inherently IN the Web
  The answer, and the knowledge derived from it,
  is immediately available to Web search engines
and moreover, can instantly affect the outcome of
        other Web Science experiments
You
Are Now
 Here!!!
Change the way we think of “hypotheses”
In Web Science 2.0


Model what the world would “look like”
    if your hypothesis were true


   Then ask “is there any data that
          fits that model?”
Please join us!

SADI and SHARE are Open-Source projects

       http://sadiframework.org
My New Home!
University of British Columbia


Luke McCarthy – Lead Dev.                  Edward Kawas
Everything...                              SADI Service auto-generator



Benjamin VanderValk                        Ian Wood
SHARE & SADI & Experimental modeling &     Experimental modeling project
myHealth Button




Soroush Samadian
Cardiovascular data modeling and queries
C-BRASS Collaborators at other sites

U of New Brunswick      Carleton University

Dr. Chris Baker         Dr. Michel Dumontier
Alexandre Riazanov      Marc-Alexandre Nolin
                        Leonid Chepelev
                        Steve Etlinger
                        Nichaella Kieth
                        Jose Cruz
Microsoft Research

Web Science - ISoLA 2012

  • 1. Using OWL Domain Models as Abstract Workflow Models Or... Conducting in silico research in the Web from hypothesis to publication Mark Wilkinson Isaac Peral Senior Researcher in Biological Informatics Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain Adjunct Professor of Medical Genetics, University of British Columbia Vancouver, BC, Canada.
  • 2. Context “While it took 2,300 years after the first report of angina for the condition to be commonly taught in medical curricula, modern discoveries are being disseminated at an increasingly rapid pace. Focusing on the last 150 years, the trend still appears to be linear, approaching the axis around 2025.” The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data- Intensive Scientific Discovery Tony Hey (Editor), 2009 Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA June 22, 2012.
  • 3. “The Singularity” The X-intercept is where, the moment a discovery is made, it is immediately put into practice (not only medical practice, but any research endeavour...) The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery Tony Hey (Editor), 2009 Slide Borrowed with Permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA June 22, 2012.
  • 4. The technology required to achieve this does not yet exist
  • 5. You Are Here Scientific research would have to be conducted within a medium that immediately interpreted and disseminated the results...
  • 6. You Are Here ...in a form that immediately (actively!) affected the research of others...
  • 7. You Are Here ...without requiring them to be aware of these new discoveries.
  • 8. To achieve this vision We must learn how to do research IN the Web Not OVER the Web
  • 9. How we use the Web today
  • 10. To achieve this vision We must learn how to do research IN the Web Not OVER the Web
  • 11. I’d like to show you how close we now are to this vision and how we got there
  • 13. We wanted to duplicate a real, peer-reviewed, bioinformatics analysis simply by building a model in the Web describing what the answer (if one existed) would look like
  • 14. ...the machine had to make every other decision on it’s own
  • 15. This is the study we chose:
  • 16. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
  • 17. Original Study Simplified Using what is known about interactions in fly & yeast predict new interactions with your human protein of interest
  • 18. Abstracted Given a protein P in Species X Find proteins similar to P in Species Y Retrieve interactors in Species Y Sequence-compare Y-interactors with Species X genome (1)  Keep only those with homologue in X Find proteins similar to P in Species Z Retrieve interactors in Species Z Sequence-compare Z-interactors with (1)  Putative interactors in Species X
  • 19. Modeling the answer... OWL Web Ontology Language (OWL) is the language approved by the W3C for representing knowledge in the Web
  • 20. Modeling the answer... Note that every word in this diagram is, in reality, a URL (because it is OWL)
  • 21. Modeling the answer... The model of a Potential Interactor is published in The Web It utilizes concepts from other models published in The Web (ours and other’s) by referencing their URLs
  • 22. Modeling the answer... The model of a Potential Interactor is a network of concepts distributed within the Web It will be affected by changes to those concepts We do not “own” all of those concepts!
  • 23. Modeling the answer... ProbableInteractor is homologous to ( Potential Interactor from ModelOrganism1…) and Potential Interactor from ModelOrganism2…) Probable Interactor is defined in OWL as a subclass of Potential Interactor that requires homologous pairs of interacting proteins to exist in both comparator model organisms. (Effectively, an intersection)
  • 24. Publish our OWL model of a Probable Interactor in the Web
  • 25. Running a Web Science 2.0 Experiment In a local data-file provide the protein we are interested in and the two species we wish to use in our comparison taxon:9606 a i:OrganismOfInterest . # human uniprot:Q9UK53 a i:ProteinOfInterest . # ING1 taxon:4932 a i:ModelOrganism1 . # yeast taxon:7227 a i:ModelOrganism2 . # fly
  • 26. The tricky bit is... In the abstract, the search for homology is “generic” – ANY model organism. But when the machine attempts to do the experiment, it will have to use several different and specific resources because our question specifies two different taxon:4932 a i:ModelOrganism1 . # yeast species taxon:7227 a i:ModelOrganism2 . # fly
  • 27. This is the question we ask: (the query language here is SPARQL) PREFIX i: <http://sadiframework.org/ontologies/InteractingProteins.owl#> SELECT ?protein FROM <file:/local/workflow.input.n3> WHERE { ?protein a i:ProbableInteractor . } The reference (URL) to our OWL model of the answer
  • 28. Our system then derives (and executes) the following workflow automatically These are different Web services! ...selected at run-time based on the same model
  • 30. There are three very cool things about what you just saw...
  • 31. There are three very cool things about what you just saw... The system was able to create a workflow based on an OWL model (ontology)
  • 32. There are three very cool things about what you just saw... The system was able to create a COMPUTATIONAL workflow based on a BIOLOGICAL model
  • 33. There are three very cool things about what you just saw... The workflow it created (i.e. the services chosen) differed depending on context taxon:4932 a i:ModelOrganism1 . # yeast taxon:7227 a i:ModelOrganism2 . # fly
  • 34. We got the answer “simply” by designing a model of the answer!
  • 35. How did we do that?
  • 36. Design Pattern for Web Services on the Semantic Web
  • 37. A Web application that answers SPARQL-DL queries Query-answering Enhanced by SADI
  • 38. Demos of SADI and SHARE
  • 39. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene SELECT ?allele ?image ?desc WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc }
  • 40. What is the phenotype of every allele of the Antirrhinum majus DEFICIENS gene SELECT ?allele ?image ?desc WHERE { locus:DEF genetics:hasVariant ?allele . ?allele info:visualizedByImage ?image . ?image info:hasDescription ?desc } Note that there is no “FROM” clause! We don’t tell it where it should get the information, The machine has to figure that out by itself...
  • 41. Enter that query into SHARE
  • 43. SHARE examines available SADI Web Services ...and in a few seconds you get your answer.
  • 44. The query results are live hyperlinks to the respective Database or images (the answer is IN the Web!)
  • 45. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  • 47. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . } Note again that there is no “FROM” clause… I have not told SHARE where to look for the answer; I am simply asking my question.
  • 48. Enter that query into SHARE
  • 51. Two different providers of gene information (KEGG and NCBI) were found & accessed. Two different providers of pathway information (KEGG & GO) were found & accessed.
  • 52. The results are all links to the original data (The answer is IN the Web!)
  • 53. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants (I showed you this query at ISoLA 2010… sorry for repeating myself!) PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 55. Likely Rejecter: A patient who has creatinine levels that are increasing over time (Mark D. Wilkinson’s definition)
  • 56. Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points
  • 57. Likely Rejecter: So the data required to answer this question DOESN’T EXIST!
  • 58. ?
  • 59. Enter that query into SHARE
  • 60. SHARE “decomposes” the Likely Rejecter OWL class into its constituent property restrictions
  • 61. Each property restriction in the Class is matched with a SADI Service. The matched SADI Service can generate data that has that property.
  • 62. SHARE chains these SADI services into a workflow... ...the outputs from that workflow are Instances (OWL Individuals) of the Likely Rejecter OWL Class
  • 63. For example… SHARE utilizes SADI to discover analytical services on the Web that do linear regression analysis, which is required for the “increasing over time” part of the Class definition
  • 65. SHARE examines the OWL Class, gathers from the Web the ontologies that are referenced by that Class, then uses those ontological properties to identify which data-sources and analytical tools it must access to create data matching that Class definition
  • 66. OWL
  • 67. The way SHARE builds the workflow varies depending on the context of the query (i.e. which data/ontologies it reads – Mine? Yours?) and on what part of the query it is trying to answer at any given moment (which ontological concept is relevant to that clause)
  • 68. And that brings us back to...
  • 70. Gordon, P.M.K., Soliman, M.A., Bose, P., Trinh, Q., Sensen, C.W., Riabowol, K.: Interspecies data mining to predict novel ING-protein interactions in human. BMC genomics. 9, 426 (2008).
  • 71. derives and executes the following workflow automatically using an OWL ontology that describes the biology
  • 72. The analytical tools chosen for that workflow were determined based on context even though the biological (ontological) model driving their selection was the same
  • 73. i.e. The published model is re-usable
  • 74. i.e. The published model is re-usable in different contexts... by different researchers
  • 75. Because the model IS the experiment, the published EXPERIMENT is re-usable!! Simply point the same query at your own dataset...
  • 76. The scientific publication is an executable document!
  • 77. Every component of the model, every component of the input data, and every component of the output data is a URL. Therefore the model, the question, the experiment, and the results are inherently IN the Web.
  • 78. Every component of the model, every component of the input data, and every component of the output data is a URL. The answer, and the knowledge derived from it, is immediately available to Web search engines and, moreover, can instantly affect the outcome of other Web Science experiments.
  • 81. Change the way we think of “hypotheses”
  • 82. In Web Science 2.0: Model what the world would “look like” if your hypothesis were true. Then ask “is there any data that fits that model?”
  • 83. Please join us! SADI and SHARE are Open-Source projects: http://sadiframework.org
  • 85. University of British Columbia Luke McCarthy – Lead Dev. Edward Kawas Everything... SADI Service auto-generator Benjamin VanderValk Ian Wood SHARE & SADI & Experimental modeling & Experimental modeling project myHeath Button Soroush Samadian Cardiovascular data modeling and queries
  • 86. C-BRASS Collaborators at other sites U of New Brunswick Carleton University Dr. Chris Baker Dr. Michel Dumontier Alexandre Riazanov Marc-Alexandre Nolin Leonid Chepelev Steve Etlinger Nichaella Kieth Jose Cruz
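
Slides 60 to 63 describe how SHARE decomposes the Likely Rejecter class into property restrictions, matches each restriction to a service that can generate that property, and chains those services into a workflow. The toy Python sketch below illustrates that idea only; it is not SHARE or SADI code. The property names (patientId, creatinineSeries, creatinineSlope), the in-memory DB, the two-entry service registry, and the reading of "increasing over time" as "least-squares slope > 0" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical blood chemistry data: patient -> [(time point, creatinine)].
DB = {
    "p1": [(0, 1.0), (1, 1.4), (2, 1.9)],   # rising
    "p2": [(0, 1.2), (1, 1.1), (2, 0.9)],   # falling
}

def slope(points):
    """Least-squares slope of (time, value) pairs."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den

@dataclass(frozen=True)
class Service:
    """Stand-in for a service: consumes some properties, attaches one new one."""
    name: str
    requires: frozenset
    produces: str
    run: Callable[[dict], object]

# Hypothetical registry; a real system would discover services on the Web.
REGISTRY = [
    Service("creatinine_lookup", frozenset({"patientId"}),
            "creatinineSeries", lambda d: DB[d["patientId"]]),
    Service("linear_regression", frozenset({"creatinineSeries"}),
            "creatinineSlope", lambda d: slope(d["creatinineSeries"])),
]

# The class "decomposed" into property restrictions: (property, test on its value).
LIKELY_REJECTER = [("creatinineSlope", lambda s: s > 0)]

def plan(needed: str, have: set, registry: List[Service]) -> List[Service]:
    """Backward-chain from a needed property to an ordered list of services."""
    if needed in have:
        return []
    for svc in registry:
        if svc.produces == needed:
            steps, known = [], set(have)
            for req in sorted(svc.requires):
                sub = plan(req, known, registry)
                steps += sub
                known |= {s.produces for s in sub}
            return steps + [svc]
    raise LookupError(f"no service produces {needed!r}")

def likely_rejecters(patient_ids):
    """Run the derived workflow per patient and keep those matching the class."""
    matches = []
    for pid in patient_ids:
        data = {"patientId": pid}
        for prop, _ in LIKELY_REJECTER:
            for svc in plan(prop, set(data), REGISTRY):
                data[svc.produces] = svc.run(data)
        if all(test(data[prop]) for prop, test in LIKELY_REJECTER):
            matches.append(pid)
    return matches
```

Calling plan("creatinineSlope", {"patientId"}, REGISTRY) in this sketch yields the lookup service followed by the regression service, which mirrors the point of slide 57: no "likely rejecter" value is stored anywhere, so membership in the class has to be computed from the raw measurements.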

Editor's Notes

  • #3: In 1499, when Portuguese explorer Vasco da Gama returned home after completing the first-ever sea voyage from Europe to India, he had less than half of his original crew with him: scurvy had claimed the lives of 100 of the 160 men. Throughout the Age of Discovery, scurvy was the leading cause of death among sailors. Ship captains typically planned for the death of as many as half of their crew during long voyages. A dietary cause for scurvy was suspected, but no one had proved it. More than a century later, on a voyage from England to India in 1601, Captain James Lancaster placed the crew of one of his four ships on a regimen of three teaspoons of lemon juice a day. By the halfway point of the trip, almost 40% of the men (110 of 278) on three of the ships had died, while on the lemon-supplied ship, every man survived [1]. The British navy responded to this discovery by repeating the experiment, 146 years later. In 1747, a British navy physician named James Lind treated sailors suffering from scurvy using six randomized approaches and demonstrated that citrus reversed the symptoms. The British navy responded, 48 years later, by enacting new dietary guidelines requiring citrus, which virtually eradicated scurvy from the British fleet overnight. The British Board of Trade adopted similar dietary practices for the merchant fleet in 1865, an additional 70 years later. The total time from Lancaster’s definitive demonstration of how to prevent scurvy to adoption across the British Empire was 264 years [2]. The translation of medical discovery to practice has thankfully improved substantially. But a 2003 report from the Institute of Medicine found that the lag between significant discovery and adoption into routine patient care still averages 17 years [3, 4]. This delayed translation of knowledge to clinical care has negative effects on both the cost and the quality of patient care. 
A nationwide review of 439 quality indicators found that only half of adults receive the care recommended by U.S. national standards [5].