Integration of GO, Pathway data and Interaction data
Chris Mungall, Peter D’Eustachio
The GO was originally intended to integrate databases
  • How are we doing?
  • "Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address"
  • Gene Ontology: Tool for the Unification of Biology. Nat Genet 2000
  • [Diagram labels: SGD, FB, GOA]
The GO was originally intended to integrate databases
  • How are we doing?
  • Not as well as we could!
  • [Diagram labels: GO, SGD, FB, GOA, Pathway Commons, IMEX, Reactome, Cyc…, BioGRID, Intact…]
Integration enhances analyses and reduces workload
  • Division of labor
      • leave specialized curation to specialized systems biology databases
      • but data needs to be re-combined to prevent siloing
  • GO is an invaluable single-stop shop for term enrichment etc.
  • Can we quantify how integrating with systems biology databases helps users?
  • Yes! We can do the experiment: GO term enrichment analysis on all MolSigDB
      • with Reactome annotations (also include Reactome inputs/outputs, not currently in GOA)
      • without Reactome annotations
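The with/without comparison can be pictured as the same enrichment test run twice, once on GOA annotations alone and once on GOA merged with Reactome-derived annotations. The slides do not say which statistic or tool was used; the one-sided hypergeometric test below, the function names, and the toy gene and term identifiers are all assumptions made for illustration.

```python
from math import comb  # requires Python 3.8+

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes when drawing
    n genes from a background of N genes, K of which carry the term."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def enrichment(gene_set, annotations, background):
    """Return {term: p-value} for every term hit by at least one gene in gene_set."""
    background = set(background)
    gene_set = set(gene_set) & background
    results = {}
    for term, genes in annotations.items():
        genes = set(genes) & background
        k = len(gene_set & genes)
        if k:
            results[term] = hypergeom_pvalue(k, len(gene_set), len(genes), len(background))
    return results

def merge(a, b):
    """Union two {term: genes} annotation sets without mutating either."""
    merged = {term: set(genes) for term, genes in a.items()}
    for term, genes in b.items():
        merged.setdefault(term, set()).update(genes)
    return merged

# Toy data (hypothetical identifiers, not taken from the talk).
background = [f"g{i}" for i in range(20)]
goa = {"GO:TERM_A": {"g0", "g1"}}        # annotations from GOA only
reactome = {"GO:TERM_A": {"g2", "g3"}}   # extra annotations inferred from a pathway resource
gene_set = {"g0", "g1", "g2", "g3"}      # e.g. one expression signature

p_without = enrichment(gene_set, goa, background)["GO:TERM_A"]
p_with = enrichment(gene_set, merge(goa, reactome), background)["GO:TERM_A"]
print(f"without: {p_without:.4g}  with: {p_with:.4g}")
```

In this toy case the merged annotations give a smaller p-value because two more genes in the set carry the term. A real comparison would also need multiple-testing correction across terms and gene sets.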
Integration enhances analyses
  • GOA+R: many p-values significantly improved
  • Recapitulated biologically valid results that would have been suppressed had one single resource been used
  • Examples:
      • Genes down-regulated in Alzheimer's
How are we currently integrating systems biology datasets?
  • Interaction data
      • Currently Intact, soon IMEX
      • “protein binding” and “self-protein binding” only (+with)
  • Pathway data
      • Currently Reactome only
      • Loses much of what is in Reactome
          • E.g. inputs and outputs
      • Manually curated GO<->Reactome links
          • incomplete
          • not always to the most specific term
          • labor-intensive
          • become stale over time
      • other pathway databases?
  • This can be improved!
Automating integration using cross-product definitions – pathway databases

    [Term]
    id: GO:0015871
    name: choline transport
    intersection_of: GO:0006810 ! transport
    intersection_of: results_in_transport_of CHEBI:15354 ! choline
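A minimal sketch of how such a cross-product definition could drive an automatic mapping: parse the stanza, index each term by its genus, relation, and filler, then look up a pathway event described the same way. The parser handles only the tags shown on the slide, the `event` dictionary is a hypothetical stand-in for a record extracted from a pathway database, and the lookup is an exact match. A production tool would presumably use an OWL reasoner so that subclasses of the filler also match.

```python
OBO_TEXT = """
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
"""

def parse_obo_terms(text):
    """Parse [Term] stanzas into dicts; only simple tag: value lines are handled."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.split("!")[0].strip()  # drop the trailing "! label" comment
        if line.startswith("["):
            # Start a new term, or ignore other stanza types such as [Typedef].
            current = {"intersection_of": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            value = value.strip()
            if tag == "intersection_of":
                current["intersection_of"].append(tuple(value.split()))
            else:
                current[tag] = value
    return terms

def index_logical_definitions(terms):
    """Map (genus, relation, filler) -> term id for one-genus, one-differentia terms."""
    index = {}
    for term in terms:
        genus = [p[0] for p in term["intersection_of"] if len(p) == 1]
        differentia = [p for p in term["intersection_of"] if len(p) == 2]
        if len(genus) == 1 and len(differentia) == 1:
            relation, filler = differentia[0]
            index[(genus[0], relation, filler)] = term["id"]
    return index

index = index_logical_definitions(parse_obo_terms(OBO_TEXT))

# Hypothetical pathway event: a transport reaction whose cargo is choline.
event = {"process": "GO:0006810",
         "relation": "results_in_transport_of",
         "participant": "CHEBI:15354"}
mapped = index.get((event["process"], event["relation"], event["participant"]))
print(mapped)
```

Keying the index on the full triple means adding more cross-product files extends the mapping without any change to the lookup code.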
Automating integration using cross-products – pathway databases
  • We can also automatically map:
      • catalysis terms [165*]
      • transport [373]
      • binding [133]
      • phosphorylation and other modifications
      • metabolism [278]
      • signaling…
  • All this relies on different cross-product files
  • Any pathway database that exports BioPax-OWL can be used
      • E.g. humancyc, mousecyc, pathwaycommons, …
  * Numbers for Reactome-human
Automating integration using cross-products – interaction databases
  • [Diagram labels: FIGF, VEGFR, binds, has_function, is_a]

    [Term]
    id: GO:0043184
    name: vascular endothelial growth factor receptor 2 binding
    intersection_of: GO:0005488 ! binding
    intersection_of: results_in_binding_of PRO:000002112 ! VEGFR 2
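The inference suggested by the diagram can be sketched as follows: if a protein binds a partner, and the partner is classified under a PRO class that appears in a binding cross-product, the protein receives the specific binding term rather than generic "protein binding". The two lookup tables are assumptions. The PRO-to-GO pair comes from the stanza above, while the `VEGFR` to `PRO:000002112` link and the plain-label protein identifiers are hypothetical placeholders for what a real interaction record and the PRO hierarchy would supply.

```python
# GO binding term for each PRO class, taken from the slide's cross-product stanza.
BINDING_TERM_FOR = {"PRO:000002112": "GO:0043184"}

# Assumed is_a classification of interaction participants (hypothetical labels).
PROTEIN_CLASS = {"VEGFR": "PRO:000002112"}

PROTEIN_BINDING = "GO:0005515"  # generic "protein binding" fallback

def infer_binding_annotations(interactions):
    """Turn binary interactions into (subject, GO term, with) annotation triples.

    Each pair is read in both directions; the partner's PRO class decides
    whether a specific binding term applies or the generic one is kept.
    """
    annotations = []
    for a, b in interactions:
        for subject, partner in ((a, b), (b, a)):
            pro_class = PROTEIN_CLASS.get(partner)
            term = BINDING_TERM_FOR.get(pro_class, PROTEIN_BINDING)
            annotations.append((subject, term, partner))
    return annotations

inferred = infer_binding_annotations([("FIGF", "VEGFR")])
for row in inferred:
    print(row)
```

The third element of each triple plays the role of the "with" field mentioned earlier in the deck, so the evidence for the inferred term stays traceable to the interaction partner.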
Automated Integration: Results
  • Reactome
      • Evaluation in progress
      • Many manually assigned equivalencies recapitulated
      • Inferred equivalencies differed in some cases
          • sometimes better than manually assigned
          • sometimes required info not in biopax export
          • ongoing discussions
  • BioGrid
      • not evaluated (all trivial)
      • inferred annotations improve some enrichment results
          • E.g. Brentani angiogenesis gene sets, increased enrichment for VEGFR binding
          • Obvious but useful as proof of concept
Conclusions and future work
  • We can be more efficient:
      • Coordinate with systems bio databases to divide labor
      • Prevent siloing through semi-automated integration
      • GO acts as a high-level ‘window’ on systems biology databases
  • Still to be done:
      • Make integration tool production-ready
      • Reconcile existing mis-alignments, particularly signaling
          • highly inconsistent between GO and Reactome
      • Explore open questions – e.g. auto-generate terms?
      • Finish cross-products, they are vital
          • in particular PRO, CHEBI


Editor's Notes

  • #3: The Gene Ontology was created in response to the need for interoperability among genomic databases in the wake of the sequencing of the first metazoan genomes. In the paper "Gene Ontology: tool for the unification of biology", published nearly ten years ago, Ashburner et al. state: "Progress in the way that biologists describe and conceptualize the shared biological elements has not kept pace with sequencing . . . Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address" [25]. The GO has since become the de facto terminological standard for functional annotation, and its success is evident in the popularity of GO-based class enrichment analyses. However, the intervening ten years have witnessed an explosion of interest in systems biology, with a concomitant increase in the number of databases providing information on interactions and pathways, including Reactome, Nature Signaling, PANTHER [26], BIND, BioGRID and HumanCyc (the EcoCyc metabolic pathway database preceded GO [27]). Each of these databases has its own data model and schema, creating an interoperability problem. This has been partly mitigated by the adoption of BioPAX as a standard exchange format, which allows multiple pathway databases to be aggregated in single “one-stop shopping” warehouses, such as the Pathway Knowledge Base [28], Pathway Commons, and WikiPathways. However, the data is still only partially integrated, and a researcher who wants a comprehensive view of a pathway must still examine multiple records, in addition to GO annotations.
  • #4: loss in pathway db transition
  • #5: their data models capture more.
  • #10: also via col16
  • #11: PRO does not yet have Ras etc