Introduction to the Gene Ontology
Nic Weber
LIS 590: Ontology Development in Natural Sciences
9/24/2010
All works are referenced at first use; all images are CC-licensed except where noted.

Gene Ontology: Why
“The main opportunity lies in the possibility of automated transfer of biological annotations from the experimentally tractable model organisms to the less tractable organisms based on gene and protein sequence similarity.” (Ashburner et al., p. 25)
* Breakthroughs in sequencing show that a large fraction of the genes specifying core biological functions are shared by all eukaryotes (commonalities at the cellular level)
* Knowledge of the role of a shared protein in one organism can often be transferred to other organisms (less duplication of work, money saved)
* Sequencing takes place at large scale and new discoveries are constant (need to document change in a controlled way)
* Traditional indexing efforts proved “unwieldy” in fruit fly and mouse sequencing
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1), 25-29. doi: 10.1038/75556.

Gene Ontology: Goals
* Produce a dynamic, controlled vocabulary that can be applied to eukaryotes.
* Provide a formal structure to document and adopt change.
* Facilitate the annotation of genes and gene products, and the dissemination of those annotations.
Because of problems with hierarchical models (EC), indexing, and biological terminology like “functions”, three ontologies were developed:
1. Biological Process
2. Molecular Function
3. Cellular Component

Biological Process
The biological objective to which the gene or gene product contributes. A process is accomplished via one or more ordered assemblies of molecular functions.
* (This is an ordered process in that something goes in and something different comes out.)

Molecular Function
The biochemical activity (including binding) of a gene product. Also applies to a capability that a gene product carries as a potential. Describes only what is done, not when or where.

Cellular Component
The place in the cell where a gene product is active. These terms reflect our understanding of eukaryotic cell structure (e.g. ‘ribosome’ or ‘nuclear membrane’).

Dependent vs. Independent Entities
1. Biological Process: Dependent (“occurrents that require support from some substance in order to allow them to occur.” Smith et al. p. 4)
2. Molecular Function: Dependent (“which means entities which have a necessary reference to the substances in which they inhere.” ibid.)
3. Cellular Component: Independent

GO “Terms”
Each “ontology” defines terms representing gene product properties. Each GO term within the ontology contains the following:
1. A unique alphanumeric identifier
2. A term name (which may be a word or a string of words)
3. A definition with cited sources
4. A namespace indicating the domain to which it belongs
Terms may also have:
* Synonyms, which are classed as being exactly equivalent to the term name, broader, narrower, or related
* References to equivalent concepts in other databases
* Comments on term meaning or usage

Example GO Term
[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = diphosphate + all-trans-heptaprenyl diphosphate." [EC:2.5.1.30]
subset: gosubset_prok
synonym: "all-trans-heptaprenyl-diphosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "all-trans-hexaprenyl-diphosphate:isopentenyl-diphosphate hexaprenyltranstransferase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl diphosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl pyrophosphate synthase activity" EXACT [EC:2.5.1.30]
synonym: "heptaprenyl pyrophosphate synthetase activity" EXACT [EC:2.5.1.30]
xref: EC:2.5.1.30
xref: MetaCyc:TRANS-HEXAPRENYLTRANSTRANSFERASE-RXN
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups

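To make the tag-value layout of the stanza concrete, here is a minimal Python sketch that reads a shortened copy of it into a dictionary. The function name `parse_obo_stanza` is hypothetical, and the parsing is deliberately simplified (it assumes one `tag: value` pair per line and treats ` ! ` as the start of a trailing comment); it is not the parser used by any GO tool.

```python
def parse_obo_stanza(text):
    """Parse one [Term] stanza into a dict mapping each tag to a list of values."""
    term = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")
        # Simplification: drop the trailing "! human-readable name" comment
        value = value.split(" ! ")[0].strip()
        term.setdefault(tag.strip(), []).append(value)
    return term


# Shortened copy of the stanza shown on the slide
STANZA = """
[Term]
id: GO:0000010
name: trans-hexaprenyltranstransferase activity
namespace: molecular_function
xref: EC:2.5.1.30
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups
"""

term = parse_obo_stanza(STANZA)
print(term["id"], term["is_a"])
```

Values are kept in lists because tags such as synonym and xref can repeat within one stanza.
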
How Do GO Terms Work?
GO terms are connected as nodes of a network, so the connections between each term and its parents and children are known. These networks are technically described as directed acyclic graphs (DAGs).
In a GO DAG, terms are nodes and the relationships among them are edges.

What the F*@% is a Directed Acyclic Graph?
Directed graph: a set V whose elements are called nodes or vertices, and a set E of connecting arcs or edges, so that G = (V, E).
Directed acyclic graph: a directed graph with no directed cycles.
* Formed by a collection of vertices and directed edges
* Each edge connects one vertex to another, so that there is no way to start at some vertex A and follow a sequence of edges that eventually loops back to A again
* Important note: DAGs are distinct from hierarchies, in that each term in a DAG may have more than one parent term; these terms are generally connected by ‘is-a’ and ‘part-of’ relations
Images via: commons.wikimedia.org

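The definition above can be checked mechanically. The following Python sketch tests whether a set of child-to-parent edges is acyclic using a depth-first search; the single-letter term names and the function name `is_acyclic` are made up for illustration and do not come from GO.

```python
def is_acyclic(edges):
    """Return True if the directed graph given as (child, parent) edges has no directed cycle."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, []).append(parent)
        graph.setdefault(parent, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return False  # reached a node already on the current path: a cycle
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    return all(visit(node) for node in graph if colour[node] == WHITE)


# Toy graph: D has two parents (B and C), which a strict hierarchy would not allow
dag_edges = [("D", "B"), ("D", "C"), ("B", "A"), ("C", "A")]
print(is_acyclic(dag_edges))                 # True
print(is_acyclic(dag_edges + [("A", "D")]))  # False: A -> D -> B -> A loops back
```
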
GO Directed Acyclic Graph
Image via: commons.wikimedia.org

“Relationships”
Each term has a defined “relationship” to another term in the same ontology or in a related ontology within GO.
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups

Relationship types
is_a … part_of
Originally there were only two relationship types:
* is_a = subsumption
* part_of = partonomic inclusion
New types: in the last year, regulates, positively_regulates, and negatively_regulates have been added to distinguish gene products that play a regulatory role from those that play a direct role in a biological process.

Problems… is_a
* Meant to facilitate “instance of”
* In practice, often used to model “is a kind of” relationships between universals
* The is_a relation in its intended meaning indicates a necessary relationship. That is, when we say “eukaryotic cell is_a cell”, we mean that every eukaryotic cell is a cell.
* In practice, there are cases of non-necessary subsumption (e.g. transport or cell growth)

Problems… part_of
* Explained usage = “can be a part of, not is always a part of”
* In GO, part_of is used transitively (i.e. where A part_of B and B part_of C, then also A part_of C)
* Can’t meaningfully represent an occurrent, meaning that the notion of time is not accurately represented in these relations

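Transitivity can be illustrated with a small Python sketch that computes every part_of pair implied by a set of stated pairs. The function name `transitive_closure` and the three strings are illustrative placeholders, not an excerpt from the GO files.

```python
def transitive_closure(pairs):
    """Return all (part, whole) pairs implied by transitivity of the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a part_of b and b part_of d implies a part_of d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


stated = {("nucleolus", "nucleus"), ("nucleus", "cell")}
implied = transitive_closure(stated)
print(("nucleolus", "cell") in implied)  # True
```

The quadratic loop is fine for a toy example; a graph traversal would be the more usual choice at the scale of a full ontology.
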
Part – Whole … has_part
Also introduced: has_part
“…In GO, the relationship A has_part B means that A necessarily (always) has B as a part; i.e., if A exists then B also exists as a part of A. If A does not exist, B may or may not exist. Example ‘cell envelope’ has_part ‘plasma membrane’”
From: The Gene Ontology Consortium (2010). The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Research, 38(Database issue), D331-D335. doi: 10.1093/nar/gkp1018.

has_part modeled
Annotations (applied terms)
* Annotations capture data about a gene or gene product; GO provides the terms to do so.
* These annotations allow genomic information to be uploaded and shared.
* When a gene is annotated to a term, associations between the gene and that term’s parents are implicitly inferred.
* Annotations are either generated by a curator or produced automatically through predictive methods (Rhee et al. p. 509)

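The implicit inference to parent terms can be sketched in Python as a walk up the graph. In this sketch the gene name `geneX` and the term `EXAMPLE:0000001` are hypothetical placeholders; only the GO:0000010 is_a GO:0016765 edge is taken from the example term shown earlier.

```python
def ancestors(term, parents):
    """Collect every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all implied ancestor terms."""
    implied = {}
    for gene, terms in annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        implied[gene] = full
    return implied


PARENTS = {
    "GO:0000010": ["GO:0016765"],       # edge from the example stanza
    "GO:0016765": ["EXAMPLE:0000001"],  # hypothetical higher-level term
}
direct = {"geneX": {"GO:0000010"}}      # hypothetical gene
print(sorted(propagate(direct, PARENTS)["geneX"]))
```

Because a term in a DAG may have several parents, the `seen` set keeps shared ancestors from being visited more than once.
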
Annotation Structure
Each annotation pairs a gene product identifier with the relevant GO term.
GO annotations also carry the following data:
* Reference for the annotation (e.g. a journal article)
* Evidence code denoting the type of evidence upon which the annotation is based
* Date of annotation
* Creator of annotation

Evidence Codes
Evidence codes are of four types:
* Experimental
* Computational
* Indirectly derived from experimental or computational evidence
* Unknown
95% of annotations are computational. This is problematic: computational annotations increase coverage, but they are also more likely to be false positives.

Annotation Qualifiers
* colocalizes_with
* contributes_to
* NOT (most vital): indicates that the gene product does not have the property described by the term

Annotation in EMBL-EBI
http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006915#term=info
(In case the link fails, this is a quick view from GO)
Gene product: Actin, alpha cardiac muscle 1, UniProtKB:P68032
GO term: heart contraction; GO:0060047 (biological process)
Evidence code: Inferred from Mutant Phenotype (IMP)
Reference: PMID:17611253
Assigned by: UniProtKB, June 06, 2008

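One way to picture the annotation fields is as a small record type. The Python sketch below holds the example annotation above in a dataclass; the class name, field names, and the `asserts_term` helper are assumptions made for illustration and do not reflect the column layout of any official GO annotation file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    gene_product: str                # identifier of the gene product
    go_term: str                     # identifier of the GO term
    evidence_code: str               # type of evidence behind the annotation
    reference: str                   # source, e.g. a journal article
    assigned_by: str                 # creator of the annotation
    date: str                        # date of annotation (ISO format chosen here)
    qualifier: Optional[str] = None  # e.g. "NOT", "contributes_to", "colocalizes_with"

    def asserts_term(self):
        """False when the NOT qualifier negates the association."""
        return self.qualifier != "NOT"


# Values taken from the example annotation on the slide
actc1 = Annotation(
    gene_product="UniProtKB:P68032",
    go_term="GO:0060047",
    evidence_code="IMP",
    reference="PMID:17611253",
    assigned_by="UniProtKB",
    date="2008-06-06",
)
print(actc1.asserts_term())  # True: no NOT qualifier is present
```

Keeping the qualifier on the record matters when annotations are counted or propagated, since an annotation qualified with NOT should not be read as a positive association.
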
Universals and Particulars
Universal: species E. coli; function: boost insulin
Particulars: E. coli in this petri dish; function: boost insulin in subject X’s pancreas
“GO terms correspond, in philosophical terminology, to universals…and each universal corresponding to the term Cell is instantiated by every actual cell.” Smith et al. p. 3

Continuants vs. Occurrents
Continuants: entities that continue to exist throughout time (cells, organisms, chromosomes). They preserve their identity while undergoing a variety of changes.
Occurrents (events, processes): unfold through time.

But…
“Biological process, molecular function and cellular components are all attributes of genes, gene products or gene-product groups.” p. 27
* Do we usually model attributes as ontologies?
* Are genes, gene products, or gene-product groups “backbone” ontologies, or super classes?
* If these aren’t top-level ontologies, what are they?

Smith et al.; Yu’s “other” example
* Recall Yu’s fourth definition of ontologies
“The Gene Ontology, in spite of its name, is not an ontology as the latter term is commonly used either by information scientists or by philosophers. It is, as the GO Consortium puts it, a ‘controlled vocabulary’…. their efforts have been directed toward providing a practically useful framework for keeping track of the biological annotations that are applied to gene products.” Smith et al. p. 1

Problems and Potential Solutions
Problem: each new term requires an understanding of the whole. Curators must therefore be subject experts in order to perform meaningful enhancement.
Solution: make explicit the criteria used for discriminating subclassifications by introducing a decision-tree methodology into the construction of each hierarchy. (Is this a good solution?)

Drawbacks to GO
* It is unclear what kinds of reasoning are permissible on the basis of GO’s hierarchies.
* The rationale of GO’s subclassifications is unclear. The reasoning that went into current choices has not been preserved and thus cannot be explained to or re-examined by a third party.
* No procedures are offered by which GO can be validated.
* There are insufficient rules for determining how to recognize whether a given concept is or is not present in GO. The use of a mere string search presupposes that all concepts already have a single standardized representation, which is not the case.
Smith et al. p. 6
