Using co-annotation and biological knowledge
as a quality control procedure for ontology
structure and gene annotation in the Gene
Ontology
Biocuration, 2017
Valerie Wood and Seth Carbon
@news4go
No co-annotated genes (e.g.)
• Unlikely there is a protein that is directly involved in both processes.
• Use this biological knowledge to check annotations.
tRNA metabolism ∩ transmembrane transport = 0 (Venn diagram counts: 161, 339)
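A minimal sketch of the check behind this slide, assuming annotations are available as a mapping from a GO term label to the set of genes annotated to it. The gene identifiers and the `co_annotated` helper are hypothetical and only illustrate the idea; they are not taken from the Matrix tool.

```python
def co_annotated(annotations, term_a, term_b):
    """Return the set of genes annotated to both terms (empty if none)."""
    return annotations.get(term_a, set()) & annotations.get(term_b, set())


# Hypothetical toy data; real sets would come from a GO annotation file.
annotations = {
    "tRNA metabolism": {"geneA", "geneB"},
    "transmembrane transport": {"geneC", "geneD", "geneE"},
}

shared = co_annotated(annotations, "tRNA metabolism", "transmembrane transport")
print(f"{len(shared)} co-annotated genes")
```

An empty result is what the biology predicts here; any gene that does appear in the intersection is a candidate for inspection.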
Intersections in a simple eukaryote
The Matrix Tool
http://amigo.geneontology.org/matrix *
* http://tomodachi.berkeleybop.org/matrix
Step 1
Annotations shared between sets of GO terms are explored and annotation intersections are noted.
Step 2
Rules are created for “zero intersects” based on known biology:
• (“cellular amino acid metabolic process” ∩ “DNA recombination”) = 0
• (“lipid metabolic process” ∩ “carbohydrate metabolic process”) = 0
Step 3
Identify new annotations violating existing rules.
Report to contributing database(s) for validation.
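Steps 1 to 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the Matrix tool's implementation: annotations are assumed to be a dict from term label to gene set, rules are plain pairs of term labels, and the gene identifiers `g1` to `g4` are invented.

```python
from itertools import combinations


def intersection_matrix(annotations):
    """Step 1: count shared genes for every unordered pair of terms."""
    return {
        (a, b): len(annotations[a] & annotations[b])
        for a, b in combinations(sorted(annotations), 2)
    }


def find_violations(annotations, zero_rules):
    """Step 3: list genes co-annotated to a pair that a rule says is empty."""
    violations = []
    for a, b in zero_rules:
        shared = annotations.get(a, set()) & annotations.get(b, set())
        for gene in sorted(shared):
            violations.append((a, b, gene))
    return violations


# Hypothetical toy annotations (gene ids are made up).
annotations = {
    "lipid metabolic process": {"g1", "g2"},
    "carbohydrate metabolic process": {"g2", "g3"},
    "DNA recombination": {"g4"},
}

# Step 2: a "zero intersect" rule is a pair of terms expected to share no genes.
zero_rules = [("lipid metabolic process", "carbohydrate metabolic process")]

matrix = intersection_matrix(annotations)
violations = find_violations(annotations, zero_rules)
for a, b, gene in violations:
    print(f"{gene}: annotated to both '{a}' and '{b}'")
```

Each reported gene would then go back to the contributing database for validation.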
Step 4
Annotations critically inspected, leading to one of two outcomes:
A: Violation identified: contributing database corrects annotation
B: Annotation confirmed: rules are extended to allow specific intersections:
(“amino acid metabolism” ∩ “cofactor metabolism”) = 0, except if
'de novo' NAD biosynthetic process from aspartate
(“amino acid metabolism” ∩ “peroxisome organization”) = 0, except if
C. elegans prx-10 (peroxisome protein and tryptophan synthase)
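One way outcome B could be represented is a rule that carries its own exceptions. The sketch below assumes two kinds of exception, matching the two examples above: a more specific GO term that legitimately spans both processes, and a named gene. The `ZeroRule` class, its field names, and the genes `geneY` and `geneZ` are hypothetical; the slides do not say how exceptions are actually stored.

```python
from dataclasses import dataclass, field


@dataclass
class ZeroRule:
    """A pair of terms expected to share no genes, plus allowed exceptions."""
    term_a: str
    term_b: str
    allowed_genes: set = field(default_factory=set)
    allowed_terms: set = field(default_factory=set)


def unexplained_genes(rule, gene_terms):
    """Return genes that break the rule and match none of its exceptions.

    gene_terms maps a gene to the set of terms it is annotated to.
    """
    hits = []
    for gene in sorted(gene_terms):
        terms = gene_terms[gene]
        if rule.term_a not in terms or rule.term_b not in terms:
            continue  # gene is not in the intersection
        if gene in rule.allowed_genes or terms & rule.allowed_terms:
            continue  # intersection is a confirmed exception
        hits.append(gene)
    return hits


NAD_TERM = "'de novo' NAD biosynthetic process from aspartate"
rules = [
    ZeroRule("amino acid metabolism", "cofactor metabolism",
             allowed_terms={NAD_TERM}),
    ZeroRule("amino acid metabolism", "peroxisome organization",
             allowed_genes={"prx-10"}),
]

# Hypothetical annotations: geneY and geneZ are invented for illustration.
gene_terms = {
    "prx-10": {"amino acid metabolism", "peroxisome organization"},
    "geneY": {"amino acid metabolism", "cofactor metabolism", NAD_TERM},
    "geneZ": {"amino acid metabolism", "cofactor metabolism"},
}

results = [unexplained_genes(rule, gene_terms) for rule in rules]
print(results)
```

Keeping exceptions on the rule means a confirmed annotation stops being reported while any new, unreviewed intersection still surfaces.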
Steps 1-4: an iterative process
Explore co-annotation → Biological “rules” → Identify and report → Correct or modify → (repeat)
Example errors
membrane organization ∩ nucleocytoplasmic transport (Venn diagram counts: 167, 1011)
INDIRECT EFFECT
Brr6 is involved in nuclear envelope organization; when mutated, it causes nucleocytoplasmic transport defects, but it is not involved in that process.
InterPro2GO MAPPING ERROR
The Brr6 annotation was used in electronic mappings (IEA) applied to 814 entries.
cytoplasmic translation ∩ protein catabolic process (Venn diagram counts: 450, 1823)
DOWNSTREAM EFFECT
Fes1 is a ribosome-associated protein involved in clearing misfolded proteins; when mutated, it can affect translation (an upstream process).
Fission yeast intersections 01/2012
Fission yeast intersections 03/2017
Multispecies exercise: cohesin complex vs. processes
29 annotation errors corrected
Multispecies rule building
• Rule base needs to apply to all species
• Used fission yeast, budding yeast, worm, and mouse for phylogenetic
coverage and a large body of experimental annotation
• Critically evaluated annotation outliers in intersections with slim terms for:
amino acid metabolism, tRNA metabolism, translation, ribosome
biogenesis, and DNA replication
• Fixed errors
• Built rules for “zero” intersections
• Extended some intersections to allow exceptions
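Because the rule base has to hold for every species, one natural check is to apply each rule to each species separately and see where it breaks. The sketch below assumes per-species annotation dicts; the rule shown, the gene identifiers, and the `violations_by_species` helper are all hypothetical and are not drawn from the shared-annotation-check repository.

```python
def violations_by_species(species_annotations, zero_rules):
    """Map each broken rule to the species and genes that break it."""
    report = {}
    for species, annotations in species_annotations.items():
        for a, b in zero_rules:
            shared = annotations.get(a, set()) & annotations.get(b, set())
            if shared:
                report.setdefault((a, b), {})[species] = sorted(shared)
    return report


# Hypothetical toy data for two of the four species used in the exercise.
species_annotations = {
    "fission yeast": {
        "tRNA metabolism": {"sp1"},
        "DNA replication": {"sp2"},
    },
    "budding yeast": {
        "tRNA metabolism": {"sc1", "sc2"},
        "DNA replication": {"sc2"},
    },
}
zero_rules = [("tRNA metabolism", "DNA replication")]  # illustrative rule only

report = violations_by_species(species_annotations, zero_rules)
for rule, per_species in report.items():
    print(rule, per_species)
```

A rule broken in only one species points towards an annotation outlier; a rule broken in all of them is more likely to need an exception.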
Multispecies rule building
report: http://bit.ly/go-report-int ; data: https://github.com/geneontology/shared-annotation-check
Multispecies rule building results
107 rules created; 83 rules broken by 568 experimental annotation violations
Types of errors identified (and corrected): 147 (chart segments: 74 and 73)
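The three headline counts can be derived from a list of violations, assuming a rule counts as "broken" when at least one violation is recorded against it. That definition, the `summarize` helper, and the toy data are assumptions made for illustration; the figures on the slide come from the authors' own analysis.

```python
def summarize(zero_rules, violations):
    """Count rules created, rules broken, and total violations.

    violations is a list of (term_a, term_b, gene) tuples.
    """
    broken = {(a, b) for a, b, _gene in violations}
    return {
        "rules_created": len(zero_rules),
        "rules_broken": len(broken),
        "violations": len(violations),
    }


# Hypothetical toy data: three rules, two of them broken by three violations.
zero_rules = [("A", "B"), ("A", "C"), ("B", "C")]
violations = [("A", "B", "g1"), ("A", "B", "g2"), ("B", "C", "g3")]

summary = summarize(zero_rules, violations)
print(summary)
```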
The Future
• Implement rules as part of GO pipeline (Jenkins); reports to MODs
(Currently reporting at: http://bit.ly/go-report-int)
• A community resource; set up a mechanism to challenge rules
(GitHub for now: https://github.com/geneontology/shared-annotation-check)
• Full “slim” process coverage
• Component/complex/function intersections
• Regulation vs. “involved in”
• Something to make the function prediction people happy
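For the first item, a pipeline step might group violations by the database that contributed each annotation and signal failure when any exist. This is a sketch under assumptions, not the actual Jenkins job: the record layout, the `reports_by_source` and `exit_status` helpers, and the sample records are hypothetical. The `assigned_by` key mirrors the "Assigned By" column of GO annotation files.

```python
from collections import defaultdict


def reports_by_source(violations):
    """Group violation records by the database that made the annotation."""
    grouped = defaultdict(list)
    for record in violations:
        grouped[record["assigned_by"]].append(record)
    return dict(grouped)


def exit_status(violations):
    """Return a non-zero status when there is anything to report."""
    return 1 if violations else 0


# Hypothetical violation records; gene ids and term pairs are invented.
violations = [
    {"gene": "g1", "terms": ("A", "B"), "assigned_by": "PomBase"},
    {"gene": "g2", "terms": ("A", "B"), "assigned_by": "SGD"},
    {"gene": "g3", "terms": ("B", "C"), "assigned_by": "SGD"},
]

reports = reports_by_source(violations)
for source, records in sorted(reports.items()):
    print(f"{source}: {len(records)} annotation(s) to review")
status = exit_status(violations)
```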
Thank yous and contact
• Chris Mungall
• Midori Harris, Antonia Lock, David Hill,
Stacia Engel, and Kimberly Van Auken
• The InterPro Team
• The UniProt Team
• Peter D'Eustachio and Reactome
• Monica Munoz-Torres and Suzanna
Lewis
• 104967/Z/14/Z (Wellcome Trust)
NHGRI U41HG 002273 (Gene Ontology)
• Valerie Wood
http://pombase.org
vw253@cam.ac.uk
@ValWood (GitHub)
• Seth Carbon
http://berkeleybop.org
sjcarbon@lbl.gov
@kltm (GitHub)
