PomBase conventions for improving
annotation depth, breadth,
consistency and accuracy
Annotation numbers are important
…but numbers aren’t everything…
• Use of annotation for data-mining and data-analysis is limited
by errors, inconsistencies and omissions.
• PomBase combines annotation conventions, which improve
information content (annotation coverage and specificity, with
less redundancy), with QC mechanisms that identify possible
annotation inconsistencies and errors.
• In combination these mechanisms address many recurring
annotation issues.
1. The definition is critical
All ontology terms have a “fixed” definition
• If a definition is misleading or incorrect, its meaning cannot
be changed. To fix it, the term is obsoleted and its annotations
are migrated.
• This makes annotations very robust to ontology changes. If
a term needs to be repositioned, the annotations remain
correct.
• We annotate to the definition, not the term name. Always
check the definition.
2. Improving annotation specificity
• i) Consider descendant terms
• ii) Veto use of uninformative terms
2i. Consider descendants
Annotate as specifically as the experiment allows and be
unambiguous about the biology
• regulation: positive or negative?
• translation: cytoplasmic or mitochondrial?
• transport: of what? to where? how?
• chromosome segregation: mitotic or meiotic?
If the available terms are insufficient, request a more specific
term
• For a carboxylic acid carrier, “carboxylic acid transport”
looks OK initially
• However, “transmembrane transport” is not explicit here…
Carboxylic acid might be transported in other ways…
2i. Consider descendants e.g.
More specific annotation can
provide additional detail e.g.
• substrate,
• type (transmembrane),
• sometimes directionality
Additional parents increase the
information content, because the gene
product is annotated indirectly to more
terms.
2i. Consider descendants e.g.
2ii. Veto use of non-specific terms
Identify the set of ontology terms where more specific
annotation should be possible (more biological detail)
Examples:
• cellular process (which one?)
• translation (cytoplasmic? mitochondrial?)
• transport (of what? to where?)
Some GO terms are already flagged as not for manual
annotation. Review and improve annotations to vetoed terms.
PomBase blocks 1298 upper-level GO terms for direct
annotation (<200 violations).
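The veto check described above could be sketched as follows. This is a minimal illustration only, assuming annotations are held as (gene, term name) pairs; the veto set, gene names and function name are hypothetical and are not taken from the PomBase pipeline.

```python
# Hypothetical veto list: terms considered too general for direct annotation.
VETOED_TERMS = {"cellular process", "translation", "transport"}

# Toy annotations as (gene, GO term name) pairs; illustrative, not real data.
annotations = [
    ("geneA", "cytoplasmic translation"),
    ("geneB", "transport"),
    ("geneC", "cellular process"),
    ("geneD", "mitotic sister chromatid segregation"),
]


def find_veto_violations(annotations, vetoed):
    """Return the annotations made directly to a vetoed term."""
    return [(gene, term) for gene, term in annotations if term in vetoed]


violations = find_veto_violations(annotations, VETOED_TERMS)
for gene, term in violations:
    print(f"{gene}: '{term}' is vetoed; look for a more specific descendant")
```

Only direct annotations are checked here. Annotations to descendants still propagate up to the vetoed terms, which is the intended behaviour.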
3i. Missing parents
Original arrangement
3. Improve the ontologies
3i. Missing parents
These process annotations were originally in different branches
of the ontology, so all annotations were required
New arrangement:
3i. Missing parents
3i. Missing parents
Collapsed 6 processes to 2, with exactly the same information content.
Less redundancy; easier for users to interpret annotation.
3ii. Report incorrect parents
AKA “True Path Violations” or “TPVs”
For example
protein maturation
--protein processing (part_of)
----proteolysis (part_of)
(not all proteolysis is processing or
maturation)
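To see why an incorrect parent matters, the sketch below propagates an annotation up the toy hierarchy from the example above. The dictionary layout and function name are assumptions made for illustration; only the three term names and the part_of links come from the slide.

```python
# Toy ontology fragment from the example above: child -> list of (relation, parent).
PARENTS = {
    "proteolysis": [("part_of", "protein processing")],
    "protein processing": [("part_of", "protein maturation")],
    "protein maturation": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upwards."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# A gene annotated to "proteolysis" is implicitly annotated to all of these.
inferred = ancestors("proteolysis")
print(sorted(inferred))
```

With this arrangement every protease annotated to proteolysis would also be inferred to act in protein maturation, including purely degradative ones, which is the true path violation.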
4. The power of Annotation Extensions
Provide additional specificity for a GO annotation e.g.
• Target gene (kinase substrate, TF regulation target)
• Location of a function
• Localization dependencies (protein A localizes protein B)
• Spatial and temporal aspects of processes, functions, locations (cell cycle stage
of occurrence)
• ADD an example of a gene product specific AE
See: Huntley et al. A method for increasing expressivity of Gene Ontology
annotations using a compositional approach. PMID:24885854
cyclin-dependent protein serine/threonine kinase
• has substrate fkh2 involved in negative regulation of conjugation with cellular fusion
• directly inhibits srw1 involved in positive regulation of G1/S transition
• has substrate drc1 involved in positive regulation of mitotic cell cycle DNA replication
• has substrate cdc18, orc2 involved in negative regulation of DNA replication during mitotic G2 phase
• has substrate xlf1 involved in negative regulation of double-strand break repair via nonhomologous end joining,
during mitotic G2 phase
• has substrate rap1 involved in negative regulation of mitotic telomere tethering at nuclear periphery
during mitotic M phase
• has substrate hcn1 during mitotic M phase
• has substrate cut3 involved in positive regulation of mitotic chromosome condensation during mitotic metaphase
• has substrate mde4 involved in correction of merotelic attachment, mitotic during mitotic metaphase
• has substrate nsk1 involved in negative regulation of attachment of mitotic spindle microtubules during mitotic
metaphase
• has substrate mde4, cut7 involved in negative regulation of mitotic spindle elongation during mitotic metaphase
• has substrate klp9 involved in negative regulation of mitotic spindle elongation during mitotic anaphase A
• directly inhibits clp1 involved in negative regulation of exit from mitosis
• has substrate byr4 involved in positive regulation of septation initiation signaling
• directly inhibits dis2
• has substrate rum1, crb2, sds23
Link function (cyclin-dependent-kinase) to target genes, processes,
and temporal information
4. Annotation Extension e.g. cdc2
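One way to picture an annotation with extensions is as a term plus a list of (relation, target) pairs. The sketch below is an assumption about a convenient in-memory shape, not the GO or PomBase data model; the class and field names are hypothetical, and the relation labels are the display labels used on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extension:
    relation: str  # display label, e.g. "has substrate", "involved in", "during"
    target: str    # a gene name or an ontology term name


@dataclass
class Annotation:
    gene: str
    term: str
    extensions: list = field(default_factory=list)

    def render(self):
        """Render the annotation as a single readable line."""
        parts = [f"{e.relation} {e.target}" for e in self.extensions]
        return f"{self.gene}: {self.term} [" + ", ".join(parts) + "]"


# One row from the cdc2 example above.
example = Annotation(
    gene="cdc2",
    term="cyclin-dependent protein serine/threonine kinase",
    extensions=[
        Extension("has substrate", "cut3"),
        Extension("involved in", "positive regulation of mitotic chromosome condensation"),
        Extension("during", "mitotic metaphase"),
    ],
)
print(example.render())
```

Keeping the extensions as structured pairs, rather than folding them into new pre-composed terms, is what lets one function term carry many targets, processes and phases.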
Alternative (human CDK1):
Not scalable or maintainable
4. Using AE for effectors
• Reciprocal of the extension (automated) called “target of”
• Collects known “upstream effectors” on cdc2 page
• We can use effector-substrate connections to generate
networks (interaction, metabolic, regulatory)
• Provide directional links to support pathway reconstruction
4. Using Annotation Extensions to
generate networks/pathways
[Figure: example network built from annotation extensions; nodes include sty1, cmk2, srk1, rum1, atf1, gsa1, gpx1, ntp1, sro1 and ish1]
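The reciprocal “target of” view and the directed network can both be derived from the same extension data. The following is a minimal sketch under the assumption that extensions are available as (effector, relation, target) triples; the three triples are taken from the cdc2 slide, while the function names are hypothetical.

```python
from collections import defaultdict

# (effector gene, relation, target gene) triples taken from the cdc2 slide;
# a real pipeline would read these from annotation extensions.
extensions = [
    ("cdc2", "has substrate", "cut3"),
    ("cdc2", "has substrate", "klp9"),
    ("cdc2", "directly inhibits", "clp1"),
]


def reciprocal_targets(extensions):
    """Invert effector -> target links into a 'target of' index."""
    target_of = defaultdict(list)
    for effector, relation, target in extensions:
        target_of[target].append((relation, effector))
    return dict(target_of)


def directed_edges(extensions):
    """Directed network edges (effector, target) for pathway reconstruction."""
    return sorted({(effector, target) for effector, _rel, target in extensions})


target_of = reciprocal_targets(extensions)
print(target_of["clp1"])
print(directed_edges(extensions))
```

Because the edges keep their direction, the same list can feed a graph library to lay out upstream effectors and downstream targets.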
4. Automated AE networks e.g.
44/59 connected in automated network based on annotated
connections within “regulation of G2/M transition” (fission yeast)
(Network for each GO slim category from the slim page)
5. Suppress redundant IEA annotation
• PomBase pipelines filter redundant IEA
(Inferred from Electronic Annotation)
evidence
• Removes >90% of IEA (because a manual
annotation already covers them)
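The filtering idea can be sketched as below, assuming an IEA counts as redundant when the same gene has a manual annotation to the same term or to one of its descendants. The toy hierarchy, gene names and function names are illustrative assumptions, not the PomBase implementation.

```python
# Toy is_a hierarchy: child -> parents. Illustrative only.
PARENTS = {
    "carboxylic acid transmembrane transport": ["transmembrane transport"],
    "transmembrane transport": ["transport"],
    "transport": [],
}


def ancestors_and_self(term):
    """Return the term together with everything above it in the toy hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_redundant_iea(annotations):
    """Drop IEA annotations already implied by a manual annotation for the same gene."""
    covered = set()  # (gene, term) pairs implied by non-IEA annotations
    for gene, term, evidence in annotations:
        if evidence != "IEA":
            covered.update((gene, t) for t in ancestors_and_self(term))
    return [a for a in annotations if a[2] != "IEA" or (a[0], a[1]) not in covered]


annotations = [
    ("geneA", "carboxylic acid transmembrane transport", "EXP"),
    ("geneA", "transport", "IEA"),                # redundant: ancestor of the EXP term
    ("geneA", "transmembrane transport", "IEA"),  # redundant: ancestor of the EXP term
    ("geneB", "transport", "IEA"),                # retained: no manual annotation
]
kept = filter_redundant_iea(annotations)
print(kept)
```

Any IEA that survives the filter is worth a look: it points to a mapping problem, a missing ontology parent, or a missing manual annotation.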
5. Suppress redundant IEA annotation
13 annotations are reduced to 4
Same information, fewer terms
Incorrect annotations are more easily spotted
Mis16 is not involved in ‘chromatin modification’ -> fix mapping
5. Suppress redundant IEA,
QC of mappings
Missing parents in ontology more obvious
“inorganic anion exchanger” should be an ‘ancestor’ of
GO:0005452, to suppress the IEA as redundant
5. Suppress redundant IEA,
QC of ontology
(SPBC543.05c)
5. Suppress redundant IEA annotation
• >40,000 fission yeast IEAs available.
• PomBase filters 36,000 as redundant and retains 4,000 (IEAs are at
least 90% accurate if the manual annotations are correct).
• It is easier to evaluate the remaining IEAs to identify/fix
anomalies
Reducing IEAs over time
5. Suppress redundant IEA
• More concise view with zero loss of information
• IEA mappings derived from a single experiment/publication
can be interpreted as proof by repetition and make weak EXP
data appear multiply supported/acceptable
• Fewer annotations, easier QC of remaining IEAs
Q “Why isn’t an IEA covered by manual annotation?” Either:
1. Incorrect mapping
2. Missing parent in ontology
3. Missing annotation -> find supporting evidence and
annotate manually (EXP or ISO)
(PomBase also filters NAS/TAS/IC)
6. Annotate by process (pathway)
• Annotating by process rather than “ad hoc”
improves consistency and allows ‘annotation
gaps’ to be targeted
• Process papers more quickly (become more
familiar with the field, experimental methods)
Become familiar with an area of biology and
the techniques used. Don’t need to read the
background every time. Recognise
phenotypes.
From PMID:22898774
Regulation of the
metaphase/anaphase
transition by the MCC, the
APC and upstream
Signalling
Identify obvious missing
annotation, for example
between complex
members
6. Annotate by process or pathway
6. Annotate by process or pathway
[Figure: pathway diagram with labels SAC/MCC, cdc20, APC, proteasome, securin, separase, cohesin subunit, post transition]
Can perform QC on processes or components
e.g. Use STRING to evaluate outliers (potential annotation
errors). Input list: “regulation of mitotic metaphase/anaphase
transition”
Can also ask “are any
complex members missing?”
• We are annotating whole organisms… use a
holistic, whole-organism annotation approach
• Evaluate annotation breadth (coverage) using
slims
• Evaluate intersections between slim processes
7. Assess annotation at the
organismal level
7. Evaluate organismal annotation
coverage using “slims”
• EXP supported BP
• ISO/IEA inferred BP
‘unknowns’
• Species specific, no
inference possible
• Conserved, but
unannotated in any
species
7. Browsable Slim:
http://preview.pombase.org/browse-curation/fission-yeast-go-slim-terms
7. Sensible assignments?
DNA
recombination
Periodically check that
slim class contents
look sensible
7. Monitor unslimmed gene products
Note: Exclude biologically uninformative terms like “phosphorylation” or
“response to chemical” as these could apply to any real biological role.
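Monitoring unslimmed gene products amounts to a set difference. The sketch below assumes each gene already has the set of slim terms it maps to; the gene names, the mapping and the function name are hypothetical, and the two uninformative terms are the ones named in the note above.

```python
# Toy data: gene -> set of slim terms it maps to (directly or via ancestors).
slim_membership = {
    "geneA": {"signalling", "phosphorylation"},
    "geneB": {"phosphorylation"},
    "geneC": set(),
    "geneD": {"cytokinesis"},
}

# Terms treated as biologically uninformative for coverage purposes.
UNINFORMATIVE = {"phosphorylation", "response to chemical"}


def unslimmed(slim_membership, uninformative):
    """Genes left with no informative slim term once uninformative terms are excluded."""
    return sorted(
        gene
        for gene, terms in slim_membership.items()
        if not (terms - uninformative)
    )


print(unslimmed(slim_membership, UNINFORMATIVE))
```

Here geneB counts as unslimmed even though it has an annotation, because “phosphorylation” alone says nothing about its biological role.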
[Figure: visual slim of all fission yeast proteins (TOTAL 5054; Unknown 830). Top-level groupings: CELL DIVISION 751; MITOCHONDRIAL ORG/EXP 280; MEMBRANES, TRAFFICKING, CELL SURFACE 787; SMALL MOLECULE TM TRANSPORT 288; CENTRAL MET, ENERGY AND BUILDING BLOCKS 549; EXPRESSION 1294; PROTEIN ASSEMBLY/STABILITY 765; signalling 404; sexual reproductive process 262 (many intersections); Other 290 (no intersections; includes adhesion, many proteases, peroxins). Sub-category and intersection counts are not recoverable from the extracted text.]
7. Visual slim, all pombe proteins
7. Evaluate intersections between slim
categories
Evaluate intersections between processes
Many GO processes are rarely co-annotated because they are
functionally, spatially or temporally distant. For example, we would
not expect “ribosome biogenesis” to intersect with “vitamin
metabolism”.
We can use this observation to identify potential conflicts using
the GO term matrix
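The intersection check can be illustrated with pairwise set intersections. This is a sketch under stated assumptions: the gene sets are invented, the list of “unexpected” pairs is hypothetical, and the real GO term matrix tooling is not reproduced here.

```python
from itertools import combinations

# Toy slim assignments: slim term -> set of annotated genes (illustrative).
slim_genes = {
    "ribosome biogenesis": {"g1", "g2", "g3"},
    "vitamin metabolism": {"g3", "g4"},
    "cytoplasmic translation": {"g2", "g5"},
}


def intersections(slim_genes):
    """Genes shared by each pair of slim terms, keyed by the alphabetically sorted pair."""
    return {
        (a, b): slim_genes[a] & slim_genes[b]
        for a, b in combinations(sorted(slim_genes), 2)
    }


# Pairs that are not expected to share genes; any hit is a candidate for review.
UNEXPECTED = {("ribosome biogenesis", "vitamin metabolism")}

matrix = intersections(slim_genes)
to_review = {
    pair: genes for pair, genes in matrix.items() if pair in UNEXPECTED and genes
}
print(to_review)
```

A non-empty intersection for an unexpected pair does not prove an error; it flags genes whose annotations, mappings or ontology placement should be checked.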
Fission yeast intersections Jan 2012
Fission yeast intersections March 2017
7. Identifies ontology errors (e.g.)
DNA metabolism and chromosome segregation do not usually intersect
Regulation of chromosome condensation should not be a DNA metabolic process
7. Ontology error (e.g.)
Genes annotated to folic acid metabolism were also incorrectly annotated to amino acid
metabolism. Folic acid was classified as an amino acid by ChEBI -> fix ChEBI, which fixes GO
7. Finds incorrect mappings (e.g.)
Intersect between tRNA metabolism and transcription.
Elongator is no longer thought to have a direct role in transcription, mapping removed
8. Consider Author intent
Think about the biology the author intended
e.g. rubidium ion transmembrane transporter/transport
Rubidium ion is used to assay K+ transport; rubidium itself is a
non-physiological substrate, so annotate K+ transport, not rubidium transport
e.g. Apoptosis (RPS19)
Rps19 mutant displayed condensed DNA, a fragmented nucleus
and caspase activation - indicative of apoptosis.
Since RPS19 has an essential role in ribosome biogenesis,
apoptosis is likely to be an indirect effect of the disruption of an
upstream process, translation (i.e. an experimental readout)
9. Communication with the author
and community curation
• Most authors are happy to discuss their publications. If unsure
about an annotation, ask them. PomBase routinely uses the
authors as a QC step to refine annotation.
9. Community Curation
• Most authors are happy to curate their own papers
• Co-curation by author and curator improves annotation quality
(especially PhD/post doc/recent papers).
• 9619 annotations (FYPO/GO/MOD) created by the community
from 510 publications (excludes HTP spreadsheet submissions)
Some example sessions
• http://tinyurl.com/q2bgyqv
• http://tinyurl.com/p7d979b
• http://tinyurl.com/o72bzul
Very specific annotation is possible because Canto guides the user
step by step to construct genotypes and ontology based annotations.
“Drill down” to more specific terms is assisted.
Prompts are provided for AE of specified types for certain terms.
10. Prioritise error fixing
• Fixing known errors takes precedence over new annotation,
like critical bugs in code
• Even small errors often uncover larger issues, or can fix many
problems simultaneously across multiple species.
• Prevents propagation of annotation errors
11. GO process vs. phenotype
• GO annotation should reflect a gene's direct involvement
in, or role in regulating, processes or functions.
• Phenotypes may indicate that a mutation *affects* a
process, but may reflect downstream or indirect effects.
e.g. ER membrane defect -> nuclear envelope defect -> chromosome
decondensation defect-> defects in next round of DNA replication.
• A “DNA replication phenotype” alone is not enough to
make a “DNA replication” GO annotation.
• A single phenotype is often NOT SPECIFIC FOR A PROCESS.
Phenotype annotation rules
• To make GO annotations based on phenotypes, ask the question:
“Is this phenotype or collection of phenotypes
specific to this process?” (usually need detailed
phenotypes)
Additional data can support GO inference from
phenotype (location, orthology), and author intent.
(Intersections between processes useful for identifying
annotation errors caused by indirect annotation)
Summary

Editor's Notes

  • #2: describe some PomBase curation procedures, might be useful to other databases/curators
  • #3: Coverage, genes annotated OR number of different processes for a gene
  • #8: Another important point is that annotations are explicitly coupled by using a term which covers both (although this can also be done with extensions)
  • #16: Arrange temporally?
  • #25: NOTE: we don’t filter redundant EXP annotations, but we do manage this in the display so the term is presented and the source (often multiple) is available in a full view. Later we hope to hide higher level EXP annotations
  • #29: Complexes cluster together, some genes incorrectly annotated, can work out how they are connected, check appropriate sub-processes annotated for complexes, complex annotations are internally consistent etc
  • #38: Add error type examples
  • #40: ChEBI used to define chemicals in GO
  • #42: This isn’t speculative, it’s the curator using what is known but not explicitly stated, it’s a valid interpretation of the experiment based on what is presented; we are modelling the biology