UNIVERSITY OF 
CALIFORNIA
An introduction to Web Apollo. 
A webinar for the Eurytemora affinis research community. 
Monica Munoz-Torres, PhD | @monimunozto 
Berkeley Bioinformatics Open-Source Projects (BBOP) 
Genomics Division, Lawrence Berkeley National Laboratory 
29 August, 2014 
Outline 
1. What is Web Apollo?
• Definition & working concept.
2. Our Experience With Community-Based
Curation.
3. The Manual Annotation Process. 
4. Becoming acquainted with Web 
Apollo. 
5. Example. 
During this webinar you will: 
• Learn to identify homologs of known genes of interest 
in your newly sequenced genome. 
• Become familiar with the environment and 
functionality of the Web Apollo genome annotation 
editing tool. 
What is Web Apollo? 
• Web Apollo is a web-based, collaborative genomic 
annotation editing platform. 
We need annotation editing tools to modify and refine the 
precise location and structure of the genome elements that 
predictive algorithms cannot yet resolve automatically. 
Find more about Web Apollo at 
http://GenomeArchitect.org
and
Genome Biol 14:R93 (2013).
Brief history of Apollo:
Biologists could finally visualize computational analyses and
experimental evidence from genomic features and build
manually curated consensus gene structures. Apollo became a
very popular open-source tool (used for insects, fish, mammals, birds, etc.).
a. Desktop: 
one person at a time editing a 
specific region, annotations 
saved in local files; slowed down 
collaboration. 
b. Java Web Start: 
users saved annotations directly 
to a centralized database; 
potential issues with stale 
annotation data remained. 
Web Apollo 
• Browser-based tool integrated with JBrowse. 
• Two new tracks: “Annotation” and “DNA Sequence”.
• Allows for intuitive annotation creation and editing,
with gestures and pull-down menus to create and
modify transcript and exon
structures, insert comments
(CV, freeform text), etc.
• Customizable look & feel. 
• Edits in one client are 
instantly pushed to all other 
clients: Collaborative! 
Working 
Concept 
In the context of manual gene annotation,
curation tries to find the best examples 
and/or eliminate most errors. 
To conduct manual annotation efforts: 
Gather and evaluate all available evidence 
using quality-control metrics to 
corroborate or modify automated 
annotation predictions. 
Perform sequence similarity searches 
(phylogenetic framework) and use 
literature and public databases to: 
• Predict functional assignments from 
experimental data. 
• Distinguish orthologs from paralogs, 
and classify gene membership in 
families and networks. 
Automated gene models 
Evidence: 
cDNAs, HMM domain searches, 
alignments with assemblies or 
genes from other species. 
Manual annotation & curation 
Dispersed, community-based manual
gene annotation efforts.
We continuously train and support
hundreds of geographically dispersed
scientists from many research
communities to perform biologically
supported manual annotations using
Web Apollo.
– Gate keepers and monitoring. 
– Written tutorials. 
– Training workshops and geneborees. 
– Personalized user support. 
What we have learned. 
By harvesting expertise from dispersed researchers who
assigned functions to predicted and curated peptides,
we have developed more interactive and
responsive tools, as well as better visualization,
editing, and analysis capabilities.
http://people.csail.mit.edu/fredo/PUBLI/Drawing/
Collaborative Efforts Improved 
Automated Annotations 
In many cases, automated annotations have been
improved (e.g., Apis mellifera: Elsik et al. BMC Genomics 2014, 15:86).
We also learned of the challenges posed by newer sequencing
technologies, e.g.:
– Frameshifts and indel errors 
– Split genes across scaffolds 
– Highly repetitive sequences 
To face these challenges, we train annotators in 
recovering coding sequences in agreement with all 
available biological evidence. 
It is helpful to work together. 
Scientific community efforts bring together domain-specific 
and natural history expertise that would 
otherwise remain disconnected. 
Breaking down large amounts of data into 
manageable portions and mobilizing groups 
of researchers to extract the most accurate 
representation of the biology from all 
available data distills invaluable 
knowledge from genome analysis. 
Understanding the evolution of sociality 
Comparing the genomes of 7 species of ants 
contributed to a better understanding of the 
evolution and organization of insect societies 
at the molecular level. 
Insights drawn mainly from six core aspects of 
ant biology: 
1. Alternative morphological castes 
2. Division of labor 
3. Chemical Communication 
4. Alternative social organization 
5. Social immunity 
6. Mutualism 
Libbrecht et al. Genome Biology 2013, 14:212
Groups of 
communities 
continue to guide 
our efforts. 
Atta cephalotes (above) and Harpegnathos saltator. 
©alexanderwild.com 
A little training goes a long way! 
With the right tools, wet lab scientists make exceptional 
curators who can easily learn to maximize the 
generation of accurate, biologically supported gene 
models. 
How do we get there?
In a genome sequencing project…
Assembly → Automated annotation → Manual annotation → Experimental validation
Gene Prediction 
Identification of protein-coding genes, tRNAs, rRNAs, 
regulatory motifs, repetitive elements (masked), etc. 
- Ab initio (DNA composition): e.g., Augustus, GENSCAN,
geneid, fgenesh
- Homology-based: e.g., SGP2, fgenesh++
Nucleic Acids Res. 2003, vol. 31, no. 13, 3738-3741
Gene Annotation 
Integration of data from prediction tools to generate a 
consensus set of predictions or gene models. 
• Models may be organized using: 
- automatic integration of predicted sets, e.g., GLEAN
- packaging the necessary tools into a pipeline, e.g., MAKER
• All available biological evidence (e.g. transcriptomes) further 
informs the annotation process. 
In some cases algorithms and metrics used to generate 
consensus sets may actually reduce the accuracy of the 
gene’s representation; in such cases it is usually better to 
use an ab initio model to create a new annotation. 
Manual Genome Annotation 
• Identifies elements that best represent the underlying 
biology. 
• Eliminates elements that reflect the systemic errors of 
automated genome analyses. 
• Determines functional roles through comparative 
analysis of well-studied, phylogenetically similar 
genome elements using literature, databases, and 
the researcher’s experience. 
The Curation Process Is Necessary
1. A computationally predicted consensus gene set is 
generated using multiple lines of evidence. 
2. Manual annotation takes place. 
3. Ideally consensus computational predictions will be 
integrated with manual annotations to produce an 
updated Official Gene Set (OGS). 
Otherwise, “incorrect and incomplete genome annotations 
will poison every experiment that uses them”. 
- M. Yandell. 
The Collaborative Curation Process at 
i5K 
1) A computationally predicted consensus gene set has 
been generated using multiple lines of evidence; e.g. 
Consensus Gene EAFF_v0.5.3-Models. 
2) i5K Projects will integrate consensus computational 
predictions with manual annotations to produce an updated 
Official Gene Set (OGS): 
» If it’s not on either track, it won’t make the OGS! 
» If it’s there and it shouldn’t be, it will still make the OGS!
Consensus set: reference and start point 
• In some cases algorithms and metrics used to generate 
consensus sets may actually reduce the accuracy of the gene’s 
representation; e.g. use Augustus model instead to create a new 
annotation. 
• Isoforms: drag original and alternatively spliced form to ‘User-created 
Annotations’ area. 
• If an annotation needs to be removed from the consensus set, 
drag it to the ‘User-created Annotations’ area and label as 
‘Delete’ on Information Editor. 
• Overlapping interests? Collaborate to reach agreement. 
• Follow guidelines for i5K Pilot Species Projects as shown at 
http://goo.gl/LRu1VY
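One way to read the rules on the last two slides is as a merge of two lists: the consensus set and the ‘User-created Annotations’. The Python sketch below illustrates that reading only; it is not the i5K pipeline. The `Model` record, its `status` field, and the rule that a manual annotation replaces any consensus model it overlaps are all assumptions made for illustration, and the model names are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    """Hypothetical gene model record; field names are illustrative only."""
    name: str
    scaffold: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    status: str = ""  # "Delete" marks a model the curator wants removed


def overlaps(a: Model, b: Model) -> bool:
    """True when two models share at least one base on the same scaffold."""
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end


def build_ogs(consensus: List[Model], manual: List[Model]) -> List[Model]:
    """Assumed rule: a manual annotation replaces any consensus model it overlaps.

    Manual models labelled 'Delete' suppress the overlapping consensus model
    and are themselves left out. Consensus models nobody touched pass through
    unchanged, which is why an unreviewed wrong model still reaches the OGS.
    """
    kept = [c for c in consensus if not any(overlaps(c, m) for m in manual)]
    curated = [m for m in manual if m.status != "Delete"]
    return sorted(kept + curated, key=lambda m: (m.scaffold, m.start))


consensus = [
    Model("EAFF_A", "scaffold1", 100, 500),
    Model("EAFF_B", "scaffold1", 900, 1500),
    Model("EAFF_C", "scaffold2", 50, 400),
]
manual = [
    Model("curated_A", "scaffold1", 120, 520),                 # replaces EAFF_A
    Model("EAFF_B", "scaffold1", 900, 1500, status="Delete"),  # removes EAFF_B
]
ogs = build_ogs(consensus, manual)
print([m.name for m in ogs])  # ['curated_A', 'EAFF_C']
```

EAFF_C was never reviewed, so it goes into the result untouched, whether or not it is correct.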
Web Apollo
Web Apollo: The Sequence Selection Window
Web Apollo
Graphical User Interface (GUI) for editing annotations
• Navigation tools: pan and zoom.
• Search box: go to a scaffold or a gene model.
• Grey bar of coordinates: indicates location. You can also select here in order to zoom to a sub-region.
• ‘View’: change color by CDS, toggle strands, set highlight.
• ‘File’: upload your own evidence (GFF3, BAM, BigWig, VCF*); add combination and sequence search tracks.
• ‘Tools’: use BLAT to query the genome with a protein or DNA sequence.
• Available Tracks
• ‘User-created Annotations’ Track
• Evidence Tracks Area
• Login
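Custom evidence uploaded through the ‘File’ menu has to be in one of the listed formats. To illustrate what a GFF3 record contains, here is a small Python sketch that splits one feature line into its nine tab-separated columns. The `parse_gff3_line` helper and the example line are made up for this purpose and are not part of Web Apollo.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class Gff3Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str              # "+", "-", "." or "?"
    phase: Optional[int]     # None when the column is "."
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Optional[Gff3Feature]:
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = {}
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return Gff3Feature(
        seqid=cols[0], source=cols[1], type=cols[2],
        start=int(cols[3]), end=int(cols[4]),
        score=None if cols[5] == "." else float(cols[5]),
        strand=cols[6],
        phase=None if cols[7] == "." else int(cols[7]),
        attributes=attrs,
    )


# Invented example record: an exon belonging to a transcript called mRNA1.
example = "scaffold1\tmy_lab\texon\t1200\t1450\t.\t+\t.\tID=exon1;Parent=mRNA1"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["Parent"])
```

Running a check like this before uploading can catch files in which tabs were silently converted to spaces.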
Flags non-canonical splice sites.
Selection of features and 
sub-features 
Edge-matching 
‘User-created Annotations’ Track 
Evidence Tracks Area 
The editing logic in the server:
• selects longest ORF as CDS
• flags non-canonical splice sites
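To make the second check concrete, the Python sketch below flags introns whose first two and last two bases are not GT and AG. It is an illustration, not Web Apollo's server code: the function names, the coordinate convention, and the choice to treat only GT...AG as canonical are assumptions.

```python
from typing import List, Tuple


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def noncanonical_introns(
    genome: str, exons: List[Tuple[int, int]], strand: str = "+"
) -> List[Tuple[int, int, str, str]]:
    """Return (intron_start, intron_end, donor, acceptor) for flagged introns.

    Exons are 1-based inclusive forward-strand coordinates, in any order.
    Assumption: only GT...AG (read on the transcribed strand) is canonical.
    """
    flagged = []
    ordered = sorted(exons)
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        intron = genome[left_end:right_start - 1]  # bases between the two exons
        if not intron:
            continue  # abutting exons, no intron to check
        if strand == "-":
            intron = revcomp(intron)
        donor, acceptor = intron[:2].upper(), intron[-2:].upper()
        if (donor, acceptor) != ("GT", "AG"):
            flagged.append((left_end + 1, right_start - 1, donor, acceptor))
    return flagged


# Toy sequence: exon, GT...AG intron, exon, GC...AG intron, exon.
genome = "ATG" + "GTAAAG" + "CCC" + "GCTTAG" + "TAA"
exons = [(1, 3), (10, 12), (19, 21)]
print(noncanonical_introns(genome, exons))  # [(13, 18, 'GC', 'AG')]
```

A flag is a prompt to look at the evidence, not proof of an error; some real introns use other boundaries.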
Web Apollo 
Web Apollo 
DNA Track 
‘User-created Annotations’ Track 
There are two new kinds of tracks for:
• annotation editing
• sequence alteration editing
Web Apollo 
Annotations, annotation edits, and History: stored in a centralized database. 
Web Apollo: The Information Editor
• DBXRefs
• PubMed IDs
• GO terms
• Comments
Additional Functionality 
In addition to the protein-coding gene annotation that you know and love:
• Non-coding genes: ncRNAs, miRNAs, repeat regions, and TEs 
• Sequence alterations (less coverage = more fragmentation) 
• Visualization of stage and cell-type specific transcription data as 
coverage plots, heat maps, and alignments 
How to begin curating 
To find the gene region you wish to annotate, you may use:
a) a protein sequence from another species;
b) a sequence from a similar gene;
c) your own alignments of gene models or transcriptomic data to the genome;
d) high-quality protein and/or gene family alignments (multi- or single-species)
in which you can identify conserved domains.
Option 1 – You have a sequence but don’t know where it is in this genome: 
• Use BLAT in Web Apollo window, or BLAST at NAL’s i5k BLAST server, available at: 
http://i5k.nal.usda.gov/blastn
• Alternatively, use any other tool; for example Geneious. 
Option 2 – The genome has already been annotated with your sequences and you have a gene 
identifier that has been indexed in Web Apollo. 
• That is, you know where to look, so type the ID in the Search box of Web Apollo. 
• Web Apollo autocompletes using a case-insensitive search anchored on the left-hand side of 
the word. For example “HaGR” will show all “hagr” objects (up to 30). 
• Choose one of the genes and click “Go”. 
• You can do that with Domains, Alignments or Gene names provided to you (if they have been 
indexed). 
Option 3 – Find genes based on functional ontology terms or network membership identifiers.
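The autocomplete behaviour described under Option 2 (case-insensitive, anchored on the left, at most 30 hits) can be mimicked in a few lines of Python. This is a sketch of the described behaviour, not Web Apollo's implementation; the function name, the alphabetical ordering of hits, and the gene names are invented for illustration.

```python
from typing import Iterable, List


def autocomplete(query: str, names: Iterable[str], limit: int = 30) -> List[str]:
    """Case-insensitive, left-anchored match, capped at `limit` hits."""
    prefix = query.casefold()
    hits = [n for n in names if n.casefold().startswith(prefix)]
    # Sorting is an assumption; the slides do not say how hits are ordered.
    return sorted(hits, key=str.casefold)[:limit]


indexed_names = ["HaGR1", "hagr2", "HAGR10", "OtherHaGR", "Hb1"]
print(autocomplete("HaGR", indexed_names))  # ['HaGR1', 'HAGR10', 'hagr2']
```

Note that "OtherHaGR" is not returned: the match must start at the left edge of the name, so typing the middle of an identifier finds nothing.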
General Process of Curation 
1. Select the chromosomal region of interest, e.g. scaffold. 
2. Select appropriate evidence tracks. 
3. Determine whether a feature in an existing evidence track will 
provide a reasonable gene model to start working. 
- If yes: select and drag the feature to the ‘User-created 
Annotations’ area, creating an initial gene model. If necessary 
use editing functions to adjust the gene model. 
- Nothing available to you? Let’s have a talk. 
4. Check your edited gene model for integrity and accuracy by 
comparing it with available homologs. 
Always remember: when annotating gene models using Web 
Apollo, you are looking at a ‘frozen’ version of the genome 
assembly and you will not be able to modify the assembly itself.
Example 
Introductory demonstration using the Apis mellifera genome. 
Q&A session using the Eurytemora affinis genome at
https://apollo.nal.usda.gov/euraff/selectTrack.jsp
A public Honey Bee Web Apollo Demo is available at
http://genomearchitect.org/WebApolloDemo
What do we know for this species? 
• What data are currently available? 
• At NCBI: 
• 5,570 nucleotide sequences → scaffolds
• 446 amino acid sequences → CO-I
• 0 conserved domains identified 
• 0 “gene” entries submitted 
PubMed Search: what’s new? 
Empirical examples of 
beneficial reversal of 
dominance: 
• Warfarin resistance: mutation 
of VKORC1 is associated with 
increased dietary requirement 
for vit. K
How many sequences for your gene of 
interest? 
And what do we know about it? 
• VKORC1 – vit. K epoxide reductase 
complex, subunit 1. 
• MF: quinone binding (IEA,
GO:0048038), vit. K epoxide reductase
activity (IDA, GO:0047057).
• BP: blood coagulation (IMP,
GO:0007596), bone development
(ISS, GO:0060348).
• CC: endoplasmic reticulum membrane 
(TAS, GO:0005789), integral 
component of membrane (IEA, 
GO:0016021). 
BLAST at i5K 
https://i5k.nal.usda.gov/blast
To Web Apollo
BLAST at i5K: hsps in “BLAST+ results” track 
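On the i5K server the HSPs appear directly as a track. If you instead run BLAST+ yourself with tabular output (`-outfmt 6`), each row is one HSP with twelve default columns, and the scaffold coordinates can be pulled out with a short Python sketch like the one below. The helper names and the two sample rows are invented, and the `scaffold:start..end` string is assumed to be the form accepted by the Search box.

```python
from typing import List, NamedTuple


class Hsp(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[Hsp]:
    """Parse default BLAST+ tabular rows: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hsps = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hsps.append(Hsp(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[8]), int(f[9]), float(f[10]), float(f[11])))
    return hsps


def best_location(hsps: List[Hsp]) -> str:
    """Location of the highest-scoring HSP as 'scaffold:start..end'."""
    best = max(hsps, key=lambda h: h.bitscore)
    low, high = sorted((best.sstart, best.send))  # minus-strand hits list start > end
    return f"{best.subject}:{low}..{high}"


# Invented rows for illustration; scaffold names and scores are not real results.
sample = (
    "VKORC1\tscaffold12\t48.5\t130\t60\t5\t20\t149\t30450\t30061\t2e-30\t112\n"
    "VKORC1\tscaffold7\t35.0\t60\t35\t2\t80\t139\t1200\t1379\t1e-05\t40.2\n"
)
print(best_location(parse_outfmt6(sample)))  # scaffold12:30061..30450
```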
Available Tracks 
Creating a new gene model: drag and drop 
• Web Apollo automatically calculates the longest open reading 
frame (ORF). In this case, the ORF includes the hsp. 
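For readers who have not computed an ORF by hand, the idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it scans only the three forward frames of an already spliced transcript, requires both an ATG and a stop codon, and uses the standard stop codons. Web Apollo's own calculation may handle partial models and other cases differently.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna: str) -> Optional[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) of the longest ATG...stop ORF.

    Returns None when no complete ORF exists. Ties keep the first ORF found.
    """
    seq = mrna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in the same frame
    return best


# Toy transcript with a 15-nt ORF in one frame and a 9-nt ORF in another.
transcript = "GGATGAAACCCGGGTAACATGTTTTGA"
orf = longest_orf(transcript)
print(orf, transcript[orf[0]:orf[1]])  # (2, 17) ATGAAACCCGGGTAA
```

When the longest ORF does not cover the HSP, that is a hint that the model may need editing, for example a different start codon or a split.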
Get Sequence 
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Flanking sequences (other gene models) vs. NCBI nr 
At 3’ end 
At 5’ end
Additional evidence in support of split 
Editing: split 
Finished model 
Information Editor 
• DBXRefs: NP_076869.1, H. sapiens, RefSeq 
• PubMed identifier: PMID:24337963 
• Gene Ontology IDs: GO:0048038, GO:0047057, GO:0007596, 
GO:0060348, GO:0005789, GO:0016021. 
• Comments. 
• Name, Symbol. 
• Approve / Delete radio button. 
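Identifiers typed into the Information Editor are easy to mistype, so it can help to check them before pasting. The Python sketch below collects the fields from this slide in a small record and checks the GO and PubMed identifier formats. The `InfoRecord` class is hypothetical and does not reflect Web Apollo's data model; the values are the ones shown on the slide.

```python
import re
from dataclasses import dataclass, field
from typing import List

GO_ID = re.compile(r"GO:\d{7}")   # GO identifiers are 'GO:' plus seven digits
PMID = re.compile(r"PMID:\d+")    # prefix style as written on the slide


@dataclass
class InfoRecord:
    """Hypothetical container mirroring the Information Editor fields."""
    name: str
    symbol: str
    dbxrefs: List[str] = field(default_factory=list)
    pubmed_ids: List[str] = field(default_factory=list)
    go_ids: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)
    status: str = ""  # "Approve", "Delete", or empty when not yet set

    def problems(self) -> List[str]:
        """List formatting problems; an empty list means nothing was found."""
        issues = [f"malformed GO ID: {g}" for g in self.go_ids
                  if not GO_ID.fullmatch(g)]
        issues += [f"malformed PubMed ID: {p}" for p in self.pubmed_ids
                   if not PMID.fullmatch(p)]
        if self.status not in ("", "Approve", "Delete"):
            issues.append(f"unknown status: {self.status}")
        return issues


record = InfoRecord(
    name="vit. K epoxide reductase complex, subunit 1",
    symbol="VKORC1",
    dbxrefs=["NP_076869.1"],
    pubmed_ids=["PMID:24337963"],
    go_ids=["GO:0048038", "GO:0047057", "GO:0007596",
            "GO:0060348", "GO:0005789", "GO:0016021"],
    status="Approve",
)
print(record.problems())  # []
```

The check covers format only; whether a GO term is appropriate for the gene still requires the curator's judgment.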
Arthropodcentric Thanks! 
AgriPest Base 
FlyBase 
Hymenoptera Genome Database 
VectorBase 
Acromyrmex echinatior 
Acyrthosiphon pisum 
Apis mellifera 
Atta cephalotes 
Bombus terrestris 
Camponotus floridanus 
Helicoverpa armigera 
Linepithema humile 
Manduca sexta 
Mayetiola destructor 
Nasonia vitripennis 
Pogonomyrmex barbatus 
Solenopsis invicta 
Tribolium castaneum… and you!
Thanks! 
• Berkeley Bioinformatics Open-source Projects 
(BBOP), Berkeley Lab: Web Apollo and Gene Ontology 
teams. Suzanna E. Lewis (PI). 
• Christine G. Elsik (PI). § University of Missouri. 
• Ian Holmes (PI). University of California, Berkeley. 
• Arthropod genomics community, i5K Steering 
Committee, Monica Poelchau at USDA/NAL, fringy 
Richards at HGSC-BCM, Alexie Papanicolaou at 
CSIRO, Oliver Niehuis at 1KITE http://www.1kite.org/,
BGI, and the Honey Bee Genome Sequencing 
Consortium. 
• Web Apollo is supported by NIH grants 
5R01GM080203 from NIGMS, and 5R01HG004483 
from NHGRI, and by the Director, Office of Science, 
Office of Basic Energy Sciences, of the U.S. 
Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission:
http://AlexanderWild.com and O. Niehuis.
• For your attention, thank you! 
Colleagues at BBOP 
Web Apollo 
Suzanna Lewis 
Gregg Helt 
Colin Diesh § 
Deepak Unni § 
Gene Ontology 
Chris Mungall 
Seth Carbon 
Heiko Dietze 
Web Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K

Web Apollo Tutorial for the i5K copepod research community.

  • 2. An introduction to Web Apollo. A webinar for the Eurytemora affinis research community. Monica Munoz-Torres, PhD | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Genomics Division, Lawrence Berkeley National Laboratory 29 August, 2014 UNIVERSITY OF CALIFORNIA
  • 3. Outline 1. What is Web Apollo?: • Definition & working concept. 2. Our Experience With Community Based Curation. 3. The Manual Annotation Process. 4. Becoming acquainted with Web Apollo. 5. Example. An introduction to Web Apollo. A webinar for the Eurytemora affinis research community. Outline 3
  • 4. During this webinar you will: • Learn to identify homologs of known genes of interest in your newly sequenced genome. • Become familiar with the environment and functionality of the Web Apollo genome annotation editing tool. Footer 4
  • 5. What is Web Apollo? • Web Apollo is a web-based, collaborative genomic annotation editing platform. We need annotation editing tools to modify and refine the precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. 1. What is Web Apollo? 5 Find more about Web Apollo at http://GenomeArchitect.org and Genome Biol 14:R93. (2013).
  • 6. Brief history of Apollo*: Biologists could finally visualize computational analyses and experimental evidence from genomic features and build manually-curated consensus gene structures. Apollo became a very popular, open source tool (insects, fish, mammals, birds, etc.). a. Desktop: one person at a time editing a specific region, annotations saved in local files; slowed down collaboration. b. Java Web Start: users saved annotations directly to a centralized database; potential issues with stale annotation data remained. 1. What is Web Apollo? 6 *
  • 7. Web Apollo • Browser-based tool integrated with JBrowse. • Two new tracks: “Annotation” and “DNA Sequence” • Allows for intuitive annotation creation and editing, with gestures and pull-down menus to create and modify transcripts and exons structures, insert comments (CV, freeform text), etc. • Customizable look & feel. • Edits in one client are instantly pushed to all other clients: Collaborative! 1. What is Web Apollo? 7
  • 8. Working Concept In the context of gene manual annotation, curation tries to find the best examples and/or eliminate most errors. To conduct manual annotation efforts: Gather and evaluate all available evidence using quality-control metrics to corroborate or modify automated annotation predictions. Perform sequence similarity searches (phylogenetic framework) and use literature and public databases to: • Predict functional assignments from experimental data. • Distinguish orthologs from paralogs, and classify gene membership in families and networks. Automated gene models Evidence: cDNAs, HMM domain searches, alignments with assemblies or genes from other species. Manual annotation & curation 2. In our experience. 8
  • 9. Dispersed, community-based gene manual annotation efforts. We continuously train and support hundreds of geographically dispersed scientists from many research communities, to perform biologically supported manual annotations using Web Apollo. – Gate keepers and monitoring. – Written tutorials. – Training workshops and geneborees. – Personalized user support. 2. In our experience. 9
  • 10. What we have learned. Harvesting expertise from dispersed researchers who assigned functions to predicted and curated peptides we have developed more interactive and responsive tools, as well as better visualization, editing, and analysis capabilities. 2. In our experience. 10 http://people.csail.mit.edu/fredo/PUBLI/Drawing/
  • 11. Collaborative Efforts Improved Automated Annotations In many cases, automated annotations have been improved (e.g: Apis mellifera. Elsik et al. BMC Genomics 2014, 15:86). Also, learned of the challenges of newer sequencing technologies, e.g.: – Frameshifts and indel errors – Split genes across scaffolds – Highly repetitive sequences To face these challenges, we train annotators in recovering coding sequences in agreement with all available biological evidence. 2. In our experience. 11
  • 12. It is helpful to work together. Scientific community efforts bring together domain-specific and natural history expertise that would otherwise remain disconnected. Breaking down large amounts of data into manageable portions and mobilizing groups of researchers to extract the most accurate representation of the biology from all available data distills invaluable knowledge from genome analysis. 2. In our experience. 12
  • 13. Understanding the evolution of sociality Comparing the genomes of 7 species of ants contributed to a better understanding of the evolution and organization of insect societies at the molecular level. Insights drawn mainly from six core aspects of ant biology: 1. Alternative morphological castes 2. Division of labor 3. Chemical Communication 4. Alternative social organization 5. Social immunity 6. Mutualism 13 Libbrecht et al. 2012. Genome Biology 2013, 14:212 Groups of communities continue to guide our efforts. Atta cephalotes (above) and Harpegnathos saltator. ©alexanderwild.com 2. In our experience.
  • 14. A little training goes a long way! With the right tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models. 2. In our experience. 14
  • 15. Manual Annotation How do we get there? 15 Assembly Manual annotation Experimental validation Automated Annotation In a genome sequencing project… 3. How do we get there?
  • 16. Gene Prediction Identification of protein-coding genes, tRNAs, rRNAs, regulatory motifs, repetitive elements (masked), etc. - Ab initio (DNA composition): Augustus, GENSCAN, geneid, fgenesh - Homology-based: E.g: SGP2, fgenesh++ 16 Nucleic Acids 2003 vol. 31 no. 13 3738-3741 3. How do we get there?
  • 17. Gene Annotation Integration of data from prediction tools to generate a consensus set of predictions or gene models. • Models may be organized using: - automatic integration of predicted sets; e.g: GLEAN - packaging necessary tools into pipeline; e.g: MAKER • All available biological evidence (e.g. transcriptomes) further informs the annotation process. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation; in such cases it is usually better to use an ab initio model to create a new annotation. 3. How do we get there? 17
  • 18. Manual Genome Annotation • Identifies elements that best represent the underlying biology. • Eliminates elements that reflect the systemic errors of automated genome analyses. • Determines functional roles through comparative analysis of well-studied, phylogenetically similar genome elements using literature, databases, and the researcher’s experience. 3. How do we get there? 18
  • 19. The Curation Process Is Necessary. 1. A computationally predicted consensus gene set is generated using multiple lines of evidence. 2. Manual annotation takes place. 3. Ideally, consensus computational predictions will be integrated with manual annotations to produce an updated Official Gene Set (OGS). Otherwise, “incorrect and incomplete genome annotations will poison every experiment that uses them”. - M. Yandell.
  • 20. The Collaborative Curation Process at i5K. 1) A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. Consensus Gene EAFF_v0.5.3-Models. 2) i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS): » If it’s not on either track, it won’t make the OGS! » If it’s there and it shouldn’t be, it will still make the OGS!
  • 21. Consensus set: reference and start point • In some cases the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation; in such cases, use an ab initio model (e.g. Augustus) instead to create a new annotation. • Isoforms: drag the original and the alternatively spliced form to the ‘User-created Annotations’ area. • If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label it as ‘Delete’ in the Information Editor. • Overlapping interests? Collaborate to reach agreement. • Follow the guidelines for i5K Pilot Species Projects as shown at http://goo.gl/LRu1VY
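The integration rule described on slides 20 and 21 can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual i5K merge procedure: the `Model` class, the `status` flag, and the rule that any overlapping manual annotation replaces a consensus model are hypothetical simplifications (strand and isoforms are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A gene model on one scaffold (hypothetical structure for illustration)."""
    model_id: str
    scaffold: str
    start: int            # 1-based, inclusive
    end: int              # inclusive
    status: str = "keep"  # assumed flag: "keep" or "delete" (as set in the Information Editor)


def overlaps(a: Model, b: Model) -> bool:
    """True if the two models share at least one base on the same scaffold."""
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end


def merge_into_ogs(consensus: list[Model], manual: list[Model]) -> list[Model]:
    """Combine consensus and manual tracks into a draft gene set.

    Assumed rules: manual annotations win over overlapping consensus models;
    a manual annotation labelled "delete" removes the overlapping consensus
    model and is itself left out; untouched consensus models pass through,
    whether or not they are correct.
    """
    ogs = [m for m in manual if m.status != "delete"]
    for c in consensus:
        if not any(overlaps(c, m) for m in manual):
            ogs.append(c)
    return sorted(ogs, key=lambda m: (m.scaffold, m.start))
```

Note the consequence spelled out on slide 20: a wrong consensus model that nobody drags to the ‘User-created Annotations’ area passes straight through this kind of merge.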
  • 23. Web Apollo: The Sequence Selection Window (callout: Sort). 4. Becoming Acquainted with Web Apollo.
  • 24. Web Apollo Graphical User Interface (GUI) for editing annotations. Navigation tools: pan and zoom. Search box: go to a scaffold or a gene model. Grey bar of coordinates: indicates location; you can also select here in order to zoom to a sub-region. ‘View’: change color by CDS, toggle strands, set highlight. ‘File’: upload your own evidence (GFF3, BAM, BigWig, VCF*); add combination and sequence search tracks. ‘Tools’: use BLAT to query the genome with a protein or DNA sequence. Panel labels: Available Tracks; ‘User-created Annotations’ Track; Evidence Tracks Area; Login.
  • 25. Web Apollo: selection of features and sub-features; edge-matching. The editing logic in the server: - selects the longest ORF as the CDS - flags non-canonical splice sites. Panel labels: ‘User-created Annotations’ Track; Evidence Tracks Area.
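The two server-side checks named on slide 25 can be illustrated with a minimal sketch. This is not Web Apollo's implementation: the function names are hypothetical, only the forward strand is handled, only complete ORFs (ATG to stop) are considered, and "canonical" is assumed to mean a GT donor and an AG acceptor.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG..stop ORF, 0-based half-open.

    Scans the three forward frames of a spliced transcript sequence.
    Returns (0, 0) when no complete ORF is found.
    """
    seq = transcript.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)      # include the stop codon
                start = None
    return best


def noncanonical_splice_sites(genome: str, exons: list[tuple[int, int]]) -> list[tuple[str, int]]:
    """Flag introns that do not start with GT or do not end with AG.

    `exons` are 0-based half-open (start, end) intervals on the forward strand.
    Each flag is ("donor" | "acceptor", genomic position of the intron boundary).
    """
    flags = []
    ordered = sorted(exons)
    for (_, intron_start), (intron_end, _) in zip(ordered, ordered[1:]):
        intron = genome[intron_start:intron_end].upper()
        if intron[:2] != "GT":
            flags.append(("donor", intron_start))
        if intron[-2:] != "AG":
            flags.append(("acceptor", intron_end))
    return flags
```

A real editor also has to handle the reverse strand, partial ORFs at scaffold edges, and rarer splice site classes, all of which are left out here to keep the sketch short.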
  • 26. Web Apollo: there are two new kinds of tracks, one for annotation editing (the ‘User-created Annotations’ Track) and one for sequence alteration editing (the DNA Track).
  • 27. Web Apollo: annotations, annotation edits, and History are stored in a centralized database.
  • 28. Web Apollo: The Information Editor • DBXRefs • PubMed IDs • GO terms • Comments
  • 29. Additional Functionality. In addition to the protein-coding gene annotation that you know and love: • Non-coding genes: ncRNAs, miRNAs, repeat regions, and TEs • Sequence alterations (less coverage = more fragmentation) • Visualization of stage and cell-type specific transcription data as coverage plots, heat maps, and alignments
  • 30. How to begin curating. To find the gene region you wish to annotate, you may use: a) a protein sequence from another species; b) a sequence from a similar gene; c) your own alignments of gene models or transcriptomic data to the genome; d) high-quality proteins and/or gene family alignments (multi- or single-species) in which you are able to identify conserved domains. Option 1 – You have a sequence but don’t know where it is in this genome: • Use BLAT in the Web Apollo window, or BLAST at NAL’s i5k BLAST server, available at: http://i5k.nal.usda.gov/blastn • Alternatively, use any other tool; for example, Geneious. Option 2 – The genome has already been annotated with your sequences and you have a gene identifier that has been indexed in Web Apollo. • That is, you know where to look, so type the ID in the Search box of Web Apollo. • Web Apollo autocompletes using a case-insensitive search anchored on the left-hand side of the word. For example, “HaGR” will show all “hagr” objects (up to 30). • Choose one of the genes and click “Go”. • You can do the same with Domains, Alignments or Gene names provided to you (if they have been indexed). Option 3 – Find genes based on functional ontology terms or network membership identifiers.
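The autocomplete behaviour described under Option 2 (case-insensitive, anchored on the left-hand side, at most 30 hits) can be sketched as follows. The function name and the plain list of indexed names are assumptions made for illustration; Web Apollo's actual index and result ordering are not described in these slides.

```python
def autocomplete(query: str, indexed_names: list[str], limit: int = 30) -> list[str]:
    """Return up to `limit` indexed names that start with `query`, ignoring case."""
    prefix = query.lower()
    matches = [name for name in indexed_names if name.lower().startswith(prefix)]
    return sorted(matches)[:limit]


if __name__ == "__main__":
    names = ["HaGR1", "hagr2", "HAGR10", "GR_Ha", "Hb1"]
    # Names that merely contain "hagr" elsewhere, or share only part of the
    # prefix, are not returned because the match is anchored at the left.
    print(autocomplete("HaGR", names))
```

Because the match is anchored at the start of the word, searching with the leading characters of an identifier works, while searching with a fragment from its middle does not.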
  • 31. General Process of Curation 1. Select the chromosomal region of interest, e.g. scaffold. 2. Select appropriate evidence tracks. 3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working. - If yes: select and drag the feature to the ‘User-created Annotations’ area, creating an initial gene model. If necessary use editing functions to adjust the gene model. - Nothing available to you? Let’s have a talk. 4. Check your edited gene model for integrity and accuracy by comparing it with available homologs. Always remember: when annotating gene models using Web Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
  • 32. Example. Introductory demonstration using the Apis mellifera genome. Q&A session using the Eurytemora affinis genome at https://apollo.nal.usda.gov/euraff/selectTrack.jsp A public Honey Bee Web Apollo Demo is available at http://genomearchitect.org/WebApolloDemo
  • 33. What do we know for this species? • What data are currently available? • At NCBI: • 5,570 nucleotide sequences → scaffolds • 446 amino acid sequences → CO-I • 0 conserved domains identified • 0 “gene” entries submitted
  • 34. PubMed Search: what’s new? Empirical examples of beneficial reversal of dominance: • Warfarin resistance: mutation of VKORC1 is associated with an increased dietary requirement for vitamin K
  • 35. How many sequences for your gene of interest? And what do we know about it? • VKORC1 – vitamin K epoxide reductase complex, subunit 1. • MF: quinone binding (IEA, GO:0048038), vitamin K epoxide reductase activity (IDA, GO:0047057). • BP: blood coagulation (IMP, GO:0007596), bone development (ISS, GO:0060348). • CC: endoplasmic reticulum membrane (TAS, GO:0005789), integral component of membrane (IEA, GO:0016021).
  • 36. BLAST at i5K: https://i5k.nal.usda.gov/blast (callout: To Web Apollo)
  • 37. BLAST at i5K: HSPs (high-scoring segment pairs) in “BLAST+ results” track
  • 39. Creating a new gene model: drag and drop • Web Apollo automatically calculates the longest open reading frame (ORF). In this case, the ORF includes the HSP.
  • 40. Get Sequence. http://blast.ncbi.nlm.nih.gov/Blast.cgi
  • 41. Flanking sequences (other gene models) vs. NCBI nr. Panels: at 5’ end; at 3’ end.
  • 42. Additional evidence in support of split
  • 45. Information Editor • DBXRefs: NP_076869.1, H. sapiens, RefSeq • PubMed identifier: PMID:24337963 • Gene Ontology IDs: GO:0048038, GO:0047057, GO:0007596, GO:0060348, GO:0005789, GO:0016021. • Comments. • Name, Symbol. • Approve / Delete radio button.
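Before pasting identifiers into the Information Editor, it can help to check that they are well formed. The sketch below is a stand-alone helper, not part of Web Apollo; it assumes the conventional formats shown on the slide, i.e. `GO:` followed by seven digits and `PMID:` followed by digits, and the function name is hypothetical.

```python
import re

# Assumed identifier formats, matching the examples on the slide.
GO_ID = re.compile(r"GO:\d{7}")
PMID = re.compile(r"PMID:\d+")


def invalid_ids(identifiers: list[str], pattern: re.Pattern) -> list[str]:
    """Return the identifiers that do not fully match the given pattern."""
    return [ident for ident in identifiers if not pattern.fullmatch(ident)]


if __name__ == "__main__":
    go_ids = ["GO:0048038", "GO:0047057", "GO:0007596",
              "GO:0060348", "GO:0005789", "GO:0016021"]
    print(invalid_ids(go_ids, GO_ID))           # identifiers with a malformed GO format
    print(invalid_ids(["PMID:24337963"], PMID))  # identifiers with a malformed PMID format
```

A format check only catches typing errors such as a dropped digit; it does not confirm that the term exists or that it is appropriate for the gene.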
  • 46. Arthropodcentric Thanks! AgriPest Base, FlyBase, Hymenoptera Genome Database, VectorBase; Acromyrmex echinatior, Acyrthosiphon pisum, Apis mellifera, Atta cephalotes, Bombus terrestris, Camponotus floridanus, Helicoverpa armigera, Linepithema humile, Manduca sexta, Mayetiola destructor, Nasonia vitripennis, Pogonomyrmex barbatus, Solenopsis invicta, Tribolium castaneum… and you!
  • 47. Thanks! • Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI). • Christine G. Elsik (PI). § University of Missouri. • Ian Holmes (PI). University of California, Berkeley. • Arthropod genomics community, i5K Steering Committee, Monica Poelchau at USDA/NAL, Stephen Richards at HGSC-BCM, Alexie Papanicolaou at CSIRO, Oliver Niehuis at 1KITE http://www.1kite.org/, BGI, and the Honey Bee Genome Sequencing Consortium. • Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. • Insect images used with permission: http://AlexanderWild.com and O. Niehuis. • For your attention, thank you! Colleagues at BBOP. Web Apollo: Suzanna Lewis, Gregg Helt, Colin Diesh §, Deepak Unni §. Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze. Web Apollo: http://GenomeArchitect.org GO: http://GeneOntology.org i5K: http://arthropodgenomes.org/wiki/i5K