An Introduction to Web Apollo 
Manual Annotation Workshop at University of Exeter 
Monica Munoz-Torres, PhD | @monimunozto 
Berkeley Bioinformatics Open-Source Projects (BBOP) 
Genomics Division, Lawrence Berkeley National Laboratory 
At University of Exeter. October 8, 2014
TEACHING MATERIALS FOR TODAY 
Demo instance 1: http://genomes.missouri.edu:8080/Amel_4.5_demo_1/selectTrack.jsp
Demo instance 2: http://genomes.missouri.edu:8080/Amel_4.5_demo_2/selectTrack.jsp
Recommended browser: Chrome
OUTLINE 
• MANUAL ANNOTATION: working concept
• COMMUNITY BASED CURATION: in our experience
• APOLLO: empowering collaborative curation
• APOLLO on THE WEB: becoming acquainted
• PRACTICE: demonstration and exercises
Web Apollo: Collaborative Curation and Interactive Analysis of Genomes
DURING THIS WORKSHOP 
you will 
v Learn 
to 
idenRfy 
homologs 
of 
known 
genes 
of 
interest 
in 
a 
newly 
sequenced 
genome 
of 
interest. 
v Become 
familiar 
with 
the 
environment 
and 
funcRonality 
of 
the 
Web 
Apollo 
genome 
annotaRon 
ediRng 
tool. 
v Learn 
how 
to 
corroborate 
and 
/ 
or 
modify 
automaRcally 
annotated 
gene 
models 
using 
available 
biological 
evidence 
in 
Web 
Apollo. 
v Understand 
the 
process 
of 
curaRon 
in 
the 
context 
of 
genome 
annotaRon: 
from 
the 
assembled 
genome 
to 
manual 
curaRon 
via 
automated 
annotaRon. 
4
I INVITE YOU TO: 
v Observe 
the 
figures 
v Listen 
to 
the 
explanaRons 
v Interrupt 
me 
at 
any 
Rme 
to 
ask 
quesRons 
v Use 
Twi/er 
& 
share 
your 
thoughts: 
I 
am 
@monimunozto 
Some 
tags 
& 
users: 
#WebApollo 
#AnnotaRon 
#CuraRon 
#GMOD 
#genome 
@JBrowseGossip 
v Take 
brakes: 
LBL’s 
ergo 
safety 
team 
suggests 
I 
should 
not 
work 
at 
the 
computer 
for 
>45 
minutes 
without 
a 
break; 
neither 
should 
you! 
We 
will 
be 
here 
for 
2.5 
hours: 
please 
get 
up 
and 
stretch 
your 
neck, 
arms, 
and 
legs 
as 
oeen 
as 
you 
need. 
5
I kindly ask that you refrain from: 
v Reading 
all 
that 
text 
I 
wrote! 
Think 
of 
the 
text 
on 
these 
slides 
as 
your 
“class 
notes”. 
You 
will 
use 
them 
during 
exercises. 
v Checking 
email. 
I’d 
like 
to 
kindly 
ask 
for 
your 
undivided 
a/enRon.
Let Us Get Started
MANUAL ANNOTATION 
working concept 
v Automated 
genome 
analyses 
remain 
an 
imperfect 
art 
that 
cannot 
yet 
resolve 
all 
elements 
of 
the 
genome. 
v Precise 
elucidaRon 
of 
biological 
features 
encoded 
in 
the 
genome 
requires 
careful 
examinaRon 
and 
review. 
Schiex 
et 
al. 
Nucleic 
Acids 
2003 
(31) 
13: 
3738-­‐3741 
Automated Predictions 
Experimental Evidence 
cDNAs, 
HMM 
domain 
searches, 
RNAseq, 
genes 
from 
other 
species. 
Manual Curation 8
GENE PREDICTION
• Identification of protein-coding genes, tRNAs, rRNAs, regulatory motifs, repetitive elements (masked), etc.
  – Ab initio (DNA composition): Augustus, GENSCAN, geneid, fgenesh
  – Homology-based: e.g. SGP2, fgenesh++
Nucleic Acids Res. 2003, 31(13):3738-3741
GENE ANNOTATION 
Integration of data from prediction tools to generate a consensus set of predictions or gene models.
• Models may be organized using:
  – automatic integration of predicted sets; e.g. GLEAN
  – packaged tools from pipeline; e.g. MAKER
• All available biological evidence (e.g. transcriptomes) further informs the annotation process.
In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation; in such cases it is usually better to use an ab initio model to create a new annotation.
MANUAL ANNOTATION 
is necessary 
v Evaluate 
all 
available 
evidence 
and 
corroborate 
or 
modify 
genome 
element 
predicRons. 
v Determine 
funcRonal 
roles 
through 
comparaRve 
analysis 
using 
literature, 
databases, 
and 
experience*. 
v Resolve 
discrepancies 
and 
validate 
automated 
gene 
model 
hypotheses. 
v Desktop 
version 
of 
Apollo 
was 
designed 
to 
fit 
the 
manual 
annotaRon 
needs 
of 
genome 
projects 
such 
as 
fruit 
fly, 
mouse, 
zebrafish, 
human, 
etc. 
Manual Curation 11 
Automated Predictions 
Curated Gene Models 
Official Gene Set 
“Incorrect 
and 
incomplete 
genome 
annota-ons 
will 
poison 
every 
experiment 
that 
uses 
them”. 
-­‐ 
M. 
Yandell
BUT, MANUAL CURATION 
did not always scale well 
Too many sequences and not enough hands to approach curation.
1. Museum: A small group of highly trained experts; e.g. GO
2. Jamboree: A few very good biologists and a few very good bioinformaticians camp together, during intense but short periods of time.
3. Cottage: Researchers work by themselves, then may or may not publicize results; … may be a dead-end with very few people ever aware of these results.
Elsik et al. 2006. Genome Res. 16(11):1329-33.
POWER TO THE CURATORS 
augment existing tools 
Give more people the power to curate!
Fill in the gap for all the things that won’t be easy to cover with these approaches; this will allow researchers to better contribute their efforts.
Big data are not a substitute for, but a supplement to traditional data collection and analysis. The Parable of Google Flu. Lazer et al. 2014. Science 343 (6176): 1203-1205.
• Enable more curators to work
• Enable better scientific publishing
• Credit curators for their work
IMPROVING TOOLS FOR MANUAL ANNOTATION 
our plan
“More and more sequences”: more genomes, within populations and across species, are now being sequenced. This calls for a universally accessible genome curation tool:
• To produce accurate sets of genomic features.
• To address the need to correct for more frequent assembly and automated prediction errors due to new sequencing technologies.
GENOME ANNOTATION 
an inherently collaborative task 
Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families). To facilitate and encourage this, we continue to improve Apollo.
New JavaScript-based Apollo: http://GenomeArchitect.org
• Web based for easy access.
• Concurrent access supports real time collaboration.
• Built-in support for standards (transparently compliant).
• Automatic generation of ready-made computable data.
• Client-side application relieves server bottleneck and supports privacy.
• Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
APOLLO
WEB APOLLO 
v Integrated 
with 
JBrowse. 
v Two 
new 
tracks: 
“AnnotaRons” 
and 
“DNA 
Sequence” 
v IntuiRve 
annotaRon, 
gestures 
and 
pull-­‐down 
menus 
to 
create 
and 
edit 
transcripts 
and 
exons 
structures, 
insert 
comments 
(CV, 
freeform 
text), 
etc. 
v Customizable 
look, 
feel 
& 
funcRonality. 
v Edits 
in 
one 
client 
are 
instantly 
pushed 
to 
all 
other 
clients: 
CollaboraRve! 
16 
APOLLO
WEB APOLLO 
v Provides 
dynamic 
access 
to 
genomic 
analysis 
results 
from 
UCSC 
and 
Chado 
databases, 
as 
well 
as 
database 
storage 
of 
user-­‐created 
annotaRons. 
v All 
user-­‐created 
sequence 
annotaRons 
are 
automaRcally 
uploaded 
to 
a 
server, 
ensuring 
reliability. 
17 
Chado 
UCSC 
(MySQL) 
Ensembl 
(DAS) 
BAM 
BED 
BigWig 
GFF3 
MAKER 
output 
APOLLO
WEB APOLLO 
architecture 
DISPERSED COMMUNITIES 
collaborative manual annotation efforts 
We continuously train and support hundreds of geographically dispersed scientists from many research communities to conduct manual annotations, recovering coding sequences in agreement with all available biological evidence using Web Apollo.
• Gate keeping and monitoring.
• Tutorials, training workshops, and geneborees.
• Personalized user support.
CURATION 
in this context 
1. Identifies elements that best represent the underlying biology (including missing genes) and eliminates elements that reflect systemic errors of automated analyses.
2. Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and researchers’ lab data.
Examples
Comparing 7 ant genomes contributed to better understanding evolution and organization of insect societies at the molecular level; e.g. division of labor, mutualism, chemical communication, etc. Libbrecht et al. 2013. Genome Biology 14:212
[Figure labels: Queen Bee; Worker Bee; Castes; Larva; Insect Methylome; Dnmt; Royal jelly; RNAi] Kucharski et al. 2008. Science 319(5871):1827-1830
Anchoring molecular markers to reference genome pointed to chromosomal rearrangements & detecting signals of adaptive radiation in Heliconius butterflies. Joron et al. 2011. Nature 477:203-206
WORKING TOGETHER 
we have obtained better results 
Scientific community efforts bring together domain-specific and natural history expertise that would otherwise remain disconnected.
Breaking down large amounts of data into manageable portions and mobilizing groups of researchers to extract the most accurate representation of the biology from all available data distills invaluable knowledge from genome analysis.
CURRENT COLLABORATIONS 
training and contributions 
Partnerships: University of Missouri; National Agricultural Library
Phlebotomus papatasi
Wasmannia auropunctata
Norwegian Spruce: http://congenie.org/
Tallapoosa darter: http://darter2.westga.edu/
Pinus taeda: http://dendrome.ucdavis.edu/treegenes/browsers/
Homo sapiens hg19
Nature Reviews Genetics 2009 (10), 346-347
TRAINING CURATORS 
a little training goes a long way! 
Provided with the right tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
Web Apollo Q-ratore
WEB APOLLO 
the sequence selection window 
[Screenshot label: Sort]
Becoming Acquainted with Web Apollo.
WEB APOLLO 
graphical user interface (GUI) for editing annotations 
Navigation tools: pan and zoom
Grey bar of coordinates indicates location. You can also select here in order to zoom to a sub-region.
Search box: go to a scaffold or a gene model.
‘View’: change color by CDS, toggle strands, set highlight.
‘File’: Upload your own evidence: GFF3, BAM, BigWig, VCF*. Add combination and sequence search tracks.
‘Tools’: Use BLAT to query the genome with a protein or DNA sequence.
[Screenshot labels: Available Tracks; ‘User-created Annotations’ Track; Evidence Tracks Area; Login]
WEB APOLLO 
additional functionality 
In addition to the protein-coding gene annotation that you know and love:
• Non-coding genes: ncRNAs, miRNAs, repeat regions, and TEs
• Sequence alterations (less coverage = more fragmentation)
• Visualization of stage and cell-type specific transcription data as coverage plots, heat maps, and alignments
GENERAL PROCESS OF CURATION 
steps to remember 
1. Select a chromosomal region of interest, e.g. scaffold.
2. Select appropriate evidence tracks.
3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
   – If yes: select and drag the feature to the ‘User-created Annotations’ area, creating an initial gene model. If necessary use editing functions to adjust the gene model.
   – If not: let’s talk.
4. Check your edited gene model for integrity and accuracy by comparing it with available homologs.
Always remember: when annotating gene models using Web Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
USER NAVIGATION
Choose (click or drag) appropriate evidence tracks from the list on the left.
Click on an exon to select it. Double click on an exon or single click on an intron to select the entire gene.
Select & drag any elements from an evidence track into the curation area: these are editable and considered the curated version of the gene. Other options for elements in evidence tracks are available from the right-click menu.
If you select an exon or a gene, then every track is automatically searched for exons with exactly the same co-ordinates as what you selected. Matching edges are highlighted red.
Hovering over an annotation in progress brings up an information pop-up.
USER NAVIGATION
Right-click menu:
• With the exception of deleting a model, all edits can be reversed with the ‘Undo’ option. ‘Redo’ is also available. All changes are immediately saved and available to all users in real time.
• ‘Get sequence’ retrieves peptide, cDNA, CDS, and genomic sequences.
• You can select an exon and select ‘Delete’. You can create an intron, flip the direction, change the start or split the gene.
USER NAVIGATION
Right-click menu:
• If you select two gene models, you can join them using ‘Merge’, and you may also ‘Split’ a model.
• You can select ‘Duplicate’, for example to annotate isoforms.
• Set translation start, annotate selenocysteine-containing proteins, match edges of annotation to those of evidence tracks.
USER NAVIGATION
Annotations, annotation edits, and History: stored in a centralized database.
USER NAVIGATION
The Annotation Information Editor
DBXRefs are database cross-references: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here.
USER NAVIGATION
The Annotation Information Editor
• Add PubMed IDs
• Include GO terms as appropriate from any of the three ontologies
• Write comments stating how you have validated each model.
USER NAVIGATION
• ‘Zoom to base level’ option reveals the DNA Track.
• Change color of exons by CDS from the ‘View’ menu.
• The reference DNA sequence is visible in both directions as are the protein translations in all six frames. You can toggle either direction to display only 3 frames.
Zoom in/out with keyboard: shift + arrow keys up/down
Web Apollo User Guide 
(Fragment) 
http://genomearchitect.org/web_apollo_user_guide
ANNOTATING SIMPLE CASES 
In a “simple case” the predicted gene model is correct or nearly correct, and this model is supported by evidence that completely or mostly agrees with the prediction. Evidence that extends beyond the predicted model is assumed to be non-coding sequence.
The following sections describe simple modifications.
ADDING EXONS 
Select and drag the putative new exon from a track, and add it directly to an annotated transcript in the ‘User-created Annotations’ area.
• Click the exon, hold your finger on the mouse button, and drag the cursor until it touches the receiving transcript. A dark green highlight indicates it is okay to release the mouse button.
• When released, the additional exon becomes attached to the receiving transcript.
• A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
ADDING EXONS 
Each time you add an exon region, whether by extension or adding an exon, Web Apollo recalculates the longest ORF, identifying ‘Start’ and ‘Stop’ signals and allowing you to determine whether a ‘Stop’ codon has been incorporated after each editing step.
Web Apollo requires that an exon already exist as evidence in one of the tracks. You could provide a text file in GFF format and select File → Open.
GFF is a simple text file delimited by TABs, one line for each genomic ‘feature’: column 1 is the name of the scaffold; then some text (irrelevant), then ‘exon’, then start, stop, a dot, strand as + or -, another dot, and Name=some name
Example (columns separated by TABs):
scaffold_88 Qratore exon 21 2111 . + . Name=bob
scaffold_88 Qratore exon 2201 5111 . + . Name=rad
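The GFF description above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard GFF3 column order (seqid, source, type, start, end, score, strand, phase, attributes); the helper names `gff3_exon_line` and `exons_to_gff3` are hypothetical and not part of Web Apollo.

```python
# Hypothetical helpers (not part of Web Apollo): build GFF3 text for exons.
def gff3_exon_line(seqid, start, end, strand, name, source="Qratore"):
    """Return one tab-delimited, 9-column GFF3 line for an exon.

    Coordinates are 1-based and inclusive, as in GFF3.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not 1 <= start <= end:
        raise ValueError("need 1 <= start <= end")
    columns = [
        seqid,            # column 1: scaffold name
        source,           # column 2: free text naming the source
        "exon",           # column 3: feature type
        str(start),       # column 4: start
        str(end),         # column 5: end
        ".",              # column 6: score (unused here)
        strand,           # column 7: strand
        ".",              # column 8: phase (only meaningful for CDS)
        f"Name={name}",   # column 9: attributes
    ]
    return "\t".join(columns)


def exons_to_gff3(exons):
    """Return GFF3 text for (seqid, start, end, strand, name) tuples."""
    lines = ["##gff-version 3"]
    lines += [gff3_exon_line(*exon) for exon in exons]
    return "\n".join(lines) + "\n"


EXONS = [
    ("scaffold_88", 21, 2111, "+", "bob"),
    ("scaffold_88", 2201, 5111, "+", "rad"),
]

print(exons_to_gff3(EXONS), end="")
```

Saving the printed text to a file gives something that could be offered as an evidence track; whether a given Web Apollo instance accepts it depends on its configuration.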
ADDING UTRs 
Gene predictions may or may not include UTRs. If transcript alignment data are available and extend beyond your original annotation, you may extend or add UTRs.
1. Position the cursor at the beginning of the exon that needs to be extended and ‘Zoom to base level’.
2. Place the cursor over the edge of the exon until it becomes a black arrow then click and drag the edge of the exon to the new coordinate position that includes the UTR.
To add a new spliced UTR to an existing annotation follow the procedure for adding an exon.
Figure: View zoomed to base level. The DNA track and annotation track are visible. The DNA track includes the sense strand (top) and anti-sense strand (bottom). The six reading frames flank the DNA track, with the three forward frames above and the three reverse frames below. The User-created Annotation track shows the terminal end of an annotation. The green rectangle highlights the location of the nucleotide residues in the ‘Stop’ signal.
EXON STRUCTURE INTEGRITY 
1. Zoom in sufficiently to clearly resolve each exon as a distinct rectangle.
2. Two exons from different tracks sharing the same start and/or end coordinates will display a red bar to indicate the matching edges.
3. Selecting the whole annotation or one exon at a time, use this ‘edge-matching’ function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon.
4. Note if there are cDNA / RNAseq reads that lack one or more of the annotated exons or include additional exons.
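The edge-matching idea in the steps above can be illustrated with a small sketch. It assumes exons are plain (start, end) tuples and compares starts with starts and ends with ends; the function name `matching_edges` is hypothetical and this is not Web Apollo's own implementation.

```python
def matching_edges(selected, evidence):
    """Find evidence exons sharing a start and/or end with a selected exon.

    Both arguments are lists of (start, end) tuples. The result maps each
    matching evidence exon to the set of shared edges: 'start', 'end', or both.
    """
    starts = {start for start, _ in selected}
    ends = {end for _, end in selected}
    hits = {}
    for exon in evidence:
        shared = set()
        if exon[0] in starts:
            shared.add("start")
        if exon[1] in ends:
            shared.add("end")
        if shared:
            hits[exon] = shared
    return hits


annotation = [(100, 200), (300, 400)]
rnaseq_track = [(100, 180), (300, 400), (500, 600)]
for exon, edges in matching_edges(annotation, rnaseq_track).items():
    print(exon, sorted(edges))
```

Evidence exons that do not appear in the result, such as (500, 600) here, are the ones worth a closer look at base level.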
EXON STRUCTURE INTEGRITY 
To modify an exon boundary and match data in the evidence tracks: select both the offending exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.
In some cases all the data may disagree with the annotation, in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
EDITING LOGIC 
The editing logic in the server:
• selects longest ORF as CDS
• flags non-canonical splice sites
[Screenshot labels: Flags non-canonical splice sites; Selection of features and sub-features; Edge-matching; ‘User-created Annotations’ Track; Evidence Tracks Area]
SPLICE SITES 
Zoom to base level to review non-canonical splice site warnings. These do not necessarily need to be corrected, but should be flagged with the appropriate comment.
Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
Canonical splice sites:
forward strand: 5’-…exon]GT / AG[exon…-3’
reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
Most insects have a valid non-canonical site GC-AG. Other non-canonical splice sites are unverified. Web Apollo flags GC splice donors as non-canonical.
[Figure labels: Original model; Curated model; Exon/intron junction possible error]
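The splice-site rules above (GT/AG canonical, GC/AG as a commonly accepted exception) can be checked programmatically. The sketch below is an illustration only: the function names and the return labels are hypothetical, and intron coordinates are assumed to be 1-based and inclusive on the forward strand.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_intron(genome, intron_start, intron_end, strand="+"):
    """Classify an intron by its donor and acceptor dinucleotides.

    Coordinates are 1-based, inclusive, and given on the forward strand.
    For '-' strand introns the sequence is reverse-complemented first so
    that the donor is always read at the 5' end of the intron.
    """
    intron = genome[intron_start - 1:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold donor and acceptor")
    if strand == "-":
        intron = reverse_complement(intron)
    donor, acceptor = intron[:2], intron[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-canonical (GC donor)"
    return "non-canonical"


print(classify_intron("AAAGTCCCCAGTTT", 4, 11))
print(classify_intron("AAAGCCCCCAGTTT", 4, 11))
```

A "non-canonical (GC donor)" result corresponds to the case that would still be flagged but is often biologically valid; any other non-canonical result calls for review against RNA-Seq evidence.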
SPLICE SITES 
keep this in mind 
Some gene prediction algorithms do not recognize GC splice sites, thus the intron/exon junction may be incorrect. For example, one such gene prediction algorithm may ignore a true GC donor and select another non-canonical splice site that is less frequently observed in nature.
Therefore, if upon inspection you find a non-canonical splice site that is rarely observed in nature, you may wish to search the region for a more frequent in-frame non-canonical splice site, such as a GC donor. If there is a nearby in-frame site that is more likely to be the correct splice donor, you may make this adjustment while zoomed at base level.
Use RNA-Seq data to make a decision.
Canonical splice sites:
forward strand: 5’-…exon]GT / AG[exon…-3’
reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
[Figure labels: Original model; Curated model; Exon/intron junction possible error]
‘START’ AND ‘STOP’ SITES 
Web Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.
If it appears to have calculated an incorrect ‘Start’ signal, you may modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (protein database, additional evidence tracks). An upstream ‘Start’ codon may be present outside the predicted gene model, within a region supported by another evidence track.
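To make the longest-ORF idea concrete, here is a minimal sketch. It assumes a spliced transcript sequence read 5' to 3', a canonical ATG start, and the standard stop codons; the function name `longest_orf` is hypothetical and the logic is a simplification, not Web Apollo's server code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest ATG-to-stop ORF, or None.

    Positions are 0-based and half-open, so transcript[start:end] is the
    ORF including its stop codon. All three forward frames are scanned;
    on ties the first ORF found is kept.
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start since the previous stop
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


transcript = "CCATGAAATTTTAGGG"
orf = longest_orf(transcript)
print(orf, transcript[orf[0]:orf[1]])
```

Choosing a different in-frame ‘Start’ by hand, as described above, amounts to overriding the `start` position this kind of scan would pick.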
‘START’ AND ‘STOP’ SITES 
keep this in mind 
Note that the ‘Start’ codon may also be located in a non-predicted exon further upstream. If you cannot identify that exon, add the appropriate note in the transcript’s ‘Comments’ section.
In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
In some cases, a ‘Stop’ codon may not be automatically identified. Check to see if there are data supporting a 3’ extension of the terminal exon or additional 3’ exons with valid splice sites.
Web Apollo Workshop University of Exeter
COMPLEX CASES 
merge two gene predictions on the same scaffold 
Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
1. Drag and drop each gene model to the ‘User-created Annotations’ area. Shift click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu.
2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
3. Check the resulting translation by querying a protein database e.g. UniProt. Record the IDs of both starting gene models in ‘DBXref’ and add comments to record that this annotation is the result of a merge.
Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
COMPLEX CASES 
split a gene prediction 
One or more splits may be recommended when different segments of the predicted protein align to two or more different families of protein homologs, and the predicted protein does not align to any known protein over its entire length. Transcript data may support a split (if so, verify that it is not a case of alternative transcripts).
COMPLEX CASES
frameshifts, single-base errors, and selenocysteines
[Screenshot labels: DNA Track; ‘User-created Annotations’ Track]
COMPLEX CASES 
frameshifts, single-base errors, and selenocysteines 
1. Web Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. Note that these manipulations do NOT change the underlying genomic sequence.
2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
4. Once you have entered the modifications, Web Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.
5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
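The three sequence alterations in step 3 can be illustrated with a small sketch. It follows the semantics stated above (insertion to the right of the position, deletion and substitution starting at the position) and, like Web Apollo, leaves the reference untouched by returning an altered copy. The function name `apply_alteration` and its argument layout are hypothetical.

```python
def apply_alteration(reference, kind, position, value):
    """Return an altered copy of `reference`; the reference is not modified.

    `position` is 1-based.
      insertion:    `value` is the residue string inserted to the right
                    of `position`
      deletion:     `value` is the number of residues removed, starting
                    with the residue at `position`
      substitution: `value` is the residue string that replaces the
                    residues starting at `position`
    """
    i = position - 1
    if not 0 <= i < len(reference):
        raise ValueError("position outside the reference sequence")
    if kind == "insertion":
        return reference[:i + 1] + value + reference[i + 1:]
    if kind == "deletion":
        return reference[:i] + reference[i + value:]
    if kind == "substitution":
        return reference[:i] + value + reference[i + len(value):]
    raise ValueError(f"unknown alteration kind: {kind}")


reference = "ATGAAATAG"
print(apply_alteration(reference, "insertion", 3, "C"))
print(apply_alteration(reference, "deletion", 4, 1))
print(apply_alteration(reference, "substitution", 4, "C"))
print(reference)  # unchanged
```

Note how the single-base insertion and deletion each shift the reading frame downstream of the edit, which is why the transcript and protein sequences need to be recalculated afterwards.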
COMPLETING THE ANNOTATION 
Follow our checklist until you are happy with the annotation!
Then:
– Comment to validate your annotation, even if you made no changes to an existing model. Your comments mean you looked at the curated model and are happy with it; think of it as a vote of confidence.
– Or add a comment to inform the community of unresolved issues you think this model may have.
Always Remember: Web Apollo curation is a community effort so please use comments to communicate the reasons for your annotation (your comments will be visible to everyone).
CHECK LIST 
for accuracy and integrity 
1. Can you add UTRs (e.g.: via RNA-Seq)?
2. Check exon structures
3. Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
4. Check ‘Start’ and ‘Stop’ sites
5. Check the predicted protein product(s)
   – Align it against relevant genes/gene family.
   – blastp against NCBI’s RefSeq or nr
6. If the protein product still does not look correct then check:
   – Are there gaps in the genome?
   – Merge of 2 gene predictions on the same scaffold
   – Merge of 2 gene predictions from different scaffolds
   – Split a gene prediction
   – Frameshifts
   – error in the genome assembly?
   – Selenocysteine, single-base errors, and other inconvenient phenomena
7. Finalize annotation by adding:
   – Important project information in the form of canned and/or customized comments
   – IDs from GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits (with GenBank IDs), orthologs with species names, and everything else you can think of, because you are the expert.
   – Whether your model replaces one or more models from the official gene set (so it can be deleted).
   – The kinds of changes you made to the gene model of interest, if any. E.g.: splits, merges, whether the 5’ or 3’ ends had to be modified to include ‘Start’ or ‘Stop’ codons, additional exons had to be added, or non-canonical splice sites were accepted.
   – Any functional assignments that you think are of interest to the community (e.g. via BLAST, RNA-Seq data, literature searches, etc.)
FUTURE PLANS 
interactive analysis and curation of variants 
v InteracRve 
exploraRon 
of 
VCF 
files 
(e.g. 
from 
GATK, 
VAAST) 
in 
addiRon 
to 
BAM 
and 
GVF. 
MulRple 
tracks 
in 
one: 
visualizaRon 
of 
geneRc 
alteraRons 
and 
populaRon 
frequency 
of 
variants. 
v Clinical 
applicaRons: 
analysis 
of 
WEB APOLLO 55 
1 
1 
2 
Copy 
Number 
VariaRons 
for 
regulatory 
effects; 
overlaying 
display 
of 
the 
regulatory 
domains. 
Philips-­‐Creminis 
and 
Corces. 
2013. 
Cell 
50 
(4):461-­‐474 
2 
TADs: 
topologically 
associaRng 
domains
FUTURE PLANS 
educational tools 
We are working with educators to make Web Apollo part of their curricula.
In the classroom.
Lecture Series.
At the lab.
Classroom exercises: from genome sequence to hypothesis.
Curation group dedicated to producing education materials for non-model organism communities.
Our team provides online documentation, hands-on training, and rapid response to users.
Exercises 
Live Demonstration using the Apis mellifera genome.
1. Evidence in support of protein coding gene models.
1.1 Consensus Gene Sets: Official Gene Set v3.2; Official Gene Set v1.0
1.2 Consensus Gene Sets comparison: OGSv3.2 genes that merge OGSv1.0 and RefSeq genes; OGSv3.2 genes that split OGSv1.0 and RefSeq genes
1.3 Protein Coding Gene Predictions Supported by Biological Evidence: NCBI Gnomon; Fgenesh++ with RNASeq training data; Fgenesh++ without RNASeq training data; NCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes
1.4 Ab initio protein coding gene predictions: Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2
1.5 Transcript Sequence Alignment: NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot
Exercises
Live Demonstration using the Apis mellifera genome.
1. Evidence in support of protein coding gene models (Continued).
1.6 Protein homolog alignment: Acep_OGSv1.2, Aech_OGSv3.8, Cflo_OGSv3.3, Dmel_r5.42, Hsal_OGSv3.3, Lhum_OGSv1.2, Nvit_OGSv1.2, Nvit_OGSv2.0, Pbar_OGSv1.2, Sinv_OGSv2.2.3, Znev_OGSv2.1, Metazoa_Swissprot
2. Evidence in support of non protein coding gene models
2.1 Non-protein coding gene predictions: NCBI RefSeq Noncoding RNA; NCBI RefSeq miRNA
2.2 Pseudogene predictions: NCBI RefSeq Pseudogene
Web Apollo Workshop Instances 
http://genomes.missouri.edu:8080/Amel_4.5_demo_1
http://genomes.missouri.edu:8080/Amel_4.5_demo_2
Workshop Documentation at
FEDERATED ENVIRONMENT 
other BBOP tools 
BBOP Projects
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community: i5K Steering Committee, Alexie Papanicolaou (CSIRO), Monica Poelchau (USDA/NAL), fringy Richards (HGSC-BCM), BGI, 1KITE http://www.1kite.org/, and the Honey Bee Genome Sequencing Consortium.
• AgriPest Base, Hymenoptera Genome Database, VectorBase, FlyBase.
• Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
• Insect images used with permission: http://AlexanderWild.com and O. Niehuis.
• For your attention, thank you!
Thanks!
Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
BBOP Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
Alumni: Gregg Helt, Ed Lee, Rob Buels *
Web Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K
Thank you.

Web Apollo Workshop University of Exeter

  • 1. An Introduction to Web Apollo Manual Annotation Workshop at University of Exeter Monica Munoz-Torres, PhD | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Genomics Division, Lawrence Berkeley National Laboratory At University of Exeter. October 8, 2014
  • 2. TEACHING MATERIALS FOR TODAY Demo instance 1: http://genomes.missouri.edu:8080/Amel_4.5_demo_1/selectTrack.jsp Demo instance 2: http://genomes.missouri.edu:8080/Amel_4.5_demo_2/selectTrack.jsp Recommended Browser: Chrome
  • 3. OUTLINE • MANUAL ANNOTATION: working concept • COMMUNITY BASED CURATION: in our experience • APOLLO: empowering collaborative curation • APOLLO on THE WEB: becoming acquainted • PRACTICE: demonstration and exercises. Web Apollo: Collaborative Curation and Interactive Analysis of Genomes
  • 4. DURING THIS WORKSHOP you will • Learn to identify homologs of known genes of interest in a newly sequenced genome of interest. • Become familiar with the environment and functionality of the Web Apollo genome annotation editing tool. • Learn how to corroborate and/or modify automatically annotated gene models using available biological evidence in Web Apollo. • Understand the process of curation in the context of genome annotation: from the assembled genome to manual curation via automated annotation.
  • 5. I INVITE YOU TO: • Observe the figures • Listen to the explanations • Interrupt me at any time to ask questions • Use Twitter & share your thoughts: I am @monimunozto. Some tags & users: #WebApollo #Annotation #Curation #GMOD #genome @JBrowseGossip • Take breaks: LBL’s ergo safety team suggests I should not work at the computer for >45 minutes without a break; neither should you! We will be here for 2.5 hours: please get up and stretch your neck, arms, and legs as often as you need.
  • 6. I kindly ask that you refrain from: • Reading all that text I wrote! Think of the text on these slides as your “class notes”. You will use them during exercises. • Checking email. I’d like to kindly ask for your undivided attention.
  • 7. Let Us Get Started
  • 8. MANUAL ANNOTATION working concept • Automated genome analyses remain an imperfect art that cannot yet resolve all elements of the genome. • Precise elucidation of biological features encoded in the genome requires careful examination and review. Schiex et al. Nucleic Acids Res. 2003, 31 (13): 3738-3741. Automated Predictions. Experimental Evidence: cDNAs, HMM domain searches, RNAseq, genes from other species. Manual Curation.
  • 9. GENE PREDICTION • Identification of protein-coding genes, tRNAs, rRNAs, regulatory motifs, repetitive elements (masked), etc. • Ab initio (DNA composition): Augustus, GENSCAN, geneid, fgenesh • Homology-based: e.g. SGP2, fgenesh++. Nucleic Acids Res. 2003, vol. 31, no. 13, 3738-3741.
  • 10. GENE ANNOTATION Integration of data from prediction tools to generate a consensus set of predictions or gene models. • Models may be organized using: automatic integration of predicted sets, e.g. GLEAN; packaged tools from pipeline, e.g. MAKER. • All available biological evidence (e.g. transcriptomes) further informs the annotation process. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation; in such cases it is usually better to use an ab initio model to create a new annotation.
  • 11. MANUAL ANNOTATION is necessary • Evaluate all available evidence and corroborate or modify genome element predictions. • Determine functional roles through comparative analysis using literature, databases, and experience*. • Resolve discrepancies and validate automated gene model hypotheses. • Desktop version of Apollo was designed to fit the manual annotation needs of genome projects such as fruit fly, mouse, zebrafish, human, etc. Automated Predictions. Curated Gene Models. Official Gene Set. “Incorrect and incomplete genome annotations will poison every experiment that uses them”. - M. Yandell
  • 12. BUT, MANUAL CURATION did not always scale well Too many sequences and not enough hands to approach curation. 1. Museum: A small group of highly trained experts; e.g. GO. 2. Jamboree: A few very good biologists and a few very good bioinformaticians camp together, during intense but short periods of time. 3. Cottage: Researchers work by themselves, then may or may not publicize results; … may be a dead-end with very few people ever aware of these results. Elsik et al. 2006. Genome Res. 16(11):1329-33.
  • 13. POWER TO THE CURATORS augment existing tools Give more people the power to curate! Fill in the gap for all the things that won’t be easy to cover with these approaches; this will allow researchers to better contribute their efforts. Big data are not a substitute for, but a supplement to traditional data collection and analysis. The Parable of Google Flu. Lazer et al. 2014. Science 343 (6176): 1203-1205. • Enable more curators to work • Enable better scientific publishing • Credit curators for their work
  • 14. IMPROVING TOOLS FOR MANUAL ANNOTATION our plan “More and more sequences”: more genomes, within populations and across species, are now being sequenced. This creates the need for a universally accessible genome curation tool: to produce accurate sets of genomic features, and to address the need to correct for more frequent assembly and automated prediction errors due to new sequencing technologies.
  • 15. GENOME ANNOTATION an inherently collaborative task Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families). To facilitate and encourage this, we continue to improve Apollo. New JavaScript-based Apollo: http://GenomeArchitect.org • Web based for easy access. • Concurrent access supports real time collaboration. • Built-in support for standards (transparently compliant). • Automatic generation of ready-made computable data. • Client-side application relieves server bottleneck and supports privacy. • Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
  • 16. WEB APOLLO • Integrated with JBrowse. • Two new tracks: “Annotations” and “DNA Sequence”. • Intuitive annotation, gestures and pull-down menus to create and edit transcript and exon structures, insert comments (CV, freeform text), etc. • Customizable look, feel & functionality. • Edits in one client are instantly pushed to all other clients: Collaborative!
  • 17. WEB APOLLO • Provides dynamic access to genomic analysis results from UCSC and Chado databases, as well as database storage of user-created annotations. • All user-created sequence annotations are automatically uploaded to a server, ensuring reliability. Chado, UCSC (MySQL), Ensembl (DAS), BAM, BED, BigWig, GFF3, MAKER output.
  • 18. WEB APOLLO architecture
  • 19. DISPERSED COMMUNITIES collaborative manual annotation efforts We continuously train and support hundreds of geographically dispersed scientists from many research communities to conduct manual annotations, recovering coding sequences in agreement with all available biological evidence using Web Apollo. • Gate keeping and monitoring. • Tutorials, training workshops, and geneborees. • Personalized user support.
  • 20. CURATION in this context 1. Identifies elements that best represent the underlying biology (including missing genes) and eliminates elements that reflect systemic errors of automated analyses. 2. Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and researchers’ lab data. Examples: Comparing 7 ant genomes contributed to a better understanding of the evolution and organization of insect societies at the molecular level; e.g. division of labor, mutualism, chemical communication, etc. Libbrecht et al. 2013. Genome Biology 14:212. Insect methylome: queen bee and worker bee castes, larva, Dnmt, royal jelly, RNAi. Kucharski et al. 2008. Science 319 (5871): 1827-1830. Anchoring molecular markers to the reference genome pointed to chromosomal rearrangements and detected signals of adaptive radiation in Heliconius butterflies. Joron et al. 2011. Nature 477:203-206.
  • 21. WORKING TOGETHER we have obtained better results Scientific community efforts bring together domain-specific and natural history expertise that would otherwise remain disconnected. Breaking down large amounts of data into manageable portions and mobilizing groups of researchers to extract the most accurate representation of the biology from all available data distills invaluable knowledge from genome analysis.
  • 22. CURRENT COLLABORATIONS training and contributions Partnerships: University of Missouri; National Agricultural Library. Phlebotomus papatasi. Wasmannia auropunctata. Nature Reviews Genetics 2009 (10), 346-347. Norwegian Spruce: http://congenie.org/ Tallapoosa darter: http://darter2.westga.edu/ Pinus taeda: http://dendrome.ucdavis.edu/treegenes/browsers/ Homo sapiens hg19.
  • 23. TRAINING CURATORS a little training goes a long way! Provided with the right tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
  • 25. WEB APOLLO the sequence selection window Sort
  • 26. WEB APOLLO graphical user interface (GUI) for editing annotations Navigation tools: pan and zoom. Grey bar of coordinates indicates location. You can also select here in order to zoom to a sub-region. Search box: go to a scaffold or a gene model. ‘View’: change color by CDS, toggle strands, set highlight. ‘File’: upload your own evidence: GFF3, BAM, BigWig, VCF*. Add combination and sequence search tracks. ‘Tools’: use BLAT to query the genome with a protein or DNA sequence. Available Tracks. ‘User-created Annotations’ Track. Evidence Tracks Area. Login.
  • 27. WEB APOLLO additional functionality In addition to the protein-coding gene annotation that you know and love: • Non-coding genes: ncRNAs, miRNAs, repeat regions, and TEs • Sequence alterations (less coverage = more fragmentation) • Visualization of stage and cell-type specific transcription data as coverage plots, heat maps, and alignments
  • 28. GENERAL PROCESS OF CURATION steps to remember 1. Select a chromosomal region of interest, e.g. scaffold. 2. Select appropriate evidence tracks. 3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working. - If yes: select and drag the feature to the ‘User-created Annotations’ area, creating an initial gene model. If necessary use editing functions to adjust the gene model. - If not: let’s talk. 4. Check your edited gene model for integrity and accuracy by comparing it with available homologs. Always remember: when annotating gene models using Web Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
  • 29. USER NAVIGATION Choose (click or drag) appropriate evidence tracks from the list on the left. Click on an exon to select it. Double click on an exon or single click on an intron to select the entire gene. Select & drag any elements from an evidence track into the curation area: these are editable and considered the curated version of the gene. Other options for elements in evidence tracks are available from the right-click menu. If you select an exon or a gene, then every track is automatically searched for exons with exactly the same coordinates as what you selected. Matching edges are highlighted red. Hovering over an annotation in progress brings up an information pop-up.
  • 30. USER NAVIGATION Right-click menu: • With the exception of deleting a model, all edits can be reversed with the ‘Undo’ option. ‘Redo’ is also available. All changes are immediately saved and available to all users in real time. • ‘Get sequence’ retrieves peptide, cDNA, CDS, and genomic sequences. • You can select an exon and select ‘Delete’. You can create an intron, flip the direction, change the start or split the gene.
  • 31. USER NAVIGATION Right-click menu: • If you select two gene models, you can join them using ‘Merge’, and you may also ‘Split’ a model. • You can select ‘Duplicate’, for example to annotate isoforms. • Set translation start, annotate selenocysteine-containing proteins, match edges of annotation to those of evidence tracks.
  • 32. USER NAVIGATION Annotations, annotation edits, and History: stored in a centralized database.
  • 33. USER NAVIGATION The Annotation Information Editor. DBXRefs are database cross-references: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here.
  • 34. USER NAVIGATION The Annotation Information Editor • Add PubMed IDs • Include GO terms as appropriate from any of the three ontologies • Write comments stating how you have validated each model.
  • 35. USER NAVIGATION • ‘Zoom to base level’ option reveals the DNA Track. • Change color of exons by CDS from the ‘View’ menu. • The reference DNA sequence is visible in both directions, as are the protein translations in all six frames. You can toggle either direction to display only 3 frames. Zoom in/out with keyboard: shift + arrow keys up/down.
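The six reading frames mentioned on this slide can be illustrated with a short Python sketch. This is not Web Apollo code: the function names and the frame labels (`+1` to `-3`) are assumptions made for illustration, and the standard genetic code is assumed.

```python
# Illustrative sketch only; names and frame labels are hypothetical.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations for three forward and three reverse frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
        print(name, protein)
```

Stop codons are shown as `*`, so a frame without internal `*` characters across an exon is a candidate coding frame.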
  • 36. Web Apollo User Guide (Fragment) http://genomearchitect.org/web_apollo_user_guide
  • 37. ANNOTATING SIMPLE CASES In a “simple case” the predicted gene model is correct or nearly correct, and this model is supported by evidence that completely or mostly agrees with the prediction. Evidence that extends beyond the predicted model is assumed to be non-coding sequence. The following sections describe simple modifications.
  • 38. ADDING EXONS Select and drag the putative new exon from a track, and add it directly to an annotated transcript in the ‘User-created Annotations’ area. • Click the exon, hold your finger on the mouse button, and drag the cursor until it touches the receiving transcript. A dark green highlight indicates it is okay to release the mouse button. • When released, the additional exon becomes attached to the receiving transcript. • A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
  • 39. ADDING EXONS Each time you add an exon region, whether by extension or adding an exon, Web Apollo recalculates the longest ORF, identifying ‘Start’ and ‘Stop’ signals and allowing you to determine whether a ‘Stop’ codon has been incorporated after each editing step. Web Apollo demands that an exon already exists as evidence in one of the tracks. You could provide a text file in GFF format and select File → Open. GFF is a simple text file delimited by TABs, one line for each genomic ‘feature’: column 1 is the name of the scaffold; then some text (the source, irrelevant here), then ‘exon’, then start, stop, a dot (score), strand as + or -, another dot (phase), and Name=some name. Example: scaffold_88 Qratore exon 21 2111 . + . Name=bob scaffold_88 Qratore exon 2201 5111 . + . Name=rad
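A small Python sketch may make the tab-delimited layout concrete. It assumes the nine-column order of the GFF3 specification (seqid, source, type, start, end, score, strand, phase, attributes), so the strand sits in column 7, between the score and phase dots. The helper names `gff3_exon_line` and `parse_gff3_line` are hypothetical and not part of Web Apollo.

```python
# Illustrative sketch only; helper names are hypothetical.
def gff3_exon_line(seqid, source, start, end, strand, name):
    """Build one tab-delimited GFF3 exon line (1-based, inclusive coordinates)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not 1 <= start <= end:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    # Score and phase are left as '.', as in the slide's example.
    columns = [seqid, source, "exon", str(start), str(end),
               ".", strand, ".", f"Name={name}"]
    return "\t".join(columns)


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dictionary, checking the column count."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited columns, got {len(fields)}")
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = fields
    attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return {"seqid": seqid, "source": source, "type": ftype,
            "start": int(start), "end": int(end),
            "strand": strand, "attributes": attributes}


if __name__ == "__main__":
    print("##gff-version 3")
    print(gff3_exon_line("scaffold_88", "Qratore", 21, 2111, "+", "bob"))
    print(gff3_exon_line("scaffold_88", "Qratore", 2201, 5111, "+", "rad"))
```

Writing the file through a helper like this avoids the most common hand-editing mistake, which is using spaces instead of TABs between columns.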
  • 40. ADDING UTRs Gene predictions may or may not include UTRs. If transcript alignment data are available and extend beyond your original annotation, you may extend or add UTRs. 1. Position the cursor at the beginning of the exon that needs to be extended and ‘Zoom to base level’. 2. Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR. To add a new spliced UTR to an existing annotation, follow the procedure for adding an exon. Figure: view zoomed to base level. The DNA track and annotation track are visible. The DNA track includes the sense strand (top) and anti-sense strand (bottom). The six reading frames flank the DNA track, with the three forward frames above and the three reverse frames below. The User-created Annotation track shows the terminal end of an annotation. The green rectangle highlights the location of the nucleotide residues in the ‘Stop’ signal.
  • 41. EXON STRUCTURE INTEGRITY 1. Zoom in sufficiently to clearly resolve each exon as a distinct rectangle. 2. Two exons from different tracks sharing the same start and/or end coordinates will display a red bar to indicate the matching edges. 3. Selecting the whole annotation or one exon at a time, use this ‘edge-matching’ function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square brackets [ ] to scroll from exon to exon. 4. Note if there are cDNA / RNAseq reads that lack one or more of the annotated exons or include additional exons.
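The edge-matching idea described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not Web Apollo's implementation; exons are assumed to be `(start, end)` tuples on the same scaffold and strand, and `matching_edges` is a hypothetical name.

```python
# Illustrative sketch only; the function name and data layout are hypothetical.
def matching_edges(annotation_exons, evidence_exons):
    """Report, for each annotated exon, which edges coincide with an evidence exon edge."""
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    report = []
    for start, end in annotation_exons:
        report.append({
            "exon": (start, end),
            "start_matches": start in evidence_starts,
            "end_matches": end in evidence_ends,
        })
    return report


if __name__ == "__main__":
    annotation = [(100, 200), (300, 450)]
    rnaseq_evidence = [(100, 200), (310, 450)]
    for row in matching_edges(annotation, rnaseq_evidence):
        print(row)
```

An exon whose start or end is reported as not matching is the one to inspect at base level before adjusting its boundary.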
  • 42. EXON STRUCTURE INTEGRITY To modify an exon boundary and match data in the evidence tracks: select both the offending exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate. In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
  • 43. EDITING LOGIC Flags non-canonical splice sites. Selection of features and sub-features. Edge-matching. ‘User-created Annotations’ Track. Evidence Tracks Area. The editing logic in the server: § selects longest ORF as CDS § flags non-canonical splice sites
  • 44. SPLICE SITES Zoom to base level to review non-canonical splice site warnings. These do not necessarily need to be corrected, but should be flagged with the appropriate comment. Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon. Most insects have a valid non-canonical site, GC-AG. Other non-canonical splice sites are unverified. Web Apollo flags GC splice donors as non-canonical. Canonical splice sites: forward strand 5’-…exon]GT / AG[exon…-3’; reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’. Figure labels: Curated model; Original model; Exon/intron junction possible error.
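As a rough illustration of the GT / AG rule, the following Python sketch classifies the two intron-terminal dinucleotides. It is not how Web Apollo is implemented: the function name, the 1-based inclusive intron coordinates on the forward strand, and the status labels are assumptions for this example. For a minus-strand gene the intron is reverse-complemented first, so the same GT...AG test applies.

```python
# Illustrative sketch only; names, coordinates and labels are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def splice_site_report(genome, intron_start, intron_end, strand):
    """Classify an intron by its donor and acceptor dinucleotides.

    intron_start and intron_end are 1-based, inclusive, on the forward strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    intron = genome[intron_start - 1:intron_end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold donor and acceptor sites")
    if strand == "-":
        intron = reverse_complement(intron)
    donor, acceptor = intron[:2], intron[-2:]
    if donor == "GT" and acceptor == "AG":
        status = "canonical"
    elif donor == "GC" and acceptor == "AG":
        status = "non-canonical (GC-AG)"
    else:
        status = "non-canonical"
    return donor, acceptor, status


if __name__ == "__main__":
    print(splice_site_report("AAAGTCCCCAGTTT", 4, 11, "+"))
    print(splice_site_report("TTTCTGGGGACAAA", 4, 11, "-"))
    print(splice_site_report("AAAGCCCCCAGTTT", 4, 11, "+"))
```

Reporting GC-AG separately from other non-canonical pairs mirrors the slide's point that this site is flagged but is frequently valid in insects.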
  • 45. SPLICE SITES keep this in mind Some gene prediction algorithms do not recognize GC splice sites, thus the intron/exon junction may be incorrect. For example, one such gene prediction algorithm may ignore a true GC donor and select another non-canonical splice site that is less frequently observed in nature. Therefore, if upon inspection you find a non-canonical splice site that is rarely observed in nature, you may wish to search the region for a more frequent in-frame non-canonical splice site, such as a GC donor. If there is an in-frame site close by that is more likely to be the correct splice donor, you may make this adjustment while zoomed at base level. Use RNA-Seq data to make a decision. Canonical splice sites: forward strand 5’-…exon]GT / AG[exon…-3’; reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’. Figure labels: Curated model; Original model; Exon/intron junction possible error.
  • 46. ‘START’ AND ‘STOP’ SITES Web Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons. If it appears to have calculated an incorrect ‘Start’ signal, you may modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (protein database, additional evidence tracks). An upstream ‘Start’ codon may be present outside the predicted gene model, within a region supported by another evidence track.
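A minimal sketch of a longest-ORF search is shown below, assuming a spliced transcript sequence already oriented 5’ to 3’, a canonical ATG start and the three standard stop codons. It only illustrates the idea; it is not the algorithm Web Apollo uses on the server, and `longest_orf` is a hypothetical name.

```python
# Illustrative sketch only; the function name and return format are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest ATG...stop ORF, or None if there is none.

    Coordinates are 0-based and half-open, so transcript[start:end] is the ORF
    including its stop codon. Only the three forward frames are scanned.
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i  # first in-frame start since the last stop
            elif orf_start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - orf_start) > (best[1] - best[0]):
                    best = (orf_start, i + 3)
                orf_start = None
    return best


if __name__ == "__main__":
    example = "CCATGAAATAGGGATGCCCGGGTTTTAA"
    start, end = longest_orf(example)
    print(start, end, example[start:end])
```

Choosing a different in-frame start, as the slide describes, amounts to overriding the first ATG that this kind of scan would pick.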
  • 47. ‘START’ AND ‘STOP’ SITES keep this in mind Note that the ‘Start’ codon may also be located in a non-predicted exon further upstream. If you cannot identify that exon, add the appropriate note in the transcript’s ‘Comments’ section. In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG). In some cases, a ‘Stop’ codon may not be automatically identified. Check to see if there are data supporting a 3’ extension of the terminal exon or additional 3’ exons with valid splice sites.
  • 49. COMPLEX CASES merge two gene predictions on the same scaffold Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions! 1. Drag and drop each gene model to the ‘User-created Annotations’ area. Shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu. 2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models. 3. Check the resulting translation by querying a protein database, e.g. UniProt. Record the IDs of both starting gene models in ‘DBXref’ and add comments to record that this annotation is the result of a merge. Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
  • 50. COMPLEX CASES split a gene prediction One or more splits may be recommended when different segments of the predicted protein align to two or more different families of protein homologs, and the predicted protein does not align to any known protein over its entire length. Transcript data may support a split (if so, verify that it is not a case of alternative transcripts).
  • 51. COMPLEX CASES frameshifts, single-base errors, and selenocysteines [Figure: the DNA track and the ‘User-created Annotations’ track.]
  • 52. COMPLEX CASES frameshifts, single-base errors, and selenocysteines 1. Web Apollo allows annotators to make single-base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. Note that these manipulations do NOT change the underlying genomic sequence. 2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right-click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions. 3. The ‘Create Genomic Insertion’ feature will require you to enter the string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track. 4. Once you have entered the modifications, Web Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the modification is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database, using the ‘Comments’ section to report the CDS edits. 5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
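The three kinds of sequence modification described above can be sketched on a plain string. This is a minimal illustration under stated assumptions, not Web Apollo's code: the function names are hypothetical, positions are 0-based, and each function returns a new sequence, which mirrors the point that the underlying genomic sequence is never changed.

```python
# Illustrative sketch only; function names and 0-based positions are assumptions.
def apply_insertion(seq, pos, residues):
    """Insert residues to the right of the base at index pos."""
    return seq[:pos + 1] + residues + seq[pos + 1:]

def apply_deletion(seq, pos, length):
    """Delete `length` bases, starting with the base at index pos."""
    return seq[:pos] + seq[pos + length:]

def apply_substitution(seq, pos, residues):
    """Replace the bases starting at index pos with residues."""
    return seq[:pos] + residues + seq[pos + len(residues):]

if __name__ == "__main__":
    genome = "ACGT"
    print(apply_insertion(genome, 1, "TT"))
    print(apply_deletion(genome, 1, 2))
    print(apply_substitution(genome, 2, "A"))
    print(genome)  # the original string is untouched
```

An insertion or deletion whose length is not a multiple of three shifts the reading frame downstream, which is why the transcript and protein sequences have to be recalculated after each edit.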
  • 53. COMPLETING THE ANNOTATION Follow our checklist until you are happy with the annotation! Then: – Comment to validate your annotation, even if you made no changes to an existing model. Your comments mean you looked at the curated model and are happy with it; think of it as a vote of confidence. – Or add a comment to inform the community of unresolved issues you think this model may have. Always Remember: Web Apollo curation is a community effort, so please use comments to communicate the reasons for your annotation (your comments will be visible to everyone).
  • 54. CHECK LIST for accuracy and integrity 1. Can you add UTRs (e.g.: via RNA-Seq)? 2. Check exon structures 3. Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[… 4. Check ‘Start’ and ‘Stop’ sites 5. Check the predicted protein product(s) – Align it against relevant genes/gene family. – blastp against NCBI’s RefSeq or nr 6. If the protein product still does not look correct then check: – Are there gaps in the genome? – Merge of 2 gene predictions on the same scaffold – Merge of 2 gene predictions from different scaffolds – Split a gene prediction – Frameshifts – error in the genome assembly? – Selenocysteine, single-base errors, and other inconvenient phenomena 7. Finalize annotation by adding: – Important project information in the form of canned and/or customized comments – IDs from GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits (with GenBank IDs), orthologs with species names, and everything else you can think of, because you are the expert. – Whether your model replaces one or more models from the official gene set (so it can be deleted). – The kinds of changes you made to the gene model of interest, if any. E.g.: splits, merges, whether the 5’ or 3’ ends had to be modified to include ‘Start’ or ‘Stop’ codons, additional exons had to be added, or non-canonical splice sites were accepted. – Any functional assignments that you think are of interest to the community (e.g. via BLAST, RNA-Seq data, literature searches, etc.)
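Checklist item 3 (canonical GT/AG splice sites) lends itself to a small sketch. The following is a hypothetical helper, not part of Web Apollo: it assumes a forward-strand gene model, exons given as sorted 0-based, end-exclusive `(start, end)` pairs, and it reports the introns that do not begin with GT and end with AG.

```python
# Illustrative sketch only; coordinates and function name are assumptions.
def noncanonical_introns(genome, exons):
    """Return (intron_start, intron_end) for every intron between
    consecutive exons that lacks the canonical 5'-GT...AG-3' ends."""
    flagged = []
    for (_, donor_end), (acceptor_start, _) in zip(exons, exons[1:]):
        intron = genome[donor_end:acceptor_start].upper()
        if not (intron.startswith("GT") and intron.endswith("AG")):
            flagged.append((donor_end, acceptor_start))
    return flagged

if __name__ == "__main__":
    # exon - canonical intron - exon - GC...AG intron - exon
    genome = "AAA" + "GTCCCAG" + "TTT" + "GCCCCAG" + "GG"
    exons = [(0, 3), (10, 13), (20, 22)]
    print(noncanonical_introns(genome, exons))
```

A flagged intron is not necessarily wrong; as the checklist notes, non-canonical splice sites are sometimes accepted, in which case the decision is worth recording in the comments.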
  • 55. FUTURE PLANS interactive analysis and curation of variants v Interactive exploration of VCF files (e.g. from GATK, VAAST) in addition to BAM and GVF. Multiple tracks in one: visualization of genetic alterations and population frequency of variants. v Clinical applications: analysis of Copy Number Variations for regulatory effects; overlaying display of the regulatory domains (TADs: topologically associating domains). Reference: Phillips-Cremins and Corces. 2013. Mol. Cell 50 (4):461-474.
  • 56. FUTURE PLANS educational tools We are working with educators to make Web Apollo part of their curricula. In the classroom: lecture series. At the lab: classroom exercises, from genome sequence to hypothesis. Curation group dedicated to producing education materials for non-model organism communities. Our team provides online documentation, hands-on training, and rapid response to users.
  • 57. Exercises Live Demonstration using the Apis mellifera genome. 1. Evidence in support of protein coding gene models. 1.1 Consensus Gene Sets: Official Gene Set v3.2; Official Gene Set v1.0. 1.2 Consensus Gene Sets comparison: OGSv3.2 genes that merge OGSv1.0 and RefSeq genes; OGSv3.2 genes that split OGSv1.0 and RefSeq genes. 1.3 Protein Coding Gene Predictions Supported by Biological Evidence: NCBI Gnomon; Fgenesh++ with RNASeq training data; Fgenesh++ without RNASeq training data; NCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes. 1.4 Ab initio protein coding gene predictions: Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2. 1.5 Transcript Sequence Alignment: NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot.
  • 58. Exercises Live Demonstration using the Apis mellifera genome. 1. Evidence in support of protein coding gene models (Continued). 1.6 Protein homolog alignment: Acep_OGSv1.2, Aech_OGSv3.8, Cflo_OGSv3.3, Dmel_r5.42, Hsal_OGSv3.3, Lhum_OGSv1.2, Nvit_OGSv1.2, Nvit_OGSv2.0, Pbar_OGSv1.2, Sinv_OGSv2.2.3, Znev_OGSv2.1, Metazoa_Swissprot. 2. Evidence in support of non protein coding gene models. 2.1 Non-protein coding gene predictions: NCBI RefSeq Noncoding RNA; NCBI RefSeq miRNA. 2.2 Pseudogene predictions: NCBI RefSeq Pseudogene.
  • 59. Web Apollo Workshop Instances http://genomes.missouri.edu:8080/Amel_4.5_demo_1 http://genomes.missouri.edu:8080/Amel_4.5_demo_2 Workshop Documentation at
  • 60. FEDERATED ENVIRONMENT other BBOP tools BBOP Projects
  • 61. • Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Web Apollo and Gene Ontology teams. Suzanna E. Lewis (PI). • § Christine G. Elsik (PI). University of Missouri. • * Ian Holmes (PI). University of California Berkeley. • Arthropod genomics community: i5K Steering Committee, Alexie Papanicolaou (CSIRO), Monica Poelchau (USDA/NAL), fringy Richards (HGSC-BCM), BGI, 1KITE http://www.1kite.org/, and the Honey Bee Genome Sequencing Consortium. • AgriPest Base, Hymenoptera Genome Database, VectorBase, FlyBase. • Web Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI, and by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. • Insect images used with permission: http://AlexanderWild.com and O. Niehuis. • For your attention, thank you! Web Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §. BBOP Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze. Alumni: Gregg Helt, Ed Lee, Rob Buels *. Web Apollo: http://GenomeArchitect.org GO: http://GeneOntology.org i5K: http://arthropodgenomes.org/wiki/i5K