Training materials
Ensembl materials are protected by a CC BY license
http://creativecommons.org/licenses/by/4.0/
If you wish to re-use these, please credit Ensembl for
their creation
If you use Ensembl for your work, please cite our papers
http://www.ensembl.org/info/about/publications.html
Denise Carvalho-Silva
European Molecular Biology Laboratory
European Bioinformatics Institute
Browsing Genes, Variation and
Regulation data with Ensembl
UCD - Dublin
Today 09:30-17:00
•  Introduction to Ensembl
•  Browser walkthrough
10:45-11:00 coffee/tea
•  Browser exercises
•  BioMart (Talk + Exercises)
13:00-14:00 lunch break
•  Genetic variation (Talk + Exercises)
15:30-15:45 coffee/tea
•  Gene Regulation and/or Custom data (Talk + Exercises)
•  Wrap up, photo opportunity & feedback survey
Materials
http://www.ebi.ac.uk/~denise/workshops/2016/dublin
Course Objectives
What is Ensembl? What type of data can
you get in Ensembl?
How to navigate the
Ensembl browser website?
How to connect
with Ensembl
Introduction
Why do we need/have genome browsers?
Genome sequencing
1977: 1st genome to be sequenced (5 kb)
2000: draft human sequence (3 Gb)
Large amounts of raw DNA sequence data
Raw DNA sequence data
Annotation: making sense
Annotation of vertebrate genomes
www.ensembl.org
pre.ensembl.org
>80 genomes*
D. melanogaster
C. elegans
S. cerevisiae
*Release 84
March 2016
1 human genome → 3 assemblies
www.ensembl.org grch37.ensembl.org
e54.ensembl.org
EBI is an Outstation of the European Molecular Biology Laboratory.
Ensembl Features
•  Gene models
•  Comparative Genomics
•  Variation
•  Regulation
•  Custom data display
•  Programmatic access
•  Toolkit
Ensembl automatic annotation
Gene models in Ensembl
Goal: Generate set of well-supported genes
Automatic
•  many species
•  genome-wide at once
•  ~ 4 months
Manual
•  fewer species
•  gene by gene
•  many years
Automatic and coding (20_)
Manual and coding (00_)
Automatic + Manual
(“gold”)
Manual and non-coding (00_)
Automatic annotation* Manual annotation*
* based on experimental, biological evidence (INSDC, UniProtKB…)
Ensembl genes & transcripts
•  merged annotation
•  higher confidence and quality
•  comprehensive: alternatively spliced transcripts
UTR, Exon, Intron, 5’ UTR, 3’ UTR
Gold (identical annotation) = Automatic + Manual
Alternative splicing
rich and comprehensive annotation
Which transcript to use?
http://www.ensembl.org/Help/Glossary?id=493
http://www.ensembl.org/Help/Glossary?id=492
APPRIS
TSLs
CCDS project
•  annotate a consensus coding DNA sequence set
•  EBI, WTSI, UCSC and NCBI
•  Genome Res. 19:1316-23 (2009)
http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi
CCDS transcript
Disclaimer: which transcript to use
No single method will tell us which transcript to use
Decision on a case by case basis
•  All transcripts OR one/two well supported ones?
List of transcripts: we offer choices based on
•  CCDS (Ensembl, HAVANA, NCBI, UCSC)
•  Golden transcripts (identical Ensembl and HAVANA)
•  Cross reference entries (e.g. UniProtKB, RefSeq)
•  APPRIS
•  TSLs
Annotation based on RNASeq
http://www.ensembl.org/info/genome/genebuild/rnaseq_annotation.html
ncRNA gene annotation
http://www.ensembl.org/info/genome/genebuild/ncrna.html
Ensembl stable identifiers
•  ENSG########### Ensembl Gene ID
•  ENST########### Ensembl Transcript ID
•  ENSP########### Ensembl Peptide ID
•  ENSE########### Ensembl Exon ID
•  For non-human species a species code is added after ENS:
e.g. ENSMUSG, where MUS (Mus musculus) stands for mouse
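The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming the layout shown on the slide (ENS, an optional species code, one feature letter, 11 digits); the function name `parse_stable_id` and the example IDs are illustrative only, and feature types not listed on the slide are deliberately left out.

```python
import re

# Assumed layout, taken from the slide: "ENS" + optional species code
# (e.g. MUS for mouse) + one feature letter + 11 digits.
STABLE_ID = re.compile(
    r"^ENS(?P<species>[A-Z]{3,})?(?P<feature>[GTPE])(?P<number>\d{11})$"
)

FEATURES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}


def parse_stable_id(stable_id):
    """Return (species, feature type) for a stable ID, or None if malformed."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        return None
    # In this naming scheme, an ID without a species code is a human ID.
    species = match.group("species") or "human"
    return species, FEATURES[match.group("feature")]


# The IDs below only illustrate the format.
for example in ("ENSG00000139618", "ENSMUSG00000017167", "ENST0001"):
    print(example, parse_stable_id(example))
```

A check like this is handy before pasting a long ID list into a tool, since malformed IDs return None instead of silently failing later.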
Ensembl Browser Workshop
Ensembl Browser
Live demo:
Walking through the website
pages 11-31
The ESPN gene products are active in the
inner ear, where they appear to play an
essential role in normal hearing and balance.
Let’s explore ESPN
Before we start: background
A)  What is the location and strand of the
human ESPN gene?
B)  How can I view protein alignments and
variants mapped to this location?
C)  Can I move data tracks up and down,
share and delete tracks?
Human ESPN: location
A)  How can I find the genomic sequence of
this gene? What is the ID of its first exon?
B)  Can I display the genomic coordinates and
variants on this sequence?
C)  Can I find information on the expression
of this gene in different tissues?
Human ESPN: gene
A)  How many exons does the longest ESPN
transcript have? Are there any completely
untranslated exons?
B)  Can I find its cDNA sequence?
C)  What are the UniProt and RefSeq entries
cross referenced to this transcript?
Human ESPN: transcript
Ensembl Browser
Exercises
pages 32-35
Answers
www.ebi.ac.uk/~denise/workshops/2016/dublin/answers
Feel free to explore your favourite gene/region too!
BioMart
Outline
•  Definitions
•  The principle: 4 steps
•  Tutorial: simple query in human
•  Find Ensembl BioMart and BioMart elsewhere
•  Sophisticated platforms: mart services, APIs, etc…
•  Exercises
Would you like to…
•  … convert protein IDs into gene IDs or names?
•  … get a list of all genes mapped to a region deleted in a
patient cohort?
•  … export sequences for a bunch of genes or variants?
If you answered yes, keep listening!
What is BioMart?
•  Free service for easy retrieval of Ensembl data
•  Data export tool with little/no programming required
•  Complex queries with a few mouse clicks
•  Output formats (.xls, .csv, fasta, tsv, html)
The four-step principle
DATA → FILTERS → ATTRIBUTES → RESULTS
•  DATA: Database, Dataset
•  FILTERS: IDs, Regions, Domains, Expression
•  ATTRIBUTES: Homologs, Sequences, Features, Structures
•  RESULTS: Tables, Fasta
Choosing the data
Database and dataset
Limit your data set
(information that you know)
Selecting the filters
Click “Count” to see
if BioMart is reading
the input data
Picking the attributes
Determine output columns
(information you want to know)
The different attributes
Getting the results
Tables/sequences
click “Unique
results only”
For the full
table: click
View “ALL”
rows or “Go”
Selected IBD genes
IL23R, PTPN22,
CUL2, C1orf106, IL18RAP
For the IL23R, PTPN22, CUL2, C1orf106 and
IL18RAP genes, use BioMart to retrieve a
table (.xls) containing:
•  Associated gene name, ENSG and ENST IDs
•  Chromosome name, gene start and end
•  GO term name and Interpro description
Tutorial: BioMart
The four-step principle
DATA → FILTERS → ATTRIBUTES → RESULTS
•  DATA: Human, Gene
•  FILTERS: IL23R, CUL2, PTPN22, C1orf106, IL18RAP
•  ATTRIBUTES: Gene name, ENSG/ENST ID, Chr start end, GO term name, Interpro description
•  RESULTS: .xls table
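The same four steps can also be written down as a BioMart XML query, which is the text form of a BioMart query. The sketch below only builds the XML and sends nothing. The dataset, filter and attribute internal names (`hsapiens_gene_ensembl`, `external_gene_name`, `name_1006` and so on) are assumptions that should be checked against the names shown in the BioMart interface for the release in use; `build_biomart_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string: DATA, FILTERS, ATTRIBUTES, RESULTS."""
    # RESULTS: tab-separated text with a header row and unique rows only.
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # DATA: one dataset (species) inside the gene database.
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # FILTERS: what you already know (here, a list of gene names).
    for name, values in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": ",".join(values)})
    # ATTRIBUTES: the output columns you want.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Internal names below are assumptions; verify them in the BioMart interface.
query_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"external_gene_name": ["IL23R", "PTPN22", "CUL2", "C1orf106", "IL18RAP"]},
    attributes=[
        "external_gene_name", "ensembl_gene_id", "ensembl_transcript_id",
        "chromosome_name", "start_position", "end_position",
        "name_1006", "interpro_description",
    ],
)
print(query_xml)
```

Keeping the query as text makes it easy to rerun the same export later or to share it with a colleague.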
Ensembl BioMart
Live demo
Find BioMart
www.ensembl.org/biomart
Ensembl BioMarts
http://tinyurl.com/biomart-video
BioMart video
More sophisticated platforms
•  BioMart queries: MartService
www.biomart.org/martservice.html
•  APIs: Perl, Java, Web Services
•  Third-party software
galaxyproject.org, bioconductor.org, taverna.org.uk
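As a rough illustration of the MartService route, a query is passed as a URL-encoded `query` parameter. The sketch below assumes the Ensembl endpoint `http://www.ensembl.org/biomart/martservice` and uses a placeholder XML query with assumed dataset and attribute names; it only builds and decodes the URL and does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; see www.biomart.org/martservice.html for the service details.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def martservice_url(query_xml, base=MARTSERVICE):
    """Return a MartService URL carrying the XML query as a 'query' parameter."""
    return base + "?" + urlencode({"query": query_xml})


# Placeholder query; dataset and attribute names are illustrative assumptions.
QUERY = (
    '<Query virtualSchemaName="default" formatter="TSV">'
    '<Dataset name="hsapiens_gene_ensembl">'
    '<Attribute name="ensembl_gene_id"/></Dataset></Query>'
)

url = martservice_url(QUERY)
# Decoding the URL again gives back the original XML unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == QUERY)
```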
Ensembl BioMart
Thomas
Maurel
Amonida
Zadissa
BioMart
Step-by-step example
pages 36-40
Exercises
pages 41-43
Answers
www.ebi.ac.uk/~denise/workshops/2016/dublin/answers
Feel free to explore BioMart in other contexts too!
Genetic
Variation
Outline
•  Classes of variation, species and sources
•  Browsing variation data: some entry points
Location tab
Gene tab
Variation tab
•  Phenotypic data and population genetics
•  How to annotate your own variants
•  Exercises
Genetic variation
1) Large scale: structural (> 50 base pairs)
duplication, deletion, inversion, translocation, loss
2) Short scale: SNPs (or SNVs), indels
[Figure: aligned sequence pairs illustrating SNPs and indels]
Species with variation data
Understand the types of genetic variation data and
how to view them in the context of our genomes
Sources of variation data
•  Import alleles and frequencies
•  Annotate variants
http://www.ensembl.org/info/docs/variation/sources_documentation.html
Location tab: across a region
SVs, SNPs, Ensembl genes
Gene tab: gene-centric
SNPs
SVs
Variation tab: variant centric
summary data for a SNP or SV
Variants on the karyotype
Phenotype data in Ensembl
species and sources
Population data for variants
http://hapmap.ncbi.nlm.nih.gov/
http://www.1000genomes.org
pie charts:
1KG super populations
Human Population Genetics
Coffee intake is a worldwide phenomenon,
with Finland at the top and the UK in 44th
place. Is caffeine consumption in our genes?
A)  What are the chromosome locations of variants associated
with this phenotype?
B)  Which variant has got the most significant association?
C)  What is the ancestral allele of this variant? Is it conserved
in eutherian mammals?
D)  What is the most frequent allele in GBR?
E)  Can you download this variant and 200 nt upstream and
downstream flanking sequence in RTF (Rich Text Format)?
Live demo
before we find out
•  What is it?
•  What does it do?
•  Where can I find it?
I’ve got a list of genetic variants from my resequencing project
of a cohort study of breast cancer in London. The positions are
all on chromosome 9, GRCh37 assembly:
131090740 A/- (positive strand)
131084628 C/A (positive strand)
131085358 C/G (positive strand)
131085196 G/A (positive strand)
1)  Do any of these cause a change at the amino acid level?
2)  Are these predicted to be deleterious according to PolyPhen?
3)  Can I get the flanking sequence (200 nt both up and
downstream) for the known variants in this set?
A use case: cancer patients
My resequencing experiments
cancer patients vs healthy controls
Positions in the genome vary between the two groups
Chromosome  Start      End        Alleles
9           131090740  131090740  A/-
9           131084628  131084628  C/A
9           131085358  131085358  C/G
9           131085196  131085196  G/A
Can I annotate these variants?
•  Variant Effect Predictor
•  Annotate variants (SNPs, CNVs, indels)
•  Available for GRCh38 and GRCh37 (hg19)
Yes, you can!
PMID: 20562413
Perl script, Web interface, REST API, XML
[Figure: variant positions along a transcript: 5’ upstream, regulatory, 5’ UTR, ATG, coding (synonymous, missense), splice sites, intronic, 3’ UTR, poly(A), 3’ downstream]
Mapping variants on transcripts
Identify transcripts that overlap variants and
predict the consequences of these variants on Ensembl
(or RefSeq) transcripts using VEP
Consequence terms for variants
http://www.ensembl.org/info/genome/variation/predicted_data.html#consequence_type_table
* defined by the Sequence Ontology (SO) project (http://www.sequenceontology.org/)
SIFT
sift.jcvi.org/
Consequence: missense
GAG >GGG
Glu > Gly
PolyPhen-2
genetics.bwh.harvard.edu/pph2/
Condel
dbNSFP
Ensembl tools
http://www.ensembl.org/tools.html
http://www.ensembl.org/vep
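Besides the web page, VEP can be reached through the Ensembl REST API. The sketch below only assembles a request URL for one variant and does not call the service. It assumes the `vep/:species/region/:region/:allele` endpoint and the GRCh37 server name `grch37.rest.ensembl.org`; both should be confirmed in the REST documentation, and `vep_region_url` is a hypothetical helper.

```python
# Assumed server names; check the Ensembl REST documentation before use.
GRCH38_SERVER = "https://rest.ensembl.org"
GRCH37_SERVER = "https://grch37.rest.ensembl.org"


def vep_region_url(chrom, start, end, alt_allele, strand=1,
                   species="human", server=GRCH38_SERVER):
    """Build a VEP REST URL for one variant given as region plus variant allele."""
    region = f"{chrom}:{start}-{end}:{strand}"
    return (f"{server}/vep/{species}/region/{region}/{alt_allele}"
            "?content-type=application/json")


# C/A at 9:131084628 on GRCh37: the variant (non-reference) allele is A.
print(vep_region_url("9", 131084628, 131084628, "A", server=GRCH37_SERVER))
```

The assembly matters: the same coordinates point to different bases on GRCh37 and GRCh38, so the server has to match the assembly of the input positions.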
Inputting data into VEP
Chromosome
Start
End
Alleles
Strand
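The five columns above map directly onto one whitespace-separated line per variant. A minimal sketch, assuming the Ensembl default input format with `+` or `-` as the strand symbol, and using the four chromosome 9 positions from the cancer use case; the helper names are hypothetical.

```python
def variant_class(alleles):
    """Classify a 'reference/variant' allele string by its shape only."""
    ref, alt = alleles.split("/")
    if alt == "-":
        return "deletion"
    if ref == "-":
        return "insertion"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    return "substitution or complex indel"


def vep_input_line(chrom, start, end, alleles, strand="+"):
    """Format one variant as: chromosome start end alleles strand."""
    return f"{chrom} {start} {end} {alleles} {strand}"


# Positions from the use case (chromosome 9, GRCh37, positive strand).
VARIANTS = [
    ("9", 131090740, "A/-"),
    ("9", 131084628, "C/A"),
    ("9", 131085358, "C/G"),
    ("9", 131085196, "G/A"),
]

lines = [vep_input_line(chrom, pos, pos, alleles) for chrom, pos, alleles in VARIANTS]
for line, (_, _, alleles) in zip(lines, VARIANTS):
    print(line, "#", variant_class(alleles))
```

Start and end are equal here because every variant in the list affects a single reference base.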
Output options in VEP
GAG > GGG
Glu > Gly
GAG > GAA
Glu > Glu
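The two codon changes above can be worked through with the standard genetic code. This is only a sketch of the idea for a single codon; it is not how VEP itself is implemented, and it ignores frameshifts, splice effects and alternative genetic codes. The returned labels follow Sequence Ontology term names.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_consequence(ref_codon, alt_codon):
    """Return (ref amino acid, alt amino acid, consequence) for one codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        consequence = "synonymous_variant"
    elif alt_aa == "*":
        consequence = "stop_gained"
    elif ref_aa == "*":
        consequence = "stop_lost"
    else:
        consequence = "missense_variant"
    return ref_aa, alt_aa, consequence


print(codon_consequence("GAG", "GGG"))  # Glu > Gly
print(codon_consequence("GAG", "GAA"))  # Glu > Glu
```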
Queued
Running
Done
Failed
Save to your account (log in)
Edit and resubmit your job
Delete job
Ticket system in VEP
Ticket identifier Job name
Viewing results
SO consequence terms*
*http://www.sequenceontology.org/index.html
ensembl.org/info/docs/tools/vep/online/results.html#summary
Table
•  Before / after filtering
•  novel / existing variants
Pie charts (consequence terms)
•  total observed (more than one per variant)
•  Separate chart: coding consequences
Viewing results
Navigate results
(one row per variant/
transcript overlap)
Show/hide columns in
results table
more columns:
scroll right
•  Download results
•  Send results to BioMart
Create and edit filters
ensembl.org/info/docs/tools/vep/online/results.html#table
results table
Filters consist of three components
Field
•  e.g. Consequence, biotype
Operator
•  e.g. is, matches (partial string matches)
Value
•  the value to compare against
•  some fields have autocomplete values
Multiple filters allowed with logical relationship (AND, OR)
Active filters can be edited too!
ensembl.org/info/docs/tools/vep/online/results.html#filter
Filtering results
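The field / operator / value idea can be mimicked on a downloaded results table. The sketch below assumes that "is" means exact match and "matches" means a case-sensitive partial string match; the rows are made-up placeholders rather than real VEP output, and the column names are assumptions.

```python
def row_passes(row, field, operator, value):
    """Apply one filter (field, operator, value) to one result row."""
    cell = str(row.get(field, ""))
    if operator == "is":
        return cell == value
    if operator == "matches":  # partial string match
        return value in cell
    raise ValueError(f"unsupported operator: {operator}")


def apply_filters(rows, filters, logic="AND"):
    """Keep rows passing all filters (AND) or at least one filter (OR)."""
    combine = all if logic == "AND" else any
    return [r for r in rows if combine(row_passes(r, *f) for f in filters)]


# Made-up rows for illustration only.
ROWS = [
    {"Variant": "v1", "Consequence": "missense_variant", "Biotype": "protein_coding"},
    {"Variant": "v2", "Consequence": "intron_variant", "Biotype": "protein_coding"},
    {"Variant": "v3", "Consequence": "upstream_gene_variant", "Biotype": "lincRNA"},
]

both = apply_filters(
    ROWS,
    [("Consequence", "matches", "variant"), ("Biotype", "is", "protein_coding")],
    logic="AND",
)
either = apply_filters(
    ROWS,
    [("Consequence", "is", "missense_variant"), ("Biotype", "is", "lincRNA")],
    logic="OR",
)
print([r["Variant"] for r in both])
print([r["Variant"] for r in either])
```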
Ensembl VEP
Live demo
VEP video
http://tinyurl.com/vep-video
Things to bear in mind
1)  No distinction between polymorphisms and mutations.
Exceptions: HGMD and COSMIC, where all variants are mutations;
2)  C/T → the first allele is the one in the reference genome,
not necessarily the major or the ancestral allele;
3)  Ensembl reports all alleles on the forward strand
(different from dbSNP).
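Because of point 3, alleles reported on the reverse strand elsewhere need to be reverse-complemented before they are compared with Ensembl. A minimal sketch, assuming alleles are written as `ref/alt` with `-` for a missing base; `to_forward_strand` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def to_forward_strand(alleles, strand):
    """Express an allele string such as 'G/A' on the forward strand."""
    if strand in ("+", 1):
        return alleles
    converted = []
    for allele in alleles.split("/"):
        if allele == "-":
            converted.append(allele)  # a deleted/absent allele has no bases to flip
        else:
            # Complement each base, then reverse the order of the bases.
            converted.append(allele.translate(COMPLEMENT)[::-1])
    return "/".join(converted)


print(to_forward_strand("G/A", "-"))
print(to_forward_strand("C/T", "+"))
```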
Ensembl Variation API
Variation Schema Description
http://useast.ensembl.org/info/docs/api/variation/index.html
Variation Team
Fiona
Cunningham
Will
McLaren
Laurent
Gil
Sarah
Hunt
Anja
Thormann
Ensembl Genetic Variation
Exercises
pages 46-50
Answers
www.ebi.ac.uk/~denise/workshops/2016/dublin/answers
Feel free to explore your favourite variant/phenotype too!
Gene
Regulation
Outline
•  Definition and models
•  Epigenetics and Epigenomics
•  Ensembl Regulation: goal, data sources, species
•  Viewing / accessing regulation data in Ensembl
•  Track hubs: ENCODE, Blueprint
•  Exercises
Regulation of gene expression
•  Change in the production of mRNA/proteins (increase or decrease)
•  From the transcription to post-translational levels
•  Models of regulation of gene transcription
•  Basic
•  Expanded
•  Complete ??
Transcription regulation
Transcription Factor Binding Sites Promoter Gene
mRNA
Transcription Factors Activation
Repression RNA polymerase complex
2 nm
basic model
•  TF binding (promoters, enhancers) → transcription
Nucleosomes
Histones
Histone marks
CpG methylation
11 nm
Transcription regulation
expanded model
•  Epigenetic marks may affect the binding of TFs
Histone modifications
dynamically regulating genes
Jill S. Butler and Sharon Y. R. Dent
Blood 2013;121:3076-3084
Packed Chromatin
30 nm
Open Chromatin
Distal enhancer
Complete (???) model
Transcription regulation
Epigenetics/Epigenomics
Epigenetics*
The study of inherited changes in
phenotype without changes in
genotype
Epigenomics
Epigenetics on a genome-wide scale
http://integratedhealthcare.eu/
*One of the routes to regulate gene transcription
Measuring gene expression
Northern/Western blot Microarrays
SAGE
Adapted from Darryl Leja, Ian Dunham
NGS techniques
DNase-seq ChIP-seq
RNASeq
RT-qPCR
ChIP-sequencing
DNA and proteins (TF1, TF2, TF3)
1)  crosslink and shear
2)  Antibodies and IP
3)  unlink, purify and DNA sequencing
4)  map back to the genome
Ensembl Regulation
Goal: Annotate the genome with features that may play a
role in the transcriptional regulation of genes
Multiple data sources: collection and summary
http://www.ensembl.org/info/docs/funcgen/regulation_sources.html
http://www.ensembl.org/Homo_sapiens/Experiment/
Data source: ENCODE
“Encyclopedia of DNA Elements”
Trying to assign function to as many regions as possible
Transcription and regulatory information
4,626 datasets, 2,498 cell types → functional elements
PMID: 22955616, PMID: 17571346
http://www.nature.com/encode/#/threads
Data source: Roadmap
NIH consortium: public resource of normal epigenomes
DNA methylation, histone marks, open chromatin, small RNA
http://www.roadmapepigenomics.org/data
http://www.roadmapepigenomics.org/publications
•  EU consortium: generate 100 reference epigenomes
•  Blood cells: healthy individuals and malignant leukaemic
counterparts
•  1046 experiments {ChIP, RNA, Bisulfite, DNase}-Seq
•  425 cell types and seven cell lines
•  http://www.blueprint-epigenome.eu/
Data source: Blueprint
Dataset for the Ensembl build
raw data → Ensembl Regulation pipeline → Ensembl annotation
Regulation data: view
MultiCell: all cell lines combined
Displayed by default
Regulatory features: view
Configure this page → Regulation → Regulatory features
For single and individual cell lines, e.g. GM12878, HUVEC
ChIP-seq signal for TF
signal
Regulatory Features: motifs
Ensembl regulatory feature
Position Weight Matrix for TF
(JASPAR database)
Viewing the raw NGS data
DNaseI and TFBS
Histone marks
and polymerases
Configure this page → Regulation → Open chromatin &…
Configure this page → Regulation → Histones &…
How to choose raw data: matrix
Supporting evidence:
1) Open chromatin & TFBS
2) Histones & polymerases
http://tinyurl.com/matrix-ensembl
Segmentation data in Ensembl
Categories of combined segments:
•  CTCF enriched
•  Predicted Weak Enhancer/Cis-reg element
•  Predicted Transcribed Region
•  Predicted Enhancer
•  Predicted Promoter Flank
•  Predicted Repressed/Low Activity
•  Predicted Promoter with TSS
Configure this page → Regulation → Regulatory features
Experimental confirmation
•  CTCF: good recall, reproducible across
multiple cell lines, tight boundaries.
•  TSS:
•  88.9% of FANTOM 5 strict TSSs were covered.
•  Enhancers:
•  92.4% of 882 VISTA enhancers were detected.
•  80.3% of 40279 robust FANTOM 5 enhancers were found.
Methylation data in Ensembl
CpG DNA methylation (RRBS, WGBS, MeDIP)
ENCODE and PMID: 18577705
Configure this page → Regulation → DNA Methylation
STRADA controls the tumor suppressor activities of LKB1
(https://www.wikigenes.org/)
A.  What are the Ensembl regulatory features annotated in this
gene?
B.  Are there any features in the 5’ region of STRADA?
C.  Do the regulatory features for K562, CD8+ cells (ENCODE)
and erythroblast (Blueprint) differ at this region?
D.  What is the stable ID of the most 5’ regulatory feature?
Tutorial
Browsing Regulation data
tinyurl.com/ensembl-regulation
Things to bear in mind
1)  The annotation of regulatory elements in Ensembl
highlights where the biochemical data (ChIP-seq, etc.)
map to the human and mouse genomes;
2)  Features can be near genes but might not affect
their transcription/expression;
3)  Disclaimer: Ensembl cannot tell you how your
favourite gene is regulated.
In addition to the big names
CpG islands, TSS, miRNA target predictions (TarBase)
Configure this page → Regulation → Other regulatory regions
Configure this page → Sequence and assembly → Simple features
Track hubs in Ensembl
ENCODE data hub in Ensembl
www.ensembl.org/info/encode.html
>2,800 data tracks
Ensembl Regulation in BioMart
Human, mouse and fruit fly
FILTERS
ATTRIBUTES
Ensembl Regulation API
http://useast.ensembl.org/info/docs/api/funcgen/index.html
Funcgen Schema Description
Regulation Team
Thomas
Juettemann
Myrto
Kostadima
Ilias
Lavidas
Michael
Nuhn
Ensembl Regulation
Exercises
Pages 52-54
Answers
www.ebi.ac.uk/~denise/workshops/2016/dublin/answers
Feel free to explore your favourite gene/genomic region!
Custom data
display
Outline
•  Overview
•  Supported file formats
•  Add your own data
•  Where to view your own data
•  Tutorials and exercises
Overview
•  Genome browsers have pre-defined sets of data
•  Need to display personal data
•  Compare one’s own data to publicly available data
•  Requirement: own data organised according to specific format rules
http://www.ensembl.org/info/website/upload/index.html
Supported files in Ensembl
http://www.ensembl.org/info/website/upload/index.html#formats
Sequence alignments
•  BAM (compact representation), CRAM (compressed version)
Variation data
•  VCF: Variant Call Format
Feature information
•  BED (Browser Extensible Data): flexible definition of data lines, e.g. chr, start, end
•  Gene Finding Format (GFF), General Transfer Format (GTF)
Continuous-valued data (probability scores)
•  Wig, BigWig
Add custom data
•  Data upload: small files (< 5MB; file name or URL)
•  Attach your data: larger files (> 5MB; URL)
Things to bear in mind
Saved in a temp location (file system)
Saved in a db if logged in
Standard security
http, https or ftp
How can I add my data
Where to view my data
Structural variants in the region 350-50 kb upstream of
SOX9 cause severe dysplasia and other phenotypes. Many
enhancers (e.g. E250, located at -250 kb) activate the SOX9
promoter, whereas E70 seems to be active in somatic tissues.
CGH/other experiments have revealed the following deletions:
17 69872078 69886644 patient1
17 70040357 70049956 patient2
17 70111957 70116270 patient3
A)  Is any of these deletions known to be polymorphic in 1KG?
B)  Would these deletions affect E250 and E70?
C)  Do they map to regions of promoter activity or CpG islands?
Tutorial
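To display the three deletions as a custom track, they can be written out as BED lines. BED uses 0-based, half-open intervals, so the start is shifted by one when the source coordinates are 1-based and inclusive. The sketch assumes the deletion coordinates above are 1-based inclusive, which is not stated in the tutorial; `to_bed_line` is a hypothetical helper.

```python
def to_bed_line(chrom, start, end, name):
    """Convert a 1-based inclusive interval to a tab-separated BED line."""
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    return "\t".join([str(chrom), str(start - 1), str(end), name])


# Deletions from the tutorial (chromosome, start, end, label).
DELETIONS = [
    ("17", 69872078, 69886644, "patient1"),
    ("17", 70040357, 70049956, "patient2"),
    ("17", 70111957, 70116270, "patient3"),
]

bed_lines = [to_bed_line(*deletion) for deletion in DELETIONS]
print("\n".join(bed_lines))
```

The resulting text is well under 5 MB, so it fits the data upload route rather than needing to be attached by URL.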
Custom data display video
http://tinyurl.com/ensembl-upload
Custom data display
Exercises
Pages 55-57
Answers
www.ebi.ac.uk/~denise/workshops/2016/dublin/answers
Feel free to attach your own data
Wrap up
Ensembl is the place!
Genes, genomes, variants, regulatory features,
tools and more
Ensembl Retreat June 2015
Latest publication
Acknowledgements
The Entire Ensembl
Team
Funding
Co-funded by the
European Union
Your take home message
Feedback survey
http://tinyurl.com/dublin-020616
Connect with Ensembl
helpdesk@ensembl.org
https://www.youtube.com/user/EnsemblHelpdesk

More Related Content

PPTX
Genomic Databases-.pptx
PPTX
Ppt of genome annotatioon 2
PPTX
Bioinformatics .pptx
PPTX
Databases short nucletide polymorphism
PPTX
encode project
PPTX
Single Nucleotide Polymorphism
PPTX
Genome sequencing
Genomic Databases-.pptx
Ppt of genome annotatioon 2
Bioinformatics .pptx
Databases short nucletide polymorphism
encode project
Single Nucleotide Polymorphism
Genome sequencing

What's hot (20)

PPT
Exome Sequencing
PPTX
Epigenomics gyanika
PPTX
Third Generation Sequencing
PDF
Vector engineering and codon optimization
PPTX
Comparative genomics
PPTX
Types of genomics ppt
PPTX
Gene silencing
PPTX
transposons complete ppt
PPTX
Human genome project
PDF
Genomic Data Analysis
PPT
PPTX
Molecular tagging
PDF
Overview of Next Gen Sequencing Data Analysis
PPTX
Site Directed mutagenesis- Sonal Singh Shrivas.pptx
PPTX
Translation
PPTX
Plant transposable elements where genetics meets genomics
PPTX
gene prediction programs
PPTX
codon_optimization
Exome Sequencing
Epigenomics gyanika
Third Generation Sequencing
Vector engineering and codon optimization
Comparative genomics
Types of genomics ppt
Gene silencing
transposons complete ppt
Human genome project
Genomic Data Analysis
Molecular tagging
Overview of Next Gen Sequencing Data Analysis
Site Directed mutagenesis- Sonal Singh Shrivas.pptx
Translation
Plant transposable elements where genetics meets genomics
gene prediction programs
codon_optimization
Ad

Viewers also liked (19)

PDF
Variation and the VEP: Ensembl Online Webinar series
PPTX
A Distributed Annotation Pipeline for MSSNG
PDF
Genome Browser
PPTX
Ensembl annotation
PDF
ODP
MyGene.info talk at ISMB/BOSC 2013
PPTX
Open biomedical knowledge using crowdsourcing and citizen science
PPTX
Qt Framework Events Signals Threads
PPTX
UCSD / DBMI seminar 2015-02-6
PPTX
MyGene.info learn-more
PPTX
2015 functional genomics variant annotation and interpretation- tools and p...
PPT
New Generation Sequencing Technologies: an overview
PPTX
Next generation sequencing
PPTX
Next Gen Sequencing (NGS) Technology Overview
PDF
NGS technologies - platforms and applications
PPTX
Ngs ppt
PPTX
A Comparison of NGS Platforms.
PDF
Introduction to next generation sequencing
PDF
NGS - Basic principles and sequencing platforms
Variation and the VEP: Ensembl Online Webinar series
A Distributed Annotation Pipeline for MSSNG
Genome Browser
Ensembl annotation
MyGene.info talk at ISMB/BOSC 2013
Open biomedical knowledge using crowdsourcing and citizen science
Qt Framework Events Signals Threads
UCSD / DBMI seminar 2015-02-6
MyGene.info learn-more
2015 functional genomics variant annotation and interpretation- tools and p...
New Generation Sequencing Technologies: an overview
Next generation sequencing
Next Gen Sequencing (NGS) Technology Overview
NGS technologies - platforms and applications
Ngs ppt
A Comparison of NGS Platforms.
Introduction to next generation sequencing
NGS - Basic principles and sequencing platforms
Ad

Similar to Ensembl Browser Workshop (20)

PDF
Browsing Genes, Variation and Regulation data with Ensembl
PDF
Genes and Transcripts: Ensembl Online Webinar series
PDF
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
PDF
EMBL-EBI at Plant and Animal Genome conference
PPTX
Role of ensembl in genome browsing
PPT
BITs: Genome browsers and interpretation of gene lists.
PDF
Advanced Bioinformatics for Genomics and BioData Driven Research
PPTX
Understanding Genome
PDF
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
 
PPT
Ensembl genome
PPTX
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
PPTX
Bioinformatics t2-databases v2014
PDF
Bioinfornatics Practical Lab Manual For Biotech
PPTX
CS Lecture 2017 04-11 from Data to Precision Medicine
PDF
Variant analysis and whole exome sequencing
PDF
The ensembl database
PPT
RML NCBI Resources
PPTX
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
PDF
Annotation capabilities
PPTX
Biomart
Browsing Genes, Variation and Regulation data with Ensembl
Genes and Transcripts: Ensembl Online Webinar series
Genome Browsing, Genomic Data Mining and Genome Data Visualization with Ensem...
EMBL-EBI at Plant and Animal Genome conference
Role of ensembl in genome browsing
BITs: Genome browsers and interpretation of gene lists.
Advanced Bioinformatics for Genomics and BioData Driven Research
Understanding Genome
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes
 
Ensembl genome
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Bioinformatics t2-databases v2014
Bioinfornatics Practical Lab Manual For Biotech
CS Lecture 2017 04-11 from Data to Precision Medicine
Variant analysis and whole exome sequencing
The ensembl database
RML NCBI Resources
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Annotation capabilities
Biomart

Recently uploaded (20)

PPTX
Introduction to Cardiovascular system_structure and functions-1
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
HPLC-PPT.docx high performance liquid chromatography
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PPTX
neck nodes and dissection types and lymph nodes levels
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PDF
Placing the Near-Earth Object Impact Probability in Context
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
PDF
Sciences of Europe No 170 (2025)
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
Introduction to Cardiovascular system_structure and functions-1
ECG_Course_Presentation د.محمد صقران ppt
Introduction to Fisheries Biotechnology_Lesson 1.pptx
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
The KM-GBF monitoring framework – status & key messages.pptx
Classification Systems_TAXONOMY_SCIENCE8.pptx
HPLC-PPT.docx high performance liquid chromatography
Taita Taveta Laboratory Technician Workshop Presentation.pptx
TOTAL hIP ARTHROPLASTY Presentation.pptx
neck nodes and dissection types and lymph nodes levels
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
Placing the Near-Earth Object Impact Probability in Context
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
Sciences of Europe No 170 (2025)
Derivatives of integument scales, beaks, horns,.pptx
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
cpcsea ppt.pptxssssssssssssssjjdjdndndddd

Ensembl Browser Workshop

  • 1. Training materials Ensembl materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these, please credit Ensembl for their creation If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publications.html
  • 2. Denise Carvalho-Silva European Molecular Biology Laboratory European Bioinformatics Institute Browsing Genes, Variation and Regulation data with Ensembl UCD - Dublin
  • 3. Today 09:30-17:00 •  Introduction to Ensembl •  Browser walkthrough 10:45-11:00 coffee/tea •  Browser exercises •  BioMart (Talk + Exercises) 13:00-14:00 lunch break •  Genetic variation (Talk + Exercises) 15:30-15:45 coffee/tea •  Gene Regulation and/or Custom data (Talk + Exercises) •  Wrap up, photo opportunity & feedback survey
  • 5. Course Objectives What is Ensembl? What type of data can you get in Ensembl? How to navigate the Ensembl browser website? How to connect with Ensembl
  • 6. bli blo bla blu bla bla bli blo bla blu bla bla bli blo bla blu bla bla bla blu bla bla bli blo
  • 7. Introduction Why do we need/have genome browsers?
  • 8. Genome sequencing 1977: 1st genome to be sequenced (5 kb) 2000: draft human sequence (3 gb) Large amounts of raw DNA sequence data
  • 11. Annotation of vertebrate genomeswww.ensembl.org pre.ensembl.org >80 genomes* D. melanogaster C. elegans S. cerevisae *Release 84 March 2016
  • 12. 1 human genome à 3 assemblies www.ensembl.org grch37.ensembl.org e54.ensembl.org
  • 13. EBI is an Outstation of the European Molecular Biology Laboratory. Comparative GenomicsGene models RegulationVariation Custom data displayProgrammatic access Toolkit Ensembl Features
  • 14. EBI is an Outstation of the European Molecular Biology Laboratory. Comparative GenomicsGene models RegulationVariation Custom data displayProgrammatic access Toolkit Ensembl Features
  • 16. Gene models in Ensembl Goal: Generate set of well-supported genes Automatic Manual
  • 17. Automatic annotation*: •  many species •  genome-wide at once •  ~ 4 months. Manual annotation*: •  fewer species •  gene by gene •  many years. Legend: Automatic and coding (20_); Manual and coding (00_); Automatic + Manual (“gold”); Manual and non-coding (00_). * based on experimental, biological evidence (INSDC, UniProtKB…)
  • 18. Ensembl genes & transcripts •  merged annotation •  higher confidence and quality •  comprehensive: alternatively spliced transcripts. Gold (identical annotation) = Automatic + Manual. [Diagram labels: 5’ UTR, exon, intron, 3’ UTR]
  • 19. Alternative splicing: rich and comprehensive annotation
  • 20. Which transcript to use? http://www.ensembl.org/Help/Glossary?id=493 http://www.ensembl.org/Help/Glossary?id=492 APPRIS TSLs
  • 21. CCDS project •  annotate a consensus coding DNA sequence set •  EBI, WTSI, UCSC and NCBI •  Genome Res. 19:1316-23 (2009) http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi CCDS transcript
  • 22. Disclaimer: which transcript to use No single method will tell us which transcript to use Decision on a case by case basis •  All transcripts OR one/two well supported ones? List of transcripts: we offer choices based on •  CCDS (Ensembl, HAVANA, NCBI, UCSC) •  Golden transcripts (identical Ensembl and HAVANA) •  Cross reference entries (e.g. UniProtKB, RefSeq) •  APPRIS •  TSLs
  • 23. Annotation based on RNASeq http://www.ensembl.org/info/genome/genebuild/rnaseq_annotation.html
  • 25. Ensembl stable identifiers •  ENSG########### Ensembl Gene ID •  ENST########### Ensembl Transcript ID •  ENSP########### Ensembl Peptide ID •  ENSE########### Ensembl Exon ID •  For non-human species a species code is inserted after ENS, e.g. MUS (Mus musculus) for mouse: ENSMUSG###########
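The identifier layout on this slide can be checked with a short regular expression. The following is a minimal sketch, assuming a three-letter species code and exactly eleven digits as drawn on the slide; the function name `parse_stable_id` and the example IDs are made up for illustration and do not refer to real genes.

```python
import re

# Feature letters listed on the slide: Gene, Transcript, Peptide, Exon.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# ENS + optional species code (assumed to be 3 letters, e.g. MUS)
# + one feature letter + 11 digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d{11})$")


def parse_stable_id(stable_id):
    """Split a stable ID into species code, feature type and number.

    Returns None when the string does not follow the slide's layout.
    """
    match = STABLE_ID.match(stable_id)
    if match is None:
        return None
    species_code, feature_letter, number = match.groups()
    return {
        "species_code": species_code or "",  # empty string means human
        "feature": FEATURE_TYPES[feature_letter],
        "number": number,
    }


# Made-up identifiers, used only to exercise the pattern.
human_transcript = parse_stable_id("ENST00000000001")
mouse_gene = parse_stable_id("ENSMUSG00000000001")
not_an_id = parse_stable_id("IL23R")
```

Here `human_transcript["feature"]` is `"transcript"` with an empty species code, `mouse_gene["species_code"]` is `"MUS"`, and `not_an_id` is `None`.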
  • 27. Ensembl Browser Live demo: Walking through the website pages 11-31
  • 28. Let’s explore ESPN. Before we start, some background: the ESPN gene products are active in the inner ear, where they appear to play an essential role in normal hearing and balance.
  • 29. A)  What is the location and strand of the human ESPN gene? B)  How can I view protein alignments and variants mapped to this location? C)  Can I move data tracks up and down, share and delete tracks? Human ESPN: location
  • 30. A)  How can I find the genomic sequence of this gene? What is the ID of its first exon? B)  Can I display the genomic coordinates and variants on this sequence? C)  Can I find information on the expression of this gene in different tissues? Human ESPN: gene
  • 31. A)  How many exons does the longest ESPN transcript have? Are there any completely untranslated exons? B)  Can I find its cDNA sequence? C)  What are the UniProt and RefSeq entries cross referenced to this transcript? Human ESPN: transcript
  • 33. BioMart
  • 34. Outline •  Definitions •  The principle: 4 steps •  Tutorial: simple query in human •  Find Ensembl BioMart and BioMart elsewhere •  Sophisticated platforms: mart services, APIs, etc… •  Exercises
  • 35. Would you like to… •  … convert protein IDs into gene IDs or names? •  … get a list of all genes mapped to a region deleted in a patients’ cohort? •  … export sequences for a bunch of genes or variants? If you answered yes, keep listening!
  • 36. What is BioMart? •  Free service for easy retrieval of Ensembl data •  Data export tool with little/no programming required •  Complex queries with a few mouse clicks •  Output formats (.xls, .csv, fasta, tsv, html)
  • 37. The four-step principle: DATA (Database, Dataset) → FILTERS (IDs, Regions, Domains, Expression) → ATTRIBUTES (Homologs, Sequences, Features, Structures) → RESULTS (Tables, Fasta)
  • 39. Selecting the filters: limit your data set (information that you know). Click “Count” to see if BioMart is reading the input data
  • 40. Picking the attributes Determine output columns (information you want to know)
  • 42. Getting the results: for tables/sequences, click “Unique results only”. For the full table: click View “ALL” rows or “Go”
  • 43. Selected IBD genes IL23R, PTPN22, CUL2, C1orf106, IL18RAP
  • 44. For the IL23R, PTPN22, CUL2, C1orf106 and IL18RAP genes, use BioMart to retrieve a table (.xls) containing: •  Associated gene name, ENSG and ENST IDs •  Chromosome name, gene start and end •  GO term name and Interpro description Tutorial: BioMart
  • 45. The four-step principle: DATA (Human genes) → FILTERS (gene names: IL23R, CUL2, PTPN22, C1orf106, IL18RAP) → ATTRIBUTES (Gene name, ENSG/ENST ID, Chr, start, end, GO term name, InterPro description) → RESULTS (.xls table)
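The same four steps can be written down as a BioMart XML query, the format that the MartService accepts. This is a minimal sketch that only builds the XML text and sends nothing over the network. The dataset, filter and attribute names used here (`hsapiens_gene_ensembl`, `external_gene_name`, `name_1006` and so on) are assumptions about BioMart's internal names; the XML button on the BioMart results page shows the exact names for a given release. The helper `build_biomart_query` is hypothetical, and TSV is requested here instead of the .xls output used in the tutorial.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string: DATA -> FILTERS -> ATTRIBUTES."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",      # RESULTS: tab-separated table
        "header": "1",
        "uniqueRows": "1",       # same idea as "Unique results only"
        "datasetConfigVersion": "0.6",
    })
    # DATA: one dataset per species.
    data = ET.SubElement(query, "Dataset",
                         {"name": dataset, "interface": "default"})
    # FILTERS: what you already know (here, gene names).
    for name, values in filters.items():
        ET.SubElement(data, "Filter",
                      {"name": name, "value": ",".join(values)})
    # ATTRIBUTES: the output columns you want.
    for name in attributes:
        ET.SubElement(data, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


query_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",  # assumed internal name for human genes
    filters={"external_gene_name":
             ["IL23R", "PTPN22", "CUL2", "C1orf106", "IL18RAP"]},
    attributes=[
        "external_gene_name", "ensembl_gene_id", "ensembl_transcript_id",
        "chromosome_name", "start_position", "end_position",
        "name_1006",             # assumed name of the GO term name column
        "interpro_description",
    ],
)
print(query_xml)
```

The printed XML mirrors the tutorial: one dataset, one filter holding the five IBD genes, and eight attribute columns.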
  • 50. More sophisticated platforms •  BioMart queries: MartService www.biomart.org/martservice.html •  APIs: Perl, Java, Web Services •  Third-party software: galaxyproject.org, bioconductor.org, taverna.org.uk
  • 53. BioMart Step-by-step example pages 36-40 Exercises pages 41-43 Answers www.ebi.ac.uk/~denise/workshops/2016/dublin/answers Feel free to explore BioMart in other contexts too!
  • 54. Genetic Variation
  • 55. Outline •  Classes of variation, species and sources •  Browsing variation data: some entry points (Location tab, Gene tab, Variation tab) •  Phenotypic data and population genetics •  How to annotate your own variants •  Exercises
  • 56. Genetic variation 1) Large scale: structural (> 50 base pairs): duplication, deletion, inversion, translocation, loss 2) Short scale: SNPs (or SNVs), indels [Figure: aligned sequences showing single-base substitutions and gaps]
  • 57. Species with variation data Understand the types of genetic variation data and how to view them in the context of our genomes
  • 58. Sources of variation data •  Import alleles and frequencies •  Annotate variants http://www.ensembl.org/info/docs/variation/sources_documentation.html
  • 59. Location tab: across a region (SVs, SNPs, Ensembl genes)
  • 61. Variation tab: variant centric summary data (SNP or SV)
  • 62. Variants on the karyotype
  • 63. Phenotype data in Ensembl species and sources
  • 64. Population data for variants http://hapmap.ncbi.nlm.nih.gov/ http://www.1000genomes.org
  • 65. pie charts: 1KG super populations Human Population Genetics
  • 66. Coffee intake is a worldwide phenomenon with Finland at the top, and UK in the 44th place. Is caffeine consumption in our genes? A)  What are the chromosome locations of variants associated with this phenotype? B)  Which variant has got the most significant association? C)  What is the ancestral allele of this variant? Is it conserved in eutherian mammals? D)  What is the most frequent allele in GBR? E)  Can you download this variant and 200 nt upstream and downstream flanking sequence in RTF (Rich Text Format)? Live demo
  • 67. before we find out •  What is it? •  What does it do? •  Where can I find it?
  • 68. I’ve got a list of genetic variants from my resequencing project of a cohort study of breast cancer in London. The positions are all on chromosome 9, GRCh37 assembly: 131090740 A/- (positive strand) 131084628 C/A (positive strand) 131085358 C/G (positive strand) 131085196 G/A (positive strand) 1)  Do any of these cause a change at the amino acid level? 2)  Are these predicted to be deleterious according to PolyPhen? 3)  Can I get the flanking sequence (200 nt both up and downstream) for the known variants in this set? A use case: cancer patients
  • 69. My resequencing experiments: cancer patients vs healthy controls. Positions in the genome vary between the two groups. Columns: Chromosome, Start, End, Alleles. 9 131090740 131090740 A/-; 9 131084628 131084628 C/A; 9 131085358 131085358 C/G; 9 131085196 131085196 G/A
  • 70. Can I annotate these variants? Yes, you can! •  Variant Effect Predictor (VEP) •  Annotate variants (SNPs, CNVs, indels) •  Available for GRCh38 and GRCh37 (hg19) •  Perl script, Web interface, REST API, XML. PMID: 20562413
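The four positions from the resequencing use case can be written in the whitespace-separated layout that the VEP accepts as its default input: chromosome, start, end, alleles, strand. The snippet below is a sketch that only formats text; the variable and function names are made up, and the strand is written as `+` because the use case states that all alleles are on the positive strand.

```python
# Variants from the use case: chromosome 9, GRCh37, positive strand.
variants = [
    ("9", 131090740, 131090740, "A/-"),
    ("9", 131084628, 131084628, "C/A"),
    ("9", 131085358, 131085358, "C/G"),
    ("9", 131085196, 131085196, "G/A"),
]


def to_vep_line(chromosome, start, end, alleles, strand="+"):
    """Format one variant as 'chromosome start end alleles strand'."""
    return f"{chromosome} {start} {end} {alleles} {strand}"


vep_input = "\n".join(to_vep_line(*variant) for variant in variants)
print(vep_input)
```

The resulting text can be pasted into the web interface or saved to a file for the Perl script; remember to select GRCh37 so the coordinates are interpreted on the right assembly.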
  • 71. Mapping variants on transcripts: identify transcripts that overlap variants and predict the consequence of these on Ensembl (or RefSeq) transcripts using the VEP. [Diagram labels: 5’ upstream, regulatory, 5’ UTR, ATG, coding (synonymous, missense), splice sites, intronic, 3’ UTR, AAAAAAA, 3’ downstream]
  • 72. Consequence terms for variants http://www.ensembl.org/info/genome/variation/predicted_data.html#consequence_type_table * defined by the Sequence Ontology (SO) project (http://www.sequenceontology.org/)
  • 73. Consequence: missense (GAG > GGG, Glu > Gly). SIFT sift.jcvi.org/ PolyPhen-2 genetics.bwh.harvard.edu/pph2/ Condel dbNSFP
  • 76. Output options in the VEP: GAG > GGG, Glu > Gly (missense); GAG > GAA, Glu > Glu (synonymous)
  • 77. Ticket system in the VEP: ticket identifier, job name. Job status: Queued, Running, Done, Failed. Actions: Save to your account (log in), Edit and resubmit your job, Delete job
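The two codon changes on this slide show the difference between a missense and a synonymous consequence. The sketch below reproduces that distinction with the standard genetic code. It is an illustration of the idea only, not how the VEP itself is implemented, and the function name `classify_codon_change` is made up.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(reference_codon, variant_codon):
    """Return a Sequence Ontology style term for a single codon change."""
    reference_aa = CODON_TABLE[reference_codon]
    variant_aa = CODON_TABLE[variant_codon]
    if reference_aa == variant_aa:
        return "synonymous_variant"
    if variant_aa == "*":
        return "stop_gained"
    if reference_aa == "*":
        return "stop_lost"
    return "missense_variant"


glu_to_gly = classify_codon_change("GAG", "GGG")  # Glu > Gly
glu_to_glu = classify_codon_change("GAG", "GAA")  # Glu > Glu
print(glu_to_gly, glu_to_glu)
```

With the slide's examples, GAG > GGG is classified as `missense_variant` and GAG > GAA as `synonymous_variant`.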
  • 78. Viewing results SO consequence terms* *http://www.sequenceontology.org/index.html
  • 79. ensembl.org/info/docs/tools/vep/online/results.html#summary Table •  Before / after filtering •  novel / existing variants Pie charts (consequence terms) •  total observed (more than one per variant) •  Separate chart: coding consequences Viewing results
  • 80. Navigate results (one row per variant/ transcript overlap) Show/hide columns in results table more columns: scroll right •  Download results •  Send results to BioMart Create and edit filters ensembl.org/info/docs/tools/vep/online/results.html#table results table
  • 81. Filters consist of three components Field •  e.g. Consequence, biotype Operator •  e.g. is, matches (partial string matches) Value •  the value to compare against •  some fields have autocomplete values Multiple filters allowed with logical relationship (AND, OR) Active filters can be edited too! ensembl.org/info/docs/tools/vep/online/results.html#filter Filtering results
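The field / operator / value structure of a filter, and the AND/OR combination of several filters, can be mimicked on a downloaded results table. This is a minimal sketch with made-up rows and column names; it supports only the two operators named on the slide and is not the VEP's own filtering code.

```python
def matches_filter(row, field, operator, value):
    """Apply one filter (field, operator, value) to one result row."""
    cell = str(row.get(field, ""))
    if operator == "is":
        return cell == value
    if operator == "matches":      # partial string match, as on the slide
        return value in cell
    raise ValueError(f"unsupported operator: {operator}")


def apply_filters(rows, filters, logic="AND"):
    """Keep rows that pass all filters (AND) or at least one filter (OR)."""
    combine = all if logic == "AND" else any
    return [
        row for row in rows
        if combine(matches_filter(row, *one_filter) for one_filter in filters)
    ]


# Hypothetical rows: one row per variant/transcript overlap.
rows = [
    {"Consequence": "missense_variant", "BIOTYPE": "protein_coding"},
    {"Consequence": "intron_variant", "BIOTYPE": "protein_coding"},
    {"Consequence": "non_coding_transcript_exon_variant", "BIOTYPE": "lincRNA"},
]
filters = [
    ("Consequence", "matches", "missense"),
    ("BIOTYPE", "is", "protein_coding"),
]
both = apply_filters(rows, filters, logic="AND")
either = apply_filters(rows, filters, logic="OR")
print(len(both), len(either))
```

With AND only the missense row in a protein-coding transcript remains; with OR the intronic row is kept as well because its biotype matches.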
  • 85. Things to bear in mind 1)  No distinction between polymorphisms and mutations. Exceptions: HGMD and COSMIC (all mutations); 2)  C/T → the first allele is the one in the reference genome, not necessarily the major or the ancestral allele; 3)  Ensembl reports all alleles on the forward strand (different from dbSNP).
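Point 3 matters when comparing alleles from a source that reports them on the reverse strand. The sketch below converts an allele string to the forward strand by reverse-complementing each allele. The function name `to_forward_strand` and the use of 1 / -1 for the strand are assumptions made for this example.

```python
# Complement of each base; the deletion symbol "-" is left untouched.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_forward_strand(alleles, strand):
    """Return an allele string such as 'C/T' as seen on the forward strand.

    strand is 1 for forward and -1 for reverse (assumed convention).
    """
    if strand == 1:
        return alleles
    return "/".join(
        allele.translate(COMPLEMENT)[::-1]  # complement, then reverse
        for allele in alleles.split("/")
    )


forward_snp = to_forward_strand("C/T", -1)
forward_deletion = to_forward_strand("AG/-", -1)
print(forward_snp, forward_deletion)
```

A reverse-strand C/T becomes G/A on the forward strand, and a reverse-strand AG/- deletion becomes CT/-; the order of the alleles is kept, so the first one is still the reference allele.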
  • 86. Ensembl Variation API Variation Schema Description http://useast.ensembl.org/info/docs/api/variation/index.html
  • 89. Ensembl Genetic Variation Exercises pages 46-50 Answers www.ebi.ac.uk/~denise/workshops/2016/dublin/answers Feel free to explore your favourite variant/phenotype too!
  • 90. Gene Regulation
  • 91. Outline •  Definition and models •  Epigenetics and Epigenomics •  Ensembl Regulation: goal, data sources, species •  Viewing / accessing regulation data in Ensembl •  Track hubs: ENCODE, Blueprint •  Exercises
  • 92. Regulation of gene expression •  Change in the production of mRNA/proteins (increase or decrease) •  From the transcription to post-translational levels •  Models of regulation of gene transcription •  Basic •  Expanded •  Complete ??
  • 93. Transcription regulation: basic model •  TF binding (promoters, enhancers) → transcription [Diagram labels: Transcription Factors, Transcription Factor Binding Sites, Promoter, Gene, mRNA, Activation, Repression, RNA polymerase complex; scale: 2 nm]
  • 94. Transcription regulation: expanded model •  Epigenetic marks may affect the binding of TFs [Diagram labels: Nucleosomes, Histones, Histone marks, CpG methylation; scale: 11 nm]
  • 95. Histone modifications dynamically regulating genes. Jill S. Butler and Sharon Y. R. Dent, Blood 2013;121:3076-3084
  • 96. Transcription regulation: complete (???) model [Diagram labels: Packed Chromatin, Open Chromatin, Distal enhancer; scale: 30 nm]
  • 97. Epigenetics/Epigenomics. Epigenetics*: the study of inherited changes in phenotype without changes in genotype. Epigenomics: epigenetics on a genome-wide scale. http://integratedhealthcare.eu/ *One of the routes to regulate gene transcription
  • 98. Measuring gene expression: Northern/Western blot, RT-qPCR, Microarrays, SAGE. NGS techniques: DNase-seq, ChIP-seq, RNASeq. Adapted from Darryl Leja, Ian Dunham
  • 99. ChIP-sequencing: DNA and proteins → crosslink and shear → antibodies and IP → unlink, purify and DNA sequencing → map back to the genome [Diagram: transcription factors TF1, TF2, TF3 bound to DNA; example reads ACGTC, CGCTT, GAACA]
  • 100. Ensembl Regulation Goal: Annotate the genome with features that may play a role in the transcriptional regulation of genes Multiple data sources: collection and summary http://www.ensembl.org/info/docs/funcgen/regulation_sources.html http://www.ensembl.org/Homo_sapiens/Experiment/
  • 101. Data source: ENCODE “Encyclopedia of DNA Elements” Trying to assign function to as many regions as possible Transcription and regulatory information 4,626 datasets, 2,498 cell types → functional elements PMID: 22955616, PMID: 17571346 http://www.nature.com/encode/#/threads
  • 102. Data source: Roadmap NIH consortium: public resource of normal epigenomes DNA methylation, histone marks, open chromatin, small RNA http://www.roadmapepigenomics.org/data http://www.roadmapepigenomics.org/publications
  • 103. •  EU consortium: generate 100 reference epigenomes •  Blood cells: healthy individuals and malignant leukaemic counterparts •  1046 experiments {ChIP, RNA, Bisulfite, DNase}-Seq •  425 cell types and seven cell lines •  http://www.blueprint-epigenome.eu/ Data source: Blueprint
  • 104. Dataset for the Ensembl build: raw data → Ensembl Regulation pipeline → Ensembl annotation
  • 105. Regulation data: view MultiCell: all cell lines combined Displayed by default
  • 106. Regulatory features: view Configure this page → Regulation → Regulatory features For single and individual cell lines, e.g. GM12878, HUVEC
  • 107. Regulatory Features: motifs. ChIP-Seq signal for TF; Ensembl regulatory feature; Position Weight Matrix for TF (JASPAR database)
  • 108. Viewing the raw NGS data. DNaseI and TFBS: Configure this page → Regulation → Open chromatin &… Histone marks and polymerases: Configure this page → Regulation → Histones &…
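A position weight matrix scores how well a stretch of DNA resembles a transcription factor's binding motif. The sketch below uses a hypothetical four-position count matrix (not a real JASPAR entry) and the common log-odds formulation with a pseudocount and a uniform background; it is an illustration only, and all names in it are made up.

```python
import math

# Hypothetical counts of each base at 4 motif positions (10 sites).
COUNTS = {
    "A": [8, 0, 1, 9],
    "C": [1, 0, 8, 0],
    "G": [0, 10, 0, 1],
    "T": [1, 0, 1, 0],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn base counts into log2(observed frequency / background)."""
    width = len(counts["A"])
    matrix = []
    for position in range(width):
        total = sum(counts[base][position] for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2(
                ((counts[base][position] + pseudocount) / total) / background
            )
            for base in "ACGT"
        })
    return matrix


def score_site(matrix, site):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, site))


def best_hit(matrix, sequence):
    """Slide the matrix along the sequence; return (best score, offset)."""
    width = len(matrix)
    return max(
        (score_site(matrix, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    )


pwm = log_odds_matrix(COUNTS)
best_score, best_offset = best_hit(pwm, "TTAGCATT")
print(best_offset, round(best_score, 2))
```

In this toy sequence the best-scoring window starts at offset 2, where the sequence reads AGCA, the consensus of the hypothetical matrix.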
  • 109. How to choose raw data: matrix Supporting evidence: 1) Open chromatin & TFBS 2) Histones & polymerases http://tinyurl.com/matrix-ensembl
  • 110. Segmentation data in Ensembl: categories of combined segments: CTCF enriched; Predicted Weak Enhancer/Cis-reg element; Predicted Transcribed Region; Predicted Enhancer; Predicted Promoter Flank; Predicted Repressed/Low Activity; Predicted Promoter with TSS. Configure this page → Regulation → Regulatory features
  • 111. Experimental confirmation •  CTCF: good recall, reproducible across multiple cell lines, tight boundaries. •  TSS: •  88.9% of FANTOM 5 strict TSSs were covered. •  Enhancers: •  92.4% of 882 VISTA enhancers were detected. •  80.3% of 40279 robust FANTOM 5 enhancers were found.
  • 112. Methylation data in Ensembl CpG DNA methylation (RRBS, WGBS, MeDIP) ENCODE and PMID: 18577705 Configure this page → Regulation → DNA Methylation
  • 113. STRADA controls tumor suppressor activities of LKB1 (https://www.wikigenes.org/) A.  What are the Ensembl regulatory features annotated in this gene? B.  Are there any features in the 5’ region of STRADA? C.  Do the regulatory features for K562, CD8+ cells (ENCODE) and erythroblast (Blueprint) differ at this region? D.  What is the stable ID of the most 5’ regulatory feature? Tutorial
  • 115. Things to bear in mind 1)  The annotation of regulatory elements in Ensembl highlights where the biochemical data (ChIP-seq, etc.) map to the human and mouse genomes; 2)  Features can be near genes but might not affect their transcription/expression; 3)  Disclaimer: Ensembl cannot tell you how your favourite gene is regulated.
  • 116. In addition to the big names: CpG islands, TSS, miRNA target predictions (TarBase). Configure this page → Regulation → Other regulatory regions. Configure this page → Sequence and assembly → Simple features
  • 117. Track hubs in Ensembl
  • 118. ENCODE data hub in Ensembl www.ensembl.org/info/encode.html >2,800 data tracks
  • 119. Ensembl Regulation in BioMart Human, mouse and fruit fly FILTERS ATTRIBUTES
  • 124. Custom data display
  • 125. Outline •  Overview •  Supported file formats •  Add your own data •  Where to view your own data •  Tutorials and exercises
  • 126. Overview •  Genome browsers have pre-defined sets of data •  Need to display personal data •  Compare one’s own data to publicly available data •  Requirement: own data formatted according to specific rules http://www.ensembl.org/info/website/upload/index.html
  • 127. Supported files in Ensembl http://www.ensembl.org/info/website/upload/index.html#formats •  Sequence alignments: BAM (compact representation), CRAM (compressed version) •  Variation data: VCF (Variant Call Format) •  Flexible definition of data lines: BED (Browser Extensible Data), e.g. chr, start, end •  Feature information: Gene Finding Format (GFF), General Transfer Format (GTF) •  Continuous-valued data (probability scores): Wig, BigWig
  • 128. Add custom data •  Data upload: small files (< 5MB; file name or URL) •  Attach your data: larger files (> 5MB; URL) Things to bear in mind: saved in a temp location (file system); saved in a db if logged in; standard security; http, https or ftp
  • 129. How can I add my data
  • 130. Where to view my data
  • 131. Structural variants in the 350-50 kb region upstream of SOX9 cause severe dysplasia and other phenotypes. Many enhancers (e.g. E250, located at -250 kb) activate the SOX9 promoter, whereas E70 seems to be active in somatic tissues. CGH/other experiments have revealed the following deletions: 17 69872078 69886644 patient1; 17 70040357 70049956 patient2; 17 70111957 70116270 patient3 A)  Are any of these deletions known to be polymorphic in 1KG? B)  Would these deletions affect E250 and E70? C)  Do they map to regions of promoter activity or CpG islands? Tutorial
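The three deletions can be prepared as a small BED file for upload. The sketch below writes chromosome, start, end and name columns under a track line. BED starts are 0-based, so the code subtracts 1 from each start on the assumption that the slide's coordinates are 1-based; pass `one_based=False` if they are already BED-style. The track name and description are made up.

```python
# Deletions from the tutorial: chromosome, start, end, label.
deletions = [
    ("17", 69872078, 69886644, "patient1"),
    ("17", 70040357, 70049956, "patient2"),
    ("17", 70111957, 70116270, "patient3"),
]


def to_bed_line(chromosome, start, end, name, one_based=True):
    """Format one region as a tab-separated BED line.

    BED starts are 0-based and ends are exclusive, so a 1-based
    inclusive start is shifted down by one (assumed input convention).
    """
    bed_start = start - 1 if one_based else start
    return f"{chromosome}\t{bed_start}\t{end}\t{name}"


track_line = 'track name="SOX9_deletions" description="Upstream deletions"'
bed_text = "\n".join(
    [track_line] + [to_bed_line(*deletion) for deletion in deletions]
) + "\n"
print(bed_text)
```

The text is small enough for the data upload route (< 5MB), either pasted in or saved as a `.bed` file; make sure the assembly selected in the browser matches the one the coordinates were called on.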
  • 132. Custom data display video http://tinyurl.com/ensembl-upload
  • 134. Custom data display Exercises Pages 55-57 Answers www.ebi.ac.uk/~denise/workshops/2016/dublin/answers Feel free to attach your own data
  • 135. Wrap up Ensembl is the place! Genes, genomes, variants, regulatory features, tools and more
  • 139. Your take home message