Genome Resources at the EBI -
     Ensembl and Ensembl Genomes
     Bert Overduin, Ph.D.




PAG XX, January 15th 2012, San Diego
                                       EBI is an Outstation of the European Molecular Biology Laboratory. 
EBI Database Workshop
Outline

     •  Introduction to Ensembl / Ensembl Genomes
     •  Highlights in 2011
     •    Demo 1: Browser basics
     •    Demo 2: Variant Effect Predictor
     •    Demo 3: Adding custom tracks
     •    Demo 4: BioMart
     •  Future plans for 2012
     •  Help & Workshops
     •  Acknowledgements




Goal

     To provide access to genome-scale data from
     completely sequenced species of scientific
     interest from across the taxonomy




History

     •  1999: Start of Ensembl project for the Human Genome Project
     •  2000: First release of data and web interface
     •  2009: First release of Ensembl Genomes
     •  2011: Ensembl v65: 63 genomes
     •  2011: Ensembl Genomes v12: 335 genomes


     •  Ensembl: EBI & Wellcome Trust Sanger Institute
     •  Ensembl Genomes: EBI




                                                                      © John Freebrey

Species Ensembl

             Primates
          Rodents etc.
        Laurasiatheria
            Afrotheria
             Xenarthra
      Other mammals
       Birds & reptiles
           Amphibians
                  Fish
      Other chordates
     Other eukaryotes
     On Pre! Ensembl

Species Ensembl Genomes




Annotation

     •  Inclusion of species depends on various criteria (model organism?
        community interest / demand? funding? completeness / quality of
        genome assembly?)
     •  A broad taxonomic coverage is aimed for



     •  Ensembl: annotation in-house by the Ensembl team


     •  Ensembl Genomes: annotation preferably by or in collaboration with the
        scientific community for the species in question




Ensembl genebuild




     Genome assembly + experimental evidence (cDNAs & proteins) → Genebuild pipeline → Ensembl genes




Data

     •  Genomic sequence
     •  Gene/transcript/protein models
     •  External references
     •  Mapped cDNAs, proteins, microarray probes, BACs, cytogenetic
        bands, markers, repeats etc.
     •  Comparative data: orthologs and paralogs, protein families, whole
        genome alignments, syntenic regions
     •  Variation data: sequence variants, structural variants
     •  Regulatory data: “best guess” set of regulatory elements




Access to data

     •  Web browser                    http://www.ensembl.org
                                       (with US West, US East and Asia mirrors
                                       and Pre! and Archive! sites)
                                       http://www.ensemblgenomes.org

     •  BioMart                        http://www.biomart.org

     •  FTP                            ftp.ensembl.org/pub
                                       ftp.ensemblgenomes.org/pub

     •  Public MySQL server            ensembldb.ensembl.org:5306:anonymous
                                       mysql.ebi.ac.uk:4157:anonymous

     •  Ensembl API                    http://www.ensembl.org/info/docs/api
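
The two public MySQL entries above appear to follow a host:port:user pattern. As a minimal sketch under that assumption, the following Python splits such an entry and builds the matching argument list for the standard mysql command-line client. The helper names (MySQLEndpoint, parse_endpoint, mysql_command) are made up for this illustration, and nothing here opens a connection.

```python
from typing import List, NamedTuple


class MySQLEndpoint(NamedTuple):
    host: str
    port: int
    user: str


def parse_endpoint(spec: str) -> MySQLEndpoint:
    """Split a 'host:port:user' string (assumed reading of the slide notation)."""
    host, port, user = spec.split(":")
    return MySQLEndpoint(host, int(port), user)


def mysql_command(endpoint: MySQLEndpoint) -> List[str]:
    """Build an argument list for the mysql command-line client."""
    return [
        "mysql",
        "--host", endpoint.host,
        "--port", str(endpoint.port),
        "--user", endpoint.user,
    ]


# The two servers listed on the slide.
ENSEMBL = parse_endpoint("ensembldb.ensembl.org:5306:anonymous")
ENSEMBL_GENOMES = parse_endpoint("mysql.ebi.ac.uk:4157:anonymous")

if __name__ == "__main__":
    for endpoint in (ENSEMBL, ENSEMBL_GENOMES):
        print(" ".join(mysql_command(endpoint)))
```

Running the script only prints the two command lines; pasting one into a shell with a MySQL client installed would be the actual connection step.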



Highlights in 2011


     •    Genebuilds for turkey and cod
     •    Genebuild on new cow assembly (UMD 3.1)
     •    Added rabbit to whole-genome multiple alignments
     •    3-way avian whole-genome alignment and constrained elements
          (chicken, turkey, zebra finch)
     •    Variation db for cat (dbSNP127)
     •    Updated variation data for cow (dbSNP133), dog (DGVa), pig (Illumina
          PorcineSNP60 Bead Chip, DGVa)
     •    Improved Variant Effect Predictor (VEP) and failed variation pipeline
     •    Sortable tracks, saving of configurations and configuration sets
     •    Support for large file formats (BAM, BigWig, VCF)


Highlights in 2011


     •    31 new species
     •    Plants: Chlamydomonas reinhardtii, Cyanidioschyzon merolae, Glycine
          max, Oryza glaberrima, Selaginella moellendorffii
     •    Fungal plant pathogens: Ashbya gossypii, Fusarium oxysporum,
          Gibberella moniliformis, Gibberella zeae, Mycosphaerella graminicola,
          Nectria haematococca, Phaeosphaeria nodorum, Puccinia triticina,
          Ustilago maydis
     •    Oomycete plant pathogens: Phytophthora infestans, Phytophthora
          ramorum, Phytophthora sojae, Pythium ultimum
     •    Active collaborations within PhytoPath (http://www.phytopathdb.org/)
          and PomBase projects
     •    Variation db for Arabidopsis thaliana contains over 14 million variants
          from over 1600 strains

Demo 1 - Browser basics
  Background:
  The CYN gene encodes cyanate hydratase,
  an enzyme found in bacteria and plants that
  catalyses the reaction of cyanate with
  bicarbonate to produce ammonia and
  carbon dioxide:

  NCO- + HCO3- + 2H+ = NH3 + 2CO2

  Task:
  Explore the CYN gene of Vitis vinifera
  (grape).




Variant Effect Predictor (VEP)

     •  Predicts functional consequences of variants on Ensembl genes
     •  Web interface, standalone Perl script and Perl API
     •  Accepts tab-delimited, VCF and pileup format as input




Demo 2 - Variant Effect Predictor
  Background:
  Variants in the bestrophin 1 (BEST1) gene are
  associated with various retinal disorders in man.
  Dog is used as a model to study these. The
  following are a number of new variants discovered
  in the BEST1 gene of a Lapponian Herder:
  chr   start         end          alleles strand

  18    57500034      57500034     A/G     +
  18    57500028      57500028     G/T     +
  18    57500027      57500027     G/T     +
  18    57499959      57499958     -/C     +
  18    57499929      57499929     G/T     +
  18    57499981      57499981     G/T     +
  18    57499834      57499834     A/T     +
  18    57449754      57449754     C/T     +


  Task:
  Determine the effect of the variants on dog BEST1.
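
  The table above is in the whitespace-separated layout (chromosome, start, end, alleles, strand) that the VEP accepts. As a rough sketch, the Python below parses those eight lines and classifies each variant. It assumes the convention that alleles are written reference/alternative and that an insertion is encoded with start = end + 1, which is how the -/C line reads. The function names are invented for this example and are not part of the VEP itself.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    alleles: str  # "reference/alternative"; "-" marks an absent allele
    strand: str


# The eight variants from the demo slide.
DEMO_VARIANTS = """\
18 57500034 57500034 A/G +
18 57500028 57500028 G/T +
18 57500027 57500027 G/T +
18 57499959 57499958 -/C +
18 57499929 57499929 G/T +
18 57499981 57499981 G/T +
18 57499834 57499834 A/T +
18 57449754 57449754 C/T +
"""


def parse_variant(line: str) -> Variant:
    """Parse one whitespace-separated line into a Variant."""
    chrom, start, end, alleles, strand = line.split()
    return Variant(chrom, int(start), int(end), alleles, strand)


def classify(variant: Variant) -> str:
    """Classify a variant from its alleles and coordinates."""
    ref, alt = variant.alleles.split("/")
    if ref == "-":
        # Assumed convention: an insertion has start = end + 1.
        if variant.start != variant.end + 1:
            raise ValueError(f"insertion with unexpected coordinates: {variant}")
        return "insertion"
    if alt == "-":
        return "deletion"
    if len(ref) == 1 and len(alt) == 1 and variant.start == variant.end:
        return "SNV"
    return "other"


def to_input_line(variant: Variant) -> str:
    """Serialise a variant as one tab-delimited input line."""
    return "\t".join(str(field) for field in variant)


if __name__ == "__main__":
    for line in DEMO_VARIANTS.splitlines():
        variant = parse_variant(line)
        print(to_input_line(variant), classify(variant), sep="\t")
```

  A check like this catches coordinate typos before the list is pasted into the web interface; predicting the consequences is still the VEP's job.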
                                                       © Royal Canin

Adding custom tracks

     •  Upload data to Ensembl (5 MB size limit) or attach a file on a
        web-accessible server (http or ftp) to Ensembl (no size limit)
     •  Possible formats:

          BAM                          sequence alignments (no upload)
          BED                          genes / features
          BedGraph                     continuous-valued data
          BigWig                       continuous-valued data (no upload)
          GBrowse                      genes / features
          GFF                          genes / features
          GTF                          genes / features
          PSL                          sequence alignments
          VCF                          variants (no upload)
          WIG                          continuous-valued data
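
The upload-or-attach rule and the format table above can be written as a small lookup. This is only an illustrative sketch: the function name choose_method is invented, and the 5 MB limit is read here as 5 × 1024 × 1024 bytes, which is an assumption.

```python
# Upload limit from the slide; reading "5 MB" as 5 * 1024 * 1024 bytes is an assumption.
UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024

# format (lower case) -> (kind of data, can it be uploaded?)
FORMATS = {
    "bam": ("sequence alignments", False),
    "bed": ("genes / features", True),
    "bedgraph": ("continuous-valued data", True),
    "bigwig": ("continuous-valued data", False),
    "gbrowse": ("genes / features", True),
    "gff": ("genes / features", True),
    "gtf": ("genes / features", True),
    "psl": ("sequence alignments", True),
    "vcf": ("variants", False),
    "wig": ("continuous-valued data", True),
}


def choose_method(fmt: str, size_bytes: int) -> str:
    """Return 'upload' or 'attach' for a file of the given format and size."""
    try:
        _, uploadable = FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    if uploadable and size_bytes <= UPLOAD_LIMIT_BYTES:
        return "upload"
    # Too large, or a "no upload" format: host the file and attach its URL.
    return "attach"


if __name__ == "__main__":
    print(choose_method("BED", 200_000))
    print(choose_method("BAM", 200_000))
    print(choose_method("GFF", 50_000_000))
```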



Demo 3 - Adding custom tracks
  Background:
  The file SRR070570.bam contains alignments
  of Illumina RNAseq reads from a wildtype
  Arabidopsis thaliana strain.
  The bam file and its bam.bai index file are
  located at http://www.ebi.ac.uk/~bert/.

  Task:
  Attach SRR070570.bam to Ensembl Genomes.
  Check the expression of a constitutive and a
  non-constitutive Arabidopsis gene, e.g.
  RBCS1A (ribulose bisphosphate carboxylase
  small chain 1A) and PR1 (pathogenesis-related
  protein 1).
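
  An attached BAM file needs its index next to it on the server. As a small sketch, the Python below derives both URLs from the base location and file name given in the background, assuming the index is simply the BAM name plus ".bai" (as the "bam.bai" wording suggests) and that only http and ftp are accepted, as stated on the custom tracks slide. The function name is hypothetical and no network request is made.

```python
from typing import Tuple
from urllib.parse import urljoin, urlparse

# Schemes named on the custom tracks slide.
ALLOWED_SCHEMES = {"http", "ftp"}


def bam_and_index_urls(base_url: str, bam_name: str) -> Tuple[str, str]:
    """Return (BAM URL, index URL) for a BAM file hosted under base_url."""
    if not bam_name.endswith(".bam"):
        raise ValueError(f"not a BAM file name: {bam_name}")
    bam_url = urljoin(base_url, bam_name)
    if urlparse(bam_url).scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {bam_url}")
    # Assumed convention: the index sits beside the BAM as <name>.bam.bai.
    return bam_url, bam_url + ".bai"


if __name__ == "__main__":
    bam, bai = bam_and_index_urls("http://www.ebi.ac.uk/~bert/", "SRR070570.bam")
    print(bam)
    print(bai)
```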

BioMart

     •  Data retrieval tool
     •  Originally developed for Ensembl (EnsMart)
     •  Now used by many large data resources
     •  Integrated with several widely used software packages
     •  Joint project between the European Bioinformatics Institute (EBI)
        and the Ontario Institute for Cancer Research (OICR)
     •  Website: http://www.biomart.org




Principle

     •  Step 1 – Dataset
        Choose your dataset

     •  Step 2 – Filters
        Limit your dataset

     •  Step 3 – Attributes
        Specify what information you want to output

     •  Step 4 – Results
        Preview and output your results




Demo 4 - BioMart
  Background:
  “Lactation” (GO:0007595) is the
  Gene Ontology (GO) term for the
  biological process of “the secretion
  of milk by the mammary gland”.

  Task:
  Retrieve all cow genes that are
  annotated with the GO term
  “lactation”.
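
  The same dataset / filters / attributes choice can also be expressed as a BioMart XML query instead of clicking through the web interface. The sketch below only builds the XML with Python's standard library and sends nothing. The dataset, filter and attribute names (btaurus_gene_ensembl, go_id, ensembl_gene_id, external_gene_id) are assumptions for illustration; the names valid for a given release should be taken from the BioMart interface itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_query(dataset: str, filters: Dict[str, str], attributes: List[str]) -> str:
    """Build a BioMart-style XML query string (dataset, filters, attributes)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # Step 1: choose the dataset.
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 2: limit the dataset with filters.
    for name, value in filters.items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    # Step 3: specify the attributes to output.
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # All names below are illustrative assumptions, not checked against a release.
    xml = build_query(
        dataset="btaurus_gene_ensembl",
        filters={"go_id": "GO:0007595"},
        attributes=["ensembl_gene_id", "external_gene_id"],
    )
    print(xml)
```

  Step 4 (results) would be submitting this XML to a BioMart server, which is left out here on purpose.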




Future plans for 2012


     •  Genebuilds for duck (?), salmon (?), sheep (?), tilapia
     •  Genebuilds on new assemblies for cat (Felis_catus-6.2), chicken
        (Gallus_gallus-4.0), dog (CanFam3.1), pig (Sscrofa10.2)
     •  Include RNAseq data in genebuild
     •  VEP support for structural variants
     •  New BLAST/BLAT interface
     •  http://www.ensembl.info/roadmap




Future plans for 2012


     •  New species: barley, Brassica (from BrassEnsembl), foxtail millet,
        Oryza brachyantha, potato, tomato, Gaeumannomyces graminis,
        Magnaporthe oryzae, Magnaporthe poae, tsetse fly
     •  New assemblies: maize (B73_RefGen_v3), Oryza sativa ssp.
        japonica cv. Nipponbare (Os-Nipponbare-Reference-IRGSP-1.0;
        IRGSP1.0), poplar
     •  Variation db and new gene annotation for wheat stem rust pathogen
     •  New query interface for data on plant-fungal pathogen interactions
        (PhytoPath; http://www.phytopathdb.org/)
     •  Widened development of community annotation pipelines




Help

     •  Helpdesk:
          helpdesk@ensembl.org
          helpdesk@ensemblgenomes.org

     •  Mailing lists:
          http://www.ensembl.org/info/about/contact/mailing.html
          http://plants.ensembl.org/info/about/contact/mailing.html

     •  Ensembl YouTube and YouKu channels:
          http://www.youtube.com/user/EnsemblHelpdesk
          http://u.youku.com/user_show/uid_Ensemblhelpdesk




EBI Train online




      http://www.ebi.ac.uk/training/online/course/ensembl-browsing-chordate-genomes

Workshops




     •  Until now: workshops in 49 countries on 5 continents
     •  In 2011: ~90 workshops
Workshops

     •  Browser (0.5-2 days) and API (1-3 days) workshops
     •  Combination of lectures and hands-on exercises
     •  Advertised on http://www.ensembl.info/workshops/calendar/


     •  You can host your own workshop!
     •  For academic institutions there is, apart from the instructor’s
        expenses, no fee
     •  You only need a computer room and participants
     •  You can get more info from me (bert@ebi.ac.uk) or at the EBI booth
        (302)




Stay in touch

     •  Blog:
          http://www.ensembl.info

     •  Facebook:
          http://www.facebook.com/Ensembl.org

     •  Twitter:
          http://twitter.com/Ensembl




Acknowledgements



      •  WTSI                          •    CADRE                  •  OICR
                                       •    Gramene
                                       •    VectorBase
                                       •    WormBase
                                       •    PomBase

      •    Wellcome Trust              •  EMBL
      •    NIH-NHGRI                   •  BBSRC
      •    EMBL                        •  Wellcome Trust
      •    EU                          •  Bill and Melinda Gates
                                          Foundation
                                       •  EU



Acknowledgements

Paul Flicek, Ridwan Amode, Daniel Barrell, Kathryn Beal, Simon Brent, Denise Carvalho-Silva,
Clapham P, Guy Coates, Susan Fairley, Stephen Fitzgerald, Laurent Gil, Leo Gordon, Maurice
Hendrix, Thibaut Hourlier, Nathan Johnson, Andreas Kähäri, Damian Keefe, Stephen Keenan,
Rhoda Kinsella, Monika Komorowska, Gautier Koscielny, Eugene Kulesha, Pontus Larsson,
Ian Longden, Will McLaren, Matthieu Muffato, Bert Overduin, Miguel Pignatelli, Bethan
Pritchard, Harpreet Riat, Graham Ritchie, Magali Ruffier, Michael Schuster, Daniel Sobral,
Amy Tang, Kieron Taylor, Stephen Trevanion, Jana Vandrovcova, Simon White, Mark Wilson,
Steven Wilder, Bronwen Aken, Ewan Birney, Fiona Cunningham, Ian Dunham, Richard Durbin,
Xosé Fernández-Suarez, Jennifer Harrow, Javier Herrero, Tim Hubbard, Anne Parker, Glenn
Proctor, Giulietta Spudich, Jan Vogel, Andy Yates, Amonida Zadissa, Steve Searle

Paul Kersey, Dan Staines, Dan Lawson, Eugene Kulesha, Paul Derwent, Jay Humphrey,
Daniel Hughes, Stephen Keenan, Arnaud Kerhornou, Gautier Koscielny, Nick Langridge, Mark
McDowall, Karyn Megy, Uma Maheswari, Michael Nuhn, Michael Paulini, Helder Pedro, Iliana
Toneva, Derek Wilson, Andy Yates, Ewan Birney
Posters

      P941
      Genome Annotation in Ensembl
      Susan Fairley

      P942
      Ensembl Plants: An Integrating Resource for Plant
      Genomics and Variation
      Paul Kersey




Come and see us at booth 302!

     •  Training courses
     •  Careers
     •  Meet the experts
     •  Brochures and factsheets
     •  PhD and postdoc opportunities
     •  Industry programme
     •  Research and services
     •  Visitor’s programme
PDF of this presentation

      http://www.ebi.ac.uk/~bert/past_workshops.html





