Using EMBL-EBI resources
to explore stem cell data
Rafael Jimenez
Stem Cells and BioInformatics
Scottish Stem Cell Network
21-22 September
23.08.18
Content of this presentation
• EMBL-EBI overview
• EBI Resources
• Data integration and standards
• Querying EBI resources: examples
EMBL-EBI
• Non-profit organisation
• Based on the Wellcome Trust Genome Campus
near Cambridge, UK
• Part of the European Molecular Biology Laboratory
The five branches of EMBL
• Heidelberg: basic research in molecular biology, administration, EMBO
• Hinxton: bioinformatics
• Grenoble: structural biology
• Hamburg: structural biology
• Monterotondo: mouse biology
• EMBL is a basic research institute funded by public
research monies from 20 member states
• 1400 staff, over 60 nationalities
EMBL-EBI’s mission
• To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
• To contribute to the advancement of biology through basic investigator-driven research in bioinformatics
• To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators
• To help disseminate cutting-edge technologies to industry
Research groups
• Transcriptome analysis: Brazma, Huber
• Text mining: Rebholz-Schuhmann
• Protein annotation: Apweiler
• Structural bioinformatics: Thornton
• Pathways, networks, systems: Le Novère
• Cheminformatics: Steinbeck, Overington
• Genome analysis: Birney, Flicek, Enright, Goldman
• Regulatory networks: Luscombe
• Differentiation and development: Bertone
A tripartite user-training programme
• Bioinformatics Roadshow: training comes to you (www.ebi.ac.uk/training/roadshow)
• eLearning programme: training any time, anywhere, at any pace (www.ebi.ac.uk/training/elearning)
• Hands-on training at EMBL-EBI: hands-on user training on all our core data resources for lab-based researchers (www.ebi.ac.uk/training/handson)
Hands-on training for all levels of experience
• Interactive training in our purpose-built IT training suite
at EMBL-EBI, Hinxton, Cambridge
• Learn from the EBI’s experts through a combination of
talks and practical exercises
• Take a tour of all our core data resources, or focus in on
specific data types
• Full programme at www.ebi.ac.uk/training/handson
Wellcome Images
http://www.ebi.ac.uk/training/handson/
Genomics, proteomics, transcriptomics, protein structures…
Moodle-based eLearning platform
Courses available (www.ebi.ac.uk/training/elearning):
• EBI and EB-eye
• Sequence searching
• Patent searching
• Literature searching
• Ensembl
• Transcriptomics
Each course is modular
A course contains 3–5 modules (~30 min each)
Each module contains…
• Video tutorial: learn by watching and listening
• Print tutorial: learn by reading
• Quiz: learn by testing your understanding
• Reflective task: learn by practicing
Please beta-test and provide feedback!
The EBI Industry Programme
• Enables industry to adapt quickly to, and maximise the
benefit from, innovations in bioinformatics.
• Membership benefits include:
  • Research of benefit to industry
  • Expert training
  • Standards development
  • Technical development
  • Networking opportunities
• Membership is by invitation and members subscribe
on an annual basis
EBI Resources
Databases: molecules to systems
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: EMBL-Bank
• Microarray & gene expression data: ArrayExpress
• Proteomes: UniProt, PRIDE
• Protein families, motifs and domains: InterPro
• Protein structure: MSD
• Protein interactions: IntAct
• Chemical entities: ChEBI
• Pathways: Reactome
• Systems: BioModels
• Literature and ontologies: CiteXplore, GO
EBI website and search engine EB-eye
• Search all main databases in one go
• Refine your search
• Advanced search: drill down to specific fields in specific databases
Genomes 1: Ensembl
Views across species and within species:
• Pick a genome
• Synteny
• Orthology
• Genomic alignments
• Gene families
• SNPs
• Genes
• Chromosomes
Genomes 2: Ensembl Genomes
Ensembl-like genome browser for non-vertebrate species
• Ensembl Metazoa
• Ensembl Bacteria
• View options: using view options, you can select to view only the current gene or the entire expanded gene tree.
• Across species: select Orthologue view to see putative orthologues.
Nucleotides: EMBL-Bank
• Keyword and sequence searching
• Map-based search of environmental samples
• Downloads
EMBL-Bank, DDBJ, GenBank (www.insdc.org)
• Direct submissions
• Patents
• Genome-sequencing projects
• Updates
• Third-party annotation
Transcriptomes: ArrayExpress
Search by experiment
• Search by keyword
• Link to sample properties and experiment design
• View experiment
Search by gene across experiments
• Browse results summary
• Search by gene name, species and experimental condition
• View expression under different conditions and profiles
Protein sequence: UniProt
UniProt
• Manual curation
• Literature-based annotation
• Sequence analysis
• Automated annotation
Linked resources:
• GO: functional info
• PRIDE: protein identification data
• InterPro: protein families and domains
• IntAct: molecular interactions
• IntEnz: enzymes
• HAMAP: microbial protein families
• RESID: post-translational modifications
Sequence analysis:
• Transmembrane prediction
• InterPro classification
• Signal prediction
• Other predictions
• Protein classification
Protein families, motifs and domains: InterPro
Powerful tool for protein classification, integrating several methods into one resource
• View architectures of proteins containing a signature
• Compare methods of protein signature prediction
• Visualize the taxonomic range for a protein signature
Proteomics services
• IntAct: molecular interactions
• IntEnz: enzyme classification
• ChEBI: small molecules
• PRIDE: protein identifications from proteomics experiments
ChEBI - Chemical Entities of Biological Interest
Structures: PDBe
• Ligands
• Sequence mapping
• Linking to domain data
• Assemblies
• Surface matching
• Fold matching
• Active sites
• Electron density visualization
Pathways: Reactome
• View reactions and events in detail
• Select a pathway
• Export pathway to your favourite modelling software
• Compare events in different species
• Link to source databases
Standards and Integration
Molecular Biology Database resources
~1440 resources, by category:
• Human Genes and Diseases: 13%
• Proteomics Resources: 1%
• Other Molecular Biology Databases: 3%
• Immunological databases: 2%
• Plant databases: 7%
• Organelle databases: 2%
• Human and other Vertebrate Genomes: 8%
• Nucleotide Sequence Databases: 9%
• RNA sequence databases: 5%
• Protein sequence databases: 13%
• Structure Databases: 9%
• Genomics Databases (non-vertebrate): 19%
• Metabolic and Signaling Pathways: 9%
Source: MY Galperin, GR Cochrane. Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. Nucleic Acids Research, 2008.
[Figure by Tim Hubbard. Axes: utility of bioinformatics, scientific impact]
• Too little bioinformatics
• Too many databases
• Too diverse interfaces
Many databases VS integration
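• Many databases: each database (DB) has its own graphical user interface (GUI)
• Integration: several databases (DB) are reached through a standard protocol (SP) from a single GUI
Legend: DB = database, GUI = graphical user interface, SP = standard protocol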
[Figure repeated. Axes: utility of bioinformatics, scientific impact]
• Too little bioinformatics
• Too many databases
• Too diverse interfaces
• Integration of
Data integration
• Combining data residing in different sources
• … and providing users with unified standards.
Main objective
• Share
• Compare
• Integrate
• Data from the same domain
• Data from different domains
Requires
• Federated systems
• Standard formats
• Mapping tools
• Ontologies
Data integration
• Federated systems
  • DAS
  • PSICQUIC
  • …
• Standard formats
  • DAS
  • PSI-MI
  • BioPAX
  • SBML
  • CellML
  • …
• Mapping tools
  • PICR
  • UniProt API
  • Ensembl API
  • BioMart
  • …
• Ontologies
  • OLS
  • …
Standards development – international collaborations
• Genome annotation: www.geneontology.org
• Microarray and Gene Expression Data (MGED): www.mged.org
• Protein sequence: www.uniprot.org
• HUPO Proteomics Standards Initiative (PSI): psidev.sf.net
• Protein structure: www.wwpdb.org
• Cheminformatics: www.ebi.ac.uk/chebi
• Pathways: www.reactome.org, www.biopax.org
• Systems modelling standards: www.sbml.org
• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org
• Genomics Standards Consortium (GSC): gensc.org
• Nucleotide sequence: www.insdc.org
Standards and integration
DAS
DAS, The Distributed Annotation System
The Distributed Annotation System is…
• A network of biological data sources
• A Service Oriented Architecture (SOA)
• An example of federation
The DAS Protocol is…
• An integration platform
• A client-server protocol
• An agreed standard for web services
Andy Jenkinson
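To make the client-server idea concrete: a DAS client fetches annotations with a plain HTTP GET whose URL names a server, a data source, a command and a segment. The sketch below only builds such a URL and does not contact any server. The server address and source name are hypothetical placeholders, and the `features` command with a `segment=ID:start,stop` argument is assumed to follow the DAS 1.5 URL conventions.

```python
from urllib.parse import urlencode


def das_features_url(server, source, segment_id, start=None, stop=None):
    """Build the URL of a DAS 'features' request for one segment."""
    segment = segment_id
    if start is not None and stop is not None:
        # A range is written as ID:start,stop
        segment = f"{segment_id}:{start},{stop}"
    # Keep ':' and ',' readable instead of percent-encoding them.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


# Hypothetical server and source names, for illustration only.
url = das_features_url("http://das.example.org", "my_proteins", "P37173", 1, 100)
print(url)
```

Because every server answers the same kind of request, a client can query a reference server and several annotation servers with the same function and merge the answers itself.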
DAS, Architectural Overview
Andy Jenkinson
DAS servers and data types
Server types: reference servers, annotation servers, alignment servers
Data types served:
• Genome sequence
• Sequence alignments
• Protein sequence
• Protein-protein interaction
• Gel 2D
• EMAP
• 3DM
• Protein structure
• Mass spectrometry
• Epigenetics
• Phenotype
• Functional genomics
• Structural genomics
DAS client (Dasty2)
DAS server for EMAP data
EMAP: The Edinburgh Mouse Atlas Project
Gene expression databases (EMAGE & GXD)

• DAS reference server: EMAP ontology
• DAS annotation servers: EMAGE, GXD
Jose Ramon Macias
Standards and integration
PSICQUIC
PSICQUIC
Based on the PSI-MI standard for molecular interactions
[Figure: sample, publications, interaction databases, PSICQUIC servers and client; error sources marked: observation error, annotation error]
PSICQUIC client (EnVision2)
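One PSI-MI format a client can consume is the tab-delimited MITAB 2.5, which has 15 columns per interaction: columns 1 and 2 hold the interactor identifiers and column 14 the interaction identifiers, each written as `database:id`. The following is a minimal parsing sketch, not the code of any actual client. The sample row is hand-built from identifiers in the IntAct example in these slides, with every other column left empty (`-`), so it is not real service output.

```python
def parse_mitab_line(line):
    """Extract interactor and interaction IDs from one MITAB 2.5 row."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected 15 MITAB columns, got {len(cols)}")

    def ids(field):
        # A field holds 'db:id' entries separated by '|'; '-' means empty.
        if field == "-":
            return []
        return [entry.split(":", 1)[1] for entry in field.split("|")]

    return {
        "interactor_a": ids(cols[0]),
        "interactor_b": ids(cols[1]),
        "interaction_ids": ids(cols[13]),
    }


# Hand-built row: only columns 1, 2 and 14 are filled in.
sample = "\t".join(
    ["uniprotkb:P37173", "uniprotkb:O35613"] + ["-"] * 11
    + ["intact:EBI-296235", "-"]
)
parsed = parse_mitab_line(sample)
print(parsed)
```

Real rows can carry quoted values and free-text descriptions in parentheses, which this sketch deliberately does not handle.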
ENFIN, EnCORE and EnVISION
Data Integration
ENFIN Network of Excellence
• Brings together experimentalists
and computational biologists to
develop the next generation of
informatics resources for
systems biology
• Funded by the European
Commission within its FP6
programme under the thematic
area ‘Life sciences, genomics
and biotechnology for health’
• 20 partners in 13 countries
• www.enfin.org
EnCORE Overview
• External data sources (WS, API)
• EnCORE wrappers (WS)
• EnCORE workflows (API)
• EnVISION pages (web interface)
EnCORE services
From Inputs to Outputs
Input/Query: EnCORE dataset
• Database IDs
• Sequences
Program/Service: EnCORE webservice
• Enfin-IntAct
• Enfin-PRIDE
• Enfin-Affy2UniProt
• Enfin-PICR
• Enfin-Reactome
• Enfin-ArrayExpress
• Enfin-UniProt
• Enfin-BioModels
• Enfin-KEGG
• Enfin-G:GOSt
• Enfin-CellMINT
• Enfin-DOMAINATION
Output/Results: EnCORE results (positive or negative)
• Experiment: Identifies the result
• Sets: Contains the structure of the result
• Molecules: Includes the results
• Features: Describe details of the result
EnCORE services
Example
Input/Query: EnCORE dataset
• Database ID (UniProt ID): P37173
Program/Service: EnCORE webservice
• Enfin-IntAct
Output/Results: EnCORE results
• Experiment: ID4
• Sets: (1) EBI-296235, (2) EBI-1033040, (3) EBI-902913, EBI-902937, (4) EBI-296166, EBI-296246, (5) EBI-902913
• Molecules: (1) O35613, (2) P10600, (3) P07200, (4) Q9UER7, (5) Q99K41
• Features: No features
EnCORE services
Example (Result on a table)
Input/Query: P37173
Program/Service: Enfin-IntAct
Output/Results:
#  Interactor A  Interactor B  Interaction IDs
1  P37173        O35613        EBI-296235
2  P37173        P10600        EBI-1033040
3  P37173        P07200        EBI-902913, EBI-902937
4  P37173        Q9UER7        EBI-296166, EBI-296246
5  P37173        Q99K41        EBI-902913
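The table view can be derived from the Sets and Molecules of the Enfin-IntAct example by pairing entries that carry the same number. The sketch below shows that pairing. The values are copied from the example slide, while the dictionary layout and the function name are assumptions made for illustration and do not reflect the actual EnCORE data format.

```python
query = "P37173"

# Values copied from the example slide; the dict layout is illustrative only.
sets = {
    1: ["EBI-296235"],
    2: ["EBI-1033040"],
    3: ["EBI-902913", "EBI-902937"],
    4: ["EBI-296166", "EBI-296246"],
    5: ["EBI-902913"],
}
molecules = {1: "O35613", 2: "P10600", 3: "P07200", 4: "Q9UER7", 5: "Q99K41"}


def to_rows(query_id, sets, molecules):
    """Pair each numbered set with the molecule carrying the same number."""
    rows = []
    for index in sorted(sets):
        rows.append((index, query_id, molecules[index], ", ".join(sets[index])))
    return rows


rows = to_rows(query, sets, molecules)
for row in rows:
    print(*row, sep="\t")
```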
EnCORE services
Building workflows
Legend: input, result, positive result, negative result, webservice, input selection
EnVISION interface (example)
• PRIDE, UniProt, IntAct, Reactome, CellMINT, PICR, BioModels, …
Standards and integration
BioMart
BioMart data-mining
• BioMart is a search engine that can retrieve several fields at once and return them as a table.
• For example: human gene IDs with their chromosome and base-pair position
• No programming required!
Xose Fernandez
BioMart integration example
Name  Fragment  Position  Alleles  Strand
SNP1  AL139258  1659852   T/A      1
SNP2  NT_25698  2569873   C/T      -1
SNP3  chr13     1125698   C/G      1
Data conversion and integration
• Sources: Ensembl, HapMap, NCBI, UCSC, proprietary data
• Result: Diabetes-Gene Association DataBase (combined proprietary and public data)
www.biomart.org
BioMart integration
[Figure: BioMart integrating four databases (DB)]
BioMart web interface
Information Flow
1. Filter
2. Attributes
3. Results
www.biomart.org
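The same three steps also exist outside the web interface: BioMart servers accept an XML query in which `Filter` elements narrow the rows and `Attribute` elements choose the columns, and the server returns the results as a table. The sketch below only builds such a query and does not send it. The dataset, filter and attribute names are assumptions based on the Ensembl genes mart and may differ between marts and releases.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query: filters narrow rows, attributes pick columns."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 1: filters
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Step 2: attributes
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Assumed names: human genes on chromosome 13 with their positions.
example = build_biomart_query(
    "hsapiens_gene_ensembl",
    {"chromosome_name": "13"},
    ["ensembl_gene_id", "chromosome_name", "start_position", "end_position"],
)
print(example)
```

Step 3, the results, would come from submitting this string to a mart service as the `query` parameter, which is left out here so the sketch runs offline.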
BioMart web interface
Mart view - results
(CV) controlled vocabularies
OLS
Why use ontologies and CVs
• To structure, classify and/or model the concepts
pertaining to some field of interest and the relationships
between them.
• To enable a community to come to agreement and to
commit to use the same terms in the same way.
• All terms and their interrelations should be unique,
unambiguous and well-defined (ideally…)
• To facilitate data annotation,
retrieval and comparison.
Richard Cote
What is OLS?
• A unified, single point of query for over 69 ontologies
(updated daily) and upwards of 850,000 terms.
• A tool that offers online and programmatic access to query
ontologies about:
• Term names
• Synonyms
• Relationships
• Annotations
• Cross-references
• Reusable code components to integrate such functionality
in other projects
http://www.ebi.ac.uk/ontology-lookup/
Richard Cote
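To illustrate the kinds of question listed above (term names, synonyms, relationships), here is a toy in-memory ontology. It is not the OLS API: the class, the functions and the `TOY:` identifiers are all made up for this sketch, and a real lookup would go through the service itself.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str
    name: str
    synonyms: list = field(default_factory=list)
    is_a: list = field(default_factory=list)  # parent term IDs


# Toy ontology with made-up IDs; not taken from any real ontology.
ONTOLOGY = {t.term_id: t for t in [
    Term("TOY:0001", "cell"),
    Term("TOY:0002", "stem cell", is_a=["TOY:0001"]),
    Term("TOY:0003", "embryonic stem cell", synonyms=["ES cell"], is_a=["TOY:0002"]),
]}


def find_terms(text):
    """Return IDs of terms whose name or synonym matches, ignoring case."""
    wanted = text.lower()
    return [t.term_id for t in ONTOLOGY.values()
            if wanted == t.name.lower()
            or wanted in (s.lower() for s in t.synonyms)]


def ancestors(term_id):
    """Follow is_a relationships upwards and return all ancestor IDs."""
    found = []
    stack = list(ONTOLOGY[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(ONTOLOGY[parent].is_a)
    return found


print(find_terms("es cell"))
print(ancestors("TOY:0003"))
```

Searching by synonym and then walking up the `is_a` links is what lets data annotated with a specific term also be retrieved by a query for a broader one.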
Ontology browsing
ID mappings
PICR
Why do you need ID mapping
• Data integration (submissions!)
  • Merging datasets to a common identifier space
  • Finding all aliases/synonyms for an identifier
• Data “freshness”
  • Mapping from secondary IDs to more recent primary IDs
• Data format requirements
  • Preparing data sets for specific tools
  • Querying in various primary databases
Richard Cote
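A small sketch of two of these use cases, mapping secondary IDs to primary IDs and merging datasets into a common identifier space. The identifiers, database names and mapping tables are invented for illustration. In practice the mappings would come from a mapping service such as PICR rather than a hard-coded dictionary.

```python
# Made-up identifiers for illustration; a real table would come from a mapping service.
SECONDARY_TO_PRIMARY = {"OLD001": "ACC001", "OLD002": "ACC001"}
ALIASES = {
    "ACC001": {"DB_A": ["ACC001"], "DB_B": ["B-17", "B-42"]},
    "ACC002": {"DB_A": ["ACC002"], "DB_B": []},
}


def to_primary(identifier):
    """Map a secondary (retired) ID to its current primary ID."""
    return SECONDARY_TO_PRIMARY.get(identifier, identifier)


def merge_datasets(*datasets):
    """Merge {id: value} datasets into one dict keyed by primary ID."""
    merged = {}
    for dataset in datasets:
        for identifier, value in dataset.items():
            merged.setdefault(to_primary(identifier), []).append(value)
    return merged


lab_a = {"OLD001": 2.5, "ACC002": 0.7}   # still uses a retired ID
lab_b = {"ACC001": 3.1}
merged = merge_datasets(lab_a, lab_b)
print(merged)
print(ALIASES[to_primary("OLD002")]["DB_B"])
```

Without the mapping step, the measurements for `OLD001` and `ACC001` would be treated as two different proteins.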
Protein identifier mapping is hard
• The basic problem: the same protein sequence is referred to by
multiple accession numbers assigned by multiple databases.
• No universal identifier scheme
• Redundant databases – multiple identifiers for the same sequence in
the same database
• Unstable identifiers (ex: gi numbers)
• Obsolete and deleted identifiers (hypothetical proteins)
• Different production cycles for major databases
• Tools exist, but are limited in their database and species coverage and in their usability and availability.
Richard Cote
PICR Home Page
• Submit accessions OR sequences (FASTA) with 500 entry interactive limit (no batch limit)
• Select output format
• Select one or many databases to map to in one request
• Limit search by taxonomy (pessimistic)
• Choose to return all mappings or only active ones
• Run search
Richard Cote
PICR Result Page – simple view
• Logical xref (hyperlinked)
• Inactive xref
• Secondary identifier
• Active xref (hyperlinked)
Richard Cote
Querying EBI resources
examples
EB-eye - Quick search
EB-eye - Quick search - PRIDE
EB-eye – Understand your query (advanced search)
PRIDE – Protein identifications (BioMart): 1 experiment, 2648 proteins
Ensembl
PRIDE – Protein identifications (Web): 79 experiments
Reactome – pathways
UniProt – Protein sequence data
Dasty2 – Third party protein sequence annotations
IntAct - Protein interactions (CV: GO)
Microarray and gene expression data: Experiments Archive
Microarray and gene expression data: Gene Expression Atlas (CV: EFO)
EB-eye - Other sources …: protein structures, literature
CiteXplore - Literature
PDBe - Macromolecular structures
Support
User support
• 2Can bioinformatics user support – www.ebi.ac.uk/2Can
• Online help pages – www.ebi.ac.uk/help
• E-mail support – www.ebi.ac.uk/support
Thank you!
Questions?

More Related Content

PDF
ELIXIR Node poster UK 2014
PDF
em3e_leaflet_april2016
PDF
Update to the ELIXIR Board Nov 2014
PDF
Louisiana Biomedical Research Network - Fall 2020 Bioinformatics Program Ove...
PDF
Advanced Bioinformatics for Genomics and BioData Driven Research
PPTX
ELIXIR-UK
PPTX
EMBL-EBI
ELIXIR Node poster UK 2014
em3e_leaflet_april2016
Update to the ELIXIR Board Nov 2014
Louisiana Biomedical Research Network - Fall 2020 Bioinformatics Program Ove...
Advanced Bioinformatics for Genomics and BioData Driven Research
ELIXIR-UK
EMBL-EBI

Similar to Using EMBL-EBI resources to explore stem cell data (20)

PDF
EMBL-Sustainable Access to the World's Largest Biomolecular Data Resources
PPTX
EMbaRC - Microbial Resources for Innovation
PPT
Standardisation in BMS European infrastructures
PDF
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf
PPTX
Proteomics resources at the EBI & ExPASy
PDF
1 introduction to_the_ebi_(katrina_pavelin)
PDF
Model repositories and standard formats for model reusability
PPTX
eROSA Stakeholder WS1: Ensembl, ELIXIR and engineering interconnections
PPTX
IntAct Database.pptx
PPTX
ELIXIR UK Node presentation to the ELIXIR Board
PPTX
InterPro and InterProScan 5.0
 
PPT
Data Integration through Enfin and EnCore
PPTX
Introduction to bioinformatics and databases .pptx
PPTX
Designing a community resource - Sandra Orchard
PPT
EMBL-EBI Proteomics data resources and services
PPT
Pathology is being disrupted by Data Integration, AI & Blockchain
PPT
B.sc biochem i bobi u-1 introduction to bioinformatics
PPT
B.sc biochem i bobi u-1 introduction to bioinformatics
EMBL-Sustainable Access to the World's Largest Biomolecular Data Resources
EMbaRC - Microbial Resources for Innovation
Standardisation in BMS European infrastructures
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf
Proteomics resources at the EBI & ExPASy
1 introduction to_the_ebi_(katrina_pavelin)
Model repositories and standard formats for model reusability
eROSA Stakeholder WS1: Ensembl, ELIXIR and engineering interconnections
IntAct Database.pptx
ELIXIR UK Node presentation to the ELIXIR Board
InterPro and InterProScan 5.0
 
Data Integration through Enfin and EnCore
Introduction to bioinformatics and databases .pptx
Designing a community resource - Sandra Orchard
EMBL-EBI Proteomics data resources and services
Pathology is being disrupted by Data Integration, AI & Blockchain
B.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformatics
Ad

More from Rafael C. Jimenez (20)

PPTX
BMB Resource Integration Workshop
PPTX
Proteomics repositories integration using EUDAT resources
PPTX
Summary of Technical Coordinators discussions
PPTX
The European life-science data infrastructure: Data, Computing and Services ...
PPT
PPT
ELIXIR TCG update
PPT
An introduction to programmatic access
PPTX
Life science requirements from e-infrastructure: initial results from a joint...
PPT
Technical activities in ELIXIR Europe
PPTX
Challenges of big data. Summary day 1.
PPTX
Challenges of big data. Aims of the workshop.
PPTX
Data submissions and archiving raw data in life sciences. A pilot with Proteo...
PPT
ELIXIR and data grand challenges in life sciences
PPT
SASI, A lightweight standard for exchanging course information
PPTX
Introduction to the BioJS project
PPTX
ELIXIR . Technical Coordinator
PPTX
BioJS introduction
BMB Resource Integration Workshop
Proteomics repositories integration using EUDAT resources
Summary of Technical Coordinators discussions
The European life-science data infrastructure: Data, Computing and Services ...
ELIXIR TCG update
An introduction to programmatic access
Life science requirements from e-infrastructure: initial results from a joint...
Technical activities in ELIXIR Europe
Challenges of big data. Summary day 1.
Challenges of big data. Aims of the workshop.
Data submissions and archiving raw data in life sciences. A pilot with Proteo...
ELIXIR and data grand challenges in life sciences
SASI, A lightweight standard for exchanging course information
Introduction to the BioJS project
ELIXIR . Technical Coordinator
BioJS introduction
Ad

Recently uploaded (20)

PDF
Lymphatic System MCQs & Practice Quiz – Functions, Organs, Nodes, Ducts
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PDF
Placing the Near-Earth Object Impact Probability in Context
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PPTX
Introduction to Cardiovascular system_structure and functions-1
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PDF
An interstellar mission to test astrophysical black holes
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
Microbiology with diagram medical studies .pptx
PDF
Phytochemical Investigation of Miliusa longipes.pdf
PDF
HPLC-PPT.docx high performance liquid chromatography
PDF
The scientific heritage No 166 (166) (2025)
PDF
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
PDF
Assessment of environmental effects of quarrying in Kitengela subcountyof Kaj...
PDF
. Radiology Case Scenariosssssssssssssss
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
Lymphatic System MCQs & Practice Quiz – Functions, Organs, Nodes, Ducts
The KM-GBF monitoring framework – status & key messages.pptx
Placing the Near-Earth Object Impact Probability in Context
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
Introduction to Cardiovascular system_structure and functions-1
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
An interstellar mission to test astrophysical black holes
POSITIONING IN OPERATION THEATRE ROOM.ppt
Microbiology with diagram medical studies .pptx
Phytochemical Investigation of Miliusa longipes.pdf
HPLC-PPT.docx high performance liquid chromatography
The scientific heritage No 166 (166) (2025)
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
Assessment of environmental effects of quarrying in Kitengela subcountyof Kaj...
. Radiology Case Scenariosssssssssssssss
ECG_Course_Presentation د.محمد صقران ppt
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...

Using EMBL-EBI resources to explore stem cell data

  • 1. Using EMBL-EBI resources to explore stem cell data Rafael Jimenez Stem Cells and BioInformatics Scottish Stem Cell Network 21-22 September
  • 2. 23.08.182 Content of this presentation • EMBL-EBI overview • EBI Resources • Data integration and standards • Queering EBI resources: examples
  • 4. 23.08.184 EMBL-EBI • Non-profit organisation • Based on the Wellcome Trust Genome Campus near Cambridge, UK • Part of the European Molecular Biology Laboratory
  • 5. 23.08.185 The five branches of EMBL Mouse biology Monterotondo Structural biology Grenoble Bioinformatics Hinxton Structural biology Hamburg Basic research in molecular biology Administration EMBO Heidelberg • EMBL is a basic research institute funded by public research monies from 20 member states • 1400 staff, over 60 nationalities
  • 6. 23.08.186 EMBL-EBI’s mission • To provide freely available data and bioinformatics servicesservices to all facets of the scientific community in ways that promote scientific progress • To contribute to the advancement of biology through basic investigator-driven researchresearch in bioinformatics • To provide advanced bioinformatics trainingtraining to scientists at all levels, from PhD students to independent investigators • To help disseminate cutting-edge technologies to industryindustry
  • 7. 23.08.187 Research groups Transcriptome analysis Brazma, Huber Transcriptome analysis Brazma, Huber Text mining Rebholz-Schuhmann Text mining Rebholz-Schuhmann Protein annotation Apweiler Protein annotation Apweiler Structural bioinformatics Thornton Structural bioinformatics Thornton Pathways, networks, systems Le Novère Pathways, networks, systems Le Novère Cheminformatics Steinbeck, Overington Cheminformatics Steinbeck, Overington Genome analysis Birney, Flicek, Enright, Goldman Genome analysis Birney, Flicek, Enright, Goldman Regulatory networks Luscombe Regulatory networks Luscombe Differentiation and development Bertone Differentiation and development Bertone
  • 8. 23.08.188 Bioinformatics Roadshow eLearning programme Hands-on training at EMBL- EBI MBL ics v me g A tripartite user-training programme Training comes to you www.ebi.ac.uk/training/roadshow Training comes to you www.ebi.ac.uk/training/roadshow Training any time, anywhere, at any pace www.ebi.ac.uk/training/elearning Training any time, anywhere, at any pace www.ebi.ac.uk/training/elearning Hands-on user training on all our core data resources for lab-based researchers www.ebi.ac.uk/training/handson Hands-on user training on all our core data resources for lab-based researchers www.ebi.ac.uk/training/handson
  • 9. 23.08.189 Hands-on training for all levels of experience • Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge • Learn from the EBI’s experts through a combination of talks and practical exercises • Take a tour of all our core data resources, or focus in on specific data types • Full programme at www.ebi.ac.uk/training/handson Wellcome Images
  • 11. 23.08.1811 Moodle-based eLearning platform Courses availableCourses available www.ebi.ac.uk/training/elearning •EBI and EB-eye •Sequence searching •Patent searching •Literature searching •Ensembl •Transcriptomics
  • 12. 23.08.1812 Each course is modular Video tutorial learn by watching and listening Video tutorial learn by watching and listening A course contains 3–5 modules (~30 min each) Each module contains… Print tutorial Learn by reading Print tutorial Learn by reading Quiz Learn by testing your understanding Quiz Learn by testing your understanding Reflective task Learn by practicing Reflective task Learn by practicing Please beta-test and provide feedback!
  • 13. 23.08.1813 The EBI Industry Programme • Enables industry to adapt quickly to, and maximise the benefit from, innovations in bioinformatics. • Membership benefits include: • Research of benefit to industry • Expert training • Standards development • Technical development • Networking opportunities • Membership is by invitation and members subscribe on an annual basis
  • 15. 23.08.1815 Databases: molecules to systems Genomes Ensembl Ensembl Genomes EGA Genomes Ensembl Ensembl Genomes EGA Nucleotide sequence EMBL-Bank Nucleotide sequence EMBL-Bank Microarray & gene expression data ArrayExpress Microarray & gene expression data ArrayExpress Proteomes UniProt, PRIDE Proteomes UniProt, PRIDE Protein families, motifs and domains InterPro Protein families, motifs and domains InterPro Protein structure MSD Protein structure MSD Protein interactions IntAct Protein interactions IntAct Chemical entities ChEBI Chemical entities ChEBI Pathways Reactome Pathways Reactome Systems BioModels Systems BioModels Literature and ontologies CiteXplore, GO Literature and ontologies CiteXplore, GO
  • 16. 23.08.1816 EBI website and search engine EB-eye Search all main databases in one go Search all main databases in one go Refine your searchRefine your search Advanced search: drill down to specific fields in specific databases Advanced search: drill down to specific fields in specific databases
  • 17. 23.08.1817 Genomes 1: Ensembl Across species Within species SyntenySynteny Pick a genomePick a genome OrthologyOrthology Genomic alignmentsGenomic alignments Gene familiesGene families SNPsSNPs GenesGenes ChromosomesChromosomes
  • 18. 23.08.1818 Genomes 2: Ensembl Genomes Ensembl-like genome browser for non-vertebrate species Ensembl-like genome browser for non-vertebrate species Ensembl Metazoa Ensembl Metazoa Ensembl BacteriaEnsembl Bacteria Using view options, you can select to view only the current gene or the entire expanded gene tree. Select Orthologue view to see putative orthologues. Across species View options
  • 19. 23.08.1819 • Keyword and sequence searching • Map-based search of environmental samples • Downloads Nucleotides: EMBL-Bank EMBL-Bank DDBJ GenBank www.insdc.org • Direct submissions • Patents • Genome- sequencing projects • Updates • Third-party annotation
  • 20. Transcriptomes: ArrayExpress • Search by experiment • Search by keyword • Link to sample properties and experiment design • View experiment • Search by gene across experiments • Browse results summary • Search by gene name, species and experimental condition • View expression under different conditions and profiles
  • 21. Protein sequence: UniProt • UniProt: Manual curation, Literature-based annotation, Sequence analysis, Automated annotation • Linked resources: GO (functional info), PRIDE (protein identification data), InterPro (protein families and domains), IntAct (molecular interactions), IntEnz (enzymes), HAMAP (microbial protein families), RESID (post-translational modifications) • Transmembrane prediction • InterPro classification • Signal prediction • Other predictions • Protein classification
  • 22. Protein families, motifs and domains: InterPro • Powerful tool for protein classification, integrating several methods into one resource • View architectures of proteins containing a signature • Compare methods of protein signature prediction • Visualize the taxonomic range for a protein signature
  • 23. Proteomics services • IntAct: molecular interactions • IntEnz: enzyme classification • ChEBI: small molecules • PRIDE: protein identifications from proteomics experiments
  • 24. ChEBI - Chemical Entities of Biological Interest
  • 25. Structures: PDBe • Ligands • Sequence mapping • Linking to domain data • Assemblies • Surface matching • Fold matching • Active sites • Electron density visualization
  • 26. Pathways: Reactome • Select a pathway • View reactions and events in detail • Export pathway to your favourite modelling software • Compare events in different species • Link to source databases
  • 28. Molecular Biology Database resources (~1440 resources) • Human Genes and Diseases 13% • Proteomics Resources 1% • Other Molecular Biology Databases 3% • Immunological databases 2% • Plant databases 7% • Organelle databases 2% • Human and other Vertebrate Genomes 8% • Nucleotide Sequence Databases 9% • RNA sequence databases 5% • Protein sequence databases 13% • Structure Databases 9% • Genomics Databases (non-vertebrate) 19% • Metabolic and Signaling Pathways 9% • Source: MY Galperin, GR Cochrane, Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009, Nucleic Acids Research, 2008
  • 29. Utility of bioinformatics • Scientific impact • Too little bioinformatics • Too many databases • Too diverse interfaces • Tim Hubbard
  • 30. Many databases vs integration • Many databases: four databases (DB), each with its own graphical user interface (GUI) • Integration: four databases (DB), each behind a standard protocol (SP), sharing one GUI • Legend: DB = Database, GUI = Graphical User Interface, SP = Standard protocol, User
  • 31. Utility of bioinformatics • Scientific impact • Too little bioinformatics • Too many databases • Too diverse interfaces • Integration of
  • 32. Data integration • Combining data residing in different sources • … and providing users with a unified standard. • Main objective: Share, Compare, Integrate • Data from the same domain • Data from different domains • Requires: Federated systems, Standard formats, Mapping tools, Ontologies
  • 33. Data integration • Federated systems: DAS, PSICQUIC, … • Standard formats: DAS, PSI-MI, BioPAX, SBML, CellML, … • Mapping tools: PICR, UniProt API, Ensembl API, BioMart, … • Ontologies: OLS, …
  • 34. Standards development – international collaborations • Genome annotation: www.geneontology.org • Microarray and Gene Expression Data (MGED): www.mged.org • Protein sequence: www.uniprot.org • HUPO Proteomics Standards Initiative (PSI): psidev.sf.net • Protein structure: www.wwpdb.org • Cheminformatics: www.ebi.ac.uk/chebi • Pathways: www.reactome.org, www.biopax.org • Systems modelling standards: www.sbml.org • Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org • Genomics Standards Consortium (GSC): gensc.org • Nucleotide sequence: www.insdc.org
  • 36. DAS, The Distributed Annotation System • The Distributed Annotation System is… • A network of biological data sources • A Service Oriented Architecture (SOA) • An example of federation • The DAS Protocol is… • An integration platform • A client-server protocol • An agreed standard for web services • Andy Jenkinson
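As a rough illustration of the client-server protocol described on this slide, a DAS client asks a server for the annotations on a segment with a plain HTTP GET and reads XML back. The sketch below follows the shape of the DAS 1.x `features` command; the server address, the source name and the sample response (including its coordinates) are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def das_features_url(server, source, segment, start, stop):
    """Build a DAS 1.x 'features' request for one segment."""
    query = urlencode({"segment": f"{segment}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


# Hypothetical response, trimmed to the elements read below.
SAMPLE_DASGFF = """<DASGFF><GFF>
<SEGMENT id="P37173" start="1" stop="100">
  <FEATURE id="f1" label="example domain">
    <TYPE id="domain">domain</TYPE>
    <START>10</START><END>60</END>
  </FEATURE>
</SEGMENT>
</GFF></DASGFF>"""


def parse_features(xml_text):
    """Turn each FEATURE element into a small dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": feature.get("id"),
            "label": feature.get("label"),
            "type": feature.findtext("TYPE"),
            "start": int(feature.findtext("START")),
            "end": int(feature.findtext("END")),
        }
        for feature in root.iter("FEATURE")
    ]


if __name__ == "__main__":
    print(das_features_url("http://example.org", "uniprot", "P37173", 1, 100))
    print(parse_features(SAMPLE_DASGFF))
```

Because every annotation server answers the same request in the same XML layout, a client can merge features from several providers by looping over server addresses.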
  • 38. DAS servers and data types • Server roles: Reference servers, Annotation servers, Alignment servers • Data types: Genome sequence, Sequence alignments, Protein sequence, Protein-protein interaction, Gel 2D, EMAP 3DM, Protein structure, Mass spectrometry, Epigenetics, Phenotype, Functional genomics, Structural genomics
  • 40. DAS server for EMAP data • EMAP: The Edinburgh Mouse Atlas Project • Gene expression databases (EMAGE & GXD) • DAS reference server: EMAP - Ontology • DAS annotation servers: EMAGE, GXD • Jose Ramon Macias
  • 42. PSICQUIC, based on the PSI-MI standard for molecular interactions • Sample • Observation error • Publications • Annotation error • Interaction databases • PSICQUIC servers • Client
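PSICQUIC servers can return interactions in the tab-delimited PSI-MI format (MITAB), so a client can treat every interaction database the same way. The sketch below parses one MITAB 2.5 record using the standard 15-column layout; the sample record is made up for illustration (it reuses identifiers from the EnCORE example later in this deck and fills the remaining columns with `-`), and the column names are this sketch's own labels.

```python
import csv
import io

# Labels for the 15 MITAB 2.5 columns, in order (names chosen for this sketch).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication", "taxid_a", "taxid_b",
    "interaction_type", "source_db", "interaction_ids", "confidence",
]

# Illustrative record: only the interactors and the interaction ID are filled in.
SAMPLE_MITAB = "\t".join(
    ["uniprotkb:P37173", "uniprotkb:O35613"] + ["-"] * 11 + ["intact:EBI-296235", "-"]
)


def parse_mitab(text):
    """Return one dictionary per non-empty MITAB line."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(MITAB25_COLUMNS, row)) for row in reader if row]


def strip_prefix(field):
    """'uniprotkb:P37173' -> 'P37173' (first value of a '|'-separated field)."""
    return field.split("|")[0].split(":", 1)[-1]


if __name__ == "__main__":
    for row in parse_mitab(SAMPLE_MITAB):
        print(strip_prefix(row["id_a"]), strip_prefix(row["id_b"]),
              strip_prefix(row["interaction_ids"]))
```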
  • 44. ENFIN, EnCORE and EnVISION Data Integration
  • 45. ENFIN Network of Excellence • Brings together experimentalists and computational biologists to develop the next generation of informatics resources for systems biology • Funded by the European Commission within its FP6 programme under the thematic area ‘Life sciences, genomics and biotechnology for health’ • 20 partners in 13 countries • www.enfin.org
  • 46. EnCORE Overview • External data sources • EnCORE wrappers • EnCORE workflows • EnVISION pages • Interfaces: WS, API • WS • API • Web interface
  • 47. EnCORE services: from inputs to outputs • Input/Query (EnCORE dataset): Database IDs, Sequences • Program/Service (EnCORE webservice): Enfin-IntAct, Enfin-PRIDE, Enfin-Affy2UniProt, Enfin-PICR, Enfin-Reactome, Enfin-ArrayExpress, Enfin-UniProt, Enfin-BioModels, Enfin-KEGG, Enfin-G:GOSt, Enfin-CellMINT, Enfin-DOMAINATION • Output/Results (EnCORE results, positive or negative) • Experiment: Identifies the result • Sets: Contains the structure of the result • Molecules: Includes the results • Features: Describe details of the result
  • 48. EnCORE services: example • Input/Query: Database ID (UniProt ID) P37173 • Program/Service: EnCORE webservice Enfin-IntAct • Output/Results • Experiment: ID4 • Sets: (1) EBI-296235, (2) EBI-1033040, (3) EBI-902913, EBI-902937, (4) EBI-296166, EBI-296246, (5) EBI-902913 • Molecules: (1) O35613, (2) P10600, (3) P07200, (4) Q9UER7, (5) Q99K41 • Features: No features
  • 49. EnCORE services: example (result as a table) • Input/Query: P37173 • Program/Service: Enfin-IntAct • Output/Results, columns: Interactor A | Interactor B | Interaction IDs • 1: P37173 | O35613 | EBI-296235 • 2: P37173 | P10600 | EBI-1033040 • 3: P37173 | P07200 | EBI-902913, EBI-902937 • 4: P37173 | Q9UER7 | EBI-296166, EBI-296246 • 5: P37173 | Q99K41 | EBI-902913
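The step from the Experiment / Sets / Molecules result to the table on this slide can be sketched as below. The class and field names are hypothetical and are not the EnCORE XML schema; only the identifiers are taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class EncoreResult:
    """Hypothetical in-memory view of one EnCORE result (not the real schema)."""
    experiment: str
    query: str
    sets: dict       # set number -> list of interaction IDs
    molecules: dict  # set number -> interacting molecule

    def rows(self):
        """Flatten the result into (Interactor A, Interactor B, Interaction IDs)."""
        return [
            (self.query, self.molecules[n], ", ".join(self.sets[n]))
            for n in sorted(self.sets)
        ]


# Values copied from the Enfin-IntAct example for P37173.
RESULT = EncoreResult(
    experiment="ID4",
    query="P37173",
    sets={
        1: ["EBI-296235"],
        2: ["EBI-1033040"],
        3: ["EBI-902913", "EBI-902937"],
        4: ["EBI-296166", "EBI-296246"],
        5: ["EBI-902913"],
    },
    molecules={1: "O35613", 2: "P10600", 3: "P07200", 4: "Q9UER7", 5: "Q99K41"},
)

if __name__ == "__main__":
    for number, row in enumerate(RESULT.rows(), start=1):
        print(number, *row, sep="\t")
```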
  • 50. EnCORE services: building workflows • Input • Input selection • Webservice • Result • Positive result • Negative result
  • 51. EnVISION interface (example) • PRIDE, UniProt, IntAct, Reactome, CellMINT, PICR, BioModels, …
  • 53. BioMart data-mining • BioMart is a search engine that can find multiple terms and put them into a table format. • Such as: human gene (IDs), chromosome and base pair position • No programming required! • Xose Fernandez
  • 54. BioMart integration example • Columns: Name | Fragment | Position | Alleles | Strand • SNP1 | AL139258 | 1659852 | T/A | 1 • SNP2 | NT_25698 | 2569873 | C/T | -1 • SNP3 | chr13 | 1125698 | C/G | 1 • Data conversion and integration • Sources: Ensembl, HapMap, NCBI, UCSC, proprietary data • Diabetes-Gene Association DataBase: combined proprietary and public data • www.biomart.org
  • 55. BioMart integration • DB DB DB DB … … … …
  • 57. Information Flow • 1. Filter • 2. Attributes • 3. Results • www.biomart.org
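Although the web interface needs no programming, the same filter, attributes, results flow can also be written down as a BioMart XML query. The sketch below only builds the query document; the dataset, filter and attribute names are assumptions modelled on the Ensembl mart and should be checked against the mart being queried.

```python
import xml.etree.ElementTree as ET


def biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query: filters restrict rows, attributes pick columns."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():           # 1. Filter
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:                       # 2. Attributes
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Assumed names: human genes on chromosome 13 with their IDs and positions.
EXAMPLE_QUERY = biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "13"},
    attributes=["ensembl_gene_id", "chromosome_name", "start_position", "end_position"],
)

if __name__ == "__main__":
    # 3. Results: the mart service would return one tab-separated row per gene.
    print(EXAMPLE_QUERY)
```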
  • 61. Why use ontologies and CVs • To structure, classify and/or model the concepts pertaining to some field of interest and the relationships between them. • To enable a community to come to agreement and to commit to use the same terms in the same way. • All terms and their interrelations should be unique, unambiguous and well-defined (ideally…) • To facilitate data annotation, retrieval and comparison. • Richard Cote
  • 62. What is OLS? • A unified, single point of query for over 69 ontologies (updated daily) and upwards of 850,000 terms. • A tool that offers online and programmatic access to query ontologies about: • Term names • Synonyms • Relationships • Annotations • Cross-references • Reusable code components to integrate such functionality in other projects • http://www.ebi.ac.uk/ontology-lookup/ • Richard Cote
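To make the kinds of query listed above concrete, here is a minimal in-memory sketch of looking up a term by name or synonym and walking its `is_a` relationships. It does not call OLS; the tiny ontology, its `EX:` identifiers and the function names are invented for illustration.

```python
# Invented three-term ontology; a real lookup service holds hundreds of thousands of terms.
ONTOLOGY = {
    "EX:0001": {"name": "cell", "synonyms": [], "is_a": []},
    "EX:0002": {"name": "stem cell", "synonyms": [], "is_a": ["EX:0001"]},
    "EX:0003": {
        "name": "embryonic stem cell",
        "synonyms": ["ES cell", "ESC"],
        "is_a": ["EX:0002"],
    },
}


def find_terms(text):
    """Return IDs of terms whose name or any synonym matches, ignoring case."""
    needle = text.lower()
    return sorted(
        term_id
        for term_id, term in ONTOLOGY.items()
        if needle in [term["name"].lower(), *(s.lower() for s in term["synonyms"])]
    )


def ancestors(term_id):
    """Follow is_a relationships upwards, nearest parent first."""
    found = []
    stack = list(ONTOLOGY[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(ONTOLOGY[parent]["is_a"])
    return found


if __name__ == "__main__":
    for term_id in find_terms("ES cell"):
        print(term_id, [ONTOLOGY[a]["name"] for a in ancestors(term_id)])
```

Annotating data with the term ID rather than free text is what lets "ES cell" and "embryonic stem cell" retrieve the same records.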
  • 66. Why do you need ID mapping • Merging datasets to a common identifier space • Finding all aliases/synonyms for an identifier • (data integration – submissions!) • Mapping from secondary IDs to more recent primary IDs • (data “freshness”) • Preparing data sets for specific tools • Querying in various primary databases • (data format requirements) • Richard Cote
  • 67. Protein identifier mapping is hard • The basic problem: the same protein sequence is referred to by multiple accession numbers assigned by multiple databases. • No universal identifier scheme • Redundant databases – multiple identifiers for the same sequence in the same database • Unstable identifiers (e.g. gi numbers) • Obsolete and deleted identifiers (hypothetical proteins) • Different production cycles for major databases • Tools exist, but are limited in their database and species coverage and in their usability and availability. • Richard Cote
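One way to picture the basic problem is to treat the sequence itself as the common key: accessions from different databases that carry an identical sequence fall into the same group. The sketch below illustrates that idea only; it is not a description of how PICR is implemented, and the database names, accessions and sequences are invented.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence):
    """Checksum of the sequence, ignoring whitespace and letter case."""
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def build_index(records):
    """Group (database, accession) pairs by the sequence they refer to."""
    index = defaultdict(set)
    for database, accession, sequence in records:
        index[sequence_key(sequence)].add((database, accession))
    return index


def map_accession(records, accession, target_db):
    """Return accessions in target_db that share the query accession's sequence."""
    index = build_index(records)
    for _database, acc, sequence in records:
        if acc == accession:
            group = index[sequence_key(sequence)]
            return sorted(a for db, a in group if db == target_db)
    return []


# Invented records: DB_B is redundant (two accessions for one sequence).
RECORDS = [
    ("DB_A", "A0001", "MKTAYIAK"),
    ("DB_B", "B-77", "mktayiak"),
    ("DB_B", "B-78", "MKTAYIAK"),
    ("DB_B", "B-99", "MSSSS"),
]

if __name__ == "__main__":
    print(map_accession(RECORDS, "A0001", "DB_B"))
```

The one-to-many answer for `A0001` shows why redundant databases make mapping ambiguous even when the sequences match exactly.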
  • 68. PICR Home Page • Submit accessions OR sequences (FASTA) with 500 entry interactive limit (no batch limit) • Select output format • Select one or many databases to map to in one request • Limit search by taxonomy (pessimistic) • Choose to return all mappings or only active ones • Run search • Richard Cote
  • 69. PICR Result Page – simple view • Logical xref (hyperlinked) • Inactive xref • Secondary Identifier • Active xref (hyperlinked) • Richard Cote
  • 72. EB-eye – Quick search – PRIDE
  • 73. EB-eye – Understand your query (advanced search)
  • 74. PRIDE – Protein identifications (BioMart)
  • 75. PRIDE – Protein identifications (BioMart)
  • 76. PRIDE – Protein identifications (BioMart) • 1 experiment • 2648 proteins
  • 78. PRIDE – Protein identifications (Web) …
  • 79. PRIDE – Protein identifications (Web) • 79 experiments
  • 80. PRIDE – Protein identifications (Web)
  • 81. PRIDE – Protein identifications (Web)
  • 82. PRIDE – Protein identifications (Web)
  • 84. UniProt – Protein sequence data … …
  • 85. Dasty2 – Third party protein sequence annotations
  • 86. IntAct – Protein interactions • CV: GO
  • 88. Microarray and gene expression data • Experiments Archive
  • 89. Microarray and gene expression data • Gene Expression Atlas • CV: EFO
  • 90. Microarray and gene expression data • Gene Expression Atlas
  • 91. EB-eye – Other sources … • Protein structures • Literature
  • 95. User support • 2Can bioinformatics user support – www.ebi.ac.uk/2Can • Online help pages – www.ebi.ac.uk/help • E-mail support – www.ebi.ac.uk/support

Editor's Notes

  • #6: We’re the second largest of the five EMBL sites; there is the main lab and administrative centre in Heidelberg; structural biology labs in Hamburg and Grenoble; mouse biology in Monterotondo, near Rome, and bioinformatics in Hinxton. There are around 1,400 staff within EMBL and about 330 of those work at the EBI.
  • #7: EBI shares its central four mission objectives with EMBL, although focussed on bioinformatics rather than molecular biology. The EBI is at the centre of Europe’s efforts to collect, organise and make all types of biological data available, and we do this by providing services so researchers can access and make sense of the information, by being active in bioinformatics research, by providing training and by working closely with industry.
  • #8: The research groups map roughly onto many of the areas that we provide services in; collaboration between services and research is widespread. For example, collaborations between Wolfgang Huber’s group and the ArrayExpress developers are leading to new methods for microarray data analysis; Dietrich Rebholz-Schuhmann is working closely with our literature services team (Peter Stoehr) to develop innovative literature mining methods, some of which will be used in the new UK PubMedCentral service. Some of the services groups (Brazma, Birney and Apweiler) also have research components.
  • #9: To train researchers in using the EBI’s resources, we have a tripartite training programme, encompassing training courses in-house at the EBI, the Bioinformatics Roadshow where EBI trainers travel out to host organisations to provide hands-on training on resources requested by the host, and, most recently, an elearning programme that anyone can use to undertake some training in their own time.
  • #11: The training programme aims to cover all the core EBI resources, at both introductory and advanced levels. For example, the two-day dip course shown here gives an overview of different resources and acts as a way to orientate yourself and become familiar with the range of resources, whereas a more specific course, such as one on transcriptomics, will link the use of several different resources together.
  • #12: As an alternative to coming to us, you can now access EBI training from anywhere in the world via our Moodle-based elearning, which has just been launched. We currently offer three courses which have been developed with an external consultancy company, and we are going to use the same format to develop additional courses covering the use of other EBI resources. The elearning is free to use and you just need to register to get a username and password.
  • #13: Each course is modular and the different parts allow people to use a combination of learning methods – computer based elements such as the video tutorial and quiz, and computer independent parts such as the print version of the tutorial so you can learn on the go, and a guided reflective task so you can learn by doing.
  • #16: The slide shows the core resources at the EBI mapped on to the same arrow to show the range of data you can access through the EBI. The EBI is the European centre for the collection and dissemination of biological data; we do this in collaboration with other global centres such as NCBI, the Institute of Genetics in Japan, the Swiss Institute of Bioinformatics and Cold Spring Harbor.
  • #17: We launched a new website and search engine just over a year ago. Our website gets over 2 million hits a day and it’s the gateway for accessing the information you want. The search engine, the EB-eye, allows integrated searching of all our core data resources from a single search box – it’s like a google for all the information held at the EBI.
  • #18: Ensembl provides a framework for working with the genomes of higher animals (metazoans). It presents, via an interactive website, the human genome together with other genomes that are important for addressing questions in medical research and molecular biology. It uses automated methods for gene prediction and annotation to provide a consistent view of completely sequenced genomes. Users can view the data at many levels, from entire chromosomes down to single nucleotide polymorphisms. As well as accessing a wealth of data for each species, users can also perform cross-species comparisons.
  • #19: Ensembl Genomes is the combined repository for non-vertebrate genome data, consisting of five resources: Ensembl Bacteria, Ensembl Fungi, Ensembl Metazoa, Ensembl Plants, and Ensembl Protists, bringing the power of the Ensembl system to all branches of life. Ensembl Genomes re-uses and extends software developed for vertebrate genomes in the context of the Ensembl project, and replaces several pre-existing resources (Integr8, Genome Reviews and ASTD) thereby unifying services and simplifying data access for users.
  • #20: EMBL-Bank is Europe's primary nucleotide sequence resource. The database is produced in an international collaboration (the International Nucleotide Sequence Database Collaboration, INSDC) with GenBank (USA) and the DNA Data Bank of Japan (DDBJ). Main sources of DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. Users can search the data using either keyword-based searches or sequence similarity tools such as BLAST and FASTA to compare their own sequence with the contents of EMBL-Bank. There’s also a map-based search (EMBLWorld) for exploring sequences derived from environmental genome sequencing projects. The data belong to the submitter and can only be updated by the submitter, but other researchers can submit ‘third party annotations’ to EMBL-Bank if they’re associated with a peer-reviewed publication.
  • #21: ArrayExpress is the world’s first and largest MIAME-compliant repository for microarray-based data (mostly gene expression data, but it also takes CGH and ChIP-chip data). You can search the repository to view and download experiments; a subset of the data in the repository is hand-picked for the Data Warehouse, which can be searched on the basis of gene names and allows you to view gene expression data for different time points or experimental conditions.
  • #22: UniProt is the gold-standard resource for information on proteins. It comprises three different databases, but I haven’t shown all three here for the sake of simplicity. UniProtKB is the central database of protein sequences with accurate, consistent, and rich sequence and functional annotation. It comprises the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section. The UniProt archive is an archive of all the protein sequences in the public domain, and the UniRef databases are a series of three databases that store sequences of 100%, 90% and 50% identity in the same records to speed up searching without losing information. UniProtKB contains more than 29 million cross references to over 100 other data resources; a few key ones are shown here.
  • #23: InterPro amalgamates several different databases of protein signatures, which use different methods, into a single resource, providing a powerful tool for protein classification.
  • #24: The proteomics services team produces a range of resources for the proteomics research community; they are also heavily involved in the development of standards for proteomics research, and their resources adhere to these standards. IntAct provides an open source database and toolkit for the storage, presentation and analysis of molecular interactions. PRIDE is an open-source public repository of protein identifications, peptide identifications, post-translational modifications and mass spectra. The Integrated relational Enzyme database provides a complete, freely available database storing the most up-to-date version of the Enzyme Nomenclature approved by the NC-IUBMB. Chemical Entities of Biological Interest (ChEBI) aims to provide standardised descriptions of molecular entities that enable other databases at the EMBL-EBI and worldwide to annotate their entries in a consistent fashion.
  • #26: The Macromolecular Structure Database (MSD) is the European resource for the collection, organisation and dissemination of data about biological macromolecular structures. The MSD is one of three partners in the worldwide Protein Data Bank (wwPDB), the consortium entrusted with the collation, maintenance and distribution of the global repository of macromolecular structure data. The MSD team has developed a wide range of resources for the analysis of data in the PDB.
  • #27: Reactome aims to develop a curated resource of core pathways and reactions in human biology, although other species are also covered. It treats both metabolic and signalling pathways in the same way, providing both a human- and a computer-readable account of all the processes in a pathway. It makes extensive use of crosslinks to other data resources; the data in Reactome can also be exported to a number of different types of modelling software so that they can be incorporated into computer models of living systems.
  • #36: As well as providing services, the EBI does research…
  • #37: An integration platform for biological data: a way of bringing together data from different providers. Federation unifies data sources that differ from each other.
  • #96: If you need help using any of our databases it’s available; if our online support pages can’t answer your question we offer e-mail support and promise to get back to you within 2 working days.