European Bioinformatics Institute - the home for big data in biology
www.ebi.ac.uk
Advanced Bioinformatics for Genomics and BioData Driven Research
The European Molecular Biology Laboratory
6 sites in Europe, >1700 personnel, 80+ nationalities
• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Rome, Italy: Mouse Biology
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology
Our mission
• Deliver excellent research
• Train the next generation of scientists
• Engage with industry
• Coordinate bioinformatics in Europe
• Deliver scientific services
Data and tools to support life science research
www.ebi.ac.uk/services
Bioinformatics services
What services do we provide?
Labs around the world send us their data and we…
• Archive it
• Classify it
• Share it with other data providers
• Analyse, add value and integrate it
• …provide tools to help researchers use it
A collaborative enterprise
Big Data, big demand for EMBL-EBI data services…
• ~64 million requests to EMBL-EBI websites every day
• Requests from 20 million unique IP addresses
• 273 petabytes of raw storage in our data centres
• 22 500 participants in EMBL-EBI Training events
Data resources at EMBL-EBI
Data resources for Genomics – Molecular Archives
• BioSamples database - centralised resource for FAIR sample data (>12 million samples)
• Experimental Factor Ontology - systematic description of experimental variables available in EBI databases and projects (26,764 terms)
• European Genome-phenome Archive - sequence and genotype experiments, including case-control and population studies (3,445 studies)
• European Nucleotide Archive (ENA) - record of the world's nucleotide sequencing information (>2,400 million sequences, >7,200 billion bases)
• European Variation Archive - sole international resource for human and non-human variation
Data resources for Genomics – Genes, Genomes & Variation
• Ensembl - genome browser (human: >0.6 billion SNV, >6 million SV)
• Ensembl Genomes - 275 vertebrate species / strains; Metazoa; Plants; Fungi; Protists; Bacteria
• GWAS Catalog - moved to EBI in 2015 (4,390 publications, >17,000 associations)
• HGNC - 41,787 approved gene entries (19,320 protein coding)
• International Genome Sample Resource - ensures future usability and accessibility of 1000 Genomes Project data
The Ensembl Variant Effect Predictor
VEP started as a simple wrapper around the Ensembl API to map variants to transcripts and predict molecular consequence.
As new data sets and algorithms have become available, functionality has increased and VEP is now an extensive and sophisticated tool.
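As an illustration of mapping a variant to its predicted consequence, the sketch below queries VEP through the public Ensembl REST service rather than the Perl API mentioned on the slide. It assumes the `vep/:species/hgvs/:hgvs_notation` endpoint on rest.ensembl.org and the `most_severe_consequence` field of its JSON response. The helper names and the example HGVS string are illustrative choices, not part of the original talk.

```python
import json
import urllib.parse
import urllib.request

# Public Ensembl REST server (assumed reachable; not part of the slides).
ENSEMBL_REST = "https://rest.ensembl.org"


def vep_hgvs_url(hgvs, species="human"):
    """Build the URL for the REST 'vep/:species/hgvs/:hgvs_notation' endpoint."""
    # Keep ':' readable, percent-encode characters such as '>'.
    encoded = urllib.parse.quote(hgvs, safe=":")
    return f"{ENSEMBL_REST}/vep/{species}/hgvs/{encoded}"


def most_severe_consequences(vep_records):
    """Extract the 'most_severe_consequence' field from each VEP JSON record."""
    return [record.get("most_severe_consequence") for record in vep_records]


def fetch_vep(hgvs, species="human"):
    """Query the REST service and return the decoded JSON (needs network access)."""
    request = urllib.request.Request(
        vep_hgvs_url(hgvs, species),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as `most_severe_consequences(fetch_vep("ENST00000366667:c.803C>T"))` would then return one consequence term per input variant, assuming the service is reachable and accepts that notation.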
New resource for Genomics
• New resource for gene expression and splicing QTLs
• https://www.ebi.ac.uk/eqtl/
Global Alliance for Genomics and Health (GA4GH)
• Chaired by EMBL-EBI Director Ewan Birney
• EMBL-EBI teams leading various activities in Technical Work Streams:
• Large Scale Genomics (file formats and htsget subgroups)
• Clinical and phenotypic data capture
• Data Use and Researcher identification
• ENA/EGA/EVA and HCA DCP are also Driver Projects
Data resources for Genomics – Molecular Atlas
• Human Cell Atlas Data Coordination Platform
• In 2017, Chan Zuckerberg Initiative (CZI) funding to EMBL-EBI, Broad Institute and the UCSC Genomics Institute, to build a cloud-based data coordination platform
• HCA will generate petabytes of data for billions of cells, across multiple modalities, generated by hundreds of labs around the world
• DCP will organise, curate, standardise and analyse this data and enable open data access
Data resources for Genomics – Proteins and Protein Families
A free to use resource for the archiving, assembly, analysis, & browsing of microbiome data
Data archiving, Assembly, Analysis
NEW Resource: BioImage Archive
[Figure: Molecules, Molecular Machines, Cells, Tissues / Organisms; Light Sheet Microscopy, High Throughput Microscopy, Superresolution Microscopy, Cryo Electron Microscopy; Correlate Technologies, Integrate Data. Data volumes shown: 0.1 TB / day, 0.5 TB / dataset; 0.5 TB / day, 7.5 TB / dataset; 40 TB / day, 10 TB / dataset; 5 TB / day, 20 TB / dataset. Graphic courtesy of Jan Ellenberg]
Data-driven discovery
Research
www.ebi.ac.uk/research
Research groups at EMBL-EBI
Zamin Iqbal
Thomas Keane
John Marioni
Janet Thornton
Andrew Leach
Evangelia Petsalaki
Virginie Uhlmann
Daniel Zerbino
Paul Flicek
Nick Goldman
Rob Finn
Alvis Brazma
Pedro Beltrao
Alex Bateman
Ewan Birney
Moritz Gerstung
Isidro Cortes-Ciriano
Irene Papatheodorou
In 2018, EMBL-EBI had 165 grants awarded, 120 jointly funded with researchers and institutes in 62 countries
Pedro Beltrao: Functional landscape of the human phosphoproteome
Ochoa et al Nature Biotech 2019
• Created largest phosphoproteome resource to date (120,000 human phosphosites)
• Used machine learning methods to compile and analyse large phosphorylation related biological datasets
• Identifying new functional phosphosites has enormous potential to progress research into many biological processes and diseases
Evangelia Petsalaki: Inference of kinase-kinase regulatory networks
from phosphoproteomics data (collaboration with Beltrao group)
Invergo*, Petursson* et al., bioRxiv
Moritz Gerstung: Pan-cancer computational histopathology
• Analysis with deep learning extracts histopathological patterns
• Accurately discriminates 28 cancer and 14 normal tissue types
• Predicts: whole genome duplications; focal amplifications and deletions; driver gene mutations
• Correlations with gene expression indicative of immune infiltration and proliferation
• Prognostic information augments conventional grading and histopathology subtyping
https://doi.org/10.1101/813543
Zam Iqbal: Mykrobe – predicting TB drug resistance from WGS data
https://wellcomeopenresearch.org/articles/4-191/v1
Virginie Uhlmann: Mathematical models for bioimage analysis
doi.org/10.1371/journal.pone.0173433
Dictionary Learning for Two-Dimensional Kendall Shapes
https://arxiv.org/abs/1903.11356
An example of best practice for complex datasets
Single Cell RNA-Seq analysis at EMBL-EBI
From Irene Papatheodorou
Team Leader – Gene Expression
ArrayExpress – functional genomics archive
• started in 2000 as an archive for microarray data
• evolved into general archive for high-throughput functional genomics data (microarray- or NGS-based)
• all data are manually curated prior to inclusion
• microarray data stored directly in ArrayExpress
• sequencing data brokered to and stored in ENA
• curated datasets support reproducible and re-usable research
Annotare – Minimum information about a scRNA-Seq experiment
[Figure labels: single cell isolation; single cell well quality (OK, doublet, debris); single cell identifier barcode; UMI; cDNA read; post-analysis single cell quality (pass, fail); library construction; inferred cell type; files (R1, R2, I1); sample metadata]
https://arxiv.org/abs/1910.14623
From database to knowledgebase: Expression Atlases
• Expression Atlas (https://www.ebi.ac.uk/gxa): > 3,500 bulk datasets (165 baseline expression, ~ 3,350 differential expression), 62 species, > 955,000 assays
• Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home): > 120 single-cell datasets, 12 species
Interactive Analysis with Galaxy
https://humancellatlas.usegalaxy.eu/
Flexible
Interoperable
Scalable
Main Points
• Enabling rational choices when composing workflows
• Using a common exchange format as ‘workflow glue’
• Galaxy integrations
What people usually do...
Read Filter Normalise Compare Cluster Markers
OR
Read Filter Normalise Compare Cluster Markers
OR
Read Filter Normalise Compare Cluster Markers
What we really should be doing
Read Filter Normalise Compare Cluster Markers
... but to do that we need interoperable components
Read Filter Normalise Compare Cluster Markers
Problem 1: components in different languages
Problem 2: need format glue!
Our solution
Read Filter Normalise Compare Cluster Markers
Environments &
containers
Workflows
CLI CLI CLI CLI CLI CLI
Scripts layer
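The layering on this slide (one command-line wrapper per step, with a shared exchange format between steps) can be sketched as follows. This is a minimal sketch under stated assumptions. The tool names `toolA` and `toolB`, the `--input`/`--output` flags and the `.exchange` file extension are all hypothetical placeholders, and the sketch only builds the command lines without running anything.

```python
# The six analysis stages named on the slide.
STEPS = ["read", "filter", "normalise", "compare", "cluster", "markers"]

# Hypothetical toolbox: "toolA" and "toolB" are placeholder names standing in
# for CLI wrappers around components written in different languages.
TOOLBOX = {
    step: {tool: f"{tool}-{step}" for tool in ("toolA", "toolB")}
    for step in STEPS
}


def compose(choices, raw_input, fmt="exchange"):
    """Return one argv list per step; each step reads the previous step's output.

    Because every wrapper reads and writes the same (hypothetical) exchange
    format, any implementation can be chosen independently for each step.
    """
    commands = []
    current = raw_input
    for step in STEPS:
        executable = TOOLBOX[step][choices[step]]
        output = f"{step}.{fmt}"
        commands.append([executable, "--input", current, "--output", output])
        current = output
    return commands


if __name__ == "__main__":
    # Mix implementations: everything from toolA except clustering.
    picks = {step: "toolA" for step in STEPS}
    picks["cluster"] = "toolB"
    for argv in compose(picks, "counts.raw"):
        print(" ".join(argv))
```

Keeping the choice of implementation in a plain mapping means a workflow engine only has to substitute one entry to swap a component, which is the kind of rational per-step choice the talk argues for.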
Galaxy integrations
• Extended Galaxy init container:
• Thin tool wrappers leveraging Bioconda wrappers
• Starting tertiary workflows
• Added logic for dynamic destinations
• Leverage existing Kubernetes integrations
• Improved LSF functionality for non-DRMAA clusters:
• Improved CLI executor
https://github.com/ebi-gene-expression-group/container-galaxy-sc-tertiary
Pablo Moreno
Summary
• ArrayExpress/Annotare for data submissions
• Expression Atlas/Single Cell Expression Atlas
• Analysis Workflows in Galaxy
Open Targets
Data integration Platforms
Drug discovery
• Finding the right biological target for a drug requires bioinformatics to:
• identify promising targets
• select candidate medicines.
• EMBL-EBI services support all stages of drug discovery:
• Ensembl
• UniProt
• ChEMBL
• Protein Data Bank in Europe
• Reactome
Open Targets
www.opentargets.org
• Pinpointing the processes in the human body that have a demonstrable effect on disease
• Aims to improve the success rate in the discovery and repurposing of medicines
• A new kind of collaboration with:
• GSK
• EMBL-EBI
• Wellcome Sanger Institute
• Biogen
• Takeda
• Celgene
• Sanofi
Open Targets Platform and Open Targets Genetics
www.targetvalidation.org genetics.opentargets.org
Challenges for the near future
• Non-coding SNVs
• Data standardization to enable AI/ML
• Connecting data
• Moving to the cloud
www.ebi.ac.uk
Stay in touch
Twitter: @emblebi
Facebook: EMBLEBI
LinkedIn: /company/ebi
YouTube: EMBLMedia
