An overview of the PRIDE ecosystem of
resources and computational tools for
mass spectrometry proteomics data
Dr. Juan Antonio Vizcaíno
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
Mass Spectrometry and Proteomics Congress 2016
London, 15 November 2016
Overview
• PRIDE Archive and ProteomeXchange
• PRIDE tools
• Reuse of public proteomics data
• PRIDE added-value resources: PRIDE Cluster and
PRIDE Proteomes
What is a proteomics publication in 2016?
• Proteomics studies generate potentially large amounts of
data and results.
• Ideally, a proteomics publication needs to:
• Summarize the results of the study
• Provide supporting information for reliability of any
results reported
• Information in a publication:
• Manuscript
• Supplementary material
• Associated data submitted to a public repository
• PRIDE stores mass spectrometry (MS)-based
proteomics data:
• Peptide and protein expression data
(identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
ProteomeXchange: A Global, distributed proteomics
database
• PRIDE (MS/MS data)
• MassIVE (MS/MS data)
• jPOST (MS/MS data), new in 2016
• PASSEL (SRM data)
• Data types: raw data (Raw), identification and quantification results (ID/Q), metadata (Meta)
• Mandatory raw data deposition since July 2015
• Goal: Development of a framework to allow standard data submission and
dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
ProteomeXchange data workflow
[Diagram, labels only]
• Research groups: datasets (researcher’s results, raw data, metadata)
• Submission routes: MS/MS data (as complete submissions), SRM data, any other workflow (mainly partial submissions)
• Receiving repositories: PRIDE, MassIVE, jPOST, PASSEL
• ProteomeCentral and journals: metadata / manuscript, raw data, results
• Reanalysis of datasets: Peptide Atlas, MassIVE (reprocessed results)
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
ProteomeCentral: Centralised portal for all PX
datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
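As a rough illustration of how a single dataset page on ProteomeCentral might be addressed, the sketch below builds a page URL from a PXD accession. The `ID` query parameter, the six-digit accession pattern and the helper name `dataset_url` are assumptions made for illustration; the slide itself only gives the base address.

```python
import re

# Base address as given on the slide.
PROTEOMECENTRAL_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession: str) -> str:
    """Return a ProteomeCentral page URL for one PXD accession.

    Assumption: the accession is passed in an "ID" query parameter and
    has the form "PXD" followed by six digits.
    """
    if not re.fullmatch(r"PXD\d{6}", accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return f"{PROTEOMECENTRAL_URL}?ID={accession}"


if __name__ == "__main__":
    # PXD001471 is one of the accessions quoted elsewhere in this talk.
    print(dataset_url("PXD001471"))
```

The function only constructs the address; fetching and parsing the page is left out on purpose, since the response format is not described in the talk.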
ProteomeXchange data workflow
[Diagram, labels only]
• Research groups: datasets (researcher’s results, raw data, metadata)
• Submission routes: MS/MS data (as complete submissions), SRM data, any other workflow (mainly partial submissions)
• Receiving repositories: PRIDE, MassIVE, jPOST, PASSEL
• ProteomeCentral and journals: metadata / manuscript, raw data, results
• Reanalysis of datasets: Peptide Atlas, MassIVE, GPMDB, proteomicsDB (reprocessed results)
• Other resources: UniProt/neXtProt, other DBs
• OmicsDI: integration with other omics datasets
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides or
will soon provide MS proteomics
data to other EMBL-EBI resources
such as UniProt, Ensembl and the
EBI Expression Atlas.
http://www.ebi.ac.uk/pride/archive
PRIDE Archive – over 4,500 datasets from
over 51 countries and 1,700 groups
• USA – 814 datasets
• Germany – 528
• UK – 338
• China – 328
• France – 222
• Netherlands – 175
• Canada - 137
Data volume:
• Total: ~275 TB
• Number of all files: ~560,000
• PXD000320-324: ~ 4 TB
• PXD002319-26 ~2.4 TB
• PXD001471 ~1.6 TB
• 1,973 datasets i.e. 52% of
all are publicly accessible
• ~90% of all
ProteomeXchange datasets
PRIDE Archive growth
[Chart: submissions per year, all submissions vs. complete submissions]
In the last 12 months: ~165 submitted datasets per month
Top Species studied by at least 100
datasets:
2,010 Homo sapiens
604 Mus musculus
191 Saccharomyces cerevisiae
140 Arabidopsis thaliana
127 Rattus norvegicus
>900 reported taxa in total
PRIDE Components: Data Submission Process
PRIDE Converter 2
PRIDE Inspector PX Submission Tool
mzIdentML
PRIDE XML
In addition to PRIDE Archive, the PRIDE team develops
and maintains different tools and software libraries to
facilitate the handling and visualisation of MS proteomics
data and the submission process
PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016
• PRIDE Inspector: standalone tool to enable visualisation and validation of MS data.
• Built on top of ms-data-core-api: open source algorithms and libraries for computational proteomics.
• Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
• Broad functionality.
https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector
[Screenshots: summary and QC charts; peptide spectra annotation and visualization]
PX Submission Tool
• Desktop application for data submissions to ProteomeXchange via PRIDE
• Implemented in Java 7
• Streamlines the submission process
• Captures mappings between files
• Retains metadata
• Fast file transfer with Aspera (FASP® transfer technology); FTP also available
• Command line option
[Submission tool screenshot]
Datasets are being reused more and more….
Vaudel et al., Proteomics, 2016
Data download volume for PRIDE Archive in 2015: 198 TB
[Chart: downloads in TBs per year, 2013 to 2016]
Data sharing in Proteomics
Vaudel et al., Proteomics, 2016
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014; Kim et al., Nature, 2014
• Two independent groups claimed to have produced the first complete draft of the human proteome by MS.
• Some of their findings are controversial and need further validation, but they generated a lot of discussion and put proteomics in the spotlight.
• They used many different tissues.
Nature cover 29 May 2014
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014
• Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE.
• They complement that data with “exotic” tissues.
Examples of repurposing in proteogenomics
Public datasets from different omics: OmicsDI
http://www.ebi.ac.uk/Tools/omicsdi/
• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).
• Source databases: PRIDE, MassIVE, jPOST, PASSEL, GPMDB, ArrayExpress, Expression Atlas, MetaboLights, Metabolomics Workbench, GNPS, EGA
Perez-Riverol et al., Nat Biotechnol, in press
OmicsDI: Portal for omics datasets
Added value resources: PRIDE Cluster
and PRIDE Proteomes
• Condensed, cross-dataset, QC-filtered view of PRIDE data.
• PRIDE Cluster: Peptide centric.
• PRIDE Proteomes: Protein centric (identification data)
PRIDE Cluster
• Provides an aggregated, peptide-centric view of PRIDE Archive.
• Hypothesis: the same peptide will generate similar MS/MS spectra across experiments.
• New version of the spectral clustering algorithm to reliably group spectra coming from the same peptide.
• Enables QC of peptide-spectrum matches (PSMs): reliable identifications are inferred by comparing the submitted identifications of spectra within a cluster.
• After clustering, a representative spectrum is built for all peptides consistently identified across different datasets.
• Used to build spectral libraries (for 16 species).
Griss et al., Nat. Methods, 2013
Griss et al., Nat. Methods, 2016
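The hypothesis that the same peptide yields similar MS/MS spectra can be made concrete with a toy similarity measure. The sketch below bins peaks by m/z and computes a normalized dot product (cosine) between two spectra. This is a generic stand-in chosen for illustration, not the scoring used by the PRIDE Cluster algorithm; the function names, the 1.0 m/z bin width and the example peaks are all assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    `peaks` is an iterable of (mz, intensity) pairs.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product of two binned spectra, in [0, 1]."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical peak lists: (m/z, intensity).
    spectrum_1 = [(100.2, 10.0), (250.7, 5.0), (512.3, 2.0)]
    spectrum_2 = [(100.4, 9.0), (250.1, 6.0), (700.0, 1.0)]
    print(round(cosine_similarity(spectrum_1, spectrum_2), 3))
```

A clustering step would then group spectra whose pairwise similarity exceeds some threshold; the choice of threshold and of the grouping strategy is outside what the slide describes.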
Example: one perfect cluster
- 880 PSMs give the same peptide ID
- 4 species
- 28 datasets
- Same instruments
http://wwwdev.ebi.ac.uk/pride/cluster/
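One simple way to express what makes such a cluster "perfect" is the share of its PSMs that agree on the most frequent peptide. The sketch below computes that share; the function name `cluster_consensus` and the peptide sequences are hypothetical, and the measure is an illustrative assumption rather than the exact QC metric used by PRIDE Cluster.

```python
from collections import Counter


def cluster_consensus(psm_peptides):
    """Return (most frequent peptide, fraction of PSMs supporting it).

    `psm_peptides` holds one submitted peptide identification per
    spectrum in the cluster.
    """
    if not psm_peptides:
        raise ValueError("cluster has no PSMs")
    counts = Counter(psm_peptides)
    peptide, support = counts.most_common(1)[0]
    return peptide, support / len(psm_peptides)


if __name__ == "__main__":
    # A cluster like the one on the slide: 880 PSMs, all with the same ID.
    perfect = ["ELVISLIVESK"] * 880  # placeholder sequence
    print(cluster_consensus(perfect))

    # A mixed cluster: three PSMs agree, one disagrees.
    mixed = ["AAK", "AAK", "AAK", "CCK"]
    print(cluster_consensus(mixed))
```

A fraction of 1.0 corresponds to the situation on the slide, where every PSM in the cluster gives the same peptide identification.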
PRIDE Proteomes web interface:
[Screenshot labels: identification info; unique/shared peptides; mass spec-based sequence coverage; PTMs detected; observed tissues; biological vs sample prep PTMs]
http://wwwdev.ebi.ac.uk/pride/proteomes/
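The "mass spec-based sequence coverage" shown in the interface can be understood as the fraction of a protein's residues covered by at least one identified peptide. The sketch below computes it by exact substring matching. How PRIDE Proteomes actually derives this value is not described in the talk, so the function name, the matching strategy and the example sequences are assumptions.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one peptide.

    Peptides are located by exact substring match; every occurrence
    of a peptide in the protein counts towards the coverage.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # ignore empty strings
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    # Hypothetical 10-residue protein and two identified peptides.
    protein = "MKTAYIAKQR"
    peptides = ["MKTAY", "AKQR"]
    print(sequence_coverage(protein, peptides))
```

In the example, nine of the ten residues are covered, since only the isoleucine at position 6 falls outside both peptides.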
Conclusions
• PRIDE Archive and ProteomeXchange have become the
standard platform for public data deposition in proteomics.
• PRIDE Inspector: support for data standards.
• PX submission tool.
• Reuse of public proteomics data is increasing: many
opportunities for data miners.
• OmicsDI: new platform to identify public datasets coming
from different omics technologies (more possibilities for data
reuse!).
• PRIDE Cluster and PRIDE Proteomes.
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez
Former team members, especially
Rui Wang, Florian Reisinger, Noemi
del Toro, Jose A. Dianes & Henning
Hermjakob
Acknowledgements: The PRIDE Team
All data submitters !!!
@pride_ebi
@proteomexchange
Questions?
http://www.slideshare.net/JuanAntonioVizcaino

More Related Content

PPTX
Mining the hidden proteome using hundreds of public proteomics datasets
PPTX
Experiences to learn from the MS proteomics field
PPTX
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
PPTX
Proteomics public data resources: enabling "big data" analysis in proteomics
PDF
Pride cluster presentation
PPTX
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PPTX
Mass spectrometry resources at the EBI
PPTX
PRIDE-ProteomeXchange
Mining the hidden proteome using hundreds of public proteomics datasets
Experiences to learn from the MS proteomics field
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics public data resources: enabling "big data" analysis in proteomics
Pride cluster presentation
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
Mass spectrometry resources at the EBI
PRIDE-ProteomeXchange

What's hot (20)

PPTX
Reuse of public proteomics data
PPTX
Proteomics data standards
PPTX
Proteomics repositories
PPTX
ProteomeXchange update HUPO 2016
PPTX
Public proteomics data: a (mostly unexploited) gold mine for computational re...
PPTX
Proteomics data standards
PPTX
ProteomeXchange_and_PRIDE_Semmeting_2015
PPT
The UK National Chemical Database Service – an integration of commercial and ...
PPT
Royal society of chemistry activities to develop a data repository for chemis...
PPT
How the InChI identifier is used to underpin our online chemistry databases a...
PPTX
How to run and maintain a popular biological data repository?
PPT
The importance of standards for data exchange and interchange on the Royal So...
PPTX
Investigating Impact Metrics for Performance for the US-EPA National Center f...
PPTX
Mass Spectrometry Informatics formats in progress
PPTX
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
PPTX
Human microbiome project
PPT
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
PPT
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
PPT
The application of text and data mining to enhance the RSC publication archive
Reuse of public proteomics data
Proteomics data standards
Proteomics repositories
ProteomeXchange update HUPO 2016
Public proteomics data: a (mostly unexploited) gold mine for computational re...
Proteomics data standards
ProteomeXchange_and_PRIDE_Semmeting_2015
The UK National Chemical Database Service – an integration of commercial and ...
Royal society of chemistry activities to develop a data repository for chemis...
How the InChI identifier is used to underpin our online chemistry databases a...
How to run and maintain a popular biological data repository?
The importance of standards for data exchange and interchange on the Royal So...
Investigating Impact Metrics for Performance for the US-EPA National Center f...
Mass Spectrometry Informatics formats in progress
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
Human microbiome project
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
The application of text and data mining to enhance the RSC publication archive
Ad

Similar to An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data (20)

PPTX
Pride and ProteomeXchange
PPTX
PRIDE and ProteomeXchange
PDF
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
PPTX
Proteomics repositories
PDF
Reusing and integrating public proteomics data to improve our knowledge of th...
PPTX
Proteomexchange
PPTX
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PPTX
Proteomics data standards
PPTX
Reuse of public data in proteomics
PPTX
Is it feasible to identify novel biomarkers by mining public proteomics data?
PPTX
The ProteomeXchange Consoritum: 2017 update
PPTX
Reuse of public proteomics data
PPTX
ProteomeXchange update 2017
PPTX
ProteomeXchange update
PPTX
PRIDE and ProteomeXchange: Training webinar
PPTX
Pride Cluster 062016 Update
PPTX
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
PDF
ProteomeXchange update
PPTX
Proteomics data standards
PDF
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
Pride and ProteomeXchange
PRIDE and ProteomeXchange
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Proteomics repositories
Reusing and integrating public proteomics data to improve our knowledge of th...
Proteomexchange
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
Proteomics data standards
Reuse of public data in proteomics
Is it feasible to identify novel biomarkers by mining public proteomics data?
The ProteomeXchange Consoritum: 2017 update
Reuse of public proteomics data
ProteomeXchange update 2017
ProteomeXchange update
PRIDE and ProteomeXchange: Training webinar
Pride Cluster 062016 Update
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
ProteomeXchange update
Proteomics data standards
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
Ad

More from Juan Antonio Vizcaino (16)

PPTX
Introduction to the PSI standard data formats
PDF
Reuse of public proteomics data
PDF
PRIDE resources and ProteomeXchange
PDF
Proteomics repositories
PDF
Introduction to the Proteomics Bioinformatics Course 2018
PDF
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
PPTX
PSI-Proteome Informatics update
PDF
The ELIXIR Proteomics community
PDF
The ELIXIR Proteomics Community
PPTX
Proteomics repositories
PPTX
Introduction to the Proteomics Bioinformatics Course 2017
PPTX
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
PPTX
Enabling automated processing and analysis of large-scale proteomics data
PPTX
Introduction to EBI for Proteomics in ELIXIR
PPTX
The Proteomics Standards Initiative (PSI)
PPTX
Introduction to the Proteomics Bioinformatics Course 2016
Introduction to the PSI standard data formats
Reuse of public proteomics data
PRIDE resources and ProteomeXchange
Proteomics repositories
Introduction to the Proteomics Bioinformatics Course 2018
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
PSI-Proteome Informatics update
The ELIXIR Proteomics community
The ELIXIR Proteomics Community
Proteomics repositories
Introduction to the Proteomics Bioinformatics Course 2017
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
Enabling automated processing and analysis of large-scale proteomics data
Introduction to EBI for Proteomics in ELIXIR
The Proteomics Standards Initiative (PSI)
Introduction to the Proteomics Bioinformatics Course 2016

Recently uploaded (20)

PPTX
INTRODUCTION TO EVS | Concept of sustainability
PPTX
2Systematics of Living Organisms t-.pptx
PDF
Placing the Near-Earth Object Impact Probability in Context
PPTX
BIOMOLECULES PPT........................
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PPTX
2. Earth - The Living Planet earth and life
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PPTX
Cell Membrane: Structure, Composition & Functions
PDF
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud
INTRODUCTION TO EVS | Concept of sustainability
2Systematics of Living Organisms t-.pptx
Placing the Near-Earth Object Impact Probability in Context
BIOMOLECULES PPT........................
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
The KM-GBF monitoring framework – status & key messages.pptx
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Introduction to Fisheries Biotechnology_Lesson 1.pptx
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
ECG_Course_Presentation د.محمد صقران ppt
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
2. Earth - The Living Planet earth and life
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
Cell Membrane: Structure, Composition & Functions
Formation of Supersonic Turbulence in the Primordial Star-forming Cloud

An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data

  • 1. An overview of the PRIDE ecosystem of resources and computational tools for mass spectrometry proteomics data Dr. Juan Antonio Vizcaíno EMBL-European Bioinformatics Institute Hinxton, Cambridge, UK
  • 2. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 3. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 What is a proteomics publication in 2016? • Proteomics studies generate potentially large amounts of data and results. • Ideally, a proteomics publication needs to: • Summarize the results of the study • Provide supporting information for reliability of any results reported • Information in a publication: • Manuscript • Supplementary material • Associated data submitted to a public repository
  • 4. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 • PRIDE stores mass spectrometry (MS)-based proteomics data: • Peptide and protein expression data (identification and quantification) • Post-translational modifications • Mass spectra (raw data and peak lists) • Technical and biological metadata • Any other related information • Full support for tandem MS approaches PRIDE (PRoteomics IDEntifications) Archive http://www.ebi.ac.uk/pride/archive Martens et al., Proteomics, 2005 Vizcaíno et al., NAR, 2016
  • 5. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeXchange: A Global, distributed proteomics database PASSEL (SRM data) PRIDE (MS/MS data) MassIVE (MS/MS data) Raw ID/Q Meta jPOST (MS/MS data) Mandatory raw data deposition since July 2015 • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. http://www.proteomexchange.org New in 2016 Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 6. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals Peptide Atlas Receiving repositories PRIDE Researcher’s results Raw data Metadata PASSEL Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS SRM data Reprocessed results MassIVE ProteomeXchange data workflow Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 7. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral: Centralised portal for all PX datasets http://proteomecentral.proteomexchange.org/cgi/GetDataset
  • 8. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals Peptide Atlas Receiving repositories PRIDE Researcher’s results Raw data Metadata PASSEL Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS SRM data Reprocessed results MassIVE ProteomeXchange data workflow Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 9. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 ProteomeCentral Metadata / Manuscript Raw Data Results Journals UniProt/ neXtProtPeptide Atlas Other DBs Receiving repositories PRIDE GPMDBResearcher’s results Raw data Metadata PASSEL proteomicsDB Research groups Reanalysis of datasets MassIVE jPOST MS/MS data (as complete submissions) Any other workflow (mainly partial submissions) DATASETS OmicsDI Integration with other omics datasets SRM data Reprocessed results MassIVE ProteomeXchange data workflow Vizcaíno et al., Nat Biotechnol, 2014 Deustch et al., NAR, 2017, in press
  • 10. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 PRIDE: Source of MS proteomics data • PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas. http://www.ebi.ac.uk/pride/archive
  • 11. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 PRIDE Archive – over 4,500 datasets from over 51 countries and 1,700 groups • USA – 814 datasets • Germany – 528 • UK – 338 • China – 328 • France – 222 • Netherlands – 175 • Canada - 137 Data volume: • Total: ~275 TB • Number of all files: ~560,000 • PXD000320-324: ~ 4 TB • PXD002319-26 ~2.4 TB • PXD001471 ~1.6 TB • 1,973 datasets i.e. 52% of all are publicly accessible • ~90% of all ProteomeXchange datasets YearSubmissions All submissions Complete PRIDE Archive growth In the last 12 months: ~165 submitted datasets per month Top Species studied by at least 100 datasets: 2,010 Homo sapiens 604 Mus musculus 191 Saccharomyces cerevisiae 140 Arabidopsis thaliana 127 Rattus norvegicus >900 reported taxa in total
  • 12. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 13. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 PRIDE Components: Data Submission Process PRIDE Converter 2 PRIDE Inspector PX Submission Tool mzIdentML PRIDE XML In addition to PRIDE Archive, the PRIDE team develops and maintains different tools and software libraries to facilitate the handling and visualisation of MS proteomics data and the submission process
  • 14. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 PRIDE Inspector Toolsuite Wang et al., Nat. Biotechnology, 2012 Perez-Riverol et al., Bioinformatics, 2015 Perez-Riverol et al., MCP, 2016 • PRIDE Inspector - standalone tool to enable visualisation and validation of MS data. • Build on top of ms-data-core-api - open source algorithms and libraries for computational proteomics. • Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML. • Broad functionality. https://github.com/PRIDE-Utilities/ms-data-core-api https://github.com/PRIDE-Toolsuite/pride-inspector Summary and QC charts Peptide spectra annotation and visualization
  • 15. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 PX Submission Tool  Desktop application for data submissions to ProteomeXchange via PRIDE • Implemented in Java 7 • Streamlines the submission process • Capture mappings between files • Retain metadata • Fast file transfer with Aspera (FASP® transfer technology) – FTP also available • Command line option Submission tool screenshot
  • 16. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 17. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Datasets are being reused more and more…. Vaudel et al., Proteomics, 2016 Data download volume for PRIDE Archive in 2015: 198 TB 0 50 100 150 200 250 2013 2014 2015 2016 Downloads in TBs
  • 18. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Data sharing in Proteomics Vaudel et al., Proteomics, 2016
  • 19. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Draft Human proteome papers published in 2014 Wilhelm et al., Nature, 2014 Kim et al., Nature, 2014 •Two independent groups claimed to have produced the first complete draft of the human proteome by MS. • Some of their findings are controversial and need further validation… but generated a lot of discussion and put proteomics in the spotlight. •They used many different tissues. Nature cover 29 May 2014
  • 20. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Draft Human proteome papers published in 2014 Wilhelm et al., Nature, 2014 •Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE. •They complement that data with “exotic” tissues.
  • 21. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Data sharing in Proteomics Vaudel et al., Proteomics, 2016
  • 22. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Examples of repurposing in proteogenomics
  • 23. Juan A. Vizcaíno juan@ebi.ac.uk Mass Spectrometry and Proteomics Congress 2016 London, 15 November 2016 Public datasets from different omics: OmicsDI http://www.ebi.ac.uk/Tools/omicsdi/ • Aims to integrate of ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present). PRIDE MassIVE jPOST PASSEL GPMDB ArrayExpress Expression Atlas MetaboLights Metabolomics Workbench GNPS EGA Perez-Riverol et al., Nat Biotechnol, in press
  • 24. OmicsDI: Portal for omics datasets
  • 25. OmicsDI: Portal for omics datasets
  • 26. Overview • PRIDE Archive and ProteomeXchange • PRIDE tools • Reuse of public proteomics data • PRIDE added-value resources: PRIDE Cluster and PRIDE Proteomes
  • 27. Added-value resources: PRIDE Cluster and PRIDE Proteomes • A condensed, cross-dataset, QC-filtered view of PRIDE data. • PRIDE Cluster: peptide-centric. • PRIDE Proteomes: protein-centric (identification data).
  • 28. Data sharing in Proteomics Vaudel et al., Proteomics, 2016
  • 29. PRIDE Cluster • Provides an aggregated, peptide-centric view of PRIDE Archive. • Hypothesis: the same peptide will generate similar MS/MS spectra across experiments. • New version of the spectral clustering algorithm to reliably group spectra coming from the same peptide. • Enables QC of peptide-spectrum matches (PSMs): reliable identifications are inferred by comparing the submitted identifications of the spectra within a cluster. • After clustering, a representative spectrum is built for all peptides consistently identified across different datasets. • Used to build spectral libraries (for 16 species). Griss et al., Nat. Methods, 2013; Griss et al., Nat. Methods, 2016
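The hypothesis on the PRIDE Cluster slide (the same peptide gives similar MS/MS spectra across experiments) can be illustrated with a normalized dot product between two binned spectra. This is a minimal sketch and not the PRIDE Cluster algorithm itself: the function names, the 1.0 m/z bin width and the example peaks are all illustrative assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    `peaks` is a list of (m/z, intensity) pairs; `bin_width` is an assumed value.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product of two binned spectra (0.0 = no shared bins)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: two similar spectra and one unrelated spectrum.
spectrum_1 = [(175.1, 30.0), (262.2, 100.0), (375.3, 55.0)]
spectrum_2 = [(175.2, 28.0), (262.1, 95.0), (375.2, 60.0)]
unrelated = [(120.5, 80.0), (301.7, 40.0)]

print(cosine_similarity(spectrum_1, spectrum_2))
print(cosine_similarity(spectrum_1, unrelated))
```

Spectra whose similarity exceeds a chosen threshold would be candidates for the same cluster; the threshold itself is a tuning decision that the slide does not specify.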
  • 30. Example: one perfect cluster • 880 PSMs give the same peptide ID • 4 species • 28 datasets • Same instruments http://wwwdev.ebi.ac.uk/pride/cluster/
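A "perfect" cluster like the one on this slide is one in which every PSM reports the same peptide. The sketch below shows one simple way such agreement could be quantified, assuming a majority-fraction measure; the function name and the peptide sequences are hypothetical, and the slides do not state that PRIDE Cluster computes its QC in exactly this way.

```python
from collections import Counter


def cluster_consensus(peptide_ids):
    """Return (most frequent peptide, fraction of PSMs that agree with it)."""
    if not peptide_ids:
        raise ValueError("cluster has no PSMs")
    counts = Counter(peptide_ids)
    top_peptide, top_count = counts.most_common(1)[0]
    return top_peptide, top_count / len(peptide_ids)


# Hypothetical clusters: 880 agreeing PSMs, and a small mixed cluster.
perfect_cluster = ["PEPTIDEK"] * 880
mixed_cluster = ["PEPTIDEK"] * 8 + ["PEPTLDEK"] * 2

print(cluster_consensus(perfect_cluster))
print(cluster_consensus(mixed_cluster))
```

A cluster with a fraction near 1.0 supports its majority identification, while minority identifications in a large cluster are natural candidates for review.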
  • 31. PRIDE Proteomes web interface: identification info. [Screenshot labels: unique/shared peptides; mass spec-based sequence coverage; PTMs detected; observed tissues; biological vs. sample-prep PTMs.] http://wwwdev.ebi.ac.uk/pride/proteomes/
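The "mass spec-based sequence coverage" shown in the interface is, in general terms, the fraction of a protein's residues covered by at least one identified peptide. The following is a minimal sketch of that calculation under the assumption of exact substring matching; the function name and sequences are hypothetical, and it is not taken from the PRIDE Proteomes code.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one peptide.

    Peptides are matched as exact substrings; every occurrence is counted.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Hypothetical 10-residue protein and two overlapping peptides.
protein = "MKTAYIAKQR"
peptides = ["TAYIAK", "AKQR"]

print(sequence_coverage(protein, peptides))
```

Overlapping peptides are counted once per residue, so coverage never exceeds 1.0.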
  • 32. Conclusions • PRIDE Archive and ProteomeXchange have become the standard platform for public data deposition in proteomics. • PRIDE Inspector: support for data standards. • PX submission tool. • Reuse of public proteomics data is increasing: many opportunities for data miners. • OmicsDI: new platform to identify public datasets coming from different omics technologies (more possibilities for data reuse!). • PRIDE Cluster and PRIDE Proteomes.
  • 33. Acknowledgements: People. Attila Csordas, Tobias Ternent, Gerhard Mayer (de.NBI), Johannes Griss, Yasset Perez-Riverol, Manuel Bernal-Llinares, Andrew Jarnuczak, Enrique Perez. Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob. Acknowledgements: The PRIDE Team. All data submitters! @pride_ebi @proteomexchange
  • 34. Questions? http://www.slideshare.net/JuanAntonioVizcaino