PRIDE and ProteomeXchange – Making 
proteomics data accessible and reusable 
Dr. Yasset Perez-Riverol 
Twitter: @ypriverol 
Github: ypriverol 
Bioinformatician - PRIDE Group 
Proteomics Services Team 
EMBL-EBI 
Hinxton, Cambridge, UK
Proteomics Services, EBI-EMBL 
• UniProt: Protein Sequences 
• IntAct: Interactions 
• PRIDE: MS/MS Data 
• Reactome: Pathways 
• BioModels 
Yasset Perez-Riverol 
yperez@ebi.ac.uk 
BioHackathon 2014 
Miyagi, Japan (Nov 9-14, 2014)
Overview 
• The ProteomeXchange (PX) consortium 
• PRIDE and ProteomeXchange 
• PRIDE Components. 
• Current and future developments.
ProteomeXchange Consortium 
• Goal: Development of a framework to allow 
standard data submission and dissemination 
pipelines between the main existing proteomics 
repositories. 
• Includes PeptideAtlas (ISB, Seattle), PRIDE 
(Cambridge, UK) and MassIVE (UCSD, San Diego). 
• Common identifier space (PXD identifiers) 
• Two supported data workflows: MS/MS and SRM. 
• Main objective: Make data available and 
reusable. 
http://www.proteomexchange.org
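The common identifier space can be illustrated with a small check. This is a minimal sketch, assuming a PXD accession is the prefix "PXD" followed by six digits, which matches the accessions quoted in these slides (for example PXD000561) but is not taken from a formal specification. The function names are hypothetical.

```python
import re
from typing import List

# Assumption: "PXD" followed by exactly six digits, as in PXD000561.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_identifier(text: str) -> bool:
    """Return True if text is exactly one PXD-style accession."""
    return PXD_PATTERN.fullmatch(text) is not None


def find_pxd_identifiers(text: str) -> List[str]:
    """Return all PXD-style accessions found in free text, in order."""
    return re.findall(r"\bPXD\d{6}\b", text)


if __name__ == "__main__":
    print(is_pxd_identifier("PXD000561"))
    print(find_pxd_identifiers("Datasets PXD000561 and PXD000865 are public."))
```

A check like this is useful for pulling dataset accessions out of manuscripts or spreadsheets before looking them up in ProteomeCentral.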
ProteomeXchange data workflow 
[Diagram; labels recoverable from the figure:] 
• Submitted data: researcher’s results, raw data*, metadata / manuscript 
• Receiving repositories: PRIDE (MS/MS data), PASSEL (SRM data), MassIVE (MS/MS data) 
• ProteomeCentral 
• Downstream users: journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs 
• Reprocessed results 
Vizcaíno et al., Nat Biotechnol, 2014
MassIVE (UCSD) 
http://proteomics.ucsd.edu/service/massive/ 
• Joined ProteomeXchange in June 2014
PASSEL: repository for SRM data 
http://www.peptideatlas.org/passel/ 
• Suitable for SRM assays 
• Part of the PeptideAtlas set of resources. 
Farrah et al., Proteomics, 2012
PRIDE: PRoteomics IDEntifications database 
http://www.ebi.ac.uk/pride/archive/ 
Vizcaíno et al., Nucleic Acids Research, 2014
PX Submission workflow for MS/MS data 
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML). 
2. Result files: 
   a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard. 
   b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form. 
3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, and submitter, based on ontologies and controlled vocabularies. 
4. Other files (optional): 
   a. QUANT: Quantification-related results 
   b. PEAK: Peak list files 
   c. OTHER: Any other file type 
   e. FASTA 
Ternent et al., Proteomics, 2014
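The distinction between complete and partial submissions can be sketched as a simple rule. This is an illustration only: the format labels and the function name are hypothetical, not the vocabulary or logic of the PX submission tool, and it assumes a submission counts as complete only when every result file is in PRIDE XML or mzIdentML.

```python
from typing import Iterable

# Hypothetical labels for the two standard result formats named on the slide.
STANDARD_RESULT_FORMATS = {"PRIDE XML", "mzIdentML"}


def submission_type(result_formats: Iterable[str]) -> str:
    """Classify a submission from the formats of its result files."""
    formats = set(result_formats)
    if not formats:
        raise ValueError("a submission needs at least one result file")
    # Complete: all result files are in a supported standard format.
    # Partial: at least one file is kept in its original search engine format.
    return "complete" if formats <= STANDARD_RESULT_FORMATS else "partial"


if __name__ == "__main__":
    print(submission_type(["mzIdentML"]))
    print(submission_type(["search engine output"]))
```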
Complete submissions using mzIdentML 
An increasing number of tools support export to mzIdentML 1.1: 
- Mascot 
- MS-GF+ 
- MyriMatch and related tools from D. Tabb’s lab 
- OpenMS 
- PEAKS 
- ProCon (ProteomeDiscoverer, Sequest) 
- Scaffold 
- TPP via the idConvert tool (ProteoWizard) 
- ProteinPilot (planned by the end of 2014) 
- Others: library for X!Tandem conversion, lab-internal pipelines, … 
- Referenced spectral files need to be submitted as well (all open formats are supported). 
Updated list: 
http://www.psidev.info/tools-implementing-mzIdentML
mzTab 
• Metadata (key-value pairs): basic information about experiment and sample 
• Protein (table-based): basic information about protein identifications 
• Peptide (table-based): information about quantified peptides 
• PSM (table-based): information about identified spectra 
• Small Molecule (table-based): basic information about identified small molecules 
http://mztab.googlecode.com 
J. Griss et al., MCP, 2014
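The section layout above maps onto the three-letter line prefixes that mzTab uses (MTD for metadata, PRH/PRT for the protein header and rows, and likewise PEH/PEP, PSH/PSM, SMH/SML). The following is a minimal sketch of splitting a tab-separated mzTab text into those sections. It does no validation, the function name is hypothetical, and the input is a toy example rather than a real dataset.

```python
from collections import defaultdict
from typing import Dict, List

# mzTab line prefixes mapped to the five sections listed on the slide.
SECTION_OF_PREFIX = {
    "MTD": "metadata",
    "PRH": "protein", "PRT": "protein",
    "PEH": "peptide", "PEP": "peptide",
    "PSH": "psm", "PSM": "psm",
    "SMH": "small_molecule", "SML": "small_molecule",
}


def split_mztab_sections(text: str) -> Dict[str, List[List[str]]]:
    """Group the tab-separated fields of each line by section."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sections
        fields = line.split("\t")
        section = SECTION_OF_PREFIX.get(fields[0])
        if section is not None:  # comment (COM) and unknown lines are skipped
            sections[section].append(fields[1:])
    return dict(sections)


# Toy input, not taken from a real submission.
example = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tToy example",
    "",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
])
sections = split_mztab_sections(example)

if __name__ == "__main__":
    for name, rows in sections.items():
        print(name, rows)
```

For table-based sections the first row in each list is the header (from the PRH, PEH, PSH or SMH line) and the remaining rows are data.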
PRIDE Components: Submission Process 
• PRIDE Converter 
• PRIDE Inspector 
• PX Submission Tool
PRIDE Components: PX submission tool 
• Capture the mappings between the different types of files. 
• Add the mandatory metadata annotation. 
• Make the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP). 
• Command line alternative: some scripting is needed. 
http://www.proteomexchange.org/submission
PRIDE Inspector 2.0 
• Available for complete submissions 
PRIDE Inspector 2.0 supports: 
- PRIDE XML 
- mzIdentML + all types of spectra files 
- mzML 
- mzTab Quantitation (work in progress) 
https://github.com/PRIDE-Toolsuite/ 
Wang et al., Nat. Biotechnology, 2012
PRIDE Components: Pipelines and Visualization 
Submission validation pipeline 
• QC of files submitted. 
• Metadata check. 
Submission pipeline 
• Add project to database (file locations, general statistics, metadata) 
Publication pipeline 
• Conversion of files to mzTab 
• Conversion of spectra peaks to MGF 
• Index the information in a Solr server
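To make the "spectra peaks to MGF" step concrete, here is a minimal sketch of writing one spectrum in Mascot Generic Format. It is not the PRIDE pipeline's converter. The function name and the example values are hypothetical, only positive charge states are handled, and just the common TITLE, PEPMASS and CHARGE fields are written.

```python
from typing import Iterable, Tuple


def format_mgf_spectrum(title: str, precursor_mz: float, charge: int,
                        peaks: Iterable[Tuple[float, float]]) -> str:
    """Return one MGF record: header fields, then one 'm/z intensity' pair per line."""
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz}",
        f"CHARGE={charge}+",  # positive ion mode assumed
    ]
    lines.extend(f"{mz} {intensity}" for mz, intensity in peaks)
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up peak list for illustration.
    print(format_mgf_spectrum("scan=1", 500.25, 2, [(100.1, 10.0), (200.2, 20.5)]))
```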
PRIDE Components: Services & Web components
ProteomeCentral: Portal for all PX datasets 
http://proteomecentral.proteomexchange.org/cgi/GetDataset
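A link to a single dataset page can be built from the portal address. This is a sketch under one assumption: that the accession is passed as an `ID` query parameter, which the slide itself does not state. The function name is hypothetical and no network request is made.

```python
from urllib.parse import urlencode

# Portal address as given on the slide.
PROTEOMECENTRAL_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession: str) -> str:
    """Build a dataset link; the 'ID' parameter name is an assumption."""
    return f"{PROTEOMECENTRAL_URL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD000561"))
```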
ProteomeXchange: 1329 datasets up until October 2014 
Origin: 
271 USA 
166 Germany 
115 United Kingdom 
73 Switzerland 
70 China 
68 Netherlands 
67 France 
55 Canada 
44 Spain 
42 Belgium 
33 Sweden 
31 Australia 
31 Denmark 
31 Japan 
20 India 
20 Norway 
19 Taiwan 
17 Ireland 
16 Austria 
14 Finland 
14 Italy 
12 Republic of Korea 
11 Brazil 
9 Russia 
8 Israel 
7 Singapore … 
Type: 
437 PRIDE complete 
792 PRIDE partial 
63 PeptideAtlas/PASSEL complete 
14 MassIVE 
23 reprocessed 
Publicly Accessible: 
691 datasets, 52% of all 
86% PRIDE 
12% PASSEL 
2% MassIVE 
Top Species studied by at least 10 
datasets: 
577 Homo sapiens 
165 Mus musculus 
56 Saccharomyces cerevisiae 
53 Arabidopsis thaliana 
29 Rattus norvegicus 
22 Escherichia coli 
17 Bos taurus 
16 Mycobacterium tuberculosis 
13 Oryza sativa 
13 Drosophila melanogaster 
13 Glycine max 
~ 290 species in total 
Data volume: 
Total: ~55 TB 
Number of all files: ~131,000 
PXD000320-324: ~ 5 TB 
PXD000065: ~ 1.4TB 
Datasets/year: 
2012: 102 
2013: 527 
2014: 700
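The figures on this slide are internally consistent, which a short calculation can confirm. The sketch below uses only numbers quoted above; the variable names are illustrative.

```python
# Counts copied from the slide.
datasets_by_type = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
datasets_by_year = {2012: 102, 2013: 527, 2014: 700}
public_datasets = 691

total_by_type = sum(datasets_by_type.values())
total_by_year = sum(datasets_by_year.values())
# Share of datasets that are publicly accessible, rounded to a whole percent.
public_share = round(100 * public_datasets / total_by_type)

if __name__ == "__main__":
    print(total_by_type, total_by_year, public_share)
```

Both breakdowns add up to the 1329 datasets in the slide title, and 691 of 1329 rounds to the 52% reported as publicly accessible.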
Journals and Data Deposition 
[Chart: number of submissions per journal]
Data Access? 
[Chart: total numbers]
Future developments 
• Make the data reusable. 
• Integration of different protein expression resources: 
   • PRIDE 
   • PeptideAtlas 
   • ProteomicsDB 
   • Human Proteome Map 

PXD Identifier   Hits     Dataset title 
PXD000561        153512   A draft map of the human proteome 
PXD000865        51639    Mass spectrometry based draft of the human proteome
Web Services: PROXI 
[Diagram; labels recoverable from the figure: PROXI clients, repositories & databases, registry, data] 
Perez-Riverol Y, Proteomics, 2014
Conclusions 
• ProteomeXchange is widely used. 
• PRIDE contains most of the MS/MS datasets. 
• It now has a new consortium member: MassIVE (UCSD). 
• Around half of the datasets are already public. 
• Different open-source tools are available to facilitate the process. 
• File transfer speed should not be a problem (Aspera support). 
• Data deposition enables and promotes data reuse. 
• ProteomeXchange is open to new members.
Acknowledgements 
PRIDE Team 
Juan A. Vizcaíno (Group Leader) 
Attila Csordas 
Rui Wang 
Florian Reisinger 
Jose A. Dianes 
Tobias Ternent 
Yasset Perez-Riverol 
Noemi del Toro 
Henning Hermjakob 
PeptideAtlas Team (ISB, Seattle) 
Eric Deutsch 
Terry Farrah 
Zhi Sun 
MassIVE 
Nuno Bandeira 
And many other PX partners and stakeholders
Questions?
