ProteomeXchange: data 
deposition and data retrieval made 
easy 
Juan Antonio VIZCAINO, Ph.D. 
PRIDE Group coordinator 
Proteomics Services Group 
European Bioinformatics Institute 
Hinxton, Cambridge 
United Kingdom 
juan@ebi.ac.uk
Overview 
• The ProteomeXchange (PX) consortium 
• Highlights in the last year 
• PRIME-XS datasets
ProteomeXchange Consortium 
• Goal: Development of a framework to allow 
standard data submission and dissemination 
pipelines between the main existing proteomics 
repositories. 
• Includes PeptideAtlas (ISB, Seattle), PRIDE 
(Cambridge, UK) and (very recently) MassIVE 
(UCSD, San Diego). 
• Common identifier space (PXD identifiers) 
• Two supported data workflows: MS/MS and SRM. 
• Main objective: Make life easier for researchers 
http://www.proteomexchange.org
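The common identifier space can be illustrated with a small check. This is a minimal sketch, assuming that a PXD identifier is the prefix "PXD" followed by six digits, as in the identifiers quoted in this talk (for example PXD000764). The function names are hypothetical and not part of any ProteomeXchange tool.

```python
import re
from typing import List

# Assumption: "PXD" followed by exactly six digits, as in PXD000764.
PXD_FULL = re.compile(r"PXD\d{6}")
PXD_IN_TEXT = re.compile(r"\bPXD\d{6}\b")


def is_pxd_identifier(text: str) -> bool:
    """Return True if the whole string looks like a PXD identifier."""
    return PXD_FULL.fullmatch(text) is not None


def find_pxd_identifiers(text: str) -> List[str]:
    """Collect every PXD-style identifier mentioned in free text."""
    return PXD_IN_TEXT.findall(text)
```

A check like this is handy when scanning a manuscript for dataset accessions before looking them up in ProteomeCentral.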
ProteomeXchange data workflow 
ProteomeCentral 
Results 
Raw Data* 
Metadata / 
Manuscript 
PRIDE 
(MS/MS data) 
Journals 
UniProt/ 
neXtProt 
Peptide Atlas 
Other DBs 
Receiving repositories 
PASSEL 
(SRM data) 
Other DBs 
GPMDB 
Researcher’s results 
Reprocessed results 
Raw data* 
Metadata 
MassIVE 
(MS/MS data) 
Vizcaíno et al., Nat Biotechnol, 2014
MassIVE (UCSD) 
http://proteomics.ucsd.edu/service/massive/ 
• Joined ProteomeXchange in June 2014 
• Only partial submissions. A few datasets so far.
Overview 
• The ProteomeXchange (PX) consortium 
• Highlights in the last year 
• PRIME-XS datasets
PX Data workflow for MS/MS data 
1. Mass spectrometer output files: raw data (binary files) or 
peak list spectra in a standardized format (mzML, mzXML). 
2. Result files: 
a. Complete submissions: Result files can be converted to 
PRIDE XML or the mzIdentML data standard. 
b. Partial submissions: For workflows not yet supported by 
PRIDE, search engine output files will be stored and 
provided in their original form. 
3. Metadata: Sufficiently detailed description of sample origin, 
workflow, instrumentation, submitter. 
4. Other files: Optional files: 
a. QUANT: Quantification related results 
b. PEAK: Peak list files 
c. GEL: Gel images 
d. OTHER: Any other file type 
e. FASTA 
f. SP_LIBRARY 
Published 
Raw 
Files 
Other 
files
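The distinction between complete and partial submissions can be sketched as a small data model. This is an illustration under stated assumptions, not the PX submission tool's own logic: the optional labels come from the list above and 'RESULT' appears later in the talk, while "RAW" and "SEARCH" are labels assumed here for mass spectrometer output and original search engine output.

```python
from dataclasses import dataclass, field
from typing import List

# Optional file types listed on the slide.
OPTIONAL_TYPES = {"QUANT", "PEAK", "GEL", "OTHER", "FASTA", "SP_LIBRARY"}
# Assumed labels: RAW (instrument output), RESULT (PRIDE XML or mzIdentML),
# SEARCH (search engine output kept in its original form).
CORE_TYPES = {"RAW", "RESULT", "SEARCH"}


@dataclass
class SubmissionFile:
    name: str
    file_type: str


@dataclass
class Submission:
    files: List[SubmissionFile] = field(default_factory=list)

    def kind(self) -> str:
        """Classify the submission as 'complete' or 'partial'."""
        types = {f.file_type for f in self.files}
        unknown = types - CORE_TYPES - OPTIONAL_TYPES
        if unknown:
            raise ValueError(f"unknown file types: {sorted(unknown)}")
        if "RESULT" in types:
            return "complete"  # results are in a supported format
        if "SEARCH" in types:
            return "partial"   # results are stored as provided
        raise ValueError("no result or search engine output files")
```

For example, a submission holding a raw file and an mzIdentML 'RESULT' file is classified as complete, while one holding a raw file and an unsupported search engine output is classified as partial.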
Complete vs Partial submissions: 
processed results 
For complete submissions, the spectra can be connected to the processed 
identification results, and both can be visualized. 
PRIDE XML, mzIdentML supported 
mzTab to come 
Complete Partial
Complete vs Partial submissions: 
experimental metadata 
Complete Partial 
General experimental metadata about the projects is similar. 
However, assay-level information is less annotated in partial submissions.
Complete submissions using 
mzIdentML 
Search Engine 
Results + MS 
files 
Search 
engines 
mzIdentML 
An increasing number of tools support export to mzIdentML 1.1 
- Mascot 
- MSGF+ 
- Myrimatch and related tools from D. Tabb’s lab 
- OpenMS 
- PEAKS 
- ProCon (ProteomeDiscoverer, Sequest) 
- Scaffold 
- TPP via the idConvert tool (ProteoWizard) 
- ProteinPilot (planned by the end of 2014) 
- Others: library for X!Tandem conversion, lab 
internal pipelines, … 
- Referenced spectral files need to be submitted as well 
(all open formats are supported). 
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
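Because the referenced spectral files must accompany an mzIdentML file, it can help to list them before submitting. The sketch below assumes the mzIdentML 1.1 layout in which each SpectraData element carries a location attribute. The helper name and the sample document are hypothetical, and the sample is cut down for illustration rather than schema-valid.

```python
import xml.etree.ElementTree as ET
from typing import List


def referenced_spectra_files(mzid_xml: str) -> List[str]:
    """Return the 'location' of every SpectraData element in the document."""
    root = ET.fromstring(mzid_xml)
    locations = []
    for element in root.iter():
        # Drop the "{namespace}" prefix so any schema version matches.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "SpectraData" and "location" in element.attrib:
            locations.append(element.attrib["location"])
    return locations


# Hypothetical, heavily simplified mzIdentML fragment.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run01.mgf"/>
      <SpectraData id="SD_2" location="file:///data/run02.mgf"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

spectra = referenced_spectra_files(SAMPLE)
```

Matching on the local tag name rather than a fixed namespace is a design choice that keeps the sketch usable if the namespace URI differs between schema versions.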
Tools ‘RESULT’ file generation Final ‘RESULT’ file 
mzIdentML 
‘RESULT’ 
Now: native file export 
Spectra 
files 
Mascot 
ProteinPilot 
Scaffold 
PEAKS 
MSGF+ 
Others 
Native File export
Original data files ‘RESULT’ file generation Final ‘RESULT’ file 
Search 
output 
files 
Spectra 
files 
PRIDE 
XML 
‘RESULT’ 
Before: file conversion using PRIDE 
Converter 
File conversion 
PRIDE 
Converter
PRIDE Inspector 2 
Wang et al., Nat. Biotechnology, 2012 
PRIDE Inspector 2.0 
PRIDE Inspector 2.0 supports: 
- PRIDE XML 
- mzIdentML + all types of spectra files 
- mzML 
- mzTab (work in progress) 
http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector
PX submission tool: data submission 
Published 
Raw 
Other 
files 
http://www.proteomexchange.org/submission 
PX 
submission 
tool 
• Capture the mappings between the different types of files. 
• Add the mandatory metadata annotation. 
• Make the file upload process straightforward to the submitter (It transfers all the 
files using Aspera or FTP). 
• Command line alternative: some scripting is needed.
Uploading large datasets: Aspera 
- Aspera is the default file transfer protocol to PRIDE: 
- PX Submission tool 
- Command line 
- Up to 50X faster than FTP 
File transfer speed should 
not be a problem!!
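To put "up to 50X faster" in perspective, the arithmetic can be sketched as follows. The FTP throughput used here is a purely hypothetical assumption, and the factor of 50 is the stated upper bound rather than a guaranteed rate. The 1.4 TB figure is the size this talk reports for PXD000065.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_per_s megabytes/s."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / rate_mb_per_s / 3600


ASSUMED_FTP_RATE_MB_S = 10.0  # hypothetical baseline, not a measured value
BEST_CASE_SPEEDUP = 50        # "up to 50X", so an upper bound

ftp_hours = transfer_hours(1.4, ASSUMED_FTP_RATE_MB_S)
aspera_hours = transfer_hours(1.4, ASSUMED_FTP_RATE_MB_S * BEST_CASE_SPEEDUP)

print(f"FTP: {ftp_hours:.1f} h, Aspera (best case): {aspera_hours:.1f} h")
```

Under these assumptions a transfer that would take well over a day by FTP fits within an hour in the best case. Real rates depend on the network at both ends.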
Tutorial manuscript detailing 
the process 
Example dataset: 
PXD000764 
- Title: “Discovery of new CSF biomarkers for meningitis in children” 
- 12 runs: 4 controls and 8 infected samples 
- Identification and quantification data 
http://www.proteomexchange.org/submission 
Ternent et al., Proteomics, 2014
ProteomeXchange: 1329 datasets up until October 2014 
Origin: 
271 USA 
166 Germany 
115 United Kingdom 
73 Switzerland 
70 China 
68 Netherlands 
67 France 
55 Canada 
44 Spain 
42 Belgium 
33 Sweden 
31 Australia 
31 Denmark 
31 Japan 
20 India 
20 Norway 
19 Taiwan 
17 Ireland 
16 Austria 
14 Finland 
14 Italy 
12 Republic of Korea 
11 Brazil 
9 Russia 
8 Israel 
7 Singapore … 
Type: 
437 PRIDE complete 
792 PRIDE partial 
63 PeptideAtlas/PASSEL complete 
14 MassIVE 
23 reprocessed 
Publicly Accessible: 
691 datasets, 52% of all 
86% PRIDE 
12% PASSEL 
2% MassIVE 
Top Species studied by at least 10 
datasets: 
577 Homo sapiens 
165 Mus musculus 
56 Saccharomyces cerevisiae 
53 Arabidopsis thaliana 
29 Rattus norvegicus 
22 Escherichia coli 
17 Bos taurus 
16 Mycobacterium tuberculosis 
13 Oryza sativa 
13 Drosophila melanogaster 
13 Glycine max 
~ 290 species in total 
Data volume: 
Total: ~55 TB 
Number of all files: ~131,000 
PXD000320-324: ~ 5 TB 
PXD000065: ~ 1.4TB 
Datasets/year: 
2012: 102 
2013: 527 
2014: 700
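The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the numbers quoted above and introduces no new data.

```python
# Dataset counts by type, as listed on the slide.
by_type = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
# Dataset counts by year, as listed on the slide.
per_year = {2012: 102, 2013: 527, 2014: 700}

total = sum(by_type.values())
public = 691
public_share = round(100 * public / total)

print(f"total by type: {total}, total by year: {sum(per_year.values())}")
print(f"publicly accessible: {public_share}% of all datasets")
```

Both breakdowns add up to the 1329 datasets in the slide title, and 691 public datasets correspond to the quoted 52%.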
Overview 
• The ProteomeXchange (PX) consortium 
• Highlights in the last year 
• PRIME-XS datasets
PX submission tool: PRIME-XS tags 
37 Datasets in total (both public and 
private at present): 
- 20 from the Netherlands 
- 4 from UK 
- 2 each from Austria, Belgium, Denmark, 
Spain and Switzerland 
- 1 each from France and the USA.
PRIME-XS datasets are now tagged in PRIDE 
PRIME-XS datasets are now tagged and can be browsed as a group 
http://www.ebi.ac.uk/pride/archive/simpleSearch?q=prime-xs
ProteomeCentral: Portal for all PX 
datasets 
http://proteomecentral.proteomexchange.org/cgi/GetDataset
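Links to the portal and to the PRIDE search can be built programmatically. This is a minimal sketch: the base URLs and the q parameter are taken from the slides, while the ID query parameter for GetDataset is an assumption here and should be checked against the live service. The function names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

PROTEOMECENTRAL_GETDATASET = (
    "http://proteomecentral.proteomexchange.org/cgi/GetDataset"
)
PRIDE_SIMPLE_SEARCH = "http://www.ebi.ac.uk/pride/archive/simpleSearch"


def dataset_url(pxd_id: str) -> str:
    """Build a ProteomeCentral link; the 'ID' parameter is an assumption."""
    return f"{PROTEOMECENTRAL_GETDATASET}?{urlencode({'ID': pxd_id})}"


def pride_search_url(query: str) -> str:
    """Build a PRIDE Archive simple-search link using the 'q' parameter."""
    return f"{PRIDE_SIMPLE_SEARCH}?{urlencode({'q': query})}"


print(dataset_url("PXD000561"))
print(pride_search_url("prime-xs"))
```

Using urlencode rather than string concatenation keeps the links valid when a search term contains spaces or other reserved characters.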
Which are the most accessed 
datasets? 
PXD Identifier | Total Hits | Dataset title | Publication 
PXD000561 | 153512 | A draft map of the human proteome | Kim et al., Nature, 2014. PMID: 24870542 
PXD000851 | 111587 | Membrane proteomic analysis of colorectal cancer tissue | Kume et al., MCP, 2014. PMID: 24687888 
PXD000865 | 51639 | Mass spectrometry based draft of the human proteome | Wilhelm et al., Nature, 2014. PMID: 24870543
Total Numbers 
Which are the most accessed 
datasets?
Find the desired PRIDE project … 
… inspect the project details …. 
Reshake PRIDE data in 
PeptideShaker 
… and start re-analyzing the data! 
http://peptide-shaker.googlecode.com 
Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, 
Barsnes H. Nature Biotechnology (in press)
A little bit of perspective 
Berlin 2011 Mallorca 2012 
Annecy 2013 Split 2013
A little bit of perspective 
2011 2012 2013 2014 
PRIDE Inspector PX Submission Tool 
mzIdentML mzQuantML 
PRIDE/PX datasets 
qcML 
mzTab 
PRIDE web (2011) 
PRIDE Converter 
PRIDE Converter 2 
PRIDE Inspector 2 
PRIDE web (2014)
Conclusions 
• ProteomeXchange is widely used. 
– PRIDE contains most of the MS/MS datasets. 
– It now has a new consortium member: 
MassIVE (UCSD). 
– Around half of the datasets are already public. 
• Different open source tools available to 
facilitate the process: 
– File transfer speed should not be a problem 
(Aspera support)
Acknowledgements: People 
Attila Csordas 
Tobias Ternent 
Noemi del Toro 
Rui Wang 
Florian Reisinger 
Jose A. Dianes 
Johannes Griss 
Steven Lewis 
Yasset Perez-Riverol 
Henning Hermjakob 
All previous team members 
ProteomeXchange partners
Acknowledgements: Funding 
@pride_ebi 
pride-ebi@ebi.ac.uk 
pride-support@ebi.ac.uk 
http://www.proteomexchange.org 
http://code.google.com/p/pride-converter-2/
