Exploring the potential of public
proteomics data
Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2015
Hinxton, 10 December 2015
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride
• PRIDE Archive stores mass spectrometry
(MS)-based proteomics data:
• Peptide and protein expression data
(identification and quantification)
• Post-translational modifications
• Mass spectra (raw data and peak
lists)
• Technical and biological metadata
• Any other related information
• Full support for tandem MS approaches
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016, in press
ProteomeXchange Consortium
• Goal: Development of a framework to allow
standard data submission and dissemination
pipelines between the main existing proteomics
repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE
(Cambridge, UK) and (very recently) MassIVE
(UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
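The common identifier space can be illustrated with a small sketch that recognises PXD identifiers in free text. The pattern used here ("PXD" followed by six digits, as in PXD000561) is an assumption inferred from the accessions shown in these slides, not an official specification, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed format: "PXD" followed by exactly six digits (e.g. PXD000561).
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def is_pxd_accession(text: str) -> bool:
    """Return True if the whole string looks like a PXD identifier."""
    return re.fullmatch(r"PXD\d{6}", text) is not None


def find_pxd_accessions(text: str) -> List[str]:
    """Return every PXD-style identifier mentioned in a block of text."""
    return PXD_PATTERN.findall(text)
```

A helper like this could be used, for example, to pull dataset identifiers out of a paper's data availability statement before looking them up in a repository.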
PRIDE Archive submitted datasets up until 1st November, 2015
• 1,259 submitted datasets to PRIDE Archive by
November 1st
• 923 datasets were submitted in 2014
• In the last 6 months, 155 datasets were submitted per month
• Size: ~ 160 TB
Review paper in press
http://www.ncbi.nlm.nih.gov/pubmed/26449181
http://onlinelibrary.wiley.com/doi/10.1002/pmic.201500295/epdf
Vaudel et al., 2016, Proteomics,
in press
Data sharing in Proteomics
• Data as they are.
• Protein knowledge bases: UniProt, neXtProt.
• Contributing to the Protein Evidence Code.
Protein Evidence codes in UniProt/neXtProt
http://www.uniprot.org/help/protein_existence
Use of MS data in UniProt
Use of MS data in neXtProt
Data sharing in Proteomics
Reuse
• Information is not only extracted, but reused in new
experiments with the potential of generating new
knowledge.
• Transitions used in SRM approaches.
• Meta-analysis approaches.
• Spectral libraries.
SRMAtlas
http://www.srmatlas.org/
MRMaid Workflow
[Figure: MRMaid workflow diagram. Data sources: PRIDE database, XML files via FTP, PRIDE API. Components: Transition Builder Pipeline, Transition Database, API, Web Interface, Other Tools. Data flows: Peptides, Spectra & IDs; Transitions; Validated Transitions. Exports: TraML, CSV, vendor-specific file.]
Slide from C. Bessant
Mead et al., MCP, 2009
http://138.250.31.29/mrmaid/
PeptidePicker
http://mrmpeptidepicker.proteincentre.com/
Meta-analysis approaches
• Combining data from many experiments
to extract new knowledge. Examples:
• Study the cleavage mechanism and performance of
trypsin.
• Fragmentation patterns.
• Retention time prediction.
• Which is the most suitable reference DB for long-term
proteomics data storage?
• Data integration of experiments done at different time
points.
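As one illustration of the trypsin example above, the sketch below counts missed cleavages across a pooled list of identified peptide sequences. It assumes a simplified cleavage rule (trypsin cuts after K or R unless the next residue is P); the function names are hypothetical and this is not taken from any of the studies behind this slide.

```python
def count_missed_cleavages(peptide: str) -> int:
    """Count internal K/R residues not followed by P (simplified trypsin rule)."""
    missed = 0
    # The last residue is the peptide's own C-terminus, so it is skipped.
    for i in range(len(peptide) - 1):
        if peptide[i] in "KR" and peptide[i + 1] != "P":
            missed += 1
    return missed


def missed_cleavage_rate(peptides) -> float:
    """Fraction of peptides that contain at least one missed cleavage."""
    if not peptides:
        return 0.0
    with_missed = sum(1 for p in peptides if count_missed_cleavages(p) > 0)
    return with_missed / len(peptides)
```

Run over peptides pooled from many public datasets, a rate like this gives a rough view of how completely trypsin digested the samples.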
Spectral searching
• Concept: To compare experimental spectra to other
experimental spectra.
• There are many spectral libraries publicly available (for
instance, from NIST, PeptideAtlas and PRIDE)
• Custom ‘search engines’ have been developed:
• SpectraST (TPP)
• X!Hunter (GPM)
• It has been claimed that these searches are more
sensitive than sequence database approaches
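The concept of comparing experimental spectra to other experimental spectra can be sketched with a simple binned cosine similarity. This is only a minimal illustration: the bin width, the peak representation as (m/z, intensity) pairs and the function names are assumptions, and it is not the scoring used by SpectraST or X!Hunter.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library, bin_width=1.0):
    """Return (name, score) of the best-scoring entry in a non-empty library.

    `library` maps a peptide label to its reference peak list.
    """
    scored = ((name, cosine_similarity(query, peaks, bin_width))
              for name, peaks in library.items())
    return max(scored, key=lambda pair: pair[1])
```

Real spectral search tools add steps this sketch leaves out, such as precursor mass filtering, intensity scaling and false discovery rate estimation.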
Spectral searching (2)
http://peptide.nist.gov/
PRIDE Cluster
• Quality assessment method. Takes advantage of the wealth of data in PRIDE.
• Assumption: The same peptide will generate similar MS/MS spectra
across many experiments:
• Cluster all identified spectra (20.7 M) in PRIDE (modified version of the MS-Cluster
algorithm [1]). API available (http://pride-spectra-clustering.googlecode.com).
• Those clusters which contain only/mainly one peptide are considered reliable.
• Thresholds: At least 10 spectra in a cluster and ratio >70%.
[Figure: example cluster containing spectra identified as NMMAACDPR and PPECPDFDPPR, with a consensus spectrum.]
Griss et al., Nat Methods, 2013
[1] Frank et al., Nat Methods 8, 587-591 (2011)
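The reliability thresholds can be written as a short check. This sketch assumes that "ratio" means the share of spectra in a cluster identified as its most common peptide; that reading, and the function and constant names, are assumptions rather than the PRIDE Cluster implementation.

```python
from collections import Counter

MIN_SPECTRA = 10   # at least 10 spectra in a cluster
MIN_RATIO = 0.70   # most common peptide must exceed 70% of the spectra


def is_reliable_cluster(peptide_ids, min_spectra=MIN_SPECTRA, min_ratio=MIN_RATIO):
    """Apply the size and ratio thresholds to one cluster.

    `peptide_ids` holds one peptide identification per spectrum in the cluster.
    """
    if len(peptide_ids) < min_spectra:
        return False
    # Count of the most frequently identified peptide in the cluster.
    _, top_count = Counter(peptide_ids).most_common(1)[0]
    return top_count / len(peptide_ids) > min_ratio
```

For example, a cluster of ten spectra with eight identified as NMMAACDPR and two as PPECPDFDPPR passes both thresholds under this reading, while a cluster of only four spectra does not.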
Data sharing in Proteomics
Reprocess
• Data are reprocessed with the intention of obtaining
new knowledge or to provide an updated view on the
results.
• It mainly serves the same purpose as the original
experiment.
• For instance, a shotgun dataset can be reprocessed
with a different algorithm or an updated sequence
database.
Reprocessing repositories
• These resources collect MS raw data and reprocess them using
a single analysis pipeline and an up-to-date protein
sequence database.
• Main resources: GPMDB and PeptideAtlas (ISB, Seattle).
PeptideAtlas builds
Examples of builds:
- Human
- Human plasma
- Human urine
- Drosophila
- Mouse
- Mouse plasma
- Cow
- Yeast
…
Data reprocessing in GPMDB
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014 Kim et al., Nature, 2014
• Two independent groups claimed to have produced the
first complete draft of the human proteome by MS.
• Some of their findings are controversial and need further
validation… but generated a lot of discussion and put
proteomics in the spotlight.
• They used many different tissues.
Nature cover 29 May 2014
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014
• Around 60% of the data used for the analysis comes from previous
experiments, most of them stored in proteomics repositories such as
PRIDE/ProteomeXchange, PASSEL or MassIVE.
• They complement that data with “exotic” tissues.
Reprocessing for the validation of controversial data
• Analysis of Tyrannosaurus rex fossils: controversial presence of
collagen (is it a contamination of the sample? Did the sample contain
any T. rex proteins at all?)
Asara et al. (2007) Science 316: 280-5.
Asara et al. (2007) Science 316: 1324-5.
Bern et al. (2009) JPR 9: 4328-32
PRIDE assay accession 8633
Reprocessing for the validation of controversial data (2)
Bromenshenk et al. (2010) PLOS One 5: e13181
Info from R. Chalkley
Reprocessing for the validation of controversial data (3)
Experimental Protocol
1. Collected samples from healthy, collapsing and collapsed bee colonies.
2. Homogenised bees.
3. Digested with trypsin.
4. Analyzed by LC-MS/MS on an LTQ.
5. Searched using Sequest.
6. Filtered results using PeptideProphet and ProteinProphet.
7. Performed further analysis to determine species statistically more
commonly found in collapsing/collapsed colony samples.
Bromenshenk et al. (2010) PLOS One 5: e13181
Info from R. Chalkley
Reprocessing for the validation of controversial data (4)
• Big pitfall: The search database was composed only of viral
proteins. No bee proteins at all!
• After re-searching the data, there is no evidence for viral
peptides/proteins in any of their data. The matches come instead from
honey bee, fruit fly, wasp, moth, human keratin, bacteria that like sugary
environments, …
• “We believe that there is currently insufficient evidence to
conclude that bees are a natural host for IIV-6, let alone that
the virus is linked to CCD”.
Knudsen & Chalkley (2011) PLOS One 6: e20873
Foster (2011), MCP 10: M110.006387
Info from R. Chalkley
Reprocessing for the validation of controversial data
Datasets PXD000561 and PXD000865 in PRIDE Archive
Data sharing in Proteomics
Repurposing
• Data are considered in light of a question or a context
that is different from the original study.
• Proteogenomics studies
• Discovery of novel PTMs.
One example of repurposing for genome annotation
Brosch et al. (2011) Genome Res 21:756-767
• In this particular paper:
• 53 genes alternatively transcribed
• 10 new protein coding genes
• Pipeline to integrate gene annotations in the mouse
genome.
Examples of repurposing in proteogenomics
Repurposing: new PTMs found
• Individual authors can reprocess raw data with new
hypotheses in mind (not taken into account by the original
authors).
• Recent examples (using phosphoproteomics data sets):
• O-GlcNAc-6-phosphate [1]
• Phosphoglyceryl [2]
• ADP-ribosylation [3]
[1] Hahne & Kuster, Mol Cell Proteomics (2012) 11(10): 1063-9
[2] Moellering & Cravatt, Science (2013) 341: 549-553
[3] Matic et al., Nat Methods (2012) 9: 771-2
Data sharing in Proteomics
CompOmics Open Source Analysis Pipeline
SearchGUI: https://github.com/compomics/searchgui
Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L:
Proteomics 2011;11(5):996-9.
PeptideShaker: https://github.com/compomics/peptide-shaker
Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L,
Barsnes H:
Nature Biotechnology 2015; 33(1):22-4.
Reshake PRIDE data!
Find the desired PRIDE project …
… inspect the project details …
… and start re-analyzing the data!
Acknowledgements
http://www.ncbi.nlm.nih.gov/pubmed/26449181
http://onlinelibrary.wiley.com/doi/10.1002/pmic.201500295/epdf
Vaudel et al., 2016, Proteomics,
in press
Questions?

More Related Content

PPTX
PRIDE-ProteomeXchange
PPTX
Proteomics repositories
PPTX
Proteomics data standards
PPTX
Mass spectrometry resources at the EBI
PPTX
PRIDE and ProteomeXchange
PPTX
Reuse of public proteomics data
PPTX
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
PPTX
Proteomics public data resources: enabling "big data" analysis in proteomics
PRIDE-ProteomeXchange
Proteomics repositories
Proteomics data standards
Mass spectrometry resources at the EBI
PRIDE and ProteomeXchange
Reuse of public proteomics data
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics public data resources: enabling "big data" analysis in proteomics

What's hot (20)

PPTX
Experiences to learn from the MS proteomics field
PPTX
Proteomics data standards
PPTX
Public proteomics data: a (mostly unexploited) gold mine for computational re...
PPTX
An overview of the PRIDE ecosystem of resources and computational tools for m...
PPTX
ProteomeXchange_and_PRIDE_Semmeting_2015
PDF
Pride cluster presentation
PDF
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
PPT
Royal society of chemistry activities to develop a data repository for chemis...
PPTX
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
PPTX
Reuse of public data in proteomics
PPTX
Reproducibility (and the R*) of Science: motivations, challenges and trends
PPTX
How to run and maintain a popular biological data repository?
PPTX
The ProteomeXchange Consoritum: 2017 update
PDF
Better Data for a Better World
PPTX
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
PPT
UniProt-GOA
 
PPTX
Opportunities and challenges presented by Wikidata in the context of biocuration
PPTX
PRIDE and ProteomeXchange: Training webinar
PDF
Publication of raw and curated NMR spectroscopic data for organic molecules
PPTX
Reflections on a (slightly unusual) multi-disciplinary academic career
Experiences to learn from the MS proteomics field
Proteomics data standards
Public proteomics data: a (mostly unexploited) gold mine for computational re...
An overview of the PRIDE ecosystem of resources and computational tools for m...
ProteomeXchange_and_PRIDE_Semmeting_2015
Pride cluster presentation
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
Royal society of chemistry activities to develop a data repository for chemis...
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Reuse of public data in proteomics
Reproducibility (and the R*) of Science: motivations, challenges and trends
How to run and maintain a popular biological data repository?
The ProteomeXchange Consoritum: 2017 update
Better Data for a Better World
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
UniProt-GOA
 
Opportunities and challenges presented by Wikidata in the context of biocuration
PRIDE and ProteomeXchange: Training webinar
Publication of raw and curated NMR spectroscopic data for organic molecules
Reflections on a (slightly unusual) multi-disciplinary academic career
Ad

Similar to Reuse of public proteomics data (20)

PPTX
Is it feasible to identify novel biomarkers by mining public proteomics data?
PPTX
Proteomics repositories
PPTX
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
PDF
Reuse of public proteomics data
PPTX
Pride and ProteomeXchange
PPTX
Proteomics repositories
PDF
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
PDF
Proteomics repositories
PPTX
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PPTX
Introduction to the Proteomics Bioinformatics Course 2016
PPTX
Mining the hidden proteome using hundreds of public proteomics datasets
PDF
Reusing and integrating public proteomics data to improve our knowledge of th...
PPTX
Introduction to the Proteomics Bioinformatics Course 2017
PDF
Big Data in Life Sciences
PDF
Introduction to the Proteomics Bioinformatics Course 2018
PPTX
Human microbiome project
PDF
The ELIXIR Proteomics Community
PPTX
Proteomexchange
PDF
Big Data in Genomics: Opportunities and Challenges
PPTX
Dynamic linkage of public proteomics data in Ensembl using TrackHubs
Is it feasible to identify novel biomarkers by mining public proteomics data?
Proteomics repositories
PRIDE and ProteomeXchange: supporting the cultural change in proteomics publi...
Reuse of public proteomics data
Pride and ProteomeXchange
Proteomics repositories
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
Proteomics repositories
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
Introduction to the Proteomics Bioinformatics Course 2016
Mining the hidden proteome using hundreds of public proteomics datasets
Reusing and integrating public proteomics data to improve our knowledge of th...
Introduction to the Proteomics Bioinformatics Course 2017
Big Data in Life Sciences
Introduction to the Proteomics Bioinformatics Course 2018
Human microbiome project
The ELIXIR Proteomics Community
Proteomexchange
Big Data in Genomics: Opportunities and Challenges
Dynamic linkage of public proteomics data in Ensembl using TrackHubs
Ad

More from Juan Antonio Vizcaino (12)

PPTX
Introduction to the PSI standard data formats
PDF
PRIDE resources and ProteomeXchange
PDF
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
PPTX
PSI-Proteome Informatics update
PDF
ProteomeXchange update
PDF
The ELIXIR Proteomics community
PDF
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
PPTX
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
PPTX
ProteomeXchange update 2017
PPTX
Enabling automated processing and analysis of large-scale proteomics data
PPTX
Introduction to EBI for Proteomics in ELIXIR
PPTX
The Proteomics Standards Initiative (PSI)
Introduction to the PSI standard data formats
PRIDE resources and ProteomeXchange
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
PSI-Proteome Informatics update
ProteomeXchange update
The ELIXIR Proteomics community
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
ProteomeXchange update 2017
Enabling automated processing and analysis of large-scale proteomics data
Introduction to EBI for Proteomics in ELIXIR
The Proteomics Standards Initiative (PSI)

Recently uploaded (20)

PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PDF
GROUP 2 ORIGINAL PPT. pdf Hhfiwhwifhww0ojuwoadwsfjofjwsofjw
PPTX
Introcution to Microbes Burton's Biology for the Health
PPTX
Understanding the Circulatory System……..
PDF
BET Eukaryotic signal Transduction BET Eukaryotic signal Transduction.pdf
PPTX
Lesson-1-Introduction-to-the-Study-of-Chemistry.pptx
PPT
Heredity-grade-9 Heredity-grade-9. Heredity-grade-9.
PPT
Presentation of a Romanian Institutee 2.
PDF
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
PDF
Assessment of environmental effects of quarrying in Kitengela subcountyof Kaj...
PPT
Mutation in dna of bacteria and repairss
PDF
The Land of Punt — A research by Dhani Irwanto
PPTX
POULTRY PRODUCTION AND MANAGEMENTNNN.pptx
PPTX
Biomechanics of the Hip - Basic Science.pptx
PPTX
Hypertension_Training_materials_English_2024[1] (1).pptx
PPTX
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
PPTX
BIOMOLECULES PPT........................
PPT
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
PDF
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
PPTX
PMR- PPT.pptx for students and doctors tt
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
GROUP 2 ORIGINAL PPT. pdf Hhfiwhwifhww0ojuwoadwsfjofjwsofjw
Introcution to Microbes Burton's Biology for the Health
Understanding the Circulatory System……..
BET Eukaryotic signal Transduction BET Eukaryotic signal Transduction.pdf
Lesson-1-Introduction-to-the-Study-of-Chemistry.pptx
Heredity-grade-9 Heredity-grade-9. Heredity-grade-9.
Presentation of a Romanian Institutee 2.
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
Assessment of environmental effects of quarrying in Kitengela subcountyof Kaj...
Mutation in dna of bacteria and repairss
The Land of Punt — A research by Dhani Irwanto
POULTRY PRODUCTION AND MANAGEMENTNNN.pptx
Biomechanics of the Hip - Basic Science.pptx
Hypertension_Training_materials_English_2024[1] (1).pptx
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
BIOMOLECULES PPT........................
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
PMR- PPT.pptx for students and doctors tt

Reuse of public proteomics data

  • 1. Exploring the potential of public proteomics data Dr. Juan Antonio Vizcaíno PRIDE Group Coordinator Proteomics Services Team EMBL-EBI Hinxton, Cambridge, UK
  • 2. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 PRIDE (PRoteomics IDEntifications) Archive http://www.ebi.ac.uk/pride • PRIDE Archive stores mass spectrometry (MS)-based proteomics data: • Peptide and protein expression data (identification and quantification) • Post-translational modifications • Mass spectra (raw data and peak lists) • Technical and biological metadata • Any other related information • Full support for tandem MS approaches Martens et al., Proteomics, 2005 Vizcaíno et al., NAR, 2016, in press
  • 3. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 ProteomeXchange Consortium • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego). • Common identifier space (PXD identifiers) • Two supported data workflows: MS/MS and SRM. • Main objective: Make life easier for researchers http://www.proteomexchange.org
  • 4. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 PRIDE Archive submitted datasets up until 1st November, 2015 • 1,259 submitted datasets to PRIDE Archive by November 1st • 923 were submitted datasets in 2014 • In the last 6 months, 155 submitted datasets per month • Size: ~ 160 TB
  • 5. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Review paper in press http://www.ncbi.nlm.nih.gov/pubmed/26449181 http://onlinelibrary.wiley.com/doi/10.1002/pmic.201500295/epdf Vaudel et al., 2016, Proteomics, in press
  • 6. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Data sharing in Proteomics
  • 7. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Data sharing in Proteomics
  • 8. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Data sharing in Proteomics • Data as they are. • Protein knowledge bases: UniProt, neXtProt. • Contributing to the Protein Evidence Code.
  • 9. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Protein Evidence codes in UniProt/neXtProt http://www.uniprot.org/help/protein_existence
  • 10. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Use of MS data in UniProt
  • 11. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Use of MS data in neXtProt
  • 12. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Data sharing in Proteomics
  • 13. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Reuse • Information is not only extracted, but reused in new experiments with the potential of generating new knowledge. • Transitions used in SRM approaches. • Meta-analysis approaches. • Spectral libraries.
  • 14. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 SRMAtlas http://www.srmatlas.org/
  • 15. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 MRMaid Workflow TransitionsTransitions Transition Database API Web Interface Expor t as TraML Expor t as CSV Other Tools Spectra & IDs Spectra & IDs Peptides, Spectra & IDs Transition s TransitionsTransitionsValidated Transitions Export as Vendor Specific File Transition Builder Pipeline • PRIDE database • XML files via FTP • PRIDE API Slide from C. Bessant Mead et al., MCP, 2009 http://138.250.31.29/mrmaid/
  • 16. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 PeptidePicker http://mrmpeptidepicker.proteincentre.com/
  • 17. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Meta-analysis approaches • Putting data coming from a lot of experiments together, to extract new knowledge. Examples: • Study the cleavage mechanism and performance of trypsin. • Fragmentation patterns. • Retention time prediction. • Which is the most suitable reference DB for long-term proteomics data storage? • Data integration of experiments done at different time points.
  • 18. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Spectral searching • Concept: To compare experimental spectra to other experimental spectra. • There are many spectral libraries publicly available (for instance, from NIST, PeptideAtlas and PRIDE) • Custom ‘search engines’ have been developed: • SpectraST (TPP) • X!Hunter (GPM) • It has been claimed that the searches have more sensitivity that with sequence database approaches
  • 19. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Spectral searching (2) http://peptide.nist.gov/
  • 20. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 • Quality assessment method. Takes advantage of the wealth of data in PRIDE. • Assumption: The same peptide will generate similar MS/MS spectra across many experiments: • Cluster all identified spectra (20.7 M) in PRIDE (modified version of the MS-Cluster algorithm [1]). API available (http://pride-spectra-clustering.googlecode.com). • Those clusters which contain only/mainly one peptide are considered reliable. • Thresholds: At least 10 spectra in a cluster and ratio >70%. NMMAACDPR NMMAACDPR PPECPDFDPPR NMMAACDPR Consensus PPECPDFDPPR PRIDE Cluster Griss et al., Nat Methods, 20131. Frank et al. Nat Methods 8, 587-591 (2011) PRIDE Cluster
  • 21. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Data sharing in Proteomics
  • 22. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Reprocess • Data are reprocessed with the intention of obtaining new knowledge or to provide an updated view on the results. • It mainly serves the same purpose of the original experiment. • For instance, a shot-gun dataset can be reprocessed with a different algorithm or an updated sequence database.
  • 23. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Reprocessing repositories • These resources collect MS raw data and reprocess it using one given analysis pipeline, and an up-to date protein sequence database. • Main resources: GPMDB and PeptideAtlas (ISB, Seattle).
  • 24. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 PeptideAtlas builds Examples of builds: - Human - Human plasma - Human urine - Drosophila - Mouse - Mouse plasma - Cow - Yeast …
  • 25. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Data reprocessing in GPMDB
  • 26. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Draft Human proteome papers published in 2014 Wilhelm et al., Nature, 2014 Kim et al., Nature, 2014 •Two independent groups claimed to have produced the first complete draft of the human proteome by MS. • Some of their findings are controversial and need further validation… but generated a lot of discussion and put proteomics in the spotlight. •They used many different tissues. Nature cover 29 May 2014
  • 27. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Draft Human proteome papers published in 2014 Wilhelm et al., Nature, 2014 •Around 60% of the data used for the analysis comes from previous experiments, most of them stored in proteomics repositories such as PRIDE/ProteomeXchange, PASSEL or MassIVE. •They complement that data with “exotic” tissues.
  • 28. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Reprocessing for the validation of controversial data • Analysis of Tyrannosaurus rex fossils: controversial presence of collagen (is it a contamination of the sample? Did the sample contain any T. rex proteins at all?) Asara et al. (2007) Science 316: 280-5. Asara et al. (2007) Science 316: 1324-5. Bern et al. (2009) JPR 9: 4328-32 PRIDE assay accession 8633
  • 29. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Info from R. Chalkley Bromenshenk et al. (2011) PLOS One 5: e13181 Reprocessing for the validation of controversial data (2)
  • 30. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Experimental Protocol 1. Collected samples from healthy, collapsing and collapsed bee colonies. 2. Homogenised bees. 3. Digested with Trypsin 4. Analyzed by LC-MSMS on LTQ 5. Searched using Sequest 6. Filtered Results using Peptide and Protein Prophet 7. Performed further analysis to determine species statistically more commonly found in collapsing/collapsed colony samples Info from R. Chalkley Bromenshenk et al. (2011) PLOS One 5: e13181 Reprocessing for the validation of controversial data (3)
  • 31. Reprocessing for the validation of controversial data (4) • Big pitfall: the search database was composed only of viral proteins. No bee proteins at all! • After re-searching the data, there is no evidence for viral peptides/proteins in any of their data; the identifications come instead from honey bee, fruit fly, wasp, moth, human keratin, bacteria that like sugary environments, … • “We believe that there is currently insufficient evidence to conclude that bees are a natural host for IIV-6, let alone that the virus is linked to CCD”. Knudsen & Chalkley (2011) PLOS One 6: e20873. Foster (2011) MCP 10: M110.006387. Info from R. Chalkley
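The pitfall on this slide can be illustrated with a toy search, sketched here under loud assumptions: the two protein sequences are hypothetical, the score is a simple count of matched b/y fragment ions (not Sequest's actual scoring), and the 3 Da precursor window and 0.02 Da fragment tolerance are illustrative values only. The point is that a search engine can only report candidates that are in the database, so a bee-derived spectrum searched against a virus-only database is assigned to the best available viral peptide.

```python
# Monoisotopic residue masses (Da) for the amino acids used in this toy example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "D": 115.02694, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein, min_len=6):
    """Cleave after K or R (proline rule ignored) and drop short peptides."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_mzs(peptide):
    """Singly charged b and y fragment ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def score(observed, peptide, tol=0.02):
    """Toy score: number of theoretical fragments found among observed peaks."""
    return sum(
        any(abs(theo - obs) <= tol for obs in observed)
        for theo in fragment_mzs(peptide)
    )


def best_match(observed, precursor_mass, proteins, precursor_tol=3.0):
    """Return (protein, peptide, score) for the best candidate, or None."""
    best = None
    for name, sequence in proteins.items():
        for pep in tryptic_peptides(sequence):
            if abs(peptide_mass(pep) - precursor_mass) <= precursor_tol:
                s = score(observed, pep)
                if best is None or s > best[2]:
                    best = (name, pep, s)
    return best


# Hypothetical sequences, invented for illustration only.
VIRUS_DB = {"hypothetical_viral_protein": "MKLGESVADKTTR"}
BEE_DB = {"hypothetical_bee_protein": "MKLSEGAVDKGGR"}

# Simulate a clean spectrum that truly comes from the bee peptide.
true_peptide = "LSEGAVDK"
spectrum = fragment_mzs(true_peptide)
precursor = peptide_mass(true_peptide)

virus_only_hit = best_match(spectrum, precursor, VIRUS_DB)
combined_hit = best_match(spectrum, precursor, {**VIRUS_DB, **BEE_DB})

print("virus-only database:", virus_only_hit)
print("virus + bee database:", combined_hit)
```

With the virus-only database the spectrum is still assigned to a viral peptide, because nothing better is available; once the bee protein is included, the true peptide wins with a higher score. Real pipelines add statistical validation on top of the search, but that validation cannot recover a peptide that was never a candidate.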
  • 32. Draft Human proteome papers published in 2014 (Wilhelm et al., Nature, 2014; Kim et al., Nature, 2014) • Two independent groups claimed to have produced the first complete draft of the human proteome by MS. • Some of their findings are controversial and need further validation… but they generated a lot of discussion and put proteomics in the spotlight. • They used many different tissues. Nature cover 29 May 2014
  • 33. Reprocessing for the validation of controversial data: datasets PXD000561 and PXD000865 in PRIDE Archive
  • 34. Data sharing in Proteomics
  • 35. Repurposing • Data are considered in light of a question or a context that is different from the original study. • Proteogenomics studies • Discovery of novel PTMs.
  • 36. One example of repurposing for genome annotation Brosch et al. (2011) Genome Res 21:756-767 • In this particular paper: • 53 genes alternatively transcribed • 10 new protein-coding genes • Pipeline to integrate gene annotations in the mouse genome.
  • 37. Examples of repurposing in proteogenomics
  • 38. Repurposing: new PTMs found • Individual authors can reprocess raw data with new hypotheses in mind (not taken into account by the original authors). • Recent examples (using phosphoproteomics data sets): • O-GlcNAc-6-phosphate [1] • Phosphoglyceryl [2] • ADP-ribosylation [3] [1] Hahne & Kuster, Mol Cell Proteomics (2012) 11(10): 1063-9. [2] Moellering & Cravatt, Science (2013) 341: 549-553. [3] Matic et al., Nat Methods (2012) 9: 771-2.
  • 39. Data sharing in Proteomics
  • 40. CompOmics Open Source Analysis Pipeline • Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L: Proteomics 2011; 11(5):996-9. https://github.com/compomics/searchgui • Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H: Nature Biotechnology 2015; 33(1):22-4. https://github.com/compomics/peptide-shaker
  • 41. Reshake PRIDE data! Find the desired PRIDE project … inspect the project details … and start re-analyzing the data!
  • 42. Acknowledgements http://www.ncbi.nlm.nih.gov/pubmed/26449181 http://onlinelibrary.wiley.com/doi/10.1002/pmic.201500295/epdf Vaudel et al., 2016, Proteomics, in press
  • 43. Questions?