Introduction to the PSI standard data formats
Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator
Proteomics Services Team
EMBL-EBI
Hinxton, Cambridge, UK
Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2015
Hinxton, 10 December 2015
Overview
• A couple of slides about the need of data standards
• The Proteomics Standards Initiative
• Existing data standards
• Connection with ProteomeXchange and IMEx
Data standards are needed
Standards are needed in life: also in bioinformatics…
With a small number of standards, data converters are feasible
Taken from Biocomical, http://biocomicals.blogspot.com
Mass Spectrometry (MS)-based proteomics
• Many different workflows -> Many different data types -> Need for several data standards.
• Discovery mode:
  • Bottom-up proteomics
  • Data dependent acquisition
  • Data independent acquisition
  • Top down proteomics
• Targeted mode:
  • SRM (Selected Reaction Monitoring)
HUPO Proteomics Standards Initiative
• Develops data format standards for proteomics.
• Both data representation and annotation standards.
• Involves data producers, database providers, software producers, publishers, …
• Active Workgroups: MI, MS, PI, Mod.
• Inter-group activities: MIAPE and Controlled Vocabularies.
• Started in 2002, so some experience already…
• One annual meeting in March-April, regular phone calls.
• Close interaction with the metabolomics community.
http://www.psidev.info
PSI Deliverables
•Minimum information (MIAPE) specifications: Format-independent
specification of minimum information guidelines.
•Formats: Usually an XML schema (but also tab-delimited files) capable of
representing the relevant Minimum Information, plus additional detailed data
for the domain.
•Controlled vocabularies: Usually an OBO-style hierarchical controlled
vocabulary precisely defining the metadata that are encoded in the formats.
•Databases and Tools: Foster software implementations to make the
standards truly useful.
•Community interaction to ensure deposition of data in public repositories.
PSI MS Controlled Vocabulary
Mayer et al., Database, 2013
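To make the idea of an OBO-style controlled vocabulary more concrete, here is a minimal sketch of reading term stanzas from an OBO-like flat file. The stanza is hand-written for illustration: the `EX:` accessions and names are placeholders, not terms from the real PSI-MS vocabulary, and `parse_obo_terms` is a hypothetical helper that handles only the simplest tag/value lines.

```python
from typing import Dict, List

# Hand-written stanza in OBO flat-file style. The accessions and names are
# illustrative placeholders, not copied from the real PSI-MS vocabulary.
EXAMPLE_OBO = """\
[Term]
id: EX:0000001
name: example instrument model
def: "A made-up term used only for this sketch." [EX:ref]
is_a: EX:0000000 ! example parent term
"""


def parse_obo_terms(text: str) -> List[Dict[str, List[str]]]:
    """Collect tag/value pairs for every [Term] stanza (simplified parser)."""
    terms: List[Dict[str, List[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):
            current = None  # some other stanza type; ignored here
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" that may follow a value.
            value = value.split(" ! ", 1)[0]
            current.setdefault(tag, []).append(value)
    return terms


terms = parse_obo_terms(EXAMPLE_OBO)
print(terms[0]["id"][0], "->", terms[0]["name"][0])
```

The `is_a` tag is what gives the vocabulary its hierarchy: each term points to its parent, so a format can reference a precise term while tools can still reason about the broader category.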
The typical dilemma
•Data standards need to be stable to promote adoption
•Proteomics standards need to evolve very rapidly:
• Data is inherently very complex
• Experimental techniques are evolving all the time
Existing data standards in proteomics
• MS data: mzML (also used in MS metabolomics).
• Protein and peptide identifications: mzIdentML.
• Peptide and protein quantification: mzQuantML (also supports small molecules).
• SRM transitions (for targeted proteomics): TraML.
• Molecular interactions: PSI MI XML and MITAB.
• Latest addition (just published): mzTab: identification and quantification results for peptides, proteins and small molecules (also used in MS metabolomics).
www.psidev.info
Current PSI Standard File Formats for MS
• mzTab: Final Results
• TraML: SRM
• mzQuantML: Quantitation
• mzIdentML: Identification
• mzML: MS data
Data formats for mass spectra data
• Binary data
• XML-based files: mzData, mzXML, mzML
• Peak lists: .dta, .pkl, .mgf, .ms2
An example of a success story: mzML
• A data format for the storage and exchange of MS output files
• Designed by merging the best aspects of both mzData and mzXML
• Developed with full participation of academic researchers, hardware
and software vendors
• Expected to replace mzXML and mzData, but not expected to
completely replace vendor binary formats
• Captures spectra (raw data or peak lists), chromatograms and related
metadata
• Version 1.0 released in June 2008, v1.1 released in June 2009
• Many implementations already exist
• Version 1.2 with enhanced compression considered for 2014
Martens et al., MCP, 2011
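As a rough illustration of how spectra can be carried inside an XML file, the sketch below round-trips a numeric array through the encoding commonly used for mzML binary data arrays: little-endian floats, optionally zlib-compressed, then base64-encoded. The function names and the boolean flags are assumptions made for this example; in a real file the precision and compression are declared by controlled vocabulary terms attached to each array, and a dedicated parser library would normally handle this step.

```python
import base64
import struct
import zlib
from typing import List


def encode_array(values: List[float], double: bool = True, compress: bool = True) -> str:
    """Pack floats as little-endian, optionally zlib-compress, then base64-encode."""
    fmt = "<%d%s" % (len(values), "d" if double else "f")
    raw = struct.pack(fmt, *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text: str, double: bool = True, compressed: bool = True) -> List[float]:
    """Reverse of encode_array: base64-decode, decompress, unpack."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    width = 8 if double else 4  # bytes per value
    count = len(raw) // width
    fmt = "<%d%s" % (count, "d" if double else "f")
    return list(struct.unpack(fmt, raw))


# Made-up m/z values, chosen only to demonstrate the round trip.
mz_values = [100.5, 200.25, 300.125]
encoded = encode_array(mz_values)
decoded = decode_array(encoded)
print(encoded)
print(decoded)
```

Because the decoder must know the precision and compression in advance, the metadata describing each array is as important as the encoded text itself.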
An example of a success story: mzML
• The most popular search engines support mzML
• Many parser libraries available
• Conversion from raw files into mzML
http://www.psidev.info/mzml_1_0_0
Application of mzML to metabolomics
Data formats for output from search engines
• mzIdentML, Mascot .dat, Sequest .out, SpectrumMill .spo, pep.xml, prot.xml
Only qualitative data!
mzIdentML: peptide and protein identifications
• Overview
  • XML-based data standard for peptide and protein identifications, e.g. following database search and protein inference.
  • Sections for all PSMs, proteins/protein groups inferred, protocols/parameters etc.
• Timeline:
  • Original 1.0 version in Aug 2009.
  • Version 1.1 stable (Aug 2011).
  • Manuscript published in MCP in 2012*.
  • 2012-2015:
    • Improving support for protein grouping, multiple search engines, pre-fractionation approaches and de novo sequencing.
• Now firmly embedded as part of ProteomeXchange submission process, and supported by lots of external software.
* Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., et al., The mzIdentML data standard for mass spectrometry-based
proteomics results. Molecular & Cellular Proteomics 2012, 11, M111.014381.
mzIdentML: peptide and protein identifications
MzIdentML
• CvList: URLs of controlled vocabularies used within the file
• AnalysisSoftwareList: Software packages used
• AnalysisSampleCollection: Biological samples analysed, annotated with CV terms
• SequenceCollection: Database entries of protein/peptide sequences identified and modifications
• AnalysisCollection
  • SpectrumIdentification: Application of protocol; inputs = external spectra 1..n; output = SpectrumIdentificationList 1
  • ProteinDetection: Application of protocol; inputs = SpectrumIdentificationList 1..n; output = ProteinDetectionList 1
• AnalysisProtocolCollection
  • SpectrumIdentificationProtocol: AdditionalSearchParams, ModificationParams, Enzymes, DatabaseFilters
  • ProteinDetectionProtocol: Parameters for the protein detection procedure
• DataCollection
  • Inputs: The database searched and the input file converted to mzIdentML
  • AnalysisData
    • SpectrumIdentificationList
      • SpectrumIdentificationResult: All identifications made from searching one spectrum
        • SpectrumIdentificationItem: One (poly)peptide-spectrum match
    • ProteinDetectionList
      • ProteinAmbiguityGroup: A set of related protein identifications, e.g. conflicting peptide-protein assignments
        • ProteinDetectionHypothesis: A single protein identification
* Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., et al., The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 2012, 11, M111.014381.
An example: XML snippet of mzIdentML
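The following is a small, hand-written fragment in the style of mzIdentML 1.1, together with a sketch of how it could be read with Python's standard library. It is an illustration only: the identifiers and m/z values are made up, the fragment omits the sections a complete file would need and so is not schema-valid, and the namespace URI and attribute names are written from the 1.1 schema as understood here and should be checked against the specification. `top_ranked_psms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for mzIdentML 1.1; verify against the schema before use.
NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}

# Hand-written fragment with made-up identifiers and values.
FRAGMENT = """\
<SpectrumIdentificationList xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="SIL_1">
  <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0" spectraData_ref="SD_1">
    <SpectrumIdentificationItem id="SII_1_1" rank="1" chargeState="2"
        peptide_ref="PEP_1" experimentalMassToCharge="500.25"
        calculatedMassToCharge="500.26" passThreshold="true"/>
    <SpectrumIdentificationItem id="SII_1_2" rank="2" chargeState="2"
        peptide_ref="PEP_2" experimentalMassToCharge="500.25"
        calculatedMassToCharge="500.31" passThreshold="false"/>
  </SpectrumIdentificationResult>
</SpectrumIdentificationList>
"""


def top_ranked_psms(xml_text):
    """Return (spectrumID, peptide_ref, charge) for every rank-1 match."""
    root = ET.fromstring(xml_text)
    psms = []
    for result in root.iterfind("mzid:SpectrumIdentificationResult", NS):
        for item in result.iterfind("mzid:SpectrumIdentificationItem", NS):
            if item.get("rank") == "1":
                psms.append(
                    (result.get("spectrumID"), item.get("peptide_ref"), int(item.get("chargeState")))
                )
    return psms


psms = top_ranked_psms(FRAGMENT)
print(psms)
```

The nesting mirrors the structure described above: one SpectrumIdentificationResult per spectrum, holding one SpectrumIdentificationItem per candidate peptide-spectrum match.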
Support for mzIdentML
[Diagram: search engines; search engine results + MS files; mzIdentML]
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- Myrimatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker (several open source tools)
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem (from PILEDRIVER version)
- Others: library for X!Tandem conversion, lab internal pipelines, …
- Crux
Updated list: http://www.psidev.info/tools-implementing-
mzIdentML status
• Current status of mzIdentML 1.2
• Improved support for protein grouping**
• Support for PTM localisation scoring added - COMPLETE
• Support for peptide-level statistics added - COMPLETE
• Better support for multiple search approaches - COMPLETE
• Support for crosslinking approaches – INCOMPLETE
• mzIdentML 1.2 submission to document process -
INCOMPLETE
** Seymour, S. L., Farrah, T., Binz, P. A., Chalkley, R. J., et al., A standardized framing for reporting protein identifications in
mzIdentML 1.2. Proteomics 2014, 14, 2389-2399.
mzQuantML: Standard for quantitative data
Overview
• XML-based standard for quantification data – following use of quant software
• Can report tables of data (QuantLayers), columns are: StudyVariables, Assays or Ratios;
rows are ProteinGroups, Proteins or Peptides
• Can also capture 2D coordinates of quantified regions in LC-MS (Features)
Timeline
• Version 1.0 rc-1 submitted to the PSI process October 2011; Version 1.0 rc-2 June 2012; Re-submitted to PSI process in October 2012
• Completed PSI process in Feb 2013 – version 1.0 release
• Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques (e.g. iTRAQ) and
MS1 label techniques e.g. SILAC
• Schema is fixed with each technique defined by separate semantic rules, implemented in validator
software
• Manuscript published in MCP in summer 2013*
• Updated in 2013-2014 to support SRM as a new technique** (version 1.0.1 just submitted to the
document process).
*Walzer et al. MCP 2013 Aug;12(8):2332-40. doi: 10.1074/mcp.O113.028506
**Qi et al. PROTEOMICS, 2015
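The QuantLayer idea (a table whose columns are assays, study variables or ratios and whose rows are protein groups, proteins or peptides) can be pictured with a small in-memory model. This is a conceptual sketch only: the `QuantLayer` class, its field names and the values are assumptions made for illustration and do not reproduce the mzQuantML XML schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QuantLayer:
    """Conceptual table: one row per quantified object, one column per assay, study variable or ratio."""

    data_type: str          # free-text label in this sketch
    column_ids: List[str]   # e.g. assay identifiers
    rows: Dict[str, List[float]] = field(default_factory=dict)

    def add_row(self, object_id: str, values: List[float]) -> None:
        """Add one row; the number of values must match the number of columns."""
        if len(values) != len(self.column_ids):
            raise ValueError("row length must match the number of columns")
        self.rows[object_id] = values

    def value(self, object_id: str, column_id: str) -> float:
        """Look up a single cell by row and column identifier."""
        return self.rows[object_id][self.column_ids.index(column_id)]


# Made-up identifiers and abundances.
layer = QuantLayer("abundance", ["assay_1", "assay_2"])
layer.add_row("protein_A", [10.0, 12.5])
layer.add_row("protein_B", [3.0, 0.0])
print(layer.value("protein_A", "assay_2"))
```

Keeping the column identifiers separate from the row values is what lets the same table shape serve label-free, MS2 tag and MS1 label techniques, with technique-specific rules checked separately.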
mzQuantML status
Open issues
• Updated in 2014 to support absolute quant techniques but
not yet added to specification document
• Some supporting software but not currently implemented for
import in databases (PRIDE?).
• No webpage for software implementations.
Wide variety of quantification techniques
mzQ library
• The “mzqLibrary” includes a set of applications/modules to
facilitate the use of mzQuantML within analysis pipelines.
• The “mzqLibrary” provides common routines for post-processing quantitative proteomics data via the command line interface or a graphical user interface called the “mzqViewer”.
• The mzqViewer integrates with R to produce a heat map or
principal component analysis (PCA) on selected data
tables. Line plots can also be produced showing peptide
and protein quant data across MS runs.
* Qi et al., Proteomics, 2015
mzqLibrary: Converter & Exporter
• The mzqLibrary can convert output of Progenesis (*.csv files), MaxQuant (*.txt files), and OpenMS (*.consensusXML files) into mzQuantML (*.mzq).
• The mzqLibrary can convert mzQuantML files to four file formats (XLS, CSV, HTML, and mzTab).
The last addition: mzTab – Aims and concept
• To provide a simple and efficient way of exchanging results from MS
approaches.
• Simpler summary report of the experimental results
• Peptides and proteins identified in a given experimental setting
• Small molecules identified
• Reported quantification values
• Technical and biological metadata
• Easier to parse and use by the research community, systems
biologists as well as providers of knowledge bases.
• It can be used by non-experts in bioinformatics.
• It does not aim to replace mzIdentML and mzQuantML.
mzTab - Sections
• Metadata: basic information about experiment and sample (key-value pairs)
• Protein: basic information about protein identifications (table-based)
• Peptide: information about quantified peptides (table-based)
• PSM: information about identified spectra (table-based)
• Small Molecule: basic information about identified small molecules (table-based)
Griss et al., MCP, 2014
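Because mzTab is tab-delimited, the sections can be separated with very little code. The sketch below assumes the three-letter line prefixes used by mzTab 1.0 (MTD for metadata, PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and rows of each table). The example content is hand-written and heavily truncated: real files carry many more mandatory columns and metadata entries, so treat `read_mztab` as a hypothetical helper rather than a validator.

```python
from collections import defaultdict

# Hand-written, truncated example; not a valid mzTab file and not real data.
EXAMPLE = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tHand-written example, not real data",
    "PRH\taccession\tdescription",
    "PRT\tEXAMPLE_1\tmade-up protein one",
    "PRT\tEXAMPLE_2\tmade-up protein two",
    "PSH\tsequence\taccession",
    "PSM\tPEPTIDEK\tEXAMPLE_1",
])

# Header prefix -> row prefix for each table-based section.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def read_mztab(text):
    """Split mzTab text into a metadata dict and per-section lists of row dicts."""
    metadata = {}
    headers = {}
    tables = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(tables)


metadata, tables = read_mztab(EXAMPLE)
print(metadata["mzTab-version"])
print([row["accession"] for row in tables["PRT"]])
```

The same few lines of parsing work in a spreadsheet or in R, which is the practical meaning of "easier to parse" for users who are not bioinformatics experts.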
Metadata section - Example
Protein Section (label-free)
mzTab – Current implementations
• jmzTab (Java API): Version 3.0 is now a stable version. Manuscript
published in the journal Proteomics.
• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).
• mzIdentML and mzQuantML to mzTab converters (Andy Jones
group).
• Mascot (Matrix Science) exporter.
• MaxQuant: exporter in beta is available.
• OpenMS (version 1.10).
• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).
• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).
• MetaboLights (EBI).
mzTab – ongoing development in metabolomics
• More detailed modelling of MS metabolomics data
• Led by S. Neumann (COSMOS EU FP7 project).
• Extension from one to three sections.
Example file exists at
https://github.com/sneumann/mtbls2/faahKO.mzTab
http://www.cosmos-fp7.eu/
Unify exchange of transitions with TraML
• PSI’s TraML (Transitions Markup Language)
• Format for encoding SRM/MRM transitions
• Version 1.0.0 now released and published in MCP (Deutsch et al. 2012)
[Diagram: TraML at the centre, exchanging transitions between journal articles, transitions databases, Excel sheets, SRM analysis software and instruments]
TraML software implementations
Java library for working with TraML files. It provides:
- command line & simple GUI
- conversion in both directions: TraML to TSV and TSV to TraML
- TSV vendor formats from TSQ, QTRAP5500, AgilentQQQ
Published: Helsens et al., JPR, 2011
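To give a feel for what a transition list in TSV form looks like, here is a minimal round-trip sketch. It does not use the Java library mentioned above and does not read or write TraML itself: the `Transition` record, its column names and the example values are all assumptions made for illustration, and real vendor TSV layouts differ from instrument to instrument.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Transition:
    """One SRM transition: a precursor/product m/z pair plus one instrument setting."""

    peptide: str
    precursor_mz: float
    product_mz: float
    collision_energy: float


# Column order is taken from the dataclass definition.
COLUMNS = [f.name for f in fields(Transition)]


def to_tsv(transitions):
    """Write transitions as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for transition in transitions:
        writer.writerow(asdict(transition))
    return buf.getvalue()


def from_tsv(text):
    """Read transitions back from the text produced by to_tsv."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        Transition(
            row["peptide"],
            float(row["precursor_mz"]),
            float(row["product_mz"]),
            float(row["collision_energy"]),
        )
        for row in reader
    ]


# Made-up values, for illustration only.
transitions = [Transition("PEPTIDEK", 464.75, 589.3, 20.0)]
tsv_text = to_tsv(transitions)
print(tsv_text)
```

A flat table like this loses the richer annotation that an XML format can carry, which is one reason a common exchange format is useful alongside vendor-specific sheets.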
PSI document process
•Every data standard has to undergo a
thorough review process…
•In fact, in practice, two review processes
happen in parallel: the PSI and
manuscript review.
Data standard publications
mzML (data standard for MS data) Martens et al., MCP, 2011
mzIdentML (standard for peptide/protein IDs) Jones et al., MCP, 2012
TraML (for SRM transitions) Deutsch et al., MCP, 2012
mzQuantML (for quantitative data) Walzer et al., MCP, 2013
mzTab (peptide/protein ID and quantification) Griss et al., MCP, 2014
Some updates already going on (e.g. mzIdentML 1.2)
Importance of making software available
jmzML (http://code.google.com/p/jmzml/) Cote et al., Proteomics, 2009
jmzIdentML (http://code.google.com/p/jmzidentml/) Reisinger et al., Proteomics, 2012
jmzReader (http://code.google.com/p/jmzreader/) Griss et al., Proteomics, 2012
jmzQuantML (http://code.google.com/p/jmzquantml/) Qi et al., Proteomics, 2014
jmzTab (http://code.google.com/p/mztab/) Xu et al., Proteomics, 2014
PSI promotes implementations. The reference libraries are always
open source.
And also… protein-protein interactions
PSI-XML: XML-based format
• Version 2.5 is the working version
• Version 3.0 under development
MITAB: tab-delimited format
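As an illustration of the tab-delimited option, the sketch below splits one MITAB-style line into named fields. It assumes the 15-column layout of MITAB 2.5, with "-" for an empty field and "|" between multiple values. The short column names are labels invented for this example, the line itself uses placeholder identifiers (including `MI:XXXX`), and the naive split on "|" ignores quoting, so this is not a full parser.

```python
# Short labels (invented for this sketch) for the 15 columns assumed for MITAB 2.5.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_ids", "confidence",
]

# Hand-written line with placeholder identifiers; not a real interaction record.
EXAMPLE_LINE = "\t".join([
    "uniprotkb:EXAMPLE1", "uniprotkb:EXAMPLE2", "-", "-", "-", "-",
    'psi-mi:"MI:XXXX"(example method)', "Example et al. (2015)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)", 'psi-mi:"MI:XXXX"(example type)',
    'psi-mi:"MI:XXXX"(example database)', "exampledb:EX-1|exampledb:EX-2", "-",
])


def parse_mitab_line(line):
    """Map each column to a list of values; '-' becomes an empty list."""
    values = line.rstrip("\n").split("\t")
    if len(values) < len(MITAB25_COLUMNS):
        raise ValueError("expected at least %d columns" % len(MITAB25_COLUMNS))
    record = {}
    for name, value in zip(MITAB25_COLUMNS, values):
        record[name] = [] if value == "-" else value.split("|")
    return record


record = parse_mitab_line(EXAMPLE_LINE)
print(record["id_a"], record["id_b"], record["interaction_ids"])
```

The trade-off against the XML format is the usual one: the tabular form is easy to filter and load into a spreadsheet, while the XML form can carry more detail about each interaction.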
ProteomeXchange Consortium
• Goal: Development of a framework to allow
standard data submission and dissemination
pipelines between the main existing proteomics
repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE
(Cambridge, UK) and (very recently) MassIVE
(UCSD, San Diego).
• EU FP7 CA (01/2011-> 06/2014).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
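A common identifier space makes it easy to recognise dataset accessions in text or file names. The sketch below assumes the familiar form of PXD identifiers, "PXD" followed by six digits (for example PXD000001). That pattern is an assumption about the identifier format rather than something stated above, and `is_pxd_identifier` is a hypothetical helper.

```python
import re

# Assumed pattern: the letters PXD followed by exactly six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_identifier(text: str) -> bool:
    """Return True if the whole string looks like a PXD identifier."""
    return PXD_PATTERN.fullmatch(text) is not None


def find_pxd_identifiers(text: str):
    """Return every PXD-like identifier found inside a longer string."""
    return PXD_PATTERN.findall(text)


print(is_pxd_identifier("PXD000001"))
print(find_pxd_identifiers("Data are available via ProteomeXchange with identifier PXD000001."))
```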
The IMEx Consortium (www.imexconsortium.org)
Orchard et al., Nat Methods, 2012
Conclusions
• The PSI is a very active group.
• Big variety of data types in proteomics -> Several data
standards available.
• Adoption of standards (as usual) takes some time.
• Public databases greatly benefit from them.
Next meeting in Ghent in April!
Questions?

More Related Content

PPTX
Proteomics repositories
PPTX
Reuse of public proteomics data
PPTX
PRIDE-ProteomeXchange
PPTX
Mass spectrometry resources at the EBI
PPTX
Reuse of public proteomics data
PPTX
PRIDE and ProteomeXchange
PPTX
Proteomics data standards
PPTX
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...
Proteomics repositories
Reuse of public proteomics data
PRIDE-ProteomeXchange
Mass spectrometry resources at the EBI
Reuse of public proteomics data
PRIDE and ProteomeXchange
Proteomics data standards
Proteomics and the "big data" trend: challenges and new possibilitites (Talk ...

What's hot (20)

PPTX
Experiences to learn from the MS proteomics field
PPTX
Proteomics public data resources: enabling "big data" analysis in proteomics
PPTX
Public proteomics data: a (mostly unexploited) gold mine for computational re...
PPTX
ProteomeXchange_and_PRIDE_Semmeting_2015
PPTX
An overview of the PRIDE ecosystem of resources and computational tools for m...
PPTX
How to run and maintain a popular biological data repository?
PDF
Pride cluster presentation
PPTX
Reproducibility (and the R*) of Science: motivations, challenges and trends
PPTX
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
PPT
Royal society of chemistry activities to develop a data repository for chemis...
PDF
Better Data for a Better World
PPTX
Reflections on a (slightly unusual) multi-disciplinary academic career
PPTX
Finding and Accessing Human Genomics Datasets
PPT
The Seven Deadly Sins of Bioinformatics
PPTX
Introduction to the Proteomics Bioinformatics Course 2017
PDF
BioSharing.org - mapping the landscape of community standards, databases, dat...
PPTX
Reproducible Research: how could Research Objects help
PPTX
Introduction to data management
PDF
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
PPTX
Bioinformatics in the Era of Open Science and Big Data
Experiences to learn from the MS proteomics field
Proteomics public data resources: enabling "big data" analysis in proteomics
Public proteomics data: a (mostly unexploited) gold mine for computational re...
ProteomeXchange_and_PRIDE_Semmeting_2015
An overview of the PRIDE ecosystem of resources and computational tools for m...
How to run and maintain a popular biological data repository?
Pride cluster presentation
Reproducibility (and the R*) of Science: motivations, challenges and trends
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Royal society of chemistry activities to develop a data repository for chemis...
Better Data for a Better World
Reflections on a (slightly unusual) multi-disciplinary academic career
Finding and Accessing Human Genomics Datasets
The Seven Deadly Sins of Bioinformatics
Introduction to the Proteomics Bioinformatics Course 2017
BioSharing.org - mapping the landscape of community standards, databases, dat...
Reproducible Research: how could Research Objects help
Introduction to data management
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Bioinformatics in the Era of Open Science and Big Data
Ad

Similar to Proteomics data standards (20)

PPTX
Introduction to the PSI standard data formats
PPTX
Proteomics data standards
PPTX
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
PPTX
PRIDE and ProteomeXchange: Training webinar
PPTX
Mass Spectrometry Informatics formats in progress
PPTX
PSI-Proteome Informatics update
PPTX
Proteomics data standards
PPTX
Introduction to EBI for Proteomics in ELIXIR
PPTX
Pride and ProteomeXchange
PPTX
Proteomics repositories
PDF
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
PPTX
The mzTab data standard format for reporting MS-based peptide, protein and sm...
PPT
ELIXIR Pilot Actions launched in 2014: Integration of BILS-ProteomeXchange us...
PPTX
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
PPTX
Big Data and its Role in Biomedical Research
PDF
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
PDF
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
PDF
Access methods for analysing sensitive data (amased)
PDF
Data discovery and sharing at UCLH
PPTX
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Introduction to the PSI standard data formats
Proteomics data standards
Data volumes in proteomics data resources: PRIDE and ProteomeXchange
PRIDE and ProteomeXchange: Training webinar
Mass Spectrometry Informatics formats in progress
PSI-Proteome Informatics update
Proteomics data standards
Introduction to EBI for Proteomics in ELIXIR
Pride and ProteomeXchange
Proteomics repositories
Developing open data analysis pipelines in the cloud: Enabling the ‘big data’...
The mzTab data standard format for reporting MS-based peptide, protein and sm...
ELIXIR Pilot Actions launched in 2014: Integration of BILS-ProteomeXchange us...
PRIDE and ProteomeXchange: A golden age for working with public proteomics data
Big Data and its Role in Biomedical Research
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Access methods for analysing sensitive data (amased)
Data discovery and sharing at UCLH
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Ad

More from Juan Antonio Vizcaino (19)

PDF
Reusing and integrating public proteomics data to improve our knowledge of th...
PDF
Reuse of public proteomics data
PDF
PRIDE resources and ProteomeXchange
PDF
Proteomics repositories
PDF
Introduction to the Proteomics Bioinformatics Course 2018
PDF
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
PDF
ProteomeXchange update
PDF
The ELIXIR Proteomics community
PDF
The ELIXIR Proteomics Community
PDF
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
PPTX
The ProteomeXchange Consoritum: 2017 update
PPTX
Proteomics repositories
PPTX
Is it feasible to identify novel biomarkers by mining public proteomics data?
PPTX
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
PPTX
ProteomeXchange update 2017
PPTX
Enabling automated processing and analysis of large-scale proteomics data
PPTX
The Proteomics Standards Initiative (PSI)
PPTX
Introduction to the Proteomics Bioinformatics Course 2016
PPTX
Reuse of public data in proteomics
Reusing and integrating public proteomics data to improve our knowledge of th...
Reuse of public proteomics data
PRIDE resources and ProteomeXchange
Proteomics repositories
Introduction to the Proteomics Bioinformatics Course 2018
ELIXIR Implementation Study: “Mining the Proteome: Enabling Automated Process...
ProteomeXchange update
The ELIXIR Proteomics community
The ELIXIR Proteomics Community
A proteomics data “gold mine” at your disposal: Now that the data is there, w...
The ProteomeXchange Consoritum: 2017 update
Proteomics repositories
Is it feasible to identify novel biomarkers by mining public proteomics data?
The spectra-cluster toolsuite: Enhancing proteomics analysis through spectrum...
ProteomeXchange update 2017
Enabling automated processing and analysis of large-scale proteomics data
The Proteomics Standards Initiative (PSI)
Introduction to the Proteomics Bioinformatics Course 2016
Reuse of public data in proteomics

Recently uploaded (20)

PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PDF
Biophysics 2.pdffffffffffffffffffffffffff
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PDF
An interstellar mission to test astrophysical black holes
PDF
The scientific heritage No 166 (166) (2025)
PPTX
2. Earth - The Living Planet Module 2ELS
PPTX
neck nodes and dissection types and lymph nodes levels
PPTX
famous lake in india and its disturibution and importance
PPTX
2Systematics of Living Organisms t-.pptx
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PPTX
Cell Membrane: Structure, Composition & Functions
PPTX
Microbiology with diagram medical studies .pptx
PDF
. Radiology Case Scenariosssssssssssssss
7. General Toxicologyfor clinical phrmacy.pptx
Classification Systems_TAXONOMY_SCIENCE8.pptx
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
Biophysics 2.pdffffffffffffffffffffffffff
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
POSITIONING IN OPERATION THEATRE ROOM.ppt
An interstellar mission to test astrophysical black holes
The scientific heritage No 166 (166) (2025)
2. Earth - The Living Planet Module 2ELS
neck nodes and dissection types and lymph nodes levels
famous lake in india and its disturibution and importance
2Systematics of Living Organisms t-.pptx
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
Cell Membrane: Structure, Composition & Functions
Microbiology with diagram medical studies .pptx
. Radiology Case Scenariosssssssssssssss

Proteomics data standards

  • 1. Introduction to the PSI standard data formats Dr. Juan Antonio Vizcaíno PRIDE Group Coordinator Proteomics Services Team EMBL-EBI Hinxton, Cambridge, UK
  • 2. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Overview • A couple of slides about the need of data standards • The Proteomics Standards Initiative • Existing data standards • Connection with ProteomeXchange and IMEx
  • 3. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Overview • A couple of slides about the need of data standards • The Proteomics Standards Initiative • Existing data standards • Connection with ProteomeXchange and IMEx
  • 4. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Standards are needed in life: also in bioinformatics… With a small number of standards, data converters are feasible Data standards are needed
  • 5. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Taken from Biocomical, http://biocomicals.blogspot.com
  • 6. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Mass Spectrometry (MS)-based proteomics • Many different workflows -> Many different data types -> Need for several data standards. • Discovery mode: • Bottom-up proteomics • Data dependent acquisition • Data independent acquisition • Top down proteomics • Targeted mode: • SRM (Selected Reaction Monitoring)
  • 7. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 Overview • A couple of slides about the need of data standards • The Proteomics Standards Initiative • Existing data standards • Connection with ProteomeXchange and IMEx
  • 8. Juan A. Vizcaíno juan@ebi.ac.uk WT Proteomics Bioinformatics Course 2015 Hinxton, 10 December 2015 •Develops data format standards for proteomics. •Both data representation and annotation standards. •Involves data producers, database providers, software producers, publishers, … •Active Workgroups: MI, MS, PI, Mod. •Inter-group activities: MIAPE and Controlled Vocabularies. •Started in 2002, so some experience already… •One annual meeting in March-April, regular phone calls. •Close interaction with the metabolomics community. http://www.psidev.info HUPO Proteomics Standards Initiative
  • 9. PSI Deliverables • Minimum information (MIAPE) specifications: Format-independent specification of minimum information guidelines. • Formats: Usually an XML schema (but also tab-delimited files) capable of representing the relevant Minimum Information, plus additional detailed data for the domain. • Controlled vocabularies: Usually an OBO-style hierarchical controlled vocabulary precisely defining the metadata that are encoded in the formats. • Databases and Tools: Foster software implementations to make the standards truly useful. • Community interaction to ensure deposition of data in public repositories.
  • 10. PSI MS Controlled Vocabulary Mayer et al., Database, 2013
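To make the idea of an "OBO-style hierarchical controlled vocabulary" concrete, here is a minimal Python sketch that reads term stanzas and walks the `is_a` hierarchy. The two stanzas are simplified illustrations written for this example (they are not copied from the actual PSI-MS vocabulary file), and `parse_obo_terms` and `ancestors` are hypothetical helper names, not part of any PSI library.

```python
from typing import Dict, List

# Illustrative OBO-style stanzas (assumption: simplified, not the real file content).
OBO_TEXT = """\
[Term]
id: MS:1000499
name: spectrum attribute

[Term]
id: MS:1000511
name: ms level
is_a: MS:1000499 ! spectrum attribute
"""


def parse_obo_terms(text: str) -> Dict[str, dict]:
    """Collect [Term] stanzas into a dict keyed by accession."""
    terms: Dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"is_a": []}
            continue
        if not line:
            # A blank line closes the current stanza.
            if current and "id" in current:
                terms[current["id"]] = current
            current = None
            continue
        if current is None:
            continue
        key, _, value = line.partition(": ")
        value = value.split(" ! ")[0].strip()  # drop the trailing "! comment"
        if key == "is_a":
            current["is_a"].append(value)
        else:
            current[key] = value
    # Flush the last stanza if the text does not end with a blank line.
    if current and "id" in current:
        terms[current["id"]] = current
    return terms


def ancestors(terms: Dict[str, dict], term_id: str) -> List[str]:
    """Return all accessions reachable through is_a links."""
    seen: List[str] = []
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


terms = parse_obo_terms(OBO_TEXT)
print(terms["MS:1000511"]["name"], ancestors(terms, "MS:1000511"))
```

The formats reference such accessions rather than free text, which is what lets a validator check that a piece of metadata is a legitimate term of the expected kind.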
  • 11. Overview • A couple of slides about the need of data standards • The Proteomics Standards Initiative • Existing data standards • Connection with ProteomeXchange and IMEx
  • 12. The typical dilemma • Data standards need to be stable to promote adoption • Proteomics standards need to evolve very rapidly: • Data is inherently very complex • Experimental techniques are evolving all the time
  • 13. Existing data standards in proteomics (www.psidev.info) • MS data: mzML (also used in MS metabolomics). • Protein and peptide identifications: mzIdentML. • Peptide and protein quantification: mzQuantML (also supports small molecules). • SRM transitions (for targeted proteomics): TraML. • Molecular interactions: PSI MI XML and MITAB. • Latest addition (just published): mzTab, for identification and quantification results for peptides, proteins and small molecules (also used in MS metabolomics).
  • 14. Current PSI Standard File Formats for MS • Final results: mzTab • SRM: TraML • Quantitation: mzQuantML • Identification: mzIdentML • MS data: mzML
  • 15. Data formats for mass spectra data • Binary data • XML-based files: mzData, mzXML, mzML • Peak lists: .dta, .pkl, .mgf, .ms2
  • 16. An example of a success story: mzML • A data format for the storage and exchange of MS output files • Designed by merging the best aspects of both mzData and mzXML • Developed with full participation of academic researchers, hardware and software vendors • Expected to replace mzXML and mzData, but not expected to completely replace vendor binary formats • Captures spectra (raw data or peak lists), chromatograms and related metadata • Version 1.0 released in June 2008, v1.1 released in June 2009 • Many implementations already exist • Version 1.2 with enhanced compression considered for 2014 • Martens et al., MCP, 2011
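As a rough illustration of how an XML format can still carry spectra efficiently, the sketch below round-trips an m/z array through the kind of encoding mzML uses for its binary data arrays: little-endian floats, optionally zlib-compressed, then base64-encoded as text. This is my reading of the mzML 1.1 specification rather than something stated on the slide, and `encode_array` and `decode_array` are hypothetical helper names. In a real file the precision and compression are declared with controlled vocabulary terms next to each array, so a reader should take them from there instead of assuming them.

```python
import base64
import struct
import zlib
from typing import List


def encode_array(values: List[float], compress: bool = True) -> str:
    """Pack values as little-endian 64-bit floats, then zlib + base64."""
    raw = struct.pack("<%dd" % len(values), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text: str, compressed: bool = True,
                 double_precision: bool = True) -> List[float]:
    """Reverse of encode_array; also handles 32-bit float arrays."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    code, size = ("d", 8) if double_precision else ("f", 4)
    count = len(raw) // size
    return list(struct.unpack("<%d%s" % (count, code), raw))


mz_values = [100.5, 200.25, 300.125]
encoded = encode_array(mz_values)
print(encoded)
print(decode_array(encoded))
```

In practice one would use an existing parser library rather than hand-decoding arrays; the sketch only shows why the text payload of a peak list is not human-readable.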
  • 17. An example of a success story: mzML Martens et al., MCP, 2011
  • 18. An example of a success story: mzML • The most popular search engines support mzML • Many parser libraries available • Conversion from raw files into mzML • http://www.psidev.info/mzml_1_0_0
  • 19. Application of mzML to metabolomics
  • 20. Data formats for output from search engines • mzIdentML • Mascot .dat, Sequest .out, SpectrumMill .spo • pep.xml, prot.xml • Only qualitative data!
  • 21. mzIdentML: peptide and protein identifications • Overview • XML-based data standard for peptide and protein identifications, e.g. following database search and protein inference. • Sections for all PSMs, proteins/protein groups inferred, protocols/parameters etc. • Timeline: • Original 1.0 version in Aug 2009. • Version 1.1 stable (Aug 2011). • Manuscript published in MCP in 2012*. • 2012-2015: • Improving support for protein grouping, multiple search engines, pre-fractionation approaches and de novo sequencing. • Now firmly embedded as part of the ProteomeXchange submission process, and supported by lots of external software. * Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., et al., The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 2012, 11, M111.014381.
  • 22. mzIdentML: peptide and protein identifications • MzIdentML (root element), containing:
      • CvList: URLs of controlled vocabularies used within the file
      • AnalysisSoftwareList: Software packages used
      • AnalysisSampleCollection: Biological samples analysed, annotated with CV terms
      • SequenceCollection: Database entries of protein/peptide sequences identified and modifications
      • AnalysisCollection
          • SpectrumIdentification: Application of protocol; inputs = external spectra 1..n; output = SpectrumIdentificationList 1
          • ProteinDetection: Application of protocol; inputs = SpectrumIdentificationList 1..n; output = ProteinDetectionList 1
      • AnalysisProtocolCollection
          • SpectrumIdentificationProtocol: AdditionalSearchParams, ModificationParams, Enzymes, DatabaseFilters
          • ProteinDetectionProtocol: Parameters for the protein detection procedure
      • DataCollection
          • Inputs: The database searched and the input file converted to mzIdentML
          • AnalysisData
              • SpectrumIdentificationList
                  • SpectrumIdentificationResult: All identifications made from searching one spectrum
                      • SpectrumIdentificationItem: One (poly)peptide-spectrum match
              • ProteinDetectionList
                  • ProteinAmbiguityGroup: A set of related protein identifications, e.g. conflicting peptide-protein assignments
                      • ProteinDetectionHypothesis: A single protein identification
    * Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., et al., The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 2012, 11, M111.014381.
  • 23. An example: XML snippet of mzIdentML
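The snippet shown on this slide is not preserved in the transcript. As a stand-in, the following Python sketch parses a heavily trimmed, hand-written fragment that follows the mzIdentML 1.1 element names and lists its peptide-spectrum matches. The identifiers and values (`SIR_1`, `PEP_1`, the m/z numbers) are invented for illustration, the fragment omits most mandatory elements and would not pass schema validation, and `list_psms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace of mzIdentML 1.1 documents.
NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}

# Hand-written, heavily trimmed fragment (assumption: illustrative values only).
SNIPPET = """\
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0"
                                      spectraData_ref="SD_1">
          <SpectrumIdentificationItem id="SII_1" rank="1" chargeState="2"
              peptide_ref="PEP_1" experimentalMassToCharge="500.25"
              calculatedMassToCharge="500.24" passThreshold="true"/>
          <SpectrumIdentificationItem id="SII_2" rank="2" chargeState="2"
              peptide_ref="PEP_2" experimentalMassToCharge="500.25"
              calculatedMassToCharge="500.30" passThreshold="false"/>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>
"""


def list_psms(xml_text: str) -> list:
    """Return one dict per SpectrumIdentificationItem (one PSM each)."""
    root = ET.fromstring(xml_text)
    psms = []
    # Each result groups all identifications made from a single spectrum.
    for result in root.iterfind(".//mzid:SpectrumIdentificationResult", NS):
        for item in result.iterfind("mzid:SpectrumIdentificationItem", NS):
            psms.append({
                "spectrum": result.get("spectrumID"),
                "peptide_ref": item.get("peptide_ref"),
                "rank": int(item.get("rank")),
                "charge": int(item.get("chargeState")),
                "passed": item.get("passThreshold") == "true",
            })
    return psms


for psm in list_psms(SNIPPET):
    print(psm)
```

The nesting mirrors the schema: one SpectrumIdentificationResult per searched spectrum, with ranked SpectrumIdentificationItem children that point to peptides by reference.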
  • 24. Support for mzIdentML • An increasing number of tools support export to mzIdentML 1.1 • [Diagram labels: Search engine results + MS files; Search engines; mzIdentML] • Mascot • MSGF+ • Myrimatch and related tools from D. Tabb’s lab • OpenMS • PEAKS • PeptideShaker (several open source tools) • ProCon (ProteomeDiscoverer, Sequest) • Scaffold • TPP via the idConvert tool (ProteoWizard) • ProteinPilot (from version 5.0) • X!Tandem (from PILEDRIVER version) • Crux • Others: library for X!Tandem conversion, lab internal pipelines, … • Updated list: http://www.psidev.info/tools-implementing-
  • 25. mzIdentML status • Current status of mzIdentML 1.2 • Improved support for protein grouping** • Support for PTM localisation scoring added - COMPLETE • Support for peptide-level statistics added - COMPLETE • Better support for multiple search approaches - COMPLETE • Support for crosslinking approaches - INCOMPLETE • mzIdentML 1.2 submission to document process - INCOMPLETE ** Seymour, S. L., Farrah, T., Binz, P. A., Chalkley, R. J., et al., A standardized framing for reporting protein identifications in mzIdentML 1.2. Proteomics 2014, 14, 2389-2399.
  • 26. mzQuantML: Standard for quantitative data • Overview: • XML-based standard for quantification data, following use of quant software • Can report tables of data (QuantLayers): columns are StudyVariables, Assays or Ratios; rows are ProteinGroups, Proteins or Peptides • Can also capture 2D coordinates of quantified regions in LC-MS (Features) • Timeline: • Version 1.0 rc-1 submitted to the PSI process October 2011; Version 1.0 rc-2 June 2012; resubmitted to PSI process in October 2012 • Completed PSI process in Feb 2013 (version 1.0 release) • Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques (e.g. iTRAQ) and MS1 label techniques (e.g. SILAC) • Schema is fixed, with each technique defined by separate semantic rules, implemented in validator software • Manuscript published in MCP in summer 2013* • Updated in 2013-2014 to support SRM as a new technique** (version 1.0.1 just submitted to the document process). *Walzer et al. MCP 2013 Aug;12(8):2332-40. doi: 10.1074/mcp.O113.028506 **Qi et al. PROTEOMICS, 2015
  • 27. mzQuantML: Standard for quantitative data *Walzer et al. MCP 2013 Aug;12(8):2332-40. doi: 10.1074/mcp.O113.028506 **Qi et al. PROTEOMICS, 2015
  • 28. mzQuantML status Open issues • Updated in 2014 to support absolute quant techniques but not yet added to specification document • Some supporting software but not currently implemented for import in databases (PRIDE?). • No webpage for software implementations.
  • 29. Wide variety of quantification techniques
  • 30. mzQ library • The “mzqLibrary” includes a set of applications/modules to facilitate the use of mzQuantML within analysis pipelines. • The “mzqLibrary” provides common routines for post-processing quantitative proteomics data via the command line interface or a graphical user interface called the “mzqViewer”. • The mzqViewer integrates with R to produce a heat map or principal component analysis (PCA) on selected data tables. Line plots can also be produced showing peptide and protein quant data across MS runs. * Qi et al., Proteomics, 2015
  • 31. mzqLibrary: Converter & Exporter • The mzqLibrary can convert output of Progenesis (*.csv files), MaxQuant (*.txt files), and OpenMS (*.consensusXML files) into mzQuantML (*.mzq). • The mzqLibrary can convert an mzQuantML file to four file formats (XLS, CSV, HTML, and mzTab).
  • 32. The last addition: mzTab – Aims and concept • To provide a simple and efficient way of exchanging results from MS approaches. • Simpler summary report of the experimental results • Peptides and proteins identified in a given experimental setting • Small molecules identified • Reported quantification values • Technical and biological metadata • Easier to parse and use by the research community, systems biologists as well as providers of knowledge bases. • It can be used by non-experts in bioinformatics. • It does not aim to replace mzIdentML and mzQuantML
  • 33. mzTab - Sections • Metadata: Basic information about experiment and sample (key-value pairs) • Protein: Basic information about protein identifications (table-based) • Peptide: Information about quantified peptides (table-based) • PSM: Information about identified spectra (table-based) • Small Molecule: Basic information about identified small molecules (table-based) • Griss et al., MCP, 2014
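The section layout above can be sketched in a few lines of Python. In mzTab 1.0 each line starts with a short prefix that names its section (`MTD` for metadata, `PRH`/`PRT` for the protein header and rows, and likewise `PEH`/`PEP`, `PSH`/`PSM`, `SMH`/`SML`), and fields are tab-separated. The sample content below is invented and far smaller than a valid file, which needs more metadata and columns; `parse_mztab` is a hypothetical helper, not the jmzTab API.

```python
from typing import Dict, List, Tuple

# Invented minimal rows (assumption: not a complete, valid mzTab file).
ROWS = [
    ["MTD", "mzTab-version", "1.0.0"],
    ["MTD", "description", "Hypothetical example file"],
    ["PRH", "accession", "description", "best_search_engine_score[1]"],
    ["PRT", "P12345", "Example protein A", "0.001"],
    ["PRT", "Q67890", "Example protein B", "0.01"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

# Row prefix -> header prefix for the four table-based sections.
HEADER_FOR = {"PRT": "PRH", "PEP": "PEH", "PSM": "PSH", "SML": "SMH"}


def parse_mztab(text: str) -> Tuple[Dict[str, str], Dict[str, List[dict]]]:
    """Split an mzTab text into a metadata dict and per-section row dicts."""
    metadata: Dict[str, str] = {}
    headers: Dict[str, List[str]] = {}
    tables: Dict[str, List[dict]] = {prefix: [] for prefix in HEADER_FOR}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD":
            # Metadata lines are key-value pairs.
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_FOR.values():
            headers[prefix] = fields[1:]
        elif prefix in HEADER_FOR:
            columns = headers[HEADER_FOR[prefix]]
            tables[prefix].append(dict(zip(columns, fields[1:])))
        # Comment lines (COM) and unknown prefixes are ignored in this sketch.
    return metadata, tables


metadata, tables = parse_mztab(SAMPLE)
print(metadata["mzTab-version"])
for protein in tables["PRT"]:
    print(protein["accession"], protein["best_search_engine_score[1]"])
```

Because every line is self-describing, the same file also opens directly in a spreadsheet, which is the point of the format for non-expert users.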
  • 34. Metadata section - Example
  • 35. Protein Section (label-free)
  • 36. mzTab – Current implementations • jmzTab (Java API): Version 3.0 is now a stable version. Manuscript published in the journal Proteomics. • mzTab Validator, PRIDE XML to mzTab converter (PRIDE team). • mzIdentML and mzQuantML to mzTab converters (Andy Jones group). • Mascot (Matrix Science) exporter. • MaxQuant: exporter in beta is available. • OpenMS (version 1.10). • R/Bioconductor package MSnbase (L. Gatto, Cambridge University). • LipidDataAnalyzer (J. Hartler, University of Graz, see next talk). • MetaboLights (EBI).
  • 37. mzTab – ongoing development in metabolomics • More detailed modelling of MS metabolomics data • Led by S. Neumann (COSMOS EU FP7 project). • Extension from one to three sections. Example file exists at https://github.com/sneumann/mtbls2/faahKO.mzTab http://www.cosmos-fp7.eu/
  • 38. Unify exchange of transitions with TraML • PSI’s TraML (Transitions Markup Language) • Format for encoding SRM/MRM transitions • Version 1.0.0 now released and published in MCP (Deutsch et al. 2012) • [Figure: TraML linking journal articles, transitions databases, Excel sheets, SRM analysis software and instruments]
  • 39. Unify exchange of transitions with TraML Deutsch et al., MCP, 2012
  • 40. TraML software implementations • Java library for working with TraML files. It aims to provide: • Command line & simple GUI • TraML to TSV <-> TSV to TraML • TSV vendor formats from TSQ, QTRAP5500, AgilentQQQ • Published: Helsens et al., JPR, 2011
  • 41. PSI document process • Every data standard has to undergo a thorough review process… • In fact, in practice, two review processes happen in parallel: the PSI and manuscript review.
  • 42. Data standard publications • mzML (data standard for MS data): Martens et al., MCP, 2011 • mzIdentML (standard for peptide/protein IDs): Jones et al., MCP, 2012 • TraML (for SRM transitions): Deutsch et al., MCP, 2012 • mzQuantML (for quantitative data): Walzer et al., MCP, 2013 • mzTab (peptide/protein ID and quantification): Griss et al., MCP, 2014 • Some updates already going on (e.g. mzIdentML 1.2)
  • 43. Importance of making software available • jmzML (http://code.google.com/p/jmzml/): Cote et al., Proteomics, 2009 • jmzIdentML (http://code.google.com/p/jmzidentml/): Reisinger et al., Proteomics, 2012 • jmzReader (http://code.google.com/p/jmzreader/): Griss et al., Proteomics, 2012 • jmzQuantML (http://code.google.com/p/jmzquantml/): Qi et al., Proteomics, 2014 • jmzTab (http://code.google.com/p/mztab/): Xu et al., Proteomics, 2014 • PSI promotes implementations. The reference libraries are always open source.
  • 44. And also… protein-protein interactions PSI-XML: XML-based format • Version 2.5 is the working version • Version 3.0 under development MITAB: tab-delimited format
  • 45. Overview • A couple of slides about the need of data standards • The Proteomics Standards Initiative • Existing data standards • Connection with ProteomeXchange and IMEx
  • 46. ProteomeXchange Consortium • Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories. • Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego). • EU FP7 CA (01/2011-> 06/2014). • Common identifier space (PXD identifiers) • Two supported data workflows: MS/MS and SRM. • Main objective: Make life easier for researchers http://www.proteomexchange.org Vizcaíno et al., Nat Biotechnol, 2014
  • 47. MBInfo The IMEx Consortium (www.imexconsortium.org) Orchard et al., Nat Methods, 2012
  • 48. Conclusions • The PSI is a very active group. • Big variety of data types in proteomics -> Several data standards available. • Adoption of standards (as usual) takes some time. • Public databases greatly benefit from them.
  • 49. Next meeting in Ghent in April!
  • 50. Questions?