Consensus ranking and fragmentation prediction for identification of unknowns in high resolution mass spectrometry
Office of Research and Development
National Center for Computational Toxicology, RTP, NC August 20, 2018
Andrew D. McEachran
Hussein Al-Ghoul, Ilya Balabin, Tommy Cathey, Alex Chao, Jon Sobus, and Antony J. Williams
http://orcid.org/0000-0003-1423-330X
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
AGRO 107
General Goals of SSA/NTA
- 1 Dust Sample
- Negative Ionization Mode
- 300 Extracted “Molecular Features”
1) Prioritize “Molecular Features”
2) Correctly assign formulas
3) Correctly assign structures
4) Identify chemical sources
5) Predict chemical concentrations
[Figure: steps (1) to (5) applied to one molecular feature (C17H19NO3, 12 µg/g), leading to EXPOSURE]
8/20/18
C17H19NO3 … plus 100s more… (Schymanski, et al. 2014)
The General Approach
• Analytical Instruments
• Computational Tools & Workflows
• Databases
CompTox Dashboard
https://comptox.epa.gov
C10H14N2
Batch Search for SSA/NTA
MS-Ready Structures improve database searching (McEachran, et al., accepted manuscript)
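MS-Ready structures are database forms prepared for mass-spectrometry searching: in general terms, salts and solvents are stripped, mixtures are split into components, charges are neutralized, and stereochemistry is removed, so several registered substances map onto the one neutral form an instrument can observe. The sketch below is a deliberately crude, string-level illustration of only two of those steps (keeping the largest component and dropping stereo marks). It is not the published MS-Ready workflow: it assumes the input SMILES are already canonical, does no charge neutralization or re-canonicalization, and `naive_ms_ready`, `group_by_ms_ready`, and the example records are hypothetical.

```python
import re
from collections import defaultdict


def naive_ms_ready(smiles: str) -> str:
    """Crude, hypothetical stand-in for an MS-Ready transformation.

    Assumes canonical input SMILES. Does NOT neutralize charges, handle
    isotopes, or re-canonicalize the result as a real toolkit would.
    """
    # Salts and mixtures are written as dot-separated components;
    # keep the largest component (by string length) as the "parent".
    parent = max(smiles.split("."), key=len)
    # Drop tetrahedral (@, @@) and double-bond (/, \) stereo markers.
    return re.sub(r"[@/\\]", "", parent)


def group_by_ms_ready(records):
    """Map each MS-Ready form to the substance names that collapse onto it."""
    groups = defaultdict(list)
    for name, smiles in records:
        groups[naive_ms_ready(smiles)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records: a parent compound and its salt form (C10H14N2 parent).
    records = [
        ("(S)-nicotine", "CN1CCC[C@H]1c1cccnc1"),
        ("nicotine hydrochloride", "CN1CCC[C@H]1c1cccnc1.Cl"),
    ]
    print(group_by_ms_ready(records))
```

Both records collapse onto a single search form, which is the effect MS-Ready forms have on mass-based database searching: the salt is no longer a separate candidate.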
Data Source Ranking of “known unknowns”
• Mass and/or formula unknown to a researcher, but contained within a reference database
• The most likely candidate chemicals have the most references/sources
[Figure: C14H22N2O3 (266.16304) → Chemical Reference Database → sorted candidate structures]
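The example pairs the formula C14H22N2O3 with 266.16304, which is its neutral monoisotopic mass: the sum of the most abundant isotope masses of its atoms. The sketch below checks that number and then applies the ranking rule stated above (most data sources first). Assumptions: the element table covers only C, H, N and O, the parser handles only simple formulas without parentheses or charges, and `monoisotopic_mass`, `rank_by_data_sources` and the candidate names and counts are hypothetical, not part of any Dashboard API.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; only the
# elements needed for this illustration are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C14H22N2O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def rank_by_data_sources(candidates):
    """Sort candidate structures so those with the most data sources come first."""
    return sorted(candidates, key=lambda c: c["data_sources"], reverse=True)


if __name__ == "__main__":
    print(round(monoisotopic_mass("C14H22N2O3"), 5))
    # Hypothetical candidates sharing that formula, with made-up source counts.
    candidates = [
        {"name": "candidate A", "data_sources": 12},
        {"name": "candidate B", "data_sources": 87},
        {"name": "candidate C", "data_sources": 3},
    ]
    print([c["name"] for c in rank_by_data_sources(candidates)])
```

Sorting on a single count is the simplest version of the idea; it works because well-studied chemicals tend to be the ones that actually turn up in samples.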
Initial Data Source Ranking in ChemSpider
• Adopted by NTA researchers around the world
• On the same 162 chemicals, the Dashboard outperforms ChemSpider
Additional Data Streams to Improve Identifications
• US EPA CompTox Dashboard Data Sources (DS)
• PubChem Data Source Count
• PubMed Reference Count
• Presence in STOFF-IDENT Database
• Predicted Environmental Media Occurrence
• OPERA PhysChem Properties
• NORMAN Network Priority List
All available via Batch Search:
Identification ranks for 1783 chemicals using multiple data streams
$$S_{\mathrm{Total}} = S_{D} + S_{P} + S_{R} + S_{M} + \cdots$$
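The slide gives only the additive form of the consensus score (a total built as a sum of one term per data stream); how each per-stream term is computed is not stated here. The sketch below is one hypothetical way to make the sum concrete: each stream is scaled to [0, 1] by dividing by its maximum over the candidate set, and the scaled values are summed. The max-scaling, the stream keys (`dashboard_sources`, `pubchem_sources`, `pubmed_refs`) and the candidate values are all assumptions for illustration, not the published weighting.

```python
def consensus_rank(candidates, streams):
    """Rank candidates by a summed score over several data streams.

    Hypothetical scheme: each stream is divided by its maximum over the
    candidate set, so every stream contributes at most 1.0 to the total.
    """
    maxima = {s: max(c[s] for c in candidates) for s in streams}
    scored = []
    for c in candidates:
        # A stream that is zero for every candidate contributes nothing.
        total = sum(c[s] / maxima[s] if maxima[s] else 0.0 for s in streams)
        scored.append((c["name"], total))
    # Highest combined score gets rank 1.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    streams = ["dashboard_sources", "pubchem_sources", "pubmed_refs"]
    # Made-up candidate values for illustration only.
    candidates = [
        {"name": "A", "dashboard_sources": 20, "pubchem_sources": 50, "pubmed_refs": 10},
        {"name": "B", "dashboard_sources": 30, "pubchem_sources": 40, "pubmed_refs": 0},
        {"name": "C", "dashboard_sources": 5, "pubchem_sources": 100, "pubmed_refs": 2},
    ]
    for name, score in consensus_rank(candidates, streams):
        print(name, round(score, 3))
```

In this toy data, candidate B would rank first on Dashboard sources alone, but A ranks first once the other streams are added, which is the kind of reordering a multi-stream score is meant to produce.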
Metadata is critical, but structural confirmation is needed to increase confidence (Schymanski, et al. 2014)
MS/MS Spectral Matching for Identification
[Figure: library fragmentation spectrum (20 eV) compared with observed fragmentation spectrum (20 eV) to give a match score]
CFM-ID
• Fragmentation prediction for identification in HRMS
• Open-source code supports MS/MS spectrum prediction for ESI+, ESI-, and EI
• Predictions generated and stored for >700,000 structures, to be accessible via the CompTox Dashboard
• Python code to pull matches and score experimental vs. predicted spectra
• Cosine dot product match score calculation
(Allen, et al. 2014, 2015, 2016)
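A cosine (normalized dot product) score compares two spectra as intensity vectors: peaks that line up in m/z multiply together, and the result is divided by the product of the two spectra's norms, so identical spectra score 1 and spectra with no shared peaks score 0. The sketch below is a hypothetical illustration, not the authors' Python code. It borrows the 0.02 Da fragment match threshold listed for the CASMI 2016 evaluation, and its pairing rule (each experimental peak takes the closest unused predicted peak within tolerance) and use of raw, unweighted intensities are assumptions. Many spectral-matching tools weight intensity and m/z differently.

```python
import math


def cosine_match_score(experimental, predicted, tol=0.02):
    """Cosine similarity between two centroided spectra.

    Each spectrum is a list of (m/z, intensity) pairs. Peaks are paired
    greedily: each experimental peak takes the closest unused predicted
    peak within `tol` Da. Unpaired peaks add nothing to the dot product
    but still count toward each spectrum's norm, which penalizes them.
    """
    used = set()
    dot = 0.0
    for mz_e, int_e in experimental:
        best_j, best_diff = None, None
        for j, (mz_p, _) in enumerate(predicted):
            diff = abs(mz_e - mz_p)
            if j not in used and diff <= tol and (best_diff is None or diff < best_diff):
                best_j, best_diff = j, diff
        if best_j is not None:
            used.add(best_j)
            dot += int_e * predicted[best_j][1]
    norm_e = math.sqrt(sum(i * i for _, i in experimental))
    norm_p = math.sqrt(sum(i * i for _, i in predicted))
    if norm_e == 0.0 or norm_p == 0.0:
        return 0.0
    return dot / (norm_e * norm_p)


if __name__ == "__main__":
    # Made-up peak lists: one shared fragment, one unmatched fragment each.
    observed = [(100.0, 1.0), (200.0, 1.0)]
    in_silico = [(100.01, 1.0), (300.0, 1.0)]
    print(round(cosine_match_score(observed, in_silico), 3))
```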
Predicted MS/MS spectra provide greater coverage than empirical libraries (Chao, et al., in prep)
Evaluating on CASMI 2016
• Critical Assessment of Small Molecule Identification (CASMI)
– Training data = 312 peak lists (from 285 substances)
  • 234 MS/MS in positive mode
  • 58 in negative mode
– Challenge data = 208 peak lists (from 188 substances)
  • 127 in positive mode
  • 81 in negative mode
• Precursor ion search window = 15 ppm
• Fragment ion match threshold = 0.02 Da
• Candidates limited to Dashboard results within the precursor ion search window
http://www.casmi-contest.org/2016/index.shtml
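A ppm window scales with mass: a tolerance of 15 ppm around a neutral mass m spans m ± m × 15 × 10⁻⁶, so at 266.16304 the half-width is about 0.0040 Da. The sketch below computes that window and keeps only candidates inside it, mirroring the "candidates limited to results within the precursor ion search window" step. It assumes the neutral mass has already been derived from the observed precursor ion (i.e., the adduct has been accounted for); `ppm_window`, `within_window` and the candidate masses are hypothetical.

```python
def ppm_window(mass: float, ppm: float = 15.0):
    """Return (low, high) bounds for a +/- ppm tolerance around `mass`."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


def within_window(candidates, mass, ppm=15.0):
    """Keep (name, neutral_mass) candidates whose mass falls inside the window."""
    low, high = ppm_window(mass, ppm)
    return [c for c in candidates if low <= c[1] <= high]


if __name__ == "__main__":
    low, high = ppm_window(266.16304)
    print(round(low, 5), round(high, 5))
    # Made-up candidate masses for illustration.
    candidates = [("candidate A", 266.16304), ("candidate B", 266.1612), ("candidate C", 266.1500)]
    print([name for name, _ in within_window(candidates, 266.16304)])
```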
CASMI 2016 Contest Challenge Set (n=208)

Rank       CFM-ID only                  CFM-ID + DSSTox Data Sources
           # Identified   % of Total    # Identified   % of Total
#1 Hits    89             43%           154            74%
Top 5      154            74%           195            94%
Top 10     174            84%           198            95%
Top 20     190            91%           202            97%
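Each percentage in the table is the number of challenges whose correct structure ranked within the top k, divided by the 208 challenges. The sketch below shows that bookkeeping: `top_k_counts` tallies counts from a list of per-challenge ranks (the ranks in the usage line are hypothetical), and `percent` reproduces the table's percentages from its reported counts. Both helper names are illustrative.

```python
def top_k_counts(ranks, ks=(1, 5, 10, 20)):
    """Count challenges whose correct structure ranked within the top k.

    `ranks` holds the 1-based rank of the correct structure per challenge,
    or None when it was absent from the candidate list.
    """
    return {k: sum(1 for r in ranks if r is not None and r <= k) for k in ks}


def percent(count, n):
    """Whole-number percentage, as reported in the table."""
    return round(100 * count / n)


if __name__ == "__main__":
    print(top_k_counts([1, 3, 7, 25, None]))  # hypothetical ranks
    # Reported counts for n = 208 challenges.
    print([percent(c, 208) for c in (89, 154, 174, 190)])   # CFM-ID only
    print([percent(c, 208) for c in (154, 195, 198, 202)])  # CFM-ID + DSSTox Data Sources
```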
Access via CompTox Dashboard
• Data available for download after publication (McEachran, et al., in prep)
Mockup, work in progress…
Dashboard in NTA Workflows
• Identification by Data Source Ranking
• Retention Time Prediction
• MS/MS Data and Fragmentation Prediction
• Environmental Media Occurrence
• Functional Use/Product Occurrence
Future Directions
• Combined data visualization
• Retention time index (RTI) predictions
• Ongoing expansion of the database
• Integration with public MS databases
[Figure: Predicted RTI vs. Experimental RTI (ACN method), both axes 0 to 1000]
Conclusions
• Databases are effective resources in SSA/NTA
• The CompTox Dashboard provides access to chemistry data for >760,000 chemical substances
• Predicted MS/MS spectra linked within the CompTox Dashboard further enhance effectiveness and increase confidence in identifications
Acknowledgements
EPA NCCT
Tony Williams
Chris Grulke
Jeff Edwards
EPA NERL
Katherine Phillips
Kristin Isaacs
Kathie Dionisio
Jon Sobus
Mark Strynar
Elin Ulrich
Seth Newton
External Collaborators
Emma Schymanski, Univ. Luxembourg
Christoph Ruttkies, IPB Halle
Kamel Mansouri, ILS, Inc.
Questions?
• mceachran.andrew@epa.gov
• http://orcid.org/0000-0003-1423-330X
• Associated presentations:
– AGRO 29: Leveraging chemistry data to improve exposure analyses using the EPA’s CompTox Chemistry Dashboard
– ANYL 100: Developing tools for high resolution mass spectrometry-based screening via the EPA’s CompTox Chemistry Dashboard
– ENVR 152: EPA CompTox Chemistry Dashboard as a data integration hub for environmental chemistry data
