The US-EPA CompTox Chemicals Dashboard
– a key player in the domain of Open Science,
Cheminformatics, and Online Databases
supporting Non-Targeted Screening
Pittcon, March 2020
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Antony J. Williams, Elin Ulrich, Joachim Pleil,
Alex Chao, Charles Lowe, Andrew McEachran*
and Jon Sobus
Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
*Agilent Inc.
Outline
• Quick overview of the dashboard
• Specific data of interest to this audience
(it’s not just Computational Toxicology)
• Support for Mass Spectrometry
• Data quality in the public domain
• Work in progress – prototypes
1
2
SEARCH
TOX DATA
BIOACTIVITY
SIMILARITY
READ-ACROSS
PUBMED
BATCH SEARCH
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
BASIC Search
3
Detailed Chemical Pages
4
Properties, Fate and Transport
5
Sources of Exposure to Chemicals
6
Identifiers to Support Searches
7
Link Access
8
Mass Spec Links
9
NIST WebBook
https://webbook.nist.gov/chemistry/
10
MassBank of North America
https://mona.fiehnlab.ucdavis.edu
11
Batch Searching
12
Aggregate data for a list of chemicals
13
Batch Search Names
14
Excel
Download
Add Other Data of Interest
15
Chemical Lists of Interest…
16
225 Chemical Lists (and growing)
17
“Volatilome” Human Breath
18
Tire Crumb Rubber (298)
19
Hydraulic Fracturing (1640)
20
Opioids and Metabolites (160)
21
Disinfection By-Products
22
PFAS lists of Chemicals
23
Mycotoxins
• Two lists: 328 and 88 members
24
BIG databases are GREAT!
[Chart: numbers of chemical substances (log scale, 10^4 to 10^9) in PubChem, CAS Registry, ChemSpider, EPA DSSTox and Blood Exposome]
• Thanks to all of the public database efforts
• So much benefit from what’s been done
• There are hundreds of them at this point…
Data Quality is important
• Data quality in free web-based databases!
26
Vomitoxin
27
Vomitoxin - ChemSpider
• 19 “Vomitoxins” – 3 isotopically labeled
28
Vomitoxin – PubChem
29
• 33 unique InChIKeys
PubChem – “virtual chemistry”
• Other databases grow quickly, with a lot of “virtual chemistry” and “make on demand” compounds
• Vomitoxin has 7 ZINC stereoforms
• The Dashboard database grows slowly (next release: +20k chemicals in 6 months)
30
“MS-Ready” Structures
31
Overview of MS-Ready Structures
• All structure-based chemical substances are algorithmically processed to:
– Split multicomponent chemicals into individual structures
– Desalt and neutralize individual structures
– Remove stereochemical bonds from all chemicals
• MS-Ready structures are then mapped to the original substances, providing a path from chemicals detected by mass spectrometry back to the original substances
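The MS-Ready steps above can be illustrated with a deliberately simplified sketch that works on SMILES strings using plain string edits. It splits on ".", drops a small hypothetical counterion list (COUNTERIONS), strips stereo marks and isotope labels, and then maps each resulting form back to every original substance. It does not neutralize charges or canonicalize SMILES. A real pipeline would do both with a cheminformatics toolkit (for example RDKit) and compare canonical identifiers such as InChIKeys. The substance IDs are made up.

```python
import re
from collections import defaultdict

# Hypothetical counterion list for illustration only; the actual MS-Ready
# workflow uses its own curated desalting and neutralization rules.
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]", "Cl", "[Br-]"}


def strip_stereo_and_isotopes(smiles: str) -> str:
    """Remove stereo marks (@, /, \\) and isotope mass numbers from a SMILES string."""
    smiles = re.sub(r"@+", "", smiles)                  # tetrahedral stereo
    smiles = smiles.replace("/", "").replace("\\", "")  # double-bond stereo
    return re.sub(r"\[\d+", "[", smiles)                # e.g. [13CH3] -> [CH3]


def ms_ready_components(smiles: str) -> list[str]:
    """Split into components, drop listed counterions, strip stereo and isotopes.

    Charged components are NOT neutralized here; that needs a toolkit,
    not string edits. Output strings are not canonical SMILES.
    """
    parts = [p for p in smiles.split(".") if p not in COUNTERIONS]
    return sorted({strip_stereo_and_isotopes(p) for p in parts})


def build_ms_ready_index(substances: dict[str, str]) -> dict[str, set[str]]:
    """Map each MS-Ready form back to every original substance that yields it."""
    index: dict[str, set[str]] = defaultdict(set)
    for substance_id, smiles in substances.items():
        for component in ms_ready_components(smiles):
            index[component].add(substance_id)
    return dict(index)


if __name__ == "__main__":
    substances = {
        "SUB-1": "N[C@@H](C)C(=O)O",      # one stereoisomer
        "SUB-2": "N[C@H](C)C(=O)O",       # the other stereoisomer
        "SUB-3": "N[C@@H](C)C(=O)O.Cl",   # hydrochloride salt
        "SUB-4": "N[13C@@H](C)C(=O)O",    # 13C-labeled form
    }
    print(build_ms_ready_index(substances))
```

All four hypothetical inputs collapse onto the same MS-Ready string, and the index then leads from that single detected form back to each original substance.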
32
33
MS-Ready Mappings from Details Page
34
MS-Ready Mappings: Set of 20 substances for “PFOS”
35
Mass and Formula Searching
36
Advanced Searches
Mass Search
37
Advanced Searches
Mass Search
38
MS-Ready Structures for Formula Search
39
MS-Ready Mappings
• EXACT Formula: C10H16N2O8: 3 Hits
40
MS-Ready Mappings
• Same Input Formula: C10H16N2O8
• MS Ready Formula Search: 125 Chemicals
41
MS-Ready Mappings
• 125 chemicals returned in total
– 8 of the 125 are single component chemicals
– 3 of the 8 are isotope-labeled
– 3 are neutral compounds and 2 are charged
• Multiple components, stereo, isotopes and
charge all collapsed and mapped through
MS-Ready
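The exact versus MS-Ready formula distinction can be made concrete with a minimal sketch. It assumes each substance record already carries both its original formula and a precomputed MS-Ready formula (the records and IDs are hypothetical). Formulas are parsed into element counts so element order does not matter, and monoisotopic_mass() sums standard monoisotopic element masses. The parser only handles simple formulas without brackets or dot-separated components.

```python
import re
from collections import Counter

# Monoisotopic masses (Da) of the most abundant isotope of selected elements.
MONOISOTOPIC = {
    "H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.99491461956,
    "F": 18.99840322, "Na": 22.9897692809, "P": 30.97376163,
    "S": 31.97207100, "Cl": 34.96885268,
}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C10H16N2O8' into element counts."""
    counts: Counter = Counter()
    pos = 0
    for match in TOKEN.finditer(formula):
        if match.start() != pos:  # reject anything between element tokens
            raise ValueError(f"cannot parse formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum of monoisotopic element masses; raises KeyError for unlisted elements."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def formula_search(query: str, records: list[tuple[str, str, str]],
                   use_ms_ready: bool = False) -> list[str]:
    """records are (substance_id, formula, ms_ready_formula); match on element counts."""
    target = parse_formula(query)
    column = 2 if use_ms_ready else 1
    return sorted(rec[0] for rec in records if parse_formula(rec[column]) == target)


if __name__ == "__main__":
    records = [
        ("SUB-A", "C10H16N2O8", "C10H16N2O8"),     # neutral parent
        ("SUB-B", "C10H14N2Na2O8", "C10H16N2O8"),  # disodium salt of the parent
        ("SUB-C", "C10H12N2O8", "C10H12N2O8"),     # unrelated formula
    ]
    print(formula_search("C10H16N2O8", records))                     # exact formula
    print(formula_search("C10H16N2O8", records, use_ms_ready=True))  # via MS-Ready
```

An exact search for C10H16N2O8 finds only the record whose own formula matches, while the MS-Ready search also returns the salt form. This is the same effect that takes the slides from 3 hits to 125. As a check, monoisotopic_mass("C14H22N2O3") evaluates to about 266.16304.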
42
Batch Searching Mass and Formula
43
Batch Searching
• Singleton searches are useful but we work
with thousands of masses and formulae!
• Typical questions
– What is the list of chemicals for the formula CxHyOz?
– What is the list of chemicals for a mass +/- error?
– Can I get chemical lists in Excel files? In SDF files?
– Can I include properties in the download file?
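Batch mass searching mostly comes down to turning each mass and a ppm tolerance into a window, then collecting every reference mass inside it. The following minimal sketch assumes the reference set is an in-memory list of (identifier, monoisotopic mass) pairs with hypothetical IDs. Sorting once and using bisect keeps each lookup logarithmic rather than a full scan. For a 228.115 +/- 5.0 ppm search, the window is 228.115 × 5.0 × 10^-6 ≈ 0.00114 Da either side.

```python
import bisect


def ppm_window(mass: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) mass bounds for a +/- ppm tolerance."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


def batch_mass_search(
    queries: list[float],
    reference: list[tuple[str, float]],
    ppm: float = 5.0,
) -> dict[float, list[str]]:
    """Return, for each query mass, the reference IDs within the ppm window."""
    ordered = sorted(reference, key=lambda item: item[1])  # sort once by mass
    masses = [mass for _, mass in ordered]
    hits: dict[float, list[str]] = {}
    for query in queries:
        low, high = ppm_window(query, ppm)
        start = bisect.bisect_left(masses, low)   # first mass >= low
        end = bisect.bisect_right(masses, high)   # first mass > high
        hits[query] = [cid for cid, _ in ordered[start:end]]
    return hits


if __name__ == "__main__":
    reference = [("ID-1", 228.1150), ("ID-2", 228.1170), ("ID-3", 266.16304)]
    print(batch_mass_search([228.115, 266.163], reference))
```

ID-2 is excluded for the first query because it lies about 0.002 Da away, outside the roughly 0.00114 Da window.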
44
Batch Searching Formula/Mass
45
Batch searching using MS-Ready formula (or mass) search
46
Batch Search in specific lists
47
Benefits of bringing it all together
• The true dashboard benefit is integration
• Rank potential candidates for toxicity using
available data – hazard, exposure, in vitro
48
Candidate ranking using metadata
49
Data Source Ranking of “known unknowns”
50
• A mass and/or formula search targets a chemical that is unknown to the analyst but “known” in the sense that it is contained in a reference database
• The most likely candidates are those with the most associated data sources, the most associated literature articles, or both
[Diagram: an unknown with formula C14H22N2O3 and mass 266.16304 is searched against a chemical reference database, giving sorted candidate structures]
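The "most data sources, then most literature" heuristic above can be sketched as a two-level sort. The Candidate fields and names below are hypothetical placeholders for whatever counts a reference database exposes. Ties on data sources are broken by literature count and then by name so that the order is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical candidate identifier
    data_sources: int  # number of data sources that list the chemical
    pubmed_refs: int   # number of associated literature articles


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Most data sources first; ties broken by literature count, then name."""
    return sorted(candidates, key=lambda c: (-c.data_sources, -c.pubmed_refs, c.name))


if __name__ == "__main__":
    pool = [Candidate("cand-A", 3, 10), Candidate("cand-B", 12, 0), Candidate("cand-C", 12, 40)]
    print([c.name for c in rank_candidates(pool)])  # ['cand-C', 'cand-B', 'cand-A']
```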
Data Streams for Ranking
• CompTox Dashboard Data Sources
• PubChem Data Source Count
• PubMed Reference Count
• ToxCast in vitro bioactivity
• Presence in CPDat database
• OPERA PhysChem Properties
• Other possibilities – predicted media
occurrence, frequency of InChIs online
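When several of these streams are available, one simple way to combine them is to min-max normalize each stream across the candidate set and take a weighted sum. This is not necessarily the weighting used in the Dashboard work. Stream names and weights below are hypothetical, and missing values are treated as 0. Heavily skewed counts such as PubMed references might be better log-transformed first.

```python
def consensus_ranking(
    candidates: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Weighted sum of min-max normalized streams; best candidate first."""
    if not candidates:
        return []
    scores = {cid: 0.0 for cid in candidates}
    for stream, weight in weights.items():
        values = [streams.get(stream, 0.0) for streams in candidates.values()]
        low, high = min(values), max(values)
        span = high - low
        for cid, streams in candidates.items():
            # A stream with identical values for every candidate adds nothing.
            normalized = (streams.get(stream, 0.0) - low) / span if span else 0.0
            scores[cid] += weight * normalized
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    candidates = {
        "A": {"data_sources": 10, "pubmed_refs": 50, "toxcast_active": 3},
        "B": {"data_sources": 2, "pubmed_refs": 0},
        "C": {"data_sources": 5, "pubmed_refs": 5, "toxcast_active": 1},
    }
    weights = {"data_sources": 1.0, "pubmed_refs": 0.5, "toxcast_active": 0.25}
    print(consensus_ranking(candidates, weights))
```

Min-max scaling keeps each stream in the range 0 to 1, so the weights, rather than the raw magnitudes of the counts, control how much each stream contributes.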
Search 228.115 +/- 5.0 ppm
234 single component chemicals
52
Search 228.115 +/- 5.0 ppm
234 single component chemicals
53
The original ChemSpider work
54
Is a bigger database better?
55
• ChemSpider contained 26 million chemicals at the time of the original work
• Much BIGGER today
• Is bigger better?
• Are there other metadata to use for ranking?
Comparing Search Performance
56
• When the Dashboard contained 720k chemicals
• Only 3% of the size of ChemSpider
• How did performance compare?
SAME dataset for comparison
57
How did performance compare?
58
For the same 162 chemicals, the Dashboard outperforms ChemSpider for both mass and formula ranking
Identification ranks for 1783 chemicals using multiple data streams
59
DS: Data Sources
PC: PubChem
PM: PubMed
STOFF: DB
KEMI: DB
Data sources alone rank ~75% of the chemicals as the top hit
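Performance comparisons like these are usually summarized as the share of chemicals whose correct structure is ranked first, or within the top k. The following small sketch shows that bookkeeping. It assumes each query yields an ordered candidate list and that the true identity is known. A correct structure missing from the candidate list counts as a miss. The IDs in the usage section are hypothetical.

```python
from typing import Optional, Sequence


def rank_of_true(candidates: Sequence[str], true_id: str) -> Optional[int]:
    """1-based rank of the correct structure in an ordered list, or None if absent."""
    for position, candidate in enumerate(candidates, start=1):
        if candidate == true_id:
            return position
    return None


def top_k_rate(ranks: Sequence[Optional[int]], k: int = 1) -> float:
    """Fraction of queries whose correct structure is ranked within the top k."""
    if not ranks:
        return 0.0
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


if __name__ == "__main__":
    results = [(["a", "b"], "a"), (["c", "d", "e"], "d"), (["f"], "z"), (["g", "h"], "g")]
    ranks = [rank_of_true(cands, truth) for cands, truth in results]
    print(ranks, top_k_rate(ranks, 1), top_k_rate(ranks, 2))  # [1, 2, None, 1] 0.5 0.75
```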
“UVCB” Chemicals
60
UVCB Chemicals
61
UVCBs: a challenge in non-target analysis
62
Homologue screening plots from Swiss wastewater (Schymanski et al. 2014, left) and Novi Sad (right)
• Complex mixtures (UVCBs) are a huge and very challenging part of the unknowns in many environmental samples
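Homologue series such as those in UVCB mixtures differ by repeating units. One widely used way to spot them is the Kendrick mass defect, swapped in here for illustration; it is not necessarily the method behind the plots above. The idea is to rescale masses so the repeat unit has an integer mass, after which members of the same series share the same defect. The following minimal sketch uses a CH2 repeat unit (14.01565 Da) and groups masses whose defects agree within a tolerance. The example masses are made up.

```python
CH2_MASS = 14.01565  # monoisotopic mass of a CH2 repeat unit (Da)


def kendrick_mass_defect(mass: float, unit: float = CH2_MASS, nominal_unit: int = 14) -> float:
    """Kendrick mass defect: nominal Kendrick mass minus Kendrick mass."""
    kendrick_mass = mass * nominal_unit / unit
    return round(kendrick_mass) - kendrick_mass


def group_homologues(
    masses: list[float], tol: float = 0.002, unit: float = CH2_MASS, nominal_unit: int = 14
) -> list[list[float]]:
    """Cluster masses whose Kendrick mass defects agree within tol (single linkage)."""
    if not masses:
        return []
    scored = sorted((kendrick_mass_defect(m, unit, nominal_unit), m) for m in masses)
    groups = [[scored[0][1]]]
    for (prev_kmd, _), (kmd, mass) in zip(scored, scored[1:]):
        if kmd - prev_kmd <= tol:
            groups[-1].append(mass)  # same defect: likely the same series
        else:
            groups.append([mass])    # defect jump: start a new series
    return groups


if __name__ == "__main__":
    print(group_homologues([200.0, 214.01565, 228.0313, 250.1234]))
```

For ethoxylate-type series, pass unit=44.02621 and nominal_unit=44 (C2H4O). Grouping by defect alone can merge unrelated ions that happen to share a defect, and defects near +/-0.5 can wrap around. A real workflow would therefore also check retention behaviour and confirm that mass differences are whole multiples of the unit.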
Public TSCA Inventory on Dashboard
31,460 Chemicals (1/24/2020)
63
Many Chemicals are “Complex”
>14000 chemicals are UVCBs
64
“Markush Structures”
https://en.wikipedia.org/wiki/Markush_structure
65
How to represent complexity?
66
Work in Progress
67
List Registration Activities
• Registering and curating numerous lists
– NIST library of chemicals: clean-up, especially of stereochemical representation
– United States Geological Survey chemicals in water
– Scientific Working Group for the Analysis of Seized Drugs
– Synthetic Cannabinoids
– Blood Exposome Database
68
Prototype Work in Progress
• CFM-ID
– Viewing and Downloading pre-predicted spectra
– Search spectra against the database
• Structure/substructure/similarity search
• Access to API and web services
• Integration with the EPA “Chemical Transformation Simulator”
69
Predicted Mass Spectra
http://cfmid.wishartlab.com/
• MS/MS spectra prediction for ESI+, ESI-, and EI
• Predictions generated and stored for >800,000 structures, to be accessible via the Dashboard
70
Search Expt. vs. Predicted Spectra
Search Expt. vs. Predicted Spectra
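Comparing an experimental MS/MS spectrum against predicted spectra requires a similarity score. A common choice, used here as an assumption rather than a description of the Dashboard prototype, is a cosine (normalized dot-product) score over peaks matched within an m/z tolerance. Spectra are lists of (m/z, intensity) pairs, candidate IDs are hypothetical, and peak matching is greedy by query intensity.

```python
import math

Spectrum = list[tuple[float, float]]  # (m/z, intensity) pairs


def cosine_score(query: Spectrum, reference: Spectrum, tol: float = 0.01) -> float:
    """Cosine similarity over peaks matched one-to-one within +/- tol (Da)."""
    used: set[int] = set()
    dot = 0.0
    # Match the most intense query peaks first, each to the closest unused peak.
    for mz_q, int_q in sorted(query, key=lambda peak: -peak[1]):
        best_j, best_d = None, tol
        for j, (mz_r, _) in enumerate(reference):
            d = abs(mz_q - mz_r)
            if j not in used and d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            dot += int_q * reference[best_j][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_by_spectrum(query: Spectrum, predicted: dict[str, Spectrum],
                     tol: float = 0.01) -> list[tuple[str, float]]:
    """Score every predicted spectrum against the query, best match first."""
    scored = [(cid, cosine_score(query, spec, tol)) for cid, spec in predicted.items()]
    return sorted(scored, key=lambda item: -item[1])


if __name__ == "__main__":
    experimental = [(100.0, 10.0), (150.0, 50.0), (200.0, 100.0)]
    predicted = {
        "cand-1": [(100.0, 10.0), (150.0, 50.0), (200.0, 100.0)],
        "cand-2": [(100.0, 100.0), (250.0, 100.0)],
    }
    print(rank_by_spectrum(experimental, predicted))
```

In practice each candidate has predicted spectra at several collision energies, so a score per energy (or the best of them) would be compared. The tolerance also depends on the instrument's mass accuracy.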
Spectral Viewer Comparison
73
Example match
74
Predicted Data Already Public
Publication and Data Files
75
https://epa.figshare.com/articles/CFM-ID_Paper_Data/7776212/1
Published: Alex Chao et al
76
Prototype Development
77
CASMI 2012-2017 revisited
• Application of metadata candidate ranking
and CFM-ID to all five years of CASMI data
78
Method Amenability Prediction
Charlie Lowe
Why?
• Chromatography-mass
spectrometry can be LC or GC
• Which phase is more appropriate
for which chemicals?
Ongoing Work
• Data sources to date
  – MassBank of North America (MoNA)
    • 9,275 chemicals for non-derivatized GC
    • 846 chemicals for derivatized GC
    • 816 chemicals for APCI+
    • 454 chemicals for APCI-
    • 4,907 chemicals for ESI+
    • 3,430 chemicals for ESI-
  – EPA Non-targeted Analysis Collaborative Trial (ENTACT)
    • 886 chemicals for non-derivatized GC
    • 44 chemicals for derivatized GC
    • 774 chemicals for APCI+
    • 431 chemicals for APCI-
    • 1,113 chemicals for ESI+
    • 648 chemicals for ESI-
TMAP Visualization of MoNA GC Data
API services and Open Data
• Web Services: https://actorws.epa.gov/actorws/
• Data sets are also available for download
82
Web Services
https://actorws.epa.gov/actorws/
• Data in UI, JSON and XML formats
• Our services are free, of course
83
InChIKey to DTXCIDs
84
https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
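The InChIKey-to-MS-Ready lookup on this slide is an ordinary HTTP GET with the key passed as a query parameter. The following minimal sketch uses only the Python standard library. Several points are assumptions: the endpoint is copied from the slide (2020) and may have changed or been retired since, requesting JSON through the Accept header may not be how the service selects its format, and no particular response structure is assumed. A standard InChIKey is 27 characters in a 14-10-1 block pattern, and the sketch checks this before building the URL.

```python
import json
import re
import urllib.parse
import urllib.request

# Endpoint as shown on the slide (2020); it may have moved or been retired since.
MSREADY_ENDPOINT = "https://actorws.epa.gov/actorws/dsstox/v02/msready"
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def msready_url(inchikey: str) -> str:
    """Build the lookup URL after checking the standard InChIKey layout."""
    if not INCHIKEY.fullmatch(inchikey):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return MSREADY_ENDPOINT + "?" + urllib.parse.urlencode({"identifier": inchikey})


def fetch_msready(inchikey: str, timeout: float = 30.0):
    """GET the lookup and decode JSON; the response structure is not assumed here."""
    request = urllib.request.Request(msready_url(inchikey),
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(msready_url("UVOFGKIRTCCNKG-UHFFFAOYSA-N"))
```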
Data and Services
used by the
Community
85
NORMAN Suspect List Exchange
https://www.norman-network.com/?q=node/236
86
Integration with MetFrag in place
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2
87
MassBank mapping to Dashboard
Based on Web Service lookup
88
Conclusion
• Dashboard access to data for ~875,000 chemicals
(~895k in the Spring Release)
• MS-Ready data facilitates structure identification
• Related metadata facilitates candidate ranking
89
• Relationship mappings and chemical lists are of great utility
• Curation and mutual sharing of chemical lists are important (e.g. NORMAN)
Acknowledgements
EPA ORD: Ann Richard, Chris Grulke, John Wambaugh, Jeremy Dunne, Jeff Edwards, Grace Patlewicz, Alex Chao, Kristin Isaacs, Charles Lowe, James McCord, Seth Newton, Katherine Phillips, Tom Purucker, Jon Sobus, Mark Strynar, Elin Ulrich, Joach Pleil
ILS: Kamel Mansouri
GDIT: Ilya Balabin, Tom Transue, Tommy Cathey
TEAMS: IT Development Team, Curation Team
Collaborators: Emma Schymanski & the NORMAN Network
Contact
Antony Williams
CCTE, US EPA Office of Research and Development,
Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821
91
https://doi.org/10.1186/s13321-017-0247-6

Editor's Notes

  • #73: Clarify it’s a mockup in progress
  • #74: Clarify it’s a mockup in progress