Delivering chemical-associated data
via EPA web applications
Antony J. Williams
Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
http://www.orcid.org/0000-0002-2668-4821
Cheminformatics Resources of U.S. Governmental Organizations – May 11th 2022
The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA
The State of Internet Chemistry…
• The past two decades have seen an explosion in online data
• There are so many resources to choose from…so much data
• In our world there are dominant, well-executed aggregating resources:
PubChem, ChEMBL, eChemPortal, ECHA
• Do we need yet another online chemistry database?
20 Years of Curating Data in Our Team
The Charge for the
CompTox Chemicals Dashboard
• Develop a “first-stop-shop” for environmental chemical data
to support EPA and partner decision making:
– Centralized location for chemical data (DSSTox foundation)
– Chemistry, exposure, hazard and dosimetry
– Combination of existing data and predictive models
– Publicly accessible, periodically updated, curated
• Easy access to data improves efficiency and ultimately
accelerates chemical risk assessment
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
SEARCH
TOX DATA
BIOACTIVITY
SIMILARITY
READ-ACROSS
PUBMED
BATCH SEARCH
Detailed Chemical Pages
• Chemical page: Wikipedia snippet when available, intrinsic
properties, structural identifiers, linked substances
“Executive Summary” regarding
chemical toxicity
• Overview of toxicity-related info
• Quantitative values
• Physchem. and Fate &
Transport
• Adverse Outcome
Pathway links
• In vitro bioactivity
summary plot
Experimental and Predicted Data
• Physchem and Fate & Transport
experimental and predicted data
• Data extracted (somewhat curated)
from literature and databases
• Multiple prediction algorithms used:
OPERA, TEST, ACD/Labs, EPI Suite
• Data can be downloaded as Excel,
TSV and CSV files
Chemical Hazard Data
ToxVal Database
• >50k chemicals
• >770k tox. values
• >30 sources of data
• ~5k journals cited
• ~70k citations
Safety Data – Thank you PubChem!
Bring together resources
Sources of Exposure to Chemicals
A recent focus on PFAS chemicals
Similar Compounds
Simple cheminformatics search
Relationships in the data
Manual curation and mapping
Markush Chemical representations
• PFOS is a member of linear perfluoroalkyl sulfonates
…and their Markush Children…
• Linear perfluoroalkyl sulfonates has children…
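The parent/child idea behind a Markush representation can be sketched in a few lines: one generic description (a linear perfluorinated chain of variable length attached to a sulfonic acid group) expands into concrete structures. This is only an illustration and not how the Dashboard stores Markush structures; the function names are hypothetical, the chain-length range of 4 to 10 carbons is an arbitrary assumption, and the neutral acid form is generated rather than the sulfonate anion.

```python
def linear_pfsa_smiles(n_carbons: int) -> str:
    """SMILES for a linear perfluoroalkyl sulfonic acid with n_carbons carbons."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    # (n - 1) internal CF2 groups followed by the terminal CF3 group
    chain = "C(F)(F)" * (n_carbons - 1) + "C(F)(F)F"
    return "OS(=O)(=O)" + chain


def linear_pfsa_formula(n_carbons: int) -> str:
    """Hill-order molecular formula C_n H F_(2n+1) O3 S for the acid form."""
    carbon = "C" if n_carbons == 1 else f"C{n_carbons}"
    return f"{carbon}HF{2 * n_carbons + 1}O3S"


# Enumerate the "children" of the generic parent for an assumed range of chain lengths.
children = {n: (linear_pfsa_smiles(n), linear_pfsa_formula(n)) for n in range(4, 11)}
for n, (smiles, formula) in children.items():
    print(n, formula, smiles)
```

With eight carbons the sketch yields the formula C8HF17O3S, which corresponds to PFOS in its acid form.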
Bioactivity Data
ToxCast
ToxCast/Tox21 Data
Full transparency in terms of data
• Concentration Response data
Full access to concentration-response curves
Use Models Derived from the Data
Searching Literature
and the Internet
Literature Searching
• Real-time retrieval of data from PubMed (~30
million abstracts and growing)
• Choose from set of pre-defined queries
• Adjust and fine-tune queries based on interests
Literature Searching
• “Sifting” of results using
multiple terms
• Frequency counting terms
• Color highlighting of terms
• Download list to Excel
• Send list to PubMed for
downloading ref. file
• Direct link via PubMed ID
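The "sifting" step above amounts to counting how often each term of interest occurs in every retrieved abstract and ordering the results by those counts. A minimal sketch follows, assuming the abstracts are already available as plain text; the function name `sift`, the placeholder record IDs, and the sample texts are all hypothetical, and the Dashboard's own ranking logic may differ.

```python
import re
from collections import Counter


def sift(abstracts, terms):
    """Count case-insensitive whole-word hits per term; rank records by total hits."""
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    ranked = []
    for record_id, text in abstracts.items():
        counts = Counter({t: len(p.findall(text)) for t, p in patterns.items()})
        ranked.append((sum(counts.values()), record_id, counts))
    # Highest total first; ties broken by record ID for a stable order.
    ranked.sort(key=lambda r: (-r[0], r[1]))
    return ranked


# Hand-written placeholder texts, not real PubMed records.
demo_abstracts = {
    "demo-1": "PFOS exposure and liver toxicity in rats. PFOS was detected in serum.",
    "demo-2": "A review of analytical methods for drinking water.",
    "demo-3": "Hepatotoxicity and liver enzyme changes after exposure.",
}
demo_ranked = sift(demo_abstracts, ["PFOS", "liver", "exposure"])
for total, record_id, counts in demo_ranked:
    print(record_id, total, dict(counts))
```

The per-term counts are kept alongside the total so that a front end could highlight each term separately.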
What’s the best way to search the
internet for chemical data?
• We know how complex chemical identifiers are…
• CASRN(s)
• Hundreds of names (maybe)
• SMILES
• InChIs
• EINECS, EC numbers
• What can WE do to help you navigate the internet?
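One small thing that helps when searching by identifier is checking a CASRN before using it. A CAS Registry Number ends in a check digit: multiply the other digits by 1, 2, 3, … counting from the right, sum the products, and take the result modulo 10. The sketch below implements that rule; the function name is hypothetical and the check only validates the format and checksum, not whether the number is actually assigned to a substance.

```python
import re


def is_valid_casrn(casrn: str) -> bool:
    """Return True if the string has CASRN format and a correct check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


for candidate in ["1763-23-1", "7732-18-5", "1763-23-2", "not-a-cas"]:
    print(candidate, is_valid_casrn(candidate))
```

A check like this catches most single-digit typos before a batch of identifiers is submitted to any search.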
External Links – Also use Identifiers
Names, CASRN, PubChem IDs, InChIs…
External Links
• Links to ~90 websites providing access to
additional data on the chemical of interest
Chemical Lists and
Categories
A List of Lists of Chemicals
https://comptox.epa.gov/dashboard/chemical_lists
The OECD List of PFAS
http://www.oecd.org/chemicalsafety/portal-perfluorinated-chemicals/
Example PFAS-UVCBs
PFAS List Paper
https://doi.org/10.3389/fenvs.2022.850019
• What makes our efforts different?
• MANUAL curation work
• Building lists, cross-referencing, mapping relationships, sourcing
and curating data
Batch Searching
Batch Searching
• Singleton searches are great but…
• …we generally want data on LOTS of chemicals!
• Typical questions
• What are the structures for a set of chemical names? Set of CASRNs?
• Can I get chemical lists in Excel files? As a list of SMILES strings?
Can I get an SDF file?
• Can I include predicted properties? OPERA? TEST?
• Are “these chemicals” screened in ToxCast?
• I need masses and formulae for a list of chemicals
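Once a batch search has been exported as CSV, questions like the ones above reduce to reading the file and separating the inputs that were matched from those that were not. The sketch below assumes a simplified export layout: the column names (`INPUT`, `PREFERRED_NAME`, `MOLECULAR_FORMULA`, `AVERAGE_MASS`) and the hand-typed sample rows are illustrative assumptions and should be checked against a real download.

```python
import csv
import io

# Hand-typed illustrative rows; a real export has more columns and more precise masses.
SAMPLE_EXPORT = """INPUT,PREFERRED_NAME,MOLECULAR_FORMULA,AVERAGE_MASS
1763-23-1,Perfluorooctanesulfonic acid,C8HF17O3S,500.1
335-67-1,Perfluorooctanoic acid,C8HF15O2,414.1
not-a-chemical,,,
"""


def split_hits(export_text):
    """Return (hits, misses): matched inputs with name/formula/mass, and unmatched inputs."""
    hits, misses = {}, []
    for row in csv.DictReader(io.StringIO(export_text)):
        if row["MOLECULAR_FORMULA"]:
            hits[row["INPUT"]] = (
                row["PREFERRED_NAME"],
                row["MOLECULAR_FORMULA"],
                float(row["AVERAGE_MASS"]),
            )
        else:
            # An empty formula is treated here as "no match found" (an assumption).
            misses.append(row["INPUT"])
    return hits, misses


hits, misses = split_hits(SAMPLE_EXPORT)
print(hits)
print("unmatched:", misses)
```

Keeping the unmatched inputs in a separate list makes it easy to review them by hand or retry them with a different identifier type.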
Batch Search
Batch Search – Excel, CSV, SDF file
Batch Search
Open Data
Exchange
Since our data are Open…
• They flow into other systems for benefit …
• ECHA eChemPortal
• ChemSpider
• EBI’s UniChem
• PubChem
Developing
Cheminformatics
“PoC Modules”
Six Modules
• Hazard Comparison Profiling – profile chemicals based on hazard
• Alerts – structure, substructure, SMARTS based alerts and flags
• Predict – batch prediction using WebTEST (100s of structures)
• Search – structure/substructure/similarity searches
• Standardize – convert structures into QSAR/MS-Ready forms
• ToxPrints – generate ToxPrint substructural fragments and profile
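For the similarity searches mentioned in the Search module, the slides do not say which metric is used. A common choice in cheminformatics is the Tanimoto coefficient over structural feature sets, and the sketch below illustrates that idea only. The feature labels, chemical names, and function names are hypothetical stand-ins for real fingerprints.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared features divided by all features."""
    union = len(a | b)
    # Two empty feature sets are scored 0.0 here; conventions differ on this edge case.
    return len(a & b) / union if union else 0.0


def most_similar(query: set, library: dict, top_n: int = 3):
    """Rank library entries by similarity to the query feature set."""
    scored = [(tanimoto(query, feats), name) for name, feats in library.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:top_n]


# Toy feature sets standing in for substructure fingerprints.
toy_library = {
    "chem-A": {"CF3", "CF2", "SO3H"},
    "chem-B": {"CF3", "CF2", "COOH"},
    "chem-C": {"CH3", "OH"},
}
toy_hits = most_similar({"CF3", "CF2", "SO3H"}, toy_library)
for score, name in toy_hits:
    print(name, score)
```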
Module 1: Hazard Module
Module 2: Alerts
Module 3: WebTEST Batch Prediction
Module 4: Structure/Substructure/Similarity
Summary and Conclusion
• CompTox Chemicals Dashboard - a
central hub for environmental data
• ~900k chemical substances (1.2M soon)
• Integrating property data, hazard data,
exposure data, in vitro bioactivity data
• Interrogation of bioactivity data -
• Multiple types of searches
• Batch search for thousands of chemicals
• Real-time property and toxicity predictions
• Downloadable files – CSV, TSV and Excel
Some Related Publications of Interest
You want to know more…
• Lots of resources available
• Presentations: https://tinyurl.com/w5hqs55
• Communities of Practice Videos: https://rb.gy/qsbno1
• Manual: https://rb.gy/4fgydc
• Latest News: https://comptox.epa.gov/dashboard/news_info
Acknowledgments
• Contact: Williams.Antony@epa.gov
• Feedback and follow-up are
welcome! Your questions help.
• The dashboard is based on the
efforts of many more team
members than us
• Many collaborators also
provide data
EPA’s Center for Computational Toxicology and Exposure
THANK YOU
Panel Discussion Ideas…
• Thank you to the organizers for the conference. It was very
beneficial to catch up on what’s going on across the groups
• For the Panel Discussion today…selfish wants ☺
• How do we limit rework in terms of repeat curation?
• How do we coordinate development of QSAR data sets (because
QSAR modeling is not the hard part) – DILI data, Log Kow, Solubility
• There would be so much benefit to InChI-Markush. It’s doable…how
can we collectively fund it?