Web-based access to experimental and
predicted data for environmental fate,
transport and toxicity data
Antony Williams (1), Todd Martin (2), Valery Tkachenko (3),
Kamel Mansouri (4) and Chris Grulke (1)
1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2) National Risk Management Research Laboratory, U.S. Environmental Protection Agency, Cincinnati, OH
3) Science Data Software, LLC, Rockville, MD 20850
4) Integrated Laboratory Systems, Research Triangle Park, North Carolina, United States
August 2018
ACS Fall Meeting, Boston
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
What is the EPA-NCCT?
• National Center for Computational Toxicology – part of
EPA’s Office of Research and Development
• Research driven by EPA’s Chemical Safety for
Sustainability Research Program
– Develop new approaches to evaluate the safety of chemicals
– Integrate advances in biology, biotechnology, chemistry,
exposure science and computer science
• Goal - To identify chemical exposures that may disrupt
biological processes and cause adverse outcomes.
• Prediction models and predicted data are some of our
major outputs
The CompTox Chemistry Dashboard
• A publicly accessible website delivering access to:
– ~762,000 chemicals with related property data
– Searchable by chemical, product use, gene and assay
– Experimental and predicted physicochemical property
data, environmental fate and transport, and tox endpoints
– “Bioactivity data” for the ToxCast/Tox21 project – plus
derived models
– Generalized Read-Across (GenRA) module
– “Batch searching” of predicted data for thousands of
chemicals
CompTox Dashboard
https://comptox.epa.gov/dashboard
Chemicals
Detailed Chemical Pages
Physicochemical properties
OPERA Predicted Properties
Curation to QSAR Ready Files
Property  Initial file  Curated data  Curated QSAR-ready
AOP                818           818                 745
BCF                685           618                 608
BioHC              175           151                 150
Biowin            1265          1196                1171
BP                5890          5591                5436
HL                1829          1758                1711
KM                 631           548                 541
KOA                308           277                 270
LogP             15809         14544               14041
MP               10051          9120                8656
PC                 788           750                 735
VP                3037          2840                2716
WF                5764          5076                4836
WS                2348          2046                2010
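To see how much each data set shrinks during curation, the counts in the table above can be turned into retention fractions. This is only an illustrative calculation on the published counts; the helper name `retention` is ours and is not part of OPERA.

```python
# Counts copied from the curation table: (initial file, curated data, QSAR-ready).
counts = {
    "AOP": (818, 818, 745),
    "BCF": (685, 618, 608),
    "BioHC": (175, 151, 150),
    "Biowin": (1265, 1196, 1171),
    "BP": (5890, 5591, 5436),
    "HL": (1829, 1758, 1711),
    "KM": (631, 548, 541),
    "KOA": (308, 277, 270),
    "LogP": (15809, 14544, 14041),
    "MP": (10051, 9120, 8656),
    "PC": (788, 750, 735),
    "VP": (3037, 2840, 2716),
    "WF": (5764, 5076, 4836),
    "WS": (2348, 2046, 2010),
}


def retention(initial: int, qsar_ready: int) -> float:
    """Fraction of the initial records that survive to the QSAR-ready file."""
    return qsar_ready / initial


retained = {
    name: retention(initial, ready)
    for name, (initial, _curated, ready) in counts.items()
}

# Print properties from the lowest to the highest retention.
for name, fraction in sorted(retained.items(), key=lambda item: item[1]):
    print(f"{name:7s} {fraction:6.1%}")
```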
Detailed OPERA Prediction Reports
Prediction Details and QMRF Report
EPA T.E.S.T
https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test
Physical properties in TEST
• Viscosity (cP): a measure of the resistance of a fluid to flow, defined as the proportionality constant between shear rate and shear stress
• Surface tension (dyn/cm): a property of the surface of a liquid that allows it to resist an external force
• Water solubility (mg/L): the amount of a chemical that will dissolve in liquid water to form a homogeneous solution
Other Dashboard Predictions
• Predictions and models extend beyond
physicochemical properties and environmental
fate and transport
• Examples
– Read-across for Toxicity Endpoints
– Quantitative Structure–Use Relationship (QSUR) models
– High-Throughput ToxicoKinetics (HTTK)
– Models based on high throughput bioactivity data
Definitions: Read-Across
• Known information on the property of a substance
(source) is used to make a prediction of the same
property for another substance (target) that is
considered “similar”
[Figure: read-across schematic. A source chemical with reliable data (known to be harmful) is used to predict the same property, e.g. acute fish toxicity, for a target chemical with missing data (predicted to be harmful).]
GenRA (Generalized Read-Across)
• Predicting toxicity as a similarity-weighted activity of nearest
neighbors based on chemistry and/or bioactivity descriptors
• Goal: to systematically evaluate read-across performance
and uncertainty using available data
• The approach established a performance baseline for
read-across predictions of toxicity effects within specific
study outcomes
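The "similarity-weighted activity of nearest neighbors" idea can be sketched in a few lines. This is a minimal illustration, not the GenRA implementation: the set-based fingerprints, the Jaccard (Tanimoto) similarity, the feature names and the choice of k = 2 are all assumptions made for the example.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[str]  # hypothetical: a set of structural feature names


def jaccard(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard (Tanimoto) similarity between two binary fingerprints."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(
    target: Fingerprint,
    sources: List[Tuple[Fingerprint, float]],
    k: int = 2,
) -> float:
    """Similarity-weighted mean activity of the k most similar source analogues.

    Each source is (fingerprint, activity), with activity 1.0 for toxic
    and 0.0 for non-toxic in this toy example.
    """
    scored = sorted(
        ((jaccard(target, fp), activity) for fp, activity in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(similarity for similarity, _ in scored)
    if total == 0:
        raise ValueError("no source analogue shares any feature with the target")
    return sum(similarity * activity for similarity, activity in scored) / total


# Toy data: one target with no measured activity and three source analogues.
target = frozenset({"aromatic_ring", "hydroxyl", "chloro"})
sources = [
    (frozenset({"aromatic_ring", "hydroxyl", "chloro", "nitro"}), 1.0),
    (frozenset({"aromatic_ring", "hydroxyl"}), 0.0),
    (frozenset({"alkyl_chain", "ester"}), 1.0),
]
score = read_across(target, sources, k=2)
print(f"similarity-weighted activity: {score:.2f}")
```

The third analogue shares no features with the target, so it is not among the two nearest neighbours and does not influence the prediction.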
[Screenshots of the GenRA workflow: structure similarity, data gap analysis, and running GenRA for a target chemical and its source analogues]
Batch Searching
• Singleton searches are useful but we work
with thousands of chemicals!
• Typical questions
– What are the SMILES strings for a list of 1000 chemicals?
– Do any of the chemicals in this list have XXX type of data?
– What are the predicted logP values for a list of chemicals?
– Can I get lists of predicted properties in Excel files? In SDF files?
Excel Output
Real-Time Predictions
API in development
Prototype services available
https://comptox.epa.gov/dashboard/web-test/WS?smiles=CCO&method=hc
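The prototype URL above takes a SMILES string and a method name as query parameters. The sketch below only builds such a URL and makes no network request; the endpoint was a 2018 prototype and may no longer exist, and the helper name `build_test_url` is ours. Percent-encoding matters because SMILES can contain characters such as `#`, `=` and parentheses that have special meaning in URLs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base path taken from the prototype URL on the slide; availability is not guaranteed.
BASE_URL = "https://comptox.epa.gov/dashboard/web-test/WS"


def build_test_url(smiles: str, method: str = "hc") -> str:
    """Return the prototype request URL with safely encoded query parameters."""
    return f"{BASE_URL}?{urlencode({'smiles': smiles, 'method': method})}"


# Ethanol, as in the slide example.
url = build_test_url("CCO")
print(url)

# A SMILES with characters that must be percent-encoded (benzoic acid).
encoded = build_test_url("c1ccccc1C(=O)O")
print(encoded)

# Decoding the query string recovers the original SMILES.
print(parse_qs(urlsplit(encoded).query)["smiles"][0])
```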
Our support for FAIR Data
Downloadable Data
Work in Progress
• Currently in development
– OPERA model web services
– pKa and logD prediction models
– Display of predicted data for TEST toxicity endpoints
– Analytical data support: spectral searching against predicted mass spectra
pKa Prediction Model
• pKa prediction models based on an open
data set of 8000 chemicals (acidic, basic
and amphoteric)
Toxicity Endpoints in TEST
• 96 hour fathead minnow LC50: concentration in mg/L that causes 50% of fathead minnow to die after 96 hours
• 48 hour Daphnia magna LC50: concentration in mg/L that causes 50% of Daphnia magna to die after 48 hours
• 48 hour T. pyriformis IGC50: concentration in mg/L that causes 50% growth inhibition to T. pyriformis after 48 hours
• Oral rat LD50: amount of chemical in mg/kg body weight that causes 50% of rats to die after oral ingestion
• Developmental toxicity: whether or not a chemical causes developmental toxicity effects to humans or animals
• Ames mutagenicity: a compound is positive for mutagenicity if it induces revertant colony growth in any strain of Salmonella typhimurium
Property data is one aspect...
• CompTox Dashboard has many applications
– A central hub for lists of chemicals
– A “publication” environment for our published data – to
enable data transparency
– The basis of other applications in development – RapidTox
for rapid risk assessment using the underpinning data
– Non-targeted analysis support for mass spectrometry
Predicted Mass Spectra
http://cfmid.wishartlab.com/
• MS/MS spectra prediction for ESI+, ESI-, and EI
• Predictions generated and stored for >700,000
structures, to be accessible via Dashboard
Search Expt. vs. Predicted Spectra
[Figure: library fragmentation spectra (20 eV) compared with observed fragmentation spectra (20 eV), with a match score for the predicted mass spectra]
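The slides do not say how the match score between observed and predicted spectra is computed. As a rough illustration only, the sketch below uses cosine similarity on binned peak lists, which is one common choice for spectral matching; the bin width, peak values and function names are assumptions.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def binned(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into m/z bins of the given width."""
    bins: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine_score(observed: List[Peak], predicted: List[Peak],
                 bin_width: float = 1.0) -> float:
    """Cosine similarity between two spectra: 1.0 is identical, 0.0 is no overlap."""
    a = binned(observed, bin_width)
    b = binned(predicted, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Made-up peak lists: one shared fragment and one unmatched fragment each.
observed = [(91.05, 100.0), (119.05, 40.0)]
predicted = [(91.05, 80.0), (65.04, 20.0)]
match = cosine_score(observed, predicted)
print(f"match score: {match:.3f}")
```

A score like this could be used to rank candidate structures by how well their predicted spectrum explains the observed one.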
Conclusion
• The CompTox Dashboard provides access to data
for ~762,000 chemicals
• Multiple prediction models available for data gap
filling
– OPERA models and TEST models – PhysChem and Tox endpoints
– Models based on in vitro data – classification models
– Generalized Read-Across delivered
• Real time prediction models expanding in number
• Web services available for some physchem and
toxicity endpoints
Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology (NCCT)
Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821
