Overview of open resources to support
automated structure verification
and elucidation
Antony Williams¹ and Emma Schymanski²
¹ National Center for Computational Toxicology, US EPA, Research Triangle Park, Durham, NC, USA.
² Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, Luxembourg.
March 2018
ACS Spring Meeting, New Orleans
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
Today’s Session
• Our focus for the session:
– Access to data to support automated structure verification
and elucidation – NMR and MS
– Data quality, curation and validation – and a call to action
– OPENness is here – Open Access, Data, Source
– Data standards – we already have them and there are
more coming
– Vendors and scientists providing and using available data
– There are tools USING these data for Structure Elucidation
– Cannot be an exhaustive review…but at least a good start
An Ideal Scenario…
• All published structures and spectra will be
available from all published articles for
repurposing and reuse in standard
formats (preferably, though not necessarily, Open!)
• Scientists are building open approaches
– MS fragmentation
– NMR shift prediction
– Structure generators
– Computer-Assisted Structure Elucidation (CASE)
• Are we there yet???
Publishers sharing data
• We have achieved the ideal scenario, right?
• No – PDF figures in Supplementary Info are
still the default
• There is a need for public databases of
spectral data. There ARE some out there.
• Analogous to Wikipedia, we are primarily
consumers rather than contributors…
Sites sharing data
• There are many sites that “share” spectral
data, generally in non-open formats
• These are rich resources
• But they cannot easily be used to serve automated
structure verification and elucidation
PubChem – Spectral Links
Spectral Links to Partial Data
SDBS – Free Not Open
ChemSpider
NIST WebBook
https://webbook.nist.gov/chemistry/
Focused Databases
• Focused databases
– Compiled focused databases of Open Data are
preferable
– Spectral data for structure verification and elucidation
– Open Mass Spec Data is especially useful (Emma’s
talks!)
– Data can be brought in-house and integrated
– Algorithms can be derived – e.g. NMR shift prediction
NMRShiftDB
https://nmrshiftdb.nmr.uni-koeln.de/
Open Resources
• Open Databases offer more value
– Bring the data in-house, integrate, link
– Ingest and train algorithms
CSEARCH/NMRPREDICT
http://nmrpredict.orc.univie.ac.at/
MassBank
https://massbank.eu/MassBank/
m/z CLOUD
https://www.mzcloud.org/
Integrating Data and Services
• Integration:
– Use simple URL linking for navigation
– Provide simple services for real time prediction
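As a rough illustration of simple URL linking, the sketch below fills URL templates with a compound's identifiers. The template addresses on example.org, the `LINK_TEMPLATES` names, and the `build_links` helper are all hypothetical; each real resource defines its own URL pattern, which would need to be looked up.

```python
from urllib.parse import quote

# Hypothetical URL templates: these example.org addresses are placeholders,
# not the actual endpoints of any resource mentioned in this talk.
LINK_TEMPLATES = {
    "spectra_db": "https://spectra.example.org/search?inchikey={inchikey}",
    "predictor": "https://predict.example.org/nmr?smiles={smiles}",
}


def build_links(inchikey: str, smiles: str) -> dict:
    """Return one navigable URL per template, with identifiers URL-encoded."""
    values = {
        "inchikey": quote(inchikey, safe=""),
        # SMILES contains characters such as '=' and '(' that must be escaped.
        "smiles": quote(smiles, safe=""),
    }
    return {name: tpl.format(**values) for name, tpl in LINK_TEMPLATES.items()}


if __name__ == "__main__":
    # Caffeine identifiers, used only as an example input.
    links = build_links(
        "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
        "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    )
    for name, url in links.items():
        print(name, url)
```

Keying the links on a standard identifier such as the InChIKey means the linking site needs no knowledge of the target's internal record numbers.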
Example Integration via the
CompTox Chemistry Dashboard
Link-Based Access
Link Access
Open Data For Bulk Predictions
• Open Data for apps
– Structures
– CAS Registry Numbers
– Names
– Formulae
– Mass
• iOS app including
predicted 13C NMR
Mass Searching
Mass and CNMR Searching
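Mass searching over an open list of formulae can be sketched as follows. This is a minimal illustration and not the implementation behind the app shown here: it assumes the query is a neutral monoisotopic mass (no adduct correction), covers only C, H, N and O, and uses a three-compound dictionary as a stand-in for a real open data set. The function names are made up for this example.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
# Only C, H, N and O are included; other elements raise a KeyError.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def search_by_mass(query_mass: float, compounds: dict, ppm: float = 5.0) -> list:
    """Return (name, formula, mass, error_ppm) for candidates within tolerance."""
    hits = []
    for name, formula in compounds.items():
        mass = monoisotopic_mass(formula)
        # Error of the query relative to the candidate's theoretical mass.
        error_ppm = (query_mass - mass) / mass * 1e6
        if abs(error_ppm) <= ppm:
            hits.append((name, formula, mass, error_ppm))
    # Best (smallest absolute error) candidates first.
    return sorted(hits, key=lambda hit: abs(hit[3]))


if __name__ == "__main__":
    compounds = {
        "caffeine": "C8H10N4O2",
        "theophylline": "C7H8N4O2",
        "paracetamol": "C8H9NO2",
    }
    for name, formula, mass, err in search_by_mass(194.0804, compounds):
        print(f"{name} {formula} {mass:.4f} ({err:+.2f} ppm)")
```

A formula search alone rarely gives a unique answer on a large list, which is why the slides pair mass searching with predicted 13C NMR data as a second filter.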
Important Standards in our efforts
• Structures – Molfile, SDF file, InChIs
(standard and non-standard)
• NMR – JCAMP and all its variants
• MS – mz(X)ML, MSP (and all its variants),
MGF, MassBank
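To show why open text standards are easy to bring in-house, here is a minimal reader for the MGF peak-list format listed above. It is a sketch under stated assumptions: it handles only `BEGIN IONS` / `END IONS` blocks, `KEY=value` parameter lines and whitespace-separated "m/z intensity" peak lines, and treats lines starting with `#`, `;`, `!` or `/` as comments. The `parse_mgf` name is made up here, and the sample values are illustrative, not measured data.

```python
def parse_mgf(text: str) -> list:
    """Parse MGF text into a list of {'params': dict, 'peaks': list} records."""
    spectra = []
    current = None  # the spectrum being read, or None outside a block
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!", "/")):
            continue  # skip blank lines and comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z, intensity, optionally further columns.
                fields = line.split()
                if len(fields) >= 2:
                    current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


if __name__ == "__main__":
    sample = """\
BEGIN IONS
TITLE=illustrative spectrum
PEPMASS=200.0
100.0 10.0
150.5 20.0
END IONS
"""
    for spectrum in parse_mgf(sample):
        print(spectrum["params"]["TITLE"], len(spectrum["peaks"]), "peaks")
```

Files written by real instruments and vendors often carry extra fields and variants, so an established parsing library is a safer choice than this sketch for production use.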
There are more coming
Conclusion
• The abundance of online data continues to grow
• There are “integrated data”, there are databases,
there are online tools, there are mobile apps
• Data Quality is critical and OPENness is enabling
– Open Data
– Open Standards
– Open Source
• The rest of the day will expand on these efforts…
Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology (NCCT)
Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821