Validating the ChemSpider Open Spectral Database NMR Collection using ACD/Labs Verification Algorithms

Ryan Sasaki¹, Sergey Golotvin² and Antony Williams³

¹ Advanced Chemistry Development, Inc. (ACD/Labs)
² ACD Moscow Inc., Moscow, Russian Federation
³ ChemSpider, Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, North Carolina 27587, USA

Introduction
ChemSpider is a free online database of over 26 million unique chemical compounds drawn from over 400 different sources, including government laboratories, chemical vendors, and public resources. ChemSpider allows its users to deposit data including structures, properties, links to external resources, and various forms of spectral data. ChemSpider has aggregated over 2000 high-quality NMR spectra and continues to expand as the community deposits additional data. The data are generally validated by the community, but a batch-wise verification of all 1D 1H and 13C NMR spectral data in the database was performed using ACD/Labs NMR verification software.

Sources of Spectral Data
Databases of structures with associated NMR assignments are available as commercial or open data. However, databases of NMR spectral curves are less common and generally limited to metabonomics data (for example, the BMRB (ref. 1) and DrugBank (ref. 2)). One component of the ChemSpider project is to gather, host, and make available a structure-searchable database of spectral data: 1D/2D NMR, IR, Raman, and MS. The majority of the data are deposited by users of ChemSpider. Spectra can be submitted as JCAMP-DX files (for 1D spectra) or as images/PDF (for 1D or 2D spectra). To deposit a spectrum, a user simply searches ChemSpider for the associated structure and uploads the JCAMP-DX or image form of the spectrum. Community-based curators validate and annotate the data as appropriate to ensure that only the highest quality data are available in the database. As the data collection grew, a batch-wise validation of the data quality was required, and ACD/Labs NMR verification software was used to perform the analysis.

ACD/Labs NMR Verification Routines
The ACD/Labs approach to 1H NMR verification consists of two steps:
1) The experimental spectrum with an attached chemical structure is automatically processed and analyzed. Analysis includes automated peak picking, integration, and multiplicity analysis (extraction of coupling constants and coupling patterns). In addition, all extraneous signals present in the spectrum are identified (e.g., solvent, reference, and known admixtures).
2) Chemical shift, integration, and multiplicity information are predicted for the proposed chemical structure and compared with the related properties extracted from the experimental spectrum. The comparison is based on an auto-assignment procedure (ref. 3) that finds the best possible fit as the minimum of a special objective function.

A similar approach is taken for 13C NMR verification, but it compares the experimental and predicted chemical shift values and peak heights. In both cases the output of the verification procedure is a Match Factor metric (0-1) that expresses the level of consistency between the proposed structure and the experimental spectrum. For the 1H NMR study, structure-spectrum pairs that generated a match factor >0.8 were considered consistent. For 13C NMR, a match factor >0.75 was considered consistent.

Analysis of Data
The ACD/Labs automated 1H and 13C verification routines were run on the NMR spectra dataset from ChemSpider. The results of this procedure are shown in Figure 1.

[Figure 1: two pie charts, (A) 1H and (B) 13C, with the legend Consistent / Ambiguous / Inconsistent. Chart A: 77% consistent; the other two segments are 16% and 7%. Chart B: 67% consistent; the other two segments are 25% and 8%.]

Figure 1: (A) The ACD/Labs 1H verification methodology suggests that 77% of the 744 NMR spectra submitted to ChemSpider were consistent with the proposed chemical structure. (B) The ACD/Labs 13C verification methodology suggests that 67% of the 704 NMR spectra submitted to ChemSpider were consistent with the proposed chemical structure.

Identified Issues with the Data
Structures that were deemed inconsistent by the ACD/Labs system were manually reviewed. The most frequent cause of inconsistent 1H NMR verification results was the presence of multiple components in the spectrum, such as a mixture of isomers. Typically these appeared as two signals in close proximity with partial integrals (for example, 0.6H and 0.4H instead of 1H). Manual inspection of all inconsistent results revealed 22 such cases where mixtures were present.

Other issues encountered include spectra with low resolution, incorrect spectrometer frequency, unknown solvents, and a series of incorrectly proposed structures.

Figure 2: Example of a 1H NMR spectrum with a mixture of components as evidenced by integral values.

Inconsistent results for the 13C NMR data were also evaluated. Close inspection revealed that the main cause was poor signal-to-noise (S/N), which led to the absence of 13C peaks for quaternary carbons. As a result, the software was unable to find peaks corresponding to quaternary carbons in many proposed structures, and a significant number of inconsistent results were observed.

Conclusions
ChemSpider is an online structure database that allows the community to participate in the deposition of additional data. A growing collection of NMR spectral curve data is available to download. In this way a major reference source of Open NMR data can be provided. The existing set of spectral data has been validated using ACD/Labs NMR Verification routines. The data validation work highlighted a number of errors in the data, which have now been resolved, and also provided a thorough test of the algorithms on real-world data.

References
1) Biological Magnetic Resonance Bank: http://www.bmrb.wisc.edu/
2) DrugBank: http://www.drugbank.ca/
3) Automated Structure Verification Based on 1H NMR Prediction. S.S. Golotvin, E. Vodopianov, B.A. Lefebvre, A.J. Williams, and T.D. Spitzer (GSK), Magn. Reson. Chem., 44 (5), 524–538, 2006.

Tel: (416) 368-3435
Fax: (416) 368-5596
Toll Free: 1-800-304-3988
Email: info@acdlabs.com
www.acdlabs.com
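As an illustration only, the two numerical criteria mentioned in the poster can be sketched in Python: the match-factor thresholds for a "consistent" result (>0.8 for 1H, >0.75 for 13C) and the partial-integral pattern that pointed to mixtures (for example 0.6H and 0.4H instead of 1H). This is not the ACD/Labs implementation. The function names, the 0.1 ppm window used for "close proximity", and the 0.1H integration tolerance are assumptions made for this sketch. The boundary between "ambiguous" and "inconsistent" is not given in the poster, so it is not modeled.

```python
from typing import List, Tuple

Signal = Tuple[float, float]  # (chemical shift in ppm, integral in H)

# Thresholds quoted in the poster: a pair is "consistent" above these values.
CONSISTENT_THRESHOLD = {"1H": 0.8, "13C": 0.75}


def is_consistent(match_factor: float, nucleus: str) -> bool:
    """Return True if the match factor (0-1) exceeds the quoted threshold."""
    if not 0.0 <= match_factor <= 1.0:
        raise ValueError("match factor must lie in [0, 1]")
    return match_factor > CONSISTENT_THRESHOLD[nucleus]


def find_partial_integral_pairs(
    signals: List[Signal],
    max_separation_ppm: float = 0.1,  # assumed meaning of "close proximity"
    tolerance: float = 0.1,           # assumed integration tolerance, in H
) -> List[Tuple[Signal, Signal]]:
    """Flag neighbouring signals whose fractional integrals sum to a whole number."""

    def is_fractional(value: float) -> bool:
        return abs(value - round(value)) > tolerance

    ordered = sorted(signals)  # sort by chemical shift
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        close = second[0] - first[0] <= max_separation_ppm
        total = first[1] + second[1]
        sums_to_whole = round(total) >= 1 and abs(total - round(total)) <= tolerance
        if close and is_fractional(first[1]) and is_fractional(second[1]) and sums_to_whole:
            pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    # Hypothetical peak list: two close signals share one proton (0.4H + 0.6H).
    peaks = [(7.26, 1.0), (5.42, 0.6), (5.38, 0.4), (2.10, 3.0)]
    print(is_consistent(0.85, "1H"))
    print(find_partial_integral_pairs(peaks))
```

A flagged pair is only a hint that the sample may be a mixture. In the work described above, such cases were confirmed by manual inspection.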
