ChemSpider – Building an Online Database of Open Spectra
Antony J. Williams1, Valery Tkachenko1, Alexey Pshenichnov1, Daniel Lowe2, Carlos Coba3,
Kevin Theisen4 and Rudy Potenzone4
1. Royal Society of Chemistry 2. NextMove Software 3. Mestrelab Research 4. iChemLabs LLC
Introduction: ChemSpider is an online database of over 30 million chemical compounds drawn from more than 500 sources, including chemical vendors, online public resources and publications. Users can deposit structures, properties and various forms of spectral data. One activity of the project is to host a searchable database of 1D/2D NMR, IR, Raman and mass spectral data. ChemSpider holds over 20,000 spectra, and the collection grows as the community deposits additional data.
Sources of Spectral Data: Most of the data are deposited by ChemSpider users. Spectra can be submitted as JCAMP-DX files or as images/PDFs. Images and PDFs are accepted for any spectrum type but are used especially for 2D NMR. Community-based curators validate and annotate the data so that only the highest-quality data are available in the database.
To build a large NMR database, we use text mining to extract spectral data together with their associated chemical compounds. We then simulate visual forms of the spectra. We have text-mined a large patent corpus and extracted many hundreds of thousands of NMR spectra, from which we produce visual depictions such as the one shown in Figure 1.
Text-mined spectra take the following form:
1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
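To show what "extracting spectral data" from a string like the one above involves, the sketch below splits it into per-signal records: shift or shift range, multiplicity, proton count, coupling constants and assignment labels. This is our own illustrative parser, not the text-mining pipeline used for the patent corpus. The regular expressions assume the exact layout of this example, and the parser does no error recovery; an unbalanced parenthesis, for instance, raises an `IndexError`.

```python
import re
from dataclasses import dataclass, field

# Patterns tailored to the example string; other sources format NMR text differently.
HEADER = re.compile(r"1H NMR \((?P<solvent>[^,]+),\s*(?P<mhz>\d+(?:\.\d+)?)\s*MHz\)")
SHIFT = re.compile(r"\s*(\d+(?:\.\d+)?)(?:\s*[–-]\s*(\d+(?:\.\d+)?))?\s*\(")
NUCLEI = re.compile(r"(\d+)H")
COUPLING = re.compile(r"J\w*\s*=\s*(\d+(?:\.\d+)?)\s*Hz")


@dataclass
class Peak:
    low: float                    # ppm; equals high for a single shift
    high: float
    multiplicity: str             # s, d, t, dd, m, ...
    n_h: int = 0
    j_hz: list = field(default_factory=list)
    labels: list = field(default_factory=list)


def split_top_level(s):
    """Split on commas that are not inside parentheses (keeps 'C(5a)H' whole)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts


def parse_1h_nmr(text):
    """Return (solvent, frequency in MHz, list of Peak) for one 1H NMR string."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no '1H NMR (solvent, N MHz)' header found")
    body = text[text.index("=", header.end()) + 1:]   # text after 'δ ='
    peaks, pos = [], 0
    while (m := SHIFT.match(body, pos)):
        low = float(m.group(1))
        high = float(m.group(2)) if m.group(2) else low
        depth, i = 1, m.end()
        while depth:                                  # walk to the matching ')'
            depth += {"(": 1, ")": -1}.get(body[i], 0)
            i += 1
        fields = split_top_level(body[m.end():i - 1])
        peak = Peak(low, high, fields[0])
        for f in fields[1:]:
            if (n := NUCLEI.fullmatch(f)):
                peak.n_h = int(n.group(1))
            elif (j := COUPLING.fullmatch(f)):
                peak.j_hz.append(float(j.group(1)))
            else:
                peak.labels.append(f)                 # assignment text, e.g. ArH
        peaks.append(peak)
        pos = i + 1 if body.startswith(",", i) else i  # skip the separating comma
    return header["solvent"], float(header["mhz"]), peaks
```

Applied to the example, this yields seven signals whose proton counts sum to 21. The final multiplet is kept as the range 7.18 to 7.94 ppm.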
Figure 1: A spectral depiction generated from the text-mined spectrum above. The simulated spectrum can be stored in JCAMP-DX format to build a spectral database.
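The poster does not say how the visual spectrum is simulated. One simple approach, assumed here purely for illustration, is to draw each signal as a Lorentzian line whose area is proportional to its proton count. Multiplet splitting is ignored, and a reported range such as 7.18 to 7.94 ppm becomes a single broad envelope. The result is then written out as a minimal JCAMP-DX text block. The label set (including the NMR-specific `##.OBSERVE` labels) and the `##XYPOINTS=(XY..XY)` table form are our assumptions and should be checked against the JCAMP-DX and JCAMP-DX NMR specifications before use. The function names, the 0 to 10 ppm window, the 2001 points and the 0.02 ppm line width are all hypothetical choices.

```python
import math


def lorentzian(x, x0, fwhm):
    """Area-normalised Lorentzian line centred at x0 (ppm)."""
    hwhm = fwhm / 2
    return hwhm / (math.pi * ((x - x0) ** 2 + hwhm ** 2))


def simulate(peaks, start=0.0, stop=10.0, n=2001, fwhm=0.02):
    """peaks: iterable of (low_ppm, high_ppm, n_h). Returns (xs, ys)."""
    step = (stop - start) / (n - 1)
    xs = [start + k * step for k in range(n)]
    ys = [0.0] * n
    for low, high, n_h in peaks:
        centre = (low + high) / 2
        width = max(fwhm, high - low)          # a range becomes one broad line
        for k, x in enumerate(xs):
            ys[k] += n_h * lorentzian(x, centre, width)
    return xs, ys


def to_jcamp(title, xs, ys, mhz):
    """Serialise (x, y) pairs as a minimal JCAMP-DX text block (assumed labels)."""
    header = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        "##DATA TYPE=NMR SPECTRUM",
        "##.OBSERVE NUCLEUS=^1H",
        f"##.OBSERVE FREQUENCY={mhz}",
        "##XUNITS=PPM",
        "##YUNITS=ARBITRARY UNITS",
        f"##FIRSTX={xs[0]:.4f}",
        f"##LASTX={xs[-1]:.4f}",
        f"##NPOINTS={len(xs)}",
        "##XYPOINTS=(XY..XY)",
    ]
    data = [f"{x:.4f}, {y:.6g}" for x, y in zip(xs, ys)]
    return "\n".join(header + data + ["##END="])
```

For example, `to_jcamp("example", *simulate([(2.57, 2.57, 4), (7.18, 7.94, 11)]), 400)` produces a 400 MHz block with one sharp 4H line and one broad 11H envelope. Drawing multiplets would require first-order splitting from the coupling constants, which this sketch deliberately omits.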
Spectral Visualization: Spectra are viewed in the JSpecView spectral display widget [1], which supports zooming, scrolling and integration. 2D NMR spectra can currently be viewed only as images.
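Integration in a 1D NMR viewer amounts to measuring the area under the curve over a selected chemical-shift region. Because the area is proportional to the number of contributing nuclei, areas are usually reported relative to a reference region. The sketch below shows the idea with the trapezoidal rule. It is a generic illustration and not JSpecView's implementation. The function names are hypothetical, and for simplicity only sample intervals lying wholly inside the region are counted.

```python
def integrate(xs, ys, low, high):
    """Trapezoidal area under y(x) for sample intervals lying wholly in
    [low, high]. Assumes xs is sorted in ascending order."""
    area = 0.0
    for k in range(len(xs) - 1):
        x0, x1 = xs[k], xs[k + 1]
        if x0 >= low and x1 <= high:
            area += (x1 - x0) * (ys[k] + ys[k + 1]) / 2
    return area


def relative_integrals(xs, ys, regions, ref=0, ref_protons=1.0):
    """Scale region areas so that regions[ref] corresponds to ref_protons."""
    areas = [integrate(xs, ys, low, high) for low, high in regions]
    scale = ref_protons / areas[ref]
    return [a * scale for a in areas]
```

For a signal where one region holds three times the area of the reference region, `relative_integrals` returns `[1.0, 3.0]`. This is how a 1H:3H ratio would be read from the spectrum.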
Figure 2: The JSpecView spectral viewing applet supports viewing JCAMP-DX spectra of 1D NMR, IR, UV-Vis and mass spectrometry data.
Spectra can also carry structural assignments: NMR peak assignments, IR vibrational assignments and mass spectral fragment peaks. We are now working with iChemLabs HTML5 widgets [2] to display these assignments.
Figure 3: Spectrum-structure assignments. Selecting the peak at 7.5 ppm highlights the corresponding protons on the molecule. The assignments are stored in the JCAMP spectral file.
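The interaction in Figure 3 needs two pieces. The first is a table linking each peak to atom numbers in the structure. The second is a lookup from a selected position on the spectrum to the nearest assigned peak. The sketch below reads rows in a simplified layout modelled on the JCAMP-DX `##PEAK ASSIGNMENTS=(XYA)` record, such as `(7.50, 1.0, <12 13 14>)`, and returns the atoms to highlight. The exact row layout, the atom numbers and the 0.05 ppm tolerance are assumptions for illustration. This is not the iChemLabs or ChemSpider code.

```python
import re

# One (X, Y, <atoms>) row; Y may be empty. Simplified layout (assumption).
ROW = re.compile(r"\(\s*([-\d.]+)\s*,\s*([-\d.]*)\s*,\s*<([^>]*)>\s*\)")


def parse_assignments(block):
    """Read rows like '(7.50, 1.0, <12 13 14>)' into (shift_ppm, [atom numbers])."""
    rows = []
    for m in ROW.finditer(block):
        atoms = [int(a) for a in re.split(r"[\s,]+", m.group(3).strip()) if a]
        rows.append((float(m.group(1)), atoms))
    return rows


def atoms_for_click(rows, ppm, tol=0.05):
    """Atoms assigned to the peak nearest ppm, or [] if none lies within tol."""
    if not rows:
        return []
    shift, atoms = min(rows, key=lambda r: abs(r[0] - ppm))
    return atoms if abs(shift - ppm) <= tol else []
```

A viewer would call `atoms_for_click` with the ppm value under the cursor and highlight the returned atoms in the structure widget. A click far from any assigned peak highlights nothing.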
Future Directions: We intend to continue growing the spectral database by encouraging further depositions from the community. We are also investigating converting spectral figures (images) into spectral data that can be hosted in the database.
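Converting a spectral figure into data usually starts with two steps. First, the image's axes are calibrated from known tick positions. Second, the plotted trace is followed column by column. The sketch below illustrates both steps on a greyscale image given as rows of pixel values, where 0 is black. It is a hypothetical starting point only, not a description of the approach under investigation. The names, the darkness threshold of 128 and the "top-most dark pixel" rule are all assumptions, and a real figure would also need handling of axis lines, labels and overlapping text.

```python
def make_axis(p1, v1, p2, v2):
    """Linear pixel -> value map from two calibration ticks (pixel, value).
    Works for reversed axes too, e.g. ppm decreasing left to right."""
    scale = (v2 - v1) / (p2 - p1)
    return lambda p: v1 + (p - p1) * scale


def trace_curve(image, x_axis, y_axis, dark=128):
    """For each pixel column take the top-most pixel darker than `dark` as the
    curve, and map it to data coordinates. Columns with no dark pixel are skipped."""
    points = []
    for col in range(len(image[0])):
        for row in range(len(image)):
            if image[row][col] < dark:
                points.append((x_axis(col), y_axis(row)))
                break
    return points
```

For a ppm axis running from 10 at pixel 0 to 7 at pixel 3, `make_axis(0, 10.0, 3, 7.0)` maps pixel 1 to 9.0 ppm. The traced points could then be written out as an XY table like any other deposited spectrum.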
References
[1] JSpecView Project: an Open Source Java viewer and converter for JCAMP-DX, and XML spectral data files, http://www.journal.chemistrycentral.com/content/1/1/31
[2] iChemLabs Web Components, Spectrum Structure Correlations: http://tinyurl.com/pkz26xf
