Navigating the Complex Web of Chemistry Using ChemSpider
Imagine a time when…
- The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
- Chemistry articles are indexed and searchable by a free online service
- The web is linked together through the “language of chemistry”
- Publicly funded research data can be shared and discussed in the open, maybe as Open Notebook Science (ONS)?
It’s Coming… Linked Data Cloud
For Synthesis… TotallySynthetic.com
Org Prep Daily (Blog)
Molbank (Open Access Journal)
Synthetic Pages (Website)
For Chemical Compounds
- Vendor sites – Aldrich, Alfa Aesar, TCI and hundreds of others
- Government databases – PubChem, DSSTox, FDA databases, ChemIDPlus, …
- Biological databases – Protein Database, Stitch, KEGG, ChEBI, …
- Analytical databases – Red Hen Spectra, NMRShiftDB, …
What is ChemSpider?
ChemSpider is:
- Building a Structure Centric Community for Chemists
- 22.2 million compounds, >200 data sources
- A deposition and curation platform
- A publishing platform for the community
- Growing daily – more depositions, more links, more data sources
How Was ChemSpider Built?
- ChemSpider was a “hobby project”
- Housed in a basement and running off three servers – one bought, two built
- Sensitive to weather and power stability
- Went live at ACS Spring 2007 in Chicago
Search Cholesterol
Linked across the internet
Kyoto Encyclopedia of Genes and Genomes
Link off a structure in ChemSpider
- Chemical suppliers
- Other publications
- Analytical data
- Related reactions
- Wikipedia
- Patents
- “Everything”
Links to Patents based on structure
Clickthrough to Patent (SureChem)
Pubmed Articles Linked
Answering Questions for Chemists
Questions a chemist might ask…
- What is the melting point of n-butanol?
- What is the chemical structure of Xanax?
- Chemically, what is phenolphthalein?
- What are the stereocenters of cholesterol?
- Where can I find publications about xylene?
- What are the different trade names for Ketoconazole?
- What is the NMR spectrum of Aspirin?
- What are the safety handling issues for Thymol Blue?
Complex Data and Information
ChemSpider is a structure-centric hub
- ChemSpider aggregates and links out across the internet
- Data aggregate based on “structures and links”
- What defines a chemical compound?
What is a compound?
Question Everything online: www.dhmo.org
PubChem
Caution! Question Everything!
Vancomycin
- Who will curate?
- PubChem is not resourced to clean these errors
- How would you clean such a large dataset?
Vancomycin on ChemSpider: 1 compound – 3 days
The EXPERTS must get it right?!
Wikipedia, C&E News, PubChem C&E News (from ACS)
Feedback from Steve Ritter
“As for where we source our structures, our primary source is the researcher and peer-reviewed papers, because many compounds are novel. … we always double check them against one or more primary sources, typically Merck Index and SciFinder. Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
Feedback from Steve Ritter
“As a rule, we at C&EN don’t use Wikipedia as a primary source for structures or chemical information, and I recommend that policy to anyone.”
“It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”
What About Digitonin?
Comments on the Blog
Kirill Degtyarenko says (September 15th, 2009 at 1:57 pm): It looks like both ChEBI and Wikipedia structures are wrong as far as aglycon is concerned. According to http://www3.interscience.wiley.com/journal/20330/abstract: “… for the first time to confirm beyond all doubt the structure suggested by Tschesche and Wulff for digitonin by means of modern NMR techniques, and to assign all proton and carbon resonances.” Structure 1 shows methyl group at C-20 going UP, i.e. 20β (while by default spirostan is 20α).
CAS as an authority
The Blogging Community Participate
Will it ever end?
- The community says the structure of digitonin has an “up” 20-methyl.
- If so, then multiple substances related to digitonin have OPPOSITE stereo at the 20-methyl.
- The spirostane skeleton is considered to have a “down” methyl group, so all spirostane-related structures would be wrong.
- The ACD/Dictionary has 24 structures with a close skeleton and all have the “down” methyl group.
The FDA’s DailyMed
Structures on DailyMed
Lack of Stereochemistry
Incorrect Structures
Wow!
Collaborative Knowledge Management
Drugbank
Taxol on PubChem
FDA’s DailyMed
The InChI Identifier
Multiple Layers
InChIStrings Hash to InChIKeys
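The idea on this slide, a long layered identifier string reduced to a short fixed-length key, can be illustrated with a toy sketch. This is deliberately NOT the real InChIKey algorithm: the function names, the letter encoding, and the 14/8 block lengths are assumptions chosen only to mimic the shape of a key, and the two input strings are illustrative identifiers that share formula, connectivity, and hydrogen layers but differ in one stereo layer.

```python
import hashlib
import string


def letters_from_hash(text: str, length: int) -> str:
    """Map the first `length` bytes of a SHA-256 digest to uppercase letters."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(string.ascii_uppercase[b % 26] for b in digest[:length])


def toy_key(identifier: str) -> str:
    """Build a two-block key: skeleton layers first, all other layers second.

    Toy scheme for illustration only; it does not reproduce real InChIKeys.
    """
    _prefix, body = identifier.split("/", 1)
    layers = body.split("/")
    # Formula plus connectivity (c) and hydrogen (h) layers form the skeleton.
    skeleton = [layers[0]] + [l for l in layers[1:] if l[:1] in ("c", "h")]
    # Everything else (for example stereo layers) goes into the second block.
    other = [l for l in layers[1:] if l[:1] not in ("c", "h")]
    first_block = letters_from_hash("/".join(skeleton), 14)
    second_block = letters_from_hash("/".join(other), 8)
    return first_block + "-" + second_block


# Illustrative inputs: identical except for the /m stereo layer.
form_a = "InChI=1S/C3H7NO2/c1-2(3(5)6)4/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
form_b = "InChI=1S/C3H7NO2/c1-2(3(5)6)4/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(toy_key(form_a))
print(toy_key(form_b))
```

Because the skeleton layers are hashed separately from the remaining layers, two strings that differ only in stereochemistry share the first block of the toy key and differ in the second.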
InChIs for Taxol
Back to Taxol
- DrugBank: RCINICONZNJXQF-CLDWUXIMDD
- ChEBI: RCINICONZNJXQF-GXKQXQCDDN
- Wikipedia: RCINICONZNJXQF-MZXODVADBJ
- Which one is correct???
InChIKeys for Taxol
- DrugBank: RCINICONZNJXQF-CLDWUXIMDD
- ChEBI: RCINICONZNJXQF-GXKQXQCDDN
- Wikipedia: RCINICONZNJXQF-MZXODVADBJ
- ChEBI and Wikipedia are the SAME structure
- DrugBank is a DIFFERENT structure – ONE stereocenter
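A comparison like the one on these slides can be scripted. The sketch below, with hypothetical helper names, splits each key at the hyphen and reports, for every pair of sources, whether the first block (which encodes the connectivity skeleton) and the full key agree. It only compares the strings as transcribed here and makes no claim about which record is correct.

```python
from itertools import combinations

# Keys exactly as listed on the slide.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key: str) -> tuple[str, str]:
    """Return (first block, remainder) of a hyphen-separated key."""
    first, rest = key.split("-", 1)
    return first, rest


def compare_sources(keys: dict[str, str]) -> dict[tuple[str, str], dict[str, bool]]:
    """Compare every pair of sources by first block and by full key."""
    report = {}
    for (name_a, key_a), (name_b, key_b) in combinations(keys.items(), 2):
        first_a, _ = split_key(key_a)
        first_b, _ = split_key(key_b)
        report[(name_a, name_b)] = {
            "same_skeleton": first_a == first_b,
            "identical_key": key_a == key_b,
        }
    return report


for pair, result in compare_sources(TAXOL_KEYS).items():
    print(pair, result)
```

With the keys as transcribed, all three sources share the first block while every second block differs, so the records agree on connectivity and any disagreement sits in the remaining layers such as stereochemistry. Deciding which records depict the same structure still requires inspecting the structures themselves.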
Does one stereocenter matter?
Does one stereocenter matter? Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
Building a Structure Centric Community for Chemists
Assertion and Chemical Entities
- Who says what Taxol is?
- What is the “timeline” for a molecule?
- How do we clean up the public data?
- The quality source is Chemical Abstracts Service…
ChemSpider Searches
ChemSpider Complex Searches
InChIKey Searches Work
The InChI “Resolver”
Content is King and Quality Costs
- Chemistry “content” is big money
  - Patent searching
  - Structures and properties
  - Drug databases
  - Literature databases
- Chemical Abstracts Service (CAS), the “Gold Standard” in chemistry-related information
  - 101 years of content
  - $260 million revenue (2006)
  - >50 million substances
  - >60 million sequences
Crowd-sourcing Chemistry Curation
- Crowd-sourced curation: identify/tag errors, edit names and synonyms, identify records to deprecate
Chemistry – A Deposition Platform
- CAS indexes published literature, patents and chemical vendors
- CAS indexes ChemSpider – >303,000 records
- “Lost Chemistry” – syntheses in theses, lab notebooks? Compounds in private collections?
- ChemSpider accepts public depositions, linking to websites, hosting of details etc.
- Accepts structures, text, spectra, images.
Structure Searching Articles…
- Searching articles based on chemical structure and substructure is very expensive… but this is changing
- The web IS ready – when will publishers deliver?
- Structures can be shown
- Spectra can be interactive
- Graphics don’t need to be static
- Publishers can enhance their articles
Semantic Mark-up for Chemistry
- Semantic mark-up for chemistry is here
- RSC Project Prospect (structure linking, IUPAC Gold Book ontology and other ontologies)
- Nature Publishing Group compound linking
- ChemSpider Journal of Chemistry
Nature Chemistry Compound Pages
Project Prospect
ChemSpider and Publishing
- The curation efforts on ChemSpider led to a set of validated dictionaries
- Integrate best-in-class entity extraction with validated name dictionaries
- Additional dictionaries gave reactions, groups, families, hardware and software vendors etc.
Name Recognition
Azo aldehyde 2 was synthesized according to a reported method [17]. To a stirred solution of azo aldehyde 2 (1.08 g, 3.76 mmol) in dry CH2Cl2 (30.00 mL) were successively added (3,4-diaminophenyl)phenyl methanone 1 (0.40 g, 1.88 mmol) and an excess of anhydrous MgSO4 (2.00 g, 16.67 mmol).
ChemMantis and CJOC
Name-Structure Pairs
Converting Detected Names…
- Names are searched against a validated dictionary (this expands as ChemSpider is curated)
- If not found, they are passed through a Name to Structure algorithm
- If they cannot be converted, ChemSpider is searched for non-validated names
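The three-step lookup described on this slide can be sketched as a simple fallback chain. Everything here is hypothetical: `resolve_name`, the stub converter, the stub search, and the sample entries are stand-ins for illustration, not ChemSpider's actual interfaces or data.

```python
from typing import Callable, Optional


def resolve_name(
    name: str,
    validated: dict[str, str],
    name_to_structure: Callable[[str], Optional[str]],
    search_unvalidated: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Try each lookup in order and report which step produced the result."""
    key = name.strip().lower()
    # Step 1: validated dictionary.
    if key in validated:
        return validated[key], "validated-dictionary"
    # Step 2: algorithmic name-to-structure conversion.
    structure = name_to_structure(key)
    if structure is not None:
        return structure, "name-to-structure"
    # Step 3: fall back to non-validated names.
    structure = search_unvalidated(key)
    if structure is not None:
        return structure, "unvalidated-search"
    return None, "unresolved"


# Hypothetical stand-ins for the three sources.
VALIDATED = {"ethanol": "CCO"}


def stub_converter(name: str) -> Optional[str]:
    """Pretend name-to-structure algorithm that understands one name."""
    return {"methane": "C"}.get(name)


def stub_search(name: str) -> Optional[str]:
    """Pretend search over non-validated names returning a record label."""
    return {"compound x": "unvalidated-record-1"}.get(name)


for query in ["Ethanol", "methane", "compound x", "no such name"]:
    print(query, resolve_name(query, VALIDATED, stub_converter, stub_search))
```

Returning the step label alongside the result keeps track of how trustworthy each match is, which matters when only the first source has been curated.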
Manual Curation is Necessary
Deposit Structures
Custom Dictionaries
- Entity extraction built around modified algorithms from SureChem
- Optimized for “publications”
- Dictionaries for chemical entities, groups, reactions, elements, families, species…
- Dictionaries can be expanded
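Dictionary-driven entity extraction of this kind can be sketched with a longest-match-first regular expression. This is a minimal illustration, not the SureChem-derived algorithms mentioned above; the dictionaries, category names, and sample sentence are assumptions made up for the example.

```python
import re

# Hypothetical, tiny dictionaries keyed by category.
DICTIONARIES = {
    "chemical": [
        "azo aldehyde",
        "CH2Cl2",
        "MgSO4",
        "(3,4-diaminophenyl)phenyl methanone",
    ],
    "group": ["aldehyde"],
}


def build_matcher(dictionaries: dict[str, list[str]]):
    """Compile one case-insensitive pattern that prefers longer terms."""
    term_to_category = {}
    for category, terms in dictionaries.items():
        for term in terms:
            term_to_category[term.lower()] = category
    # Longest terms first so "azo aldehyde" wins over "aldehyde".
    ordered = sorted(term_to_category, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Lookarounds stop matches that start or end inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)
    return pattern, term_to_category


def extract_entities(text: str, dictionaries: dict[str, list[str]]):
    """Return (matched text, category, start offset) for each dictionary hit."""
    pattern, term_to_category = build_matcher(dictionaries)
    return [
        (m.group(0), term_to_category[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


SAMPLE = (
    "To a stirred solution of azo aldehyde 2 in dry CH2Cl2 were added "
    "(3,4-diaminophenyl)phenyl methanone 1 and anhydrous MgSO4; "
    "the aldehyde was consumed."
)

for hit in extract_entities(SAMPLE, DICTIONARIES):
    print(hit)
```

Expanding a dictionary is then just a matter of appending terms to a category list, which mirrors the point that dictionaries can grow as curation proceeds.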
Species – linked to Wikipedia
Semantic Linking of Structures
What would you want to link off a structure?
- Chemical suppliers
- Other publications
- Analytical data
- Related reactions
- Wikipedia
- Patents
- “Everything”
RSC Supplementary Info
ChemSpider Synthesis
- ChemSpider Synthesis will be a home for all things “synthetic”
- An online resource for synthetic procedures from blogs, other online resources, RSC supplementary info, other publishers etc.
- Public peer-review and feedback for synthetic procedures
Online Journals and Live Data
ChemSpider Everywhere
- Linked from Wikipedia
- Linked from Open Notebook Science sites using EMBED
- Linked from blogs using Structure/Spectra EMBED
- Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
- Integrated into software offerings from Thermo, Waters, Agilent, Bruker
ChemSpider Everywhere : Embed
ChemSpider Everywhere: Spectral Game
ChemSpider Everywhere : ChemMobi
Not in a basement now...
Conclusions
- The internet enables chemistry, at a reduced cost
- Web 2.0 is here and improving quality
- Question Quality!
- Crowdsourcing to expand, curate and integrate
- InChIs are enabling chemistry on the internet
You are invited…
- Deposit your data with us: structures, spectra, synthesis procedures
- ChemSpider Synthesis is under development
- What is Digitonin?
Acknowledgments
- Valery Tkachenko and Sergey Golotvin
- RSC infrastructure team
- The ChemSpider advisory group
- The Wikipedia Chemistry team
- JC Bradley, Andy Lang – Spectral Game
Thank you
- [email_address]
- Twitter: ChemSpiderman
- www.chemspider.com/blog

