Feeding and consuming data to
support Open Notebook Science via
          the ChemSpider Platform

Antony Williams, Jean-Claude Bradley, Andrew Lang and
                                      Valery Tkachenko

                               ACS Philadelphia August 2012
Setting the Stage
 Chemists want access to tools and data

     The more capabilities the better
     The more data the better
     And give us an API with that…
     And it should be free…
     And constantly updated…
     And all data should be Open…
     And make it fully Open Source…
     And it needs to be on my mobile…
Setting the Stage
 Chemists have access to tools and data

     The more capabilities the better – we’ll see
     The more data the better – changing daily
     And give us an API with that… - not just one
     And it should be free… - sure
     And constantly updated… - indeed, please help!
     And all data should be Open… - licensing
     And make it fully Open Source… - kinda, sorta
     And it needs to be on my mobile… - sure
Welcome to ChemSpider
 5 years, 28 million chemicals, linking 400 data
  sources and growing daily

 Hosted by the Royal Society of Chemistry
 An important part of our long term strategic vision

 Free to access
 With lots/most/all (?) of the functionality
  necessary to support chemists and Open
  Notebook Science…
Why Use ChemSpider?
Why Use ChemSpider? LINKING OUT
What about Syntheses?
ChemSpider SyntheticPages
Work in Progress – 300k Reactions
Storing ONS Reactions
 Working with JC Bradley to host ONS reactions
 Linking directly back to ONS reactions


 What if the links decay?
 Host all related ONS data – benefits of Openness!
 Future applications for RInChIs
What we have been asked for
   “Allow us to grab data”
   “Let us link”
   “Give us web services to integrate”
   “Can we store our data with you?”
   “Can you give us predictions to validate data?”



 “Can you build us an ELN?”
Simple Linking to ChemSpider
 Link using ChemSpiderID
 http://www.chemspider.com/1234567
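
The link pattern above is easy to generate from a list of IDs. A minimal sketch in Python, assuming only the URL pattern shown on the slide; the function name `chemspider_link` is hypothetical and 1234567 is the slide's placeholder ID.

```python
def chemspider_link(csid: int) -> str:
    """Build a record link from a ChemSpider ID, following the slide's pattern."""
    # Assumption: ChemSpider IDs are positive integers.
    if csid <= 0:
        raise ValueError("expected a positive ChemSpider ID")
    return f"http://www.chemspider.com/{csid}"


# Placeholder ID taken from the slide, not a specific compound.
print(chemspider_link(1234567))
```

A notebook page or wiki can store just the ID and render the link at display time.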
ChemSpider IDs Proliferating Now
Simple Querying Example
 http://www.chemspider.com/Search.aspx?q=InChIKey=XXO
Or InChI, or SMILES
 http://www.chemspider.com/Search.aspx?q=InChI=1S
  m1/s1

 http://www.chemspider.com/Search.aspx?q=Clc1ccc(cc1)C(O)=C3C(=O)C(=O)N([C@@H]3c2cccc(F)c2)CCc5c4ccccc4nc5
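
SMILES and InChI strings contain characters such as `=`, `(`, `[` and `@` that are safer percent-encoded inside a query string. A minimal sketch in Python, assuming the `Search.aspx?q=` pattern shown on the slides; the helper name `search_url` is hypothetical, and the SMILES is the slide's example joined across its line breaks.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Search endpoint as it appears on the slides.
SEARCH_URL = "http://www.chemspider.com/Search.aspx"


def search_url(term: str) -> str:
    """Build a search link for an InChIKey, InChI or SMILES string."""
    # urlencode percent-encodes reserved characters in the query value.
    return f"{SEARCH_URL}?{urlencode({'q': term})}"


smiles = "Clc1ccc(cc1)C(O)=C3C(=O)C(=O)N([C@@H]3c2cccc(F)c2)CCc5c4ccccc4nc5"
url = search_url(smiles)
print(url)

# Decoding the query string recovers the original SMILES unchanged.
assert parse_qs(urlsplit(url).query)["q"] == [smiles]
```

The same helper works for an InChIKey or InChI; only the search term changes.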
Better to provide APIs…
Various Flavors of API
MANY Web Services for integration
Feeding ONS Data into ChemSpider
 ONS data can be deposited into ChemSpider and
  linked out to the ONS pages
 Simply deposit structure(s) and links
Feeding ONS Data into ChemSpider
 ONS Solubility Challenge
So isn’t ONS all about ELNs?
 Open Notebook Science is about
   Making records of research publicly available
    online as it is recorded

 ONS is enabled by software tools and platforms
   Keep the notebook of the researcher online
    with all raw and processed data as it is
    generated (close to or near real time)
   Notebooks as Wikis, Commercial or Free ELNs
    published to the web (choose public/private –
    what data to expose)
Feeding ELN Data into ChemSpider
 Integrate e-Notebooks into ChemSpider

   IDBS e-Workbook plug-in allows direct
    deposition of chemical structures
   Can be extended to more ELN content
      Spectra
      Reactions
      Properties etc.

      Integration Video http://tinyurl.com/9xnprqr
How much data is lost?
 How many reactions in a thesis never get
  published?
 How many spectra of common materials could be
  shared?
 How many properties are measured and lost?
 What stands in the way of sharing?
    Is it technology?
    Permissions? “The Boss”, Licensing?

 And yes – there are data quality issues but there
  is algorithmic checking and data curation to help
What could the future look like?
 “Publicly funded” research data flows onto the web
 Licensing is clear and NOT a challenge
 Machines are picking up data and depositing it

 EXAMPLE project – Any interest?
   Put your spectra/structure in folders (Dropbox)
   ChemSpider robot scoops, processes and
    deposits – opportunity with JC Bradley
   While processing also predicts spectra and
    compares for validation
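
The proposed robot could be sketched roughly as follows. This is an illustration only, not an existing ChemSpider service: the file extensions, the pairing-by-file-name convention, the `deposit` callback and the peak tolerance are all assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List, Tuple

STRUCTURE_EXT = ".mol"  # assumed structure file format
SPECTRUM_EXT = ".jdx"   # assumed spectrum file format


def find_pairs(folder: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (structure, spectrum) files that share a base name."""
    for structure in sorted(folder.glob(f"*{STRUCTURE_EXT}")):
        spectrum = structure.with_suffix(SPECTRUM_EXT)
        if spectrum.exists():
            yield structure, spectrum


def peaks_agree(observed: List[float], predicted: List[float],
                tol: float = 0.2) -> bool:
    """True if every observed peak lies within tol of some predicted peak."""
    return all(any(abs(o - p) <= tol for p in predicted) for o in observed)


def run_robot(folder: Path, deposit: Callable[[Path, Path], None]) -> int:
    """Hand each complete pair to the (hypothetical) deposit callback."""
    count = 0
    for structure, spectrum in find_pairs(folder):
        deposit(structure, spectrum)
        count += 1
    return count


# Demo on a temporary folder standing in for a shared Dropbox folder.
deposited: List[str] = []
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    (shared / "sample_a.mol").write_text("structure placeholder")
    (shared / "sample_a.jdx").write_text("spectrum placeholder")
    (shared / "sample_b.mol").write_text("structure without a spectrum")
    n_deposited = run_robot(shared, lambda s, p: deposited.append(s.stem))

print(n_deposited, deposited)
```

In a real pipeline the callback would call a deposition service, and `peaks_agree` would compare the measured peaks against a predicted spectrum before depositing.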
Leaving the Stage
 Chemists have access to tools and data

     The more capabilities the better – what’s missing?
     The more data the better – anyone want to share?
     And give us an API with that… - ask us for help
     And it should be free… - it is
     And constantly updated… - help annotate/curate
     And all data should be Open… - licensing
     And make it fully Open Source… - book chapter
     And it needs to be on my mobile… - it is
ChemSpider Mobile
New URLs to try out
 ChemSpider Reactions:
  www.chemspider.com/reactions

 ChemSpider Validation and Standardization
  Platform: www.chemspider.com/cvsp

 ChemSpider Google:
  www.chemspider.com/google
ChemSpider Google
Acknowledgments
 RSC Cheminformatics team
 JC Bradley’s lab
 Daniel Lowe – reactions
 Commercial Software – GGA Software,
  ACD/Labs, OpenEye
 Open Source Components
Thank you

Email: williamsa@rsc.org
Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
