ChemSpider – Hosting, Linking and Curating Chemistry Data for the Community
Valery Tkachenko
SLA Meeting, June 2011
Chemistry on the Internet
  • 100s of websites hosting chemistry-related data
  • Chemistry information is generally “compound-based”
  • Chemical “structures”
  • Identifiers, names and synonyms
  • Properties
  • Analytical data
  • How to synthesize
  • Articles, patents, safety information
  • Chemistry “language and dialects”
Dialects describing chemicals
A Pragmatic Vision
  • “Build a Structure Centric Community”
  • Integrate chemistry across the internet based on “chemical structure”
  • A “structure-based hub” to information and data
  • Let chemists contribute their own data
  • Allow the community to curate & annotate data
www.chemspider.com
Answering Questions for Chemists
Questions a chemist might ask…
  • What is the melting point of n-heptanol?
  • What is the chemical structure of Xanax?
  • Chemically, what is phenolphthalein?
  • What are the stereocenters of cholesterol?
  • Where can I find publications about xylene?
  • What are the different trade names for Aspirin?
  • What is the NMR spectrum of Benzoic Acid?
  • What are the safety handling issues for toluene?
Search for a Chemical…by name
Available Information… Linked to chemical vendors, safety data, toxicity, metabolism…
Available Information….
ChemSpider Today
  • Over 26 million unique chemicals
  • Over 420 data sources
  • Grows daily – community and RSC depositions
  • Community annotation and curation
  • We curate, edit, change, enhance data daily
Three Years of Experience
  • Internet-based chemistry is a mess!
  • Public compound databases are contaminated
  • The annotation/curation of data online is difficult
  • Most database hosts are non-responsive to feedback – “We are a host/repository of data”
  • Who cares? We all should!!!
Linked Data on the Web
Where is chemistry online?
  • Encyclopedic articles (Wikipedia)
  • Chemical vendor databases
  • Metabolic pathway databases
  • Property databases
  • Patents with chemical structures
  • Drug Discovery data
  • Scientific publications
  • Compound aggregators
  • Blogs/Wikis and Open Notebook Science
What is the Structure of Vitamin K1?
Chemical Abstracts “Common Chemistry” Database
Wikipedia
Internet-Based Chemistry is a Mess
  • Algorithms can get you so far
  • Human curation is necessary
  • Only the crowds can help with big data…
  • ChemSpider is over 26 million compounds
  • Imagine if we worked together to create a centralized validated structure-name dictionary!
  • Enhances text-mining, searching, linking…
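One way to picture why a shared, validated structure-name dictionary matters is the small Python sketch below. It is only an illustration, not how ChemSpider works: the source names (`source_a` and so on), the `STRUCT-n` structure keys and the function names are all hypothetical placeholders. It collects (source, name, structure) claims and reports names that different sources attach to more than one structure, which is the kind of disagreement a curator would need to resolve.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so 'Vitamin  H' and 'vitamin h' match."""
    return " ".join(name.split()).casefold()


def build_name_index(claims):
    """Map each normalized name to the structure keys that sources assign to it.

    `claims` is an iterable of (source, name, structure_key) tuples.
    """
    index = defaultdict(lambda: defaultdict(set))
    for source, name, structure_key in claims:
        index[normalize(name)][structure_key].add(source)
    return index


def find_conflicts(index):
    """Return names that sources attach to more than one structure."""
    return {
        name: {key: sorted(sources) for key, sources in by_structure.items()}
        for name, by_structure in index.items()
        if len(by_structure) > 1
    }


# Hypothetical claims; the keys stand in for real structure identifiers.
claims = [
    ("source_a", "Vitamin K1", "STRUCT-1"),
    ("source_b", "vitamin k1", "STRUCT-2"),
    ("source_c", "Vitamin  K1", "STRUCT-1"),
    ("source_a", "Vitamin H", "STRUCT-3"),
    ("source_b", "Biotin", "STRUCT-3"),
]

index = build_name_index(claims)
conflicts = find_conflicts(index)
print(conflicts)
```

In this toy data only "vitamin k1" is flagged, because two sources point it at one structure and a third at another. An algorithm can surface the disagreement, but deciding which structure is correct is the step that still needs a human curator.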
Search “Vitamin H”
“Curate” Identifiers
Crowd-sourcing Chemistry Curation
  • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
“Curate” Identifiers
General curation activities
  • Remove incorrect names
  • Correct spellings
  • Add multilingual names
  • Add alternative names
In 3 years over 1 million structure-identifier relationships have been validated – robotically and manually
130 people have participated in validation or annotation. “Crowds” can be quite small!
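The curation activities listed above can be sketched as operations on a single compound record. This is a hedged illustration only: the `Record` and `Synonym` classes, the `STRUCT-3` key and the curator names are invented for the example and do not describe ChemSpider's actual data model. Each action is appended to a log so that it stays clear who changed what.

```python
from dataclasses import dataclass, field


@dataclass
class Synonym:
    name: str
    language: str = "en"
    validated: bool = False


@dataclass
class Record:
    structure_key: str  # placeholder for a real structure identifier
    synonyms: list = field(default_factory=list)
    log: list = field(default_factory=list)  # (curator, action, detail)

    def _find(self, name):
        for synonym in self.synonyms:
            if synonym.name == name:
                return synonym
        raise KeyError(name)

    def remove_name(self, name, curator):
        """Remove a name that does not belong to this structure."""
        self.synonyms.remove(self._find(name))
        self.log.append((curator, "remove", name))

    def correct_spelling(self, old, new, curator):
        """Fix a misspelled name in place."""
        self._find(old).name = new
        self.log.append((curator, "correct", f"{old} -> {new}"))

    def add_name(self, name, curator, language="en"):
        """Add an alternative or multilingual name."""
        self.synonyms.append(Synonym(name, language))
        self.log.append((curator, "add", name))

    def validate(self, name, curator):
        """Mark a structure-name relationship as checked."""
        self._find(name).validated = True
        self.log.append((curator, "validate", name))


# Illustrative record with one wrong name and one misspelling.
record = Record(
    "STRUCT-3",
    [Synonym("Biotin"), Synonym("Vitamn H"), Synonym("Vitamin K1")],
)
record.remove_name("Vitamin K1", "curator_1")
record.correct_spelling("Vitamn H", "Vitamin H", "curator_1")
record.add_name("Biotina", "curator_2", language="es")
record.validate("Biotin", "curator_2")

names = [s.name for s in record.synonyms]
validated = sum(s.validated for s in record.synonyms)
curators = {entry[0] for entry in record.log}
print(names, validated, len(curators))
```

Keeping the log separate from the synonym list is a deliberate choice in this sketch: counts such as "relationships validated" and "people who participated" can then be derived from the record rather than tracked by hand.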
Vancomycin – Curate This!!!
Vancomycin on ChemSpider: 1 compound – 3 days
Crowdsourced “Annotations”
Users can add
  • Descriptions/Syntheses/Commentaries
  • Links to articles
  • Spectral data
  • Photos
  • MP3 files
  • Videos
Multimedia Content Holder
Gaming for Validation of Spectra
Crowdsourced Validation of Spectra
“ Game-based” Validation of Data
ChemSpider SyntheticPages
Sharing Our Activities
  • Presently defining approaches with other public compound databases to share results of curation activities
  • Member of a large European project to link data from the Life Sciences
  • Sharing results of curation is essential
  • Making curation and contribution interfaces mobile
Thank you
  • Email: williamsa@rsc.org
  • Twitter: ChemConnector
  • Blog: www.chemspider.com/blog
  • Personal Blog: www.chemconnector.com
  • SLIDES: www.slideshare.net/AntonyWilliams
