Delivering on the promise of a 
chemistry data repository 
for the world 
Antony Williams 
Going Native Panel Discussion at the Microsoft eScience Workshop 
0000-0002-2668-4821
A Question to Start… 
• Who in the room has an ORCID?
New Horizons…. 
• Let’s map together all historical chemistry data 
and build systems to integrate it 
• Heck, let’s integrate chemistry and biology 
data and add in disease data too 
• Let’s model the data and see if we can extract 
new relationships – quantitative and qualitative 
• Let’s take what we learn from historical data 
and build better solutions for modern data 
• Let’s make it all available on the web…
What about this…. 
• We’re going to map the world 
• We’re going to take photos of as many places 
as we can and link them together 
• We’ll let people annotate and curate the map 
• Then let’s make it available free on the web 
• We’ll make it available for decision making 
• Put it on Mobile Devices, give it away…
Chemistry data is of value? 
• Reference databases generate hundreds of 
millions of dollars/euros per year 
• So much data generated that could go public 
• Maybe 5% of all data generated is published 
• There is no “Journal of Failed Experiments” 
• Funding agencies are starting to demand Open Data 
• Scientists want funding but also recognition
A shift to Openness
Open Data is here…
Chemistry data is of value? 
• Reference databases generate hundreds of 
millions of dollars/euros per year 
• So much data generated that could go public 
• Maybe 5% of all data generated is published 
• There is no “Journal of Failed Experiments” 
• Funding agencies are starting to demand Open Data 
• Scientists want funding but also recognition 
• …so who will fund and build the platforms?
Going Native… speaka da lingo 
Chemists clearly benefit from accessing data
What we found… 
• Data quality on the internet can be very poor 
• Everyone wants access to high quality data 
but very few are willing to contribute 
• The primary concerns for contributors 
• It needs to be easy 
• Data licensing 
• Recognition for contributions
Recognition: need to have Impact
Quantitating scientists?
National Information Standards 
Organization and “Altmetrics” 
http://www.niso.org/apps/group_public/download.php/13295/niso_altmetrics_white_paper_draft_v4.pdf
Research Outputs 
• Blogs 
• Research datasets 
• Scientific software 
• Posters and presentations at conferences 
• Electronic theses and dissertations 
• Performances in film and audio 
• Lectures, online classes and teaching activities
Recognizing Contribution 
• In order to encourage participation maybe 
we need to provide recognition of impact 
• How do we measure impact for: 
• Performing peer review? 
• Contributions to more “public platforms”?...
Christmas Curating Wikipedia
Wikipedia Chemboxes 
• http://en.wikipedia.org/wiki/Glucose 
Three days of discussion 
• If you want to understand Wikipedia 
definitely Go Native and get involved!
Does ONE bond matter???
A short intro to chirality
Educating chemists in data 
• Chemists are more likely to know basic HTML 
than the data formats used in chemistry 
• Even international standards for data 
interchange and standardization are unknown 
• Standards are ideal for computers to handle
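One small illustration of why standardized identifiers suit computers: an ORCID iD, such as the one on the title slide, ends in a check character computed with the ISO 7064 MOD 11-2 scheme, so software can catch most typing errors without looking anything up. The sketch below is illustrative only; the function names are hypothetical and are not taken from the talk or from any ORCID library.

```python
import re

# 16 characters once hyphens are removed: 15 digits plus a check character (0-9 or X).
ORCID_COMPACT = re.compile(r"[0-9]{15}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check character of an ORCID iD (hyphens optional)."""
    compact = orcid.replace("-", "").upper()
    if ORCID_COMPACT.fullmatch(compact) is None:
        return False
    return orcid_check_char(compact[:15]) == compact[15]


if __name__ == "__main__":
    # The iD shown on the title slide of this deck.
    print(is_valid_orcid("0000-0002-2668-4821"))  # True
```

A check like this only confirms that the identifier is well formed; it does not confirm that the iD is registered or that it belongs to a particular person.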
Can we MAKE Quality Data? 
• We are building systems for everyone to 
validate and standardize their data
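The slides do not describe how these validation and standardization systems work internally. As a hedged sketch of the kind of rule such a system might include, the code below validates and normalizes CAS Registry Numbers using their published check-digit rule: the last digit equals the weighted sum of the preceding digits, counted from the right, modulo 10. The function names and the lenient input handling are assumptions made for illustration.

```python
import re
from typing import Optional

# Canonical form: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def is_valid_cas(cas: str) -> bool:
    """Validate the format and check digit of a CAS Registry Number."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return weighted % 10 == check


def standardize_cas(raw: str) -> Optional[str]:
    """Normalize loosely formatted input to NNNN-NN-N, or return None if invalid.

    Assumption: every non-digit character is treated as a separator and
    leading zeros are dropped. A production system may want stricter rules.
    """
    digits = re.sub(r"[^0-9]", "", raw).lstrip("0")
    if not 5 <= len(digits) <= 10:
        return None
    candidate = f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"
    return candidate if is_valid_cas(candidate) else None


if __name__ == "__main__":
    print(standardize_cas("50997"))    # D-glucose: 50-99-7
    print(standardize_cas("50-99-8"))  # wrong check digit: None
```

Checks of this kind catch transcription errors cheaply, but they cannot tell whether a well-formed number is attached to the right compound; that still needs curation.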
Where to host research data? 
• Containers for chemical compounds, chemical 
reactions, analytical data, tabular data, etc. 
• Algorithms for data validation and standardization 
• Domain specific search technologies 
• A platform for modeling data 
• Progressing the RSC Data Repository…
Compounds
Reactions
Analytical data
Generating models from data
New Horizons….are here 
• Let’s map together all historical chemistry data 
and build systems to integrate it 
• Heck, let’s integrate chemistry and biology data 
and add in disease data too 
• Let’s model the data and see if we can extract 
new relationships – quantitative and qualitative 
• Let’s take what we learn from historical data 
and build better solutions for modern data 
• Let’s make it all available on the web…
So we DON’T have to do this…
[Figure panels: EXTRACTED FIGURE vs. ORIGINAL FIGURE]
The path forward 
• Mesh and aggregate published data 
• Encourage deposition of RESEARCH data 
that would otherwise never be published 
• Provide open APIs for data access 
• Educate chemists in digital literacy 
• Funding agencies should mandate data access 
• Collaboration is key – don’t do it alone
Thank you 
Email: williamsa@rsc.org 
ORCID: 0000-0002-2668-4821 
Twitter: @ChemConnector 
Personal Blog: www.chemconnector.com 
SLIDES: www.slideshare.net/AntonyWilliams

