Open Chemistry: Realizing Open Data, Open Standards and Open Source
Marcus D. Hanwell, Kyle Lutz, David Lonie, Chris Harris, and David Cole
Website: http://openchemistry.org/
Email: marcus.hanwell@kitware.com, kyle.lutz@kitware.com
Scientific Computing, Kitware, Inc., 28 Corporate Drive, Clifton Park, NY 12065.


Avogadro
The Avogadro project is a cross-platform, open-source approach to building chemical
structures. It uses external simulation packages in addition to integrated analysis and
visualization routines. The work presented here illustrates a workflow for quantum
mechanical calculations, allowing the preparation of chemical structures, rough
optimization, and subsequent calculation of electron density isosurfaces, molecular
orbitals, etc.

Open Chemistry
The Open Chemistry project is developing a suite of applications and support libraries
to improve the workflow in computational chemistry, biology, materials science and
related areas. The goal is a set of open, connected components that can tackle both
small problems on the desktop and big research projects requiring significant time on
the world’s top supercomputers.

Chemical Data Explorer
The Chemical Data Explorer is a cross-platform, open-source application that
builds on the capabilities of the Visualization Toolkit, Qt and MongoDB. It can
connect to a local or remote database, ingest new data from various sources and
make that data semantically rich. It can apply informatics techniques to the data
it contains to search for structures with particular properties. Work is ongoing to
integrate computational job storage and search more tightly.
Figure 5: The workflow that the Open Chemistry components are being developed for.
[Diagram labels: Log File, Input File, Simulation, Results, Informatics, Job Submission, HPC integration, Local, Cloud, Supercomputer.]



Figure 1: Avogadro application (left), ray-traced molecule (center) and the periodic table widget (right).

Avogadro allows the user to prepare jobs for quantum packages, such as NWChem,
GAMESS, Gaussian and Q-Chem. Due to the plugin-based nature of the Avogadro
project, many specialized functions can be added for a large range of applications,
such as molecular docking, surface modeling and electronic structure.

Figure 3: The Chemical Data Explorer user interface showing a query and structures (top-left), a scatter plot matrix (top-right), scatter plot with tooltip (bottom-left), and K-means clustering (bottom-right).

OpenQube
OpenQube is a small, open-source C++ library that reads key quantum data from
calculations produced by codes such as NWChem, GAMESS and Gaussian. It can
read in basis sets, eigenvectors and density matrices, and calculate the magnitude
of the molecular orbitals and electron density on regularly-spaced grids. The data
produced can be used for further analysis and visualization of electronic structure.
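
To make the grid evaluation described for OpenQube concrete, the following is a minimal sketch in Python rather than the library's own C++ code. It assumes a toy basis of unnormalized s-type Gaussians and a doubly occupied orbital; the function names, centers, exponents and coefficients are all hypothetical and are not taken from OpenQube.

```python
import math


def regular_grid(origin, spacing, counts):
    """Yield the (x, y, z) points of a regularly spaced grid."""
    ox, oy, oz = origin
    nx, ny, nz = counts
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                yield (ox + i * spacing, oy + j * spacing, oz + k * spacing)


def s_gaussian(point, center, alpha):
    """Unnormalized s-type Gaussian, exp(-alpha * |r - R|^2)."""
    r2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-alpha * r2)


def orbital_value(point, basis, coefficients):
    """Molecular orbital as a linear combination of basis functions."""
    return sum(c * s_gaussian(point, center, alpha)
               for c, (center, alpha) in zip(coefficients, basis))


# Hypothetical two-center system: one s function per atom (illustrative values).
basis = [((0.0, 0.0, -0.7), 1.0), ((0.0, 0.0, 0.7), 1.0)]
bonding = [0.5, 0.5]  # stand-in for one eigenvector read from a log file

points = list(regular_grid(origin=(-1.0, -1.0, -1.0), spacing=1.0, counts=(3, 3, 3)))
mo_values = [orbital_value(p, basis, bonding) for p in points]
density = [2.0 * v * v for v in mo_values]  # assumes a doubly occupied orbital
```

The resulting `mo_values` and `density` lists are the kind of gridded scalar data that an isosurface or volume renderer would consume; a real basis set would also need normalization, contraction and higher angular momentum functions.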
MoleQueue
The MoleQueue application provides a graphical interface that integrates
high-performance computing (HPC) resources on the desktop. It offers a seamless
integration layer for applications, such as Avogadro, to submit jobs to local and
remote computational resources. Job lifetime is managed by MoleQueue, and results
can be opened in any external program.

Chemkit
Chemkit is an open-source, C++ library for molecular modeling, cheminformatics,
and molecular visualization. It features a modular, plugin-based architecture and
includes over 40 plugins that implement 15 file formats, 6 line formats, 4 force-fields,
2 partial charge models, 2 aromaticity models, 8 atom typers and 30 molecular
descriptors. In addition, Chemkit includes an integrated visualization library built
on OpenGL/Qt, with Python bindings for easy scripting.

Visualization Toolkit and ParaView
The Visualization Toolkit (VTK) is an open-source, C++ toolkit for 2D and
3D graphics, volume rendering, image processing, visualization and modeling.
Development began in 1993, and it now has a large community of developers
distributed around the world in a diverse set of fields. VTK processes data using
a data flow graph (pipeline) in which each algorithm takes zero or more inputs
and produces zero or more outputs. VTK is scalable to large data because it has
distributed algorithms that use MPI to execute on large computing clusters.
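
The data flow graph that VTK uses can be illustrated with a small, self-contained sketch. This is not VTK's API: the class names below (`Algorithm`, `PointSource`, `Threshold`, `Merge`) are hypothetical, and for simplicity each node produces exactly one output. It only shows the idea of sources with zero inputs, filters with one or more inputs, and execution that pulls data through the graph on demand.

```python
class Algorithm:
    """A pipeline node with zero or more upstream inputs and a single output."""

    def __init__(self, *inputs):
        self.inputs = list(inputs)  # upstream Algorithm objects

    def execute(self, *input_data):
        raise NotImplementedError

    def update(self):
        """Pull-based execution: update upstream nodes, then run this one."""
        upstream = [node.update() for node in self.inputs]
        return self.execute(*upstream)


class PointSource(Algorithm):
    """Source: zero inputs, produces a list of scalar samples."""

    def __init__(self, values):
        super().__init__()
        self.values = values

    def execute(self):
        return list(self.values)


class Threshold(Algorithm):
    """Filter: one input, keeps samples at or above a cutoff."""

    def __init__(self, upstream, cutoff):
        super().__init__(upstream)
        self.cutoff = cutoff

    def execute(self, data):
        return [v for v in data if v >= self.cutoff]


class Merge(Algorithm):
    """Filter: any number of inputs, concatenated into one output."""

    def execute(self, *datasets):
        return [v for data in datasets for v in data]


source_a = PointSource([0.1, 0.5, 0.9])
source_b = PointSource([0.2, 0.8])
pipeline = Threshold(Merge(source_a, source_b), cutoff=0.5)
result = pipeline.update()  # requesting the output triggers the upstream nodes
```

Because nothing runs until `update()` is called on the final node, a graph like this can be rearranged or re-parameterized cheaply, which is one reason the pipeline model suits interactive visualization.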




Figure 4: Volume rendered molecular orbital with sliced contour (left), and library dependency graph (right).

ParaView is an open-source, cross-platform data analysis and visualization
application. It is one of the flagship open-source projects developed by Kitware,
building on VTK and Qt to provide a client-server application that allows users
to quickly build visualizations to analyze their data. ParaView was developed to
analyze extremely large data sets using distributed memory computing resources.
It can be used interactively with the cross-platform GUI, or scripted from Python.
VTK and ParaView are being augmented with additional functionality for chemistry
through projects such as the Google Summer of Code and Open Chemistry.

Figure 6: Cartoon rendering of protein (left), surface rendering (center), and molecule rendering (right).

Software Process
These projects are open-source, targeting multiple platforms and architectures. A
quality-inducing software process is employed using best-of-breed technologies such
as Git for distributed version control, Gerrit for code review, CMake for
cross-platform building, CTest for unit/regression testing and CDash for software
quality feedback. Most code is BSD licensed, and designed with reuse in mind.

Figure 2: The MoleQueue program configuration dialog for a PBS remote system.
• Graphical configuration of queues and programs
• Support for Sun Grid Engine, PBS and running calculations locally
• JSON-RPC protocol for interprocess communication over local sockets or ZeroMQ
• C++ and Python client libraries
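
As an illustration of the JSON-RPC interface that MoleQueue offers to client applications, the sketch below builds a JSON-RPC 2.0 request and decodes a response using only the Python standard library. The envelope fields (`jsonrpc`, `method`, `params`, `id`, `result`, `error`) follow the JSON-RPC 2.0 specification; the method name `submitJob`, its parameters and the reply are assumptions made for the example, and the transport (local socket or ZeroMQ) is left out.

```python
import itertools
import json

_request_ids = itertools.count(1)  # each request carries a unique id


def make_request(method, params):
    """Serialize a JSON-RPC 2.0 request object."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    })


def parse_response(raw):
    """Return the result of a JSON-RPC 2.0 response, or raise on an error reply."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"RPC error {err['code']}: {err['message']}")
    return message["result"]


# Hypothetical method name and parameters, for illustration only.
request = make_request("submitJob", {
    "queue": "Local",
    "program": "NWChem",
    "description": "water optimization",
})

# Stand-in for a server reply; a real client would read this from the socket.
reply = json.dumps({"jsonrpc": "2.0", "result": {"jobId": 17}, "id": 1})
job = parse_response(reply)
```

Keeping the message format separate from the transport is what lets the same protocol run over local sockets or ZeroMQ, and makes client libraries in several languages straightforward to write.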

