Atomate: A High-level Interface to Generate, Execute, and
Analyze Computational Materials Science Workflows
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Lab
Berkeley, CA
TMS 2018
Slides (already) posted to: https://hackingmaterials.lbl.gov/
A schematic of “materials genomics” approaches to
materials science
[Schematic components: data; applications; methods (theory, ML); software implementation]
Our group builds and maintains several
open-source software libraries
Data generation / Data analysis
•  run and manage millions of computational tasks over large computing resources
•  library of FireWorks-compatible workflows for materials science applications
•  materials data retrieval, featurization, and visualization for machine learning
•  tools for crystal manipulation, data analysis, and simulation software I/O (led by Ong group, UCSD)
•  tools for inverse optimization / adaptive design – ML chooses what calculations to run
This talk will focus on atomate and FireWorks
Today, automated (“high-throughput”) calculations play an
important role in materials data generation
•  >4500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
•  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci. Data, 2017, 4, 170085.
Atomate’s goal: make
it easy to generate
comparable data sets
on your own
A “black-box” view of performing a calculation
[Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results]
Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
[Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work… → results]
Input file flags, SLURM format, how to fix ZPOTRF?
☐  set up the structure coordinates
☐  write input files, double-check all the flags
☐  copy to supercomputer
☐  submit job to queue
☐  deal with supercomputer headaches
☐  monitor job
☐  fix error jobs, resubmit to queue, wait again
☐  repeat process for subsequent calculations in workflow
☐  parse output files to obtain results
☐  copy and organize results, e.g., into Excel
What would be a better way?
[Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → chooses from the workflows below → results]
Workflows to run
☐  band structure
☐  surface energies
☑  elastic tensor
☐  Raman spectrum
☐  QH thermal expansion
Ideally the method should scale to millions of calculations
[Diagram: researcher requests “Start with all binary oxides, replace O->S, run several different properties” → results]
Workflows to run
☑  band structure
☑  surface energies
☑  elastic tensor
☐  Raman spectrum
☐  QH thermal expansion
☐  spin-orbit coupling
Atomate tries to make it easy, automatic, and flexible to
generate data with existing simulation packages
[Diagram: researcher requests “Run many different properties of many different materials!” → results]
Atomate contains a library of simulation procedures
VASP-based
•  band structure
•  spin-orbit coupling
•  hybrid functional calcs
•  elastic tensor
•  piezoelectric tensor
•  Raman spectra
•  NEB
•  GIBBS method
•  QH thermal expansion
•  AIMD
•  ferroelectric
•  surface adsorption
•  work functions
Other
•  BoltzTraP
•  FEFF method
•  LAMMPS MD
Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
Each simulation procedure translates high-level instructions
into a series of low-level tasks
Quickly and automatically translate PI-style (minimal) specifications into well-defined FireWorks workflows
Example specification: “What is the GGA-PBE elastic tensor of GaAs?”
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al., Charting the complete elastic properties of inorganic crystalline compounds, Sci. Data 2 (2015).
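The translation from a one-line request into low-level jobs can be pictured with a small sketch. This is a toy model in plain Python, not atomate's implementation: the `Step` class, the `expand_elastic_request` function, and the default of 6 strain modes with 4 magnitudes each are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One low-level job in the expanded workflow (hypothetical structure)."""
    name: str
    parents: list = field(default_factory=list)


def expand_elastic_request(formula, functional="GGA-PBE",
                           n_strain_modes=6,
                           magnitudes=(-0.01, -0.005, 0.005, 0.01)):
    """Expand a minimal request into an ordered list of dependent steps."""
    # Step 1: relax the input structure.
    relax = Step(f"{formula} structure optimization ({functional})")
    # Step 2: one static run per (strain mode, magnitude), each after the relaxation.
    deformed = [
        Step(f"{formula} static run, strain mode {mode}, magnitude {eps:+.3f}",
             parents=[relax.name])
        for mode in range(n_strain_modes)
        for eps in magnitudes
    ]
    # Step 3: fit the tensor once every deformed run has finished.
    fit = Step(f"{formula} fit elastic tensor from stresses",
               parents=[step.name for step in deformed])
    return [relax, *deformed, fit]


workflow = expand_elastic_request("GaAs")
print(len(workflow), "steps")
print(workflow[0].name)
print(workflow[-1].name)
```

The point of the sketch is that the researcher supplies only the formula and functional, while the expansion rules (how many deformations, what depends on what) live in reusable code.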
Atomate thus encodes and standardizes knowledge about
running various kinds of simulations from domain experts
All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations
Contributors: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain
Full operation diagram
[Diagram: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database]
Crystal structure generation via pymatgen
•  Pymatgen can retrieve crystal structures from the Materials Project database (MPRester class)
•  It can also manipulate crystal structures
–  substitutions
–  supercell creation
–  order-disorder (example below)
–  interstitial finding
–  surface / slab generation
•  A visual interface to many of the tools is available in Materials Project’s “Crystal Toolkit” app
Example: Order-disorder
Resolve partial or mixed occupancies into a fully ordered crystal structure (e.g., mixed oxide-fluoride site into separate oxygen/fluorine)
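The order-disorder idea can be illustrated with a deliberately simplified sketch. It only enumerates which mixed sites receive which species and ignores geometry and symmetry, so it is not pymatgen's algorithm; the function name and arguments are hypothetical.

```python
from itertools import combinations


def enumerate_orderings(n_sites, species_a, species_b, fraction_a):
    """Yield every way to place species_a on a fraction of n_sites mixed sites."""
    n_a = round(n_sites * fraction_a)
    if abs(n_a - n_sites * fraction_a) > 1e-8:
        # A fractional atom count means the cell is too small to order.
        raise ValueError("fraction_a * n_sites must be an integer; use a larger supercell")
    for chosen in combinations(range(n_sites), n_a):
        chosen = set(chosen)
        yield tuple(species_a if i in chosen else species_b
                    for i in range(n_sites))


# Four sites, each originally 50% O / 50% F.
orderings = list(enumerate_orderings(4, "O", "F", 0.5))
print(len(orderings))
print(orderings[0])
```

A production tool would additionally discard symmetry-equivalent orderings and rank the remaining candidates, which this sketch does not attempt.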
Atomate’s main goal – convert structures to workflows
Workflows consist of a series of jobs (“FireWorks”), each
with multiple tasks. Atomate jobs typically (i) run a
calculation and (ii) store the results in a database
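As a rough picture of this structure, the sketch below models a workflow as jobs with dependencies, where each job has two tasks: run a calculation, then store the result. It uses only the Python standard library (`graphlib`, Python 3.9+) and is not the FireWorks API; the job names, the fake calculation, and the dict standing in for a database are assumptions.

```python
from graphlib import TopologicalSorter

database = {}  # stand-in for a results database


def make_job(name):
    """Return the two tasks of a job: run a calculation, then store its result."""
    def run_calculation():
        return {"job": name, "status": "converged"}  # fake result

    def store_result(result):
        database[name] = result

    return run_calculation, store_result


job_names = ["optimize", "static", "band_uniform", "band_line"]
jobs = {name: make_job(name) for name in job_names}

# Each job maps to the set of jobs that must finish before it starts.
dependencies = {
    "optimize": set(),
    "static": {"optimize"},
    "band_uniform": {"static"},
    "band_line": {"static"},
}

order = list(TopologicalSorter(dependencies).static_order())
for name in order:
    run, store = jobs[name]
    store(run())

print(order)
print(sorted(database))
```

Keeping the dependency graph separate from the task code is what lets the same tasks be rearranged into different workflows.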
Workflow parameters can be customized at
multiple levels of detail
1.  Workflows have various high-level options
2.  Fireworks also have options / flags (not shown)
3.  Firetasks have the most detailed options / flags (not shown)
Example 1: “VASP input set” controls the rules that set DFT parameters (pseudopotentials, cutoffs, grid densities, etc.) via pymatgen
Example 2: If “stability_check” is enabled, the later parts of the workflow are skipped if the structure is determined to be unstable.
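The effect of a high-level option such as “stability_check” can be sketched as follows. This is a toy model, not atomate code: the `run_workflow` function, its arguments, and the step names are hypothetical, and the stability verdict is passed in rather than computed.

```python
def run_workflow(steps, is_stable, stability_check=False):
    """Run steps in order; optionally stop after the first step if unstable."""
    completed = []
    for index, step in enumerate(steps):
        completed.append(step)
        # The check happens once, right after the first (relaxation) step.
        if stability_check and index == 0 and not is_stable:
            break
    return completed


steps = ["structure optimization", "static calculation", "band structure"]
with_check = run_workflow(steps, is_stable=False, stability_check=True)
without_check = run_workflow(steps, is_stable=False, stability_check=False)
print(with_check)
print(without_check)
```

With the option enabled, an unstable structure costs one calculation instead of three.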
You can build workflows from scratch or reuse components
to assemble workflows
Multiple workflows are built from the same components, stacked together in different ways like Legos
These two workflows reuse almost all the same code
Calculation execution with FireWorks
•  Once you have the material and the simulation procedure (Workflow), you need to actually execute the workflow on your computing resource
•  This includes tasks like:
–  submission to calculation queues
–  customization of any computing-specific parameters (e.g., path to VASP executable, number of CPUs to parallelize over)
–  recovering from failures / job resubmission
–  coordinating jobs across computing centers
–  managing location of jobs
–  tracking the progress of jobs
•  Almost all of this is handled by FireWorks (custodian is used for encoding fixes to typical errors, e.g., the VASP ZPOTRF error)
•  FireWorks is mature software, used by dozens of research groups to run millions of simulations
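The idea of encoding fixes to typical errors can be sketched as a run, detect, correct, retry loop. The code below is a toy illustration and does not use custodian's API; the handler format, the `use_safer_algorithm` setting, and the fake job are made up, and the real correction for a ZPOTRF failure is not shown here.

```python
def run_with_fixes(run_job, handlers, max_errors=3):
    """Run a job; on a known error, apply its fix and retry up to max_errors times."""
    settings = {}
    applied = []
    for attempt in range(max_errors + 1):
        output = run_job(settings)
        matching = [h for h in handlers if h["pattern"] in output]
        if not matching:
            return output, applied  # no known error found: treat as success
        if attempt == max_errors:
            break  # out of retries
        fix = matching[0]
        settings.update(fix["changes"])
        applied.append(fix["name"])
    raise RuntimeError(f"gave up after {max_errors} corrections: {applied}")


def fake_job(settings):
    """Pretend simulation that fails until a (made-up) setting is changed."""
    if settings.get("use_safer_algorithm"):
        return "run completed"
    return "LAPACK: Routine ZPOTRF failed!"


handlers = [
    {"name": "zpotrf", "pattern": "ZPOTRF",
     "changes": {"use_safer_algorithm": True}},  # hypothetical fix
]

output, applied = run_with_fixes(fake_job, handlers)
print(output, applied)
```

Once a fix is written as a handler, every later calculation benefits from it without anyone having to remember the workaround.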
FireWorks allows you to write your workflow once and
execute (almost) anywhere
•  Execute workflows locally or at a supercomputing center
•  Queue systems supported
–  PBS
–  SGE
–  SLURM
–  IBM LoadLeveler
–  NEWT (a REST-based API at NERSC)
–  Cobalt (Argonne LCF)
Dashboard with status of all jobs
Job provenance and automatic metadata storage
•  what machine
•  what time
•  what directory
•  what was the output
•  when was it queued
•  when did it start running
•  when was it completed
Detect and rerun failures
•  All kinds of failures can be detected and rerun
–  soft failures (job quits with error code)
–  hard failures (computing center goes down)
–  human errors
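One way to picture the difference between soft and hard failures is shown below. This is a sketch under assumptions, not FireWorks code: the job records, the state names, and the 4-hour heartbeat timeout are illustrative. A soft failure is reported by the job itself, while a hard failure has to be inferred from a job that claims to be running but has stopped checking in.

```python
def find_jobs_to_rerun(jobs, now, heartbeat_timeout=4 * 3600):
    """Return (job id, reason) pairs for jobs that look failed."""
    to_rerun = []
    for job in jobs:
        if job["state"] == "FAILED":
            # The job exited with an error code and recorded its own failure.
            to_rerun.append((job["id"], "soft failure"))
        elif (job["state"] == "RUNNING"
              and now - job["last_heartbeat"] > heartbeat_timeout):
            # Marked as running, but silent for too long: the node likely died.
            to_rerun.append((job["id"], "hard failure"))
    return to_rerun


# Timestamps are seconds on an arbitrary clock.
jobs = [
    {"id": 1, "state": "COMPLETED", "last_heartbeat": 1000},
    {"id": 2, "state": "FAILED", "last_heartbeat": 2000},
    {"id": 3, "state": "RUNNING", "last_heartbeat": 3000},
    {"id": 4, "state": "RUNNING", "last_heartbeat": 90000},
]
flagged = find_jobs_to_rerun(jobs, now=100000)
print(flagged)
```

Human errors cannot be detected this way; they are typically handled by a person marking the affected jobs for rerun.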
“Dynamic workflows” let you program
intelligent, reactive workflows
Xiaohui can replace himself with a digital Xiaohui, programmed into FireWorks
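A dynamic workflow can be sketched as a job that inspects its own result and asks for follow-up work. The code below is a toy model in plain Python, loosely inspired by the idea of a task returning an action; the `relax` task, the dict-based action format, and the rule that each run halves the residual force are all assumptions.

```python
from collections import deque


def relax(spec):
    """One relaxation run; requests another run until the force is small enough."""
    force = spec["force"] / 2  # pretend each run halves the residual force
    additions = []
    if force > spec["tolerance"]:
        # Not converged: add a follow-up job that starts from the new state.
        additions.append({"task": relax, "spec": {**spec, "force": force}})
    return {"result": force, "additions": additions}


def run_dynamic(first_job):
    """Execute jobs until none are left, including jobs added along the way."""
    queue = deque([first_job])
    history = []
    while queue:
        job = queue.popleft()
        action = job["task"](job["spec"])
        history.append(action["result"])
        queue.extend(action["additions"])
    return history


history = run_dynamic({"task": relax, "spec": {"force": 0.8, "tolerance": 0.05}})
print(len(history), "runs")
```

The decision a person would make after looking at the output ("not converged, run it again") is written once as code and then applied automatically.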
Customize job priorities
•  Within workflow, or between workflows
•  Completely flexible and can be modified /
updated whenever you want
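Updatable priorities can be sketched with a heap that tolerates stale entries. This is an illustration using the standard library, not how FireWorks stores priorities; the `JobQueue` class is hypothetical and assumes that higher numbers run first.

```python
import heapq
import itertools


class JobQueue:
    """Priority queue whose priorities can be changed after insertion."""

    def __init__(self):
        self._heap = []
        self._latest = {}  # job id -> the entry that is currently valid
        self._counter = itertools.count()  # tie-breaker: first in, first out

    def set_priority(self, job_id, priority):
        """Add a job or change its priority (higher runs first)."""
        entry = (-priority, next(self._counter), job_id)
        self._latest[job_id] = entry
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return the highest-priority job, skipping outdated entries."""
        while self._heap:
            entry = heapq.heappop(self._heap)
            if self._latest.get(entry[2]) is entry:
                del self._latest[entry[2]]
                return entry[2]
        raise IndexError("no jobs left")


queue = JobQueue()
queue.set_priority("GaAs elastic", 1)
queue.set_priority("GaN elastic", 1)
queue.set_priority("GaAs band structure", 5)
queue.set_priority("GaN elastic", 10)  # bump an already queued job

run_order = [queue.pop() for _ in range(3)]
print(run_order)
```

Re-pushing an entry instead of editing the heap in place keeps updates cheap, at the cost of skipping a few outdated entries when popping.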
Atomate – builders framework
“Builders” start with base
collections in a database and
create higher-level collections
that summarize information or
add metadata
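The builder idea can be sketched as a function that reads a base collection of task documents and writes a summarized collection. The example is a toy in plain Python with lists of dicts standing in for database collections; the field names, the grouping by formula, and the numbers are made up for illustration and do not reflect atomate's schema.

```python
from collections import defaultdict


def build_materials(tasks):
    """Summarize successful task documents into one document per formula."""
    grouped = defaultdict(list)
    for task in tasks:
        if task["state"] == "successful":
            grouped[task["formula"]].append(task)

    materials = []
    for formula, docs in sorted(grouped.items()):
        best = min(docs, key=lambda doc: doc["energy_per_atom"])
        materials.append({
            "formula": formula,
            "energy_per_atom": best["energy_per_atom"],  # lowest-energy run
            "task_ids": sorted(doc["task_id"] for doc in docs),  # provenance
        })
    return materials


# Made-up task documents (base collection).
tasks = [
    {"task_id": 1, "formula": "GaAs", "state": "successful", "energy_per_atom": -4.10},
    {"task_id": 2, "formula": "GaAs", "state": "successful", "energy_per_atom": -4.12},
    {"task_id": 3, "formula": "GaAs", "state": "failed", "energy_per_atom": None},
    {"task_id": 4, "formula": "Si", "state": "successful", "energy_per_atom": -5.42},
]
materials = build_materials(tasks)
for doc in materials:
    print(doc)
```

Because the higher-level collection is derived entirely from the base collection, it can be rebuilt at any time as new tasks arrive.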
The atomate database makes it easy to perform various analyses with pymatgen
[Figure: atomate output database(s) feeding pymatgen analyses: phase diagrams, Pourbaix diagrams, diffusivity via MD, band structure analysis]
Many research groups have run tens of thousands of
materials science workflows with atomate
Also used by:
•  Persson research group, UC Berkeley
•  Ong research group, UC San Diego
•  Neaton research group, UC Berkeley
•  Liu research group, Penn State
•  Groups not developing on atomate!
•  e.g., see “Thermal expansion of quaternary nitride coatings” by Tasnadi et al.
atomate now powers the Materials Project and will be used to run hundreds of thousands of simulations in the next year (www.materialsproject.org)
Further information on atomate
•  Link to code:
–  https://www.github.com/hackingmaterials/atomate
•  License: BSD
–  open-source, can be used with commercial software
–  like the MIT license, but with a clause not to misuse the Berkeley Lab name, e.g. for advertising purposes
•  Help and support
–  https://groups.google.com/forum/#!forum/atomate
•  Citation with further information:
–  Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017).
Thank you!
•  Kiran Mathew
•  Joey Montoya
•  Alireza Faghaninia
•  Shyam Dwaraknath
•  Murat Aykol
•  Hanmei Tang
•  Iek-Heng Chu
•  Tess Smidt
•  Brandon Bocklund
•  Matthew Horton
•  John Dagdelen
•  Brandon Wood
•  Zi-Kui Liu
•  Jeff Neaton
•  Shyue Ping Ong
•  Kristin Persson
•  all other atomate
contributors!

