Overview of accelerated materials design
efforts in the Hacking Materials research group
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
2
Materials and their properties determine what is
technologically possible
Electric vehicles and solar power
are two technologies that have
been dreamed about for many
decades, but did not have much
real impact for a long time …
1910
1956
3
Materials and their properties determine what is
technologically possible
Today’s revolution in clean energy technologies is largely
due to advancements in materials:
science, engineering, and manufacturing.
Much else might be possible with better materials …
but, as past examples demonstrate, it can take a long time.
What constrains traditional approaches to materials design?
4
“[The Chevrel] discovery resulted from a lot of
unsuccessful experiments of Mg ions insertion
into well-known hosts for Li+ ions insertion, as
well as from the thorough literature analysis
concerning the possibility of divalent ions
intercalation into inorganic materials.”
-Aurbach group, on discovery of Chevrel cathode
for multivalent (e.g., Mg2+) batteries
Levi, Levi, Chasid, Aurbach
J. Electroceramics (2009)
5
Outline
High-throughput
computing and
simulations
Machine learning Text mining
What is density functional theory (DFT)?
7
• 1920s: The Schrödinger equation essentially contains all of chemistry
embedded within it
• it is almost always too complicated to solve due to the numerous electron
interactions and complexity of the wave function entity
• 1960s: DFT is developed and reframes the problem of ground-state properties
in terms of the charge density rather than the wavefunction
• makes solutions tractable while in principle not sacrificing accuracy for
the ground state!
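For reference, the reframing described in these bullets can be written compactly. This is the standard textbook form in Hartree atomic units (spin omitted), not something shown on the slide; the symbols \Psi, n, F[n], and v_ext are notation introduced here.

```latex
% Many-electron Schrödinger equation (N electrons, fixed nuclei)
\hat{H}\,\Psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = E\,\Psi(\mathbf{r}_1,\dots,\mathbf{r}_N),
\qquad
\hat{H} = -\tfrac{1}{2}\sum_{i}\nabla_i^2
        + \sum_{i} v_{\mathrm{ext}}(\mathbf{r}_i)
        + \sum_{i<j}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|}

% Charge density: a function of 3 coordinates instead of 3N
n(\mathbf{r}) = N\int |\Psi(\mathbf{r},\mathbf{r}_2,\dots,\mathbf{r}_N)|^2\,
                d\mathbf{r}_2\cdots d\mathbf{r}_N

% Hohenberg-Kohn variational principle for the ground-state energy
E_0 = \min_{n}\Big\{ F[n] + \int v_{\mathrm{ext}}(\mathbf{r})\,n(\mathbf{r})\,d\mathbf{r} \Big\}
```

The electron-electron term in the Hamiltonian is what makes the wavefunction problem intractable. The last line is exact in principle, but the universal functional F[n] is not known in closed form and must be approximated in practice (GGA-PBE is one such approximation).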
How does one use DFT to design new materials?
8
A. Jain, Y. Shin, and K. A.
Persson, Nat. Rev. Mater.
1, 15004 (2016).
9
Examples of experimentally-confirmed materials designed
with DFT (1)
Jain, A., Shin, Y., Persson, K.A., 2016. Computational predictions of energy materials using density functional theory.
Nature Reviews Materials 1, 15004.
10
Examples of experimentally-confirmed materials designed
with DFT (2)
Jain, A., Shin, Y., Persson, K.A., 2016. Computational predictions of energy materials using density functional theory.
Nature Reviews Materials 1, 15004.
High-throughput DFT is useful for generating large data sets,
e.g., for materials screening
11
M. De Jong, W. Chen, T.
Angsten, A. Jain, R. Notestine,
A. Gamst, M. Sluiter, C. K.
Ande, S. Van Der Zwaag, J. J.
Plata, C. Toher, S. Curtarolo,
G. Ceder, K. A. Persson, and M.
Asta, Sci. Data, 2015, 2, 150009.
>10,000
elastic tensors
>48,000
Seebeck
coefficients +
cRTA transport
Ricci, Chen, Aydemir, Snyder,
Rignanese, Jain, & Hautier, Sci
Data 2017, 4, 170085.
Atomate’s goal: make
high-throughput easy and
scalable for everyone
A “black-box” view of performing a calculation
13
“something”
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
14
lots of tedious,
low-level work…
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Input file flags
SLURM format
how to fix ZPOTRF?
☐ set up the structure coordinates
☐ write input files, double-check all the flags
☐ copy to supercomputer
☐ submit job to queue
☐ deal with supercomputer headaches
☐ monitor job
☐ fix error jobs, resubmit to queue, wait again
☐ repeat process for subsequent calculations in workflow
☐ parse output files to obtain results
☐ copy and organize results, e.g., into Excel
What would be a better way?
15
“something”
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
What would be a better way?
16
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Workflows to run
☐ band structure
☐ surface energies
✓ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
Ideally the method should scale to millions of calculations
17
Results!
researcher
Start with all binary
oxides, replace O->S,
run several different
properties
Workflows to run
✓ band structure
✓ surface energies
✓ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
☐ spin-orbit coupling
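The "start with all binary oxides, replace O->S, run several properties" request can be sketched in a few lines of plain Python. This is only an illustration of how a high-level request fans out into many calculations: the formula parser is deliberately minimal (no parentheses or fractional stoichiometry), the three oxides are arbitrary examples, and none of the function names come from atomate.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'Al2O3' into {'Al': 2, 'O': 3}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def substitute(formula, old, new):
    """Replace element `old` with `new`, keeping the stoichiometry."""
    swapped = {}
    for symbol, count in parse_formula(formula).items():
        target = new if symbol == old else symbol
        swapped[target] = swapped.get(target, 0) + count
    return "".join(f"{el}{n if n != 1 else ''}" for el, n in swapped.items())


# Arbitrary example inputs (hypothetical, not a real screening list).
binary_oxides = ["MgO", "Al2O3", "TiO2"]
workflows = ["band structure", "surface energies", "elastic tensor"]

sulfides = [substitute(f, "O", "S") for f in binary_oxides]

# Every (material, property) pair becomes one workflow to run.
jobs = [(formula, workflow) for formula in sulfides for workflow in workflows]

for formula, workflow in jobs:
    print(f"{formula}: {workflow}")
```

Three materials and three properties already give nine workflows, which is why manual bookkeeping stops being practical long before the "millions of calculations" scale.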
Atomate tries to make it easy, automatic, and flexible to
generate data with existing simulation packages
18
Results!
researcher
Run many different
properties of many
different materials!
Each simulation procedure translates high-level instructions
into a series of low-level tasks
19
quickly and automatically translate high-level (minimal)
specifications into well-defined FireWorks workflows
What is the
GGA-PBE elastic
tensor of GaAs?
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al.,
Charting the complete elastic properties of inorganic crystalline compounds,
Sci. Data. 2 (2015).
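A rough sketch of what "translating a high-level specification into low-level tasks" could look like for the elastic tensor question. The task names, the `Task` class, and the default of six deformations are all assumptions made for illustration; this is not atomate's actual workflow definition or API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One low-level step plus the names of the steps it must wait for."""
    name: str
    depends_on: list = field(default_factory=list)


def elastic_tensor_tasks(formula, functional="GGA-PBE", n_deformations=6):
    """Expand a one-line request into an ordered list of tasks.

    n_deformations is a placeholder value, not a recommended setting.
    """
    prefix = f"{formula}/{functional}"
    relax = Task(f"{prefix}: optimize structure")
    deformed = [
        Task(f"{prefix}: stress for deformation {i}", [relax.name])
        for i in range(1, n_deformations + 1)
    ]
    fit = Task(f"{prefix}: fit elastic tensor", [t.name for t in deformed])
    return [relax, *deformed, fit]


tasks = elastic_tensor_tasks("GaAs")
for task in tasks:
    print(task.name, "<-", task.depends_on or "nothing")
```

The researcher supplies only the formula and the functional; everything else (which calculations exist and in what order they must run) is generated.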
Atomate contains a library of simulation procedures
20
VASP-based
• band structure
• spin-orbit coupling
• hybrid functional
calcs
• elastic tensor
• piezoelectric tensor
• Raman spectra
• NEB
• GIBBS method
• QH thermal
expansion
• AIMD
• ferroelectric
• surface adsorption
• work functions
• NMR spectra*
• Bader charges*
• Magnetic
orderings*
• SCAN functionals*
Other
• BoltzTraP
• FEFF method
• Q-Chem*
*=added / major
updates in past year
Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze
computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
21
Full operation diagram
[Diagram: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database]
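The operation shown in the diagram (jobs executed in dependency order, with results collected in a database) can be mimicked with the standard library. The dependencies between job 1 to job 4 below are assumed, since the slide text does not spell them out, and a dictionary stands in for the real database; FireWorks itself is not used here. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (assumed dependencies).
workflow = {
    "job 1": set(),
    "job 2": {"job 1"},
    "job 3": {"job 2"},
    "job 4": {"job 2"},
}


def run_workflow(workflow, run_job):
    """Run every job after its parents and store the outputs by job name."""
    database = {}
    for job in TopologicalSorter(workflow).static_order():
        parent_outputs = {parent: database[parent] for parent in workflow[job]}
        database[job] = run_job(job, parent_outputs)
    return database


def fake_job(name, parent_outputs):
    """Stand-in for a real calculation: just records what it received."""
    return {"status": "COMPLETED", "parents": sorted(parent_outputs)}


database = run_workflow(workflow, fake_job)
for job, record in database.items():
    print(job, record)
```

In a real deployment the loop body would submit to a queue and the results would be written to persistent storage, but the ordering logic is the same.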
22
A web-based interface is in progress to give atomate users a
“personal Materials Project” of their own calculations
Atomate now powers the Materials Project
• Online resource of density
functional theory simulation data
for ~85,000 inorganic materials
• Includes band structures, elastic
tensors, piezoelectric tensors,
battery properties and more
• >75,000 registered users
• Free
• www.materialsproject.org
23
Jain et al. Commentary: The Materials Project: A
materials genome approach to accelerating
materials innovation. APL Mater. 1, 11002 (2013).
24
Getting started with atomate
Paper: Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017).
Docs: hackingmaterials.github.io/atomate
Support: https://groups.google.com/forum/#!forum/atomate
25
Outline
High-throughput
computing and
simulations
Machine learning Text mining
• With atomate/FireWorks,
the user must decide which
calculations to perform
– E.g., which materials to
calculate
• Rocketsled is an extension
to FireWorks that lets the
computer decide what the
next best calculation is
based on the results of
previous calculations
• Works for materials design
or any other “inverse
computational problem”
26
Rocketsled uses adaptive design to suggest the best
computations to optimize some metric
27
Given a search domain, Rocketsled uses an optimization
engine to select calculations and submit to supercomputers
Optimization engine includes 4 built-in regressors (e.g., RandomForest,
Gaussian Process) and 5 acquisition functions (e.g., Expected
Improvement). Can bootstrap uncertainty estimates. Or use your own!
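Expected Improvement, one of the acquisition functions named above, has a closed form when the regressor's prediction is treated as Gaussian. The sketch below uses that textbook formula for a maximization problem; the function names and the three candidate predictions are made up for illustration and this is not Rocketsled's implementation.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement over best_so_far for a Gaussian prediction."""
    if std <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def pick_next(candidates, best_so_far):
    """candidates maps a name to (predicted mean, predicted std)."""
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best_so_far),
    )


# Made-up predictions for three uncomputed candidates.
candidates = {
    "A": (1.0, 0.1),   # as good as the current best, low uncertainty
    "B": (0.9, 0.5),   # slightly worse on average, high uncertainty
    "C": (0.5, 0.05),  # clearly worse, low uncertainty
}
next_calc = pick_next(candidates, best_so_far=1.0)
print(next_calc)
```

With these numbers the uncertain candidate B is chosen over A, which illustrates why an uncertainty estimate (for example from bootstrapping) matters: the acquisition function trades off predicted value against the chance of a large upside.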
28
Results of using optimization can be dramatic!
In the problem of finding materials with
high K and high G for superhard
materials (7394 possibilities), Rocketsled
finds solutions ~30-60X faster than
randomly computing the space.
Even after just 200
calculations of the
7394 possibilities,
all solutions are
almost certain to
be found with
Rocketsled.
Can use pure ML approaches or use
matminer featurizations for materials
science (latter helps give such good
performance)
30
Getting started with rocketsled
Paper: Dunn, A.R., et al. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. https://doi.org/10.1088/2515-7639/ab0c3d
Docs: hackingmaterials.github.io/rocketsled
Support: https://groups.google.com/forum/#!forum/fireworkflows
31
Outline
High-throughput
computing and
simulations
Machine learning Text mining
32
What is needed to do machine learning on materials?
How can we represent
chemistry and structure as
vectors?
How do we get
enough output
data for training?
Matminer connects materials data with data mining
algorithms and data visualization libraries
33
Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
>60 featurizer classes can
generate thousands of potential
descriptors that are described in
the literature
34
Matminer contains a library of descriptors for various
materials science entities
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-
learn pipelining
• automatically deploy
multiprocessing to
parallelize over data
• include citations to
methodology papers
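The featurizer pattern above (construct an object, then call `featurize` on an input) can be illustrated with a toy class. This is a stand-alone sketch that imitates the shape of the interface; the class, its two features, and the small atomic-number table are invented here and are not part of matminer.

```python
class MeanAtomicNumber:
    """Toy featurizer: turns a composition into a fixed-length vector."""

    # Only the elements needed for the examples below.
    ATOMIC_NUMBERS = {"Li": 3, "O": 8, "Mg": 12, "Ga": 31, "As": 33}

    def featurize(self, composition):
        """composition maps element symbol -> amount, e.g. {'Ga': 1, 'As': 1}."""
        total = sum(composition.values())
        mean_z = sum(
            self.ATOMIC_NUMBERS[el] * amount for el, amount in composition.items()
        ) / total
        return [mean_z, len(composition)]

    def feature_labels(self):
        return ["mean atomic number", "number of elements"]

    def citations(self):
        # A real featurizer would return references to its methodology papers.
        return []


feat = MeanAtomicNumber()
compositions = [{"Ga": 1, "As": 1}, {"Mg": 1, "O": 1}, {"Li": 2, "O": 1}]
X = [feat.featurize(c) for c in compositions]

print(feat.feature_labels())
for row in X:
    print(row)
```

Because every input becomes a row of the same length, the resulting matrix `X` can be passed directly to standard machine learning tools.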
35
Interactive Jupyter notebooks demonstrate use cases
https://github.com/hackingmaterials/matminer_examples
Many examples available:
• Retrieving data from various databases
• Predicting bulk / shear modulus
• Predicting formation energies:
• from composition alone
• with Voronoi-based structure features
included
• with Coulomb matrix and Orbital Field
matrix descriptors (reproducing
previous studies in the literature)
• Making interactive visualizations
• Creating an ML pipeline
36
Getting started with matminer
Paper: Ward et al. Matminer: An open source toolkit for materials data mining. Computational Materials Science, 152, 60–69 (2018).
Docs: hackingmaterials.github.io/matminer
Support: https://groups.google.com/forum/#!forum/matminer
37
Outline
High-throughput
computing and
simulations
Machine learning Text mining
38
Typically several steps of machine learning are performed by
a human researcher – can these be automated?
Descriptors developed and
chosen by a researcher
ML model developed
and chosen by a
researcher
Why can’t we just give the computer some raw input data
(compositions, crystal structures) and output properties and get
back an ML model?
39
Automatminer develops an ML model automatically given
raw data (structures or compositions plus output properties)
Featurizer
MagPie
SOAP
Sine Coulomb Matrix
+ many, many more
• Missing value
imputation
• Scaling
• One-hot
encoding
• PCA-based
• Correlation
• Relief-based
(MultiSURF)
Uses genetic
algorithms to find
the best machine
learning model +
hyperparameters
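The middle stages listed above (missing value imputation, scaling, correlation-based feature reduction) can be sketched with the standard library alone. This is a simplified illustration with made-up feature columns and an arbitrary 0.95 threshold; it is not automatminer's code, and the featurization and genetic-algorithm model search stages are left out entirely.

```python
import math


def impute_mean(column):
    """Replace None entries with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale a column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept


def preprocess(features, threshold=0.95):
    cleaned = {
        name: min_max_scale(impute_mean(column)) for name, column in features.items()
    }
    return drop_correlated(cleaned, threshold)


# Made-up feature table: one redundant column and one missing value.
raw = {
    "mean_z": [1.0, 2.0, 3.0, 4.0],
    "double_mean_z": [2.0, 4.0, 6.0, 8.0],
    "other": [5.0, None, 4.0, 2.0],
}
processed = preprocess(raw)
print(list(processed))
```

The redundant column is dropped and the missing value is filled without any decision from the user, which is the kind of step the pipeline is meant to automate.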
40
Automatminer can be used as a black box
41
We are benchmarking automatminer vs current state of the
art against 11 problems intended to be a standard test set
Dataset | Target(s) | Samples
Elastic Tensor | KVRH (GPa), GVRH (GPa) | 10,987
Dielectric Tensor | Refractive index | 4,765
JARVIS 2D | Exfoliation energy (meV/atom) | 636
Materials Project phonons | Highest LO Phonon Frequency (Last PhDOS peak) | 1,265
Materials Project (stable) | Band gap (eV), Is metallic? (classification) | 106,113
Perovskites | Formation energy (eV/atom) | 18,928
Experimental Band Gaps | Is metallic? (classification) | 6,354
Experimental Metallic Glasses | Glass forms? (classification) | 7,190
Materials Project (all) | Formation energy (eV/atom) | 132,752
42
Usually, automatminer does very well
Usually, automatminer outperforms both
state-of-the-art graph based models AND
human-generated models!
But …
43
Graph-based approaches work better in some problems
Hypothesis – automatminer
approaches are better for smaller
data sets, graph-based
approaches are better for larger
data sets
Unfortunately, it can be difficult
to train some of the graph
models on large data sets,
particularly without GPUs, so
the results are not in yet!
44
Getting started with automatminer
Paper: In preparation …
Docs: hackingmaterials.github.io/automatminer
Support: https://groups.google.com/forum/#!forum/matminer
45
Outline
High-throughput
computing and
simulations
Machine learning Text mining
We have extracted ~3
million abstracts of
scientific articles
We will use natural
language processing
algorithms to try to
extract knowledge from
all this data
46
Goal: collect knowledge embedded in the scientific
literature
47
An engine to label the content of scientific abstracts
Collect, clean, and extract information from millions of
published materials science journal abstracts
48
Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
49
Application: materials compositions of interest …
A search for thermoelectrics that do not have Pb or Bi
• We use the word2vec
algorithm (Google) to turn
each unique word in our
corpus into a 200-
dimensional vector
• These vectors encode the
meaning of each word,
learned by trying to
predict the context words
around the target word
50
Using the word2vec algorithm to extract knowledge and
make predictions
• Dot product of a composition word
with the word “thermoelectric”
essentially predicts how likely that
word is to appear in an abstract with
the word thermoelectric
• Compositions with high dot products
are typically known thermoelectrics
• Sometimes, compositions have a high
dot product with “thermoelectric” but
have never been studied as a
thermoelectric
• These compositions usually have high
computed power factors! (BoltzTraP)
51
Vector dot products measure similarity
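The ranking idea in these bullets reduces to a dot product followed by a filter. The sketch below uses invented three-dimensional vectors and placeholder composition names (the real embeddings are 200-dimensional and learned from the corpus), so the numbers carry no scientific meaning.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Invented embeddings; placeholders only.
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "composition_A": [0.8, 0.2, 0.4],
    "composition_B": [0.7, 0.0, 0.5],
    "composition_C": [-0.2, 0.9, 0.1],
    "composition_D": [0.6, 0.1, 0.2],
}


def rank_by_similarity(target, candidates, embeddings):
    """Sort candidates by dot product with the target word, highest first."""
    return sorted(
        candidates,
        key=lambda name: dot(embeddings[name], embeddings[target]),
        reverse=True,
    )


def predictions(ranked, already_studied, k):
    """Top-k high-scoring candidates that have not been studied yet."""
    return [name for name in ranked if name not in already_studied][:k]


candidates = ["composition_A", "composition_B", "composition_C", "composition_D"]
ranked = rank_by_similarity("thermoelectric", candidates, embeddings)

# Pretend A and B already appear in the literature as thermoelectrics.
top = predictions(ranked, already_studied={"composition_A", "composition_B"}, k=1)
print(ranked)
print(top)
```

The known materials rank first, and the prediction is the highest-ranked composition that has not yet been studied for the application.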
• We can test to see if our
method can predict new
compositions
• For every year in the past
~2 decades (e.g. 2001),
train word embeddings
only until that point in time
• Make predictions of what
materials are the most
promising thermoelectrics
• See if those materials have
thus far been actually
studied as thermoelectrics
52
Can we predict future thermoelectrics discoveries with this
method?
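The historical test described above amounts to checking, for each cutoff year, how many predictions were first studied only after the cutoff. A minimal sketch of that bookkeeping follows; the material labels, years, and five-year horizon are toy values invented for illustration, and training the embeddings themselves is outside the scope of the example.

```python
def hit_rate(predicted, first_studied, cutoff_year, horizon_years):
    """Fraction of predictions first studied within the horizon after the cutoff.

    first_studied maps a material to the year it was first reported for the
    application; materials never reported are simply absent.
    """
    hits = [
        material
        for material in predicted
        if cutoff_year
        < first_studied.get(material, float("inf"))
        <= cutoff_year + horizon_years
    ]
    return len(hits) / len(predicted)


# Toy data: predictions made from embeddings trained on text up to 2001.
predicted_2001 = ["X1", "X2", "X3", "X4"]
first_studied = {"X1": 2003, "X2": 2010, "X4": 2004}

rate = hit_rate(predicted_2001, first_studied, cutoff_year=2001, horizon_years=5)
print(rate)
```

Repeating this for every cutoff year and comparing against randomly chosen materials gives a sense of whether the predictions carry real signal.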
• Thus far, 2 of our top 20 predictions made in
~August 2018 have already been reported in the
literature for the first time as thermoelectrics
– Li3Sb was the subject of a computational study
(predicted zT=2.42) in Oct 2018
– SnTe2 was experimentally found to be a moderately
good thermoelectric (expt zT=0.71) in Dec 2018
• We are currently trying to make some of the
other compounds on the list
– The full list will be published alongside a paper
(currently under review)
53
Next steps
[1] Yang et al. "Low lattice thermal conductivity and
excellent thermoelectric behavior in Li3Sb and Li3Bi."
Journal of Physics: Condensed Matter 30.42 (2018):
425401
[2] Wang et al. "Ultralow lattice thermal conductivity and
electronic properties of monolayer 1T phase semimetal
SiTe2 and SnTe2." Physica E: Low-dimensional Systems and
Nanostructures 108 (2019): 53-59
54
Summary
High-throughput
computing and
simulations
Machine learning Text mining
• Lead developers:
– Atomate: Kiran Mathew
– Rocketsled: Alex Dunn
– Matminer: Logan Ward
– Automatminer: Alex Dunn
– Matscholar: Vahe Tshitoyan, John Dagdelen, Leigh Weston
• And the dozens of other developers who have contributed to
these packages or reported issues!
• Funding: U.S. Department of Energy, Basic Energy Sciences,
Early Career Award
• Additional funding from the DOE-funded Materials Project
55
Acknowledgements

More Related Content

PDF
High-throughput computation and machine learning methods applied to materials...
PDF
Discovering advanced materials for energy applications (with high-throughput ...
PDF
Machine Learning Platform for Catalyst Design
PDF
Materials discovery through theory, computation, and machine learning
PDF
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
PDF
Combined Theory and Data-Driven Approaches to Thermoelectrics Materials Disco...
PDF
Introduction (Part I): High-throughput computation and machine learning appli...
PDF
Combining density functional theory calculations, supercomputing, and data-dr...
High-throughput computation and machine learning methods applied to materials...
Discovering advanced materials for energy applications (with high-throughput ...
Machine Learning Platform for Catalyst Design
Materials discovery through theory, computation, and machine learning
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Combined Theory and Data-Driven Approaches to Thermoelectrics Materials Disco...
Introduction (Part I): High-throughput computation and machine learning appli...
Combining density functional theory calculations, supercomputing, and data-dr...

What's hot (20)

PDF
Methods, tools, and examples (Part II): High-throughput computation and machi...
PDF
Capturing and leveraging materials science knowledge from millions of journal...
PDF
Software tools, crystal descriptors, and machine learning applied to material...
PDF
Discovering advanced materials for energy applications by mining the scientif...
PDF
Combining density functional theory calculations, supercomputing, and data-dr...
PDF
Computational screening of tens of thousands of compounds as potential thermo...
PDF
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
PDF
Combining density functional theory calculations, supercomputing, and data-dr...
PDF
Computational materials design with high-throughput and machine learning methods
PDF
Density functional theory calculations and data mining for new thermoelectric...
PDF
Conducting and Enabling Data-Driven Research Through the Materials Project
PDF
Materials design using knowledge from millions of journal articles via natura...
PDF
Data dissemination and materials informatics at LBNL
PDF
Open Source Tools for Materials Informatics
PDF
Software tools, crystal descriptors, and machine learning applied to material...
PDF
Targeted Band Structure Design and Thermoelectric Materials Discovery Using H...
PDF
Software tools for data-driven research and their application to thermoelectr...
PDF
The Materials Project: Experiences from running a million computational scien...
PDF
Computational Materials Design and Data Dissemination through the Materials P...
PDF
Atomate: a tool for rapid high-throughput computing and materials discovery
Methods, tools, and examples (Part II): High-throughput computation and machi...
Capturing and leveraging materials science knowledge from millions of journal...
Software tools, crystal descriptors, and machine learning applied to material...
Discovering advanced materials for energy applications by mining the scientif...
Combining density functional theory calculations, supercomputing, and data-dr...
Computational screening of tens of thousands of compounds as potential thermo...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Combining density functional theory calculations, supercomputing, and data-dr...
Computational materials design with high-throughput and machine learning methods
Density functional theory calculations and data mining for new thermoelectric...
Conducting and Enabling Data-Driven Research Through the Materials Project
Materials design using knowledge from millions of journal articles via natura...
Data dissemination and materials informatics at LBNL
Open Source Tools for Materials Informatics
Software tools, crystal descriptors, and machine learning applied to material...
Targeted Band Structure Design and Thermoelectric Materials Discovery Using H...
Software tools for data-driven research and their application to thermoelectr...
The Materials Project: Experiences from running a million computational scien...
Computational Materials Design and Data Dissemination through the Materials P...
Atomate: a tool for rapid high-throughput computing and materials discovery

Similar to Overview of accelerated materials design efforts in the Hacking Materials research group (20)

PDF
The Materials Project: A Community Data Resource for Accelerating New Materia...
PDF
Software tools for calculating materials properties in high-throughput (pymat...
PDF
Materials Project computation and database infrastructure
PDF
Discovering new functional materials for clean energy and beyond using high-t...
PDF
Software tools for high-throughput materials data generation and data mining
PDF
Discovering and Exploring New Materials through the Materials Project
PDF
Available methods for predicting materials synthesizability using computation...
PDF
The Materials Project: An Electronic Structure Database for Community-Based M...
PDF
The Materials Project: Applications to energy storage and functional materia...
PDF
NANO266 - Lecture 12 - High-throughput computational materials design
PDF
ICME Workshop Jul 2014 - The Materials Project
PDF
Applications of Machine Learning for Materials Discovery at NREL
PDF
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
PDF
01-10 Exploring new high potential 2D materials - Angioni.pdf
PDF
Recent Advancements in the NIST-JARVIS Infrastructure
PDF
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
PDF
Machine learning for materials design: opportunities, challenges, and methods
PDF
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
PDF
Natural language processing for extracting synthesis recipes and applications...
PPTX
Ema 20190124 v1.4_dist
The Materials Project: A Community Data Resource for Accelerating New Materia...
Software tools for calculating materials properties in high-throughput (pymat...
Materials Project computation and database infrastructure
Discovering new functional materials for clean energy and beyond using high-t...
Software tools for high-throughput materials data generation and data mining
Discovering and Exploring New Materials through the Materials Project
Available methods for predicting materials synthesizability using computation...
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: Applications to energy storage and functional materia...
NANO266 - Lecture 12 - High-throughput computational materials design
ICME Workshop Jul 2014 - The Materials Project
Applications of Machine Learning for Materials Discovery at NREL
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
01-10 Exploring new high potential 2D materials - Angioni.pdf
Recent Advancements in the NIST-JARVIS Infrastructure
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
Machine learning for materials design: opportunities, challenges, and methods
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Natural language processing for extracting synthesis recipes and applications...
Ema 20190124 v1.4_dist

More from Anubhav Jain (20)

PDF
A Career at a U.S. National Lab: Perspective from a Mid-Career Scientist
PDF
Research opportunities in materials design using AI/ML
PDF
Accelerating materials discovery with big data and machine learning
PDF
Predicting the Synthesizability of Inorganic Materials: Convex Hulls, Literat...
PDF
Discovering advanced materials for energy applications: theory, high-throughp...
PDF
Applications of Large Language Models in Materials Discovery and Design
PDF
An AI-driven closed-loop facility for materials synthesis
PDF
Best practices for DuraMat software dissemination
PDF
Best practices for DuraMat software dissemination
PDF
Efficient methods for accurately calculating thermoelectric properties – elec...
PDF
Natural Language Processing for Data Extraction and Synthesizability Predicti...
PDF
Machine Learning for Catalyst Design
PDF
Accelerating New Materials Design with Supercomputing and Machine Learning
PDF
DuraMat CO1 Central Data Resource: How it started, how it’s going …
PDF
The Materials Project
PDF
Evaluating Chemical Composition and Crystal Structure Representations using t...
PDF
Perspectives on chemical composition and crystal structure representations fr...
PDF
Machine Learning Platform for Catalyst Design
PDF
Applications of Natural Language Processing to Materials Design
PDF
Assessing Factors Underpinning PV Degradation through Data Analysis
A Career at a U.S. National Lab: Perspective from a Mid-Career Scientist
Research opportunities in materials design using AI/ML
Accelerating materials discovery with big data and machine learning
Predicting the Synthesizability of Inorganic Materials: Convex Hulls, Literat...
Discovering advanced materials for energy applications: theory, high-throughp...
Applications of Large Language Models in Materials Discovery and Design
An AI-driven closed-loop facility for materials synthesis
Best practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
Efficient methods for accurately calculating thermoelectric properties – elec...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Machine Learning for Catalyst Design
Accelerating New Materials Design with Supercomputing and Machine Learning
DuraMat CO1 Central Data Resource: How it started, how it’s going …
The Materials Project
Evaluating Chemical Composition and Crystal Structure Representations using t...
Perspectives on chemical composition and crystal structure representations fr...
Machine Learning Platform for Catalyst Design
Applications of Natural Language Processing to Materials Design
Assessing Factors Underpinning PV Degradation through Data Analysis

Recently uploaded (20)

PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
microscope-Lecturecjchchchchcuvuvhc.pptx
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PPTX
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PDF
Sciences of Europe No 170 (2025)
PPTX
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
PPTX
neck nodes and dissection types and lymph nodes levels
PDF
The scientific heritage No 166 (166) (2025)
PPTX
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
PPTX
TOTAL hIP ARTHROPLASTY Presentation.pptx
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PDF
bbec55_b34400a7914c42429908233dbd381773.pdf
PPTX
SCIENCE10 Q1 5 WK8 Evidence Supporting Plate Movement.pptx
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PDF
MIRIDeepImagingSurvey(MIDIS)oftheHubbleUltraDeepField
PPTX
2. Earth - The Living Planet Module 2ELS
Comparative Structure of Integument in Vertebrates.pptx
microscope-Lecturecjchchchchcuvuvhc.pptx
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
Introduction to Fisheries Biotechnology_Lesson 1.pptx
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
Sciences of Europe No 170 (2025)
DRUG THERAPY FOR SHOCK gjjjgfhhhhh.pptx.
neck nodes and dissection types and lymph nodes levels
The scientific heritage No 166 (166) (2025)
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
TOTAL hIP ARTHROPLASTY Presentation.pptx
The KM-GBF monitoring framework – status & key messages.pptx
bbec55_b34400a7914c42429908233dbd381773.pdf
SCIENCE10 Q1 5 WK8 Evidence Supporting Plate Movement.pptx
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
7. General Toxicologyfor clinical phrmacy.pptx
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
MIRIDeepImagingSurvey(MIDIS)oftheHubbleUltraDeepField
2. Earth - The Living Planet Module 2ELS

Overview of accelerated materials design efforts in the Hacking Materials research group

  • 1. Overview of accelerated materials design efforts in the Hacking Materials research group Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA
  • 2. 2 Materials and their properties determine what is technologically possible Electric vehicles and solar power are two technologies that have been dreamed about for many decades, but did not have much real impact for a long time … 1910 1956
  • 3. 3 Materials and their properties determine what is technologically possible Today’s revolution in clean energy technologies are largely due to advancements in materials – science, engineering, and manufacturing. Much else might be possible with better materials … but, as past examples demonstrate, it can take a long time.
  • 4. What constrains traditional approaches to materials design? 4 “[The Chevrel] discovery resulted from a lot of unsuccessful experiments of Mg ions insertion into well-known hosts for Li+ ions insertion, as well as from the thorough literature analysis concerning the possibility of divalent ions intercalation into inorganic materials.” -Aurbach group, on discovery of Chevrel cathode for multivalent (e.g., Mg2+) batteries Levi, Levi, Chasid, Aurbach J. Electroceramics (2009)
  • 7. What is density functional theory (DFT)? 7 • 1920s: The Schrödinger equation essentially contains all of chemistry embedded within it • it is almost always too complicated to solve due to the numerous electron interactions and complexity of the wave function entity • 1960s: DFT is developed and reframes the problem for ground state properties of the system to be in terms of the charge density, not wavefunction • makes solutions tractable while in principle not sacrificing accuracy for the ground state! e– e– e– e– e– e–
  • 8. How does one use DFT to design new materials? 8 A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  • 9. 9 Examples of experimentally-confirmed materials designed with DFT (1) Jain, A., Shin, Y., Persson, K.A., 2016. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 15004.
  • 10. 10 Examples of experimentally-confirmed materials designed with DFT (2) Jain, A., Shin, Y., Persson, K.A., 2016. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 15004.
  • 11. High-throughput DFT is useful for generating large data sets, e.g., for materials screening 11 M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. a Persson, and M. Asta, Sci. Data, 2015, 2, 150009. >10,000 elastic tensors >48000 Seebeck coefficients + cRTA transport Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci Data 2017, 4, 170085.
  • 12. High-throughput DFT is useful for generating large data sets, e.g., for materials screening 12 M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. a Persson, and M. Asta, Sci. Data, 2015, 2, 150009. >10,000 elastic tensors >48000 Seebeck coefficients + cRTA transport Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci Data 2017, 4, 170085. Atomate’s goal: make high-throughput easy and scalable for everyone
  • 13. A “black-box” view of performing a calculation 13 “something” Results! researcher What is the GGA-PBE elastic tensor of GaAs?
  • 14. Unfortunately, the inside of the “black box” is usually tedious and “low-level”. Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work (input file flags, SLURM format, how to fix ZPOTRF?) → Results! The low-level work: • set up the structure coordinates • write input files, double-check all the flags • copy to supercomputer • submit job to queue • deal with supercomputer headaches • monitor job • fix error jobs, resubmit to queue, wait again • repeat process for subsequent calculations in workflow • parse output files to obtain results • copy and organize results, e.g., into Excel
  • 15. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
  • 16. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → Results! Workflows to run: [ ] band structure [ ] surface energies [x] elastic tensor [ ] Raman spectrum [ ] QH thermal expansion
  • 17. Ideally the method should scale to millions of calculations. Researcher: “Start with all binary oxides, replace O->S, run several different properties” → Results! Workflows to run: [x] band structure [x] surface energies [x] elastic tensor [ ] Raman spectrum [ ] QH thermal expansion [ ] spin-orbit coupling
  • 18. Atomate tries to make it easy, automatic, and flexible to generate data with existing simulation packages. Researcher: “Run many different properties of many different materials!” → Results!
  • 19. Each simulation procedure translates high-level instructions into a series of low-level tasks: quickly and automatically translate high-level (minimal) specifications, such as “What is the GGA-PBE elastic tensor of GaAs?”, into well-defined FireWorks workflows. M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al., Charting the complete elastic properties of inorganic crystalline compounds, Sci. Data 2 (2015).
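The idea of expanding a one-line request into an ordered list of low-level tasks can be sketched in plain Python. This is only a toy illustration: `RECIPES`, `Workflow`, and `build_workflow` are hypothetical names, the step descriptions are illustrative, and none of it is atomate's or FireWorks' actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical recipe table: property name -> ordered low-level steps.
# The step names are illustrative and are NOT atomate's actual tasks.
RECIPES: Dict[str, List[str]] = {
    "elastic tensor": [
        "relax structure",
        "generate strained structures",
        "run static calculation on each strained structure",
        "fit stress-strain data to obtain the elastic tensor",
    ],
    "band structure": [
        "relax structure",
        "run static calculation",
        "run non-self-consistent calculation along a k-path",
    ],
}


@dataclass
class Workflow:
    """A material, a functional, and the ordered steps to run (toy model)."""
    material: str
    functional: str
    steps: List[str] = field(default_factory=list)


def build_workflow(material: str, prop: str, functional: str = "GGA-PBE") -> Workflow:
    """Expand a minimal, high-level request into an ordered list of steps."""
    if prop not in RECIPES:
        raise ValueError(f"no recipe for property: {prop}")
    return Workflow(material, functional, list(RECIPES[prop]))


wf = build_workflow("GaAs", "elastic tensor")
for i, step in enumerate(wf.steps, 1):
    print(f"{i}. [{wf.functional}] {wf.material}: {step}")
```

Keeping the recipes in a table means the researcher only states the material and the property; everything else is looked up rather than written by hand.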
  • 20. Atomate contains a library of simulation procedures. VASP-based: • band structure • spin-orbit coupling • hybrid functional calcs • elastic tensor • piezoelectric tensor • Raman spectra • NEB • GIBBS method • QH thermal expansion • AIMD • ferroelectric • surface adsorption • work functions • NMR spectra* • Bader charges* • Magnetic orderings* • SCAN functionals*. Other: • BoltzTraP • FEFF method • Q-Chem*. (* = added / major updates in past year.) Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
  • 21. Full operation diagram (diagram labels: structure; workflow; database of all workflows; automatically submit + execute; job 1, job 2, job 3, job 4; output files + database)
  • 22. A web-based interface is in progress to give atomate users a “personal Materials Project” of their own calculations
  • 23. Atomate now powers the Materials Project • Online resource of density functional theory simulation data for ~85,000 inorganic materials • Includes band structures, elastic tensors, piezoelectric tensors, battery properties and more • >75,000 registered users • Free • www.materialsproject.org. Jain et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 11002 (2013).
  • 24. Getting started with atomate • Paper: Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017). • Docs: hackingmaterials.github.io/atomate • Support: https://groups.google.com/forum/#!forum/atomate
  • 26. Rocketsled uses adaptive design to suggest the best computations to optimize some metric • With atomate/FireWorks, the user must decide which calculations to perform – e.g., which materials to calculate • Rocketsled is an extension to FireWorks that lets the computer decide what the next best calculation is, based on the results of previous calculations • Works for materials design or any other “inverse computational problem”
  • 27. Given a search domain, Rocketsled uses an optimization engine to select calculations and submit them to supercomputers. The optimization engine includes 4 built-in regressors (e.g., RandomForest, Gaussian Process) and 5 acquisition functions (e.g., Expected Improvement). It can bootstrap uncertainty estimates. Or use your own!
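The select-evaluate-update loop behind this kind of adaptive design can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not Rocketsled's API: the objective is a made-up function, and a crude nearest-neighbour surrogate with a distance-based exploration bonus stands in for the regressors and acquisition functions named above. All function names and the `kappa` parameter are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def objective(x: int) -> float:
    """Hypothetical stand-in for an expensive calculation; peaks at x = 137."""
    return float(-((x - 137) ** 2))


def acquisition(x: int, evaluated: Dict[int, float], kappa: float) -> float:
    """Score a candidate: the value of its nearest evaluated neighbour (a crude
    surrogate prediction) plus a bonus that grows with distance (exploration)."""
    nearest = min(evaluated, key=lambda e: abs(e - x))
    return evaluated[nearest] + kappa * abs(nearest - x)


def adaptive_search(
    domain: Iterable[int],
    f: Callable[[int], float],
    seeds: List[int],
    budget: int,
    kappa: float = 50.0,
) -> Tuple[int, float, Dict[int, float]]:
    """Repeatedly pick the highest-scoring unevaluated point and evaluate it."""
    points = list(domain)
    evaluated: Dict[int, float] = {x: f(x) for x in seeds}
    while len(evaluated) < budget:
        candidates = [x for x in points if x not in evaluated]
        if not candidates:
            break
        next_x = max(candidates, key=lambda x: acquisition(x, evaluated, kappa))
        evaluated[next_x] = f(next_x)  # the "expensive" step
    best_x = max(evaluated, key=lambda x: evaluated[x])
    return best_x, evaluated[best_x], evaluated


best_x, best_y, history = adaptive_search(
    range(200), objective, seeds=[0, 100, 199], budget=12
)
print(f"best x = {best_x}, evaluations used = {len(history)} of 200")
```

The point of the design is that each new evaluation is chosen using everything learned so far, so only a small fraction of the domain needs to be computed.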
  • 28. Results of using optimization can be dramatic! In the problem of finding materials with high K and high G for superhard materials (7394 possibilities), Rocketsled finds solutions ~30-60X faster than randomly computing the space. Can use pure ML approaches or use matminer featurizations for materials science (the latter help give such good performance)
  • 29. Results of using optimization can be dramatic! In the problem of finding materials with high K and high G for superhard materials (7394 possibilities), Rocketsled finds solutions ~30-60X faster than randomly computing the space. Even after just 200 calculations of the 7394 possibilities, all solutions are almost certain to be found with Rocketsled. Can use pure ML approaches or use matminer featurizations for materials science (the latter help give such good performance)
  • 30. Getting started with rocketsled • Paper: Dunn, A.R., et al. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. https://doi.org/10.1088/2515-7639/ab0c3d • Docs: hackingmaterials.github.io/rocketsled • Support: https://groups.google.com/forum/#!forum/fireworkflows
  • 32. What is needed to do machine learning on materials? How can we represent chemistry and structure as vectors? How do we get enough output data for training?
  • 33. Matminer connects materials data with data mining algorithms and data visualization libraries. Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
  • 34. Matminer contains a library of descriptors for various materials science entities. >60 featurizer classes can generate thousands of potential descriptors that are described in the literature. Usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data]). Featurizers: • are compatible with scikit-learn pipelining • automatically deploy multiprocessing to parallelize over data • include citations to methodology papers
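The construct-then-featurize pattern can be illustrated with a toy featurizer written in plain Python. This is a sketch of the interface idea only: `MeanAtomicNumber`, `parse_formula`, and the small `ATOMIC_NUMBER` table are hypothetical and are not part of matminer, whose real featurizers are far richer.

```python
import re
from typing import Dict, List

# Small lookup table for the demo; a real toolkit would cover all elements.
ATOMIC_NUMBER: Dict[str, int] = {"O": 8, "Si": 14, "Ti": 22, "Ga": 31, "As": 33}


def parse_formula(formula: str) -> Dict[str, float]:
    """Parse a simple formula such as 'TiO2' into {'Ti': 1.0, 'O': 2.0}."""
    counts: Dict[str, float] = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


class MeanAtomicNumber:
    """Hypothetical featurizer: composition-weighted mean and range of Z."""

    def feature_labels(self) -> List[str]:
        return ["mean Z", "range Z"]

    def featurize(self, formula: str) -> List[float]:
        counts = parse_formula(formula)
        total = sum(counts.values())
        zs = [ATOMIC_NUMBER[s] for s in counts]
        mean_z = sum(ATOMIC_NUMBER[s] * n for s, n in counts.items()) / total
        return [mean_z, float(max(zs) - min(zs))]

    def featurize_many(self, formulas: List[str]) -> List[List[float]]:
        """Apply featurize() to each input; a real library could parallelize this."""
        return [self.featurize(f) for f in formulas]


feat = MeanAtomicNumber()
for formula, row in zip(["GaAs", "SiO2"], feat.featurize_many(["GaAs", "SiO2"])):
    print(formula, dict(zip(feat.feature_labels(), row)))
```

Because every featurizer exposes the same small set of methods, descriptors can be swapped or stacked without changing the downstream model code.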
  • 35. Interactive Jupyter notebooks demonstrate use cases: https://github.com/hackingmaterials/matminer_examples. Many examples available: • Retrieving data from various databases • Predicting bulk / shear modulus • Predicting formation energies: from composition alone; with Voronoi-based structure features included; with Coulomb matrix and Orbital Field matrix descriptors (reproducing previous studies in the literature) • Making interactive visualizations • Creating an ML pipeline
  • 36. Getting started with matminer • Paper: Ward et al. Matminer: An open source toolkit for materials data mining. Computational Materials Science, 152, 60–69 (2018). • Docs: hackingmaterials.github.io/matminer • Support: https://groups.google.com/forum/#!forum/matminer
  • 38. Typically several steps of machine learning are performed by a human researcher – can these be automated? • Descriptors developed and chosen by a researcher • ML model developed and chosen by a researcher • Why can’t we just give the computer some raw input data (compositions, crystal structures) and output properties and get back an ML model?
  • 39. Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties). Featurizers: MagPie, SOAP, Sine Coulomb Matrix + many, many more • Missing value imputation • Scaling • One-hot encoding • PCA-based • Correlation • Relief-based (MultiSURF) • Uses genetic algorithms to find the best machine learning model + hyperparameters
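Two of the preprocessing steps listed on this slide, missing value imputation and correlation-based feature removal, can be sketched in plain Python. This is a toy illustration with made-up data; the function names, the 0.9 threshold, and the feature names are assumptions and do not reflect automatminer's implementation.

```python
from math import sqrt
from typing import Dict, List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the present ones."""
    present = [v for v in column if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in column]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def drop_correlated(features: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Keep a feature only if it is not strongly correlated with one already kept."""
    kept: List[str] = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up feature table: feature_b nearly duplicates feature_a.
raw: Dict[str, List[Optional[float]]] = {
    "feature_a": [1.0, 2.0, None, 4.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0],
    "feature_c": [1.0, -1.0, 1.0, -1.0],
}
clean = {name: impute_mean(col) for name, col in raw.items()}
kept = drop_correlated(clean)
print("kept features:", kept)
```

Automating these choices is what removes the researcher from the loop: the same cleaning and pruning rules are applied to whatever features the featurizers produce.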
  • 40. Automatminer can be used as a black box
  • 41. We are benchmarking automatminer vs. the current state of the art on 11 problems intended to be a standard test set
      Dataset | Target(s) | Samples
      Elastic Tensor | KVRH (GPa), GVRH (GPa) | 10,987
      Dielectric Tensor | Refractive index | 4,765
      JARVIS 2D | Exfoliation energy (meV/atom) | 636
      Materials Project phonons | Highest LO phonon frequency (last PhDOS peak) | 1,265
      Materials Project (stable) | Band gap (eV), Is metallic? (classification) | 106,113
      Perovskites | Formation energy (eV/atom) | 18,928
      Experimental Band Gaps | Is metallic? (classification) | 6,354
      Experimental Metallic Glasses | Glass forms? (classification) | 7,190
      Materials Project (all) | Formation energy (eV/atom) | 132,752
  • 42. Usually, automatminer does very well: it usually outperforms both state-of-the-art graph-based models AND human-generated models! But …
  • 43. Graph-based approaches work better in some problems. Hypothesis: automatminer approaches are better for smaller data sets, graph-based approaches are better for larger data sets. Unfortunately, it can be difficult to train some of the graph models on large data sets, particularly without GPUs, so the results are not in yet!
  • 44. Getting started with automatminer • Paper: in preparation … • Docs: hackingmaterials.github.io/automatminer • Support: https://groups.google.com/forum/#!forum/matminer
  • 46. Goal: collect knowledge embedded in the scientific literature • We have extracted ~3 million abstracts of scientific articles • We will use natural language processing algorithms to try to extract knowledge from all this data
  • 47. An engine to label the content of scientific abstracts: collect, clean, and extract information from millions of published materials science journal abstracts
  • 48. Application: a revised materials search engine. Auto-generated summaries of materials based on text mining
  • 49. Application: materials compositions of interest … A search for thermoelectrics that do not have Pb or Bi
  • 50. Using the word2vec algorithm to extract knowledge and make predictions • We use the word2vec algorithm (Google) to turn each unique word in our corpus into a 200-dimensional vector • These vectors encode the meaning of each word, learned by trying to predict the context words around the target word
  • 51. Vector dot products measure similarity • The dot product of a composition word with the word “thermoelectric” essentially predicts how likely that word is to appear in an abstract with the word thermoelectric • Compositions with high dot products are typically known thermoelectrics • Sometimes, compositions have a high dot product with “thermoelectric” but have never been studied as a thermoelectric • These compositions usually have high computed power factors! (BoltzTraP)
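Ranking compositions by their dot product with a target word can be sketched with a handful of made-up vectors. Everything here is illustrative: the 3-dimensional vectors are invented for the demo (the slides describe 200-dimensional embeddings learned from abstracts), the function names are hypothetical, and the vectors are normalized first so the dot product equals cosine similarity.

```python
from math import sqrt
from typing import Dict, List, Tuple


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Invented toy embeddings, NOT taken from any trained model.
EMBEDDINGS: Dict[str, List[float]] = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "Bi2Te3": [0.8, 0.2, 0.4],
    "PbTe": [0.7, 0.0, 0.5],
    "NaCl": [0.0, 0.9, 0.1],
    "SiO2": [0.1, 0.8, 0.0],
}


def rank_by_similarity(
    target: str, candidates: List[str], embeddings: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate words by normalized dot product with the target word."""
    t = normalize(embeddings[target])
    scores = {c: dot(t, normalize(embeddings[c])) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_by_similarity(
    "thermoelectric", ["NaCl", "PbTe", "SiO2", "Bi2Te3"], EMBEDDINGS
)
for name, score in ranked:
    print(f"{name:8s} {score:.3f}")
```

A composition that scores highly but has no literature record as a thermoelectric is the interesting case: the ranking itself becomes a prediction.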
  • 52. Can we predict future thermoelectrics discoveries with this method? • We can test to see if our method can predict new compositions • For every year in the past ~2 decades (e.g., 2001), train word embeddings only on data up to that point in time • Make predictions of what materials are the most promising thermoelectrics • See if those materials have since actually been studied as thermoelectrics
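The bookkeeping for this kind of retrospective test can be sketched as follows. The compound names, years, and ranking are placeholders invented for the demo, and `evaluate_cutoff` is a hypothetical helper; in the real procedure the ranking would come from embeddings trained only on abstracts published up to the cutoff year.

```python
from typing import Dict, List, Tuple


def evaluate_cutoff(
    ranked: List[str],
    first_reported: Dict[str, int],
    cutoff_year: int,
    top_k: int,
) -> Tuple[List[str], List[str]]:
    """Return (predictions, hits) for one cutoff year.

    predictions: top-k ranked materials not yet reported by the cutoff year.
    hits: predictions that were first reported after the cutoff year.
    """
    not_yet_known = [
        m for m in ranked
        if not (m in first_reported and first_reported[m] <= cutoff_year)
    ]
    predictions = not_yet_known[:top_k]
    hits = [m for m in predictions if m in first_reported]
    return predictions, hits


# Placeholder data: ranking as of the cutoff, and the year each material was
# first reported as a thermoelectric (absent = never reported so far).
ranked_at_2001 = ["compound_A", "compound_B", "compound_C", "compound_D", "compound_E"]
first_reported = {"compound_A": 1998, "compound_C": 2006, "compound_E": 2012}

predictions, hits = evaluate_cutoff(ranked_at_2001, first_reported, cutoff_year=2001, top_k=3)
print("predictions:", predictions)
print("later confirmed:", hits)
```

Repeating this for each cutoff year and comparing the hit rate against randomly chosen materials gives a measure of how much predictive signal the embeddings carry.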
  • 53. Next steps • Thus far, 2 of our top 20 predictions made in ~August 2018 have already been reported in the literature for the first time as thermoelectrics – Li3Sb was the subject of a computational study (predicted zT=2.42) in Oct 2018 – SnTe2 was experimentally found to be a moderately good thermoelectric (expt zT=0.71) in Dec 2018 • We are currently trying to make some of the other compounds on the list – The full list will be published alongside a paper (currently under review). [1] Yang et al. "Low lattice thermal conductivity and excellent thermoelectric behavior in Li3Sb and Li3Bi." Journal of Physics: Condensed Matter 30.42 (2018): 425401. [2] Wang et al. "Ultralow lattice thermal conductivity and electronic properties of monolayer 1T phase semimetal SiTe2 and SnTe2." Physica E: Low-dimensional Systems and Nanostructures 108 (2019): 53-59.
  • 55. Acknowledgements • Lead developers: – Atomate: Kiran Mathew – Rocketsled: Alex Dunn – Matminer: Logan Ward – Automatminer: Alex Dunn – Matscholar: Vahe Tshitoyan, John Dagdelen, Leigh Weston • And the dozens of other developers who have contributed to these packages or reported issues! • Funding: U.S. Department of Energy, Basic Energy Sciences, Early Career Award • Additional funding from the DOE-funded Materials Project