GraphDF: A Discrete Flow Model for Molecular Graph Generation
Youzhi Luo¹ Keqiang Yan¹ Shuiwang Ji¹
Abstract
We consider the problem of molecular graph gen-
eration using deep models. While graphs are dis-
crete, most existing methods use continuous la-
tent variables, resulting in inaccurate modeling of
discrete graph structures. In this work, we pro-
pose GraphDF, a novel discrete latent variable
model for molecular graph generation based on
normalizing flow methods. GraphDF uses invert-
ible modulo shift transforms to map discrete la-
tent variables to graph nodes and edges. We show
that the use of discrete latent variables reduces
computational costs and eliminates the negative
effect of dequantization. Comprehensive exper-
imental results show that GraphDF outperforms
prior methods on random generation, property
optimization, and constrained optimization tasks.
1. Introduction
A fundamental problem in drug discovery (Stokes et al.,
2020; Wang et al., 2020) and chemistry is to design novel
molecules with specific properties. This task remains challenging because the space of molecules is naturally discrete and very large, with an estimated size on the order of $10^{33}$ (Polishchuk et al., 2013). Recent advances in deep generative models have led to significant progress in this field. Notably, many recent studies represent molecular
structures as graphs and propose to generate novel molecular
graphs with advanced deep generative models. From a
conceptual and computational perspective, these methods
first map molecular graphs to vectors in the continuous latent
space. When generating, the generative models project a
randomly picked continuous vector in the latent space back
to the molecular graph space.
To make use of generative models with continuous latent
variables, existing methods (Madhawa et al., 2019; Shi* et al., 2020; Zang & Wang, 2020) convert discrete graph data to continuous data by adding real-valued noise. However, such dequantization prevents models from capturing the original distribution of the discrete graph structures and thus increases the difficulty of model training. This becomes a key limitation that makes it hard to capture the true distribution of graph structures and to generate diverse molecules.

¹Department of Computer Science & Engineering, Texas A&M University, TX, USA. Correspondence to: Shuiwang Ji <sji@tamu.edu>.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).
In this work, we propose GraphDF, a generative model
using discrete latent variables for molecular graph genera-
tion. GraphDF generates molecular graphs by sequentially
sampling discrete latent variables and mapping them to
new nodes and edges via invertible modulo shift transforms.
Compared with previous methods, our proposed discrete transform eliminates the cost of computing the Jacobian matrix and overcomes the limitations resulting from dequantization.
Experimental results show that GraphDF can outperform
previous state-of-the-art methods over various molecule gen-
eration tasks. The implementation of GraphDF has been
integrated into the DIG (Liu et al., 2021) framework.
2. Background and Related Work
2.1. Molecule Generation
Let $\mathcal{M} = \{M_i\}_{i=1}^m$ be a set of molecules, $S(M) \in \mathbb{R}$ be a function computing a specific property score of the molecule $M$, and $\mathrm{SIM}(M, M') \in [0, 1]$ be a function measuring the similarity between two molecules $M$ and $M'$. The three molecule generation tasks can be defined as below.
• Learning a generation model $p_\theta(\cdot)$ from $\mathcal{M}$, where $p_\theta(M)$ is the probability of sampling the molecule $M$ from the model.
• Learning a property optimization model $p_\theta(\cdot)$ with respect to $S$ by maximizing $\mathbb{E}_{M \sim p_\theta}[S(M)]$.
• Learning a constrained optimization model $p_\theta(\cdot \mid M)$, so as to maximize $\mathbb{E}_{M' \mid M \sim p_\theta}[S(M')]$ while satisfying $\mathrm{SIM}(M, M') > \delta$, where $M$ is a given molecule, $M'$ is the optimized molecule, and $\delta$ is a similarity threshold.
Early attempts for molecule generation (Gómez-Bombarelli
et al., 2018; Kusner et al., 2017; Dai et al., 2018) represented
molecules as SMILES strings (Weininger, 1988) and devel-
oped sequence generation models. These models may not

easily learn the complicated grammatical rules of SMILES and thus often fail to generate syntactically valid sequences. More recent studies represent molecules as graphs,
and significant progress has been made in molecular graph
generation with deep generative models. A variety of molec-
ular graph generation methods have been developed based
on variational auto-encoders (VAE) (Kingma & Welling,
2013). Some of these methods generate molecular graphs
from a tree structure (Jin et al., 2018; Kajino, 2019); others
add a node or an edge sequentially one at a time (Liu et al.,
2018), or generate node and adjacency matrices directly in
one-shot sampling (Simonovsky & Komodakis, 2018; Ma
et al., 2018). For VAE-based models, Bayesian optimization
can be applied in the latent space to search molecules with
optimized property scores (Jin et al., 2018; Kajino, 2019).
Generative adversarial networks (GAN) (Goodfellow et al.,
2014) have also been used in some molecular graph genera-
tors (De Cao & Kipf, 2018; You et al., 2018a). Similar to
SeqGAN (Yu et al., 2017), these models are trained with
reinforcement learning.
In addition to methods based on VAE and GAN, normal-
izing flow models have been used in various generation
tasks (Dinh et al., 2014; 2016; Rezende & Mohamed, 2015).
These models define invertible transformations between la-
tent variables and data samples, thus allowing for exact
likelihood computation. It has been demonstrated that these
models are able to capture the density of molecular graphs
more accurately (Shi* et al., 2020; Zang & Wang, 2020).
Our proposed discrete flow model is based on the normaliz-
ing flow method.
2.2. Flow Models
A flow model is a parametric invertible function $f_\theta: \mathbb{R}^d \to \mathbb{R}^d$, defining an invertible mapping from the latent variable $z \in \mathbb{R}^d$ to the data point $x \in \mathbb{R}^d$, where $x \sim p_X(x)$ and $z \sim p_Z(z)$ are random variables. The latent distribution $p_Z$ is a pre-defined probability distribution, e.g., a Gaussian distribution. The data distribution $p_X$ is unknown, but given an arbitrary data point $x \in \mathbb{R}^d$, we can use the change-of-variable theorem to compute its log-likelihood as

$$\log p_X(x) = \log p_Z(f_\theta^{-1}(x)) + \log \left| \det \frac{\partial f_\theta^{-1}(x)}{\partial x} \right|, \quad (1)$$

where $\frac{\partial f_\theta^{-1}(x)}{\partial x} \in \mathbb{R}^{d \times d}$ is the Jacobian matrix. To train a flow model $f_\theta$ on a dataset $\mathcal{X} = \{x_i\}_{i=1}^m$, we use Eqn. (1) to compute the exact log-likelihoods of all data points and update the parameter $\theta$ by maximizing the log-likelihoods via gradient descent. To randomly sample a data point from $f_\theta$, a latent variable $z$ is first sampled from the pre-defined latent distribution $p_Z$; then we can obtain a data point $x$ by performing the feedforward transformation $x = f_\theta(z)$. Therefore, the mapping $f_\theta$ should be invertible, and the computation of the Jacobian matrix $\frac{\partial f_\theta^{-1}(x)}{\partial x}$ needs to be tractable in order to enable efficient training and sampling.
A common operation satisfying these two requirements is
the affine coupling layer (Dinh et al., 2016), which has been
used frequently in deep flow models.
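For illustration, a minimal sketch of an affine coupling layer and the log-likelihood computation of Eqn. (1) in PyTorch is given below; the network width, the half-split of dimensions, and the standard-normal prior are illustrative choices of ours, not details of any cited model.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Toy affine coupling layer (Dinh et al., 2016): x1 is kept,
    x2 is affinely transformed, giving a triangular Jacobian."""
    def __init__(self, d):
        super().__init__()
        self.d1 = d // 2  # arbitrary split point
        self.net = nn.Sequential(nn.Linear(self.d1, 64), nn.Tanh(),
                                 nn.Linear(64, 2 * (d - self.d1)))

    def inverse(self, x):
        # Map data x to latent z and accumulate log|det dz/dx|.
        x1, x2 = x[:, :self.d1], x[:, self.d1:]
        s, t = self.net(x1).chunk(2, dim=1)
        z2 = (x2 - t) * torch.exp(-s)
        return torch.cat([x1, z2], dim=1), -s.sum(dim=1)

def log_likelihood(layer, x, prior):
    # Eqn. (1): log p_X(x) = log p_Z(f^{-1}(x)) + log|det J|.
    z, log_det = layer.inverse(x)
    return prior.log_prob(z).sum(dim=1) + log_det

prior = torch.distributions.Normal(0.0, 1.0)
x = torch.randn(8, 6)                             # a batch of 8 points in R^6
ll = log_likelihood(AffineCoupling(6), x, prior)  # shape (8,)
```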
Recently, flow models have been used in molecule or graph
generation tasks. Liu et al. (2019) propose an invertible message passing neural network, known as GNF, for simple
community graph generation. Following GraphVAE (Si-
monovsky & Komodakis, 2018), several recent studies gen-
erate molecular graphs by generating node and adjacency
matrices in a one-shot fashion via flow models, including
GraphNVP (Madhawa et al., 2019), GRF (Honda et al.,
2019), and MoFlow (Zang & Wang, 2020). GraphAF (Shi*
et al., 2020) generates molecular graphs by adding nodes
and edges sequentially via an autoregressive flow model
(Papamakarios et al., 2017). GraphCNF (Lippe & Gavves,
2021) jointly optimizes a variational inference model and a
flow model for graph generation.
3. A Discrete Flow Model for Graphs
While all current molecular graph generation methods use
continuous latent variables, we argue that such continuous
prior distribution is not suitable to model molecular graph
data, which are naturally discrete. On the other hand, a recent attempt has been made to develop generative models with discrete latent variables (Tran et al., 2019). It is, however, only able to generate 1D sequences, such as text, and cannot be used to generate richly structured objects such as graphs.
In this section, we introduce GraphDF, a novel molecular
graph generative model with discrete latent variables. To the best of our knowledge, GraphDF is the first work to demonstrate that discrete latent variable models have a strong capacity to model the density of complicated molecular graph data.
3.1. An Overview of Sequential Generation
Let $k$ be the total number of node types and $c$ be the total number of edge types. We use $G = (X, A)$ to denote a molecular graph with $n$ nodes, where $X \in \{0,1\}^{n \times k}$ is the node matrix and $A \in \{0,1\}^{n \times n \times c}$ is the adjacency tensor. $X[i, u] = 1$ represents that the $i$-th node has type $u$, and $A[i, j, v] = 1$ represents that there is an edge of type $v$ between the $i$-th node and the $j$-th node. If there is no edge between the $i$-th node and the $j$-th node, then $A[i, j, v] = 0$ for $v = 0, \dots, c-1$.
Similar to MolecularRNN (Popova et al., 2019), we generate
molecular graphs by a sequential decision process. We
start from an empty molecular graph G0. At step i, let
the current sub-graph be Gi−1 which contains i − 1 nodes.

[Figure 1. An illustration of the sequential generation procedure.]
The generative model $f_\theta$ first generates a new node with type $a_i$ based on a sampled latent variable $z_i$, where $a_i \in \{0, \dots, k-1\}$. Afterwards, the edges between the new node and all nodes in $G_{i-1}$ are generated sequentially by $f_\theta$. Specifically, the edge incident to the $j$-th node of $G_{i-1}$ is generated based on the latent variable $z_{ij}$, and its edge type $b_{ij} \in \{0, \dots, c-1, c\}$, where the extra edge type $c$ means no edge. If $b_{ij}$ violates the chemical bond valency rule, it is re-generated by re-sampling a new $z_{ij}$. The above process is an autoregressive function of previously generated elements and repeats until no new edge is added at step $i$, i.e., $b_{ij} = c$ for $j = 1, \dots, i-1$. An example of this process is given in Fig. 1. In summary, generating a molecular graph $G$ is equivalent to generating the sequence representation $S_G = (a_1, a_2, b_{21}, a_3, b_{31}, b_{32}, a_4, \dots)$ by

$$a_i = f_\theta(a_1, \dots, b_{i-1,i-2}; z_i), \quad i > 0,$$
$$b_{ij} = f_\theta(a_1, \dots, b_{i,j-1}; z_{ij}), \quad j = 1, \dots, i-1. \quad (2)$$
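The following sketch summarizes this decision process; `sample_node`, `sample_edge`, and `valency_ok` are hypothetical helpers standing in for the flow transforms of Eqn. (2) and the chemical valency check.

```python
# Schematic sketch of the sequential generation loop of Eqn. (2).
def generate_graph(sample_node, sample_edge, valency_ok, max_nodes, no_edge):
    nodes, edges = [], {}  # node types and {(i, j): edge type}
    for i in range(max_nodes):
        nodes.append(sample_node(nodes, edges))   # a_i from latent z_i
        added_edge = False
        for j in range(i):                        # edges to earlier nodes
            b = sample_edge(nodes, edges, i, j)   # b_ij from latent z_ij
            while not valency_ok(nodes, edges, i, j, b):
                b = sample_edge(nodes, edges, i, j)  # re-sample z_ij
            if b != no_edge:
                edges[(i, j)] = b
                added_edge = True
        if i > 0 and not added_edge:  # stop once a step adds no new edge
            nodes.pop()
            break
    return nodes, edges
```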
3.2. Generation with Discrete Latent Variables
In our method, all latent variables are discrete and sampled
from multinomial distributions. Specifically, $z_i$ is sampled from a multinomial distribution $p_{Z_a}$ with trainable parameters $(\alpha_0, \dots, \alpha_{k-1})$ as

$$p_{Z_a}(z_i = s) = \frac{\exp(\alpha_s)}{\sum_{t=0}^{k-1} \exp(\alpha_t)}. \quad (3)$$
Similarly, $z_{ij}$ is sampled from the multinomial distribution $p_{Z_b}$ with trainable parameters $(\beta_0, \dots, \beta_c)$. We use a discrete flow model to reversibly map discrete latent variables to new nodes and edges.

[Figure 2. An illustration of generation with the GraphDF framework.]

The discrete transform used in the
discrete flow is a modulo shift transform in the form of
$$q(z) = (z + \mu) \bmod t, \quad (4)$$

where $t$ is the number of categories, and $z, \mu \in \{0, \dots, t-1\}$. Formally, the discrete transform for generating new nodes and edges in Eqn. (2) is the composition of $D$ modulo shift modules as

$$a_i = q_i^D \circ \cdots \circ q_i^1(z_i), \quad i > 0,$$
$$b_{ij} = q_{ij}^D \circ \cdots \circ q_{ij}^1(z_{ij}), \quad j = 1, \dots, i-1, \quad (5)$$

where $f \circ g(\cdot)$ is defined as $f(g(\cdot))$, and

$$q_i^d(z) = (z + \mu_i^d) \bmod k, \quad d = 1, \dots, D,$$
$$q_{ij}^d(z) = (z + \mu_{ij}^d) \bmod (c+1), \quad d = 1, \dots, D. \quad (6)$$
Here the shift factors $\mu_i^d$ and $\mu_{ij}^d$ are functions of $(a_1, a_2, b_{21}, \dots, b_{i-1,i-2})$ and $(a_1, a_2, b_{21}, \dots, b_{i,j-1})$, respectively. They both play the role of capturing conditional information from previously generated elements. The calculation of $\mu_i^d$ and $\mu_{ij}^d$ will be described in Sec. 3.4. An illustrative example of generating a simple molecular graph with this framework is given in Fig. 2.
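As a concrete illustration, the sketch below implements the composed modulo shift transform of Eqns. (4)-(6) together with its exact inverse (cf. Eqns. (8)-(9)); the shift factors are plain integers here, whereas in GraphDF each $\mu^d$ is predicted from the current sub-graph (Sec. 3.4).

```python
# Sketch of the invertible modulo shift transform (Eqns. 4-6).
def forward_shift(z, mus, t):
    """Map a discrete latent z to a category via q^D ∘ ... ∘ q^1."""
    for mu in mus:
        z = (z + mu) % t
    return z

def inverse_shift(x, mus, t):
    """Exact inverse: subtract the shift factors in reverse order."""
    for mu in reversed(mus):
        x = (x - mu) % t
    return x

mus = [3, 1, 4]                          # D = 3 hypothetical shift factors
a = forward_shift(2, mus, t=5)           # e.g., k = 5 node types
assert inverse_shift(a, mus, t=5) == 2   # invertibility holds exactly
```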
The use of discrete latent variables is the key factor distin-
guishing our method from other generation methods based

on flow models. Our model has several advantages. The change-of-variable formula of a discrete invertible mapping does not involve the Jacobian term of Eqn. (1). Hence the probabilities of sampled nodes and edges can be directly obtained as

$$p(a_i \mid a_1, a_2, b_{21}, \dots, b_{i-1,i-2}) = p_{Z_a}(z_i),$$
$$p(b_{ij} \mid a_1, a_2, b_{21}, \dots, b_{i,j-1}) = p_{Z_b}(z_{ij}), \quad (7)$$

leading to a dramatic reduction in computational cost.
More importantly, discrete latent variables free the gener-
ative model from the defects of dequantization. Previous
methods (Madhawa et al., 2019; Shi* et al., 2020; Zang
& Wang, 2020) convert categorical representation of graph
nodes and edges to continuous data by adding real-valued
noise to their one-hot encodings. Then continuous flow models map such pseudo-continuous data to latent variables following a Gaussian distribution. We argue that such
a dequantization operation can be problematic. First, the
generative model does not capture the true distribution of
original discrete data, but the distorted distribution after
dequantization. Second, due to the randomness of noise, the
resulting continuous data points can be very different even if
they are obtained by performing dequantization on the same
graph. Hence during training, a single graph may be mapped
to two distant points in the latent space, leading to model
oscillation and failure to converge. However, our GraphDF
maps the discrete graph data to discrete latent space by dis-
crete invertible flow without introducing any distribution
distortion. Hence it can model the underlying distribution
of graph structures more accurately, thereby eliminating the
impact of the two issues caused by dequantization.
Note that our method is fundamentally different from
GraphCNF (Lippe & Gavves, 2021), though dequantization
is avoided in both methods. GraphCNF maps discrete values
to continuous latent space where different categories occupy
non-overlapping regions. Such a discrete-to-continuous
mapping leads to distribution distortions. In the generation process, discrete values are sampled by applying Bayes' rule, and all graph nodes and edges are generated at once. On the other hand, GraphDF uses a fundamentally different approach to model the discrete distribution: it completely discards the continuous latent space and maps discrete values to a discrete latent space by the modulo shift transform, so no distribution distortion is introduced. In addition, compared with GraphCNF, the sequential fashion of GraphDF can capture graph structure information efficiently and guarantees chemical validity through the bond valency check used during generation.
3.3. Inference and Training
The inverse process of generation is inference, which in-
volves mapping from graphs to discrete latent variables.
Given a graph $G = (X, A)$, we compute its sequence representation $S_G$ by re-ordering its nodes and edges according to the breadth-first search (BFS) order. Then we can obtain the discrete latent variables corresponding to each element of this sequence by inverting Eqn. (5) as

$$z_i = o_i^1 \circ \cdots \circ o_i^D(a_i), \quad i > 0,$$
$$z_{ij} = o_{ij}^1 \circ \cdots \circ o_{ij}^D(b_{ij}), \quad j = 1, \dots, i-1, \quad (8)$$

where

$$o_i^d(z) = (z - \mu_i^d) \bmod k, \quad d = 1, \dots, D,$$
$$o_{ij}^d(z) = (z - \mu_{ij}^d) \bmod (c+1), \quad d = 1, \dots, D. \quad (9)$$
Similar to the generation process, the shift factors $\mu_i^d$ and $\mu_{ij}^d$ capture autoregressive information from the elements of $S_G$ that precede $a_i$ and $b_{ij}$, respectively.
We train a GraphDF model on a molecular graph dataset by
maximizing the log-likelihoods of all data samples. Given a molecular graph $G$, we first perform inference on $G$ to compute the latent variables corresponding to its nodes and edges. Then the log-likelihood of $G$ can be computed as

$$\log p(G) = \sum_{i=1}^{n} \left[ \log p_{Z_a}(z_i) + \sum_{j=1}^{i-1} \log p_{Z_b}(z_{ij}) \right]. \quad (10)$$
The parameters of the GraphDF model and of the discrete prior distributions $p_{Z_a}$ and $p_{Z_b}$ are updated by gradient descent to maximize the log-likelihoods of all data samples.
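A toy rendering of this objective is sketched below, assuming the discrete latents have already been recovered by the inverse transform; gradients to the shift-factor networks additionally require the straight-through approximation of Sec. 3.4, which this sketch omits.

```python
import torch
import torch.nn.functional as F

# Schematic negative log-likelihood of Eqn. (10) for one graph.
# alpha and beta are the trainable prior logits of p_Za and p_Zb;
# z_nodes / z_edges are latents recovered by Eqns. (8)-(9).
def neg_log_likelihood(alpha, beta, z_nodes, z_edges):
    log_pa = F.log_softmax(alpha, dim=0)  # Eqn. (3)
    log_pb = F.log_softmax(beta, dim=0)
    return -(log_pa[z_nodes].sum() + log_pb[z_edges].sum())

alpha = torch.zeros(9, requires_grad=True)  # e.g., k = 9 node types
beta = torch.zeros(4, requires_grad=True)   # e.g., c + 1 = 4 edge types
z_nodes = torch.tensor([2, 0, 5])           # toy inferred latents
z_edges = torch.tensor([1, 3, 3])
loss = neg_log_likelihood(alpha, beta, z_nodes, z_edges)
loss.backward()                             # updates the discrete priors
```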
3.4. Invariant and Discrete Conditional Generation
The transformation of the discrete flow in Eqn. (5) is autoregressive and requires capturing conditional information into $\mu_i^d$ and $\mu_{ij}^d$. Tran et al. (2019) use language models, like LSTM (Hochreiter & Schmidhuber, 1997) and Transformer (Vaswani et al., 2017), in discrete flow models to develop sequence generators. However, the same method cannot be used directly to generate $S_G$, for the two reasons below.
• Language models are only aware of the node and edge type information encoded in $S_G$, while the rich structure information of graphs is not captured.
• There is no intrinsic order in an intermediately generated sub-graph, but language models assume an order when dealing with $S_G$. For instance, for $(a_1, a_2) = (1, 0)$ and $(a_1, a_2) = (0, 1)$, the generative model should give the same estimate $p(b_{21} \mid a_1, a_2)$, but language models cannot provide such a guarantee.
To tackle the two issues, we propose to capture conditional information from intermediate sub-graphs by using graph convolutional networks (GCN) (Kipf & Welling, 2017; Gao et al., 2018; Gao & Ji, 2019). Specifically, let the sub-graph generated immediately before adding the node $a_i$ be $G_{i-1} = (X, A)$; we use an $L$-layer relational GCN (R-GCN) (Schlichtkrull et al., 2018) to obtain node embeddings $H^L \in \mathbb{R}^{n \times r}$ from $G_{i-1}$ by performing $L$ message passing operations. The $l$-th message passing is performed by
$$H^l = \sum_{v=1}^{c} \mathrm{ReLU}\left( \tilde{D}_v^{-\frac{1}{2}} \tilde{A}_v \tilde{D}_v^{-\frac{1}{2}} H^{l-1} W_v^l \right), \quad (11)$$
where $H^0 = X$, $\tilde{A}_v = A[:, :, v] + I$, $\tilde{D}_v$ is a diagonal matrix with $\tilde{D}_v[j, j] = \sum_{u=1}^{n} \tilde{A}_v[j, u]$, and $\{W_v^l\}_{v=1}^{c}$ are the trainable weight matrices of the $l$-th layer. Note that the same R-GCN is shared among all modulo shift modules in the discrete flow.
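A minimal dense-tensor sketch of one such relational message passing layer (Eqn. (11)) is shown below; the dense adjacency representation and layer sizes are our own simplifications.

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """Sketch of the relational message passing of Eqn. (11)."""
    def __init__(self, c, d_in, d_out):
        super().__init__()
        self.weights = nn.ModuleList(nn.Linear(d_in, d_out, bias=False)
                                     for _ in range(c))

    def forward(self, H, A):
        # H: (n, d_in) node features; A: (n, n, c) adjacency tensor.
        n = H.size(0)
        out = 0
        for v, lin in enumerate(self.weights):
            A_v = A[:, :, v] + torch.eye(n)       # self-loops: A~_v
            deg = A_v.sum(dim=1).clamp(min=1e-8)  # diagonal of D~_v
            norm = deg.rsqrt()
            A_hat = norm[:, None] * A_v * norm[None, :]
            out = out + torch.relu(A_hat @ lin(H))  # sum over edge types
        return out

H = torch.randn(5, 16)                 # toy sub-graph with 5 nodes
A = torch.rand(5, 5, 3).round()        # c = 3 edge types
out = RGCNLayer(3, 16, 32)(H, A)       # node embeddings, shape (5, 32)
```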
Afterwards, $\mu_i^d$ is calculated by a classification network based on multi-layer perceptrons (MLP) as

$$h = \mathrm{sum}(H^L), \qquad \mu_i^d = \arg\max \mathrm{MLP}_a^d(h), \quad d = 1, \dots, D, \quad (12)$$

where $h \in \mathbb{R}^r$ is the graph embedding obtained by performing sum-pooling on the node embeddings $H^L$, $\mathrm{MLP}_a^d$ projects the input to a $k$-dimensional logits vector, and $\arg\max$ outputs the index position of the maximum logit value.
Since arg max is not differentiable, we use the softmax-
temperature function to approximate the gradient (Jang et al.,
2016) during training as
$$\gamma^d = \mathrm{MLP}_a^d(h), \qquad \mu_i^d \approx \mathrm{softmax}\!\left(\frac{\gamma^d}{\tau}\right), \quad d = 1, \dots, D, \quad (13)$$
where we fix the temperature τ = 0.1 in experiments.
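One common way to realize this is a straight-through estimator: the hard $\arg\max$ is used in the forward pass while gradients flow through the tempered softmax. A minimal sketch under that assumption (not the official implementation):

```python
import torch
import torch.nn.functional as F

def straight_through_shift(logits, tau=0.1):
    """Hard argmax forward; gradient of softmax(logits / tau) backward."""
    soft = F.softmax(logits / tau, dim=-1)
    hard = F.one_hot(soft.argmax(dim=-1), logits.size(-1)).float()
    return hard + soft - soft.detach()  # value = hard, grad flows via soft

logits = torch.randn(9, requires_grad=True)  # e.g., k = 9 node-type logits
mu_onehot = straight_through_shift(logits)
mu = int(mu_onehot.argmax())                 # integer shift factor mu_i^d
```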
Similarly, to obtain $\mu_{ij}^d$, we use the R-GCN to extract node embeddings $H^L$ and the graph embedding $h$ from $G_{i-1,j}$, which is the sub-graph generated immediately before adding the edge $b_{ij}$. Then we compute $\mu_{ij}^d$ by an MLP as

$$\mu_{ij}^d = \arg\max \mathrm{MLP}_b^d(\mathrm{Con}(h, H_i^L, H_j^L)), \quad d = 1, \dots, D,$$

where $H_i^L$ and $H_j^L$ denote the node embeddings of the $i$-th and $j$-th nodes of $G_{i-1,j}$, respectively, and $\mathrm{Con}(\cdot)$ is the concatenation operation. The softmax approximation is also used here to back-propagate through $\arg\max$.
By using our proposed method, the discrete flow model
can elegantly extract conditional information from graph
structures, which is then incorporated into the mapping from
discrete latent space to newly generated graph nodes and
edges. The conditional information is guaranteed to be
invariant to the order of previously generated elements. We
summarize the detailed generation and training algorithms
of GraphDF in Appendix A and B, respectively.
3.5. Property Optimization with Reinforcement Learning
The above discussions involve molecular graph generation
independent of property scores. However, both property
optimization and constrained optimization require the gen-
erated molecules to have optimal property scores. To this
end, we use reinforcement learning to fine-tune a trained
GraphDF model to optimize the specified property score.
We can formulate the sequential generation process with
GraphDF as a Markov Decision Process (MDP). Formally,
at each step, we treat the currently generated sub-graph as
the state, and adding a new node or edge as the action. The
action probabilities p(ai|Gi−1) and p(bij|Gi−1,j) are thus
determined by the GraphDF model in Eqn. (7). The rewards consist of intermediate rewards and a final reward. The intermediate reward is the penalty for violating the chemical bond valency rule. The final reward incorporates the property score and penalties for containing excessive steric strain or violating the functional group filters of ZINC (Irwin et al., 2012) in the final generated molecule.
Following GCPN (You et al., 2018a), we use Proximal Pol-
icy Optimization (PPO) (Schulman et al., 2017), a powerful
policy optimization algorithm, to fine-tune GraphDF models
in the above reinforcement learning environment. The loss
function for a generated graph $G$ is

$$L(G, \theta) = \sum_{i=1}^{n} \Big[ L_c(r_i, A_i) + \sum_{j=1}^{i-1} L_c(r_{ij}, A_{ij}) \Big],$$
$$L_c(r, A) = \min\{rA,\ \mathrm{clip}(r, 1-\epsilon, 1+\epsilon)A\}, \quad (14)$$

where $r_i = \frac{p(a_i \mid G_{i-1}; \theta)}{p(a_i \mid G_{i-1}; \theta_{\mathrm{old}})}$ and $r_{ij} = \frac{p(b_{ij} \mid G_{i-1,j}; \theta)}{p(b_{ij} \mid G_{i-1,j}; \theta_{\mathrm{old}})}$ are the ratios of action probabilities under the new and old policies, and $A_i$ and $A_{ij}$ are estimated advantage functions, for which we use the accumulated rewards of future steps in our experiments.
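A schematic rendering of this clipped objective, assuming per-action log-probabilities and advantages have been collected (the clip range $\epsilon = 0.2$ is a common PPO default, not a value stated here):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate of Eqn. (14), negated so that minimizing
    the loss maximizes the PPO objective."""
    ratio = torch.exp(logp_new - logp_old)        # r_i and r_ij
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).sum()

# Toy usage: one log-probability and advantage per node/edge action.
logp_new = torch.randn(5, requires_grad=True)
logp_old = torch.randn(5)
adv = torch.randn(5)                # accumulated future rewards
ppo_clip_loss(logp_new, logp_old, adv).backward()
```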
4. Experiments
In this section, we evaluate our GraphDF on three tasks
of molecule generation, as mentioned in Section 2.1. We
show that on most tasks, GraphDF can outperform three
strong baselines, JT-VAE (Jin et al., 2018), GCPN (You
et al., 2018a) and MRNN (Popova et al., 2019), and other
molecular graph generation methods based on flow models
including GraphNVP (Madhawa et al., 2019), GRF (Honda
et al., 2019), GraphAF (Shi* et al., 2020), MoFlow (Zang &
Wang, 2020) and GraphCNF (Lippe & Gavves, 2021).
4.1. Random Generation
Data. For random generation of molecular graphs, we train and evaluate GraphDF on three molecule datasets: ZINC250K (Irwin et al., 2012), QM9 (Ramakrishnan et al., 2014), and MOSES (Polykovskiy et al., 2020).

Table 1. Random generation performance on the ZINC250K dataset. Results with ∗ are taken from Shi* et al. (2020); results of MoFlow are obtained by running its official source code.

Method    | Validity | Validity w/o check | Uniqueness | Novelty | Reconstruction
JT-VAE    | 100%     | n/a                | 100%       | 100%    | 76.7%
GCPN      | 100%     | 20%                | 99.97%     | 100%    | n/a
MRNN      | 100%     | 65%                | 99.89%     | 100%    | n/a
GraphNVP  | 42.6%    | n/a                | 94.8%      | 100%    | 100%
GRF       | 73.4%    | n/a                | 53.7%      | 100%    | 100%
GraphAF   | 100%     | 68%                | 99.1%      | 100%    | 100%
MoFlow    | 100%     | 50.3%              | 99.99%     | 100%    | 100%
GraphCNF  | 96.35%   | n/a                | 99.98%     | 99.98%  | 100%
GraphDF   | 100%     | 89.03%             | 99.16%     | 100%    | 100%
Table 2. Random generation performance on the QM9 dataset. Results of MoFlow are obtained by running its official source code.

Method    | Validity | Validity w/o check | Uniqueness | Novelty | Reconstruction
GraphNVP  | 83.1%    | n/a                | 99.2%      | 58.2%   | 100%
GRF       | 84.5%    | n/a                | 66%        | 58.6%   | 100%
GraphAF   | 100%     | 67%                | 94.15%     | 88.83%  | 100%
MoFlow    | 100%     | 88.96%             | 98.53%     | 96.04%  | 100%
GraphDF   | 100%     | 82.67%             | 97.62%     | 98.1%   | 100%
Table 3. Random generation performance on the MOSES dataset. Results with ∗ are taken from Polykovskiy et al. (2020).

Method    | Validity | Validity w/o check | Uniqueness | Novelty | Reconstruction
JT-VAE    | 100%     | n/a                | 99.96%     | 91.43%  | n/a
GraphCNF  | 95.66%   | n/a                | 99.98%     | 100%    | n/a
GraphAF   | 100%     | 71%                | 99.99%     | 100%    | 100%
GraphDF   | 100%     | 87.58%             | 99.55%     | 100%    | 100%
Table 4. Random generation performance on the COMMUNITY-SMALL and EGO-SMALL datasets. Following GNF (Liu et al., 2019), we evaluate with the MMD metric; the first set of results is obtained using node distribution matching, and the second set is obtained by evaluating 1024 generated graphs without node distribution matching.

                | COMMUNITY-SMALL           | EGO-SMALL
Method          | Degree  Cluster  Orbit    | Degree  Cluster  Orbit
GNF             | 0.20    0.20     0.11     | 0.03    0.11     0.001
GraphAF         | 0.18    0.20     0.02     | 0.03    0.11     0.001
GraphDF         | 0.06    0.12     0.03     | 0.04    0.13     0.010
GNF (1024)      | 0.12    0.15     0.02     | 0.01    0.03     0.0008
GraphAF (1024)  | 0.06    0.10     0.015    | 0.04    0.04     0.008
GraphDF (1024)  | 0.04    0.10     0.018    | 0.07    0.10     0.014
All molecules are preprocessed to the kekulized form by RDKit (Landrum, 2016). In addition, we conduct experiments to show that GraphDF is general and can be applied to graphs other than molecular graphs. Following the experiments of GNF (Liu et al., 2019), we test GraphDF on two
generic graph datasets, COMMUNITY-SMALL and EGO-
SMALL. Detailed information about all datasets used in
random generation experiments is provided in Appendix C.
Setup. We adopt five widely used metrics to evaluate
GraphDF on molecular graph generation. Among all gen-
erated molecules, Validity is the percentage of molecules
that do not violate the chemical valency rule. Since we use re-sampling when the valency check fails, as mentioned in Sec. 3.1, we also report Validity w/o check, which is the validity percentage when this valency correction is disabled. Among
all generated valid molecules, Uniqueness and Novelty are
the percentage of unique molecules and molecules not ap-
pearing in the training data, respectively. Reconstruction is
the percentage of molecules which can be reconstructed
from their latent variables. The five metrics are calcu-
lated from 10,000 generated molecular graphs. We find
that the reported metrics in the MoFlow paper (Zang & Wang, 2020) are not calculated from the same 10,000 generated molecules, so we recalculate the metrics of MoFlow by re-running its official source code for a fair comparison. As for
COMMUNITY-SMALL and EGO-SMALL datasets, we
follow GNF (Liu et al., 2019) to calculate the Maximum
Mean Discrepancy (MMD) (Gretton et al., 2012) between the statistics of degrees, clustering coefficients, and orbit counts of generated graphs and graphs in the dataset. We
calculate the MMD from 1024 generated graphs in the two
cases of using and not using node distribution matching.
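For reference, a toy MMD computation between two sets of graph statistics is sketched below; the Gaussian kernel and its bandwidth are our own simplifications, and the GNF evaluation protocol makes its own kernel choices.

```python
import numpy as np

def gaussian_mmd2(x, y, sigma=1.0):
    """Biased MMD^2 estimate (Gretton et al., 2012) between two
    1-D samples of graph statistics, with a Gaussian kernel."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

deg_gen = np.random.rand(1024)  # e.g., degree statistics of generated graphs
deg_ref = np.random.rand(1024)  # degree statistics of dataset graphs
print(gaussian_mmd2(deg_gen, deg_ref))
```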
The flow model in GraphDF consists of 12 modulo shift
modules, where the shared R-GCN has 3 message passing
layers. The hidden dimension and output dimension of
node embeddings are 128. For molecular graphs, the input
node embeddings are one-hot embeddings of node types.
For graphs in COMMUNITY-SMALL and EGO-SMALL
datasets, the input node embeddings are one-hot indicator
vectors. After message passing layers, a batch normalization
layer and a sum pooling layer are used to get the graph
embedding. The two MLPs for calculating $\mu_i^d$ and $\mu_{ij}^d$ both have two fully-connected layers, with a hidden dimension of 128 and tanh as the non-linear activation function. A single R-GCN is shared among all 12 modulo shift modules in the discrete flow model, while each modulo shift module has its own MLP to calculate $\mu_i^d$ or $\mu_{ij}^d$. We use the above model architecture configuration for all experiments in this paper.
The Adam (Kingma & Ba, 2015) optimizer is used to train GraphDF models. On each molecule dataset, GraphDF
is trained for 10 epochs, where the batch size is 32 and
the learning rate is 0.001. On COMMUNITY-SMALL and
EGO-SMALL datasets, GraphDF is trained for 1000 epochs,
where the batch size is 16 and the learning rate is 0.001. See
Appendix D for more training and generation details.
Results. Table 1, Table 2 and Table 3 show that GraphDF
can outperform the state-of-the-art methods on most metrics on all three molecule datasets. Thanks to the invertible
property of flow models and valency correction, GraphDF
can always achieve 100% validity and reconstruction rate.
In addition, GraphDF achieves reasonably high uniqueness
and novelty rates on all three datasets, showing that it neither overfits nor suffers from mode collapse. Particularly, GraphDF can
outperform GraphAF by a large margin in terms of validity
w/o check. Since GraphAF generates molecular graphs in
the same sequential fashion but uses continuous flow, we
Table 5. Property optimization performance evaluated by top-3 property scores. The scores of MRNN are taken from Shi* et al. (2020).

                | Penalized logP        | QED
Method          | 1st    2nd    3rd     | 1st    2nd    3rd
ZINC (Dataset)  | 4.52   4.30   4.23    | 0.948  0.948  0.948
JT-VAE          | 5.30   4.93   4.49    | 0.925  0.911  0.910
GCPN            | 7.98   7.85   7.80    | 0.948  0.947  0.946
MRNN            | 8.63   6.08   4.73    | 0.844  0.796  0.736
GraphAF         | 12.23  11.29  11.05   | 0.948  0.948  0.947
MoFlow          | n/a    n/a    n/a     | 0.948  0.948  0.948
GraphDF         | 13.70  13.18  13.17   | 0.948  0.948  0.948
believe that the good performance of GraphDF comes from
the usage of discrete flow. It strongly demonstrates that
discrete latent variables can learn chemical rules and model
the underlying distributions more accurately than continu-
ous latent variables. Besides, results in Table 4 show that
GraphDF can consistently achieve good performance on
generic graph data when compared with GNF and GraphAF.
4.2. Property Optimization
Setup. In the property optimization task, our objective is to generate novel molecules with high property scores. We
use penalized logP and QED (Bickerton et al., 2012) as
the optimization targets. Penalized logP is the logP score
reduced by synthetic accessibility and ring size, and QED
is quantitative estimation of drug-likeness. We calculate
them using the scripts from the official implementation of
GraphAF (Shi* et al., 2020).
We use reinforcement learning to optimize these two targets.
In this process, both intermediate reward and final reward
are considered. The intermediate reward is the penalty for violating the chemical bond valency rule: at a generation step, if the action is to add a new edge to the sub-graph and the new edge violates the chemical bond valency rule, then a negative reward of -1 is assigned to this step. Besides, when a complete molecular graph $G$ is generated, the final reward $R(G)$ is calculated as
$$R(G) = R_p(G) - R_{ss}(G) - R_f(G). \quad (15)$$
$R_{ss}(G)$ is 1 if $G$ is too sterically strained, i.e., the average angle-bend energy of $G$ exceeds 0.82 kcal/mol, and 0 otherwise. $R_f(G)$ is 1 when the functional group filters of ZINC (Irwin et al., 2012) detect problematic functional groups in $G$, and 0 otherwise. $R_p(G)$ is related to the property score of $G$.

[Figure 3. Generated molecules with the best property scores after property optimization. (a) Four molecules with top penalized logP scores (13.70, 13.18, 13.17, 13.09); they are all long carbon chains differing in only one atom, which are typical chemical structures with high penalized logP scores. (b) Four molecules with top QED scores (0.948, 0.948, 0.948, 0.946).]
Denoting by $\mathrm{logP}(\cdot)$ the computation of the penalized logP property and by $\mathrm{QED}(\cdot)$ the computation of the QED property, we use

$$R_p(G) = \exp(\mathrm{logP}(G)/3.0 - 4.0) \quad \text{and} \quad R_p(G) = 2.0 \cdot \mathrm{QED}(G),$$

respectively. The final reward $R(G)$ is distributed to all intermediate steps with a discount factor $\gamma = 0.9$. Specifically, if the total number of steps to generate $G$ is $T$, then the reward assigned to the $t$-th step is $\gamma^{T-t} R(G)$ for $t = 1, \dots, T$.
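A small sketch of this reward shaping, with hypothetical per-step valency-violation flags:

```python
# Sketch of the reward assignment: a -1 intermediate penalty per
# valency violation plus the discounted final reward of Eqn. (15).
def step_rewards(violations, final_reward, gamma=0.9):
    T = len(violations)
    return [(-1.0 if violations[t - 1] else 0.0)
            + gamma ** (T - t) * final_reward
            for t in range(1, T + 1)]

print(step_rewards([False, True, False], final_reward=2.0))
# earlier steps receive a more heavily discounted share of R(G)
```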
[Figure 4. An illustration of two examples in the constrained optimization of penalized logP.]
We first pretrain the GraphDF model on the ZINC250K dataset for 1000 epochs and then fine-tune the model with reinforcement learning for 200 iterations. More details about model training are presented in Appendix D. Following prior literature, we report the top-3 property scores of molecules generated from the model.
Results. We summarize the top-3 property scores of the ZINC250K dataset and all generation methods in Table 5. It shows that GraphDF outperforms all baselines in the penalized logP optimization task and achieves comparable performance in the QED optimization task. Note that a reinforcement learning procedure similar to that described in Sec. 3.5 is used to optimize the property scores in GCPN (You et al., 2018a) and GraphAF (Shi* et al., 2020). Hence, we believe that the
good performance of GraphDF demonstrates its strong ca-
pacity to explore the complicated chemical structure space
and generate diverse molecular graphs. Some generated
molecules with high penalized logP scores and QED scores
are visualized in Figure 3.
4.3. Constrained Optimization
Data. In this task, we aim to modify the input molecu-
lar graph so as to improve its penalized logP score while
the similarity between the input and modified molecules
is higher than a threshold δ. Following JT-VAE (Jin et al.,
2018), GCPN (You et al., 2018a), GraphAF (Shi* et al.,
2020), and MoFlow (Zang & Wang, 2020), we use 800 selected molecules from ZINC250K with low penalized logP scores as the input molecules to be optimized. However, we
find that the set of 800 molecules used in JT-VAE and GCPN
are different from that used in GraphAF and MoFlow, and
the official implementation of GCPN does not provide code
for re-running constrained optimization on new datasets.
Hence, for fair comparison, we train GraphDF and report

Table 6. Constrained optimization performance on the 800 molecules used in JT-VAE and GCPN.

    | JT-VAE                            | GCPN                              | GraphDF
δ   | Improvement  Similarity  Success  | Improvement  Similarity  Success  | Improvement  Similarity  Success
0.0 | 1.91±2.04    0.28±0.15   97.5%    | 4.20±1.28    0.32±0.12   100%     | 5.93±1.97    0.30±0.12   100%
0.2 | 1.68±1.85    0.33±0.13   97.1%    | 4.12±1.19    0.34±0.11   100%     | 5.62±1.65    0.32±0.10   100%
0.4 | 0.84±1.45    0.51±0.10   83.6%    | 2.49±1.30    0.47±0.08   100%     | 4.13±1.41    0.47±0.07   100%
0.6 | 0.21±0.71    0.69±0.06   46.4%    | 0.79±0.63    0.68±0.08   100%     | 1.72±1.15    0.67±0.06   93%
Table 7. Constrained optimization performance on the 800 molecules used in GraphAF and MoFlow.

    | GraphAF                           | MoFlow                            | GraphDF
δ   | Improvement  Similarity  Success  | Improvement  Similarity  Success  | Improvement  Similarity  Success
0.0 | 13.13±6.89   0.29±0.15   100%     | 8.61±5.44    0.30±0.20   98.88%   | 14.15±6.86   0.29±0.13   100%
0.2 | 11.90±6.86   0.33±0.12   100%     | 7.06±5.04    0.43±0.20   96.75%   | 12.77±6.59   0.32±0.11   100%
0.4 | 8.21±6.51    0.49±0.09   99.88%   | 4.71±4.55    0.61±0.18   85.75%   | 9.19±6.43    0.48±0.08   99.63%
0.6 | 4.98±6.49    0.66±0.05   96.88%   | 2.10±2.86    0.79±0.14   58.25%   | 4.51±5.80    0.65±0.05   92.13%
results on both sets of molecules: the one used in JT-VAE and GCPN, and the one used in GraphAF and MoFlow.
Setup. Similar to the property optimization task, GraphDF is pretrained on the ZINC250K dataset for 1000 epochs and then fine-tuned with reinforcement learning. More details
about model training can be found in Appendix D.
We use the reward defined in Eqn. (15), where $R_p(G)$ is defined as

$$R_p(G) = \mathrm{logP}(G) - \mathrm{logP}(G_{in}),$$

where $G_{in}$ is the input molecular graph. The initial state
of a generation process is set to be a sub-graph of the input molecular graph. Specifically, given an input molecular graph $G_{in}$, we first re-order its nodes by a breadth-first search, then randomly remove the last $m$ nodes and the edges incident to them, where $m$ is randomly selected from $\{0, 1, 2, 3, 4, 5\}$. The remaining sub-graph is set as the initial state.
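A sketch of this initial-state construction, using networkx for the BFS ordering (the start node and toy input graph are arbitrary choices of ours):

```python
import random
import networkx as nx

def initial_subgraph(g, start=0, max_remove=5):
    """Re-order nodes by BFS from `start`, then drop the last m nodes
    and the edges incident to them to form the initial state."""
    order = [start] + [v for _, v in nx.bfs_edges(g, start)]
    m = random.randint(0, max_remove)   # m sampled from {0, ..., 5}
    keep = order[:len(order) - m] if m > 0 else order
    return g.subgraph(keep).copy()

g = nx.path_graph(8)        # toy stand-in for an input molecular graph
sub = initial_subgraph(g)
print(sub.number_of_nodes())
```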
We use Tanimoto similarity with Morgan fingerprint (Rogers
& Hahn, 2010) to compute the similarity between the input
and the optimized molecules. We optimize the input 800
molecules under 4 different similarity constraints of 0.0, 0.2,
0.4, and 0.6, respectively. The success rate under each similarity constraint is reported, which is the percentage of the 800 molecules that can be successfully optimized. Over all successfully optimized molecules, we calculate the mean and standard deviation of the largest property improvement and of the similarities between the optimized molecules and their corresponding input molecules.
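For illustration, the similarity computation can be sketched with RDKit as below; the radius-2, 2048-bit Morgan fingerprint is a common default and an assumption on our part.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Tanimoto similarity between Morgan fingerprints of two molecules."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s),
                                                 radius, nBits=n_bits)
           for s in (smiles_a, smiles_b)]
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])

print(tanimoto("CCO", "CCN"))   # used to check SIM(M, M') > delta
```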
Results. Results of constrained optimization on two
datasets are summarized in Table 6 and Table 7, respec-
tively. Results show that GraphDF can achieve much higher
average property improvement than JT-VAE, GCPN, and
MoFlow under 4 similarity constraints, and outperform
GraphAF under 3 of 4 similarity constraints. In addition,
the average similarities achieved by GraphDF are compara-
ble with all baselines, and unlike JT-VAE or MoFlow, the
success rate of GraphDF does not decrease dramatically as
the similarity threshold increases. The good performance
further demonstrates that GraphDF has a strong ability to search for molecules with high property scores in the chemical space. We
illustrate several optimization examples in Figure 4.
5. Conclusion
We propose GraphDF, the first molecular graph generation
framework using discrete latent variables. GraphDF sequentially maps discrete latent variables to graph nodes and edges
via invertible modulo shift transforms. The use of discrete
latent variables results in more accurate modeling of graph
structures and stronger capacity to explore the chemical
space. More importantly, GraphDF avoids the expensive computation of the Jacobian matrix and eliminates the negative effects of dequantization, which exist in many prior methods. We experimentally demonstrate the advantages of GraphDF and achieve new state-of-the-art results on three molecule generation tasks. The major limitations of GraphDF are its reliance on BFS node ordering during training and its generation speed, which is slower than that of one-shot methods. In the future,
we would like to apply GraphDF to other types of graph
data and extend it to graph translation problems.
Acknowledgements
We thank Chence Shi for his help in providing technical
details on the implementation of reinforcement learning
method. This work was supported in part by National Sci-
ence Foundation grant DBI-2028361.

References
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S.,
and Hopkins, A. L. Quantifying the chemical beauty of
drugs. Nature chemistry, 4(2):90–98, 2012.
Dai, H., Tian, Y., Dai, B., Skiena, S., and Song, L. Syntax-
directed variational autoencoder for structured data. In
7th International Conference on Learning Representa-
tions, 2018.
De Cao, N. and Kipf, T. MolGAN: An implicit generative
model for small molecular graphs. ICML 2018 workshop
on Theoretical Foundations and Applications of Deep
Generative Models, 2018.
Dinh, L., Krueger, D., and Bengio, Y. NICE: Non-linear
independent components estimation. arXiv preprint
arXiv:1410.8516, 2014.
Dinh, L., Sohl-Dickstein, J., and Bengio, S. Density estima-
tion using real NVP. In 4th International Conference on
Learning Representations, 2016.
Gao, H. and Ji, S. Graph U-Nets. In Chaudhuri, K. and
Salakhutdinov, R. (eds.), Proceedings of the 36th Inter-
national Conference on Machine Learning, volume 97 of
Proceedings of Machine Learning Research, pp. 2083–
2092. PMLR, 09–15 Jun 2019.
Gao, H., Wang, Z., and Ji, S. Large-scale learnable graph
convolutional networks. In Proceedings of the 24th ACM
SIGKDD International Conference on Knowledge Discov-
ery & Data Mining, KDD ’18, pp. 1416–1424, New York,
NY, USA, 2018. Association for Computing Machinery.
Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D.,
Hernández-Lobato, J. M., Sánchez-Lengeling, B., She-
berla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams,
R. P., and Aspuru-Guzik, A. Automatic chemical de-
sign using a data-driven continuous representation of
molecules. ACS central science, 4(2):268–276, 2018.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y.
Generative adversarial nets. In Ghahramani, Z., Welling,
M., Cortes, C., Lawrence, N., and Weinberger, K. Q.
(eds.), Advances in Neural Information Processing Sys-
tems, volume 27, pp. 2672–2680. Curran Associates, Inc.,
2014.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B.,
and Smola, A. A kernel two-sample test. The Journal of
Machine Learning Research, 13(1):723–773, 2012.
Hochreiter, S. and Schmidhuber, J. Long short-term memory.
Neural computation, 9(8):1735–1780, 1997.
Honda, S., Akita, H., Ishiguro, K., Nakanishi, T., and Oono,
K. Graph residual flow for molecular graph generation.
arXiv preprint arXiv:1909.13521, 2019.
Irwin, J. J., Sterling, T., Mysinger, M. M., Bolstad, E. S.,
and Coleman, R. G. ZINC: a free tool to discover chem-
istry for biology. Journal of chemical information and
modeling, 52(7):1757–1768, 2012.
Jang, E., Gu, S., and Poole, B. Categorical reparameteriza-
tion with Gumbel-Softmax. In 4th International Confer-
ence on Learning Representations, 2016.
Jin, W., Barzilay, R., and Jaakkola, T. Junction tree vari-
ational autoencoder for molecular graph generation. In
Dy, J. and Krause, A. (eds.), Proceedings of the 35th In-
ternational Conference on Machine Learning, volume 80
of Proceedings of Machine Learning Research, pp. 2323–
2332, 2018.
Kajino, H. Molecular hypergraph grammar with its appli-
cation to molecular optimization. In Chaudhuri, K. and
Salakhutdinov, R. (eds.), Proceedings of the 36th Inter-
national Conference on Machine Learning, volume 97 of
Proceedings of Machine Learning Research, Long Beach,
California, USA, 2019.
Kingma, D. P. and Ba, J. Adam: A method for stochastic
optimization. In Proceedings of the 3rd International Conference on Learning Representations, 2015.
Kingma, D. P. and Welling, M. Auto-Encoding variational
bayes. In 2nd International Conference on Learning
Representations, 2013.
Kipf, T. N. and Welling, M. Semi-supervised classification
with graph convolutional networks. In 5th International
Conference on Learning Representations, 2017.
Kusner, M. J., Paige, B., and Hernández-Lobato, J. M.
Grammar variational autoencoder. In Precup, D. and Teh,
Y. W. (eds.), Proceedings of the 34th International Con-
ference on Machine Learning, volume 70 of Proceedings
of Machine Learning Research, pp. 1945–1954, 2017.
Landrum, G. RDKit: Open-source cheminformatics. http://www.rdkit.org, 2016. Accessed: 2021-01-16.
Lippe, P. and Gavves, E. Categorical normalizing flows via
continuous transformations. In 9th International Confer-
ence on Learning Representations, 2021.
Liu, J., Kumar, A., Ba, J., Kiros, J., and Swersky, K.
Graph normalizing flows. In Wallach, H., Larochelle, H.,
Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R.
(eds.), Advances in Neural Information Processing Sys-
tems, volume 32, pp. 13578–13588. Curran Associates,
Inc., 2019.

Liu, M., Luo, Y., Wang, L., Xie, Y., Yuan, H., Gui, S.,
Yu, H., Xu, Z., Zhang, J., Liu, Y., et al. DIG: A turnkey
library for diving into graph deep learning research. arXiv
preprint arXiv:2103.12608, 2021.
Liu, Q., Allamanis, M., Brockschmidt, M., and Gaunt, A.
Constrained graph variational autoencoders for molecule
design. In Bengio, S., Wallach, H., Larochelle, H., Grau-
man, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Ad-
vances in Neural Information Processing Systems, vol-
ume 31, pp. 7795–7804. Curran Associates, Inc., 2018.
Ma, T., Chen, J., and Xiao, C. Constrained generation of
semantically valid graphs via regularizing variational au-
toencoders. In Bengio, S., Wallach, H., Larochelle, H.,
Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.),
Advances in Neural Information Processing Systems, vol-
ume 31, pp. 7113–7124. Curran Associates, Inc., 2018.
Madhawa, K., Ishiguro, K., Nakago, K., and Abe, M. Graph-
NVP: An invertible flow model for generating molecular
graphs. arXiv preprint arXiv:1905.11600, 2019.
Papamakarios, G., Pavlakou, T., and Murray, I. Masked
autoregressive flow for density estimation. In Guyon,
I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R.,
Vishwanathan, S., and Garnett, R. (eds.), Advances in
Neural Information Processing Systems, volume 30, pp.
2338–2347. Curran Associates, Inc., 2017.
Polishchuk, P. G., Madzhidov, T. I., and Varnek, A. Es-
timation of the size of drug-like chemical space based
on GDB-17 data. Journal of computer-aided molecular
design, 27(8):675–679, 2013.
Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov, S., Tatanov, O., Belyaev, S., Kurbanov, R., Artamonov, A., Aladinskiy, V., Veselov, M., Kadurin, A., Johansson, S., Chen, H., Nikolenko, S., Aspuru-Guzik, A., and Zhavoronkov, A. Molecular sets (MOSES): A benchmarking platform for molecular generation models. Frontiers in Pharmacology, 11:1931, 2020. ISSN 1663-9812. doi: 10.3389/fphar.2020.565644. URL https://www.frontiersin.org/article/10.3389/fphar.2020.565644.
Popova, M., Shvets, M., Oliva, J., and Isayev, O. Molecu-
larRNN: Generating realistic molecular graphs with op-
timized properties. arXiv preprint arXiv:1905.13372,
2019.
Ramakrishnan, R., Dral, P. O., Rupp, M., and Von Lilienfeld,
O. A. Quantum chemistry structures and properties of
134 kilo molecules. Scientific data, 1(1):1–7, 2014.
Rezende, D. and Mohamed, S. Variational inference with
normalizing flows. In Bach, F. and Blei, D. (eds.), Pro-
ceedings of the 32nd International Conference on Ma-
chine Learning, volume 37 of Proceedings of Machine
Learning Research, pp. 1530–1538, Lille, France, 2015.
Rogers, D. and Hahn, M. Extended-connectivity finger-
prints. Journal of chemical information and modeling, 50
(5):742–754, 2010.
Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg,
R., Titov, I., and Welling, M. Modeling relational data
with graph convolutional networks. In Gangemi, A., Nav-
igli, R., Vidal, M.-E., Hitzler, P., Troncy, R., Hollink, L.,
Tordai, A., and Alam, M. (eds.), The Semantic Web, pp.
593–607, Cham, 2018. Springer International Publishing.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and
Klimov, O. Proximal policy optimization algorithms.
arXiv preprint arXiv:1707.06347, 2017.
Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B.,
and Eliassi-Rad, T. Collective classification in network
data. AI magazine, 29(3):93–93, 2008.
Shi*, C., Xu*, M., Zhu, Z., Zhang, W., Zhang, M., and
Tang, J. GraphAF: a flow-based autoregressive model
for molecular graph generation. In 8th International
Conference on Learning Representations, 2020.
Simonovsky, M. and Komodakis, N. GraphVAE: Towards
generation of small graphs using variational autoencoders.
In Kurková, V., Manolopoulos, Y., Hammer, B., Iliadis,
L., and Maglogiannis, I. (eds.), Artificial Neural Net-
works and Machine Learning – ICANN 2018, pp. 412–
422, Cham, 2018. Springer International Publishing.
Stokes, J. M., Yang, K., Swanson, K., Jin, W., Cubillos-Ruiz,
A., Donghia, N. M., MacNair, C. R., French, S., Carfrae,
L. A., Bloom-Ackerman, Z., et al. A deep learning ap-
proach to antibiotic discovery. Cell, 180(4):688–702,
2020.
Tran, D., Vafa, K., Agrawal, K., Dinh, L., and Poole, B.
Discrete flows: Invertible generative models of discrete
data. In Wallach, H., Larochelle, H., Beygelzimer, A.,
d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances
in Neural Information Processing Systems, volume 32,
pp. 14719–14728. Curran Associates, Inc., 2019.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L. u., and Polosukhin, I. At-
tention is all you need. In Guyon, I., Luxburg, U. V.,
Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S.,
and Garnett, R. (eds.), Advances in Neural Information
Processing Systems, volume 30, pp. 5998–6008. Curran
Associates, Inc., 2017.

Wang, Z., Liu, M., Luo, Y., Xu, Z., Xie, Y., Wang, L.,
Cai, L., and Ji, S. Advanced graph and sequence neu-
ral networks for molecular property prediction and drug
discovery. arXiv preprint arXiv:2012.01981, 2020.
Weininger, D. SMILES, a chemical language and informa-
tion system. 1. introduction to methodology and encoding
rules. Journal of chemical information and computer sci-
ences, 28(1):31–36, 1988.
You, J., Liu, B., Ying, Z., Pande, V., and Leskovec, J. Graph
convolutional policy network for goal-directed molecular
graph generation. In Bengio, S., Wallach, H., Larochelle,
H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.),
Advances in Neural Information Processing Systems, vol-
ume 31, pp. 6410–6421. Curran Associates, Inc., 2018a.
You, J., Ying, R., Ren, X., Hamilton, W., and Leskovec,
J. GraphRNN: Generating realistic graphs with deep
auto-regressive models. In Dy, J. and Krause, A. (eds.),
Proceedings of the 35th International Conference on Ma-
chine Learning, volume 80 of Proceedings of Machine
Learning Research, pp. 5708–5717, Stockholmsmässan,
Stockholm Sweden, 2018b. PMLR.
Yu, L., Zhang, W., Wang, J., and Yu, Y. SeqGAN: Sequence
generative adversarial nets with policy gradient. In Pro-
ceedings of the AAAI conference on artificial intelligence,
volume 31, 2017.
Zang, C. and Wang, F. MoFlow: an invertible flow model for
generating molecular graphs. In Proceedings of the 26th
ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, pp. 617–626, 2020.