Learning to Solve Optimization Problems Constrained with Partial Differential Equations
Abstract
Partial differential equation (PDE)-constrained optimization arises in many scientific and engineering domains, such as energy systems, fluid dynamics, and material design. In these problems, the decision variables (e.g., control inputs or design parameters) are tightly coupled with the PDE state variables, and the feasible set is implicitly defined by the governing PDE constraints. This coupling makes the problems computationally demanding, as it requires handling high-dimensional discretizations and dynamic constraints. To address these challenges, this paper introduces a learning-based framework that integrates a dynamic predictor with an optimization surrogate. The dynamic predictor, a novel time-discrete neural operator [1], efficiently approximates system trajectories governed by PDE dynamics, while the optimization surrogate leverages proxy optimizer techniques [2] to approximate the associated optimal decisions. This dual-network design enables real-time approximation of optimal strategies while explicitly capturing the coupling between decisions and PDE dynamics. We validate the proposed approach on benchmark PDE-constrained optimization tasks including Burgers' equation, the heat equation, and voltage regulation, and demonstrate that it achieves solution quality comparable to classical control-based algorithms such as the Direct Method and Model Predictive Control (MPC), while providing up to four orders of magnitude improvement in computational speed.
1 Introduction
Partial differential equations (PDEs) are central to modeling a wide range of physical and engineering systems, capturing phenomena that evolve jointly over space and time. Applications include fluid transport, modeled by Burgers' equations, which are key in fluid mechanics and gas dynamics; thermal processes captured by heat equations, often used in financial mathematics and image analysis; and voltage dynamics in power systems. Optimizing decisions in such domains requires solving PDE-constrained optimization problems, where control or design objectives must be optimized while ensuring consistency with the underlying governing PDE dynamics.
Despite their importance, PDE-constrained optimization problems remain computationally intractable. Traditional approaches typically rely on finite element and finite difference discretizations combined with adjoint-based optimization. These methods often suffer from high dimensionality, high computational cost, and convergence difficulties when applied to nonlinear dynamics. For example, when applied to the nonlinear Burgers' equations, our experiments show that the adjoint-based and Nonlinear MPC approaches require runtimes ranging from tens to hundreds of seconds per instance; such computational costs hinder their applicability in large-scale or real-time decision-making tasks.
In various classes of continuous optimization tasks not constrained by PDEs, proxy optimizers [2] have recently emerged as a powerful tool to quickly find approximate solutions. In parallel, researchers have focused on developing a variety of surrogates [3, 4, 5, 6] for capturing system dynamics. Our approach is a unique combination of elements from both these areas, and consists of a dual-network architecture to tackle PDE-constrained optimization problems where both the state dynamics and the control decisions evolve in space and time.
Contributions. Specifically, the paper makes the following contributions: (1) It introduces a novel learning-based method to efficiently approximate solutions of PDE-constrained optimization problems. Our approach, PDE-OP (Optimization Proxy), consists of a dual-network architecture in which a dynamic predictor captures the system dynamics, while a surrogate controller approximates the optimal control actions. These components are trained in a synergistic manner in a self-supervised fashion using a primal–dual method. (2) It shows that the proposed time-discrete Neural Operator accurately captures the dynamics governed by PDEs. (3) It demonstrates that the proposed framework achieves decision quality comparable to state-of-the-art numerical solvers, such as Model Predictive Control and the Adjoint-sensitivity method, while providing several orders of magnitude improvements in computational speed. The approach is validated on a set of widely adopted PDE-constrained optimization tasks, including Burgers’, heat, and voltage equations. We believe that these results could boost the adoption of learning-based methods for control problems that require high-fidelity dynamic modeling and near real-time optimal decision-making.
2 Related Work
A central paradigm for decision-making coupled with dynamical systems is Model Predictive Control (MPC) [7]. It repeatedly solves an optimization problem over a finite time horizon using predicted system states. While MPC provides high-quality control decisions, its reliance on repeatedly solving an optimization problem makes it computationally intractable and limits its applicability in real-time settings. In a similar fashion, exact PDE-constrained optimization methods rely on time–space discretization of the governing equations, after which the resulting high-dimensional system is incorporated into an optimization problem [8, 9]. While theoretically well-founded, these approaches quickly become impractical as the problem size grows, especially in the presence of nonlinear dynamics or when real-time feasibility is required. To mitigate these limitations, surrogate models have been developed for reduced-order adjoint solutions [10], sensitivity PDEs [11], and learning PDE operators [12], offering significant computational advantages. Nevertheless, these methods often face trade-offs between accuracy and efficiency, and limited adaptability across operating regimes [4, 1, 13].
In parallel, to approximate solutions of constrained optimization problems in real-time settings, machine learning researchers have developed a class of methods known as proxy optimizers. These methods employ machine learning models to learn mappings from problem parameters or system configurations to optimal decisions, thereby significantly accelerating inference [2] compared to classical optimization solvers. Both supervised approaches [14], which rely on precomputed optimal solutions, and self-supervised ones [15], which exploit knowledge of the problem structure, have been proposed, showing promising results. A recurring challenge in this setting is ensuring feasibility: learned solutions may violate problem constraints. This issue has been tackled through penalty-based training [14], implicit-layer formulations that embed constraints directly into the model architecture [16], or post-processing projections to restore feasibility [17].
A large body of recent work has also focused on learning surrogates for PDE dynamics. Physics-Informed Neural Networks [18] directly encode PDE residuals into the loss function, enabling solutions that adhere to the governing equations. Neural operator approaches, such as the Fourier Neural Operator [5] and DeepONet [1], learn mappings between infinite-dimensional function spaces, which allows them to approximate solution operators of families of PDEs, with strong generalization across different boundary conditions and coefficients [4]. Other operator-learning frameworks [19, 20] and kernel-based methods [21] have also been developed. While these approaches excel at reproducing PDE solutions, they are typically not designed to incorporate decision variables or to directly solve PDE-constrained optimization problems. The closest related work is [22], where the authors propose a learning-based surrogate solver to tackle ODE- or SDE-constrained optimization tasks. Our approach goes further by integrating operator-learning surrogates within the optimization pipeline, thereby coupling the representation of PDE dynamics with proxy optimizers and enabling real-time solutions to PDE-constrained optimization tasks.
3 Settings and Goals
This paper focuses on optimization problems constrained by a system of partial differential equations:
$$
\begin{aligned}
\min_{u}\;\; J(u, y) \;=\;\; & \Phi\big(y(\cdot, T)\big) \;+\; \int_{t_0}^{T}\!\!\int_{\Omega} \mathcal{L}\big(y(x,t), u(x,t)\big)\, dx\, dt \;+\; \frac{\beta}{2}\int_{t_0}^{T}\!\!\int_{\Omega} u(x,t)^2\, dx\, dt && \text{(1a)}\\
\text{s.t.}\;\; & \frac{\partial y}{\partial t}(x,t) = f\big(y, u, x, t\big) && \text{(1b)}\\
& g\big(y, \nabla_x y, \nabla_x^2 y, x, t\big) = 0 && \text{(1c)}\\
& y(x, t_0) = y_0(x), \quad x \in \Omega && \text{(1d)}\\
& \mathcal{B}(y)(x,t) = 0, \quad x \in \partial\Omega && \text{(1e)}\\
& h(u) \le 0, \qquad e(u) = 0 && \text{(1f)}
\end{aligned}
$$
In this formulation, the control action is denoted by $u(x,t)$, defined over the space–time domain $\Omega \times [t_0, T]$, with $t_0$ being the initial time instant and $T$ the time horizon, respectively, and $\Omega \subset \mathbb{R}^d$ being the spatial domain, e.g., if $d = 1$, then $\Omega = [0, L]$. The system state is represented by $y(x,t)$; $t$ and $x$ denote the independent temporal and spatial variables, respectively. The optimization objective (1a) combines a terminal cost $\Phi$, which depends on the state at the final time $T$, with a running cost $\mathcal{L}$, integrated over time and space, and an effort cost, which penalizes the energy used to control the system.
The system dynamics are governed by two PDE constraints. Equation (1b) models the temporal evolution of the state, while (1c) captures spatial dependencies, for example through diffusion–reaction relations. Initial conditions (1d) prescribe the state distribution at $t = t_0$, while boundary conditions (1e) specify the state behavior on the spatial boundary $\partial\Omega$. Finally, (1f) represents additional inequality and equality constraints on $u$, which may encode feasibility requirements or application-specific restrictions.
[Figure 1: Simplified illustration of the temperature control example.]
Temperature control example. In the context of thermal regulation problems, the decision variable $u(x,t)$ represents localized actuators applied along the medium, while the state variable $y(x,t)$ describes the temperature distribution over space and time. These two components are tightly coupled: the control inputs directly enter the governing PDE as forcing terms, shaping the spatio-temporal evolution of the state, while the optimization objective is evaluated on the resulting state trajectory. The system dynamics (1b)–(1c) follow a controlled reaction–diffusion equation, which models heat diffusion, dissipation, and externally applied control. The initial condition (1d) specifies the temperature distribution at $t = t_0$, and the boundary condition (1e) enforces zero-flux (Neumann) boundaries, preventing heat transfer across the domain boundary. The objective (1a) combines a terminal cost to match the target temperature profile, a running cost for deviations along the horizon, and a control effort penalty. A simplified illustration is provided in Fig. 1, while a full mathematical description is given in Section 5.2.
Challenges. While fundamental for many applications, Problem (1) presents three key challenges:
1. Finding optimal solutions to Problem (1) is computationally intractable. Even without the differential equation constraints, the decision version of the problem alone is NP-hard in general.
2. The governing PDEs (1b)–(1e) rarely admit closed-form solutions, so evaluating any candidate decision requires an expensive numerical simulation of the system dynamics.
3. Finally, the integration of the system dynamics in the optimization framework renders traditional numerical methods impractical for real-time applications.
4 PDE-Optimization Proxy
To tackle the challenges discussed above, we introduce PDE-Optimization Proxy (PDE-OP): a fully differentiable surrogate for PDE-constrained optimization. PDE-OP adopts a dual-network architecture: the first network, $\mathcal{C}_\theta$, approximates the optimal decision variable $u$, while the second, $\mathcal{D}_\phi$, predicts the corresponding state variables $y$ using a discrete-time neural operator architecture. Here, $\theta$ and $\phi$ denote the trainable parameters of $\mathcal{C}_\theta$ and $\mathcal{D}_\phi$, respectively. These two components are jointly trained in an end-to-end differentiable manner, and a schematic illustration of their interaction is provided in Figure 2. In what follows, we first describe each network individually and then detail their integration within the proposed learning framework.
Learning Goal. PDE-OP trains on a distribution of problem instances generated by different states for various initial conditions $y_0$ and target states $y_T$. For instance, in the thermal regulation example, this distribution corresponds to the variety of temperature profiles that arise from different control inputs. With reference to (1a), the learning objective is:
$$\min_{\theta, \phi}\;\; \mathbb{E}\big[\, J(\hat{u}, \hat{y}) \,\big] \qquad \text{(2a)}$$
$$\text{s.t. (1b)–(1f)}, \qquad \text{(2b)}$$
where, to simplify the notation, we denote the estimated control variable as $\hat{u} = \mathcal{C}_\theta(\cdot)$, while the state variable estimate is denoted as $\hat{y} = \mathcal{D}_\phi(\cdot)$. Given the complexity of solving Problem (2), the goal is to develop a fast and accurate neural PDE optimization surrogate. This approach uses the concept of proxy optimizers [2], which is detailed next.
[Figure 2: Schematic illustration of the interaction between PDE-OP's surrogate controller and dynamic predictor.]
4.1 Surrogate Controller
The first objective is to establish a neural network-based mapping that transforms the system state of the PDE-constrained optimization problem (1) into a control action $u$. Practically, this mapping is implemented as a neural network $\mathcal{C}_\theta$, which takes the current system state and the desired terminal state as input, and outputs an array of coefficients that are then used to estimate the corresponding optimal decision variables. These coefficients are the weights used to combine a set of basis functions to represent the control action, and are described next.
Basis function representation of the control action.
To approximate the optimal control action using a neural network, we represent the control trajectory as a linear combination of basis functions. This representation has also been proposed in [23, 24].
This design choice is motivated by the fact that, although the output of a neural network lies in a finite-dimensional space, combining it with continuous basis functions allows us to construct a continuous approximation of the control trajectory $u(x,t)$. In this way, the controller model operates in a low-dimensional space, while the basis functions map this finite set of outputs into a continuous-time signal.
As shown in the additional experiments provided in Appendix 9.6, direct modeling of continuous-time control actions, either with a neural network or a classical method, leads to qualitatively similar performance to our basis expansion approach; however, while the computational cost of our method remains extremely low, the running time of the classical methods grows significantly. These results show that our method is suitable for real-time tasks.
Consider the control action $u(x, t_k)$, where $t_k$ denotes a collocation point; at each time instant $t_k$, the control is expressed as a weighted sum of $m$ continuous basis functions $\phi_i(x)$:

$$u(x, t_k) = \sum_{i=1}^{m} w_i(t_k)\, \phi_i(x) \qquad \text{(3)}$$
where $w(t_k) = [w_1(t_k), \ldots, w_m(t_k)]$ denotes the coefficients of the basis functions at time instant $t_k$. For example, in one of our experiments, we adopt sinusoidal basis functions, e.g., $\phi_0(x) = 1$, $\phi_{2j-1}(x) = \sin(2\pi j x / L)$, and $\phi_{2j}(x) = \cos(2\pi j x / L)$ for $j = 1, \ldots, M$, which provide a Fourier-like expansion of the control signal. Here $M$ denotes the number of sinusoidal modes included in the expansion, with higher values of $M$ allowing for better approximations of the control variable.
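As a concrete illustration, the sketch below reconstructs a control profile from a weight vector using the Fourier-like basis described above; the function and variable names are ours, not the paper's codebase.

```python
import numpy as np

def control_from_weights(w, x, L=1.0):
    """u(x, t_k) = sum_i w_i(t_k) * phi_i(x), Eq. (3), with the basis
    {1, sin(2*pi*j*x/L), cos(2*pi*j*x/L)}; w holds 2*M + 1 coefficients."""
    M = (len(w) - 1) // 2
    basis = [np.ones_like(x)]
    for j in range(1, M + 1):
        basis.append(np.sin(2 * np.pi * j * x / L))
        basis.append(np.cos(2 * np.pi * j * x / L))
    return sum(wi * phi for wi, phi in zip(w, basis))

# Example: M = 5 modes evaluated on a 64-point spatial grid.
x = np.linspace(0.0, 1.0, 64)
u_k = control_from_weights(np.random.randn(11), x)
```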
Surrogate controller model training. The role of the controller surrogate is to predict the array of coefficients $w(t_k)$. Formally, the network approximates the function

$$\mathcal{C}_\theta: \big(\hat{y}(\cdot, t_k),\, y_T\big) \mapsto w(t_k), \qquad \text{(4)}$$

which maps the state variable and the desired terminal state $y_T$ to the corresponding controller weights $w(t_k)$, subsequently used to construct the control action according to (3).
The surrogate controller model is trained on-the-fly, using a dataset $\mathcal{D} = \{(\hat{\mathbf{y}}^{(i)}, \mathbf{y}_T^{(i)})\}_{i=1}^{N}$, where $\hat{\mathbf{y}}$ denotes the state variables estimated by the dynamic predictor $\mathcal{D}_\phi$, $\mathbf{y}_T$ is the desired terminal state, and $N$ is the number of samples.
Here, bold symbols denote the space-discrete version of the corresponding function, with each collocation point corresponding to a specific output of $\mathcal{D}_\phi$, e.g., $\hat{\mathbf{y}}(t_k) = [\hat{y}(x_1, t_k), \ldots, \hat{y}(x_n, t_k)]$. Note that this setting is unsupervised: optimal weights are unknown, and candidate weights are evaluated solely by the objective value in (1a) they induce. Figure 2 illustrates the interaction between PDE-OP's proxy optimizer and dynamic predictor at time instant $t_k$.
Given the predicted state $\hat{y}(\cdot, t_k)$ (for $k = 0$ we have $\hat{y}(\cdot, t_0) = y_0$), the surrogate controller outputs an estimate of $w(t_k)$, which is then used to construct the control action $\hat{u}(\cdot, t_k)$; this is subsequently used by $\mathcal{D}_\phi$ to predict the next state $\hat{y}(\cdot, t_{k+1})$. This process is repeated at each collocation point, until the terminal state is estimated.
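A minimal sketch of this autoregressive loop is given below, assuming `controller` and `predictor` are PyTorch modules standing in for $\mathcal{C}_\theta$ and $\mathcal{D}_\phi$ (the predictor is detailed in Section 4.2); the exact signatures are illustrative, not the paper's API.

```python
import torch

def rollout(controller, predictor, y0, y_target, n_steps):
    """Alternate controller and predictor: the controller proposes basis
    weights from (current state, target); the predictor advances the state."""
    y, states, weights = y0, [y0], []
    for _ in range(n_steps):
        w = controller(torch.cat([y, y_target], dim=-1))  # estimated w(t_k)
        y = predictor(y, w)                                # estimated y(t_{k+1})
        states.append(y)
        weights.append(w)
    return torch.stack(states), torch.stack(weights)
```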
To build our surrogate controller, we build on the proxy optimizer method proposed in [15], which allows self-supervised training without access to precomputed optimal solutions, while directly encouraging feasibility through primal–dual updates.
Once trained, the model can be used to generate near-optimal solutions at low cost. The next section discusses how the state variables are estimated using a discrete-time Neural Operator architecture.
4.2 Discrete-time Neural Operator
The second component of PDE-OP is a discrete-time neural operator architecture, here denoted as $\mathcal{D}_\phi$, which models the PDE solution operator. The proposed dynamic predictor employs two subnetworks: the first, which we denote the branch network, encodes input functions (e.g., control signals), and the second, called the trunk network, encodes the coordinates where the solution is evaluated (e.g., spatial locations). Their outputs are then combined to approximate the target operator. This architecture is inspired by DeepONet [1], a neural operator framework designed to approximate mappings between infinite-dimensional function spaces.
Recall that the PDE-constrained problem couples system dynamics and decision variables. To account for this coupling, the proposed architecture incorporates both the system state and the controller output. Given the current estimated state $\hat{y}(\cdot, t_k)$ and the control weights $w(t_k)$ estimated by $\mathcal{C}_\theta$, the network outputs an estimate of the state at the subsequent collocation point in time:
$$\hat{y}(\cdot, t_{k+1}) = \mathcal{D}_\phi\big(\hat{y}(\cdot, t_k),\, w(t_k)\big). \qquad \text{(5)}$$
Rolling out (5) across $k = 0, \ldots, K-1$ yields the full estimated state trajectory, $\{\hat{y}(\cdot, t_k)\}_{k=1}^{K}$.
Branch network. The branch net encodes the estimated system state sampled at $n$ fixed sensor locations, $\hat{\mathbf{y}}(t_k) = [\hat{y}(x_1, t_k), \ldots, \hat{y}(x_n, t_k)]$, together with the estimated control weights $w(t_k)$, and maps this concatenated input to a $p$-dimensional vector of coefficients $b\big(\hat{\mathbf{y}}(t_k), w(t_k)\big) \in \mathbb{R}^p$.
Trunk network. The trunk net encodes the spatial query coordinate $x$, producing a latent feature vector $\tau(x) \in \mathbb{R}^p$. The estimated subsequent (in time) state at location $x$ is obtained by combining the branch coefficients and trunk features through an inner product:

$$\hat{y}(x, t_{k+1}) = \sum_{j=1}^{p} b_j\big(\hat{\mathbf{y}}(t_k), w(t_k)\big)\, \tau_j(x). \qquad \text{(6)}$$
The remainder of this section details the method by which the state variables are precisely estimated using the proposed discrete-time Neural Operator.
Model initialization and training. Since the proposed time-discrete neural operator model takes as input the weights of the basis functions estimated by the surrogate controller $\mathcal{C}_\theta$, it is practical to initialize the dynamic predictor model on such inputs. To achieve this, we initialize $\mathcal{D}_\phi$ in a supervised fashion; its training data are generated by sampling time-varying control weight sequences from a Gaussian distribution, e.g., $w_i(t_k) \sim \mathcal{N}(0, \sigma^2)$, and by specifying the terminal state $y_T$. The corresponding solution satisfying (1b)–(1f) is computed using a numerical PDE solver. Each trajectory is discretized into one-step training samples of the form $\big((\mathbf{y}(t_k), w(t_k)),\, \mathbf{y}(t_{k+1})\big)$. Formally, the dataset of training pairs is given by $\mathcal{D}_{\mathrm{sup}} = \big\{\big((\mathbf{y}^{(i)}(t_k), w^{(i)}(t_k)),\, \mathbf{y}^{(i)}(t_{k+1})\big)\big\}_{i,k}$. The network is trained to minimize the prediction error between its outputs and the ground-truth values. Formally, this model is trained by minimizing the loss:
$$\mathcal{L}(\phi) = \frac{1}{|\mathcal{D}_{\mathrm{sup}}|} \sum_{(\mathbf{y}(t_k),\, w(t_k),\, \mathbf{y}(t_{k+1})) \in \mathcal{D}_{\mathrm{sup}}} \big\| \mathcal{D}_\phi\big(\mathbf{y}(t_k), w(t_k)\big) - \mathbf{y}(t_{k+1}) \big\|_2^2, \qquad \text{(7)}$$
where $\mathcal{D}_{\mathrm{sup}}$ denotes the dataset of training pairs. This training strategy ensures that $\mathcal{D}_\phi$ learns to approximate the PDE solution operator over a distribution of weights approximating the distribution of estimated weights generated by the surrogate controller $\mathcal{C}_\theta$, so that it can provide accurate PDE solutions given the predicted controller weights $\hat{w}(t_k)$.
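The sketch below illustrates a plausible branch/trunk implementation of (5)–(6) together with the one-step loss (7); layer widths, the latent size $p$, and all identifiers are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DiscreteTimeOperator(nn.Module):
    def __init__(self, n_sensors, n_weights, p=64):
        super().__init__()
        # Branch: encodes (state samples at sensors, control weights).
        self.branch = nn.Sequential(
            nn.Linear(n_sensors + n_weights, 128), nn.SiLU(), nn.Linear(128, p))
        # Trunk: encodes the spatial query coordinate x.
        self.trunk = nn.Sequential(
            nn.Linear(1, 128), nn.SiLU(), nn.Linear(128, p))

    def forward(self, y_k, w_k, x_query):
        b = self.branch(torch.cat([y_k, w_k], dim=-1))  # (batch, p)
        t = self.trunk(x_query)                          # (n_query, p)
        return b @ t.T                                   # Eq. (6): inner product

def pretrain_loss(model, y_k, w_k, x_grid, y_next):
    """One-step supervised loss, Eq. (7), on solver-generated pairs."""
    return ((model(y_k, w_k, x_grid) - y_next) ** 2).mean()
```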
4.3 Handling Static and Dynamic Constraints Jointly
To integrate the proposed time-discrete neural operator within the decision process, this paper proposes a primal-dual learning approach, inspired by the augmented Lagrangian relaxation technique [25] adopted in classic optimization. In Lagrangian relaxation, some or all of the problem constraints are relaxed into the objective function using Lagrangian multipliers that capture the penalties induced by violating them. The proposed formulation leverages Lagrangian duality to integrate trainable regularization terms that encapsulate violations of the constraint functions. When all the constraints are relaxed, the violation-based Lagrangian of problem (1) is

$$\mathcal{L}_{\lambda, \mu}(u, y) \;=\; J(u, y) \;+\; \lambda^\top \big|\nu(u, y)\big| \;+\; \mu^\top \max\big(0,\, h(u)\big),$$

where $J$ and $h$ are defined in (1a) and (1f), respectively, and $\nu$ is defined as follows:

$$\nu(u, y) \;=\; \begin{bmatrix} e(u) \\ \partial y / \partial t - f(y, u, x, t) \\ g(y, \nabla_x y, \nabla_x^2 y, x, t) \\ y(x, t_0) - y_0(x) \\ \mathcal{B}(y)(x, t) \end{bmatrix} = 0. \qquad \text{(8)}$$

It denotes the set of all equality constraints of problem (1), thus extending the equality constraints $e(u)$ in (1f) with the system dynamics (1b)–(1c) and the initial and boundary condition equations (1d)–(1e), written in implicit form as above. Therein, $\lambda$ and $\mu$ are the vectors of Lagrange multipliers associated with the functions $\nu$ and $h$, e.g., $\lambda_i$ and $\mu_j$ are associated with the $i$-th equality in $\nu$ and the $j$-th inequality in $h$, respectively. The key advantage of expressing the system dynamics (1b)–(1c) and initial and boundary conditions (1d)–(1e) in the same implicit form as the equality constraints $e(u)$, as shown in (8), is that the system dynamics are treated in the same manner as the static constraint functions (1f). This enables us to satisfy the system dynamics and the static set of constraints while incorporating both seamlessly into the optimization process.
The proposed primal-dual learning method uses an iterative approach to find good values of the primal ($\theta$, $\phi$) and dual ($\lambda$, $\mu$) variables; it uses the violation-based Lagrangian as a loss function to train the predictions $\hat{u}$ and $\hat{y}$:

$$\mathcal{L}_{\lambda, \mu}(\hat{u}, \hat{y}) \;=\; J(\hat{u}, \hat{y}) \;+\; \lambda^\top \big|\nu(\hat{u}, \hat{y})\big| \;+\; \mu^\top \max\big(0,\, h(\hat{u})\big), \qquad \text{(9)}$$
where $J(\hat{u}, \hat{y})$ represents the total objective cost, while $\nu$ and $h$ measure the constraint violations incurred by the predictions $\hat{u}$ and $\hat{y}$. The loss function in (9) combines an objective term, ensuring minimization of the objective function (1a), with a weighted penalty on violations of the constraint functions. This accounts for the contribution of both networks during training, which is balanced via iterative updates of the multipliers in (9) based on the amount of violation of the associated constraint function.
At iteration $k$, finding the optimal parameters requires solving

$$\theta^{k+1}, \phi^{k+1} = \operatorname*{arg\,min}_{\theta, \phi}\; \mathcal{L}_{\lambda^k, \mu^k}\big(\hat{u}, \hat{y}\big), \qquad \text{(10)}$$

where $\hat{u}$ and $\hat{y}$ denote the outputs of PDE-OP's optimization and predictor models $\mathcal{C}_{\theta}$ and $\mathcal{D}_{\phi}$ at iteration $k$, with $\lambda^0 = \mu^0 = \mathbf{0}$. This step is approximated using a stochastic gradient descent method

$$\theta^{k+1} = \theta^{k} - \eta\, \nabla_\theta \mathcal{L}_{\lambda^k, \mu^k}, \qquad \phi^{k+1} = \phi^{k} - \eta\, \nabla_\phi \mathcal{L}_{\lambda^k, \mu^k}, \qquad \text{(11)}$$

where $\eta$ denotes the learning rate and $\nabla_\theta \mathcal{L}$ and $\nabla_\phi \mathcal{L}$ represent the gradients of the loss function with respect to the parameters $\theta$ and $\phi$, respectively, at the current iteration. Importantly, this step does not recompute the training parameters from scratch, but updates the weights $\theta$ and $\phi$ based on their values at the previous iteration. Finally, the Lagrange multipliers are updated as
$$\lambda^{k+1} = \lambda^{k} + s\, \big|\nu(\hat{u}, \hat{y})\big|, \qquad \mu^{k+1} = \max\big(0,\; \mu^{k} + s\, h(\hat{u})\big), \qquad \text{(12)}$$

where $s$ denotes the Lagrange step size and the max operation is performed element-wise.
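The loop below sketches how the primal updates (11) and dual updates (12) interleave during training; `rollout_batch`, `nu`, `h`, and `objective` are placeholder callables for the quantities defined above, and the setup is illustrative rather than the released implementation.

```python
import torch

# Assumed setup: controller/predictor are nn.Modules; n_eq and n_ineq are the
# numbers of relaxed equality/inequality constraints; s is the Lagrange step.
lam, mu, s = torch.zeros(n_eq), torch.zeros(n_ineq), 0.1
opt = torch.optim.Adam(list(controller.parameters()) + list(predictor.parameters()))

for epoch in range(n_epochs):
    u_hat, y_hat = rollout_batch(controller, predictor, batch)
    eq = nu(u_hat, y_hat).abs().mean(0)           # |nu|, Eq. (8)
    ineq = torch.relu(h(u_hat)).mean(0)           # max(0, h)
    loss = objective(u_hat, y_hat) + lam @ eq + mu @ ineq   # Eq. (9)
    opt.zero_grad(); loss.backward(); opt.step()  # primal step, Eq. (11)
    with torch.no_grad():                         # dual step, Eq. (12)
        lam = lam + s * eq
        mu = torch.clamp(mu + s * ineq, min=0.0)
```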
Next, we evaluate our method on a range of PDE-constrained optimization tasks with varying levels of complexity, including cases with space-time varying control actions and nonlinear PDE dynamics.
5 Experimental Setting
In this section, we evaluate the proposed method on three optimal control tasks of increasing complexity. In these tasks the dynamics are described by the following PDEs: Voltage Dynamics (linear PDE with time-invariant control input $u(x)$), 1D Heat equation (linear PDE with time-variant control input $u(x,t)$), and 1D viscous Burgers' equation (nonlinear PDE with time-variant control input $u(x,t)$). PDE-OP is compared against state-of-the-art approaches for optimal control problems: the Direct method [26], the Adjoint-Sensitivity method [27], and (non-linear) Model Predictive Control (NMPC) [28]. These methods are widely adopted in the literature, and all of them rely on numerical PDE solvers whenever dynamics or gradients are evaluated. The Open-loop (direct/adjoint) control method requires a full forward rollout per gradient step and a backward adjoint evaluation, while MPC solves a finite-horizon problem at every control step (i.e., a receding horizon). The runtime of these methods is determined by repeated calls to the underlying numerical PDE solver and grows with the spatial grid size ($n$), horizon length ($K$), and complexity (e.g., non-linearity) of the PDE dynamics. For each experiment and method, we report the accuracy, measured as the mean squared error (MSE) between the target terminal state $y_T$ and the predicted terminal state $\hat{y}(\cdot, T)$, and the inference time of PDE-OP and runtime of the classical methods, both expressed in seconds. For each task, we assess the capability of the proposed time-discrete neural operator to capture system dynamics by comparing its outputs against the solutions obtained from a numerical PDE solver.
Please refer to Appendices 8, and 10 for details on the baseline methods and hyperparameter settings, respectively.
In Appendices 9.4–9.6 we report an ablation evaluating the impact of the basis functions introduced in Section 4.1. In summary, this experiment shows that using the full grid-wise control formulation is impractical for NMPC, pushing solve times well beyond real-time limits. Finally, we show that Adjoint methods suffer from inaccurate gradients in this setting, which leads to a marked decrease in solution quality.
5.1 Voltage Control Optimization
The problem of regulating the voltage magnitude in a leaky electrical transmission line, subject to a reaction-diffusion PDE with Neumann conditions is described as follows:
$$
\begin{aligned}
\min_{u}\;\; & \int_{\Omega} \big(v(x, T) - v_T(x)\big)^2\, dx \;+\; \frac{\beta}{2} \int_{\Omega} u(x)^2\, dx && \text{(13a)}\\
\text{s.t.}\;\; & \frac{\partial v}{\partial t} = \kappa\, \frac{\partial^2 v}{\partial x^2} - \alpha\,\big(v - v_{\mathrm{ref}}\big) + \gamma\, u(x) && \text{(13b)}\\
& v(x, 0) = v_0(x) && \text{(13c)}\\
& \frac{\partial v}{\partial x}(0, t) = \frac{\partial v}{\partial x}(L, t) = 0 && \text{(13d)}\\
& u_{\min} \le u(x) \le u_{\max} && \text{(13e)}
\end{aligned}
$$
Here, $v(x,t)$ represents the voltage profile across the space-time domain defined by the Cartesian product between $\Omega = [0, L]$ and the time horizon $[0, T]$. The system dynamics, expressed by (13b), model the physical phenomena of diffusion ($\kappa\, \partial^2 v / \partial x^2$), voltage leakage towards a reference ($-\alpha (v - v_{\mathrm{ref}})$), and the applied control action ($\gamma u$). $\kappa$, $\alpha$, and $\gamma$ are all constants representing the diffusion rate, leakage rate, and control gain, respectively. The system is initialized from a state $v_0$ according to (13c) and is constrained by zero-flux Neumann boundary conditions (13d), implying no voltage diffusion across the boundaries of the domain. The control decision is a space-dependent function $u(x)$ representing the externally applied voltage, constrained by actuator limits in (13e). The objective (13a) minimizes a combination of terminal cost and control energy.
Datasets and methods. In the system dynamics (13b), the diffusion rate $\kappa$, leakage rate $\alpha$, and control gain $\gamma$ are set to the fixed constants reported in Appendix 10, with a uniform reference voltage $v_{\mathrm{ref}}$. The ground-truth dynamics are simulated using a second-order implicit Crank-Nicolson finite-difference method [29]. To initialize the proposed time-discrete neural operator $\mathcal{D}_\phi$, we generate a dataset by sampling a diverse set of control functions from a zero-mean Gaussian Random Field (GRF) with a squared-exponential kernel, with mean zero and covariance function

$$k(x, x') = \sigma^2 \exp\!\Big(\!-\frac{(x - x')^2}{2 \ell^2}\Big), \qquad \text{(14)}$$
where $\ell$ is the length–scale and $\sigma^2$ the marginal variance. The corresponding solution is computed using a numerical solver, and then split into training, validation, and test sets.
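A minimal sketch of this sampling step under the kernel (14), on a uniform grid (names are illustrative):

```python
import numpy as np

def sample_grf(x, ell=0.2, sigma2=1.0, n_samples=1):
    """Zero-mean GRF with squared-exponential kernel, Eq. (14)."""
    K = sigma2 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell ** 2)
    K += 1e-8 * np.eye(len(x))  # jitter for numerical stability
    return np.random.multivariate_normal(np.zeros(len(x)), K, size=n_samples)

x = np.linspace(0.0, 1.0, 64)
controls = sample_grf(x, n_samples=100)  # candidate control profiles u(x)
```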
PDE-OP's model $\mathcal{D}_\phi$ estimates the voltage profile $\hat{v}(x, t)$; for $t = T$ this prediction yields $\hat{v}(\cdot, T)$, which is compared against the target profile $v_T$. PDE-OP's surrogate controller is trained to produce optimal control actions while minimizing objective (13a). Our approach is benchmarked against the Direct and Adjoint methods, which are described in Appendices 8.1 and 8.2. At inference time, PDE-OP's surrogate controller is tested on three unseen target profiles of different levels of complexity: a constant profile, a linear ramp, and a sine wave. For each target, the control is generated in a single forward pass.
[Figure 3: Terminal voltage profiles predicted by PDE-OP and the classical methods for the sinusoidal, linear, and constant targets.]
[Figure 4: Voltage dynamics: numerical PDE-solver solution (left), PDE-OP prediction (center), and absolute error (right).]
Results. Figure 4 shows the predicted voltage (central sub-figure of Fig. 4) of PDE-OP's dynamic component at test time, which is compared to the solution (left sub-figure) obtained with a numerical PDE-solver. The figure shows that the predicted voltage closely matches the numerical PDE-solver solution, with a small absolute error (right sub-figure) across the space–time domain.
Figure 3 presents a comparison between PDE-OP's predicted terminal state and that obtained using the classical methods, over the sinusoidal, linear, and constant target profiles. This figure shows that the voltage distribution produced by PDE-OP is well aligned with those produced by the Direct and the Adjoint-sensitivity methods. As shown in the first row of Table 1, the Direct method is the slowest, while the Adjoint method runs considerably faster with similar accuracy. Notably, PDE-OP is the fastest by a significant margin and yields the lowest MSE. These results show the enormous potential of the proposed approach, which produces solutions that are qualitatively similar to (or even slightly better than) the classical methods, while being up to two orders of magnitude faster (see Appendix 9.1 for detailed runtimes).
[Table 1: Runtime and accuracy (MSE) of each method, grouped by PDE system and control type — Voltage (Open-Loop): Direct, Adjoint, PDE-OP; Heat (Closed-Loop): LMPC, Adjoint, PDE-OP; Burgers' (Closed-Loop): NMPC, Adjoint, PDE-OP.]
[Figure 5: Terminal temperature profiles predicted by PDE-OP and the classical methods over the three target profiles.]
[Figure 6: Heat-equation dynamics: numerical PDE-solver solution (left), PDE-OP prediction (center), and absolute error (right).]
5.2 Optimal Control of the Heat Equation
This task involves controlling the evolution of temperature in a one-dimensional domain described by a reaction–diffusion PDE. Given an initial profile, the controller aims to reach a prescribed target state with minimal control effort. The problem is formulated as:
$$
\begin{aligned}
\min_{w}\;\; & \mathcal{L}_{\mathrm{terminal}} + \mathcal{L}_{\mathrm{running}} + \mathcal{L}_{\mathrm{effort}} && \text{(15a)}\\
\text{s.t.}\;\; & \frac{\partial y}{\partial t} = \kappa\, \frac{\partial^2 y}{\partial x^2} - \alpha\, y + u(x, t) && \text{(15b)}\\
& u(x, t) = \sum_{i=1}^{m} w_i(t)\, \phi_i(x) && \text{(15c)}\\
& y(x, 0) = y_0(x) && \text{(15d)}\\
& \frac{\partial y}{\partial x}(0, t) = \frac{\partial y}{\partial x}(L, t) = 0 && \text{(15e)}\\
& w_{\min} \le w_i(t) \le w_{\max} && \text{(15f)}
\end{aligned}
$$
The loss components in (15a) are defined as follows:

$$
\begin{aligned}
\mathcal{L}_{\mathrm{terminal}} &= \int_{\Omega} \big(y(x, T) - y_T(x)\big)^2\, dx && \text{(16a)}\\
\mathcal{L}_{\mathrm{running}} &= \int_{0}^{T}\!\!\int_{\Omega} \big(y(x, t) - y_T(x)\big)^2\, dx\, dt && \text{(16b)}\\
\mathcal{L}_{\mathrm{effort}} &= \frac{\beta}{2} \int_{0}^{T}\!\!\int_{\Omega} u(x, t)^2\, dx\, dt && \text{(16c)}
\end{aligned}
$$
Here $y(x,t)$ represents the temperature profile. The objective (15a) balances three terms: a terminal loss for final state accuracy, a running loss for temperature tracking, and an effort loss for control efficiency, described by (16a)–(16c).
Datasets and methods. The solutions of (15b) are generated using a Crank-Nicolson method, which deals with the time-varying source term by averaging the control input over each time step. To train our models, we generate a dataset of diverse trajectories. First, smooth, time-varying control weights $w_i(t)$ are generated by sampling each of the $m$ weight trajectories from a GRF over the time domain. The full spatio-temporal control field is then reconstructed from these weights, constrained by (15f), using the set of fixed basis functions defined in (15c). For each generated control field, the full PDE is solved to generate a ground-truth temperature evolution $y(x, t)$. Each complete simulation yields a training sample, consisting of the control input weight trajectory $\{w(t_k)\}_k$ and the corresponding state trajectory evaluated at a set of fixed sensor locations. The dataset is then split into 80% training, 10% validation, and 10% test.
Our framework uses two neural networks. First, the discrete-time neural operator $\mathcal{D}_\phi$ is trained as a surrogate model as described in Section 4.2. It acts as a one-step time-advancement operator, learning the mapping $(\mathbf{y}(t_k), w(t_k)) \mapsto \mathbf{y}(t_{k+1})$. Second, PDE-OP's surrogate controller is a recurrent neural network that acts as a sequential decision-maker. At each time step $t_k$, it takes the current state and the final target to produce the control weights $w(t_k)$. The controller is trained end-to-end by unrolling the trajectory using the surrogate $\mathcal{D}_\phi$. We benchmark our approach against a highly-tuned Adjoint-Method and Linear MPC (LMPC). Further implementation details are in Appendices 8.3 and 8.4.
At inference, we evaluate the PDE-OP model on three unseen target profiles: a constant profile, a linear ramp, and a sine wave.
Results. Figure 6 shows the temperature (central sub-figure of Fig. 6) estimated by PDE-OP's dynamic component at test time, which is compared to the solution (left sub-figure) obtained with a numerical PDE-solver. The figure shows that also in this setting the predicted temperature aligns closely with the numerical PDE-solver solution, with a small absolute error across the space–time domain.
Figure 5 illustrates the comparison of PDE-OP's predicted terminal state and those obtained using the classical methods, over the same target profiles introduced in the previous experiment. As also shown in Table 1, all three methods achieve comparable accuracy. However, PDE-OP is by far the fastest method, running several times faster than both LMPC and the Adjoint method (see Appendix 9.2 for detailed runtimes). These results demonstrate that even in a more complex setting, with time-variant control actions and linear PDE dynamics, our method produces solutions that are qualitatively comparable to the classical methods, but at a significantly reduced computational cost.
5.3 Optimal Control of the 1D Burgers’ Equation
To test the ability of PDE-OP to handle nonlinear dynamics, we address the control of the 1D viscous Burgers' equation with Dirichlet boundary conditions. The control input, an external force $u(x,t)$, is parameterized identically to the heat equation task, using time-varying weights for a set of basis functions. The optimization goal is to find the optimal trajectory of these weights subject to Burgers' equation (17b), the initial condition (17d), Dirichlet boundary conditions (17e), and box constraints (17f).
$$
\begin{aligned}
\min_{w}\;\; & \mathcal{L}_{\mathrm{terminal}} + \mathcal{L}_{\mathrm{running}} + \mathcal{L}_{\mathrm{effort}} && \text{(17a)}\\
\text{s.t.}\;\; & \frac{\partial v}{\partial t} + v\, \frac{\partial v}{\partial x} = \nu\, \frac{\partial^2 v}{\partial x^2} + u(x, t) && \text{(17b)}\\
& u(x, t) = \sum_{i=1}^{m} w_i(t)\, \phi_i(x) && \text{(17c)}\\
& v(x, 0) = v_0(x) && \text{(17d)}\\
& v(0, t) = v(L, t) = 0 && \text{(17e)}\\
& w_{\min} \le w_i(t) \le w_{\max} && \text{(17f)}
\end{aligned}
$$
$v(x,t)$ is the velocity field, and $\nu$ represents the viscosity parameter. The PDE in (17b) contains the convective nonlinearity $v\, \partial v / \partial x$, which poses a significant control challenge not present in the linear heat equation. The objective retains the same structure, balancing final state accuracy, path tracking, and control efficiency.
Datasets and methods. The ground-truth dynamics for the nonlinear Burgers' equation are simulated using a high-fidelity numerical solver. The dataset generation process mirrors that of the heat equation: time-varying weights are sampled from a GRF, the force field is reconstructed using spatial basis functions, and the PDE is solved to generate trajectories. Due to the system's high nonlinearity, we benchmark our method against a much more general and computationally expensive Nonlinear MPC (NMPC) and the Adjoint-Sensitivity Method. Further implementation details are in Appendices 8.3 and 8.4.
At inference, we test our PDE-OP model on three unseen target profiles: a constant, a parabola, and a sine wave. We do not use a linear ramp target profile because it is incompatible with the Dirichlet BCs.
[Figure 7: Terminal states predicted by PDE-OP and the classical methods over the three Burgers' target profiles.]
[Figure 8: Burgers' dynamics: numerical PDE-solver solution (left), PDE-OP prediction (center), and absolute error (right).]
Results. Figure 8 shows the system dynamics (central sub-figure of Fig. 8) estimated by PDE-OP's dynamic component at test time, which is compared to the solution (left sub-figure) obtained with a numerical PDE-solver. The figure shows that the predicted state aligns closely with the numerical PDE-solver solution, with a small absolute error across the space–time domain, demonstrating the ability of PDE-OP's dynamic predictor to accurately capture system dynamics even under nonlinearities.
Figure 7 illustrates the comparison of PDE-OP's predicted terminal state and those obtained using the classical methods, over the same target profiles introduced in the previous experiments. As shown in Figure 7 and Table 1, each method produces highly accurate solutions, with NMPC achieving the lowest error but requiring by far the longest runtime, and the Adjoint method being slightly less accurate yet considerably faster than NMPC. In contrast, PDE-OP produces qualitatively comparable predictions while being the fastest by a wide margin, running three to four orders of magnitude faster than NMPC and the Adjoint method (see Appendix 9.3 for detailed figures).
6 Conclusion
This work was motivated by the efficiency requirements associated with solving partial differential equation (PDE)-constrained optimization problems. It introduced a novel learning-based framework, PDE-OP, which incorporates differential equation constraints into optimization tasks for real-time applications. The approach uses a dual-network architecture, with one network approximating the control strategies and another capturing the associated PDE dynamics. This architecture exploits a primal-dual method to ensure that both the dynamics dictated by the PDEs and the optimization objectives are concurrently learned and respected. This integration enables end-to-end differentiation, allowing for efficient gradient-based optimization, and, to our knowledge, solves PDE-constrained optimization problems in real time for the first time. Empirical evaluations across a set of PDE-constrained optimization tasks illustrated PDE-OP's capability to address these complex challenges adeptly. Our comprehensive results demonstrate the effectiveness of our approach and its broad potential applicability across various scientific and engineering domains where system dynamics arise in optimization or control processes.
7 Acknowledgments
The material is based upon work supported by National Science Foundation (NSF) awards 2533631, 2401285, 2334448, and 2334936, and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011252E005. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or DARPA.
References
[1] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, March 2021.
[2] James Kotary, Ferdinando Fioretto, Pascal Van Hentenryck, and Bryan Wilder. End-to-end constrained optimization learning: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 4475–4482, 2021.
[3] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations, 2017.
[4] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces, 2024.
[5] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations, 2021.
[6] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 6572–6583. Curran Associates Inc., 2018.
[7] David Q Mayne, James B Rawlings, CV Rao, and PO Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.
[8] Michael Hinze, René Pinnau, Michael Ulbrich, and Stefan Ulbrich. Optimization with PDE Constraints, volume 23 of Mathematical Modelling: Theory and Applications. Springer, 2008.
[9] Fredi Tröltzsch. Optimal Control of Partial Differential Equations: Theory, Methods, and Applications, volume 112 of Graduate Studies in Mathematics. American Mathematical Society, 2010.
[10] Matthew J Zahr and Charbel Farhat. Progressive construction of a parametric reduced-order model for pde-constrained optimization. International Journal for Numerical Methods in Engineering, 2014. https://arxiv.org/abs/1407.7618.
[11] Markus A Dihlmann and Bernard Haasdonk. Certified pde-constrained parameter optimization using reduced basis surrogate models for evolution problems. Computational Optimization and Applications, 60:753–787, 2015.
[12] Rakhoon Hwang, Jae Yong Lee, Jin Young Shin, and Hyung Ju Hwang. Solving pde-constrained control problems using operator learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4):4504–4512, June 2022.
[13] Karen Willcox and Omar Ghattas. The imperative of physics-based machine learning and the role of reduced-order models. Nature Computational Science, 1(3):166–168, 2021.
[14] Ferdinando Fioretto, Pascal Van Hentenryck, Terrence WK Mak, Cuong Tran, Federico Baldo, and Michele Lombardi. Lagrangian duality for constrained deep learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 118–135. Springer, 2020.
[15] Seonho Park and Pascal Van Hentenryck. Self-supervised primal-dual learning for constrained optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 4052–4060, 2023.
[16] Priya L Donti, David Rolnick, and J Zico Kolter. DC3: A learning method for optimization with hard constraints. In ICLR, 2021.
[17] James Kotary, Vincenzo Di Vito, Jacob Cristopher, Pascal Van Hentenryck, and Ferdinando Fioretto. Learning joint models of prediction and optimization, 2024.
[18] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, Feb 2019.
[19] Zongyi Li, Nikola B. Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew M. Stuart, and Anima Anandkumar. Multipole graph neural operator for parametric partial differential equations. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 6755–6766, 2020.
[20] Kaushik Bhattacharya, Bamdad Hosseini, Nikola B. Kovachki, and Andrew M. Stuart. Model reduction and neural networks for parametric pdes. In Mathematical and Scientific Machine Learning (MSML), volume 145 of Proceedings of Machine Learning Research, pages 203–233. PMLR, 2021.
[21] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. The deepgreen algorithm for learning generic pdes. Journal of Computational Physics, 420:109707, 2021.
[22] Vincenzo Di Vito, Mostafa Mohammadian, Kyri Baker, and Ferdinando Fioretto. Learning to solve differential equation constrained optimization problems, 2024.
[23] Mingquan Feng, Zhijie Chen, Yixin Huang, Yizhou Liu, and Junchi Yan. Optimal control operator perspective and a neural adaptive spectral method. Proceedings of the AAAI Conference on Artificial Intelligence, 39(14):14567–14575, Apr. 2025.
[24] Zhexian Li, Athanassios S. Fokas, and Ketan Savla. On linear quadratic regulator for the heat equation with general boundary conditions. In 2024 IEEE 63rd Conference on Decision and Control (CDC), pages 7594–7599, 2024.
[25] Magnus R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4:303–320, 1969.
[26] John T. Betts. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming. SIAM, 2 edition, 2010.
[27] Yang Cao, Shengtai Li, Linda Petzold, and Radu Serban. Adjoint sensitivity analysis for differential-algebraic equations: The adjoint dae system and its numerical solution. SIAM Journal on Scientific Computing, 24(3):1076–1089, 2003.
[28] Max Schwenzer, Muzaffer Ay, Thomas Bergs, and Dirk Abel. Review on model predictive control: an engineering perspective. The International Journal of Advanced Manufacturing Technology, 117:1327–1349, 2021.
[29] J. Crank and P. Nicolson. A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type. Mathematical Proceedings of the Cambridge Philosophical Society, 43(1):50–67, 1947.
[30] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.
[31] Steven Diamond and Stephen Boyd. Cvxpy: A python-embedded modeling language for convex optimization. The Journal of Machine Learning Research, 17(1):2909–2913, 2016.
[32] Dieter Kraft. Algorithm 733: Tomp–fortran modules for optimal control calculations. ACM Transactions on Mathematical Software (TOMS), 20:262–281, 1994.
[33] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[34] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9:1735–1780, 1997.
Appendix
8 Classical Methods
8.1 Direct Method (Finite-Difference Approach)
The direct method treats the simulator as a black box mapping the bounded control vector $\mathbf{u} \in \mathbb{R}^n$ to the discrete objective $J(\mathbf{u})$ and employs a standard optimizer to minimize it. Here the control variable is discretized at each collocation point: $\mathbf{u} = [u(x_1), \ldots, u(x_n)]$.
Mechanism. We use L-BFGS-B [30] with per-coordinate box bounds to minimize (31). No analytic gradients are provided; instead, numerical derivatives are queried via finite differences. With a forward-difference method,

$$\frac{\partial J}{\partial u_i} \approx \frac{J(\mathbf{u} + \epsilon\, \mathbf{e}_i) - J(\mathbf{u})}{\epsilon}, \qquad \text{(18)}$$
where $\mathbf{e}_i$ is the $i$-th unit vector and $\epsilon$ is a small scalar. For higher accuracy (at the price of doubling the computational cost), the central-difference method is adopted:

$$\frac{\partial J}{\partial u_i} \approx \frac{J(\mathbf{u} + \epsilon\, \mathbf{e}_i) - J(\mathbf{u} - \epsilon\, \mathbf{e}_i)}{2 \epsilon}. \qquad \text{(19)}$$
Computational cost. Each gradient evaluation requires $n + 1$ simulator calls with forward differences (or $2n$ with central differences), where $n$ denotes the number of spatial collocation points. Since one simulator call is a complete rollout (27) over $K$ time steps, the complexity per iteration is $O(nK)$ solver steps. This method is robust and simple, but its cost scales linearly with $n$, which becomes prohibitive for dense spatial discretizations.
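The sketch below makes this cost structure explicit: each call to the objective is a full PDE rollout, so a forward-difference gradient costs $n + 1$ rollouts and a central-difference gradient $2n$ (a sketch under these assumptions, not the baseline's exact code):

```python
import numpy as np

def fd_gradient(J, u, eps=1e-6, central=False):
    """Finite-difference gradient of the simulated objective J at control u;
    implements Eq. (18) (forward) or Eq. (19) (central)."""
    g = np.zeros_like(u)
    J0 = None if central else J(u)
    for i in range(len(u)):
        e = np.zeros_like(u); e[i] = eps
        g[i] = (J(u + e) - J(u - e)) / (2 * eps) if central else (J(u + e) - J0) / eps
    return g
```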
8.2 Adjoint Sensitivity Method for Open-Loop Problem
The Adjoint-Sensitivity Method efficiently computes exact gradients of PDE-constrained objectives w.r.t. high-dimensional control variables. Crucially, the cost of one gradient evaluation is one forward solve plus one backward adjoint solve, regardless of the number of control parameters. We derive the continuous adjoint for the open-loop, time-invariant Voltage Control Optimization task (Section 5.1), and then present the discrete-time adjoint used in our Crank–Nicolson (CN) implementation.
Function spaces & BCs. Consider the state $v(x, t)$ and the static control $u(x)$ on $\Omega \times [0, T]$, and impose homogeneous Neumann BCs on $\partial\Omega$ so that the boundary terms vanish in the variational calculus; the adjoint inherits the same Neumann BCs.
The problem consists of finding $u$ minimizing

$$J(u) = \frac{1}{2} \int_{\Omega} \big(v(x, T) - v_T(x)\big)^2\, dx \;+\; \frac{\beta}{2} \int_{\Omega} u(x)^2\, dx, \qquad \text{(20)}$$
subject to
$$
\begin{aligned}
& \frac{\partial v}{\partial t} = \kappa\, \frac{\partial^2 v}{\partial x^2} - \alpha\,\big(v - v_{\mathrm{ref}}\big) + \gamma\, u(x), && \text{(21a)}\\
& v(x, 0) = v_0(x), && \text{(21b)}\\
& \frac{\partial v}{\partial x}\Big|_{\partial\Omega} = 0, && \text{(21c)}
\end{aligned}
$$
where $\beta$ is the regularization weight on the control effort.
Lagrangian. We define

$$\mathcal{J}(v, u, p) = J(u) + \int_0^T\!\!\int_{\Omega} p\, \Big( \frac{\partial v}{\partial t} - \kappa\, \frac{\partial^2 v}{\partial x^2} + \alpha\,(v - v_{\mathrm{ref}}) - \gamma\, u \Big)\, dx\, dt, \qquad \text{(22)}$$

where $p(x, t)$ is the adjoint state. Imposing stationarity w.r.t. $v$ and integrating by parts in $t$ and $x$ (with Neumann BCs) yields the adjoint system.
Adjoint PDE.
$$-\frac{\partial p}{\partial t} = \kappa\, \frac{\partial^2 p}{\partial x^2} - \alpha\, p. \qquad \text{(23)}$$
Terminal condition.
$$p(x, T) = v(x, T) - v_T(x). \qquad \text{(24)}$$
Gradient w.r.t. $u$. Because $u$ is time-invariant, variation w.r.t. $u$ gives

$$\nabla_u J(x) = \beta\, u(x) - \gamma \int_0^T p(x, t)\, dt. \qquad \text{(25)}$$
We optimize with L-BFGS-B under elementwise bounds $u_{\min} \le u \le u_{\max}$; no projection step is needed.
Semi-discrete forward dynamics. To simplify the notation, let $\mathbf{v}(t) \in \mathbb{R}^n$ be the discretized version of $v$ on a uniform grid with Neumann BCs, $\mathbf{L}$ be the Neumann Laplacian, and $\mathbf{u}$ the static control vector with elements $u_i = u(x_i)$. Define $\mathbf{A} = \kappa \mathbf{L} - \alpha \mathbf{I}$ and $\mathbf{b} = \alpha\, v_{\mathrm{ref}}\, \mathbf{1} + \gamma\, \mathbf{u}$. Then

$$\dot{\mathbf{v}}(t) = \mathbf{A}\, \mathbf{v}(t) + \mathbf{b}. \qquad \text{(26)}$$
Crank–Nicolson (CN) step. With time step $\Delta t$ and $\mathbf{v}^k \approx \mathbf{v}(t_k)$, CN gives

$$\Big(\mathbf{I} - \tfrac{\Delta t}{2} \mathbf{A}\Big)\, \mathbf{v}^{k+1} = \Big(\mathbf{I} + \tfrac{\Delta t}{2} \mathbf{A}\Big)\, \mathbf{v}^{k} + \Delta t\, \mathbf{b}. \qquad \text{(27)}$$

Equivalently, the affine one-step map is

$$\mathbf{v}^{k+1} = \mathbf{M}\, \mathbf{v}^{k} + \mathbf{c}, \qquad \text{(28)}$$
$$\mathbf{M} = \Big(\mathbf{I} - \tfrac{\Delta t}{2} \mathbf{A}\Big)^{-1} \Big(\mathbf{I} + \tfrac{\Delta t}{2} \mathbf{A}\Big), \qquad \text{(29)}$$
$$\mathbf{c} = \Delta t\, \Big(\mathbf{I} - \tfrac{\Delta t}{2} \mathbf{A}\Big)^{-1} \mathbf{b}. \qquad \text{(30)}$$
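A sketch of the rollout (27)–(30) with the left-hand-side matrix prefactored once (matrix names follow the equations above; the helper is illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def cn_rollout(v0, A, b, dt, n_steps):
    """Crank-Nicolson rollout, Eq. (27): (I - dt/2 A) v^{k+1} = (I + dt/2 A) v^k + dt*b.
    The time-invariant LHS is factored once and reused at every step."""
    I = np.eye(len(v0))
    lhs = lu_factor(I - 0.5 * dt * A)  # prefactor once
    rhs = I + 0.5 * dt * A
    v, traj = v0.copy(), [v0.copy()]
    for _ in range(n_steps):
        v = lu_solve(lhs, rhs @ v + dt * b)
        traj.append(v.copy())
    return np.array(traj)
```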
Discrete objective. With $K$ time steps we have:

$$J(\mathbf{u}) = \frac{1}{2}\, \big\| \mathbf{v}^{K} - \mathbf{v}_T \big\|^2 \;+\; \frac{\beta}{2}\, \| \mathbf{u} \|^2. \qquad \text{(31)}$$
Discrete adjoint & gradient w.r.t. static .
$$\mathbf{p}^{K} = \mathbf{v}^{K} - \mathbf{v}_T, \qquad \mathbf{p}^{k} = \mathbf{M}^\top \mathbf{p}^{k+1}, \quad k = K-1, \ldots, 0. \qquad \text{(32)}$$

$$\nabla_{\mathbf{u}} J = \beta\, \mathbf{u} + \gamma\, \Delta t \sum_{k=0}^{K-1} \Big(\mathbf{I} - \tfrac{\Delta t}{2} \mathbf{A}\Big)^{-\top} \mathbf{p}^{k+1}. \qquad \text{(33)}$$
Computational cost. One forward CN rollout (27) and one backward adjoint sweep (32) yield the exact discrete gradient (33); then, $\mathbf{u}$ is updated with L-BFGS-B under box bounds. In 1D, the CN matrix is time-invariant and can be prefactored once, so that each step reduces to a cheap triangular solve; the total cost per iteration is $O(nK)$.
8.3 Time-Varying Adjoint for Closed-Loop Baselines
The adjoint method extends naturally to a time-varying control $u(x, t)$. The continuous adjoint PDE and terminal/boundary conditions remain as per Eqs. (23)–(24), while the gradient w.r.t. $u(x, t)$ is calculated as:

$$\nabla_{u(x,t)} J = \beta\, u(x, t) - \gamma\, p(x, t). \qquad \text{(34)}$$

We use this time-varying adjoint for the Heat and Burgers experiments (please refer to Sections 5.2 and 5.3); the discrete forms below match our CN implementations.
Heat optimization problem — CN discretization.
With state $\mathbf{y}^k$ and quadratic running/terminal costs, the discrete adjoint equation and per-step gradient are

$$\mathbf{p}^{K} = \mathbf{Q}_T \big(\mathbf{y}^{K} - \mathbf{y}_T\big), \qquad \mathbf{p}^{k} = \mathbf{M}^\top \mathbf{p}^{k+1} + \mathbf{Q}\, \big(\mathbf{y}^{k} - \mathbf{y}_T\big), \qquad \text{(35)}$$
$$\nabla_{w(t_k)} J = \beta\, w(t_k) + \mathbf{B}_d^\top \Big(\mathbf{I} - \tfrac{\Delta t}{2} \mathbf{A}\Big)^{-\top} \mathbf{p}^{k+1}. \qquad \text{(36)}$$

This is the gradient used by the closed-loop adjoint baseline in Section 5.2; $\mathbf{Q}$ and $\mathbf{Q}_T$ represent the weight matrices.
Burgers’ optimization problem — implicit CN step.
We write one CN step as the implicit map $\mathbf{v}^{k+1} = \Psi\big(\mathbf{v}^{k}, w(t_k)\big)$ with Dirichlet endpoints enforced at each step. Let

$$\mathbf{J}_{k+1} = \frac{\partial \mathbf{R}}{\partial \mathbf{v}^{k+1}}, \qquad \mathbf{B} = \big[\phi_1(\mathbf{x}), \ldots, \phi_m(\mathbf{x})\big], \qquad \text{(37)}$$

where $\mathbf{J}_{k+1}$ is the Jacobian of the semi-discrete residual $\mathbf{R}$, and $\mathbf{B}$ stacks the actuator shapes. For a quadratic running cost, the closed-loop adjoint/backward sweep is

$$\boldsymbol{\lambda}^{K} = \mathbf{Q}_T \big(\mathbf{v}^{K} - \mathbf{v}_T\big), \qquad \text{(38)}$$
$$\mathbf{J}_{k+1}^\top \boldsymbol{\mu}^{k+1} = \boldsymbol{\lambda}^{k+1}, \qquad \boldsymbol{\lambda}^{k} = -\Big(\frac{\partial \mathbf{R}}{\partial \mathbf{v}^{k}}\Big)^{\!\top} \boldsymbol{\mu}^{k+1} + \mathbf{Q}\, \big(\mathbf{v}^{k} - \mathbf{v}_T\big), \qquad \text{(39)}$$
$$\nabla_{w(t_k)} J = \beta\, w(t_k) - \Big(\frac{\partial \mathbf{R}}{\partial w(t_k)}\Big)^{\!\top} \boldsymbol{\mu}^{k+1}, \qquad \text{(40)}$$

which are the NMPC equations we implemented for the experiments in Section 5.3. Here $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ are the adjoint variables.
8.4 Model Predictive Control (MPC)
In what follows, we introduce the MPC algorithm; to simplify notation we omit the dependency on the spatial variable $x$. At each time instant $t_k$, MPC solves a finite-horizon problem of $H$ steps given inputs $u_k, \ldots, u_{k+H-1}$; it applies the first input $u_k$, then shifts the horizon forward and iterates (receding horizon).
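The receding-horizon loop shared by LMPC and NMPC can be sketched as follows, with `solve_horizon` (the QP or NLP of Eqs. (42)/(46)) and `step` (the CN integrator) as placeholders:

```python
def mpc(y0, target, solve_horizon, step, n_steps, H=10):
    """Generic receding horizon: solve an H-step problem, apply only the
    first input, advance the plant one step, then repeat from the new state."""
    y, applied = y0, []
    for _ in range(n_steps):
        u_seq = solve_horizon(y, target, H)  # optimal inputs over the horizon
        y = step(y, u_seq[0])                # apply the first input only
        applied.append(u_seq[0])
    return y, applied
```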
Linear MPC for the Heat equation.
With Crank–Nicolson discretization and cosine-basis expansion for the control action, the dynamics are linear:

$$\mathbf{y}^{j+1} = \mathbf{M}\, \mathbf{y}^{j} + \mathbf{B}_d\, w_j. \qquad \text{(41)}$$

We solve the convex QP

$$\min_{w_k, \ldots, w_{k+H-1}}\; \sum_{j=k}^{k+H-1} \Big( \big\| \mathbf{y}^{j+1} - \mathbf{y}_T \big\|_{\mathbf{Q}}^2 + \beta\, \| w_j \|^2 \Big) \qquad \text{(42)}$$
$$\text{s.t.} \quad \mathbf{y}^{j+1} = \mathbf{M}\, \mathbf{y}^{j} + \mathbf{B}_d\, w_j, \qquad w_{\min} \le w_j \le w_{\max}.$$

Here $\mathbf{Q}$ is the tracking weight matrix (under Neumann BCs), and (42) is solved with CVXPY/OSQP [31]. $\mathbf{M}$ and $\mathbf{B}_d$ are the CN matrices of the system, obtained from the semi-discrete PDE operator $\mathbf{A}$:

$$\mathbf{M} = \Big(\mathbf{I} - \tfrac{\Delta t}{2} \mathbf{A}\Big)^{-1} \Big(\mathbf{I} + \tfrac{\Delta t}{2} \mathbf{A}\Big), \qquad \mathbf{B}_d = \Delta t\, \Big(\mathbf{I} - \tfrac{\Delta t}{2} \mathbf{A}\Big)^{-1} \mathbf{B}. \qquad \text{(43)}$$

This way, one CN step writes as:

$$\mathbf{y}^{j+1} = \mathbf{M}\, \mathbf{y}^{j} + \mathbf{B}_d\, w_j. \qquad \text{(44)}$$
Nonlinear MPC for the Burgers equation.
For viscous Burgers, the CN step is nonlinear due to the convection term; we write one step as the implicit map

$$\mathbf{v}^{j+1} = \Psi\big(\mathbf{v}^{j}, w_j\big), \qquad \text{(45)}$$

where $\Psi$ is the unique CN solution for Burgers with input $w_j$ (computed via fsolve on the CN residual and by imposing Dirichlet endpoints at each step). NMPC then solves, at each control step,

$$\min_{w_k, \ldots, w_{k+H-1}}\; \sum_{j=k}^{k+H-1} \Big( \big\| \mathbf{v}^{j+1} - \mathbf{v}_T \big\|_{\mathbf{Q}}^2 + \beta\, \| w_j \|^2 \Big) \qquad \text{(46)}$$
$$\text{s.t.} \quad \mathbf{v}^{j+1} = \Psi\big(\mathbf{v}^{j}, w_j\big), \qquad w_{\min} \le w_j \le w_{\max}.$$
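For reference, a minimal version of the implicit map $\Psi$ solved with `fsolve` might look as follows; the finite-difference residual is a simplified stand-in for the solver actually used:

```python
import numpy as np
from scipy.optimize import fsolve

def cn_burgers_step(v, u, dt, dx, visc):
    """One implicit CN step for viscous Burgers with zero Dirichlet endpoints:
    find v_next with v_next - v - dt/2 * (f(v) + f(v_next)) = 0."""
    def f(w):  # semi-discrete RHS: -w*w_x + visc*w_xx + u
        wx = np.gradient(w, dx)
        return -w * wx + visc * np.gradient(wx, dx) + u
    def residual(v_next):
        r = v_next - v - 0.5 * dt * (f(v) + f(v_next))
        r[0], r[-1] = v_next[0], v_next[-1]  # enforce Dirichlet endpoints
        return r
    return fsolve(residual, v)  # warm start from the previous state
```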
Choice of controller.
For the voltage optimization task, we use the Direct and Adjoint methods in open loop. For the heat equation, CN yields a linear PDE, so the finite-horizon problem is a convex QP, which can be solved with Linear MPC. For Burgers' equation, the dynamics are nonlinear and CN produces an implicit nonlinear update, so we use Nonlinear MPC, with NMPC optimizing over an implicit CN step. In both the heat and Burgers' tasks, we also use the Adjoint method in closed loop, and report its results.
9 Additional Experimental Results
In this section we evaluate the runtime and terminal tracking error (MSE) achieved by each method on polynomial target profiles, and report these metrics in Tables 2, 3, and 4. All methods use the same spatial/temporal discretization and solver settings as described in Appendix 10.
9.1 Voltage Control Optimization
We test each method on two additional target profiles: (i) a constant profile and (ii) a linear ramp. Since a ramp target profile is infeasible under zero-flux (Neumann) boundaries, all methods can only produce approximations of it.
Table 2: Voltage control, runtime and accuracy on the additional targets.
| Methods | Constant target | | Ramp target | |
| | Runtime | Accuracy | Runtime | Accuracy |
| Direct (Finite Diff.) | 2.422s | 4.6498e-11 | 12.810s | 0.00005 |
| Adjoint-Method | 0.049s | 2.8213e-12 | 0.464s | 0.00004 |
| PDE-OP (ours) | 0.037s | 1.5275e-6 | 0.050s | 0.00007 |
Results.
Constant target. The Adjoint method attains the highest accuracy (MSE 2.8213e-12) in 0.049s, while the Direct method is much slower at 2.422s with MSE 4.6498e-11. PDE-OP is fastest at 0.037s (about 65x faster than Direct and 1.3x faster than Adjoint) and reports an MSE of 1.5275e-6.
Ramp target. The Adjoint method yields the lowest error (MSE 0.00004), running in 0.464s; the Direct method achieves 0.00005 but takes 12.810s. PDE-OP runs in 0.050s, about 256x faster than Direct and 9x faster than the Adjoint method, producing an MSE of 0.00007, which is on the same order of magnitude as the baselines.
9.2 Optimal Control of the 1D Heat Equation
For the 1D heat equation (closed-loop control), we consider a constant target and a linear ramp target. Under zero-flux (Neumann) boundary conditions, the ramp profile is not exactly admissible, as its derivative at the boundaries is nonzero; all controllers therefore track its best achievable approximation under these constraints. Here, we compare the proposed PDE-OP model with LMPC and the Adjoint-Sensitivity Method.
Table 3: Heat equation, runtime and accuracy on the additional targets.
| Methods | Constant target | | Ramp target | |
| | Runtime | Accuracy | Runtime | Accuracy |
| LMPC | 0.443s | 2.3289e-14 | 0.562s | 0.00008 |
| Adjoint-Method | 0.485s | 6.6612e-11 | 0.552s | 0.00008 |
| PDE-OP (ours) | 0.223s | 1.5210e-6 | 0.132s | 0.00009 |
Results.
Constant target. LMPC produces a highly accurate terminal profile (MSE 2.3289e-14) and runs in 0.443s; the Adjoint runs in 0.485s with MSE 6.6612e-11. PDE-OP is fastest at 0.223s (about 2x faster than LMPC and the Adjoint) with a still small error of 1.5210e-6.
Ramp target. LMPC and the Adjoint achieve the highest accuracy (MSE 0.00008) with runtimes of 0.562s and 0.552s, respectively. PDE-OP runs in only 0.132s (about 4.3x faster than LMPC and 4.2x faster than the Adjoint), with an MSE of 0.00009, of the same order of magnitude as the baselines.
9.3 Optimal Control of the 1D Burgers’ Equation
For the 1D viscous Burgers equation with Dirichlet boundaries $v(0, t) = v(L, t) = 0$, admissible steady targets must satisfy these conditions. We therefore consider (i) a zero (constant) target $v_T(x) = 0$, and (ii) a parabolic target $v_T(x) \propto x (L - x)$ (with a fixed amplitude in our experiments). We compare our method with NMPC and the time-varying Adjoint baseline.
Table 4: Burgers' equation, runtime and accuracy on the additional targets.
| Methods | Zero target | | Parabolic target | |
| | Runtime | Accuracy | Runtime | Accuracy |
| NMPC | 132.243s | 5.6760e-14 | 671.033s | 0.00006 |
| Adjoint-Method | 21.487s | 5.8290e-11 | 122.116s | 0.00013 |
| PDE-OP (ours) | 0.059s | 0.00002 | 0.073s | 0.00031 |
Results.
Zero target. NMPC reaches nearly machine precision (MSE 5.6760e-14) but is slow at 132.243s; the Adjoint baseline is faster at 21.487s with MSE 5.8290e-11. PDE-OP is fastest at 0.059s (about 2240x vs. NMPC and 364x vs. Adjoint) with a small but larger error of 0.00002.
Parabolic target. NMPC attains the lowest error (MSE 0.00006) in 671.033s; Adjoint yields 0.00013 in 122.116s. PDE-OP runs in 0.073s, about 9190x faster than NMPC and 1670x faster than Adjoint, with MSE 0.00031 (same order of magnitude).
Takeaway. For nonlinear Burgers dynamics with Dirichlet BCs, NMPC/Adjoint provide the best MSE, while PDE-OP delivers three to four orders of magnitude lower runtime with modest accuracy loss.
9.4 Discrete-Time Neural Operator with Direct Control Field
In this section we describe a variant of our discrete-time neural operator that directly learns the spatio–temporal control field $u(x, t)$, rather than the weights of the basis functions introduced in Section 4.2. Similarly to the original DeepONet [1], PDE-OP's model factors into a branch network that encodes input functions and a trunk network that encodes query coordinates. At each time $t_k$, given the current state $\hat{y}(\cdot, t_k)$ and the applied control $u(\cdot, t_k)$, this model predicts the next state at coordinate $x$:
$$\hat{y}(x, t_{k+1}) = \mathcal{D}_\phi\big(\hat{\mathbf{y}}(t_k),\, \mathbf{u}(t_k)\big)(x). \qquad \text{(47)}$$
Branch network. The branch net encodes the estimated system state sampled at $n$ fixed sensor locations, $\hat{\mathbf{y}}(t_k)$, together with the estimated control signals $\mathbf{u}(t_k) = [u(x_1, t_k), \ldots, u(x_n, t_k)]$, and maps this concatenated input to a $p$-dimensional vector of coefficients $b\big(\hat{\mathbf{y}}(t_k), \mathbf{u}(t_k)\big) \in \mathbb{R}^p$.
Trunk network. The trunk net encodes the spatial query coordinate $x$, producing a latent feature vector $\tau(x) \in \mathbb{R}^p$. The estimated subsequent (in time) state at location $x$ is obtained by combining the branch coefficients and trunk features through an inner product:

$$\hat{y}(x, t_{k+1}) = \sum_{j=1}^{p} b_j\big(\hat{\mathbf{y}}(t_k), \mathbf{u}(t_k)\big)\, \tau_j(x). \qquad \text{(48)}$$
Model initialization and training (direct $u$). In this setting the dynamic predictor takes as input the control action sampled on the spatial grid, together with the current state. $\mathcal{D}_\phi$ is initialized in a supervised fashion by generating the target trajectories with a numerical PDE solver under diverse time-varying controls $u(x, t)$. We sample admissible control sequences on the grid from a zero-mean Gaussian random field (GRF) with prescribed spatial covariance (and optional temporal correlation), after which a clipping operation is applied to meet the actuator bounds $[u_{\min}, u_{\max}]$. Rolling the solver with these inputs produces states satisfying (1b)–(1f). Each trajectory is converted into one-step training pairs. Formally, the dataset consists of triples $\big(\mathbf{y}(t_k),\, \mathbf{u}(t_k),\, \mathbf{y}(t_{k+1})\big)$, with $\mathbf{u}(t_k) = [u(x_1, t_k), \ldots, u(x_n, t_k)]$ and $\mathbf{y}(t_k) = [y(x_1, t_k), \ldots, y(x_n, t_k)]$. $\mathcal{D}_\phi$ is trained to minimize the following loss:
$$\mathcal{L}(\phi) = \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{y}(t_k),\, \mathbf{u}(t_k),\, \mathbf{y}(t_{k+1})) \in \mathcal{D}} \big\| \mathcal{D}_\phi\big(\mathbf{y}(t_k), \mathbf{u}(t_k)\big) - \mathbf{y}(t_{k+1}) \big\|_2^2. \qquad \text{(49)}$$
9.5 MPC for Direct Control Field
Here we describe a "direct-$u$" variant for viscous Burgers with Dirichlet boundaries, where the control action is parameterized at every spatial collocation point. At each time instant, the optimization task consists of finding a vector $\mathbf{u}(t_k) \in \mathbb{R}^n$ (i.e., one decision variable per spatial collocation point), and applying an implicit Crank–Nicolson step to propagate the state in time from $t_k$ to $t_{k+1}$.
$$\min_{\mathbf{u}_k, \ldots, \mathbf{u}_{k+H-1}}\; \sum_{j=k}^{k+H-1} \Big( \big\| \mathbf{v}^{j+1} - \mathbf{v}_T \big\|_{\mathbf{Q}}^2 + \beta\, \| \mathbf{u}_j \|^2 \Big) \qquad \text{(50)}$$
$$\text{s.t.} \quad \mathbf{v}^{j+1} = \Psi\big(\mathbf{v}^{j}, \mathbf{u}_j\big), \qquad \mathbf{u}_{\min} \le \mathbf{u}_j \le \mathbf{u}_{\max}.$$
In the closed-loop case (NMPC), a horizon of length $H$ yields $nH$ decision variables per iteration; we warm start across receding horizons and use bound-constrained quasi-Newton solvers (e.g., L-BFGS-B/SLSQP). Here $\mathbf{Q}$ represents the weight matrix, which plays a role analogous to the Lagrange multipliers of our PDE-OP training algorithm. To compute the gradients, we either backpropagate through the discrete CN integrator using automatic differentiation or use the fully discrete adjoint; both methods are exact but require one forward rollout and one backward sweep over the horizon. This grid-wise formulation removes any basis bias and shows the attainable tracking when the controller has full spatial freedom. However, while direct modeling of the control action does not require any basis representation, its computational cost is higher than when a basis representation is adopted: the per-step cost scales roughly with $nH$, due to line-search/gradient evaluations entailing multiple CN solves. We therefore include it as a comprehensive baseline to contextualize the efficiency of our reduced-parameter (mode-based) controllers.
9.6 Experimental Results For Direct Control Field
When using the direct control field modeling for $c$, we evaluate each method on the nonlinear, time-variant optimal control of the 1D Burgers’ equation, using the same target profiles presented in Section 5.3. Tables 5-6 report the runtime (s) and terminal tracking error (MSE) of each method. All methods use the same spatial/temporal discretization and solver settings as in Section 10. Figure 9 shows the predicted state (central sub-figure of Fig. 9) of PDE-OP’s dynamic component at test time, which is compared to the solution (left sub-figure) obtained with a numerical PDE solver. Figures 10 and 11 compare PDE-OP’s predicted terminal state with those obtained using the baseline methods, over the same target profiles introduced in the main text. In Figure 10, NMPC and the Adjoint method use the settings of Table 5, while in Figure 11 they use the settings of Table 6.
Table 5: Runtime (s) and terminal tracking MSE for the direct control field on the 1D Burgers’ equation (settings used for Figure 10).
Method | Parabola Runtime | Parabola MSE | Sine Runtime | Sine MSE | Zero Runtime | Zero MSE
NMPC | 39.254s | 0.00002 | 76.368s | 0.00014 | 36.483s | 7.9141e-06
Adjoint-Method | 47.805s | 0.00124 | 79.644s | 0.00317 | 24.956s | 2.7618e-08
PDE-OP (ours) | 0.267s | 0.00003 | 0.256s | 0.00008 | 0.257s | 0.00008
Results.
Across all targets, PDE-OP is two orders of magnitude faster than the classical methods, roughly 100×–300× faster than NMPC and the Adjoint method. In terms of accuracy, for the sine target PDE-OP achieves the lowest MSE; for the parabola, NMPC attains the best result with PDE-OP close behind; for the zero target, the Adjoint method achieves the lowest MSE, with PDE-OP producing an MSE of $8 \times 10^{-5}$.
Table 6: Runtime (s) and terminal tracking MSE for the direct control field on the 1D Burgers’ equation (settings used for Figure 11).
Method | Parabola Runtime | Parabola MSE | Sine Runtime | Sine MSE | Zero Runtime | Zero MSE
NMPC | 826.176s | 0.00016 | 3143.735s | 0.00032 | 470.851s | 0.00061
Adjoint-Method | 186.910s | 0.00129 | 263.891s | 0.00316 | 51.702s | 4.9213e-09
PDE-OP (ours) | 0.254s | 0.00003 | 0.277s | 0.00008 | 0.956s | 0.00008
Results.
Under the settings of Table 6, PDE-OP remains dramatically faster than the classical solvers, by three to four orders of magnitude. For the parabola target, PDE-OP is roughly 3,250× faster than NMPC and 735× faster than the Adjoint method, while also reporting the best MSE ($3 \times 10^{-5}$). For the sine target, the speedups are roughly 11,350× over NMPC and 950× over the Adjoint method, while again attaining the lowest MSE ($8 \times 10^{-5}$). For the zero target, the Adjoint method attains near-perfect regulation ($4.9 \times 10^{-9}$); PDE-OP is still the fastest, running in 0.956s, approximately 54× faster than the Adjoint method and 490× faster than NMPC, while reporting a very small tracking error ($8 \times 10^{-5}$).
[Figures 10 and 11: terminal states predicted by PDE-OP compared with those of the baseline methods for each target profile, under the settings of Tables 5 and 6, respectively.]
10 Hyperparameter Settings
In Tables 7, 8, and 9, we provide the hyperparameter values of the PDE-OP model and the baseline methods for the voltage optimization experiments; Tables 10, 11, and 12 report the hyperparameter values of the PDE-OP model and the baseline methods for the heat equation experiments; lastly, Tables 13, 14, and 15 report the hyperparameter values of the PDE-OP model and the baseline methods for the Burgers’ equation experiments. For a fair evaluation, each method adopts the same PDE, grid, and horizon parameters.
MLP [33] denotes a fully connected feedforward network.
Category | Hyperparameter | Value |
PDE & Simulation | Domain length | |
Final time | ||
Diffusion | ||
Leakage | ||
Control gain | ||
Initial state | ||
Reference | ||
Control bounds | ||
Discretization | Solver grid | |
Solver steps | ||
Number of sensors | ||
PDE-OP’s dynamic predictor | Learning rate |
Number of epochs | ||
Optimizer | Adam | |
Batch size | ||
Activation function | SiLU | |
Lagrange Multiplier | ||
Branch net : # layers | ||
Branch net : hidden sizes | ||
Trunk net : # layers | ||
Trunk net : hidden sizes | ||
Feature dimension | ||
PDE-OP’s optimization surrogate | Learning rate |
Number of epochs | ||
Optimizer | AdamW | |
Batch size | ||
Activation function | ReLU |
MLP: # layers | ||
MLP: hidden sizes |
Category | Hyperparameter | Value / Note |
Optimizer | Algorithm | L-BFGS-B (box bounds)
Max iterations | 100 | |
Tolerances | ftol (and/or gtol)
Initialization | (uniform) | |
Finite diff. | Stencil | forward (optionally central) |
Step size |
Cost eval | Rollout solver | CN |
Runtime note | n+1 cost evals per gradient for forward differences (2n for central)
Category | Hyperparameter | Value / Note |
Optimizer | Algorithm | L-BFGS-B (box bounds)
Max iterations | 100 | |
Tolerances | ftol (and/or gtol)
Initialization | ||
Adjoint solves | Forward stepper | CN |
Backward sweep | discrete adjoint (one pass) | |
Numerics | Linear solves | direct (LU) or iterative (to tolerance)
LSTM [34] denotes a Long Short-Term Memory network.
Category | Hyperparameter | Value |
PDE & Simulation | Domain length | |
Final time | ||
Diffusion | ||
Leakage | ||
Control gain | ||
Initial state | ||
Reference | ||
Weight bounds | ||
Discretization | Solver grid | |
Solver steps | ||
Number of sensors | ||
Number of basis functions | ||
PDE-OP’s dynamic predictor | Learning rate |
Number of epochs | ||
Optimizer | AdamW | |
Batch size | ||
Activation function | ReLU |
Branch net : # layers | ||
Branch net : hidden sizes | ||
Trunk net : # layers | ||
Trunk net : hidden sizes | ||
Feature dimension | ||
PDE-OP’s optimization surrogate | Learning rate |
Number of epochs | ||
Optimizer | AdamW | |
Batch size | ||
Activation function | ReLU |
LSTM: # layers | ||
LSTM: hidden sizes |
Category | Parameter | Value / Note |
Horizon | Prediction length | |
Dynamics | State update | |
Discretization | Crank–Nicolson (linear) | |
Actuation | Basis∗ |
Solver | QP solver | OSQP (CVXPY), warm start |
Stopping | Tolerances | default OSQP (primal/dual res.) |
Category | Parameter | Value / Note |
Horizon | Prediction length | |
Forward | Integrator | Crank–Nicolson (linear), same as NMPC
Adjoint | Backward recursion | |
Gradient | Per step | |
Optimizer | Algorithm | L-BFGS-B, warm start |
Stopping | Max iters / tolerances | 100; |
LSTM [34] denotes a Long Short-Term Memory network.
Category | Hyperparameter | Value |
PDE & Simulation | Domain length | |
Final time | ||
Viscosity | ||
Weight bounds | ||
Discretization | Solver grid | |
Solver steps | ||
Number of sensors | ||
Number of basis functions | ||
PDE-OP’s dynamic predictor | Learning rate |
Number of epochs | ||
Optimizer | AdamW | |
Batch size | ||
Activation function | ReLU |
Branch net : # layers | ||
Branch net : hidden sizes | ||
Trunk net : # layers | ||
Trunk net : hidden sizes | ||
Feature dimension | ||
PDE-OP’s optimization surrogate | Learning rate |
Number of epochs | ||
Optimizer | AdamW | |
Batch size | ||
Activation function | ReLU |
LSTM: # layers | ||
LSTM: hidden sizes |
Category | Parameter | Value / Note |
Horizon | Prediction length | e.g. (receding horizon) |
Dynamics | Implicit step (CN) | |
Actuation | Basis∗ |
Solver | Nonlinear optimizer | SLSQP (single-shooting), box bounds; warm start by shift |
CN inner | Newton solve per step | tolerance, max iters
Category | Parameter | Value / Note |
Horizon | Prediction length | 10
Forward | Implicit CN dynamics | as in NMPC, Dirichlet enforced each step |
Jacobian | Semi-discrete residual | |
Adjoint | Backward sweep | ; |
Gradient | Per-step control gradient | |
Optimizer | Outer optimizer | L-BFGS-B, warm start |