Efficient Norm-Based Reachable Sets via
Iterative Dynamic Programming

Akash Harapanahalli and Samuel Coogan
Akash Harapanahalli and Samuel Coogan are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30318, USA. {aharapan,sam.coogan}@gatech.edu
Abstract

In this work, we present a numerical optimal control framework for reachable set computation using normotopes, a new set representation consisting of a norm ball with a shaping matrix. In reachable set computations, we expect to continuously vary the shape matrix as a function of time. Incorporating the shape dynamics as an input, we build a controlled embedding system using a linear differential inclusion that overapproximates the system dynamics; a single forward simulation of this embedding system always provides an overapproximating reachable set, no matter the choice of hypercontrol. By iteratively solving a linear quadratic approximation of the nonlinear optimal hypercontrol problem, we synthesize less conservative final reachable sets, providing a natural tradeoff between runtime and accuracy. Terminating our algorithm at any point always returns a valid reachable set overapproximation.

I Introduction

One way to assess the safety of a complex nonlinear system under uncertainty is to verify the specification for its forward reachable set. For nonlinear systems, the reachable set is often too complex to determine in closed form—instead, various algorithmic methods have been developed to compute guaranteed overapproximations. To combat conservativeness in these algorithmic strategies, different set representations have been developed, with different tradeoffs between reachable set accuracy and computational efficiency. To name a few, there are zonotopes [1], constrained zonotopes [2], polynomial zonotopes [3], hybrid zonotopes [4], ellipsotopes [5], and Taylor models [6]. For linear systems, any generalized star set [7] can be propagated using only n+1 simulations. Beyond these generator-based approaches, there are many other frameworks for reachable set computation. For instance, level set methods use the Hamilton-Jacobi equations [8, 9, 10] to represent the reachable set as a level set of a solution to a partial differential equation.

From a dual perspective, one can evolve halfspaces in linear systems using the adjoint equation [11]. This dual formulation was recently re-explored in [12], which considers general parametric set representations defined by the intersection of constraints, instead of ones generated by constrained combinations. As shown in [12], the parameters defining the constraints can be arbitrarily updated as long as the offset term compensates accordingly. This observation is the backbone of this work—we directly optimize over the dynamics of these parameters to minimize the final volume of our overapproximating reachable set.

Computing reachable sets using norm balls is not a novel approach on its own, and there are many works applying results from contraction theory to compute such overapproximations. For instance, [13, 14] use bounds on logarithmic norms to bloat or shrink norm balls along nominal trajectories. The works [15, 16] use interval analysis to overapproximate the Jacobian matrix to automatically bound the logarithmic norm. More recently, [17] uses a more accurate interval matrix to bound the logarithmic norm by comparing specifically to the nominal trajectory. The advantage of these works is the simplicity of computing the logarithmic norm, which is often known in closed form—for example, as a simple eigenvalue maximization in the \ell_{2} case, or a maximum row/column sum computation in the \ell_{\infty}/\ell_{1} cases [18, 19]. However, to our knowledge, existing works fix the norm along a portion of the trajectory, using semidefinite programming to find a single shaping matrix along that segment. In this work, instead of successively choosing fixed norms for segments of the trajectory using semidefinite programming, we consider the shaping matrix as a continuous curve we can dynamically control, and use iterative dynamic programming along the nominal trajectory to find smoothly varying shaping matrices.
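For reference, these closed-form logarithmic norms are simple to evaluate; the following is a minimal sketch in JAX, with function names that are ours and not tied to any particular toolbox.

```python
import jax.numpy as jnp

def log_norm_2(A):
    # mu_2(A): largest eigenvalue of the symmetric part (A + A^T)/2
    return jnp.max(jnp.linalg.eigvalsh((A + A.T) / 2.0))

def log_norm_inf(A):
    # mu_inf(A): max over rows of the diagonal entry plus the
    # absolute off-diagonal row sum
    off_diag = jnp.abs(A) - jnp.diag(jnp.abs(jnp.diag(A)))
    return jnp.max(jnp.diag(A) + jnp.sum(off_diag, axis=1))

def log_norm_1(A):
    # mu_1(A): same formula applied column-wise, i.e. mu_inf of the transpose
    return log_norm_inf(A.T)
```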

Contributions

In this work, we reformulate the reachable set computation problem as an optimal control problem. First, we propose a simple constraint-based set representation called a normotope, which is the sublevel set of a norm with a specified shaping matrix, center, and radius. We build a controlled embedding system for the original system using interval-based linear differential inclusions and logarithmic norms, with the observation that we may smoothly vary the shape matrix of the normotope as long as the offset term compensates accordingly—we call this the hypercontrol.

Next, we pose the following question: given an initial set for the system, can we find a hypercontrol mapping through the embedding space such that the final reachable set has minimal volume? To address this problem, we define an optimal control problem to synthesize hypercontrol policies resulting in small terminal set volume. We show how dynamic programming algorithms like iLQR can trade extra computation to iteratively optimize the accuracy of the reachable set estimate with respect to the hypercontrol. With two case studies, we show (i) how just a few cheap iterations can dramatically improve the reachable set accuracy, and (ii) how the algorithm can learn nontrivial policies which accumulate extra uncertainty earlier in the trajectory to shrink the final overapproximating reachable set.

II Background and Mathematical Preliminaries

II-A Notations

Let \|\cdot\| denote a norm. Let I denote the identity matrix of appropriate dimension and \mathbf{1} denote the vector of all ones of appropriate dimension. For a linear operator A:\mathbb{R}^{n_{x}}\to\mathbb{R}^{n_{y}} between normed vector spaces, we set \|A\|_{x\to y}=\sup_{x\in\mathbb{R}^{n_{x}}:\|x\|=1}\|Ax\| to be the operator norm of A. For a linear operator A:\mathbb{R}^{n_{x}}\to\mathbb{R}^{n_{x}}, define the induced logarithmic norm \mu(A)=\limsup_{h\searrow 0}\frac{\|I+hA\|_{x\to x}-1}{h}. GL(n) is the set of invertible n\times n matrices. Let \operatorname{co}\mathcal{S} denote the convex hull of \mathcal{S}.

II-B Nonlinear Systems and Reachable Sets

We consider nonlinear dynamical systems of the form

x˙=f(t,x,w),\displaystyle\dot{x}=f(t,x,w), (1)

where xnxx\in\mathbb{R}^{n_{x}} is the state of the system, wnww\in\mathbb{R}^{n_{w}} is a disturbance input to the system, and f:0×nx×nwnxf:\mathbb{R}_{\geq 0}\times\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{w}}\to\mathbb{R}^{n_{x}} is a C1C^{1} smooth function.

Under these assumptions, the system (1) has a unique trajectory starting from any initial condition x0nxx_{0}\in\mathbb{R}^{n_{x}} at t0t_{0}, under essentially bounded disturbance map 𝐰:[t0,)nw\mathbf{w}:[t_{0},\infty)\to\mathbb{R}^{n_{w}} defined for some neighborhood of t0t_{0}. Let ϕf(t;t0,x0,𝐰)\phi_{f}(t;t_{0},x_{0},\mathbf{w}) denote this trajectory.

Suppose we are given an initial set 𝒳0nx\mathcal{X}_{0}\subseteq\mathbb{R}^{n_{x}} at initial time t0t_{0}, and a compact disturbance set 𝒲nw\mathcal{W}\subset\mathbb{R}^{n_{w}}. Then we can define the (forward) reachable set of the system as

\mathcal{R}_{f}(t;t_{0},\mathcal{X}_{0},\mathcal{W}):=\{\phi_{f}(t;t_{0},x_{0},\mathbf{w}):x_{0}\in\mathcal{X}_{0},\ \mathbf{w}:[t_{0},t_{f}]\to\mathcal{W}\}.

The reachable set is generally intractable to find in closed-form, both analytically and computationally, as it requires an infinite number of forward simulations of the system. The goal of this work and many others in the literature [20] is to efficiently compute a set overapproximation ¯f(t;t0,𝒳0,𝒲)f(t;t0,𝒳0,𝒲)\overline{\mathcal{R}}_{f}(t;t_{0},\mathcal{X}_{0},\mathcal{W})\supseteq\mathcal{R}_{f}(t;t_{0},\mathcal{X}_{0},\mathcal{W}).

An overapproximation of the true reachable set can verify several safety specifications for the system—for instance, reach-avoid problems can be posed as

\underbrace{\overline{\mathcal{R}}(t_{f};t_{0},\mathcal{X}_{0},\mathcal{W})\subseteq\mathcal{G}}_{\text{reach }\mathcal{G}\text{ at }t=t_{f}},\quad\underbrace{\overline{\mathcal{R}}(t;t_{0},\mathcal{X}_{0},\mathcal{W})\subseteq\mathcal{A}^{\complement}\ \forall t\in[t_{0},t_{f}]}_{\text{avoid }\mathcal{A}\text{ for every }t}.

In the coming sections, we propose a new set representation called a normotope, build a controlled embedding system whose trajectory provides a normotope overapproximating the true reachable set, and optimize over the choice of hypercontrol mappings in this embedding space.

III Normotopes: A Simple Constraint-Based Set Representation

In this section, we introduce the normotope, some basic properties, and a simple recipe to compute the normotope reachable set of a linear system using the adjoint dynamics.

III-A Definition and Examples

Definition 1.

Given a norm \|\cdot\|, a normotope is the set

x̊,α,y:={xnx:α(xx̊)y},\displaystyle\llbracket{\mathring{x}},\alpha,y\rrbracket:=\{x\in\mathbb{R}^{n_{x}}:\|\alpha(x-{\mathring{x}})\|\leq y\},

where {\mathring{x}}\in\mathbb{R}^{n_{x}} is the center, \alpha\in GL(n_{x}) is the square invertible n_{x}\times n_{x} shape matrix, and y>0 is the offset.

Normotopes are a special case of a parametope, the general constraint-based set representation introduced in [12], and are dual to basic ellipsotopes [5] (letting z=αy(xx̊)z=\tfrac{\alpha}{y}(x-{\mathring{x}}), x̊,α,y={yα1z+x̊:z1}\llbracket{\mathring{x}},\alpha,y\rrbracket=\{y\alpha^{-1}z+{\mathring{x}}:\|z\|\leq 1\}, which is a basic ellipsotope with center x̊{\mathring{x}} and generator matrix yα1y\alpha^{-1}).
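For concreteness, a minimal sketch of this representation as a JAX-compatible container is given below; the class and helper names are illustrative and not necessarily the implementation referenced later in Section V.

```python
from typing import NamedTuple
import jax.numpy as jnp

class Normotope(NamedTuple):
    # NamedTuples are automatically valid JAX pytrees, so a Normotope can be
    # passed through jax transformations and differentiated.
    center: jnp.ndarray   # x-ring in R^n
    shape: jnp.ndarray    # alpha, an invertible n-by-n matrix
    offset: jnp.ndarray   # y > 0, a scalar

def contains(nt: Normotope, x, norm=jnp.linalg.norm):
    # Membership test  ||alpha (x - center)|| <= y  for the chosen norm.
    return norm(nt.shape @ (x - nt.center)) <= nt.offset
```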

Example 1 (2\ell_{2} normotopes).

The normotope

x̊,α,y2\displaystyle\llbracket{\mathring{x}},\alpha,y\rrbracket_{2} ={xnx:α(xx̊)2y}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:\|\alpha(x-{\mathring{x}})\|_{2}\leq y\}
={xnx:(xx̊)TαTα(xx̊)y2}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:(x-{\mathring{x}})^{T}\alpha^{T}\alpha(x-{\mathring{x}})\leq y^{2}\}

is equivalent to an ellipsoid centered at x̊{\mathring{x}}, with weighting matrix αTα\alpha^{T}\alpha and radius y2y^{2}. αGL(n)\alpha\in GL(n) implies αTα0\alpha^{T}\alpha\succ 0.

Example 2 (\ell_{\infty} normotopes).

The normotope

x̊,α,y\displaystyle\llbracket{\mathring{x}},\alpha,y\rrbracket_{\infty} ={xnx:α(xx̊)y}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:\|\alpha(x-{\mathring{x}})\|_{\infty}\leq y\}
={xnx:y𝟏α(xx̊)y𝟏}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:-y\mathbf{1}\leq\alpha(x-{\mathring{x}})\leq y\mathbf{1}\}
={xnx:αx̊y𝟏αxαx̊+y𝟏}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:\alpha{\mathring{x}}-y\mathbf{1}\leq\alpha x\leq\alpha{\mathring{x}}+y\mathbf{1}\}

is equivalent to the H-rep polytope given by {x:Hxb}\{x:Hx\leq b\}, H=[αα]H=\begin{bmatrix}\alpha\\ -\alpha\end{bmatrix}, b=[αx̊+y𝟏αx̊+y𝟏]b=\begin{bmatrix}\alpha{\mathring{x}}+y\mathbf{1}\\ -\alpha{\mathring{x}}+y\mathbf{1}\end{bmatrix}, with 2n2n faces and 2n2^{n} vertices.

Example 3 (1\ell_{1} normotopes).

The normotope

x̊,α,y1\displaystyle\llbracket{\mathring{x}},\alpha,y\rrbracket_{1} ={xnx:α(xx̊)1y}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:\|\alpha(x-{\mathring{x}})\|_{1}\leq y\}
={xnx:y𝟏Sα(xx̊)y𝟏}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:-y\mathbf{1}\leq S\alpha(x-{\mathring{x}})\leq y\mathbf{1}\}
={xnx:Sαx̊y𝟏SαxSαx̊+y𝟏}\displaystyle=\{x\in\mathbb{R}^{n_{x}}:S\alpha{\mathring{x}}-y\mathbf{1}\leq S\alpha x\leq S\alpha{\mathring{x}}+y\mathbf{1}\}

where S2nx×nxS\in\mathbb{R}^{2^{n_{x}}\times{n_{x}}} is the matrix consisting of all 2nx2^{n_{x}} possible choices of signs, i.e., the kk-th row Sk=(s1k,,snk)S_{k}=(s^{k}_{1},\dots,s^{k}_{n}) where sjk{+1,1}s^{k}_{j}\in\{+1,-1\}. is equivalent to the H-rep polytope given by {x:Hxb}\{x:Hx\leq b\}, H=[SαSα]H=\begin{bmatrix}S\alpha\\ -S\alpha\end{bmatrix}, b=[Sαx̊+y𝟏Sαx̊+y𝟏]b=\begin{bmatrix}S\alpha{\mathring{x}}+y\mathbf{1}\\ -S\alpha{\mathring{x}}+y\mathbf{1}\end{bmatrix}, with 2n2^{n} faces and 2n2n vertices.

Remark 1 (Nomenclature).

We choose the name “normotope” over “norm-ball” to emphasize how we expect the shape matrix α\alpha to vary along reachable set trajectories.

III-B The LTV Case

In the undisturbed linear time-varying case (LTV) with a normotope initial set, the adjoint equation provides the true reachable set of the system as a normotope with fixed offset.

Proposition 1.

For the LTV system

x˙(t)=A(t)x(t),\displaystyle\dot{x}(t)=A(t)x(t),

the true reachable set is given by

f(t;t0,x̊0,α0,y0)=x̊(t),α(t),y0,\displaystyle\mathcal{R}_{f}(t;t_{0},\llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket)=\llbracket{\mathring{x}}(t),\alpha(t),y_{0}\rrbracket,

where tx̊(t),α(t)t\mapsto{\mathring{x}}(t),\alpha(t) satisfy the following ODEs,

x̊˙(t)\displaystyle\dot{{\mathring{x}}}(t) =A(t)x̊(t),α˙(t)=α(t)A(t).\displaystyle=A(t){\mathring{x}}(t),\quad\dot{\alpha}(t)=-\alpha(t)A(t).

with initial conditions x̊(t0)=x̊0{\mathring{x}}(t_{0})={\mathring{x}}_{0}, α(t0)=α0\alpha(t_{0})=\alpha_{0}.

Proof.

We first recall how the flow map is given by ϕf(t;t0,x0)=Φ(t;t0)x0\phi_{f}(t;t_{0},x_{0})=\Phi(t;t_{0})x_{0}, where Φ\Phi is the state transition matrix satisfying the matrix ODE

Φ˙(t;t0)=A(t)Φ(t;t0),Φ(t0;t0)=𝐈,\displaystyle\dot{\Phi}(t;t_{0})=A(t)\Phi(t;t_{0}),\quad\Phi(t_{0};t_{0})=\mathbf{I},

with the property Φ1(t;t0)=Φ(t0;t)\Phi^{-1}(t;t_{0})=\Phi(t_{0};t). Thus, the true reachable set is given by

(t;t0,x̊0,α0,y0)={Φ(t;t0)x:α0(xx̊0)y0}\displaystyle\mathcal{R}(t;t_{0},\llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket)=\{\Phi(t;t_{0})x:\|\alpha_{0}(x-{\mathring{x}}_{0})\|\leq y_{0}\}
={z:α0(Φ(t0;t)zx̊0)y0}\displaystyle=\{z:\|\alpha_{0}(\Phi(t_{0};t)z-{\mathring{x}}_{0})\|\leq y_{0}\}
={z:α0Φ(t0;t)(zΦ(t;t0)x̊0)y0}\displaystyle=\{z:\|\alpha_{0}\Phi(t_{0};t)(z-\Phi(t;t_{0}){\mathring{x}}_{0})\|\leq y_{0}\}
=Φ(t;t0)x̊,α0Φ(t0;t),y0=x̊(t),α(t),y0,\displaystyle=\llbracket\Phi(t;t_{0}){\mathring{x}},\alpha_{0}\Phi(t_{0};t),y_{0}\rrbracket=\llbracket{\mathring{x}}(t),\alpha(t),y_{0}\rrbracket,

where α(t)\alpha(t) satisfies the following,

α˙(t)=α0ddtΦ(t;t0)1=α0Φ(t;t0)1Φ˙(t;t0)Φ(t;t0)1\displaystyle\dot{\alpha}(t)=\alpha_{0}\frac{\mathrm{d}}{\mathrm{d}t}\Phi(t;t_{0})^{-1}=-\alpha_{0}\Phi(t;t_{0})^{-1}\dot{\Phi}(t;t_{0})\Phi(t;t_{0})^{-1}
=α0Φ(t0;t)A(t)Φ(t;t0)Φ(t;t0)1=α(t)A(t),\displaystyle=-\alpha_{0}\Phi(t_{0};t)A(t)\Phi(t;t_{0})\Phi(t;t_{0})^{-1}=-\alpha(t)A(t),

and x̊(t){\mathring{x}}(t) satisfies x̊˙(t)=A(t)x̊(t)\dot{{\mathring{x}}}(t)=A(t){\mathring{x}}(t). ∎

We can interpret the ODE α˙(t)=α(t)A(t)\dot{\alpha}(t)=-\alpha(t)A(t) as each row αjT\alpha_{j}^{T} evolving according to the adjoint dynamics α˙j(t)=A(t)Tαj(t)\dot{\alpha}_{j}(t)=-A(t)^{T}\alpha_{j}(t).
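As a quick code sketch of Proposition 1, the center and shape ODEs can be integrated directly; the following assumes simple Euler integration, with illustrative function names.

```python
import jax.numpy as jnp

def ltv_normotope_flow(A_fn, x0, alpha0, t0, tf, dt=0.01):
    # Euler integration of the Proposition 1 ODEs: the center follows the
    # dynamics, the shape matrix follows the adjoint dynamics, and the
    # offset y stays constant.
    x, alpha, t = x0, alpha0, t0
    while t < tf:
        x = x + dt * (A_fn(t) @ x)
        alpha = alpha - dt * (alpha @ A_fn(t))
        t = t + dt
    return x, alpha
```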

In the nonlinear systems case, the true reachable set is generally not itself a normotope. In the next section, we demonstrate how a similar embedding system approach can obtain guaranteed overapproximating normotope reachable sets by adding an appropriate dynamic compensation term to the offset yy using logarithmic and induced matrix norms.

IV Normotope Embeddings

In Proposition 1, we showed that the true reachable set in the linear time-varying case was given by a single forward simulation of another ODE (what we will refer to as an embedding system), by evolving the center x̊{\mathring{x}} using the system dynamics and the rows of the shape matrix α\alpha using the adjoint dynamics. In this section, for nonlinear systems, we demonstrate how a similar controlled embedding approach can be used to bound the behavior of any nonlinear dynamics.

IV-A Dynamics Abstraction: Linear Differential Inclusions

A linear differential inclusion (LDI) [21, p. 51] encompassing the error dynamics of the system (1) is an inclusion such that, for prespecified {\mathring{x}}\in\mathcal{X}\subseteq\mathbb{R}^{n_{x}} and {\mathring{w}}\in\mathcal{W}\subseteq\mathbb{R}^{n_{w}}, and for every x\in\mathcal{X} and w\in\mathcal{W},

f(t,x,w)f(t,x̊,ẘ)x(t)(xx̊)+w(t)(wẘ),\displaystyle f(t,x,w)-f(t,{\mathring{x}},{\mathring{w}})\in\mathcal{M}^{x}(t)(x-{\mathring{x}})+\mathcal{M}^{w}(t)(w-{\mathring{w}}), (2)

where x(t)nx×nx\mathcal{M}^{x}(t)\subseteq\mathbb{R}^{{n_{x}}\times{n_{x}}} and w(t)nx×nw\mathcal{M}^{w}(t)\subseteq\mathbb{R}^{{n_{x}}\times{n_{w}}} are sets of matrices. There are many ways to obtain matrix sets satisfying (2). For example, if co¯{fx(t,x,w)}x(t)\overline{\operatorname{co}}\{\frac{\partial f}{\partial x}(t,x,w)\}\subseteq\mathcal{M}^{x}(t) and co¯{fw(t,x,w)}w(t)\overline{\operatorname{co}}\{\frac{\partial f}{\partial w}(t,x,w)\}\subseteq\mathcal{M}^{w}(t) for every (x,w)𝒳×𝒲(x,w)\in\mathcal{X}\times\mathcal{W} (where co¯\overline{\operatorname{co}} is the closed convex hull), then (2) holds for every (x,w)𝒳×𝒲(x,w)\in\mathcal{X}\times\mathcal{W} [17, Prop. 1], [21, p.55].

We briefly discuss an algorithmic method from [17, Cor. 1] to obtain the matrices \mathcal{M}^{x} and \mathcal{M}^{w} using interval analysis on the first partial derivatives of f. Suppose we are bounding a C^{1} function g:\mathbb{R}^{n}\to\mathbb{R}^{m}, and we are given an interval set Z_{1}\times\cdots\times Z_{n}\subseteq\mathbb{R}^{n}. Then the following implication holds, for fixed \mathring{z}\in Z_{1}\times\cdots\times Z_{n},

gixj(Z1,,Zj,z̊j+1,,z̊n)[]ij\displaystyle\frac{\partial g_{i}}{\partial x_{j}}(Z_{1},\dots,Z_{j},\mathring{z}_{j+1},\dots,\mathring{z}_{n})\subseteq[\mathcal{M}]_{ij}
g(z)g(z̊)[](zz̊),\displaystyle\implies g(z)-g(\mathring{z})\in[\mathcal{M}](z-\mathring{z}),

for every z\in Z_{1}\times\cdots\times Z_{n} [17, Cor. 1], where the RHS is evaluated using interval matrix multiplication [22]. We can apply this procedure to the map f(t,\cdot,\cdot):\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{w}}\to\mathbb{R}^{n_{x}} by defining an equivalent map \hat{f}_{t}:\mathbb{R}^{n_{x}+n_{w}}\to\mathbb{R}^{n_{x}} to obtain the bound

f(t,x,w)-f(t,{\mathring{x}},{\mathring{w}})\in[\mathcal{M}(t)]\begin{bmatrix}x-{\mathring{x}}\\ w-{\mathring{w}}\end{bmatrix},

and finally, we can split [(t)]nx×(nx+nw)[\mathcal{M}(t)]\subseteq\mathbb{R}^{{n_{x}}\times({n_{x}}+{n_{w}})} into two interval matrices [x(t)]nx×nx[\mathcal{M}^{x}(t)]\subseteq\mathbb{R}^{{n_{x}}\times{n_{x}}} and [w(t)]nx×nw[\mathcal{M}^{w}(t)]\subseteq\mathbb{R}^{{n_{x}}\times{n_{w}}}.

We simplify the expression by replacing the interval matrices [\mathcal{M}^{x}(t)] and [\mathcal{M}^{w}(t)] with the convex hull of a finite set of corners. For example, if the interval matrix [\mathcal{M}^{x}(t)] has 4 non-singleton entries, then there are 2^{4}=16 corners M_{i}^{x}(t) to consider. Thus, we obtain a bound of the following form,

f(t,x,w)\displaystyle f(t,x,w)- f(t,x̊,ẘ)\displaystyle f(t,{\mathring{x}},{\mathring{w}})\in
co{Mix(t)}i(xx̊)+co{Mjw(t)}j(wẘ).\displaystyle\operatorname{co}\{M_{i}^{x}(t)\}_{i}(x-{\mathring{x}})+\operatorname{co}\{M_{j}^{w}(t)\}_{j}(w-{\mathring{w}}).

As we will see in Theorem 1 below, these finite sets will help us build a simple embedding system by evaluating logarithmic and induced matrix norms over each corner.
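A minimal sketch of the corner enumeration is given below, in plain Python/NumPy since the number of corners is data dependent; the function name and tolerance argument are illustrative.

```python
import itertools
import numpy as np

def interval_matrix_corners(M_lower, M_upper, tol=0.0):
    # Enumerate the corner matrices of an interval matrix [M_lower, M_upper]:
    # every entry with width > tol is set to either its lower or upper bound.
    # With k non-singleton entries this returns 2**k corner matrices.
    idx = np.argwhere(M_upper - M_lower > tol)
    corners = []
    for choice in itertools.product((0, 1), repeat=len(idx)):
        M = M_lower.copy()
        for (i, j), c in zip(idx, choice):
            M[i, j] = M_upper[i, j] if c else M_lower[i, j]
        corners.append(M)
    return corners
```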

IV-B Controlled Embedding System

In Theorem 1, we build a particularly well-structured embedding system for the special case of a normotope set, using a linear differential inclusion that bounds the error dynamics relative to the center of the normotope.

Theorem 1 (Controlled normotope embedding).

Let \|\cdot\| be a differentiable norm, the 1\ell_{1}-norm, or the \ell_{\infty}-norm. Let μ\mu be the induced logarithmic norm. Define the following controlled embedding system,

x̊˙(t)\displaystyle\dot{{\mathring{x}}}(t) =f(t,x̊,ẘ),\displaystyle=f(t,{\mathring{x}},{\mathring{w}}), (3a)
α˙(t)\displaystyle\dot{\alpha}(t) =U(t,x̊,α,y),\displaystyle=U(t,{\mathring{x}},\alpha,y), (3b)
y˙(t)=(maxiμ(U(t,x̊,α,y)α1+αMix(t)α1))y+maxjαMjw(t)wx,\displaystyle\begin{split}\dot{y}(t)&=\left(\max_{i}\mu(U(t,{\mathring{x}},\alpha,y)\alpha^{-1}+\alpha M^{x}_{i}(t)\alpha^{-1})\right)y\\ &\quad\quad+\max_{j}\|\alpha M^{w}_{j}(t)\|_{w\to x},\end{split} (3c)

where f(t,x,w)f(t,x̊,ẘ)co{Mix(t)}(xx̊)+co{Miw(t)}(wẘ)f(t,x,w)-f(t,{\mathring{x}},{\mathring{w}})\in\operatorname{co}\{M^{x}_{i}(t)\}(x-{\mathring{x}})+\operatorname{co}\{M^{w}_{i}(t)\}(w-{\mathring{w}}) for every (x,w)x̊,α,y×𝒲(x,w)\in\partial\llbracket{\mathring{x}},\alpha,y\rrbracket\times\mathcal{W}. Then for every t0t\geq 0,

f(t,x̊0,α0,y0,𝒲)x̊(t),α(t),y(t),\displaystyle\mathcal{R}_{f}(t,\llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket,\mathcal{W})\subseteq\llbracket{\mathring{x}}(t),\alpha(t),y(t)\rrbracket,

where t(x̊(t),α(t),y(t))t\mapsto({\mathring{x}}(t),\alpha(t),y(t)) is the trajectory of the system (3) under any locally Lipschitz map UU.

Proof.

((i) \|\cdot\| is differentiable). For (x,w)x̊,α,y×𝒲(x,w)\in\partial\llbracket{\mathring{x}},\alpha,y\rrbracket\times\mathcal{W}, there is Mxco{Mix}iM^{x}\in\operatorname{co}\{M^{x}_{i}\}_{i} and Mwco{Mjw}jM^{w}\in\operatorname{co}\{M^{w}_{j}\}_{j} such that

ξ(t,x,w):=αg[U]+xg(Mx(xx̊)+Mw(wẘ))\displaystyle\xi(t,x,w):=\partial_{\alpha}g[U]+\partial_{x}g(M^{x}(x-{\mathring{x}})+M^{w}(w-{\mathring{w}}))
=z|α(xx̊)(U(xx̊)+α(Mx(xx̊)+Mw(wẘ)))\displaystyle=\tfrac{\partial\|\cdot\|}{\partial z}\big|_{\alpha(x-{\mathring{x}})}(U(x-{\mathring{x}})+\alpha(M^{x}(x-{\mathring{x}})+M^{w}(w-{\mathring{w}})))
=limh0α(xx̊)+hU(xx̊)+hMx(xx̊)+hMw(wẘ)α(xx̊)h\displaystyle=\lim_{h\to 0}\tfrac{\|\alpha(x-{\mathring{x}})+hU(x-{\mathring{x}})+hM^{x}(x-{\mathring{x}})+hM^{w}(w-{\mathring{w}})\|-\|\alpha(x-{\mathring{x}})\|}{h}
lim suph0(I+h(Uα1+αMxα1))α(xx̊)α(xx̊)h\displaystyle\leq\limsup_{h\searrow 0}\tfrac{\|(I+h(U\alpha^{-1}+\alpha M^{x}\alpha^{-1}))\alpha(x-{\mathring{x}})\|-\|\alpha(x-{\mathring{x}})\|}{h}
+αMw(wẘ)\displaystyle\quad+\|\alpha M^{w}(w-{\mathring{w}})\|
lim suph0I+h(Uα1+αMxα1)xx1hα(xx̊)\displaystyle\leq\limsup_{h\searrow 0}\tfrac{\|I+h(U\alpha^{-1}+\alpha M^{x}\alpha^{-1})\|_{x\to x}-1}{h}\|\alpha(x-{\mathring{x}})\|
+αMw(wẘ)\displaystyle\quad+\|\alpha M^{w}(w-{\mathring{w}})\|
=μ(Uα1+αMxα1)y+αMwop\displaystyle=\mu(U\alpha^{-1}+\alpha M^{x}\alpha^{-1})y+\|\alpha M^{w}\|_{\operatorname{op}}
maxiμ(Uα1+αMixα1)y+maxjαMjwwx.\displaystyle\leq\max_{i}\mu(U\alpha^{-1}+\alpha M^{x}_{i}\alpha^{-1})y+\max_{j}\|\alpha M^{w}_{j}\|_{w\to x}.

where the final inequality holds since μ\mu and wx\|\cdot\|_{w\to x} are convex functions [23, Lemma 2.3] with arguments linear in MxM^{x} and MwM^{w} respectively. [12, Cor. 1] concludes the proof.

((ii) =\|\cdot\|=\|\cdot\|_{\infty}) We first note that the normotope {α(xx̊)y}\{\|\alpha(x-{\mathring{x}})\|_{\infty}\leq y\} is equivalent to the symmetric H-rep polytope {y^α(xx̊)y^}\{-\hat{y}\leq\alpha(x-{\mathring{x}})\leq\hat{y}\} where y^=y𝟏nx\hat{y}=y\mathbf{1}\in\mathbb{R}^{n_{x}}. Then for any xx satisfying y^α(xx̊)y^-\hat{y}\leq\alpha(x-{\mathring{x}})\leq\hat{y} and |αk(xx̊)|=y|\alpha_{k}(x-{\mathring{x}})|=y for some kk (where αk\alpha_{k} is the kk-th row of α\alpha),

ξk(t,x,w):=α˙k(xx̊)+αk(f(t,x,w)f(t,x̊,ẘ))\displaystyle\xi_{k}(t,x,w):=\dot{\alpha}_{k}(x-{\mathring{x}})+\alpha_{k}(f(t,x,w)-f(t,{\mathring{x}},{\mathring{w}}))
=α˙k(xx̊)+αk(Mx(xx̊)+Mw(wẘ))\displaystyle=\dot{\alpha}_{k}(x-{\mathring{x}})+\alpha_{k}(M^{x}(x-{\mathring{x}})+M^{w}(w-{\mathring{w}}))
=(α˙kα1+αkMxα1)α(xx̊)+αkMw(wẘ)\displaystyle=(\dot{\alpha}_{k}\alpha^{-1}+\alpha_{k}M^{x}\alpha^{-1})\alpha(x-{\mathring{x}})+\alpha_{k}M^{w}(w-{\mathring{w}})
((α˙kα1+αkMxα1)k\displaystyle\leq\big((\dot{\alpha}_{k}\alpha^{-1}+\alpha_{k}M^{x}\alpha^{-1})_{k}
+jk|(α˙kα1+αkMxα1)j|)y+αkMw(wẘ)\displaystyle\quad+\textstyle\sum_{j\neq k}|(\dot{\alpha}_{k}\alpha^{-1}+\alpha_{k}M^{x}\alpha^{-1})_{j}|\big)y+\alpha_{k}M^{w}(w-{\mathring{w}})
μ(Uα1+αMxα1)y+αMwwx\displaystyle\leq\mu_{\infty}(U\alpha^{-1}+\alpha M^{x}\alpha^{-1})y+\|\alpha M^{w}\|_{w\to x}
maxiμ(Uα1+αMixα1)y+maxjαMjwwx\displaystyle\leq\max_{i}\mu_{\infty}(U\alpha^{-1}+\alpha M^{x}_{i}\alpha^{-1})y+\max_{j}\|\alpha M^{w}_{j}\|_{w\to x}

by definition of the μ\mu_{\infty} log norm [23, Table 2.1], and concluding with the same logic as part (i). The resulting H-polytope embedding recovers the same trajectory as (3) with the equivalence y^=y𝟏\hat{y}=y\mathbf{1}.

((iii) =1\|\cdot\|=\|\cdot\|_{1}) We first note that

x1\displaystyle\|x\|_{1} =i|xi|=maxsjS(sjTx)=Sx,\displaystyle=\textstyle\sum_{i}|x_{i}|=\max_{s_{j}\in S}(s_{j}^{T}x)=\|Sx\|_{\infty},

where S2nx×nxS\in\mathbb{R}^{2^{n_{x}}\times{n_{x}}} is the matrix consisting of all 2nx2^{n_{x}} possible choices of signs, i.e., the kk-th row Sk=(s1k,,snk)S_{k}=(s^{k}_{1},\dots,s^{k}_{n}) where sjk{+1,1}s^{k}_{j}\in\{+1,-1\}. Thus, the normotope {α(xx̊)1y}\{\|\alpha(x-{\mathring{x}})\|_{1}\leq y\} is equivalent to the symmetric H-rep polytope given by {x:y^Sα(xx̊)y^}\{x:-\hat{y}\leq S\alpha(x-{\mathring{x}})\leq\hat{y}\}, where y^=y𝟏\hat{y}=y\mathbf{1}. Let S+S^{+} be a left inverse of SS (S+S=𝐈S^{+}S=\mathbf{I}). Then for any xx satisfying y^Sα(xx̊)y^-\hat{y}\leq S\alpha(x-{\mathring{x}})\leq\hat{y} and |Skα(xx̊)|=y|S_{k}\alpha(x-{\mathring{x}})|=y for some kk (where αk\alpha_{k} is the kk-th row of α\alpha),

ξk(t,x,w)=(Skα˙)(xx̊)\displaystyle\xi_{k}(t,x,w)=(S_{k}\dot{\alpha})(x-{\mathring{x}})
+Skα(Mx(xx̊)+Mw(wẘ))\displaystyle\quad+S_{k}\alpha(M^{x}(x-{\mathring{x}})+M^{w}(w-{\mathring{w}}))
=(Skα˙α1S++SkαMxα1S+)Sα(xx̊)\displaystyle=(S_{k}\dot{\alpha}\alpha^{-1}S^{+}+S_{k}\alpha M^{x}\alpha^{-1}S^{+})S\alpha(x-{\mathring{x}})
+SkαMw(wẘ)\displaystyle\quad+S^{k}\alpha M^{w}(w-{\mathring{w}})
\leq\mu_{\infty}(S(U\alpha^{-1}+\alpha M^{x}\alpha^{-1})S^{+})y+\|S\alpha M^{w}\|_{w\to\infty},

concluding with the logic of part (ii). Finally, for any AA,

SAS+\displaystyle\|SAS^{+}\|_{\infty} =supz:z=1SAS+z=supx:Sx=1SAx\displaystyle=\sup_{z:\|z\|_{\infty}=1}\|SAS^{+}z\|_{\infty}=\sup_{x:\|Sx\|_{\infty}=1}\|SAx\|_{\infty}
=\sup_{x:\|x\|_{1}=1}\|Ax\|_{1}=\|A\|_{1},

thus, plugging into their definitions, we see that μ(SAS+)=μ1(A)\mu_{\infty}(SAS^{+})=\mu_{1}(A) and SAw=Aw1\|SA\|_{w\to\infty}=\|A\|_{w\to 1}. ∎

In practice, we first overapproximate the normotope \llbracket{\mathring{x}}(t),\alpha(t),y(t)\rrbracket with an interval, then apply the methodology from the previous section to obtain the matrices \{M_{i}^{x}(t)\}_{i} and \{M_{j}^{w}(t)\}_{j}. Finally, we note that the dynamics (3c) are not the only possible choice of embedding dynamics; in fact, any expression satisfying the hypotheses of [12, Cor. 1] would constitute a valid embedding. However, when the dynamics are abstracted using a linear differential inclusion as in (2), the expression (3c) is a natural choice with direct connections to contraction analysis [24, 19, 23].
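For illustration, a minimal sketch of the offset dynamics (3c) is given below, assuming the corner matrices have already been computed and that log_norm and op_norm are callables implementing the chosen logarithmic and induced norms; all names are ours.

```python
import jax.numpy as jnp

def offset_dot(U, alpha, y, Mx_corners, Mw_corners, log_norm, op_norm):
    # Right-hand side (3c) of the controlled normotope embedding: worst-case
    # logarithmic norm over the state-Jacobian corners, plus the worst-case
    # induced norm of the disturbance gain, measured in the alpha-weighted norm.
    alpha_inv = jnp.linalg.inv(alpha)
    contraction = jnp.max(jnp.stack([
        log_norm(U @ alpha_inv + alpha @ Mx @ alpha_inv) for Mx in Mx_corners]))
    disturbance = jnp.max(jnp.stack([
        op_norm(alpha @ Mw) for Mw in Mw_corners]))
    return contraction * y + disturbance
```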

Remark 2 (Connection to contraction theory).

The U=0U=0 case of Theorem 1 recovers the fixed norm results from [13, 16, 17] for the norm with fixed shape matrix xα=αx\|x\|_{\alpha}=\|\alpha x\|. The introduction of the hypercontrol UU allows us to continuously vary the shape matrix along the trajectory of the normotope embedding system.

V An Optimal Control Perspective Towards Reachable Set Computation

In this section, we synthesize the policy UU from (3) using an optimal control formulation. For clarity of presentation, we omit the disturbance input in this section and consider the nonlinear system

x˙=f(t,x),\displaystyle\dot{x}=f(t,x),

where f:×nxnxf:\mathbb{R}\times\mathbb{R}^{n_{x}}\to\mathbb{R}^{n_{x}} is C1C^{1} differentiable, but we note that all results in this section can easily be generalized to the case with a compact disturbance.

As seen in Proposition 1, for LTV systems the best choice of embedding dynamics is α˙=αA(t)\dot{\alpha}=-\alpha A(t). We choose hypercontrol policies of the form

U(t,{\mathring{x}},\alpha,y)=-\alpha\frac{\partial f}{\partial x}(t,{\mathring{x}})+U_{\text{ff}}(t),

where UffU_{\text{ff}} is a feed-forward input to the embedding system. As discussed in [25], closing the loop with the adjoint of the linearization cancels the first order expansion of the dynamics and generally provides reasonable results when the offset is small and the system is close to linear. In the rest of this section, using the controlled embedding system from the previous section, we formulate an optimal control problem and use iterative dynamic programming to optimize the accuracy of our reachable set overapproximations.
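A minimal sketch of this policy using automatic differentiation is shown below, assuming f takes arguments (t, x) and that U_ff_t is the feed-forward matrix already evaluated at time t; the names are illustrative.

```python
import jax

def hypercontrol(f, t, x_c, alpha, U_ff_t):
    # U(t, x, alpha, y) = -alpha df/dx(t, x) + U_ff(t): the adjoint of the
    # linearization about the center plus a feed-forward term.
    Jf = jax.jacfwd(f, argnums=1)(t, x_c)   # df/dx evaluated at (t, x_c)
    return -alpha @ Jf + U_ff_t
```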

V-A Cost Function Design

Suppose we are given an initial set \llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket at t=t_{0}, and we are interested in verifying a goal specification at t=t_{f}. For a goal specification, we are purely interested in the size of the reachable set at the final time, and therefore we design an optimal control problem with a pure terminal cost. The following lemma recalls a standard result whereby maximizing the log-determinant of the shape matrix minimizes the volume of the normotope set.

Lemma 1.

For any norm \|\cdot\|, a minimizer of the objective function

Φ(x̊,α,y)=logdet(αTα/y2)\displaystyle\Phi({\mathring{x}},\alpha,y)=-\log\det(\alpha^{T}\alpha/y^{2}) (4)

minimizes the volume of the normotope set

vol(x̊,α,y).\displaystyle\operatorname{vol}(\llbracket{\mathring{x}},\alpha,y\rrbracket).
Proof.

The normotope set

x̊,α,y\displaystyle\llbracket{\mathring{x}},\alpha,y\rrbracket ={x:α(xx̊)y}={x:αy(xx̊)1}\displaystyle=\{x:\|\alpha(x-{\mathring{x}})\|\leq y\}=\{x:\|\tfrac{\alpha}{y}(x-{\mathring{x}})\|\leq 1\}
={(αy)1z+x̊:z1}\displaystyle=\{(\tfrac{\alpha}{y})^{-1}z+{\mathring{x}}:\|z\|\leq 1\}

is an affine image of the norm ball 1(0)\mathcal{B}_{1}(0). Thus, the volume

vol(x̊,α,y)=|det((αy)1)|vol(1(0))\displaystyle\operatorname{vol}(\llbracket{\mathring{x}},\alpha,y\rrbracket)=|\det((\tfrac{\alpha}{y})^{-1})|\operatorname{vol}(\mathcal{B}_{1}(0))

is inversely proportional to |det(αy)||\det(\tfrac{\alpha}{y})|. Since log(()2)\log((\cdot)^{2}) is a monotone function for positive inputs, we can maximize

log(|det(αy)|2)=log(det(αTy)det(αy))=logdet(αTαy2),\displaystyle\log(|\det(\tfrac{\alpha}{y})|^{2})=\log(\det(\tfrac{\alpha^{T}}{y})\det(\tfrac{\alpha}{y}))=\log\det(\tfrac{\alpha^{T}\alpha}{y^{2}}),

completing the proof. ∎

We implement normotopes and the terminal cost using JAX for Python, taking care to respect its functional programming requirements in exchange for several crucial capabilities. Normotopes are natively implemented as a Pytree node class, allowing us to use them as a valid JAX type for operations such as automatic differentiation. For the terminal cost (4), we use jnp.linalg.cholesky instead of composing log and det for numerical stability.
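A minimal sketch of this Cholesky-based evaluation of (4) is given below, under the assumption that \alpha is well conditioned; the function name is ours.

```python
import jax.numpy as jnp

def terminal_cost(alpha, y):
    # Phi = -log det(alpha^T alpha / y^2), evaluated through a Cholesky factor
    # for numerical stability: log det(A^T A) = 2 * sum(log(diag(chol(A^T A)))).
    n = alpha.shape[0]
    L = jnp.linalg.cholesky(alpha.T @ alpha)
    return -2.0 * jnp.sum(jnp.log(jnp.diag(L))) + 2.0 * n * jnp.log(y)
```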

V-B Open-Loop Hypercontrol Synthesis: Iterative Linear Quadratic Regulator

Algorithm 1 Reach-iLQR

Input: initial set X_{0}=\llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket, time interval [t_{0},t_{f}]
Parameters: step size \gamma>0, regularizer R\succ 0, volume threshold \Phi_{\text{max}}\in\mathbb{R}

1: U_{\text{ff}}^{(0)}(t)\leftarrow 0, d(t)\leftarrow 0, K(t)\leftarrow 0
2:i1i\leftarrow 1
3:repeat
4:  Forward simulate under the updated control U_{\text{ff}}^{(i)} from (9) to obtain X^{(i)}(t), until \Phi>\Phi_{\text{max}} or t=t_{f}.
5:  Compute terminal conditions (7) using automatic differentiation of the terminal cost \Phi from (4).
6:  Backward simulate the Riccati equations (6) using automatic differentiation to obtain d(t),K(t) from (8).
7:  ii+1i\leftarrow i+1
8:until termination criterion reached

Suppose we are given an initial set x̊0,α0,y0\llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket. In this section, we apply the continuous-time iterative linear quadratic regulator approach (iLQR) [26] to numerically optimize the following optimal control problem.

minimizeUff:[t0,tf]nx×nxΦ(x̊(tf),α(tf),y(tf))\displaystyle\operatornamewithlimits{minimize}_{U_{\text{ff}}:[t_{0},t_{f}]\to\mathbb{R}^{{n_{x}}\times{n_{x}}}}\ \Phi({\mathring{x}}(t_{f}),\alpha(t_{f}),y(t_{f})) (5)
s.t. f(t,x)f(t,x̊)co{Mix(t)}(xx̊),\displaystyle\text{s.t. }\quad f(t,x)-f(t,{\mathring{x}})\in\operatorname{co}\{M_{i}^{x}(t)\}(x-{\mathring{x}}),
x̊˙(t)=f(t,x̊),α˙(t)=αfx(t,x̊)+Uff(t)y˙(t)=(maxiμ(Uff(t)α1+α(Mix(t)fx(t,x̊))α1))y\displaystyle\begin{aligned} \dot{{\mathring{x}}}(t)&=f(t,{\mathring{x}}),\\ \dot{\alpha}(t)&=-\alpha\tfrac{\partial f}{\partial x}(t,{\mathring{x}})+U_{\text{ff}}(t)\\ \dot{y}(t)&=\left(\max_{i}\mu(U_{\text{ff}}(t)\alpha^{-1}+\alpha(M_{i}^{x}(t)-\tfrac{\partial f}{\partial x}(t,{\mathring{x}}))\alpha^{-1})\right)y\\ \end{aligned}
(x̊(t0),α(t0),y(t0))=(x̊0,α0,y0)\displaystyle({\mathring{x}}(t_{0}),\alpha(t_{0}),y(t_{0}))=({\mathring{x}}_{0},\alpha_{0},y_{0})

For the rest of this section, we write X=\operatorname{vec}({\mathring{x}},\alpha,y)\in\mathbb{R}^{n_{X}}, n_{X}=n_{x}+n_{x}^{2}+1, and we abbreviate the embedding dynamics (3) as \dot{X}=F(t,X,U).

Next, we describe Algorithm 1, which uses continuous-time iterative linear quadratic regulation (iLQR) to improve the feedforward term UffU_{\text{ff}}. Suppose we are at timestep ii, given Uff(i)(t)U_{\text{ff}}^{(i)}(t) and the corresponding embedding system trajectory X(i)(t)X^{(i)}(t) from forward simulating the embedding dynamics (3).

V-B1 Backward Pass

Since we have no running cost, the backwards pass simplifies to the following ODEs,

σ˙(t)\displaystyle-\dot{\sigma}(t) =12QUT(t)R1QU(t),\displaystyle=-\tfrac{1}{2}Q_{U}^{T}(t)R^{-1}Q_{U}(t), (6a)
s˙(t)\displaystyle-\dot{s}(t) =FXT(t)s(t)QUXT(t)R1QU(t),\displaystyle=F_{X}^{T}(t)s(t)-Q_{UX}^{T}(t)R^{-1}Q_{U}(t), (6b)
S˙(t)\displaystyle-\dot{S}(t) =QXX(t)QUXT(t)R1QUX(t),\displaystyle=Q_{XX}(t)-Q_{UX}^{T}(t)R^{-1}Q_{UX}(t), (6c)

where R0R\succ 0 is some regularization matrix,

QU(t)\displaystyle Q_{U}(t) =FUT(t)s(t),\displaystyle=F_{U}^{T}(t)s(t),
QXX(t)\displaystyle Q_{XX}(t) =FXT(t)S(t)+S(t)FX(t),\displaystyle=F_{X}^{T}(t)S(t)+S(t)F_{X}(t),
QUX(t)\displaystyle Q_{UX}(t) =FUT(t)S(t),\displaystyle=F_{U}^{T}(t)S(t),

and the following

FUT(t)\displaystyle F_{U}^{T}(t) =FU(t,X(i)(t),Uff(i)(t)),\displaystyle=\frac{\partial F}{\partial U}(t,X^{(i)}(t),U_{\text{ff}}^{(i)}(t)),
FXT(t)\displaystyle F_{X}^{T}(t) =FX(t,X(i)(t),Uff(i)(t)),\displaystyle=\frac{\partial F}{\partial X}(t,X^{(i)}(t),U_{\text{ff}}^{(i)}(t)),

are computed using automatic differentiation in JAX. The ODEs are initialized by automatically differentiating the terminal cost Φ\Phi with respect to the state XX as

σ(tf)\displaystyle\sigma(t_{f}) =Φ(X(tf)),\displaystyle=\Phi(X(t_{f})), (7a)
s(tf)\displaystyle s(t_{f}) =ΦX(X(tf))T,\displaystyle=\frac{\partial\Phi}{\partial X}(X(t_{f}))^{T}, (7b)
S(tf)\displaystyle S(t_{f}) =2ΦX2(X(tf)).\displaystyle=\frac{\partial^{2}\Phi}{\partial X^{2}}(X(t_{f})). (7c)

Finally, we obtain curves d(t)d(t) and K(t)K(t) as

d(t)=R1QU(t),K(t)=R1QUX(t).\displaystyle d(t)=R^{-1}Q_{U}(t),\quad K(t)=R^{-1}Q_{UX}(t). (8)

The key result from iLQR is that the variational system (first order expansion of the dynamics)

δX˙=FXT(t)δX(t)+FUT(t)δU(t)\displaystyle\dot{\delta X}=F_{X}^{T}(t)\delta X(t)+F_{U}^{T}(t)\delta U(t)

and the second order expansion of the value function

V(x,t)σ(t)+s(t)TδX(t)+12δX(t)TS(t)δX(t)\displaystyle V(x,t)\approx\sigma(t)+s(t)^{T}\delta X(t)+\tfrac{1}{2}\delta X(t)^{T}S(t)\delta X(t)

satisfy the Hamilton-Jacobi-Bellman PDE with feedback controller δU(t)=d(t)K(t)δX(t)\delta U(t)=-d(t)-K(t)\delta X(t) (for the full derivation, please see [26]).
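For illustration, one discretized step of the backward pass (6)-(8) might look as follows. This is a sketch in standard iLQR form, assuming the embedding state and hypercontrol have been vectorized and writing A, B for the Jacobians \partial F/\partial X and \partial F/\partial U along the current iterate; it is not the exact implementation.

```python
import jax.numpy as jnp

def backward_step(s, S, A, B, R_inv, dt):
    # One backward Euler step of the Riccati-like ODEs (6b)-(6c) and the
    # gains (8). A and B are Jacobians of the vectorized embedding dynamics
    # along the current trajectory, obtained with jax.jacfwd in practice.
    Q_U = B.T @ s
    Q_UX = B.T @ S
    Q_XX = A.T @ S + S @ A
    d = R_inv @ Q_U
    K = R_inv @ Q_UX
    s_prev = s + dt * (A.T @ s - Q_UX.T @ R_inv @ Q_U)
    S_prev = S + dt * (Q_XX - Q_UX.T @ R_inv @ Q_UX)
    return s_prev, S_prev, d, K
```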

V-B2 Forward Pass

From the backward pass, we obtain the curves d(t)d(t) and K(t)K(t). We then perform another forward sweep by updating UffU_{\text{ff}} using the following (where ii has been incremented),

Uff(i)(t)=Uff(i1)(t)γd(t)K(t)(X(i)(t)X(i1)(t)),\displaystyle U_{\text{ff}}^{(i)}(t)=U_{\text{ff}}^{(i-1)}(t)-\gamma d(t)-K(t)(X^{(i)}(t)-X^{(i-1)}(t)), (9)

where \gamma>0 is the step size, X^{(i-1)} is the previous embedding trajectory under U_{\text{ff}}^{(i-1)}, and X^{(i)} is the new embedding trajectory under U_{\text{ff}}^{(i)}. Theoretically, when \gamma is small enough, the displacement \delta X=X^{(i)}-X^{(i-1)} along the trajectory is small enough that the first-order dynamics expansion and second-order value expansion remain accurate.

While the forward pass always produces a valid reachable set overapproximation, the reachable set can blow up before the prescribed final time t_{f}. To combat this situation, we introduce a threshold \Phi_{\text{max}}, and stop the forward simulation if \Phi(X(t))>\Phi_{\text{max}}. When this happens, we proceed as if the final time t_{f} were the time instant before the threshold was violated; in practice, we find that the iterates eventually tend towards policies which reach the final time. The result is an algorithm where each iteration provides a valid overapproximate reachable tube, tending towards tubes with smaller terminal set volumes.
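A minimal sketch of this thresholded forward pass with the update (9) is given below, assuming a vectorized hypercontrol, discretized curves d and K, and an Euler step; the names and signatures are illustrative.

```python
def forward_pass(X0, U_prev, X_traj_prev, d, K, gamma, F, Phi, Phi_max, ts, dt):
    # Forward Euler sweep with the iLQR update (9). The sweep stops early if
    # the cost proxy Phi exceeds Phi_max, so every iterate still yields a
    # valid overapproximating reachable tube up to the time it reaches.
    X = X0
    X_traj, U_traj = [X0], []
    for k, t in enumerate(ts):
        U = U_prev[k] - gamma * d[k] - K[k] @ (X - X_traj_prev[k])
        U_traj.append(U)
        X = X + dt * F(t, X, U)
        if Phi(X) > Phi_max:
            break
        X_traj.append(X)
    return X_traj, U_traj
```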

Remark 3 (Comparison to the literature).

Previous approaches [13, 16, 17] break the time interval [t_{0},t_{f}] into segments, solve intermediate semidefinite programs to optimize over the potential choices of norms along each individual segment, and ensure reachable set containment by overapproximating the norm ball at the end of the previous segment with the new norm ball at the start of the next segment. Algorithm 1 instead provides a continuous formulation in which the hypercontrol smoothly varies the shaping matrix along the nominal trajectory, both avoiding successive norm overapproximations and allowing us to backpropagate the terminal set volume all the way to the initial timestep.

VI Examples

In the following examples, we demonstrate how Algorithm 1 can (i) quickly modify the reachable set to obtain drastically improved estimates in a short amount of time, and (ii) learn nontrivial open-loop policies for more difficult benchmarks. For all ODE simulations, we use Euler integration with a step size of 0.01. The code is available at https://github.com/gtfactslab/Harapanahalli_ACC2026.

Figure 1: Top: Overapproximating reachable sets for the robot arm for t[0,10]t\in[0,10] are plotted for various iteration numbers of Algorithm 1, with Monte Carlo samples in red. Bottom: The terminal cost Φ\Phi is plotted as a function of the iLQR iteration number in blue, while the cumulative runtime is in red. After 5 iterations of iLQR, the overapproximate reachable set reaches the final timestep. Iterating further, we can trade runtime for less overconservatism in the final reachable set.
Figure 2: Top: Overapproximating reachable sets for the Van der Pol oscillator for t[0,7]t\in[0,7] are shown for various iteration numbers of Algorithm 1, with Monte Carlo samples in red. Bottom: The function Φ(X(t))\Phi(X(t)) is plotted as a function of time for the same iteration numbers. While the pure adjoint dynamics (i=1i=1) does well for small horizons, the error quickly accumulates over longer horizons and cannot reach the goal of tf=7t_{f}=7s. On the other hand, after 3000 iterations of iLQR, the algorithm qualitatively learns to accumulate extra error in earlier timesteps so it can reach and minimize the final timestep.

VI-A Robot Arm

Consider the dynamics of a robot arm from [27, 15]

q˙1\displaystyle\dot{q}_{1} =z1,q˙2=z2,\displaystyle=z_{1},\quad\dot{q}_{2}=z_{2},
z˙1\displaystyle\dot{z}_{1} =1mq22+ML2/3(2mq2z1z2kd1z1+kp1(u1q1)),\displaystyle=\tfrac{1}{mq_{2}^{2}+ML^{2}/3}(-2mq_{2}z_{1}z_{2}-k_{d_{1}}z_{1}+k_{p_{1}}(u_{1}-q_{1})),
z˙2\displaystyle\dot{z}_{2} =q2z12+1m(kd2z2+kp2(u2q2)),\displaystyle=q_{2}z_{1}^{2}+\tfrac{1}{m}(-k_{d_{2}}z_{2}+k_{p_{2}}(u_{2}-q_{2})),

with u_{1}=2, u_{2}=1, m=M=1, L=\sqrt{3}, k_{p_{1}}=2, k_{p_{2}}=1, k_{d_{1}}=2, k_{d_{2}}=1. We use the initial set \llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket_{2}, where \alpha_{0}=P^{1/2}/0.1 with P and {\mathring{x}}_{0} from [15], and y_{0}=1. We note that this initial set has a radius 2.5 times larger than the largest radius considered in the comparison example from [17], and thus a volume that is 39 times larger. We run Algorithm 1 with t_{f}=10, \gamma=1, R=20, and \Phi_{\text{max}}=-0.1 for 20 iterations, taking a total of 7.5 seconds, and the results are shown in Figure 1.

VI-B Van der Pol Oscillator

Consider the Van der Pol oscillator written in usual state space coordinates as

\dot{x}_{1}=x_{2},\quad\dot{x}_{2}=-x_{1}+\eta(1-x_{1}^{2})x_{2},

with nonlinearity \eta=1. We consider the initial set \llbracket{\mathring{x}}_{0},\alpha_{0},y_{0}\rrbracket_{2} with {\mathring{x}}_{0}=[-2\ 0]^{T}, \alpha_{0}=80I, and y_{0}=1.

We split the iLQR iterations into two phases. We first start with a small regularization of R=0.5I as an exploratory phase, keeping track of the best reachable set computation over all iterates (the one that reaches the furthest before the violation \Phi>\Phi_{\text{max}}=-1.75, with the smallest volume). After 750 iterations, we revert to the best iterate from the first phase (i=707) and use a larger regularizer R=5I for another 750 iterations to help reach the end. All 1500 iterations took a total of 23.8 seconds to execute.

VII Conclusion

In this work, we presented an optimal control framework for synthesizing guaranteed overapproximate reachable sets with small terminal volume using iterative dynamic programming. Our framework builds off the controlled parametric embedding approach described in [12], constructing a special case for normotope sets using logarithmic and induced matrix norms. In future work, we plan to use techniques from reinforcement learning to learn closed-loop policies using offline training to mimic the output of Algorithm 1, which would be quick to evaluate for online reachable set computations.

References

  • [1] M. Althoff, “An Introduction to CORA 2015,” in Proc. of the 1st and 2nd Workshop on Applied Verification for Continuous and Hybrid Systems, pp. 120–151, EasyChair, Dec. 2015.
  • [2] J. K. Scott, D. M. Raimondo, G. R. Marseglia, and R. D. Braatz, “Constrained zonotopes: A new tool for set-based estimation and fault detection,” Automatica, vol. 69, pp. 126–136, 2016.
  • [3] N. Kochdumper and M. Althoff, “Sparse polynomial zonotopes: A novel set representation for reachability analysis,” IEEE Transactions on Automatic Control, vol. 66, no. 9, pp. 4043–4058, 2020.
  • [4] T. J. Bird, H. C. Pangborn, N. Jain, and J. P. Koeln, “Hybrid zonotopes: A new set representation for reachability analysis of mixed logical dynamical systems,” Automatica, vol. 154, p. 111107, 2023.
  • [5] S. Kousik, A. Dai, and G. X. Gao, “Ellipsotopes: Uniting ellipsoids and zonotopes for reachability analysis and fault detection,” IEEE Transactions on Automatic Control, vol. 68, no. 6, pp. 3440–3452, 2022.
  • [6] X. Chen, Reachability analysis of non-linear hybrid systems using Taylor models. PhD Thesis, Fachgruppe Informatik, RWTH Aachen University, 2015.
  • [7] P. S. Duggirala and M. Viswanathan, “Parsimonious, simulation based verification of linear systems,” in International conference on computer aided verification, pp. 477–494, Springer, 2016.
  • [8] I. Mitchell and C. J. Tomlin, “Level set methods for computation in hybrid systems,” in Hybrid Systems: Computation and Control (HSCC), pp. 310–323, 2000.
  • [9] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, “Hamilton-Jacobi reachability: A brief overview and recent advances,” in 56th Conference on Decision and Control (CDC), pp. 2242–2253, 2017.
  • [10] S. Bansal and C. J. Tomlin, “Deepreach: A deep learning approach to high-dimensional reachability,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 1817–1824, IEEE, 2021.
  • [11] P. Varaiya, “Reach set computation using optimal control,” in Verification of Digital and Hybrid Systems, pp. 323–331, Springer, 2000.
  • [12] A. Harapanahalli and S. Coogan, “Parametric reachable sets via controlled dynamical embeddings,” arXiv preprint arXiv:2504.06955, 2025.
  • [13] J. Maidens and M. Arcak, “Reachability analysis of nonlinear systems using matrix measures,” IEEE Transactions on Automatic Control, vol. 60, no. 1, pp. 265–270, 2014.
  • [14] M. Arcak and J. Maidens, “Simulation-based reachability analysis for nonlinear systems using componentwise contraction properties,” Principles of Modeling: Essays Dedicated to Edward A. Lee on the Occasion of His 60th Birthday, pp. 61–76, 2018.
  • [15] C. Fan, J. Kapinski, X. Jin, and S. Mitra, “Locally optimal reach set over-approximation for nonlinear systems,” in Proceedings of the 13th International Conference on Embedded Software, pp. 1–10, 2016.
  • [16] C. Fan, J. Kapinski, X. Jin, and S. Mitra, “Simulation-driven reachability using matrix measures,” ACM Transactions on Embedded Computing Systems (TECS), vol. 17, no. 1, pp. 1–28, 2017.
  • [17] A. Harapanahalli and S. Coogan, “A linear differential inclusion for contraction analysis to known trajectories,” arXiv preprint arXiv:2411.11587, 2024.
  • [18] C. Desoer and M. Vidyasagar, Feedback systems: Input-output properties. Society for Industrial and Applied Mathematics, 1975.
  • [19] A. Davydov, S. Jafarpour, and F. Bullo, “Non-Euclidean contraction theory for robust nonlinear stability,” IEEE Transactions on Automatic Control, vol. 67, no. 12, pp. 6667–6681, 2022.
  • [20] M. Althoff, G. Frehse, and A. Girard, “Set propagation techniques for reachability analysis,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 4, no. 1, pp. 369–395, 2021.
  • [21] S. Boyd, L. E. Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory. SIAM, 1994.
  • [22] L. Jaulin, M. Kieffer, O. Didrit, and E. Walter, Applied interval analysis. Springer, 1st edition. ed., 2001.
  • [23] F. Bullo, Contraction theory for dynamical systems. Kindle Direct Publishing, 1.1 ed., 2023.
  • [24] E. D. Sontag, “Contractive systems with inputs,” in Perspectives in Mathematical System Theory, Control, and Signal Processing: A Festschrift in Honor of Yutaka Yamamoto on the Occasion of his 60th Birthday, pp. 217–228, Springer, 2010.
  • [25] A. Harapanahalli, S. Jafarpour, and S. Coogan, “immrax: A parallelizable and differentiable toolbox for interval analysis and mixed monotone reachability in JAX,” IFAC-PapersOnLine, vol. 58, no. 11, pp. 75–80, 2024.
  • [26] J. Lieskovskỳ, J. Bušek, and T. Vyhlídal, “Continuous-time iterative linear-quadratic regulator,” arXiv preprint arXiv:2505.15525, 2025.
  • [27] D. Angeli, E. Sontag, and Y. Wang, “A characterization of integral input-to-state stability,” IEEE Transactions on Automatic Control, vol. 45, no. 6, pp. 1082–1097, 2000.

Appendix

VII-A The LTV Case: Adjoint Satisfies Pontryagin’s Principle

As a sanity check, we verify that the adjoint equation satisfies Pontryagin’s Minimum Principle (PMP) for the problem (5) in the LTV case. Consider the linear system

x˙=A(t)x(t),\displaystyle\dot{x}=A(t)x(t),

and the embedding system from Theorem 1, simplifying to

x̊˙=A(t)x̊(t),α˙=α(t)A(t)+U~(t)α(t),y˙=μ(U~(t))y,\displaystyle\dot{{\mathring{x}}}=A(t){\mathring{x}}(t),\ \dot{\alpha}=-\alpha(t)A(t)+\tilde{U}(t)\alpha(t),\ \dot{y}=\mu(\tilde{U}(t))y,

where we have made the variable substitution U~(t)=(α(t)A(t)+U(t))α(t)1\tilde{U}(t)=(\alpha(t)A(t)+U(t))\alpha(t)^{-1}. Consider the volume terminal cost from (4),

Φ(x̊,α,y)=logdet(αTα)+2nlogy.\displaystyle\Phi({\mathring{x}},\alpha,y)=-\log\det(\alpha^{T}\alpha)+2n\log y.

The Hamiltonian of this problem is

H(t,x̊,α,y,px̊,pα,py,U~)\displaystyle H(t,{\mathring{x}},\alpha,y,p_{\mathring{x}},p_{\alpha},p_{y},\tilde{U})
=px̊,Ax̊pα,αA+pα,U~α+12pyyλmax(U~T+U~),\displaystyle=\langle p_{\mathring{x}},A{\mathring{x}}\rangle-\langle p_{\alpha},\alpha A\rangle+\langle p_{\alpha},\tilde{U}\alpha\rangle+\tfrac{1}{2}p_{y}y\lambda_{\max}(\tilde{U}^{T}+\tilde{U}),

thus, the costate dynamics satisfy

p˙α\displaystyle\dot{p}_{\alpha} =Hα=pαATU~Tpα\displaystyle=-H_{\alpha}=p_{\alpha}A^{T}-\tilde{U}^{T}p_{\alpha}
p˙y\displaystyle\dot{p}_{y} =Hy=12pyλmax(U~T+U~)\displaystyle=-H_{y}=-\tfrac{1}{2}p_{y}\lambda_{\max}(\tilde{U}^{T}+\tilde{U})

with terminal costates given by

pα(tf)=Φα=2α(tf)T,py(tf)=Φy=2n/y(tf).\displaystyle p_{\alpha}(t_{f})=\tfrac{\partial\Phi}{\partial\alpha}=-2\alpha(t_{f})^{-T},\quad p_{y}(t_{f})=\tfrac{\partial\Phi}{\partial y}=2n/y(t_{f}).

Set U~=0\tilde{U}^{*}=0. Then y˙=0\dot{y}=0, p˙y=0\dot{p}_{y}=0, and

ddtαpαT=α˙pαT+αp˙αT=αApαT+αApαT=0\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}\alpha p_{\alpha}^{T}=\dot{\alpha}p_{\alpha}^{T}+\alpha\dot{p}_{\alpha}^{T}=-\alpha Ap_{\alpha}^{T}+\alpha Ap_{\alpha}^{T}=0
α(t)pα(t)T=α(tf)pα(tf)T=2𝐈\displaystyle\implies\alpha(t)p_{\alpha}(t)^{T}=\alpha(t_{f})p_{\alpha}(t_{f})^{T}=-2\mathbf{I}

Plugging into the Hamiltonian,

H(\dots,\tilde{U})=\langle p_{\mathring{x}},A{\mathring{x}}\rangle-\langle p_{\alpha},\alpha A\rangle+\langle p_{\alpha},\tilde{U}\alpha\rangle+\tfrac{1}{2}p_{y}y\lambda_{\max}(\tilde{U}^{T}+\tilde{U})
=H(,U~)+nλmax(U~T+U~)+tr(pαTU~α)\displaystyle=H(\dots,\tilde{U}^{*})+n\lambda_{\max}(\tilde{U}^{T}+\tilde{U})+\operatorname{tr}(p_{\alpha}^{T}\tilde{U}\alpha)
=H(,U~)+nλmax(U~T+U~)2tr(U~)\displaystyle=H(\dots,\tilde{U}^{*})+n\lambda_{\max}(\tilde{U}^{T}+\tilde{U})-2\operatorname{tr}(\tilde{U})
=H(,U~)+nλmax(U~T+U~)tr(U~T+U~)\displaystyle=H(\dots,\tilde{U}^{*})+n\lambda_{\max}(\tilde{U}^{T}+\tilde{U})-\operatorname{tr}(\tilde{U}^{T}+\tilde{U})
=H(,U~)+i=1n(λmax(U~T+U~)λi(U~T+U~)0).\displaystyle=H(\dots,\tilde{U}^{*})+\sum_{i=1}^{n}(\underbrace{\lambda_{\max}(\tilde{U}^{T}+\tilde{U})-\lambda_{i}(\tilde{U}^{T}+\tilde{U})}_{\geq 0}).

Thus, H(,U~)H(\dots,\tilde{U}^{*}) satisfies PMP.