Cyber Risk Management and Mitigation Under Controlled Stochastic SIS Model

Shize Na Department of Financial and Actuarial Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou, P.R. China, 215123. Zhuo Jin Department of Actuarial Studies and Business Analytics, Macquarie University, NSW 2113, Australia Ran Xu Corresponding author.
E-mail address:
Shize.Na18@student.xjtlu.edu.cn(S. Na),
zhuo.jin@mq.edu.au (Z. Jin),
Ran.Xu@xjtlu.edu.cn(R. Xu),
Hailiang.Yang@xjtlu.edu.cn(H. Yang). Department of Financial and Actuarial Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou, P.R. China, 215123. Hailiang Yang Department of Financial and Actuarial Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou, P.R. China, 215123.

(September 27, 2025)

Abstract

In this paper, we formulate cyber risk management and mitigation as a stochastic optimal control problem under a stochastic Susceptible-Infected-Susceptible (SIS) epidemic model. To capture the dynamics and interplay of management and mitigation strategies, we introduce two stochastic controls: (i) a proactive risk management control to reduce external cyber attacks and internal contagion effects, and (ii) a reactive mitigation control to accelerate system recovery from cyber infection. The interplay between these controls is modeled by minimizing the expected discounted running costs, which balance proactive management expenses against reactive mitigation expenditures. We derive the associated Hamilton-Jacobi-Bellman (HJB) equation and characterize the value function as its unique viscosity solution. For numerical implementation, we propose a Policy Improvement Algorithm (PIA) and prove its convergence via Backward Stochastic Differential Equations (BSDEs). Finally, we present numerical results through a benchmark example, suboptimal control analysis, sensitivity analysis, and comparative statics.

Keywords: Cyber risk modeling, stochastic SIS model, stochastic control, policy improvement algorithm

1 Introduction

Cybersecurity has emerged as a critical concern in the industrial, financial, insurance, and governmental sectors due to the increasing digitization and interconnectedness of various systems. Currently, there is no unanimous definition of cyber risk. Some widely accepted definitions include but not limit to: Cyber risks are operational risks that may result in potential violation of confidentiality, availability, or integrity of information systems (Cebula and Young, , 2010); it is a financial risk associated with network and computer incidents and leading to the failure of information systems (Böhme and Kataria, , 2006; Böhme et al., , 2010).

In academia, the study of cyber risk has attracted the attention of researchers across many fields. For example, researchers in computer science have been aware of the importance of security in cyberspace and have made many contributions concerning cyber risk detection (Moore et al., , 2006; Garcia-Teodoro et al., , 2009; Cárdenas et al., , 2011; Liu et al., , 2016), security breach prediction (Zhan et al., , 2015; Bakdash et al., , 2018), and computer system enhancement (Jang-Jaccard and Nepal, , 2014). In control systems and automation, researchers are more interested in the design of various optimal (deterministic) controls against denial-of-service and/or false data injection types of cyber attacks to the cyber-physical systems, to name a few, see e.g., Amin et al., (2009); Pasqualetti et al., (2013); Fawzi et al., (2014); Wakaiki et al., (2019); Liu et al., (2025) and associated references. In the field of business and corporate finance, the studies focus on investigating cyber risk under the framework of enterprise risk management (Stoneburner et al., , 2002; Gordon et al., , 2003; Öğüt et al., , 2011; Paté-Cornell et al., , 2018). In addition, in insurance and actuarial science, researchers are more interested in modeling the cyber risk in terms of its frequency, severity, and dependence with a range of statistical and stochastic techniques, for example, Herath and Herath, (2011); Mukhopadhyay et al., (2013); Eling and Loperfido, (2017); Eling and Jung, (2018); Xu and Hua, (2019); Dou et al., (2020); Malavasi et al., (2022) and references therein. For a recent comprehensive cross-disciplinary review on modeling and management of cyber risk can refer to He et al., (2024), and for the review of data availability associated with cyber risk, one can refer to Cremer et al., (2022).

Traditional cyber risk models (especially in the community of insurance and actuarial science) are adept at capturing statistical properties and temporal trends of losses, yet they often fail to account for the propagation of cyber threats. By studying how cyber risks spread in a closed system, one can identify critical factors that determine the scale of contagion and the magnitude of the losses. This deeper understanding enhances the design of risk management and mitigation strategies. It is noted that the parallels between cyber risk and disease spread make epidemiology a natural framework for a similar analysis in cyber risk, given that both phenomena involve contagion, interdependent exposure, and dynamic evolution over time.

Inspired by methods from genetic epidemiology, Gil et al., (2014) introduced a statistical framework to assess the susceptibility of individual nodes in a network. Their approach treats the services running on a host as the defining risk factor for cyber threat exposure, drawing an analogy to genetic penetrance models. This conceptual borrowing allows for a structured analysis of how certain network configurations increase vulnerability. In addition, Liu et al., (2016) developed an innovative compartmental model for malware propagation by adapting epidemiological concepts to cybersecurity, where the computers are recognized as heterogeneous nodes with different protection levels in the network. The model categorizes nodes into three distinct states: weakly protected susceptible (W-nodes), strongly protected susceptible (S-nodes), and infected (I-nodes). The authors discussed the malware-free equilibrium when the “basic reproduction number” (an important metric in epidemiological models) $R_{0}<1$ and $R_{0}\geq 1$ respectively. These findings align with earlier work by Mishra and Pandey, (2014), who employed a susceptible-exposed-infectious-susceptible-with-vaccination model to analyze worm propagation. More recently, Fahrenwaldt et al., (2018) models the spread of cyber infections with an interacting Markov chain and claims with a marked point process, where the spread process is of a pure Poisson jump process, whereas transitions are described by the susceptible-infected-susceptible (SIS) epidemic model. And, the dependence among different nodes (i.e., firms, computers, or devices) is modeled by an undirected network. Through a simulation study, the authors demonstrate that network topology plays a crucial role in determining insurance prices and designing effective risk management strategies. Xu and Hua, (2019) employs both Markov and non-Markov processes within an SIS network model. Their Markov formulation incorporates dual infection pathways, where Poisson processes are used to capture both internal network transmission and external threats. This formulation yields valuable dynamic upper bounds for infection probabilities and stationary probability estimates. The empirical validation in the study reveals the critical influence of recovery rates on insurance premium calculations. Later, Antonio et al., (2021) extends the Markov-based SIS model by incorporating network clustering coefficients, such that the model can capture the local network clustering that inhibits epidemic spread through modified transition probabilities. Using N-intertwined mean-field approximation, they derived dynamic infection probability bounds that improve premium accuracy when applied to both synthetic and real-world networks. Hillairet and Lopez, (2021) describes the spread of a cyber attack at a global level with a susceptible-infected-recovered (SIR) model, and approximates the cumulative number of individuals in each group with a Gaussian process. Furthermore, Hillairet et al., (2022) proposes a multi-group epidemic model to assess the impact of large-scale cyber attacks on insurance portfolios, which captures the interdependencies between actors and can be calibrated with limited data, enabling efficient scenario analysis of cyber events. For other relevant studies in recent literature, the readers may refer to He et al., (2024) and references therein.

Recent studies have significantly advanced the modeling of cyber risk propagation through epidemiological approaches, in particular, using SIS and its extended models with network-dependent interactions. While these models effectively capture contagion dynamics, dependence structures, and loss distributions, they remain descriptive rather than prescriptive. To be specific, many studies focus on predicting the spread of contagions and estimating losses, but fail to incorporate active intervention or control strategies to manage or mitigate the cyber risk in real-time. Hence, a critical limitation in existing cyber risk modeling with epidemic models is the absence of formal control mechanisms to identify optimal risk management and mitigation strategies. It is noted that this gap mirrors the recent developments in biological epidemic control literature, where the later works incorporate various optimal stochastic control (such as vaccination) into (non)stochastic SI(R)S models, see, for example, Boukanjime et al., (2021); Tran and Yin, (2021); Barnett et al., (2023); Sonveaux and Winkin, (2023); Federico et al., (2024); Chen et al., (2025).

Therefore, in this paper, we address such a gap by formalizing the cyber risk management and mitigation problem as a stochastic optimal control problem under a stochastic Susceptible-Infected-Susceptible (SIS) model. The objective of this paper is not to develop a sophisticated stochastic model that can fit perfectly to any existing cyber risk and cybersecurity data, but, rather, to provide a theoretical stochastic control framework for studying the cyber risk management and mitigation problems. The contributions of our research are summarized as follows:

•

Cyber risk has emerged as a critical threat to modern enterprises, governments, and financial systems, with the attacks illustrating potential contagion across various sectors. There is a clear trend in recent literature on employing stochastic processes, especially stochastic epidemic models, to capture the various contagion mechanisms of cyber risk. However, no studies have investigated the (optimal) risk management and risk mitigation strategies under such a stochastic contagion model. We bridge the gap in the literature by formulating the contagion dynamics of the cyber risks in a closed system as a controlled stochastic SIS model. Two distinct control variables are considered: one is a proactive control mechanism governing risk management, which reduces or prevents external cyber attacks and internal contagion effects, such as firewall upgrades, isolating infected systems, and disconnecting breached servers; another is reactive control mechanism associated with risk mitigation, which refers to the set of immediate tactical actions taken to recover the affected systems, for example, remove malware, close backdoor, or reset compromised credentials. Our work is the first to integrate these dual controls into a unified stochastic optimal control framework for cyber risk, providing a rigorous mathematical foundation for dynamic decision-making in cyber risk management and mitigation. Such a dual control framework can capture, to some extent, the trade-off between risk prevention/management and risk mitigation, which is a key challenge in practice, especially when facing resource constraints.
•

We characterize the value function(the expected discounted running costs) as the unique viscosity solution to the Hamilton-Jacobi-Bellman (HJB) equation derived from our stochastic control formulation for cyber risk management and mitigation. By applying viscosity theory, we establish the existence and uniqueness of the solution of the HJB equation under mild regularity assumptions. This analytical framework ensures numerical tractability, as the uniqueness property guarantees well-posedness for iterative computational methods via the dynamic programming principle. Moreover, our approach naturally accommodates extensions to more complex settings, including jump-diffusion processes and generalized cost structures, broadening its applicability beyond the current setup.
•

For the numerical implementation, we develop a Policy Improvement Algorithm (PIA) grounded in the Bellman-Howard policy iteration framework for such an infinite-time horizon stochastic cyber risk control problem. The algorithm demonstrates superior computational efficiency compared to other alternative numerical methods, such as Markov chain approximation with value iterations. The algorithm typically converges within a few iterative steps. In addition, we rigorously establish the algorithm’s convergence (at an exponential rate) and stability under error perturbations during iteration by mapping the value function at each step to the unique solution of some (iteratively defined) infinite-horizon Backward Stochastic Differential Equations (BSDEs). Given the inherent challenge of our underlying stochastic control problem (i.e., an infinite-time horizon problem with unknown boundary conditions), these results constitute a nontrivial extension of the finite-horizon frameworks, such as that of Kerimkulov et al., (2020). We further emphasize that our algorithm and the BSDE-based convergence and stability analysis can be further applied to those stochastic control problems with a random horizon or optimal stopping problems seamlessly. Finally, while the focus of this paper is on a one-dimensional diffusion process (specifically, the stochastic SIS model), all algorithmic and theoretical results extend directly to multi-dimensional controlled diffusion processes, enabling the applications of our results to more generalized compartmental models, for example, the Susceptible-Infectious-Recovered (SIR) model, the Susceptible-Infectious-Recovered-Vaccinated (SIRV) model, etc.

The remainder of the paper is organized as follows. Section 2 presents our controlled stochastic SIS model featuring dual control variables for cyber risk management and mitigation. We reformulate the problem as an optimal stochastic control problem under a (drift-controlled) diffusion model, and derive fundamental properties of the value function. Section 3 establishes our core theoretical contribution, where we prove that the value function can be characterized as the unique viscosity solution to the associated Hamilton-Jacobi-Bellman equation. In Section 4, we propose a numerical method, namely the Policy Improvement Algorithm, together with convergence results based on the theory of infinite-time horizon BSDEs. We also provide various numerical results on the optimal cyber risk management and mitigation strategies, including comprehensive sensitivity analysis and comparative statics. The conclusion is given in Section 5. Finally, some technical proofs are collected in the Appendices for completeness.

2 Stochastic SIS model and mathematical formulation

We start with a standard Susceptible-Infected-Susceptible(SIS) model, which can be characterized by two state variables, namely the number of susceptible nodes (for example, terminal computers or servers in a network) $S_{t}$ , and the number of cyber-infected nodes $I_{t}$ . Assume the total number of nodes in the system at time $t$ is $N_{t}$ . Similar to the usual step in studying the classical deterministic SIS model in the epidemiological literature, we normalize the system size to one such that we replace the susceptible and infected nodes $S_{t}$ and $I_{t}$ in the system by $s_{t}=S_{t}/N_{t}$ and $i_{t}=I_{t}/N_{t}$ , respectively. In addition, we further introduce the stochastic version of the SIS model, see e.g. Tran and Yin, (2021); Barnett et al., (2023), and the dynamics of the state variables are expressed as

\begin{cases}\operatorname*{\mathrm{d\!}}s_{t}=-(\alpha+\beta i_{t})s_{t}\operatorname*{\mathrm{d\!}}t+\gamma i_{t}\operatorname*{\mathrm{d\!}}t-\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ \operatorname*{\mathrm{d\!}}i_{t}=(\alpha+\beta i_{t})s_{t}\operatorname*{\mathrm{d\!}}t-\gamma i_{t}\operatorname*{\mathrm{d\!}}t+\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ s_{0}+i_{0}=1,\end{cases}

(2.1)

where we introduce white noise shocks $W_{t}$ , which is a parameter perturbation of $\beta$ with volatility $\sigma$ . Here, $\alpha$ is the rate at which susceptible nodes (devices/users) are infected by external cyber threats (e.g., malicious emails,denial-of-service); $\beta$ denotes the rate at which the infected nodes propagate malware to susceptible nodes; and $\gamma$ is a baseline (unassisted) recovery rate of the compromised nodes become functional and threat-free, which may be due to the baseline cybersecurity measures like antivirus or basic IT policies. For simplicity, we assume that all the abovementioned model parameters in (2.1) are constants.

Now, we extend (2.1) to a controlled stochastic SIS model with two control variables. On one hand, we allow the above SIS model to be controlled under cyber risk management protocols through proactive protection measures (see, e.g., Barnett et al., (2023) for a similar control variable in an epidemic study). Let $\eta_{t}$ be the fraction of nodes (both susceptible and cyber-infected) at time $t$ in the system under protection with certain measures, which can reduce the effectiveness of both the external cyber threats to susceptible nodes and propagation from cyber-infected nodes to susceptible nodes. Note that it is possible to introduce an extra parameter $\zeta\in[0,1]$ and replace $\eta_{t}$ by $(1-\zeta\eta_{t})$ in (2.1), which captures the incomplete effectiveness of the protection measure. We set $\zeta=1$ and ignore this parameter in our study; for a detailed discussion, one can consult, for example, Barnett et al., (2023). On the other hand, we further replace $\gamma$ in (2.1) by $\gamma+\rho_{t}$ , where $\rho_{t}$ is a controlled recovery rate at time $t$ , which can be interpreted as the enhanced recovery rate due to certain reactive interventions to the cyber-infected nodes.

We assume that the pair of cyber risk management and mitigation control actions $(\eta_{\cdot},\rho_{\cdot})$ takes value in a nonempty set $U$ in $[0,1]\times[0,\infty)$ . Then, the dynamics of the controlled stochastic SIS model can be described by the following system of stochastic differential equations(SDEs):

\begin{cases}\operatorname*{\mathrm{d\!}}s_{t}=-\alpha\eta_{t}s_{t}\operatorname*{\mathrm{d\!}}t-\beta\eta_{t}^{2}i_{t}s_{t}\operatorname*{\mathrm{d\!}}t+(\gamma+\rho_{t})i_{t}\operatorname*{\mathrm{d\!}}t-\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ \operatorname*{\mathrm{d\!}}i_{t}=\alpha\eta_{t}s_{t}\operatorname*{\mathrm{d\!}}t+\beta\eta_{t}^{2}i_{t}s_{t}\operatorname*{\mathrm{d\!}}t-(\gamma+\rho_{t})i_{t}\operatorname*{\mathrm{d\!}}t+\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ s_{0}+i_{0}=1.\end{cases}

(2.2)

Note that the control $\eta$ is applied to both susceptible nodes and infected nodes, hence we have a quadratic form in each of the second term in (2.2).

According to a similar analysis in Theorem 2.1 of Tran and Yin, (2021), we can formulate the above controlled SIS model into a one-dimensional controlled diffusion process. To be specific, for any initial value $(s_{0},i_{0})\in(0,1)^{2}$ satisfying $s_{0}+i_{0}=1$ and $(\eta_{t},\rho_{t})\equiv(\eta_{0},\rho_{0})\in U$ , the system of SDEs (2.2) has a unique strong global solution $(s_{t},i_{t})_{t\geq 0}$ , and $s_{t}+i_{t}=1$ almost surely for any $t\geq 0$ . Therefore, in the following, we focus on the dynamics of the fraction of cyber-infected nodes in the system,

\operatorname*{\mathrm{d\!}}i_{t}=\Big[\alpha\eta_{t}(1-i_{t})+\beta\eta_{t}^{2}i_{t}(1-i_{t})-(\gamma+\rho_{t})i_{t}\Big]\operatorname*{\mathrm{d\!}}t+\sigma(1-i_{t})i_{t}\operatorname*{\mathrm{d\!}}W_{t}.

We shall provide a rigorous proof of the above assertion in Proposition 2.1 below. To proceed with our analysis under simplified notations, we reformulate the above cyber risk management and mitigation problem as a stochastic control problem for a general diffusion process given in (2.3) below.

Let $(\Omega,\mathcal{F},\mathbb{F}=\{\mathcal{F}\}_{t\geq 0},\mathbb{P})$ be a filtered probability space satisfying the usual condition, and on which a one–dimensional Brownian motion $W=(W_{t})_{t\geq 0}$ is defined. Let $X=(X_{t})_{t\geq 0}$ denote the fraction of cyber-infected nodes in the system at time $t$ , which is driven by the following controlled diffusion process:

\operatorname*{\mathrm{d\!}}X_{t}=b(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t+\sigma(X_{t})\operatorname*{\mathrm{d\!}}W_{t},\qquad X_{0}=x\in(0,1),

(2.3)

where $b(x,\eta,\rho)=\eta\alpha(1-x)+x[\eta^{2}\beta(1-x)-(\gamma+\rho)]$ , $\sigma(x)=\sigma x(1-x)$ (with abuse of notation, we use the volatility parameter $\sigma$ in (2.2) as the function $\sigma(\cdot)$ here in (2.3) for notation simplicity), and $\alpha,\beta,\gamma$ and $\sigma$ are model parameters in the stochastic SIS system (2.2). The control processes $(\eta_{.},\rho_{.}):\mathbb{R}_{+}\times\Omega\to U$ are progressively measurable, where $U$ is the control action space, we may just set $U=[0,1]\times[0,\infty)$ , $\mathbb{R}_{+}$ denotes the set $[0,\infty)$ . Furthermore, throughout the paper, we let $\mathbb{P}_{x}$ and $\mathbb{E}_{x}$ denote the probability measure and expectation operator when $X_{0}=x$ , respectively.

Assumption 2.1

The mappings $x\mapsto b(x,\eta,\rho)$ and $x\mapsto\sigma(x)$ are continuous in $x$ , and the former is uniformly in the control $(\eta,\rho)\in U$ . There exists a constant $K_{1}>0$ such that for any $(\eta,\rho)\in U$ , and all $x,y\in(0,1)$ we have

|b(x,\eta,\rho)-b(y,\eta,\rho)|+|\sigma(x)-\sigma(y)|\leq K_{1}|x-y|.

(2.4)

Moreover, for all $(\eta,\rho)\in U$ and $x\in(0,1)$ , it also holds that

|b(x,\eta,\rho)|\leq K_{2}(1+|x|+|\eta|+|\rho|),

and

|\sigma(x)|\leq K_{2}(1+|x|),

for some $K_{2}>0$ .

Assumption 2.1 holds obviously in our stochastic SIS model. To be specific, consider any $x,y\in(0,1)$ , one has

	$\displaystyle\|b(x,\eta,\rho)-b(y,\eta,\rho)\|$	$\displaystyle=\left\|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)(x-y)-\eta^{2}\beta(x^{2}-y^{2})\right\|$
		$\displaystyle\leq\left\|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right\|\|x-y\|+\eta^{2}\beta(x+y)\|x-y\|$
		$\displaystyle\leq\Big(\left\|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right\|+2\eta^{2}\beta\Big)\|x-y\|,$

and

|\sigma(x)-\sigma(y)|=|\sigma(x-y)-\sigma(x^{2}-y^{2})|\leq\sigma\Big(1+(x+y)\Big)|x-y|\leq 3\sigma|x-y|.

Hence, by letting $K_{1}=\max\left\{\Big(\left|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right|+2\eta^{2}\beta\Big),3\sigma\right\}$ , one arrives at (2.4). Moreover, take any $x\in(0,1)$ and $(\eta,\rho)\in U$ , we have

\begin{split}|b(x,\eta,\rho)|&=\left|\eta\alpha(1-x)+x\left[\eta^{2}\beta(1-x)-(\gamma+\rho)\right]\right|\\ &\leq\left|\eta[\alpha(1-x)+x\beta(1-x)]\right|+\left|x(\gamma+\rho)\right|\\ &\leq(\alpha+\beta)\cdot|\eta|+\gamma|x|+|\rho|\\ &\leq K_{1}\cdot(1+|x|+|\eta|+|\rho|),\end{split}

and

|\sigma(x)|=|\sigma x(1-x)|\leq K_{2}\cdot|x|,

for some $K_{2}>0$ .

Now, we are ready to define the cost functional associated with our cyber risk management and mitigation problem (2.3) as

J(x;\eta,\rho):=\mathbb{E}_{x}\left[\int_{0}^{\infty}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t\right],\quad x\in(0,1),(\eta,\rho)\in U,

(2.5)

where $\delta>0$ is the discounting factor, $f:(0,1)\times[0,1]\times[0,\infty)\to[0,\infty)$ is a running cost function. In addition, without loss of generality, we restrict our study to the set of admissible controls given below. Let $\mathcal{U}$ denotes all progressively measurable random processes $(u,\nu)=\{(u_{t},\nu_{t}),t\geq 0\}$ taking values in $U$ , and define the set of admissible control processes by

\mathcal{U}_{0}:=\left\{(u,\nu)\in\mathcal{U}:|u_{s}|+|\nu_{s}|\leq M_{u,\nu}<\infty\;\text{a.s. uniformly in}\;s\in\mathbb{R}_{+}\right\}.

Then, the value function is defined as

V(x):=\inf_{(\eta,\rho)\in\mathcal{U}_{0}}J(x;\eta,\rho),\qquad x\in(0,1).

(2.6)

To show the well-posedness of the controlled diffusion process (2.3), we prove the following proposition.

Proposition 2.1

For any control $(\eta,\rho)\in\mathcal{U}_{0}$ and initial value $X_{0}=x\in(0,1)$ , the controlled SDE (2.3) has a unique global positive solution $X_{t}$ for all $t\in\mathbb{R}_{+}$ such that

\mathbb{P}_{x}(X_{t}\in(0,1),\forall t\in\mathbb{R}_{+})=1.

Proof. The proof is similar to the proof of Theorem 3.1 in Gray et al., (2011). Let $\tau$ be the explosion time of the $X_{t}$ driven by (2.3). Since the drift and diffusion coefficients are locally Lipschitz, there exists a unique local solution $X_{t}$ for $t\in[0,\tau)$ . For any initial value $X_{0}=x\in(0,1)$ , consider a sufficiently large $N>0$ such that $x\in(\frac{1}{N},1-\frac{1}{N})$ . Then, for each $n\geq N$ , define the first exit time

\tau_{n}:=\inf\Big\{t\in[0,\tau):X_{t}\notin\left(\frac{1}{n},1-\frac{1}{n}\right)\Big\},

with the convention that $\inf\{\emptyset\}=\infty$ . Obviously, $\tau_{n}$ is increasing in $n$ , hence we let $\tau_{\infty}=\lim_{n\to\infty}\tau_{n}$ , and $\tau_{\infty}\leq\tau$ almost surely. Then, the rest is to show $\tau_{\infty}=\infty$ almost surely. We prove the statement by the method of contradiction. Assume that there exists $T>0$ and $\epsilon\in(0,1)$ such that

\mathbb{P}_{x}(\tau_{\infty}\leq T)>\epsilon.

Then, there is a sufficiently large $\tilde{n}\geq N$ such that $\mathbb{P}_{x}(\tau_{n}\leq T)\geq\epsilon$ for all $n\geq\tilde{n}$ . We define an auxiliary function $v(z):=\frac{1}{z}+\frac{1}{1-z}$ for $z\in(0,1)$ , then for any $t\in[0,T]$ and $n\geq\tilde{n}$ , an application of the Dynkin’s formula gives,

\mathbb{E}_{x}(v(X_{t\wedge\tau_{n}}))=v(x)+\mathbb{E}_{x}\left[\int_{0}^{t\wedge\tau_{n}}\mathcal{L}^{\eta,\rho}v(X_{s})\operatorname*{\mathrm{d\!}}s\right],

where $\mathcal{L}^{\eta,\rho}$ is the infinitesimal generator of $X$ for any fixed $\eta$ and $\rho$ given as

	$\displaystyle\mathcal{L}^{\eta,\rho}v(x)=$	$\displaystyle\eta\alpha\left(-\frac{1}{x^{2}}+\frac{1}{(1-x)^{2}}\right)(1-x)+x\left(-\frac{1}{x^{2}}+\frac{1}{(1-x)^{2}}\right)[\eta^{2}\beta(1-x)-(\gamma+\rho)]$
		$\displaystyle\quad+\sigma^{2}x^{2}(1-x)^{2}\left(\frac{1}{x^{3}}+\frac{1}{(1-x)^{3}}\right).$

With some simple algebra, one has

\mathcal{L}^{\eta,\rho}v(x)\leq\frac{\eta\alpha+\eta^{2}\beta}{1-x}+\frac{\gamma+\rho}{x}+\sigma^{2}\left(\frac{1}{x}+\frac{1}{1-x}\right)\leq C(\eta,\rho)v(x),\quad\text{for }x\in(0,1),

where $C(\eta,\rho)=\max\{\eta\alpha+\eta^{2}\beta,\gamma+\rho\}+\sigma^{2}$ . Take a finite modification of random process $C(\eta_{s\wedge\tau},\rho_{s\wedge\tau}),s\geq 0$ by defining

\widetilde{C}(\eta_{s\wedge\tau},\rho_{s\wedge\tau}):=\begin{cases}C(\eta_{s\wedge\tau},\rho_{s\wedge\tau}),&\omega\in\Omega^{\prime},\\[4.30554pt] M_{u,\nu},&\omega\in\Omega\setminus\Omega^{\prime},\end{cases}

where $\mathbb{P}_{x}\left(\widetilde{C}(\eta_{s\wedge\tau},\rho_{s\wedge\tau})=C(\eta_{s\wedge\tau},\rho_{s\wedge\tau})\right)=1$ for every $s\geq 0$ . Then $\widetilde{C}=C$ almost surely, but $\widetilde{C}$ is bounded by $M_{u,\nu}<\infty$ everywhere. By Tonelli’s theorem applied to the nonnegative process $\widetilde{C}(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\cdot v(X_{s\wedge\tau_{n}})$ , we have

\begin{split}\mathbb{E}_{x}\big[v(X_{t\wedge\tau_{n}})\big]&=v(x)+\int_{0}^{t}\mathbb{E}_{x}\Big[\widetilde{C}(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\cdot v(X_{s\wedge\tau_{n}})\Big]\,\operatorname*{\mathrm{d\!}}s\\ &\leq v(x)+\int_{0}^{t}\mathbb{E}_{x}\Big[\sup_{\omega\in\Omega}\left\{\widetilde{C}(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\right\}\cdot v(X_{s\wedge\tau_{n}})\Big]\,\operatorname*{\mathrm{d\!}}s\\ &\leq v(x)+M_{u,\nu}\int_{0}^{t}\mathbb{E}_{x}\big[v(X_{s\wedge\tau_{n}})\big]\,\operatorname*{\mathrm{d\!}}s,\end{split}

where the last inequality follows from the boundedness of $\widetilde{C}$ on all $\omega\in\Omega$ by the definition of the admissible control set $\mathcal{U}_{0}$ . In particular, the joint measurability of $C(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})$ and $v(X_{s\wedge\tau_{n}})$ w.r.t. product $\sigma$ -algebra $\mathcal{B}([0,u])\otimes\mathcal{F}_{u},\forall u\geq 0$ , together with the bounds

C(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\geq\gamma+\sigma^{2}>0,\quad v(X_{s\wedge\tau_{n}})\geq\frac{1}{n}>0\quad\text{a.e.},

ensure the validity of applying Tonelli’s theorem.

Then, with the help of Grönwall’s inequality, one has

v(x)e^{\overline{C}T}\geq\mathbb{E}_{x}(v(X_{T\wedge\tau_{n}})\geq\mathbb{E}_{x}\Big(\bm{1}_{\{\tau_{n}\leq T\}}v(X_{\tau_{n}})\Big)=n\mathbb{P}_{x}(\tau_{n}\leq T)\geq n\epsilon,

(2.7)

where we applied the fact that $X_{\tau_{n}}$ equals to either $\frac{1}{n}$ or $1-\frac{1}{n}$ . Then by sending $n\to\infty$ in (2.7), one arrives at the contradiction $\infty>v(x)e^{\overline{C}T}=\infty$ , and completes the proof.

Assumption 2.2

There exists a constant $K>0$ , and $m\in\mathbb{N}$ such that for any pair of controls $(\eta,\rho)\in U$ , we have

|f(x,\eta,\rho)-f(y,\eta,\rho)|\leq K(1+|x|^{m}+|y|^{m})|x-y|,

|f(x,\eta,\rho)|\leq K(1+|x|+|\eta|^{2}+|\rho|^{2}),

for all $x,y\in(0,1)$ . Additionally, let $\delta>0$ , the cost function $f(x,\eta,\rho)$ is bounded by a constant $C_{f}$ such that

\mathbb{E}_{x}\left[\int_{0}^{\infty}e^{-\delta t}|f(X_{t},\eta_{t},\rho_{t})|\operatorname*{\mathrm{d\!}}t\right]<\infty.

Remark 2.1

The typical form of the cost function we shall investigate later in this paper can be expressed as

f(x,\eta,\rho):=a_{0}+a_{I}x+a^{S}_{m}(1-\eta)^{2}+(a^{I}_{m}-a^{S}_{m})x(1-\eta)^{2}+a_{r}x\rho^{2},

(2.8)

where $a_{0}>0$ is the underlying marginal costs of running the system, $a_{I}>0$ is the marginal cost generated from infected nodes, $a^{S}_{m}$ and $a^{I}_{m}$ are the marginal costs associated with cyber risk management control for susceptible nodes and cyber-infected nodes, respectively. We shall assume that $a^{I}_{m}>a^{S}_{m}>0$ , which means that costs associated with management for infected nodes are, in general, higher than those for functional and threat-free nodes. Finally, $a_{r}>0$ is the marginal cost of cyber risk mitigation control. Not that Assumption 2.2 is satisfied for the class of cost functions defined in (2.8).

To proceed, we show some important properties of the value function in the following proposition. Throughout the paper, we assume that the Assumptions 2.1 and 2.2 hold true.

Proposition 2.2

(i)

If the running cost function $f(x,\cdot,\cdot)$ is nondecreasing in $x$ for each pair of admissible controls, then $V(x)$ is nondecreasing in $x$ for all $x\in(0,1)$ .
(ii)

For any $x,y\in(0,1),m\in\mathbb{N}$ , there exists a constant $C>0$ such that

$|V(x)-V(y)|\leq C(1+x^{m}+y^{m})|x-y|.$ (2.9)

Proof. See Appendix A.

3 Hamilton-Jacobi-Bellman equation and the viscosity solution

In this section, we first state the dynamic programming equation associated with our stochastic control problem and derive (heuristically) the corresponding Hamilton-Jacobi-Bellman (HJB) equation. Let $\theta$ be any $\mathbb{F}$ -stopping time, for any $x\in(0,1)$ , we have

V(x)=\inf_{(\eta,\rho)\in\mathcal{U}_{0}}\mathbb{E}_{x}\left[\int_{0}^{\theta}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t+e^{-\delta\theta}V(X_{\theta})\right]

(3.1)

The proof of (3.1) for controlled diffusion processes follows the classical arguments in the literature, see for example, Theorem 3.1.6 of Krylov, (1980), where the main point is the continuity of the value function, which is the present case in our study.

If the value function is sufficiently smooth, then by following the standard arguments with the help of the dynamic programming principle and Itô’s formula, we obtain the following HJB equation,

\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)u_{x}(x)+\frac{1}{2}\sigma^{2}(x)u_{xx}(x)-\delta u(x)+f(x,\eta,\rho)\right\}=0,\quad x\in(0,1).

(3.2)

Note that the heuristic arguments verifying that value function $V$ is a classical solution to the HJB equation (3.2) assume in prior that $V$ is twice continuously differentiable on $(0,1)$ , which is not the present case, where we only have Lipchitz continuity in $V$ . Hence, we adopt the concept of the viscosity solution introduced in Crandall and Lions, (1983) and characterize the optimal value function $V$ as the unique viscosity solution to the HJB equation (3.2).

Definition 3.1

Let $u:(0,1)\to\mathbb{R}$ be a locally Lipschitz continuous function.

(i)

$u$ is local viscosity supersolution of (3.2) at $x\in(0,1)$ , if for any $\varphi\in C^{2}((0,1))$ with $u\geq\varphi$ , whenever $u-\varphi$ attains a local minimum at $x$ , we have

\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)\varphi_{x}(x)+\frac{1}{2}\sigma^{2}(x)\varphi_{xx}(x)-\delta\varphi(x)+f(x,\eta,\rho)\right\}\leq 0.

(ii)

$u$ is a local viscosity subsolution of (3.2) at $x\in(0,1)$ , if for any $\phi\in C^{2}((0,1))$ with $u\leq\phi$ , whenever $u-\phi$ attains a local maximum at $x$ , we have

\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)\phi_{x}(x)+\frac{1}{2}\sigma^{2}(x)\phi_{xx}(x)-\delta\phi(x)+f(x,\eta,\rho)\right\}\geq 0.

(iv)

$u$ is a viscosity solution of (3.2) on $(0,1)$ if it is both a viscosity supersolution and subsolution of (3.2) on all $x\in(0,1)$ .

We first provide the result regarding the existence of the viscosity solution for the HJB equation (3.2) on $(0,1)$ .

Proposition 3.1

The value function $V$ is a viscosity solution of (3.2) on $(0,1)$ .

Proof. See Appendix B.

To prove the uniqueness, we introduce the following alternative definition of viscosity solution (for second-order differential equations), see for example Yong and Zhou, (1999). For any function $u\in C((0,1))$ and $x\in(0,1)$ , the so-called second–order superdiffiential of $u$ at $x$ is defined as

D^{2,+}_{x}u(x):=\Bigg\{(p,q)\in\mathbb{R}\times\mathbb{R}:\lim\sup_{h\to 0}\frac{u(x+h)-u(x)-ph-\frac{q}{2}h^{2}}{h^{2}}\leq 0\Bigg\},

and the second–order subdiffiential of $u$ at $x$ is defined as

D^{2,-}_{x}u(x):=\Bigg\{(p,q)\in\mathbb{R}\times\mathbb{R}:\lim\inf_{h\to 0}\frac{u(x+h)-u(x)-ph-\frac{q}{2}h^{2}}{h^{2}}\geq 0\Bigg\}.

In addition, we let

F(x,u(x),u_{x}(x),u_{xx}(x)):=\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)u_{x}(x)+\frac{1}{2}\sigma^{2}(x)u_{xx}(x)-\delta u(x)+f(x,\eta,\rho)\right\},

and rewrite (3.2) as

F(x,u(x),u_{x}(x),u_{xx}(x))=0,\quad\text{for }x\in(0,1).

(3.3)

Definition 3.2

A Lipschitz continuous function $u:(0,1)\to\mathbb{R}$ is called a viscosity supersolution of (3.3) at $x\in(0,1)$ if

F(x,u(x),p,q)\leq 0,

for all $(p,q)\in D^{2,-}_{x}u(x)$ .

A Lipschitz continuous function $u:(0,1)\to\mathbb{R}$ is called a viscosity subsolution of (3.3) at $x\in(0,1)$ if

F(x,u(x),p,q)\geq 0,

for all $(p,q)\in D^{2,+}_{x}u(x)$ .

If $u$ is both a viscosity supersolution and a viscosity subsolution of (3.3) at $x\in(0,1)$ , then it is a viscosity solution at $x$ .

Proposition 3.2

(Comparison Principle) Let the increasing and Lipschitz continuous functions $u$ and $v$ be a viscosity supersolution and a viscosity subsolution of the HJB equation (3.2) (as well as (3.3)) respectively. For any closed interval $\mathcal{O}\subset(0,1)$ , if $u\geq v$ on $\partial\mathcal{O}$ , then $u\geq v$ on $\mathcal{O}$ .

Proof. See Appendix C.

Proposition 3.3

Let $v\in C((0,1))$ be any increasing and Lipschtiz continuous viscosity supersolution of (3.2), then $v(x)\geq V(x)$ for all $x\in(0,1)$ .

Proof. The proof follows the standard arguments of using Itô’s formula with a density argument dealing with the nonsmoothness of any viscosity supersolution $v$ of (3.2), see, for example, Nguyen-Ngoc and Yor, (2004); Azcue and Muler, (2005). We omit the details here.

By Proposition 3.3, for a sufficiently large closed interval $\mathcal{O}\subset(0,1)$ , we can characterize the value function $V$ as the viscosity solution of (3.2) with the smallest value on the boundary $\partial\mathcal{O}$ in the class of increasing and Lipschitz continuous viscosity solutions of (3.2). Let $\mathcal{V}$ denote the set of all increasing and Lipschitz continuous functions that are viscosity solutions of (3.2) on $\mathcal{O}$ , then, we characterize the value function $V$ as

V=\{v\in\mathcal{V}~|~v(x)\leq h(x)\text{ for all }h\in\mathcal{V}\text{ and }x\in\partial\mathcal{O}\}.

(3.4)

Proposition 3.4

The value function $V$ characterized in (3.4) is the unique viscosity solution of (3.2) on $\mathcal{O}$ .

Proof. Consider another increasing and Lipschitz continuous viscosity solution $\varphi$ of (3.2) on $\mathcal{O}$ satisfying (3.4). On one hand, $\varphi$ is a viscosity supersolution, and by Proposition 3.3, we have $\varphi\geq V$ on $\mathcal{O}$ . On the other hand, $\varphi$ is also an increasing and Lipschitz continuous viscosity subsolution of (3.2) with the fact that $\varphi(x)\leq V(x)$ for $x\in\partial\mathcal{O}$ (since $V\in\mathcal{V}$ ), then by the comparison principle given in Proposition 3.2, we have $V\geq\varphi$ on $\mathcal{O}$ , whenever $V$ is considered as a viscosity supersolution and $\varphi$ as a viscosity subsolution. Therefore, we obtain $\varphi=V$ on $\mathcal{O}$ and complete the proof.

4 Numerical algorithm and examples

4.1 Policy improvement algorithm

Obtaining a closed-form solution to (3.2) is seldom feasible; hence, in this paper, we apply a policy improvement algorithm, namely Bellman–Howard policy improvement/iteration algorithm (see Algorithm 1 below), to solve the cyber risk management and mitigation problem numerically. Iterative algorithms for solving optimal control problems can trace their origins to Bellman’s pioneering work Bellman, (1955, 1966), which introduced value iteration methods for finite space-time problems and established their convergence properties. Howard, (1960) later developed the policy improvement algorithm in the context of discrete space-time Markov decision processes (MDPs).

The proof of the convergence is of paramount importance in using policy improvement algorithms. Among the earliest convergence analyses for policy iteration in MDPs is the work of Puterman and Brumelle, (1979), which employed an abstract function–space framework applicable to both discrete and continuous settings. A key insight from their study is that policy iteration can be interpreted as a form of Newton’s method, inheriting similar convergence properties, i.e., whenever initialized near the true solution, the algorithm achieves quadratic convergence. Later Puterman, (1981) provides similar results on the convergence of the policy iteration algorithm for controlled diffusion processes. Further extensions were made by Santos and Rust, (2004), who examined discrete-time problems with continuous state and control spaces. Their work generalizes the results of Puterman and Brumelle, (1979), demonstrating global convergence while retaining local quadratic convergence rate under standard conditions and superlinear convergence rate under broader assumptions. More recently, under a setting of continuous space–time controlled diffusion processes, Kerimkulov et al., (2020) established a global rate of convergence and stability of the Bellman–Howard policy iteration algorithm with the help of techniques in Backward Stochastic Differential Equations (BSDEs). Therefore, we follow the main steps in Kerimkulov et al., (2020) when proving the convergence of our policy iteration algorithm, the main theorem is given in Theorem 4.1 below. Note that the study in Kerimkulov et al., (2020) focuses on finite-horizon stochastic control problems; a brief discussion of the infinite-horizon counterpart can be found in Remark 4.3 of Kerimkulov et al., (2020), which suggests that the convergence result still holds for some sufficiently large discount factor $\delta>0$ , although no formal proof is provided.

Without loss of generality, we consider a sufficiently large closed interval $\mathcal{O}=[\underline{x},\overline{x}]\subset(0,1)$ . Further, we let the running cost function $f$ be in the form of (2.8),

f(x,\eta,\rho)=a_{0}+a_{I}x+a^{S}_{m}(1-\eta)^{2}+(a^{I}_{m}-a^{S}_{m})x(1-\eta)^{2}+a_{r}x\rho^{2},\quad(x,\eta,\rho)\in\mathcal{O}\times U,

where $a_{0}$ is the baseline marginal costs associated with the management of the system; $a_{I}$ is the (extra) marginal running costs incurred by cyber-infected nodes in the system; $a_{m}^{I}$ and $a_{m}^{S}$ denote the marginal costs associated with proactive risk management for cyber-infected nodes and susceptible (functional and threat-free) nodes, respectively; $a_{r}$ denotes the marginal costs generated by reactive risk mitigation (intervention) to enhance the recovery rate in the system. The Policy Improvement Algorithm (PIA) for solving (3.2) on $\mathcal{O}$ is given in Algorithm 1.

Algorithm 1 Policy improvement algorithm

1:Initialize: Set spatial discretizations

\{x_{k}\}_{k=1}^{N}

with

N

grid points of

[\underline{x},\overline{x}]

. Choose an initial guess of control

u^{0}:=(\eta^{0},\rho^{0})\in\mathcal{U}_{0}

, constant for each

x_{k}

2:repeat

3: Solve the Bellman–type ODE:

\frac{1}{2}\sigma^{2}D_{xx}v^{n}+b^{u^{n}}D_{x}v^{n}+f^{u^{n}}-\delta v^{n}=0,\quad x\in[\underline{x},\overline{x}].

(4.1)

4: Control update by the first-order conditions

\eta^{n+1}(x)=\begin{cases}1-\frac{(\alpha+\beta x)(1-x)D_{x}v^{n}}{2\left[(a_{m}^{I}-a_{m}^{S})x+a^{S}_{m}\right]}&\text{if in }[0,1],\\ 0&\text{if below 0},\\ 1&\text{otherwise}.\end{cases}

\rho^{n+1}(x)=\begin{cases}\frac{D_{x}v^{n}}{2a_{r}x}&\text{if }D_{x}v^{n}>0,\\ 0&\text{otherwise}.\end{cases}

5:until The normalized

L^{2}

-norm between two nearest iterations

\frac{\|v^{n+1}-v^{n}\|_{2}}{\sqrt{N}}<\epsilon.

6:return:

v^{n}

and

u^{n+1}

We shall remark that to solve the Bellman-type ODE (4.1) during each iteration step of Algorithm 1, we need to impose boundary conditions which are not available in our problem. Hence, we impose an approximated Dirichlet condition at the left boundary ( $\underline{x}$ ) and a Neumann condition at the right boundary ( $\overline{x}$ ) based on the conventional Monte-Carlo simulation method for the performance function (2.5) under the current control strategy (extrapolated for all $x\in(0,1)$ ) at each iteration step. The effectiveness of Algorithm 1 has been validated for an example of an infinite-horizon stochastic control problem with analytical solutions, as shown in Asmussen and Taksar, (1997).

For completeness, we present the convergence theorem of the Algorithm 1 in Theorem 4.1, together with the policy improvement theorem (see Theorem 4.2) and algorithm stability under perturbations to the solution of the Bellman-type ODE (4.1) (see Theorem 4.3).

Theorem 4.1

Assume Assumptions 2.1, 2.2, D.1, and D.2 hold. Let $v\in C^{1}(\mathcal{O})$ be the (viscosity) solution to the HJB equation (3.2) on $\mathcal{O}$ , and let $(v^{n})_{n\in\mathbb{N}}\in C^{2}(\mathcal{O})$ be the sequence of smooth approximations generated by Algorithm 1. Then there exists $q\in(0,1)$ and the initial guess $v^{0}$ , such that for all $x\in\mathcal{O}$ there is a constant $C=C(x,\delta)$ ( $\delta$ is a given discounting rate) satisfying

|v(x)-v^{n}(x)|^{2}\leq C(x,\delta)q^{n}.

(4.2)

The proof of Theorem 4.1 follows a similar argument to Theorem 4.1 in Kerimkulov et al., (2020). To be specific, at the $n$ th iteration, the solution $v^{n}$ of the Bellman-type ODE (4.1) can be characterized as the solution to a corresponding BSDE $Y^{n}$ . By the uniqueness property of the infinite-horizon BSDE, the equivalence between $v^{n}$ and $Y^{n}$ is thus established. In addition, the solution to the HJB equation (3.2) can also be represented by a BSDE, which follows directly from Lemma D.5, where the key ingredient is the comparison principle for BSDEs. Finally, applying the contraction property from Lemma D.4, the desired convergence result follows. For a more detailed explanation of the underlying idea of Algorithm 1, we refer the reader to Kerimkulov et al., (2020). Due to the complexity, we postpone the detailed proof together with some preliminary assumptions and lemmas to Appendix D at the end of the paper.

Moreover, the following theorem establishes the monotone improvement property of the policy improvement algorithm.

Theorem 4.2

Let Assumptions 2.1, 2.2, D.1, and D.2 hold, and fix $n\in\mathbb{N}$ . Let $v^{n}$ and $v^{n+1}$ be the solutions of (4.1) at steps $n$ and $n+1$ of the Algorithm 1. Then, for all $x\in(0,1)$ , it holds that

v^{n+1}(x)\geq v^{n}(x).

Proof. The proof follows a similar argument as the proof of Theorem 5.1 of Kerimkulov et al., (2020) together with the comparison principle of infinite-horizon BSDE given in Theorem D.3.

To finish this subsection, we provide a stability property of the policy improvement algorithm under the perturbations to the solution of the Bellman-type ODE (4.1)(note that the perturbations come from both the fact that Eq. (4.1) is only solved approximately and our approximated boundary conditions). Hence, updating (with the first-order condition) the controls $\eta$ and $\rho$ at each iteration step in Algorithm 1 is essentially performed only with the approximated solution, which can cumulate errors in the iteration.

Let $\varepsilon$ be a set of parameters that determines the accuracy of the solution to the ODE (4.1). Let $u^{n}_{\varepsilon}$ be the policy at iteration $n$ obtained from an approximate solution to the ODE (4.1), let $v^{n}_{\varepsilon}$ be the solution of

\frac{1}{2}\sigma^{2}D_{xx}v^{n}_{\varepsilon}(x)+b^{u^{n}_{\varepsilon}}D_{x}v^{n}_{\varepsilon}+f^{u^{n}_{\varepsilon}}(x)-\delta v^{n}_{\varepsilon}=0,\quad x\in[\underline{x},\overline{x}],

(4.3)

with true boundary conditions. And let $\tilde{v}^{n}_{\varepsilon}$ be the approximate solution to

	$\displaystyle\frac{1}{2}\sigma^{2}D_{xx}v^{n}_{\varepsilon}(x)+b^{u^{n}_{\varepsilon}}D_{x}v^{n}_{\varepsilon}+f^{u^{n}_{\varepsilon}}(x)-\delta v^{n}_{\varepsilon}$	$\displaystyle=0,\quad x\in(\underline{x},\overline{x}),$
	$\displaystyle v^{n}_{\varepsilon}(\underline{x})$	$\displaystyle=m^{u^{n}_{\varepsilon}}_{1}(\underline{x}),$
	$\displaystyle D_{x}v^{n}_{\varepsilon}(\overline{x})$	$\displaystyle=m^{u^{n}_{\varepsilon}}_{2}(\underline{x}),$

where $m^{u^{n}_{\varepsilon}}_{1}(\underline{x})$ and $m^{u^{n}_{\varepsilon}}_{2}(\underline{x})$ denote the approximate values of $J(\underline{x};u^{n}_{\varepsilon})$ and the left derivative of $J(\cdot;u^{n}_{\varepsilon})$ at $\overline{x}$ respectively. Then, the policy function (see the definition of this function in Assumption D.1) for the next iteration step is given by

u^{n+1}_{\varepsilon}=u\bigl(x,(\sigma D_{x}\widetilde{v}^{n}_{\varepsilon})(x)\bigr)=\operatorname*{arg\,min}_{u\in U}\Bigl\{(b^{u}D_{x}\widetilde{v}^{n}_{\varepsilon})(x)+f^{u}(x)\Bigr\}.

Theorem 4.3

Let Assumptions 2.1, 2.2, D.1, and D.2 hold. Let $(v^{n})_{n\in\mathbb{N}}$ be the approximation sequence given by Algorithm 1. Let $(v^{n}_{\varepsilon})_{n\in\mathbb{N}}$ be the approximation sequence given by (4.3). Let $u^{\ast}$ and $X^{x,u^{\ast}}$ be the optimal control process for (3.2) and the associated controlled diffusion started from $x\in(0,1)$ , respectively. Assume that $D_{x}\tilde{v}^{n}_{\varepsilon}$ is uniformly bounded. Then there exist $q\in(0,1)$ and $0<\gamma<\delta$ , such that for all $x\in(0,1)$ , there exists a constant $C=C(x,\delta)$ with

|v^{n}(x)-v_{\varepsilon}^{n}(x)|^{2}\leq C(x,\delta)q^{n}+2\sum_{k=1}^{n}q^{k}\left\|\Bigl(\sigma(D_{x}v_{\varepsilon}^{k}-D_{x}\tilde{v}_{\varepsilon}^{k})\Bigr)(X^{x,u^{\ast}})\right\|_{\widehat{\mathbb{L}}_{\gamma-\delta}^{2}},

where ${\widehat{\mathbb{L}}_{\gamma-\delta}^{2}}$ is the $\mathbb{L}^{2}_{\gamma-\delta}$ -norm under probability measure $\widehat{\mathbb{P}}$ , see Appendix D.

Proof. The proof follows the same idea as Theorem 6.1 of Kerimkulov et al., (2020), except that Lemma D.4 is used in its place. We omit the details here.

4.2 Benchmark example

The following Example 4.1 presents a benchmark example in which the Algorithm 1 is applied to numerically solve the cyber risk management and mitigation problem. It is important to note that the parameter $a^{S}_{m}$ shall be small (which is not unreasonable) to ensure the convergence of the PIA. Additionally, the initial guess for $\eta$ is set to zero. These choices are made to help obtain a smooth solution for the severely stiff ODE. Moreover, a combination of a Dirichlet condition at the left boundary and a Neumann condition at the right boundary of the computing interval $\{\underline{x},\overline{x}\}$ will be used to solve the stiff ODE (4.1). Those are simulated by the conventional Monte-Carlo method.

Example 4.1

We first provide a benchmark example for the cyber risk control problem (2.6). Let $a_{0}=0.5,a_{I}=5,a_{m}^{I}=2.5,a_{m}^{S}=0.5,a_{r}=5$ . Furthermore, we set the discount factor $\delta=0.05$ . Additionally, let the external cyber attacks rate be $\alpha=0.5$ , internal contagion rate $\beta=0.5$ , (unassisted) recovery rate $\gamma=0.15$ , and the diffusion coefficient $\sigma=0.3$ . Finally, we set $\underline{x}=0.01$ , $\overline{x}=0.99$ , $N=1000$ and $\epsilon=10^{-4}$ .

The results of the optimal strategy and value function are given in Figures 1(b) and 1(a), respectively; we also provide the convergence of the errors (in terms of normalized $L^{2}$ -norm of the difference between two successive value function), which shows the computational efficiency of the algorithm, see Figure 1(c). The algorithm converges with an error less than $10^{-4}$ within eight steps starting from the initial guess $\eta^{0}=0$ and $\rho^{0}=0$ . To better understand the behavior of the optimal control $(\eta^{*},\rho^{*})$ , we observe that $\eta^{*}$ remains at zero (strong proactive control) when the ratio of cyber-infected nodes is considerably small, while $\rho^{*}$ decays fast from its maximum. This pattern indicates that when the current system has a small ratio of cyber-infected nodes, applying the risk mitigation control to enhance the recovery rate is less effective than implementing proactive management control to prevent the internal contagions and external cyber attacks. Consequently, the optimal strategy prioritizes risk management over risk mitigation at the outset.

Furthermore, Figure 1(b) shows that both the optimal controls $\eta^{*}$ (cyber risk management) and $\rho^{*}$ (cyber risk mitigation) decrease as the ratio of cyber-infected nodes $x$ increases. In particular, we have the following observations:

•

When the cyber-infected ratio is low (i.e., $x\approx 0$ ), both controls are set high, which reflects a strong incentive for the decision maker to invest in cyber risk prevention and mitigation in an early stage.
•

As the cyber-infected ratio rises ( $x\uparrow 1$ ), both controls decline. This suggests that once the system is already heavily compromised, additional investments in prevention or mitigation have diminishing impact on reducing risk, especially for the proactive risk management control $\eta$ . Hence, the decision maker may want to prioritize resource allocation toward risk mitigation strategies.

The above observations align with the intuition of cyber risk control under limited resources: it is most effective to intervene early, when cyber-infected ratios are still small; while intervention becomes less valuable (and less cost-effective) when the system is already in a state of widespread infection.

4.3 Suboptimal control analysis

Example 4.2

In this example, we provide numerical analysis when we fix either the risk management control at zero ( $\eta(\cdot)\equiv 1$ ) or the risk mitigation control at zero ( $\rho(\cdot)\equiv 0$ ). The resulting optimal strategies (of the single control) and value functions are given in Figure 4.2.

When removing the reactive cyber risk mitigation control (i.e., $\rho\equiv 0$ ), the optimal proactive risk management control $\eta^{*}$ becomes noticeably stronger (see Figure 2(a)) compared to the benchmark example, which compensates for the absence of reactive mitigation controls. The value function remains nearly unchanged at small $x$ , but rises substantially (reaching about $55$ versus the benchmark level of $28$ ) when the cyber-infected ratio is close to one, see Figure 2(c). This suggests that as cyber-infection level increases, the marginal effectiveness of risk management control drops off, and proactive risk management alone cannot effectively substitute for reactive mitigation control in a highly infected system.

On the other hand, when we remove the risk management control (i.e., $\eta\equiv 1$ ), the optimal reactive control $\rho^{*}$ decreases relative to the benchmark (Figure 2(b)), reflecting the limited effectiveness of mitigation when proactive control from a risk management perspective is unavailable. In this case, the value function rises overall (Figure 2(d)), shifting from a range between 20 to 30 under the benchmark example to a range between 75 to 80. This shows that relying solely on reactive mitigation strategies results in higher expected costs, confirming that mitigation control cannot fully substitute for proactive management.

4.4 Sensitivity analysis

In this subsection, we numerically analyze the distinct roles of proactive control $\eta$ (risk management) and reactive control $\rho$ (risk mitigation) in shaping the optimal value function. In particular, we deviate by a series of small changes (uniformly in $x$ ) for $\eta$ and $\rho$ , respectively, from the optimal strategy $(\eta^{*},\rho^{*})$ in the benchmark example. The results are plotted in Figure 3(a)–3(d).

We conclude this subsection with the following observations:

•

Proactive risk management control ( $\eta$ ): According to Figure 3(a), increasing $\eta$ produces a substantial and nearly uniform increase in value function (higher expected discounted costs) across all initial states of cyber-infected level, confirming the consistent effectiveness of proactive measures when managing cyber risks in the system. Conversely, decreasing $\eta$ has limited effects on the magnitude of the value function (see Figure 3(b)). Note that, since $\eta^{*}(x)=0$ for small values of $x$ , the decrease of $\eta$ cannot be applied in this case; hence, we do not observe any changes in the corresponding value function. But, when the cyber-infected ratio is high, the inability to sustain strong proactive control leads to a noticeable increase in the value function (but not comparable to the corresponding scenario when increasing $\eta$ ).
•

Reactive risk mitigation control ( $\rho$ ): Both Figures 3(c) and 3(d) show that perturbations on the value of $\rho^{*}$ exert their strongest influence when the system is in a status with a high cyber-infected ratio. Such an observation indicates that the reactive mitigation control is most valuable once the system contains a large number of infected nodes. In addition, unlike the risk management control, we can observe an obvious non-uniform change in the value function with respect to $x$ . In particular, when we add positive perturbations to $\rho^{*}$ (i.e., adding redundant mitigation controls), the increase in the value function (i.e., expected discounted costs) is moderate and consistent. However, adding negative perturbations to $\rho^{*}$ , which refers to insufficient mitigation controls, can distort the shape of the value function and sharply increase expected discounted costs. One may also conjecture (see Figure 3(d)) that there exists a critical “threshold level” for cyber risk mitigation control, such that insufficient reactive mitigation control below the “level” can cause tremendous losses. We leave this interesting observation for future research.
•

Overall conclusion: The sensitivity analysis together with the suboptimal control analysis in Example 4.2 highlight an “asymmetry” between the two types of control strategies. Proactive risk management provides consistent and broad benefits, and can partially substitute for the absence of mitigation controls. By contrast, reactive mitigation is valuable only when the system is heavily compromised with a high ratio of cyber-infected nodes, and cannot substitute for missing proactive risk management control. In practice, this implies that effective cyber risk control strategies require front-loaded investment in proactive defense, with reactive mitigation serving as a complementary safeguard against severe system breakdown rather than a stand-alone strategy.

Remark 4.1

One may notice that each “value function” obtained (by solving the corresponding ODEs under perturbed control) in the above sensitivity analysis (Figure 3(a)–3(d)) are not necessarily the objective function $J$ given in (2.5) under perturbed control.

In this remark, we assume the solution to the Bellman ODE is twice continuously differentiable. While the numerical solution is not necessarily twice continuously differentiable, it approximates a $C^{2}((0,1))$ solution under standard regularity assumptions and sufficient discretization accuracy. If we fixed the control by each perturbation above, such as $\eta\pm\Delta\eta$ and $\rho\pm\Delta\rho$ , then the control space $U$ is reduced to a singleton. As the consequence, one could apply the Itô’s formula for the discounted process $e^{-\delta t}u(X_{t}^{x})$ between 0 and ${T\wedge\tau_{n}}$ with a sequence of stopping times $\tau_{n}:=\inf\{t\geq 0:\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}D_{x}u(X_{t}^{x})\sigma(X_{t}^{x})\operatorname*{\mathrm{d\!}}W_{t}\geq n\}$ ,

\begin{split}&\mathbb{E}\left[e^{-\delta{T\wedge\tau_{n}}}u\left(X_{T\wedge\tau_{n}}^{x}\right)\right]\\ &=u(x)+\mathbb{E}\left[\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}[b(X_{t}^{x})D_{x}u(X_{t}^{x})+\frac{1}{2}\sigma^{2}(X_{t}^{x})D_{xx}u(X_{t}^{x})-\delta u(X_{t}^{x})]\operatorname*{\mathrm{d\!}}t\right]\\ &\hskip 18.49988pt+\mathbb{E}\left[\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}D_{x}u(X_{t}^{x})\sigma(X_{t}^{x})\operatorname*{\mathrm{d\!}}W_{t}\right]\\ &=u(x)+\mathbb{E}\left[\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}[b(X_{t}^{x})D_{x}u(X_{t}^{x})+\frac{1}{2}\sigma^{2}(X_{t}^{x})D_{xx}u(X_{t}^{x})-\delta u(X_{t}^{x})]\operatorname*{\mathrm{d\!}}t\right],\end{split}

where denote $X^{x}_{t}:=X^{0,x,\eta,\rho}_{t}$ , $b(X^{x}_{t}):=b^{\eta,\rho}(X^{x}_{t})$ and note the stochastic integral is a local martingale. Sending $n$ to infinity, then

\mathbb{E}\left[e^{-\delta{T}}u\left(X_{T}^{x}\right)\right]=u(x)+\mathbb{E}\left[\int_{0}^{T}e^{-\delta t}[b(X_{t}^{x})D_{x}u(X_{t}^{x})+\frac{1}{2}\sigma^{2}(X_{t}^{x})D_{xx}u(X_{t}^{x})-\delta u(X_{t}^{x})]\operatorname*{\mathrm{d\!}}t\right]

holds by dominated convergence theorem. By using the fact that $u$ is a solution to the ODE stated above and then sending $T$ to infinity, we observe that

u(x)=\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}f^{\eta,\rho}(x)\operatorname*{\mathrm{d\!}}t\right]

coincides with the definition of objective function $J(x,\eta,\rho)$ .

4.5 Comparative statics

In this final subsection, we perform a comparative statics analysis across all model parameters, including $\alpha$ , $\beta$ , $\sigma$ , and the marginal cost parameters ( $a_{I},a_{m}^{I},a_{m}^{S},a_{r}$ ) in the cost function given in (2.8). Note that it is not necessary to investigate the parameter $\gamma$ , which contributes additively to the mitigation control $\rho$ in numerical results.

Comparative statics for $\alpha$ and $\beta$ . We first compare different values of the external cyber attack rate $\alpha$ and the internal contagious rate $\beta$ , keeping other parameters unchanged in Example 4.1. The resulting optimal strategies are given in Figures 4(a), 4(c) (for $\alpha$ ) and 4(b), 4(d) (for $\beta$ ), respectively.

Figures 4.4 show that an increase in both $\alpha$ (the rate of external cyber attacks) and $\beta$ (the rate of internal cyber risk propagation) requires stronger risk management and mitigation controls, with the optimal proactive management $\eta^{*}$ moving closer to $0$ and the optimal reactive mitigation $\rho^{*}$ rising. This reflects that it is optimal to simultaneously reinforce both preventive measures and reactive responses when cyber risks escalate. The impact of $\alpha$ is more pronounced than that of $\beta$ , leading to sharper adjustments in both controls. From a cyber risk perspective, this distinction is natural: a surge in external attacks compels the decision maker to rapidly escalate defensive risk management measures and amplify mitigation controls. In contrast, internal risk propagation dynamics among susceptible and infected nodes induce a more gradual–though less pronounced–reinforcement of the two controls. Such a result thus demonstrates the necessity of preferential allocation of resources to the management of external cyber attacks.

Comparative statics for σ. We compare different values of the volatility parameter $\sigma$ in the stochastic system, assuming other parameters remain the same as in Example 4.1. Figure 4.5 illustrates the impact of increasing the volatility parameter $\sigma$ on the optimal control $(\eta^{*},\rho^{*})$ . As $\sigma$ rises from $0.1$ to $0.5$ , $1$ , and $2$ , both the proactive management $\eta$ and the reactive mitigation $\rho$ weaken slightly. Intuitively, higher volatility increases uncertainty in the evolution of the cyber-infected ratio in the system, making aggressive interventions less effective. The small magnitude (compared with the cases when changing $\alpha$ and $\beta$ ) of the change indicates that the control policy is primarily driven by the system’s drift dynamics, with stochastic fluctuations playing a secondary role.

Comparative statics for $a_{m}^{I}$ and $a_{m}^{S}$ . We further compare different values of the marginal costs associated with cyber risk management for cyber-infected nodes and susceptible nodes, respectively. We plot the resulting optimal strategies in Figure 4.6.

It is reasonable to observe that when the marginal costs of proactive management control associated with either cyber-infected nodes $a_{m}^{I}$ or susceptible nodes $a_{m}^{S}$ increase, the optimal proactive management control $\eta^{*}$ weakens. On the other hand, the optimal reactive mitigation control $\rho^{*}$ becomes stronger when the marginal costs of management control incurred by cyber-infected nodes ( $a_{m}^{I}$ ) increase, since the costs associated with mitigation control become relatively cheaper. However, when the marginal costs ( $a_{m}^{S}$ ) increase, we observe a decreasing trend (rather uniform for all levels of infection ratios) in the optimal mitigation control $\rho^{*}$ ; in fact, such a counter-intuitive result is not unreasonable, since with a higher value of $a_{m}^{S}$ , it becomes more expensive for risk management control when the system has a large amount of susceptible nodes; then, the decision maker may choose to reduce (moderately) the mitigation control, which essentially reduce the transition intensity from cyber-infected state to susceptible state. In addition, the optimal reactive mitigation control is more sensitive to the change of $a_{m}^{I}$ when the cyber-infected ratio in the system is low, but it is more sensitive to the change of $a_{m}^{S}$ when the system is heavily compromised. Furthermore, one can observe that the optimal risk management control $\eta^{*}(x)$ , as a function of the cyber-infected ratio $x$ , exhibits a concave shape (see e.g., Figure 6(a)), and the concavity diminishes when $a_{m}^{I}$ decreases. Such a phenomenon may be rooted in the interaction (or trade-off) between the marginal costs $a_{I}$ and $a_{m}^{I}$ . When $a_{I}$ is (sufficiently) larger than $a_{m}^{I}$ , that is, the costs associated with cyber-infected nodes under management control are negligible compared to the baseline management costs of all cyber-infected nodes, then the optimal management control strategy is “neutral” to the cyber-infected ratio, hence results in a linear form. But, when $a_{m}^{I}$ is sufficiently larger than $a_{I}$ , the optimal management control, as a function of cyber-infected ratio, becomes a concave function. It means that, to minimize the total expected discounted costs, the decision maker might want to reduce the strength of proactive management control aggressively when the cyber-infected ratio is at a moderate level, and the tendency declines when the cyber-infected ratio approaches one (hence, a concave form). One can observe a similar result when changing the value of $a_{I}$ and keeping $a_{m}^{I}$ fixed, see Figures 7(a) and 7(c).

Comparative statics for $a_{I}$ and $a_{r}$ . We change the value of marginal costs ( $a_{I}$ ) incurred by cyber-infected nodes in the system, or the marginal costs ( $a_{r}$ ) associated with reactive risk mitigation control, keeping other parameters unchanged in Example 4.1.

When the marginal costs $a_{I}$ increase, Figures 7(a) and 7(c) show a similar (but in the opposite way) result of the optimal management control $\eta^{*}$ as we observed in Figures 6(a) and 6(c), where a stronger prevention measure is achieved by expanding the zero-valued plateau (representing maximal prevention) for a larger range of cyber-infected ratios. This expansion occurs in an almost uniform, additive manner, where as $a_{I}$ rises, the switching threshold at which $\eta^{*}$ departs from zero shifts rightward by roughly the same increment. In addition, with a small value in $a_{I}$ (e.g., $a_{I}=1$ ), the optimal proactive management control $\eta^{*}(x)$ , as a function of cyber-infected ratio $x$ , exhibits a concave property; and the concavity diminishes when $a_{I}$ increases and eventually exceeds the value of $a_{m}^{I}$ . However, the optimal reactive mitigation $\rho^{*}$ adjusts in a significantly different way compared with what we obtained in Figure 6(c), especially when the cyber-infected ratio ( $x$ ) is close to one. To be specific, when the system is heavily compromised, for a large value of $a_{I}$ (compared to $a_{m}^{I}$ ), it is optimal to increase the reactive mitigation control to reduce the number of infected nodes so that the expected discounted costs can be reduced significantly.

On the other hand, Figures 7(b) and 7(d) show that increasing the mitigation cost $a_{r}$ from $1$ to $2.5$ , $5$ , and $7.5$ leads to a systematic weakening of the mitigation strategy $\rho^{*}$ , while the management strategy $\eta^{*}$ is strengthened by expanding its zero-valued plateau (maximal prevention) so that strong prevention is applied earlier and more widely. The movement pattern of $\rho^{*}$ is similar to that observed in Figures 7(a) and 7(c), in the sense that the adjustment is roughly uniform across the state space, but the direction is opposite: a higher value of $a_{I}$ lifts $\rho$ proportionally, whereas a higher mitigation cost $a_{r}$ pushes $\rho^{*}$ downward. This observation highlights that when the operating costs associated with cyber-infected nodes become more costly, it is optimal to reinforce both risk management and mitigation strategies. However, when mitigation becomes expensive, the optimal strategy reallocates effort towards risk management control with a reduced reliance on expensive reactive mitigation controls.

5 Conclusion and future outlook

In this paper, we model cyber risk management and mitigation as a stochastic optimal control problem within a stochastic Susceptible-Infected-Susceptible (SIS) epidemic framework. We introduce two dynamic controls to capture real-time risk management and mitigation strategies: 1) a proactive control ( $\eta$ ) that reduces external cyber attacks and internal contagion effects; 2) a reactive control ( $\rho$ ) that speeds up the recovery of infected nodes. We formulate this as a dual stochastic control problem governed by a general diffusion process. Theoretically, we establish the well-posedness of the controlled SIS model under these dual controls and prove that the associated value function is the unique increasing and Lipschitz-continuous viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation derived from the control problem.

For numerical implementation, we propose a Policy Improvement Algorithm (PIA) and demonstrate its convergence using Backward Stochastic Differential Equations (BSDEs). Our convergence result extends existing finite-horizon analyses to the infinite-horizon case. Then, we present a benchmark example that illustrates the optimal risk management and mitigation strategy, along with the corresponding value function, for a given model parameter set. We further examine suboptimal performance and sensitivity by: 1) removing one control entirely in the benchmark scenario; 2) introducing small perturbations to each optimal control; 3) conducting a comprehensive comparative statics analysis across all model parameters. The sensitivity and suboptimal control analyses reveal a fundamental asymmetry between the two control strategies, where proactive risk management control demonstrates consistent system-wide benefits and exhibits partial substitutability for reactive mitigation when absent; however, reactive risk mitigation control shows value only during high-infection scenarios and cannot compensate for missing proactive measures. Furthermore, some interesting observations are drawn from the comparative statics, including: the asymmetric impact of external attack frequency versus internal contagion rates on optimal control strategies underscores a possible critical policy implication; effective cyber defense requires prioritizing resource allocation toward external threat management; the optimal control strategy, particularly proactive risk management, exhibits significantly different behavioral patterns depending on the current infection ratio. This variation stems from the interaction between the operational costs of maintaining all infected nodes in the system and the marginal costs of implementing risk management controls on these compromised nodes.

Finally, we remark that our work lays a foundation for several natural extensions in the field. One direction is incorporating jump processes to model sudden, large-scale cyber attacks or system failures, which could better capture extreme events beyond the diffusion approximation. Another extension involves regime-switching dynamics, where the network environment or external threat landscape changes over time, influencing both infection propagation and optimal control strategies. Further research may also explore multi-layered or networked SIS models with heterogeneous nodes, time delays, and partial observation, enabling more realistic and granular cyber risk management strategies. These extensions could provide a richer theoretical framework and more practical insights for robust cyber defense policies under uncertainty and complex operational conditions. We left them for future research.

Statements and Declarations

No competing interests.

Acknowledgments

Zhuo Jin and Hailiang Yang were supported by the National Natural Science Foundation of China Grant [Grant 12471452]. Ran Xu was supported by the National Natural Science Foundation of China [Grants 12201506 and 12371468].

Appendix A Proof of Proposition 2.1

(i) To prove the assertion, we first show that the cost functional $J$ defined in (2.8) is non-decreasing in $x$ if $f(x,\cdot,\cdot)$ is non-decreasing in $x$ for each pair of admissible controls. This can be proved using the density argument and Itô’s formula. To be specific, let’s firstly argue the Yamada & Watanabe’s comparison principle of Itô’s diffusion (see, e.g., Karatzas and Shreve, (1991)). The diffusion term $\sigma(x)$ holds the locally Lipschitz property, and further observe that $|\sigma(x)-\sigma(y)|\leq h(|x-y|)$ by simply taking $h(x):=3\sigma x:=ax$ . There exists a strictly decreasing sequence $\{a_{n}\}_{n\in\mathbb{N}}\downarrow 0$ with $a_{0}:=1$ and $\int_{[a_{n-1},a_{n}]}h^{-2}(u)du=n$ for every $n\in\mathbb{N}$ . To see this, one explicitly has $\int_{[a_{n-1},a_{n}]}h^{-2}(u)du=\frac{1}{a^{2}}\cdot\left(\frac{1}{a_{n}}-\frac{1}{a_{n+1}}\right)=n$ , which gives $a_{n}=\frac{2}{2+a^{2}n(n+1)}$ satisfying such properties as required for each $n\in\mathbb{N}$ . Moreover, we would like to take a nonnegative continuous function $p(x)$ dominated by $2/(nh^{2}(x)0$ , such that $1\leq\int_{[a_{n-1},a_{n}]}p(u)\operatorname*{\mathrm{d\!}}u\leq\int_{[a_{n-1},a_{n}]}2/(nh^{2}(u))\operatorname*{\mathrm{d\!}}u=2$ , so that we get a normalized function $\rho_{n}(x):(a_{n-1},a_{n})\to\mathbb{R}$ continuous in $x$ taking the form of

0\leq\rho_{n}(x):=\frac{p(x)}{\int_{(a_{n-1},a_{n})}p(u)\operatorname*{\mathrm{d\!}}u}\cdot I_{(a_{n-1},a_{n})}(x)\leq p(x)\leq\frac{2}{nh^{2}(x)}.

For example, one can take $p(x)=1/(nh^{2}(x))$ . Notice also that a property

\int_{0}^{|x|}\rho_{n}(x)\operatorname*{\mathrm{d\!}}x\leq\int_{0}^{\infty}\frac{p(x)}{\int_{(a_{n-1},a_{n})}p(u)\operatorname*{\mathrm{d\!}}u}\cdot I_{(a_{n-1},a_{n})}(x)\operatorname*{\mathrm{d\!}}x=\int_{(a_{n-1},a_{n})}\frac{p(x)}{\int_{(a_{n-1},a_{n})}p(u)du}\operatorname*{\mathrm{d\!}}x=1

holds for some real number $x$ . Next, assign the function

\psi_{n}(x):=\int_{0}^{|x|}\int_{0}^{y}\rho_{n}(u)\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y,\;\text{ on }\;\mathbb{R}

which is even and twice continuously differentiable:

|D_{x}\psi_{n}(x)|=\left|\frac{d}{dx}\left(\int_{0}^{|x|}\int_{0}^{y}\rho_{n}(u)dudy\right)\right|=\int_{0}^{|x|}\rho_{n}(u)du\leq 1

by fundamental theorem of calculus, and

\begin{split}\lim_{n\nearrow\infty}\psi_{n}(x)&=\int_{0}^{|x|}\lim_{n\nearrow\infty}\int_{[0,y]}\rho_{n}(u)\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y\\ &=\int_{0}^{|x|}\lim_{n\nearrow\infty}\int_{(a_{n-1},a_{n})\cap(0,y)}\frac{p(u)}{\int_{(a_{n-1},a_{n})}p(v)dv}\cdot\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y\\ &=\int_{0}^{|x|}\lim_{n\nearrow\infty}\int_{(a_{n-1},a_{n})}\frac{p(u)}{\int_{(a_{n-1},a_{n})}p(v)\operatorname*{\mathrm{d\!}}v}\cdot\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y\\ &=\int_{0}^{|x|}1\cdot\operatorname*{\mathrm{d\!}}y=|x|,\end{split}

where observe that $\{\int_{0}^{y}\rho_{n}(u)\operatorname*{\mathrm{d\!}}u\}_{n\in\mathbb{N}}\uparrow 1$ is at least non-decreasing sequence for each $y\geq 0$ , hence allowing us to apply the monotone convergence theorem. Consider two random processes (namely the solution to the corresponding controlled SDE (2.3)) $X^{x,\eta,\rho}_{t}$ and $X^{y,\eta,\rho}_{t}$ for different initial data $x\leq y$ , each of that has continuous trajectory for every individual $\omega\in\Omega$ . Now, apply the Itô’s formula for the random process $\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s}):=\psi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\mathbf{1}_{(0,\infty)}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})$ :

\begin{split}\mathbb{E}\Big[\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big]\leq&\mathbb{E}\Big[\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big]-(x-y)\\ =&\mathbb{E}\Bigg[\int_{0}^{t}D_{x}\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\left[b(X^{x,\eta,\rho}_{s})-b(X^{y,\eta,\rho}_{s})\right]\operatorname*{\mathrm{d\!}}s\Bigg]\\ +&\frac{1}{2}\mathbb{E}\Bigg[\int_{0}^{t}D_{xx}\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\left[\sigma(X^{x,\eta,\rho}_{s})-\sigma(X^{y,\eta,\rho}_{s})\right]^{2}\operatorname*{\mathrm{d\!}}s\Bigg]\\ \leq&\mathbb{E}\Bigg[\int_{0}^{t}D_{x}\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\left[b(X^{x,\eta,\rho}_{s})-b(X^{y,\eta,\rho}_{s})\right]\operatorname*{\mathrm{d\!}}s\Bigg]+\frac{t}{n}\\ \leq&K\cdot\mathbb{E}\Bigg[\int_{0}^{t}[X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s}]^{+}\operatorname*{\mathrm{d\!}}s\Bigg]+\frac{t}{n},\end{split}

(A.1)

where the stochastic integral vanished due to the fact $\mathbb{E}\Big(\int_{0}^{t}|\sigma(X_{s})|^{2}\operatorname*{\mathrm{d\!}}s\Big)<\infty$ for each $t\geq 0$ ; the second inequality holds by the established property of function $h$ and $\psi_{n}$ ; and further by the Lipschitz continuity of $b(\cdot)$ to reach the last line in (A.1). By sending $n\nearrow\infty$ on both sides of (A.1) with Lebesgue dominated convergence theorem yields

	$\displaystyle\lim_{n\nearrow\infty}\mathbb{E}\Big[\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big]$	$\displaystyle=\mathbb{E}\Big[\lim_{n\nearrow\infty}\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big]$
		$\displaystyle=\mathbb{E}\left[X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t}\right]^{+}\leq K\cdot\mathbb{E}\Bigg[\int_{0}^{t}[X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s}]^{+}\operatorname*{\mathrm{d\!}}s\Bigg]$

As a consequence, the desired comparison principle follows by Grönwall’s inequality, i.e.

\mathbb{E}[X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t}]^{+}=0,\;\forall t\geq 0.

Secondly, let’s assume that $\mathbb{E}\left[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})\right]^{+}>0$ . However, this immediately turns out that there exists some positive measure set $\Omega_{+}\subset\Omega$ such that

\int_{\Omega_{+}}\left[X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t}\right]^{+}\mathbb{P}(\operatorname*{\mathrm{d\!}}\omega)>0

by the non-decreasing property of cost functional. This apparently contradicts the result we just obtained.

Thirdly, we can thus claim that the comparison principle of two objective functions: $J(x,\eta,\rho)\leq J(y,\eta,\rho)$ if $x\leq y$ under each control $(\eta,\rho)\in\mathcal{U}_{0}$ . To show this, observe that

	$\displaystyle\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})]^{+}\operatorname*{\mathrm{d\!}}t\right]$
	$\displaystyle\quad\leq 2\sup_{z\in\{x,y\}}\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}\|f(X^{z,\eta,\rho}_{t},\eta_{t},\rho_{t})\|\operatorname*{\mathrm{d\!}}t\right]<\infty$

by Assumption 2.2. That, together with the joint measurability of process $f(X^{z,\eta,\rho}_{t},\eta_{t},\rho_{t})$ on product $\sigma$ -field $\mathcal{B}([0,t])\otimes\mathcal{F}_{t}$ for every $t\geq 0$ and $z\in\{x,y\}$ , ensures the applicability of Fubini’s theorem. Therefore, interchange the order of the two integrals

\begin{split}&\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})]^{+}\operatorname*{\mathrm{d\!}}t\right]\\ &\quad=\int_{0}^{\infty}e^{-\delta t}\mathbb{E}_{x}[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})]^{+}\operatorname*{\mathrm{d\!}}t=0.\end{split}

Hence, the non-decreasing property of $V$ is a straightforward argument. To be specific, we consider $0<x\leq y<1$ , and for any $\varepsilon>0$ , let $(\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\in\mathcal{U}_{0}$ be an $\varepsilon$ -optimal control for $X_{0}=y$ such that $J(y;\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\leq V(y)+\varepsilon$ . Then, since $J(x;\cdot,\cdot)$ is non-decreasing in $x$ for any given admissible control, we have

V(x)\leq J(x;\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\leq J(y;\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\leq V(y)+\varepsilon,

and by sending $\varepsilon\to 0$ , we complete the proof.

(ii) We first show a similar result as stated in (2.9) holds for the cost functional $J$ for any given admissible control. For fixed $(\eta,\rho)\in\mathcal{U}_{0}$ , and take any $x,y\in(0,1)$ , and a time $0<T<\infty$ , we have

		$\displaystyle\|J(x;\eta,\rho)-J(y;\eta,\rho)\|$
		$\displaystyle\leq\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}\left\|f(X_{t}^{x,\eta,\rho},\eta_{t},\rho_{t})-f(X_{t}^{y,\eta,\rho},\eta_{t},\rho_{t})\right\|\operatorname*{\mathrm{d\!}}t\right]$
		$\displaystyle\leq K\mathbb{E}\left[\int_{0}^{T}e^{-\delta t}(1+\|X_{t}^{x,\eta,\rho}\|^{m}+\|X_{t}^{y,\eta,\rho}\|^{m})\|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}\|\operatorname{\mathrm{d\!}}t\right]+2C_{f}\int_{T}^{\infty}e^{-\delta t}\operatorname{\mathrm{d\!}}t$
		$\displaystyle\leq C_{K,m}\left(\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2m}}(1+\|X_{t}^{x,\eta,\rho}\|^{m+1}+\|X_{t}^{y,\eta,\rho}\|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]\right)^{\frac{m}{m+1}}$
		$\displaystyle\qquad\times\left(\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2}}(\|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}\|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]\right)^{\frac{1}{m+1}}+2C_{f}e^{-\delta T}.$		(A.2)

where $X^{x,\eta,\rho}_{\cdot}$ denotes the controlled process under fixed control $(\eta,\rho)\in\mathcal{U}_{0}$ with initial value $x$ . Next, we estimate the following two terms

I_{1}:=\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2m}}(1+|X_{t}^{x,\eta,\rho}|^{m+1}+|X_{t}^{y,\eta,\rho}|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]

and

I_{2}:=\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2}}(|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg].

Therefore, we can apply boundedness property of the moment of the controlled SDE within $[0,T]$ for any $T<\infty$ , there exists a constant $N=N(m,K)$ such that

	$\displaystyle I_{1}$	$\displaystyle\leq\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2m}}(1+\sup_{0\leq t\leq T}\|X_{t}^{x,\eta,\rho}\|^{m+1}+\sup_{0\leq t\leq T}\|X_{t}^{y,\eta,\rho}\|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]$
		$\displaystyle\leq\left(1+Ne^{NT}(1+\|x\|^{m+1})+Ne^{NT}(1+\|y\|^{m+1})\right)$

and

\displaystyle I_{2}\leq Ne^{NT}|x-y|^{m+1},

which can be seen, for example, in Corollary 2.5.12 in Krylov, (1980). Thus, by combining $I_{1},I_{2}$ and equation (A) and applying Minkowski’s inequality, we attain that

|J(x;\eta,\rho)-J(y;\eta,\rho)|\leq C(T,m,\delta)(1+|x|^{m}+|y|^{m})|x-y|.

Then, without loss of generality, we consider $x\geq y$ such that $V(x)\geq V(y)$ . Take an $\varepsilon$ -optimal control $(\eta^{\varepsilon},\rho^{\varepsilon})\in\mathcal{U}_{0}$ such that $V(y)\geq J(y;\eta^{\epsilon},\rho^{\epsilon})-\varepsilon$ . Thus, we obtain the following inequality

|V(x)-V(y)|=V(x)-V(y)\leq J(x;\eta^{\epsilon},\rho^{\epsilon})-J(y;\eta^{\epsilon},\rho^{\epsilon})+\varepsilon\leq C(1+|x|^{m}+|y|^{m})|x-y|+\varepsilon.

The desired result will be achieved by sending $\varepsilon$ to zero. Furthermore, $V(x)$ is also continuous uniformly in $x\in(0,1)$ . The Lipschitz and uniform continuity will follow again due to $|x|^{m},|y|^{m}\in(0,1)$ .

Appendix B Proof of Proposition 3.1

(i) Viscosity subsolution. We consider any test function $\phi\in C^{2}((0,1))$ with $\phi\geq V$ such that $\phi(x)=V(x)$ for any given $x\in(0,1)$ , we just need to show that

\inf_{(\rho,\eta)\in U}\Big\{(\mathcal{L}^{\eta,\rho}-\delta)\phi(x)+f(x,\eta,\rho)\Big\}\geq 0,

where

\mathcal{L}^{\eta,\rho}\phi(x)=b(x,\eta,\rho)\phi_{x}(x)+\frac{1}{2}\sigma^{2}(x)\phi_{xx}(x)

is the infinitesimal generator of the controlled diffusion given by (2.3) under a pair of fixed control $(\eta,\rho)\in U$ . Fix any $(\eta,\rho)\in U$ , consider the control $\eta_{t}\equiv\eta,\rho_{t}\equiv\rho$ and the corresponding controlled diffusion $X$ . For a sufficiently small $r>0$ such that $(x-r,x+r)\subset(0,1)$ , consider the stopping time $\tau:=\inf\{s>0:X_{s}\notin(x-r,x+r)\}$ . Then, by applying Itô’s formula and the dynamic programming principle, we have

$\displaystyle 0$	$\displaystyle\leq\mathbb{E}_{x}\left[e^{-\delta\tau}V(X_{\tau})+\int_{0}^{\tau}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t-V(x)\right]$
	$\displaystyle\leq\mathbb{E}_{x}\left[e^{-\delta\tau}\phi(X_{\tau})+\int_{0}^{\tau}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t-\phi(x)\right]$
	$\displaystyle=\mathbb{E}_{x}\left[\int_{0}^{\tau}e^{-\delta t}(\mathcal{L}^{\eta,\rho}-\delta)\phi(X_{t})\operatorname{\mathrm{d\!}}t+\int_{0}^{\tau}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname{\mathrm{d\!}}t\right].$	(B.1)

Now, we assume that $L(x):=(\mathcal{L}^{\eta,\rho}-\delta)\phi(x)+f(x,\eta,\rho)\leq-\epsilon<0$ for any given $(\eta,\rho)\in U$ . By the continuity of the function $L$ , there exists a $h>0$ such that $L(y)<-\epsilon/2$ for all $y\in(x-h,x+h)$ . Then, let $\tilde{\tau}:=\inf\{s>0:X_{s}\notin(x-h,x+h)\}$ , we have

\mathbb{E}_{x}\Big[\int_{0}^{\tilde{\tau}}e^{-\delta t}L(X_{t})\operatorname*{\mathrm{d\!}}t\Big]<-\mathbb{E}_{x}\Big[\int_{0}^{\tilde{\tau}}e^{-\delta t}\frac{\epsilon}{2}\operatorname*{\mathrm{d\!}}t\Big]=-\mathbb{E}_{x}\Big[\frac{\epsilon(1-e^{-\delta\tilde{\tau}})}{2\delta}\Big]<0,

which is a contradiction to (B) if we choose $r=h$ above. Hence, by the arbitrariness of the control, we must have

\inf_{(\rho,\eta)\in U}\Big\{(\mathcal{L}^{\eta,\rho}-\delta)\phi(x)+f(x,\eta,\rho)\Big\}\geq 0.

(ii) Viscosity supersolution. For any $x\in(0,1)$ , let $\varphi\in C^{2}((0,1))$ be any test function such that $\varphi-V$ attains a maximum value of zero at $x$ , we show that

\inf_{(\rho,\eta)\in U}\left\{(\mathcal{L}^{\eta,\rho}-\delta)\varphi(x)+f(x,\eta,\rho)\right\}\leq 0.

We prove this by contradiction. Assume that $\inf_{(\rho,\eta)\in U}\left\{(\mathcal{L}^{\eta,\rho}-\delta)\varphi(x)+f(x,\eta,\rho)\right\}>0$ , then by the continuity of the function $(\mathcal{L}^{\eta,\rho}-\delta)\varphi(\cdot)$ and $f(\cdot,\eta,\rho)$ uniformly in the control, there exist $\epsilon>0$ and $\varpi>0$ such that

\inf_{(\rho,\eta)\in U}\left\{(\mathcal{L}^{\eta,\rho}-\delta)\varphi(y)+f(y,\eta,\rho)\right\}>\epsilon,\quad\text{for all }y\in(x-\varpi,x+\varpi)\subset(0,1).

Then, for any fixed control $(\eta_{t},\rho_{t})_{t\geq 0}\equiv(\eta,\rho)\in U$ , let $X_{t}$ be the controlled process with $X_{0}=x$ . We define the first exit time of the interval $(x-\varpi,x+\varpi)$ as $\tau^{\varpi}:=\inf\{t>0:X_{t}\notin(x-\varpi,x+\varpi)\}$ . Then, by applying Itô’s formula, we have

	$\displaystyle\varphi(x)$	$\displaystyle=\mathbb{E}_{x}\Bigg[e^{-\delta(t\wedge\tau^{\varpi})}\varphi(X_{t\wedge\tau^{\varpi}})-\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}(\mathcal{L}^{\eta,\rho}-\delta)\varphi(X_{s})\operatorname*{\mathrm{d\!}}s\Bigg]$
		$\displaystyle\leq\mathbb{E}_{x}\Bigg[e^{-\delta(t\wedge\tau^{\varpi})}V(X_{t\wedge\tau^{\varpi}})+\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}s\Bigg]$
		$\displaystyle\quad-\mathbb{E}_{x}\Bigg[\Big(\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}(\mathcal{L}^{\eta,\rho}-\delta)\varphi(X_{s})\operatorname{\mathrm{d\!}}s+\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}f(X_{t},\eta_{t},\rho_{t})\operatorname{\mathrm{d\!}}s\Big)\Bigg]$
		$\displaystyle\leq\mathbb{E}_{x}\Bigg[e^{-\delta(t\wedge\tau^{\varpi})}V(X_{t\wedge\tau^{\varpi}})+\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}s\Bigg]-\epsilon\mathbb{E}_{x}\Bigg[\frac{1-e^{-t\wedge\tau^{\varpi}}}{\delta}\Bigg].$

Then, by the arbitrariness of control $(\eta,\rho)\in U$ , the dynamic programming principle and the fact that $\tau^{\varpi}>0$ for all $\omega\in\Omega$ , we obtain that $\varphi(x)<V(x)$ , which is a contradiction since $\varphi(x)=V(x)$ .

Appendix C Proof of Proposition 3.4

We prove the comparison principle by the usual arguments of contradiction (see e.g. Touzi, (2012), Albrecher et al., (2022)). Assume that

0<M:=\sup_{x\in\mathcal{O}}\{v(x)-u(x)\},

and $x^{*}:=\operatorname*{arg\,max}_{x\in\mathcal{O}}\{v(x)-u(x)\}$ . Since both $u$ and $v$ are Lipschitz continuous on $\mathcal{O}\subset(0,1)$ , there exists a constant $K>0$ such that

|u(x)-u(y)|\leq K|x-y|,\quad|v(x)-v(y)|\leq K|x-y|,\quad\text{for }x,y\in\mathcal{O}.

To proceed, let us consider the following set

\mathcal{S}:=\Big\{(x,y)\in\mathcal{O}\times\mathcal{O}:x\leq y\Big\}.

For all $\alpha>0$ , define two auxiliary functions,

	$\displaystyle\Psi_{\alpha}(x,y)$	$\displaystyle:=\frac{\alpha}{2}(x-y)^{2}+\frac{2K}{\alpha^{2}(y-x)+\alpha},$
	$\displaystyle H_{\alpha}(x,y)$	$\displaystyle:=v(x)-u(y)-\Psi_{\alpha}(x,y).$

Let $M_{\alpha}:=\max_{(x,y)\in\mathcal{\mathcal{S}}}H_{\alpha}(x,y)$ , and $(x_{\alpha},y_{\alpha}):=\operatorname*{arg\,max}_{(x,y)\in\mathcal{S}}H_{\alpha}(x,y)$ . Then, we directly have $M_{\alpha}\geq H_{\alpha}(x^{*},x^{*})=M-\frac{2K}{\alpha}$ , hence

\liminf_{\alpha\to\infty}M_{\alpha}\geq M>0.

Note that, by using the increasing and Lipschitz continuous property of $u$ and $v$ and the fact that $u\geq v$ on $\partial\mathcal{O}$ , we can show that there exists a sufficiently large $\overline{\alpha}$ such that for all $\alpha\geq\overline{\alpha}$ , $(x_{\alpha},y_{\alpha})$ is an interior point of $\mathcal{O}$ (see, for example, (Wang et al., , 2024, Appendix B)).

Then by the following inequality,

H_{\alpha}(x_{\alpha},x_{\alpha})+H_{\alpha}(y_{\alpha},y_{\alpha})\leq 2H_{\alpha}(x_{\alpha},y_{\alpha}),

we arrive at

\alpha|x_{\alpha}-y_{\alpha}|^{2}\leq|u(x_{\alpha})-u(y_{\alpha})|+|v(x_{\alpha})-v(y_{\alpha})|+4K(y_{\alpha}-x_{\alpha})\leq 6K|x_{\alpha}-y_{\alpha}|.

Therefore, by considering a sequence $\alpha_{n}\to\infty$ as $n\to\infty$ such that $(x_{\alpha_{n}},y_{\alpha_{n}})\to(\hat{x},\hat{y})\in\mathcal{O}$ , we have

|x_{\alpha_{n}}-y_{\alpha_{n}}|\leq\frac{6K}{\alpha_{n}}

which yields $\hat{x}=\hat{y}$ .

Then, we construct two twice continuously differentiable functions

	$\displaystyle\varphi(x)$	$\displaystyle=\Psi_{\alpha}(x,y_{\alpha})-\Psi_{\alpha}(x_{\alpha},y_{\alpha})+v(x_{\alpha}),$
	$\displaystyle\psi(y)$	$\displaystyle=\Psi_{\alpha}(x_{\alpha},y_{\alpha})-\Psi_{\alpha}(x_{\alpha},y)+u(y_{\alpha}),$

which are essentially test functions for the subsolution and supersolution of (3.2) at the points $x_{\alpha}$ and $y_{\alpha}$ , respectively. To simplify our proof here, we first assume that $v(x)$ and $u(y)$ are twice continuously differentiable at $x_{\alpha}$ and $y_{\alpha}$ , respectively; one can resort to a more general theorem to get a similar result when $u,v$ are not twice continuously differentiable at the point (see e.g. Crandall et al., (1992)). Since $H_{\alpha}$ reaches a local maximum at $(x_{\alpha},y_{\alpha})$ which is an interior point in $\mathcal{O}$ , hence we have

\frac{\partial}{\partial x}H_{\alpha}(x_{\alpha},y_{\alpha})=\frac{\partial}{\partial y}H_{\alpha}(x_{\alpha},y_{\alpha})=0.

Therefore, we arrive at

v_{x}(x_{\alpha})=\frac{\partial}{\partial x}\Psi_{\alpha}(x_{\alpha},y_{\alpha})=-\frac{\partial}{\partial y}\Psi_{\alpha}(x_{\alpha},y_{\alpha})=u_{y}(y_{\alpha}).

In addition, the Hessian matrix is negative semi-definite,

\begin{pmatrix}\frac{\partial^{2}}{\partial x^{2}}H_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial x\partial y}H_{\alpha}(x_{\alpha},y_{\alpha})\\ \frac{\partial^{2}}{\partial y\partial x}H_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial y^{2}}H_{\alpha}(x_{\alpha},y_{\alpha})\end{pmatrix}\leq 0

(C.1)

Let $A=v_{xx}(x_{\alpha}),B=u_{yy}(y_{\alpha})$ , and

M=\begin{pmatrix}\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial x\partial y}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\\ \frac{\partial^{2}}{\partial y\partial x}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial y^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha}),\end{pmatrix}

we can rewrite (C.1) as

\begin{pmatrix}A-\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&-\frac{\partial^{2}}{\partial x\partial y}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\\ -\frac{\partial^{2}}{\partial y\partial x}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&-B-\frac{\partial^{2}}{\partial y^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\end{pmatrix}=\begin{pmatrix}A&0\\ 0&-B\end{pmatrix}-M\leq 0.

Then, according to (Crandall et al., , 1992, Theorem 3.2), for any $\varepsilon>0$ , there exists $A_{\varepsilon},B_{\varepsilon}\in\mathbb{R}$ such that,

\begin{pmatrix}A_{\varepsilon}&0\\ 0&-B_{\varepsilon}\end{pmatrix}\leq M+\varepsilon M^{2},

(C.2)

and $(\varphi_{x}(x_{\alpha}),A_{\varepsilon})\in D^{2,+}_{x_{\alpha}}v(x_{\alpha})$ and $(\psi_{x}(x_{\alpha}),B_{\varepsilon})\in D^{2,-}_{y_{\alpha}}u(y_{\alpha})$ , hence, we have

F(x_{\alpha},v(x_{\alpha}),\varphi_{x}(x_{\alpha}),A_{\varepsilon})\geq 0,\quad\text{ and }\quad F(y_{\alpha},u(x_{\alpha}),\psi_{y}(y_{\alpha}),B_{\varepsilon})\leq 0.

(C.3)

In addition, by noting that $M=\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}$ , we obtain from (C.2) that

	$\displaystyle A_{\varepsilon}-B_{\varepsilon}$	$\displaystyle=\begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix}A_{\varepsilon}&0\\ 0&-B_{\varepsilon}\end{pmatrix}\begin{pmatrix}1\\ 1\end{pmatrix}\leq\begin{pmatrix}1&1\end{pmatrix}(M+\varepsilon M^{2})\begin{pmatrix}1\\ 1\end{pmatrix}$
		$\displaystyle=\begin{pmatrix}1&1\end{pmatrix}\Bigg[\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}+\varepsilon\Big(\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\Big)^{2}\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}\Bigg]\begin{pmatrix}1\\ 1\end{pmatrix}^{2}$
		$\displaystyle=\Big(\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})+2\varepsilon\Big(\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\Big)^{2}\Big)\begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}\begin{pmatrix}1\\ 1\end{pmatrix}=0.$

Therefore, we can derive from (C.3),

$\displaystyle 0$	$\displaystyle\leq F(x_{\alpha},v(x_{\alpha}),\varphi_{x}(x_{\alpha}),A_{\varepsilon})-F(y_{\alpha},u(x_{\alpha}),\psi_{y}(y_{\alpha}),B_{\varepsilon})$
	$\displaystyle\leq\inf_{(\rho,\eta)\in U}\Big\{b(x_{\alpha},\eta,\rho)\varphi_{x}(x_{\alpha})-b(y_{\alpha},\eta,\rho)\psi_{y}(y_{\alpha})+f(x_{\alpha},\eta,\rho)-f(y_{\alpha},\eta,\rho)\Big\}$
	$\displaystyle+\frac{1}{2}\Big(\sigma^{2}(x_{\alpha})A_{\varepsilon}-\sigma^{2}(y_{\alpha})B_{\varepsilon}\Big)-\delta(v(x_{\alpha})-u(y_{\alpha})).$	(C.4)

Hence, by (C), we can obtain

	$\displaystyle 0<$	$\displaystyle M\leq\liminf_{\alpha\to\infty}M_{\alpha}\leq\lim_{n\to\infty}M_{\alpha_{n}}=\lim_{n\to\infty}(v(x_{\alpha_{n}})-u(y_{\alpha_{n}}))$
		$\displaystyle\leq\lim_{n\to\infty}\frac{1}{\delta}\Bigg(\inf_{(\rho,\eta)\in U}\Big\{b(x_{\alpha},\eta,\rho)\varphi_{x}(x_{\alpha})-b(y_{\alpha},\eta,\rho)\psi_{y}(y_{\alpha})+f(x_{\alpha},\eta,\rho)-f(y_{\alpha},\eta,\rho)\Big\}$
		$\displaystyle\qquad\qquad+\frac{1}{2}\Big(\sigma^{2}(x_{\alpha})A_{\varepsilon}-\sigma^{2}(y_{\alpha})B_{\varepsilon}\Big)\Bigg)$
		$\displaystyle=\inf_{(\rho,\eta)\in U}\{b(\hat{x},\eta,\rho)\}\frac{1}{\delta}(\varphi_{x}(\hat{x})-\psi_{y}(\hat{x}))+\frac{\sigma^{2}(\hat{x})}{2\delta}(A_{\varepsilon}-B_{\varepsilon})$
		$\displaystyle\leq\inf_{(\rho,\eta)\in U}\{b(\hat{x},\eta,\rho)\}\frac{1}{\delta}(\varphi_{x}(\hat{x})-\psi_{y}(\hat{x}))=0,$

which is a contradiction, then we complete the proof.

Appendix D Infinite-Horizon BSDE and Convergence of PIA

D.1 Assumptions and notations

Fix a filtered probability space $(\Omega,\mathcal{F}_{t},\mathbb{F}:=(\mathcal{F}_{t})_{0\leq t\leq\infty},\mathbb{P})$ and let $W=(W_{t})_{0\leq t<\infty}$ be a one-dimensional Wiener process on this space. For self-containment, we introduce the following notations:

(i)

We first introduce the notations of some spaces that are involved in the later analysis: For any $\gamma>0$ , let $\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R})$ be the set of all $\mathbb{R}$ -valued $\mathbb{F}$ -adapted process $\phi_{\cdot}$ such that

\mathbb{E}\Bigg[\int_{0}^{\infty}e^{2\gamma t}|\phi_{t}|^{2}\operatorname*{\mathrm{d\!}}t\Bigg]<\infty.

Let $\mathbb{L}_{\mathcal{F}}^{2,\gamma}(\Omega;C[0,\infty];\mathbb{R})$ be the set of all $\mathbb{R}$ -valued $\mathbb{F}$ -adapted continuous process $\phi_{\cdot}$ such that

\mathbb{E}\Big[\sup_{t\in[0,\infty]}e^{2\gamma t}|\phi_{t}|^{2}\Big]<\infty.

Finally, let $\mathbb{L}_{\mathcal{F}_{\tau}}^{2,\gamma}(\Omega;\mathbb{R})$ be the set of $\mathbb{R}$ -valued $\mathcal{F}_{\tau}$ -measurable random variables $\xi$ such that $\mathbb{E}\Big[e^{2\gamma\tau}|\xi|^{2}\Big]<\infty$ , where $\tau$ is any $\mathbb{F}$ -stopping time taking values in $[0,\infty]$ .

Then, we define the space

\mathbb{B}_{\gamma}[0,\infty]:=\left\{\mathbb{L}_{\mathcal{F}}^{2,\gamma}(\Omega;C[0,\infty];\mathbb{R})\bigcap\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R})\right\}\times\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R})

(D.1)

with the norm

\displaystyle\|(Y_{\cdot},Z_{\cdot})\|_{\mathbb{B}_{\gamma}[0,\infty]}:=\Bigg(\mathbb{E}\Bigg[\sup_{t\in[0,\infty]}e^{2\gamma t}|Y_{t}|^{2}+\int_{0}^{\infty}e^{2\gamma t}|Y_{t}|^{2}\operatorname*{\mathrm{d\!}}t+\int_{0}^{\infty}e^{2\gamma t}|Z_{t}|^{2}\operatorname*{\mathrm{d\!}}t\Bigg]\Bigg)^{1/2},

for any $(Y_{\cdot},Z_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty]$ . Obviously, $\mathbb{B}_{\gamma}[0,\infty]$ with the norm is a Banach space.

(ii)

For a constant $\gamma>0$ , and predictable process $(\phi_{t})_{t\geq 0}$ , we introduce the $\mathbb{L}_{\gamma}^{2}$ -norm:

\|\phi\|_{\mathbb{L}^{2}_{\gamma}}:=\left(\mathbb{E}\int_{0}^{\infty}e^{2\gamma t}|\phi_{t}|^{2}\operatorname*{\mathrm{d\!}}t\right)^{1/2}.

(iii)

For adapted process $\phi_{\cdot}$ such that $\mathbb{E}\left[\int_{0}^{\infty}|\phi_{t}|^{2}\operatorname*{\mathrm{d\!}}t\right]<\infty$ , we define

$(\phi\bullet W)_{t}:=\int_{0}^{t}\phi_{s}\operatorname*{\mathrm{d\!}}W_{s},\quad\text{for}\;t\geq 0.$
(iv)

For any continuous local martingale $M$ let $(\langle M\rangle_{t})_{t\geq 0}$ denote the quadratic variation process and let

$\mathcal{E}(M)_{t}:=\exp\left\{M_{t}-\frac{1}{2}\langle M\rangle_{t}\right\},\quad\text{for}\;t\geq 0$

denotes the Doléans-Dade exponential of $M_{t}$ given the initial data $M_{0}=0$ .

Assumption D.1

For each fixed $(x,z)\in(0,1)\times\mathbb{R}$ , we assume the function

u(x,z):=\operatorname*{arg\,min}_{u\in U}\left\{\big(b^{u}\sigma^{-1}\big)(x)z+f^{u}(x)\right\},

is measurable.

For notation simplicity, we write $u(x,z)=(\eta,\rho)(x,z)\in U$ , where $U=[0,1]\times[0,\infty)$ is the action space of our cyber risk control problem (2.5), and $b^{u}(x)=b(x,\eta,\rho)$ , and $f^{u}(x)=f(x,\eta,\rho)$ , which are the drift term and cost function in (3.2), $\sigma^{-1}(x)=1/\sigma(x)$ . One can refer to Kerimkulov et al., (2020) for a short discussion on the validity of the measurability.

Assumption D.2

There are constants $K,\theta\geq 0$ such that the following hold:

(i)

For $x\in(0,1)$ , $u,u^{\prime}\in U$ ,

$|b^{u}(x)-b^{u^{\prime}}(x)|\leq\sqrt{\theta}|u-u^{\prime}|,$

and for all $x\in(0,1)$ , $u\in U$ , we have

$|\big(b^{u}\sigma^{-1}\big)(x)|<K.$

(ii)

For all $x,x^{\prime}\in(0,1),z,z^{\prime}\in\mathbb{R}$ , and $u\in U$ we have that

	$\displaystyle\|u(x,z)-u(x^{\prime},z)\|$	$\displaystyle\leq K\|x-x^{\prime}\|,$
	$\displaystyle\|u(x,z)-u(x,z^{\prime})\|$	$\displaystyle\leq\sqrt{\theta}\|z-z^{\prime}\|.$

(iii)

For $x\in(0,1)$ , $u,u^{\prime}\in U$ ,

$|f^{u}(x)-f^{u^{\prime}}(x)|\leq\sqrt{\theta}|u-u^{\prime}|.$

D.2 Some preliminary lemmas

The following lemma is a straightforward result of Girsanov’s theorem under the random process $\Big(\big(b^{u}\sigma^{-1}\big)(X_{s})\Big)_{s\geq 0}$ , which is bounded (according to Assumption D.2). This result will be helpful in the construction of a contraction mapping when proving Theorem 4.1 under a new measure $\widehat{\mathbb{P}}$ on the probability space.

Lemma D.1

Let $0\leq t\leq\infty$ , $x\in(0,1)$ , and $X:=X^{x,u^{*}}$ be the unique solution to the SDE (2.3), started from time $0$ with initial data $X_{0}=x$ , and controlled by the optimal control process $u^{*}=(u^{*}_{t})_{t\geq 0}$ (with a abuse of notation, we use $u=(\eta_{.},\rho_{.})\in\mathcal{U}_{0}$ to denote the control process as well). Then, $\operatorname*{\mathrm{d\!}}\widehat{\mathbb{P}}:=\mathcal{E}\Big(\big(b^{u^{*}}\sigma^{-1}\big(X)\bullet W\Big)_{\infty}\operatorname*{\mathrm{d\!}}\mathbb{P}$ is equivalent to the probability measure $\mathbb{P}$ , and the process

\widehat{W}_{t}:=W_{t}+\int_{0}^{t}b^{u^{*}_{s}}(X_{s})\sigma^{-1}(X_{s})\operatorname*{\mathrm{d\!}}s

is a $\widehat{\mathbb{P}}$ -Wiener process.

Proof. See Theorem 6.8.8 of Krylov, (2002).

Assumption D.3

For all $(z,Z)\in\mathbb{R}^{2}$ , $F_{s}(z,Z)$ for $s\in[0,\infty]$ is progressively measurable. And, there exist constants $L_{1},L_{2},\gamma>0$ such that $F_{\cdot}(0,0)\in\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R})$ , and for any $z,Z,\bar{z},\bar{Z}\in\mathbb{R}$ , $s\in[0,\infty)$ ,

|F_{s}(z,Z)-F_{s}(\bar{z},\bar{Z})|\leq L_{1}|z-\bar{z}|+L_{2}|Z-\bar{Z}|,\quad\mathbb{P}\text{-}a.s.,

(D.2)

and

\gamma>\frac{L^{2}_{1}+L_{2}^{2}}{2}.

(D.3)

The following lemma states the unique solution of an infinite-horizon BSDE associated with our infinite-horizon stochastic control problem.

Lemma D.2

Let Assumption D.3 hold. For any given $z_{\cdot}\in\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R})$ , the following BSDE

Y_{t}=\int_{t}^{\infty}F_{s}(z_{s},Z_{s})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}Z_{s}\operatorname*{\mathrm{d\!}}W_{s},\quad t\in[0,\infty),

(D.4)

admits a unique solution $(Y_{\cdot},Z_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty]$ . Further, there exists a constant $C>0$ such that if $(\overline{Y}_{\cdot},\overline{Z}_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty]$ is the solution of (D.4) with $F$ replaced by $\overline{F}$ , where $\overline{F}$ satisfies Assumption D.3, then

		$\displaystyle\\|(Y,Z)-(\overline{Y},\overline{Z})\\|_{\mathbb{B}_{\gamma}[0,\infty]}$
	$\displaystyle\leq$	$\displaystyle C\Bigg(\int_{0}^{\infty}e^{2\gamma s}\|F_{s}(z_{s},Z_{s})\|-\overline{F}_{s}(z_{s},Z_{s})\|^{2}\operatorname*{\mathrm{d\!}}s\Bigg)^{1/2}$		(D.5)

Proof. See, for example,(Yong and Zhou, , 1999, Theorem 7.3.6).

Moreover, the comparison principle for infinite-horizon BSDE was also well established, see, e.g. (Hamadène et al., , 1999, Theorem 2.2 ).

Lemma D.3

Let $(Y^{i},Z^{i})\in\mathbb{B}_{\gamma}[0,\infty]$ be the solution to the following BSDE

Y^{i}_{t}=\xi^{i}+\int_{t}^{\infty}F^{i}_{s}(Z_{s}^{i})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}Z^{i}_{s}\operatorname*{\mathrm{d\!}}W_{s},~t\in[0,\infty],~i=1,2,

where $\xi^{i}\in\mathbb{L}_{\mathcal{F}_{\infty}}^{2,\gamma}(\Omega;\mathbb{R})$ and $F^{i}$ satisfy the Assumption D.3 (with $z$ removed) for $i=1,2$ , respectively. If further $\xi^{1}\geq\xi^{2}~a.s.$ , and $F^{1}_{s}(Z^{2}_{s})\geq F^{2}_{s}(Z^{2}_{s})~a.s.$ for all $s\geq 0$ , then $Y^{1}_{t}\geq Y^{2}_{t}~\mathbb{P}\text{-}a.s.$ for all $t\geq 0$ .

Now, we are ready to present the following Lemma D.4, which plays a key role in the proof of Theorem 4.1. Lemma D.4 follows a similar idea to Lemma A.5 of Kerimkulov et al., (2020), see also Lemma 3.2 of Fuhrman and Tessitore, (2004).

Lemma D.4

Let $F:\Omega\times[0,\infty)\times\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ be a measurable function that satisfies Assumption D.3. Fix $\delta>(L_{1}+L_{2})^{2}/2$ . Let $(Y_{\cdot},Z_{\cdot})\in\mathbb{B}_{\delta}[0,\infty]$ be the unique solution to (D.4) for any given $z_{\cdot}\in\mathbb{L}_{\mathcal{F}}^{2,\delta}(0,\infty;\mathbb{R})$ . Moreover, assume that for $z^{1}_{\cdot},z^{2}_{\cdot}\in\mathbb{L}_{\mathcal{F}}^{2,\delta}(0,\infty;\mathbb{R})$ the following condition is satisfied:

|F_{t}(z^{1}_{t},Z^{1}_{t})-F_{t}(z^{2}_{t},Z^{2}_{t})|\leq L_{1}|z^{1}_{t}-z^{2}_{t}|+L_{2}|Z^{1}_{t}-Z^{2}_{t}|,\quad t\in[0,\infty],~\mathbb{P}\textbf{-}a.s.

Then there is $0<\gamma<\delta$ and $q\in(0,1)$ such that for any $t\geq 0$ we have

\mathbb{E}\Big[e^{2\gamma t}|Y^{1}_{t}-Y^{2}_{t}|^{2}\Big]+\|Z^{1}-Z^{2}\|^{2}_{\mathbb{L}^{2}_{\gamma}}\leq q\|z^{1}-z^{2}\|^{2}_{\mathbb{L}^{2}_{\gamma}},

(D.6)

where $(Y^{i}_{\cdot},Z^{i}_{\cdot})\in\mathbb{B}_{\delta}[0,\infty]$ is the unique solution to (D.4) corresponding to $z_{i}\in\mathbb{L}_{\mathcal{F}}^{2,\delta}(0,\infty;\mathbb{R})$ for $i=1,2$ , respectively. Furthermore, one can choose $\gamma$ sufficiently large and $\delta-\gamma>0$ sufficiently small such that $q\in(0,1/2)$ , and the above results hold as well.

Proof. Denote $\Delta Y:=Y^{1}-Y^{2},\Delta Z:=Z^{1}-Z^{1},$ and $\Delta z:=z^{1}-z^{2}$ . Then, apply Itô’s formula to $e^{2\gamma s}|\Delta Y_{s}|^{2}$ :

\begin{split}e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s&=\int_{t}^{\infty}e^{2\gamma s}[2\Delta Y_{s}\big(F_{s}(z^{1}_{s},Z^{1}_{s})-F_{s}(z^{2}_{s},Z^{2}_{s})\big)-2\gamma|\Delta Y_{s}|^{2}]\operatorname*{\mathrm{d\!}}s\\ &-2\int_{t}^{\infty}e^{2\gamma s}|\Delta Y_{s}||\Delta Z_{s}|\operatorname*{\mathrm{d\!}}W_{s}.\end{split}

By taking the expectation of the equation above, we get

\begin{split}&\mathbb{E}\left[e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]\\ &\qquad=\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}[2\Delta Y_{s}(F_{s}(z^{1}_{s},Z^{1}_{s})-F_{s}(z^{2}_{s},Z^{2}_{s}))-2\gamma|\Delta Y_{s}|^{2}]\operatorname*{\mathrm{d\!}}s\right].\end{split}

(D.7)

By the Lipschitz property of the generator and by Young’s inequality, we observe that, for any $\epsilon>0$ ,

	$\displaystyle\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\big(2\Delta Y_{s}(F_{s}(z^{1}_{s},Z^{1}_{s})-F_{s}(z^{2}_{s},Z^{2}_{s}))-2\gamma\|\Delta Y_{t}\|^{2}\big)\operatorname*{\mathrm{d\!}}s\right]$
$\displaystyle\leq$	$\displaystyle\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\big(2\Delta Y_{s}(L_{1}\|\Delta z_{s}\|+L_{2}\|\Delta Z_{s}\|)-2\gamma\|\Delta Y_{t}\|^{2}\big)\operatorname*{\mathrm{d\!}}s\right]$
$\displaystyle\leq$	$\displaystyle\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\big((L_{1}+L_{2})\epsilon\Delta\|Y_{s}\|^{2}+L_{1}\epsilon^{-1}\|\Delta z_{s}\|^{2}+L_{2}\epsilon^{-1}\|\Delta Z_{s}\|^{2}-2\gamma\|\Delta Y_{t}\|^{2}\big)\operatorname*{\mathrm{d\!}}s\right]$
$\displaystyle\leq$	$\displaystyle\max\left\{\frac{(L_{1}+L_{2})L_{1}}{2\gamma},\frac{(L_{1}+L_{2})L_{2}}{2\gamma}\right\}\cdot\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\Big(\|\Delta z_{s}\|^{2}+\|\Delta Z_{s}\|^{2}\Big)\operatorname*{\mathrm{d\!}}s\right],$	(D.8)

where the last inequality holds by choosing $\epsilon=2\gamma/(L_{1}+L_{2})$ . Then, take $\gamma$ sufficiently large such that $\delta-\gamma>0$ sufficiently small, we have

\tilde{q}:=\max\left\{\frac{(L_{1}+L_{2})L_{1}}{2\gamma},\frac{(L_{1}+L_{2})L_{2}}{2\gamma}\right\}\in\left(0,\frac{1}{2}\right),

and subtracting $\tilde{q}\cdot\mathbb{E}\big[\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\big]$ on both sides of (D.7) together with (D.2), we have

\mathbb{E}\left[e^{2\gamma t}|\Delta Y_{t}|^{2}+(1-\tilde{q})\cdot\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]\leq\tilde{q}\cdot\mathbb{E}\Big[\int_{t}^{\infty}e^{2\gamma s}|\Delta z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\Big].

(D.9)

Dividing by $1-\tilde{q}$ on both sides of (D.9), we have

\begin{split}\mathbb{E}\left[e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]&\leq\mathbb{E}\left[\frac{1}{1-\tilde{q}}\cdot e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]\\ &\leq q\cdot\mathbb{E}\Big[\int_{t}^{\infty}e^{2\gamma s}\left(|\Delta z_{s}|^{2}\right)\operatorname*{\mathrm{d\!}}s\Big],\end{split}

where the first inequality holds due to $\frac{1}{1-\tilde{q}}>2$ , and the second holds by setting $q:=\frac{\tilde{q}}{1-\tilde{q}}\in(0,1)$ . If one further chooses $\gamma$ large such that $\tilde{q}\in(0,1/3)$ , hence one get $q\in(0,1/2)$ , and complete the proof.

Lemma D.5

The solution $Y_{0}^{x}$ to the following BSDE

Y_{0}^{x}=\int_{0}^{\infty}F_{s}(Z_{s},Z_{s})\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}Z_{s}\operatorname*{\mathrm{d\!}}\widehat{W}_{s},

(D.10)

with $F$ given by (D.14) which satisfies Assumption D.3, is deterministic and continuous function on $(0,1)$ , and is a viscosity solution to to the HJB equation (3.2).

Proof. We change the measure back to $\mathbb{P}$ and rewrite the BSDE (D.10) as

Y_{0}^{x}=\int_{0}^{\infty}e^{-\delta s}f^{u(X_{s},Z_{s})}(X_{s})\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}Z_{s}\operatorname*{\mathrm{d\!}}W_{s}.

By the definition of infinite-horizon BSDE, $Y_{t}$ is a $\mathcal{F}_{t}$ -adapted process for $t\geq 0$ . Obviously, $Y^{x}_{0}$ is measurable w.r.t. trivial $\sigma$ -field $\mathcal{F}_{0}$ , hence deterministic. The continuity of $Y^{x}_{0}$ for $x\in(0,1)$ follows the inequality (D.2) combined with the continuity of $F$ and $X^{x,u}_{\cdot}$ in $x$ . Let’s then prove that $Y_{0}^{x}$ is the viscosity solution to the HJB equation (3.2). We only show the case for viscosity supersolution, and the subsolution property will follow from the same idea. As we have proved $Y^{x}_{0}$ is deterministic, consider the discounted process $Y^{x}_{s}=e^{-\delta s}v(X_{s})$ with $v(x)=Y_{0}^{x}$ . Take $\varphi\in C^{2}(0,1)$ being a test function such that $v-\varphi$ attains its (local) minimum value of zero at any $x\in(0,1)$ , we shall have the supersolution inequality holds. We prove the assertion by contradiction. Assume that

\inf_{u\in U}\Big\{\big(b^{u}D_{x}\varphi\big)(x)+\frac{1}{2}\big(\sigma^{2}D_{xx}\varphi\big)(x)-\delta\varphi(x)+f^{u}(x)\Big\}>0.

Since $\varphi$ is smooth enough and $f$ is continuous, there exists $\epsilon>0$ , such that for any $u\in U$ , $y\in(x-\epsilon,x+\epsilon)$ , we have $v(y)\geq\varphi(y)$ and

(b^{u}D_{x}\varphi\big)(y)+\frac{1}{2}\big(\sigma^{2}D_{xx}\varphi\big)(y)-\delta\varphi(y)+f^{u}(y)>0.

(D.11)

Let $\tau:=\inf\{s\geq 0:X_{s}\notin(x-\epsilon,x+\epsilon)\}\wedge t$ for any $t\geq 0$ , and consider the pair of stopped processes

(Y^{x}_{s\wedge\tau},Z_{s}\mathbf{1}_{[0,\tau]}),\quad s\in[0,t],

which solves the following finite-horizon BSDE

v(x)=\xi_{1}+\int_{0}^{\infty}e^{-\delta s}f^{u(X_{s},Z_{s})}(X_{s})\mathbf{1}_{[0,\tau]}\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}Z_{s}\mathbf{1}_{[0,\tau]}\operatorname*{\mathrm{d\!}}W_{s},

where $\xi_{1}:=e^{-\delta\tau}v(X_{\tau})$ . On the other hand, consider another pair of stopped processes

\left(e^{-\delta s}\varphi(X_{s\wedge\tau}),e^{-\delta s}(\sigma D_{x}\varphi)(X_{s})\mathbf{1}_{[0,\tau]}\right),\quad s\in[0,t],

Apply the Itô’s formula to $e^{-\delta s}\varphi(X_{s})$ , then

\varphi(x)=\xi_{2}-\int_{0}^{\infty}\mathbf{1}_{[0,\tau]}e^{-\delta s}\Big(b^{u^{\prime}}D_{x}\varphi+\frac{1}{2}\sigma^{2}D_{xx}\varphi-\delta\varphi\Big)(X_{s})\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}e^{-\delta s}(\sigma D_{x}\varphi)(X_{s})\mathbf{1}_{[0,\tau]}\operatorname*{\mathrm{d\!}}W_{s},

where $\xi_{2}:=e^{-\delta\tau}\varphi(X_{\tau})$ . Then, since $v(X_{s})\geq\varphi(X_{s})$ for all $x\in[0,\tau]$ , we have $\xi_{1}\geq\xi_{2}$ . In addition, by (D.11), one has

f^{u\big(X_{s},(\sigma D_{x}\varphi)(X_{s})\big)}(X_{s})>-\Big(b^{u}D_{x}\varphi+\frac{1}{2}\sigma^{2}D_{xx}\varphi-\delta\varphi\Big)(X_{s}),\quad s\in[0,\tau).

Hence, by Theorem D.3 (with a further argument of strict comparison principle, see e.g. (Pham, , 2009, Theorem 6.2.2)), we have $v(x)>\varphi(x)$ , which is a contradiction. Hence, we have the supersolution inequality.

D.3 Proof of Theorem 4.1

Let $v^{n}$ be the smooth solution to the Bellman-type ODE (4.1), and recall the updated control at $n$ th iteration

u^{n}(x):=\operatorname*{arg\,min}_{u\in U}\big\{(b^{u}(x)D_{x}v^{n-1}(x)+f^{u}(x)\big\}=u\big(x,\sigma(x)D_{x}v^{n-1}(x)\big).

Define $X:=X^{x,u^{*}}$ as the solution to the SDE (2.3) started from $X_{0}=x$ and controlled by the optimal control policy $u^{*}$ . Apply the Itô’s formula to $v^{n}(X_{s})$ , we have

\begin{split}\operatorname*{\mathrm{d\!}}v^{n}(X_{s})&=\left(\frac{1}{2}(\sigma^{2}D_{xx}v^{n})(X_{s})+(b^{u^{*}}D_{x}v^{n})(X_{s})\right)\operatorname*{\mathrm{d\!}}s+(\sigma D_{x}v^{n})(X_{s})\operatorname*{\mathrm{d\!}}W_{s}\\ &=\left((b^{u^{*}}D_{x}v^{n})(X_{s})-(b^{u^{n}}D_{x}v^{n})(X_{s})-f^{u^{n}}(X_{s})+\delta v^{n}(X_{s})\right)\operatorname*{\mathrm{d\!}}s+(\sigma D_{x}v^{n})(X_{s})\operatorname*{\mathrm{d\!}}W_{s},\end{split}

(D.12)

where the second equality holds since $v^{n}$ is a solution to the ODE (4.1). Let’s define for,

	$\displaystyle Y^{n}_{s}$	$\displaystyle:=v^{n}(X_{s})$
	$\displaystyle Z_{s}^{n}$	$\displaystyle:=\big(\sigma D_{x}v^{n}\big)(X_{s}),$
	$\displaystyle F_{s}(z,Z)$	$\displaystyle:=(b^{u(X_{s},z)}\sigma^{-1})(X_{s})Z+f^{u(X_{s},z)}(X_{s}).$

Hence, we can write (D.12) as

\operatorname*{\mathrm{d\!}}Y^{n}_{s}=\Big((b^{u^{*}}D_{x}v^{n})(X_{s})-F_{s}(Z^{n-1}_{s},Z^{n}_{s})+\delta Y^{n}_{s}\Big)\operatorname*{\mathrm{d\!}}s+Z^{n}_{s}\operatorname*{\mathrm{d\!}}W_{s}.

(D.13)

Let $\delta>0$ be a discounting rate, and define

\widetilde{Y}^{n}_{s}:=e^{-\delta s}Y^{n}_{s},\quad\widetilde{Z}^{\cdot}_{s}:=e^{-\delta s}Z^{\cdot}_{s},\quad\widetilde{F}_{s}(z,Z):=e^{-\delta s}F_{s}(e^{\delta s}z,e^{\delta s}Z).

(D.14)

Then, we can rewrite the above BSDE (D.13) as

\operatorname*{\mathrm{d\!}}\widetilde{Y}^{n}_{s}=\Big(e^{-\delta s}(b^{u^{*}}D_{x}v^{n})(X_{s})-\widetilde{F}_{s}(\widetilde{Z}^{n-1}_{s},\widetilde{Z}^{n}_{s})\Big)\operatorname*{\mathrm{d\!}}s+\widetilde{Z}^{n}_{s}\operatorname*{\mathrm{d\!}}W_{s}.

Moreover, one observes that

\mathbb{P}\Big(\lim_{t\to\infty}\widetilde{Y}^{n}_{t}=0\Big)=\mathbb{P}\Big(\lim_{t\to\infty}e^{-\delta t}v^{n}(X_{t})=0\Big)=1,

since the value function is finite and we have verified that $X_{t}\in(0,1)$ almost surely for any $t\geq 0$ (see, Proposition 2.1). Hence, we have

\widetilde{Y}^{n}_{t}=\int_{t}^{\infty}\Big(\widetilde{F}_{s}(\widetilde{Z}^{n-1}_{s},\widetilde{Z}^{n}_{s})-(b^{u^{*}}\sigma^{-1})(X_{s})\widetilde{Z}^{n}_{s}\Big)\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}\widetilde{Z}^{n}_{s}\operatorname*{\mathrm{d\!}}W_{s}.

(D.15)

By Lemma D.1, we change the measure to $\widehat{\mathbb{P}}$ , and (D.15) can be rewritten as

\widetilde{Y}^{n}_{t}=\int_{t}^{\infty}\widetilde{F}_{s}(\widetilde{Z}^{n-1}_{s},\widetilde{Z}^{n}_{s})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}\widetilde{Z}^{n}_{s}\operatorname*{\mathrm{d\!}}\widehat{W}_{s}.

(D.16)

Similarly, consider the following BSDE with $v^{n}$ and $v^{n-1}$ replaced by the value function of our stochastic control problem $v$ (2.6) in (D.16):

\widetilde{Y}_{t}=\int_{t}^{\infty}\widetilde{F}_{s}(\widetilde{Z}_{s},\widetilde{Z}_{s})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}\widetilde{Z}_{s}\operatorname*{\mathrm{d\!}}\widehat{W}_{s}.

(D.17)

Note that for $\gamma<\delta$ , we have $(\widetilde{Y}^{n}_{\cdot},\widetilde{Z}^{n}_{\cdot}),\,(\widetilde{Y}_{\cdot},\widetilde{Z}_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty]$ , and by Assumption D.2,

	$\displaystyle\|\widetilde{F}_{s}(Z_{s},Z_{s})-$	$\displaystyle\widetilde{F}_{s}(Z_{s}^{n-1},Z_{s}^{n})\|$
		$\displaystyle=\Big\|(b^{u(X_{s},e^{\delta s}Z_{s})}\sigma^{-1})(X_{s})\cdot Z_{s}+e^{-\delta s}f^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})$
		$\displaystyle\hskip 18.49988pt-(b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}\sigma^{-1})(X_{s})\cdot Z^{n}_{s}+e^{-\delta s}f^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|$
		$\displaystyle\leq\Big\|\sigma^{-1}(X_{s})Z_{s}\Big\|\Big\|b^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})-b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|+\Big\|b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|\Big\|Z_{s}-Z^{n}_{s}\Big\|$
		$\displaystyle\hskip 18.49988pt+e^{-\delta s}\Big\|f^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})-f^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|$
		$\displaystyle\leq e^{-\delta s}C\theta\|e^{\delta s}Z_{s}-e^{\delta s}Z_{s}^{n-1}\|+K\|Z_{s}-Z_{s}^{n}\|+e^{-\delta s}\theta\|e^{\delta s}Z_{s}-e^{\delta s}Z_{s}^{n-1}\|$
		$\displaystyle=\theta(C+1)\|Z_{s}-Z_{s}^{n-1}\|+K\|Z_{s}-Z_{s}^{n}\|,$

where we have applied the fact that there is a constant $C>0$ , $|\sigma^{-1}(X_{s})Z_{s}|=e^{-\delta s}|D_{x}v(X_{s})|\leq e^{-\delta s}C$ (given the derivative exists) by the property of the value function given in Proposition 2.2. Hence, one may choose $\delta$ large enough such that Assumption D.3 holds for $\widetilde{F}$ . Then, by applying Lemmas D.4 and D.5, there is $0<\gamma<\delta$ ,

\widehat{\mathbb{E}}\Big[e^{2\gamma t}|Y_{t}-Y^{n}_{t}|^{2}\Big]+\|Z-Z^{n}\|^{2}_{\widehat{\mathbb{L}}^{2}_{\gamma}}\leq q\|Z-Z^{n-1}\|^{2}_{\widehat{\mathbb{L}}^{2}_{\gamma}},

(D.18)

for any $t\geq 0$ , where $\widehat{\mathbb{E}}$ and $\widehat{\mathbb{L}}^{2}_{\gamma}$ denote the expectation and $\mathbb{L}^{2}_{\gamma}$ -norm under measure $\widehat{\mathbb{P}}$ . In addition, by noting that the solution $Y^{x}_{0}$ and $Y^{x,n}_{0}$ are the value function $v(x)$ and approximation sequence at $n$ th iteration $v^{n}(x)$ in the Algorithm 1 (with a recursive argument, see e.g. Kerimkulov et al., (2020)), we have

|v(x)-v^{n}(x)|^{2}\leq q^{n}\widehat{\mathbb{E}}\Bigg[\int_{0}^{\infty}e^{2(\gamma-\delta)s}\Big|\sigma(X_{s})\Big|^{2}\,\Big|D_{x}v(X_{s})-D_{x}v^{0}(X_{s})\Big|^{2}\operatorname*{\mathrm{d\!}}s\Bigg].

Hence, we complete the proof.

References

Albrecher et al., (2022) Albrecher, H., Azcue, P., and Muler, N. (2022). Optimal ratcheting of dividends in a Brownian risk model. SIAM Journal on Financial Mathematics, 13(3):657–701.
Amin et al., (2009) Amin, S., Cárdenas, A. A., and Sastry, S. S. (2009). Safe and secure networked control systems under denial-of-service attacks. In International workshop on hybrid systems: computation and control, pages 31–45. Springer.
Antonio et al., (2021) Antonio, Y., Indratno, S. W., and Saputro, S. W. (2021). Pricing of cyber insurance premiums using a Markov-based dynamic model with clustering structure. PLoS One, 16(10):e0258867.
Asmussen and Taksar, (1997) Asmussen, S. and Taksar, M. (1997). Controlled diffusion models for optimal dividend pay-out. Insurance: Mathematics and Economics, 20(1):1–15.
Azcue and Muler, (2005) Azcue, P. and Muler, N. (2005). Optimal reinsurance and dividend distribution policies in the Cramér-Lundberg model. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, 15(2):261–308.
Bakdash et al., (2018) Bakdash, J. Z., Hutchinson, S., Zaroukian, E. G., Marusich, L. R., Thirumuruganathan, S., Sample, C., Hoffman, B., and Das, G. (2018). Malware in the future? Forecasting of analyst detection of cyber events. Journal of Cybersecurity, 4(1):tyy007.
Barnett et al., (2023) Barnett, M., Buchak, G., and Yannelis, C. (2023). Epidemic responses under uncertainty. Proceedings of the National Academy of Sciences, 120(2):e2208111120.
Bellman, (1955) Bellman, R. (1955). Functional equations in the theory of dynamic programming. v. positivity and quasi-linearity. Proceedings of the National Academy of Sciences, 41(10):743–746.
Bellman, (1966) Bellman, R. (1966). Dynamic programming. Science, 153(3731):34–37.
Böhme and Kataria, (2006) Böhme, R. and Kataria, G. (2006). Models and measures for correlation in cyber-insurance. In Weis, volume 2(1), page 3.
Böhme et al., (2010) Böhme, R., Schwartz, G., et al. (2010). Modeling cyber-insurance: towards a unifying framework. In WEIS.
Boukanjime et al., (2021) Boukanjime, B., El-Fatini, M., Laaribi, A., Taki, R., and Wang, K. (2021). A Markovian regime-switching stochastic hybrid time-delayed epidemic model with vaccination. Automatica, 133:109881.
Cárdenas et al., (2011) Cárdenas, A. A., Amin, S., Lin, Z.-S., Huang, Y.-L., Huang, C.-Y., and Sastry, S. (2011). Attacks against process control systems: risk assessment, detection, and response. In Proceedings of the 6th ACM symposium on information, computer and communications security, pages 355–366.
Cebula and Young, (2010) Cebula, J. L. and Young, L. R. (2010). A taxonomy of operational cyber security risks. Technical report, Carnegie Mellon University. CMU/SEI-2010-TN-028.
Chen et al., (2025) Chen, J., Feng, K., Freddi, L., Goreac, D., and Li, J. (2025). Optimality of vaccination for prevalence-constrained SIRS epidemics. Applied Mathematics & Optimization, 91(1):1–26.
Crandall et al., (1992) Crandall, M. G., Ishii, H., and Lions, P.-L. (1992). User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American mathematical society, 27(1):1–67.
Crandall and Lions, (1983) Crandall, M. G. and Lions, P.-L. (1983). Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American mathematical society, 277(1):1–42.
Cremer et al., (2022) Cremer, F., Sheehan, B., Fortmann, M., Kia, A. N., Mullins, M., Murphy, F., and Materne, S. (2022). Cyber risk and cybersecurity: a systematic review of data availability. The Geneva papers on risk and insurance. Issues and practice, 47(3):698.
Dou et al., (2020) Dou, W., Tang, W., Wu, X., Qi, L., Xu, X., Zhang, X., and Hu, C. (2020). An insurance theory based optimal cyber-insurance contract against moral hazard. Information Sciences, 527:576–589.
Eling and Jung, (2018) Eling, M. and Jung, K. (2018). Copula approaches for modeling cross-sectional dependence of data breach losses. Insurance: Mathematics and Economics, 82:167–180.
Eling and Loperfido, (2017) Eling, M. and Loperfido, N. (2017). Data breaches: Goodness of fit, pricing, and risk measurement. Insurance: Mathematics and Economics, 75:126–136.
Fahrenwaldt et al., (2018) Fahrenwaldt, M. A., Weber, S., and Weske, K. (2018). Pricing of cyber insurance contracts in a network model. ASTIN Bulletin: The Journal of the IAA, 48(3):1175–1218.
Fawzi et al., (2014) Fawzi, H., Tabuada, P., and Diggavi, S. (2014). Secure estimation and control for cyber-physical systems under adversarial attacks. IEEE Transactions on Automatic control, 59(6):1454–1467.
Federico et al., (2024) Federico, S., Ferrari, G., and Torrente, M.-L. (2024). Optimal vaccination in a SIRS epidemic model. Economic Theory, 77(1):49–74.
Fuhrman and Tessitore, (2004) Fuhrman, M. and Tessitore, G. (2004). Infinite horizon backward stochastic differential equations and elliptic equations in Hilbert spaces. The Annals of Probability, 32(1B):607 – 660.
Garcia-Teodoro et al., (2009) Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G., and Vázquez, E. (2009). Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1-2):18–28.
Gil et al., (2014) Gil, S., Kott, A., and Barabási, A.-L. (2014). A genetic epidemiology approach to cyber-security. Scientific Reports, 4(1):5659.
Gordon et al., (2003) Gordon, L. A., Loeb, M. P., and Sohail, T. (2003). A framework for using insurance for cyber-risk management. Communications of the ACM, 46(3):81–85.
Gray et al., (2011) Gray, A., Greenhalgh, D., Hu, L., Mao, X., and Pan, J. (2011). A stochastic differential equation SIS epidemic model. SIAM Journal on Applied Mathematics, 71(3):876–902.
Hamadène et al., (1999) Hamadène, S., Lepeltier, J.-P., and Wu, Z. (1999). Infinite horizon reflected backward stochastic differential equations and applications in mixed control and game problems. Probability and Mathematical Statistics, 19(2):211–234.
He et al., (2024) He, R., Jin, Z., and Li, J. S.-H. (2024). Modeling and management of cyber risk: a cross-disciplinary review. Annals of Actuarial Science, 18(2):270–309.
Herath and Herath, (2011) Herath, H. and Herath, T. (2011). Copula-based actuarial model for pricing cyber-insurance policies. Insurance markets and companies: analyses and actuarial computations, 2(1):7–20.
Hillairet and Lopez, (2021) Hillairet, C. and Lopez, O. (2021). Propagation of cyber incidents in an insurance portfolio: counting processes combined with compartmental epidemiological models. Scandinavian Actuarial Journal, 2021(8):671–694.
Hillairet et al., (2022) Hillairet, C., Lopez, O., d’Oultremont, L., and Spoorenberg, B. (2022). Cyber-contagion model with network structure applied to insurance. Insurance: Mathematics and Economics, 107:88–101.
Howard, (1960) Howard, R. A. (1960). Dynamic programming and Markov processes. John Wiley.
Jang-Jaccard and Nepal, (2014) Jang-Jaccard, J. and Nepal, S. (2014). A survey of emerging threats in cybersecurity. Journal of Computer and System Sciences, 80(5):973–993.
Karatzas and Shreve, (1991) Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer, New York, NY, 2 edition.
Kerimkulov et al., (2020) Kerimkulov, B., Siska, D., and Szpruch, L. (2020). Exponential convergence and stability of howard’s policy improvement algorithm for controlled diffusions. SIAM Journal on Control and Optimization, 58(3):1314–1340.
Krylov, (1980) Krylov, N. V. (1980). Controlled diffusion processes. Springer.
Krylov, (2002) Krylov, N. V. (2002). Introduction to the Theory of Random Processes. American Mathematical Society, Providence, Rhode Island.
Liu et al., (2025) Liu, W., Li, L., Sun, J., Deng, F., Wang, G., and Chen, J. (2025). Data-driven control against false data injection attacks. Automatica, 179:112399.
Liu et al., (2016) Liu, Y., Dong, M., Ota, K., and Liu, A. (2016). Activetrust: Secure and trustable routing in wireless sensor networks. IEEE Transactions on Information Forensics and Security, 11(9):2013–2027.
Malavasi et al., (2022) Malavasi, M., Peters, G. W., Shevchenko, P. V., Trück, S., Jang, J., and Sofronov, G. (2022). Cyber risk frequency, severity and insurance viability. Insurance: Mathematics and Economics, 106:90–114.
Mishra and Pandey, (2014) Mishra, B. K. and Pandey, S. K. (2014). Dynamic model of worm propagation in computer network. Applied Mathematical Modelling, 38(7-8):2173–2179.
Moore et al., (2006) Moore, D., Shannon, C., Brown, D. J., Voelker, G. M., and Savage, S. (2006). Inferring internet denial-of-service activity. ACM Transactions on Computer Systems (TOCS), 24(2):115–139.
Mukhopadhyay et al., (2013) Mukhopadhyay, A., Chatterjee, S., Saha, D., Mahanti, A., and Sadhukhan, S. K. (2013). Cyber-risk decision models: To insure it or not? Decision Support Systems, 56:11–26.
Nguyen-Ngoc and Yor, (2004) Nguyen-Ngoc, L. and Yor, M. (2004). Some martingales associated to reflected Lévy processes. In Séminaire de Probabilités XXXVIII, pages 42–69. Springer.
Öğüt et al., (2011) Öğüt, H., Raghunathan, S., and Menon, N. (2011). Cyber security risk management: Public policy implications of correlated risk, imperfect ability to prove loss, and observability of self-protection. Risk Analysis: An International Journal, 31(3):497–512.
Pasqualetti et al., (2013) Pasqualetti, F., Dörfler, F., and Bullo, F. (2013). Attack detection and identification in cyber-physical systems. IEEE Transactions on Automatic Control, 58(11):2715–2729.
Paté-Cornell et al., (2018) Paté-Cornell, M.-E., Kuypers, M., Smith, M., and Keller, P. (2018). Cyber risk management for critical infrastructure: a risk analysis model and three case studies. Risk Analysis, 38(2):226–241.
Pham, (2009) Pham, H. (2009). Continuous-time stochastic control and optimization with financial applications, volume 61. Springer Science & Business Media.
Puterman, (1981) Puterman, M. (1981). On the convergence of policy iteration for controlled diffusions. Journal of Optimization Theory and Applications, 33(1):137–144.
Puterman and Brumelle, (1979) Puterman, M. L. and Brumelle, S. L. (1979). On the convergence of policy iteration in stationary dynamic programming. Mathematics of Operations Research, 4(1):60–69.
Santos and Rust, (2004) Santos, M. S. and Rust, J. (2004). Convergence properties of policy iteration. SIAM Journal on Control and Optimization, 42(6):2094–2115.
Sonveaux and Winkin, (2023) Sonveaux, C. and Winkin, J. J. (2023). State feedback control law design for an age-dependent SIR model. Automatica, 158:111297.
Stoneburner et al., (2002) Stoneburner, G., Goguen, A., Feringa, A., et al. (2002). Risk management guide for information technology systems. Nist special publication, 800(30):800–30.
Touzi, (2012) Touzi, N. (2012). Optimal stochastic control, stochastic target problems, and backward SDE, volume 29. Springer Science & Business Media.
Tran and Yin, (2021) Tran, K. and Yin, G. (2021). Optimal control and numerical methods for hybrid stochastic SIS models. Nonlinear Analysis: Hybrid Systems, 41:101051.
Wakaiki et al., (2019) Wakaiki, M., Cetinkaya, A., and Ishii, H. (2019). Stabilization of networked control systems under DoS attacks and output quantization. IEEE Transactions on Automatic Control, 65(8):3560–3575.
Wang et al., (2024) Wang, W., Xu, R., and Yan, K. (2024). Optimal ratcheting of dividends with capital injection. Mathematics of Operations Research.
Xu and Hua, (2019) Xu, M. and Hua, L. (2019). Cybersecurity insurance: Modeling and pricing. North American Actuarial Journal, 23(2):220–249.
Yong and Zhou, (1999) Yong, J. and Zhou, X. Y. (1999). Stochastic controls: Hamiltonian systems and HJB equations, volume 43. Springer Science & Business Media.
Zhan et al., (2015) Zhan, Z., Xu, M., and Xu, S. (2015). Predicting cyber attack rates with extreme values. IEEE Transactions on Information Forensics and Security, 10(8):1666–1677.

	$\displaystyle\|b(x,\eta,\rho)-b(y,\eta,\rho)\|$	$\displaystyle=\left\|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)(x-y)-\eta^{2}\beta(x^{2}-y^{2})\right\|$
		$\displaystyle\leq\left\|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right\|\|x-y\|+\eta^{2}\beta(x+y)\|x-y\|$
		$\displaystyle\leq\Big(\left\|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right\|+2\eta^{2}\beta\Big)\|x-y\|,$

		$\displaystyle\|J(x;\eta,\rho)-J(y;\eta,\rho)\|$
		$\displaystyle\leq\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}\left\|f(X_{t}^{x,\eta,\rho},\eta_{t},\rho_{t})-f(X_{t}^{y,\eta,\rho},\eta_{t},\rho_{t})\right\|\operatorname*{\mathrm{d\!}}t\right]$
		$\displaystyle\leq K\mathbb{E}\left[\int_{0}^{T}e^{-\delta t}(1+\|X_{t}^{x,\eta,\rho}\|^{m}+\|X_{t}^{y,\eta,\rho}\|^{m})\|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}\|\operatorname{\mathrm{d\!}}t\right]+2C_{f}\int_{T}^{\infty}e^{-\delta t}\operatorname{\mathrm{d\!}}t$
		$\displaystyle\leq C_{K,m}\left(\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2m}}(1+\|X_{t}^{x,\eta,\rho}\|^{m+1}+\|X_{t}^{y,\eta,\rho}\|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]\right)^{\frac{m}{m+1}}$
		$\displaystyle\qquad\times\left(\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2}}(\|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}\|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]\right)^{\frac{1}{m+1}}+2C_{f}e^{-\delta T}.$		(A.2)

	$\displaystyle\|u(x,z)-u(x^{\prime},z)\|$	$\displaystyle\leq K\|x-x^{\prime}\|,$
	$\displaystyle\|u(x,z)-u(x,z^{\prime})\|$	$\displaystyle\leq\sqrt{\theta}\|z-z^{\prime}\|.$

	$\displaystyle\|\widetilde{F}_{s}(Z_{s},Z_{s})-$	$\displaystyle\widetilde{F}_{s}(Z_{s}^{n-1},Z_{s}^{n})\|$
		$\displaystyle=\Big\|(b^{u(X_{s},e^{\delta s}Z_{s})}\sigma^{-1})(X_{s})\cdot Z_{s}+e^{-\delta s}f^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})$
		$\displaystyle\hskip 18.49988pt-(b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}\sigma^{-1})(X_{s})\cdot Z^{n}_{s}+e^{-\delta s}f^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|$
		$\displaystyle\leq\Big\|\sigma^{-1}(X_{s})Z_{s}\Big\|\Big\|b^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})-b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|+\Big\|b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|\Big\|Z_{s}-Z^{n}_{s}\Big\|$
		$\displaystyle\hskip 18.49988pt+e^{-\delta s}\Big\|f^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})-f^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big\|$
		$\displaystyle\leq e^{-\delta s}C\theta\|e^{\delta s}Z_{s}-e^{\delta s}Z_{s}^{n-1}\|+K\|Z_{s}-Z_{s}^{n}\|+e^{-\delta s}\theta\|e^{\delta s}Z_{s}-e^{\delta s}Z_{s}^{n-1}\|$
		$\displaystyle=\theta(C+1)\|Z_{s}-Z_{s}^{n-1}\|+K\|Z_{s}-Z_{s}^{n}\|,$