Cyber Risk Management and Mitigation Under Controlled Stochastic SIS Model

Shize Na Department of Financial and Actuarial Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou, P.R. China, 215123. Zhuo Jin Department of Actuarial Studies and Business Analytics, Macquarie University, NSW 2113, Australia Ran Xu Corresponding author.
E-mail address:
Shize.Na18@student.xjtlu.edu.cn(S. Na),
zhuo.jin@mq.edu.au (Z. Jin),
Ran.Xu@xjtlu.edu.cn(R. Xu),
Hailiang.Yang@xjtlu.edu.cn(H. Yang). Department of Financial and Actuarial Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou, P.R. China, 215123.
Hailiang Yang Department of Financial and Actuarial Mathematics, Xi’an Jiaotong-Liverpool University, Suzhou, P.R. China, 215123.
(September 27, 2025)
Abstract

In this paper, we formulate cyber risk management and mitigation as a stochastic optimal control problem under a stochastic Susceptible-Infected-Susceptible (SIS) epidemic model. To capture the dynamics and interplay of management and mitigation strategies, we introduce two stochastic controls: (i) a proactive risk management control to reduce external cyber attacks and internal contagion effects, and (ii) a reactive mitigation control to accelerate system recovery from cyber infection. The interplay between these controls is modeled by minimizing the expected discounted running costs, which balance proactive management expenses against reactive mitigation expenditures. We derive the associated Hamilton-Jacobi-Bellman (HJB) equation and characterize the value function as its unique viscosity solution. For numerical implementation, we propose a Policy Improvement Algorithm (PIA) and prove its convergence via Backward Stochastic Differential Equations (BSDEs). Finally, we present numerical results through a benchmark example, suboptimal control analysis, sensitivity analysis, and comparative statics.

Keywords: Cyber risk modeling, stochastic SIS model, stochastic control, policy improvement algorithm

1 Introduction

Cybersecurity has emerged as a critical concern in the industrial, financial, insurance, and governmental sectors due to the increasing digitization and interconnectedness of various systems. Currently, there is no unanimous definition of cyber risk. Some widely accepted definitions include but not limit to: Cyber risks are operational risks that may result in potential violation of confidentiality, availability, or integrity of information systems (Cebula and Young, , 2010); it is a financial risk associated with network and computer incidents and leading to the failure of information systems (Böhme and Kataria, , 2006; Böhme et al., , 2010).

In academia, the study of cyber risk has attracted the attention of researchers across many fields. For example, researchers in computer science have been aware of the importance of security in cyberspace and have made many contributions concerning cyber risk detection (Moore et al., , 2006; Garcia-Teodoro et al., , 2009; Cárdenas et al., , 2011; Liu et al., , 2016), security breach prediction (Zhan et al., , 2015; Bakdash et al., , 2018), and computer system enhancement (Jang-Jaccard and Nepal, , 2014). In control systems and automation, researchers are more interested in the design of various optimal (deterministic) controls against denial-of-service and/or false data injection types of cyber attacks to the cyber-physical systems, to name a few, see e.g., Amin et al., (2009); Pasqualetti et al., (2013); Fawzi et al., (2014); Wakaiki et al., (2019); Liu et al., (2025) and associated references. In the field of business and corporate finance, the studies focus on investigating cyber risk under the framework of enterprise risk management (Stoneburner et al., , 2002; Gordon et al., , 2003; Öğüt et al., , 2011; Paté-Cornell et al., , 2018). In addition, in insurance and actuarial science, researchers are more interested in modeling the cyber risk in terms of its frequency, severity, and dependence with a range of statistical and stochastic techniques, for example, Herath and Herath, (2011); Mukhopadhyay et al., (2013); Eling and Loperfido, (2017); Eling and Jung, (2018); Xu and Hua, (2019); Dou et al., (2020); Malavasi et al., (2022) and references therein. For a recent comprehensive cross-disciplinary review on modeling and management of cyber risk can refer to He et al., (2024), and for the review of data availability associated with cyber risk, one can refer to Cremer et al., (2022).

Traditional cyber risk models (especially in the community of insurance and actuarial science) are adept at capturing statistical properties and temporal trends of losses, yet they often fail to account for the propagation of cyber threats. By studying how cyber risks spread in a closed system, one can identify critical factors that determine the scale of contagion and the magnitude of the losses. This deeper understanding enhances the design of risk management and mitigation strategies. It is noted that the parallels between cyber risk and disease spread make epidemiology a natural framework for a similar analysis in cyber risk, given that both phenomena involve contagion, interdependent exposure, and dynamic evolution over time.

Inspired by methods from genetic epidemiology, Gil et al., (2014) introduced a statistical framework to assess the susceptibility of individual nodes in a network. Their approach treats the services running on a host as the defining risk factor for cyber threat exposure, drawing an analogy to genetic penetrance models. This conceptual borrowing allows for a structured analysis of how certain network configurations increase vulnerability. In addition, Liu et al., (2016) developed an innovative compartmental model for malware propagation by adapting epidemiological concepts to cybersecurity, where the computers are recognized as heterogeneous nodes with different protection levels in the network. The model categorizes nodes into three distinct states: weakly protected susceptible (W-nodes), strongly protected susceptible (S-nodes), and infected (I-nodes). The authors discussed the malware-free equilibrium when the “basic reproduction number” (an important metric in epidemiological models) R0<1R_{0}<1 and R01R_{0}\geq 1 respectively. These findings align with earlier work by Mishra and Pandey, (2014), who employed a susceptible-exposed-infectious-susceptible-with-vaccination model to analyze worm propagation. More recently, Fahrenwaldt et al., (2018) models the spread of cyber infections with an interacting Markov chain and claims with a marked point process, where the spread process is of a pure Poisson jump process, whereas transitions are described by the susceptible-infected-susceptible (SIS) epidemic model. And, the dependence among different nodes (i.e., firms, computers, or devices) is modeled by an undirected network. Through a simulation study, the authors demonstrate that network topology plays a crucial role in determining insurance prices and designing effective risk management strategies. Xu and Hua, (2019) employs both Markov and non-Markov processes within an SIS network model. Their Markov formulation incorporates dual infection pathways, where Poisson processes are used to capture both internal network transmission and external threats. This formulation yields valuable dynamic upper bounds for infection probabilities and stationary probability estimates. The empirical validation in the study reveals the critical influence of recovery rates on insurance premium calculations. Later, Antonio et al., (2021) extends the Markov-based SIS model by incorporating network clustering coefficients, such that the model can capture the local network clustering that inhibits epidemic spread through modified transition probabilities. Using N-intertwined mean-field approximation, they derived dynamic infection probability bounds that improve premium accuracy when applied to both synthetic and real-world networks. Hillairet and Lopez, (2021) describes the spread of a cyber attack at a global level with a susceptible-infected-recovered (SIR) model, and approximates the cumulative number of individuals in each group with a Gaussian process. Furthermore, Hillairet et al., (2022) proposes a multi-group epidemic model to assess the impact of large-scale cyber attacks on insurance portfolios, which captures the interdependencies between actors and can be calibrated with limited data, enabling efficient scenario analysis of cyber events. For other relevant studies in recent literature, the readers may refer to He et al., (2024) and references therein.

Recent studies have significantly advanced the modeling of cyber risk propagation through epidemiological approaches, in particular, using SIS and its extended models with network-dependent interactions. While these models effectively capture contagion dynamics, dependence structures, and loss distributions, they remain descriptive rather than prescriptive. To be specific, many studies focus on predicting the spread of contagions and estimating losses, but fail to incorporate active intervention or control strategies to manage or mitigate the cyber risk in real-time. Hence, a critical limitation in existing cyber risk modeling with epidemic models is the absence of formal control mechanisms to identify optimal risk management and mitigation strategies. It is noted that this gap mirrors the recent developments in biological epidemic control literature, where the later works incorporate various optimal stochastic control (such as vaccination) into (non)stochastic SI(R)S models, see, for example, Boukanjime et al., (2021); Tran and Yin, (2021); Barnett et al., (2023); Sonveaux and Winkin, (2023); Federico et al., (2024); Chen et al., (2025).

Therefore, in this paper, we address such a gap by formalizing the cyber risk management and mitigation problem as a stochastic optimal control problem under a stochastic Susceptible-Infected-Susceptible (SIS) model. The objective of this paper is not to develop a sophisticated stochastic model that can fit perfectly to any existing cyber risk and cybersecurity data, but, rather, to provide a theoretical stochastic control framework for studying the cyber risk management and mitigation problems. The contributions of our research are summarized as follows:

  • Cyber risk has emerged as a critical threat to modern enterprises, governments, and financial systems, with the attacks illustrating potential contagion across various sectors. There is a clear trend in recent literature on employing stochastic processes, especially stochastic epidemic models, to capture the various contagion mechanisms of cyber risk. However, no studies have investigated the (optimal) risk management and risk mitigation strategies under such a stochastic contagion model. We bridge the gap in the literature by formulating the contagion dynamics of the cyber risks in a closed system as a controlled stochastic SIS model. Two distinct control variables are considered: one is a proactive control mechanism governing risk management, which reduces or prevents external cyber attacks and internal contagion effects, such as firewall upgrades, isolating infected systems, and disconnecting breached servers; another is reactive control mechanism associated with risk mitigation, which refers to the set of immediate tactical actions taken to recover the affected systems, for example, remove malware, close backdoor, or reset compromised credentials. Our work is the first to integrate these dual controls into a unified stochastic optimal control framework for cyber risk, providing a rigorous mathematical foundation for dynamic decision-making in cyber risk management and mitigation. Such a dual control framework can capture, to some extent, the trade-off between risk prevention/management and risk mitigation, which is a key challenge in practice, especially when facing resource constraints.

  • We characterize the value function(the expected discounted running costs) as the unique viscosity solution to the Hamilton-Jacobi-Bellman (HJB) equation derived from our stochastic control formulation for cyber risk management and mitigation. By applying viscosity theory, we establish the existence and uniqueness of the solution of the HJB equation under mild regularity assumptions. This analytical framework ensures numerical tractability, as the uniqueness property guarantees well-posedness for iterative computational methods via the dynamic programming principle. Moreover, our approach naturally accommodates extensions to more complex settings, including jump-diffusion processes and generalized cost structures, broadening its applicability beyond the current setup.

  • For the numerical implementation, we develop a Policy Improvement Algorithm (PIA) grounded in the Bellman-Howard policy iteration framework for such an infinite-time horizon stochastic cyber risk control problem. The algorithm demonstrates superior computational efficiency compared to other alternative numerical methods, such as Markov chain approximation with value iterations. The algorithm typically converges within a few iterative steps. In addition, we rigorously establish the algorithm’s convergence (at an exponential rate) and stability under error perturbations during iteration by mapping the value function at each step to the unique solution of some (iteratively defined) infinite-horizon Backward Stochastic Differential Equations (BSDEs). Given the inherent challenge of our underlying stochastic control problem (i.e., an infinite-time horizon problem with unknown boundary conditions), these results constitute a nontrivial extension of the finite-horizon frameworks, such as that of Kerimkulov et al., (2020). We further emphasize that our algorithm and the BSDE-based convergence and stability analysis can be further applied to those stochastic control problems with a random horizon or optimal stopping problems seamlessly. Finally, while the focus of this paper is on a one-dimensional diffusion process (specifically, the stochastic SIS model), all algorithmic and theoretical results extend directly to multi-dimensional controlled diffusion processes, enabling the applications of our results to more generalized compartmental models, for example, the Susceptible-Infectious-Recovered (SIR) model, the Susceptible-Infectious-Recovered-Vaccinated (SIRV) model, etc.

The remainder of the paper is organized as follows. Section 2 presents our controlled stochastic SIS model featuring dual control variables for cyber risk management and mitigation. We reformulate the problem as an optimal stochastic control problem under a (drift-controlled) diffusion model, and derive fundamental properties of the value function. Section 3 establishes our core theoretical contribution, where we prove that the value function can be characterized as the unique viscosity solution to the associated Hamilton-Jacobi-Bellman equation. In Section 4, we propose a numerical method, namely the Policy Improvement Algorithm, together with convergence results based on the theory of infinite-time horizon BSDEs. We also provide various numerical results on the optimal cyber risk management and mitigation strategies, including comprehensive sensitivity analysis and comparative statics. The conclusion is given in Section 5. Finally, some technical proofs are collected in the Appendices for completeness.

2 Stochastic SIS model and mathematical formulation

We start with a standard Susceptible-Infected-Susceptible(SIS) model, which can be characterized by two state variables, namely the number of susceptible nodes (for example, terminal computers or servers in a network) StS_{t}, and the number of cyber-infected nodes ItI_{t}. Assume the total number of nodes in the system at time tt is NtN_{t}. Similar to the usual step in studying the classical deterministic SIS model in the epidemiological literature, we normalize the system size to one such that we replace the susceptible and infected nodes StS_{t} and ItI_{t} in the system by st=St/Nts_{t}=S_{t}/N_{t} and it=It/Nti_{t}=I_{t}/N_{t}, respectively. In addition, we further introduce the stochastic version of the SIS model, see e.g. Tran and Yin, (2021); Barnett et al., (2023), and the dynamics of the state variables are expressed as

{dst=(α+βit)stdt+γitdtσstitdWt,dit=(α+βit)stdtγitdt+σstitdWt,s0+i0=1,\begin{cases}\operatorname*{\mathrm{d\!}}s_{t}=-(\alpha+\beta i_{t})s_{t}\operatorname*{\mathrm{d\!}}t+\gamma i_{t}\operatorname*{\mathrm{d\!}}t-\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ \operatorname*{\mathrm{d\!}}i_{t}=(\alpha+\beta i_{t})s_{t}\operatorname*{\mathrm{d\!}}t-\gamma i_{t}\operatorname*{\mathrm{d\!}}t+\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ s_{0}+i_{0}=1,\end{cases} (2.1)

where we introduce white noise shocks WtW_{t}, which is a parameter perturbation of β\beta with volatility σ\sigma. Here, α\alpha is the rate at which susceptible nodes (devices/users) are infected by external cyber threats (e.g., malicious emails,denial-of-service); β\beta denotes the rate at which the infected nodes propagate malware to susceptible nodes; and γ\gamma is a baseline (unassisted) recovery rate of the compromised nodes become functional and threat-free, which may be due to the baseline cybersecurity measures like antivirus or basic IT policies. For simplicity, we assume that all the abovementioned model parameters in (2.1) are constants.

Now, we extend (2.1) to a controlled stochastic SIS model with two control variables. On one hand, we allow the above SIS model to be controlled under cyber risk management protocols through proactive protection measures (see, e.g., Barnett et al., (2023) for a similar control variable in an epidemic study). Let ηt\eta_{t} be the fraction of nodes (both susceptible and cyber-infected) at time tt in the system under protection with certain measures, which can reduce the effectiveness of both the external cyber threats to susceptible nodes and propagation from cyber-infected nodes to susceptible nodes. Note that it is possible to introduce an extra parameter ζ[0,1]\zeta\in[0,1] and replace ηt\eta_{t} by (1ζηt)(1-\zeta\eta_{t}) in (2.1), which captures the incomplete effectiveness of the protection measure. We set ζ=1\zeta=1 and ignore this parameter in our study; for a detailed discussion, one can consult, for example, Barnett et al., (2023). On the other hand, we further replace γ\gamma in (2.1) by γ+ρt\gamma+\rho_{t}, where ρt\rho_{t} is a controlled recovery rate at time tt, which can be interpreted as the enhanced recovery rate due to certain reactive interventions to the cyber-infected nodes.

We assume that the pair of cyber risk management and mitigation control actions (η,ρ)(\eta_{\cdot},\rho_{\cdot}) takes value in a nonempty set UU in [0,1]×[0,)[0,1]\times[0,\infty). Then, the dynamics of the controlled stochastic SIS model can be described by the following system of stochastic differential equations(SDEs):

{dst=αηtstdtβηt2itstdt+(γ+ρt)itdtσstitdWt,dit=αηtstdt+βηt2itstdt(γ+ρt)itdt+σstitdWt,s0+i0=1.\begin{cases}\operatorname*{\mathrm{d\!}}s_{t}=-\alpha\eta_{t}s_{t}\operatorname*{\mathrm{d\!}}t-\beta\eta_{t}^{2}i_{t}s_{t}\operatorname*{\mathrm{d\!}}t+(\gamma+\rho_{t})i_{t}\operatorname*{\mathrm{d\!}}t-\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ \operatorname*{\mathrm{d\!}}i_{t}=\alpha\eta_{t}s_{t}\operatorname*{\mathrm{d\!}}t+\beta\eta_{t}^{2}i_{t}s_{t}\operatorname*{\mathrm{d\!}}t-(\gamma+\rho_{t})i_{t}\operatorname*{\mathrm{d\!}}t+\sigma s_{t}i_{t}\operatorname*{\mathrm{d\!}}W_{t},\\ s_{0}+i_{0}=1.\end{cases} (2.2)

Note that the control η\eta is applied to both susceptible nodes and infected nodes, hence we have a quadratic form in each of the second term in (2.2).

According to a similar analysis in Theorem 2.1 of Tran and Yin, (2021), we can formulate the above controlled SIS model into a one-dimensional controlled diffusion process. To be specific, for any initial value (s0,i0)(0,1)2(s_{0},i_{0})\in(0,1)^{2} satisfying s0+i0=1s_{0}+i_{0}=1 and (ηt,ρt)(η0,ρ0)U(\eta_{t},\rho_{t})\equiv(\eta_{0},\rho_{0})\in U, the system of SDEs (2.2) has a unique strong global solution (st,it)t0(s_{t},i_{t})_{t\geq 0}, and st+it=1s_{t}+i_{t}=1 almost surely for any t0t\geq 0. Therefore, in the following, we focus on the dynamics of the fraction of cyber-infected nodes in the system,

dit=[αηt(1it)+βηt2it(1it)(γ+ρt)it]dt+σ(1it)itdWt.\operatorname*{\mathrm{d\!}}i_{t}=\Big[\alpha\eta_{t}(1-i_{t})+\beta\eta_{t}^{2}i_{t}(1-i_{t})-(\gamma+\rho_{t})i_{t}\Big]\operatorname*{\mathrm{d\!}}t+\sigma(1-i_{t})i_{t}\operatorname*{\mathrm{d\!}}W_{t}.

We shall provide a rigorous proof of the above assertion in Proposition 2.1 below. To proceed with our analysis under simplified notations, we reformulate the above cyber risk management and mitigation problem as a stochastic control problem for a general diffusion process given in (2.3) below.

Let (Ω,,𝔽={}t0,)(\Omega,\mathcal{F},\mathbb{F}=\{\mathcal{F}\}_{t\geq 0},\mathbb{P}) be a filtered probability space satisfying the usual condition, and on which a one–dimensional Brownian motion W=(Wt)t0W=(W_{t})_{t\geq 0} is defined. Let X=(Xt)t0X=(X_{t})_{t\geq 0} denote the fraction of cyber-infected nodes in the system at time tt, which is driven by the following controlled diffusion process:

dXt=b(Xt,ηt,ρt)dt+σ(Xt)dWt,X0=x(0,1),\operatorname*{\mathrm{d\!}}X_{t}=b(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t+\sigma(X_{t})\operatorname*{\mathrm{d\!}}W_{t},\qquad X_{0}=x\in(0,1), (2.3)

where b(x,η,ρ)=ηα(1x)+x[η2β(1x)(γ+ρ)]b(x,\eta,\rho)=\eta\alpha(1-x)+x[\eta^{2}\beta(1-x)-(\gamma+\rho)], σ(x)=σx(1x)\sigma(x)=\sigma x(1-x) (with abuse of notation, we use the volatility parameter σ\sigma in (2.2) as the function σ()\sigma(\cdot) here in (2.3) for notation simplicity), and α,β,γ\alpha,\beta,\gamma and σ\sigma are model parameters in the stochastic SIS system (2.2). The control processes (η.,ρ.):+×ΩU(\eta_{.},\rho_{.}):\mathbb{R}_{+}\times\Omega\to U are progressively measurable, where UU is the control action space, we may just set U=[0,1]×[0,)U=[0,1]\times[0,\infty), +\mathbb{R}_{+} denotes the set [0,)[0,\infty). Furthermore, throughout the paper, we let x\mathbb{P}_{x} and 𝔼x\mathbb{E}_{x} denote the probability measure and expectation operator when X0=xX_{0}=x, respectively.

Assumption 2.1

The mappings xb(x,η,ρ)x\mapsto b(x,\eta,\rho) and xσ(x)x\mapsto\sigma(x) are continuous in xx, and the former is uniformly in the control (η,ρ)U(\eta,\rho)\in U. There exists a constant K1>0K_{1}>0 such that for any (η,ρ)U(\eta,\rho)\in U, and all x,y(0,1)x,y\in(0,1) we have

|b(x,η,ρ)b(y,η,ρ)|+|σ(x)σ(y)|K1|xy|.|b(x,\eta,\rho)-b(y,\eta,\rho)|+|\sigma(x)-\sigma(y)|\leq K_{1}|x-y|. (2.4)

Moreover, for all (η,ρ)U(\eta,\rho)\in U and x(0,1)x\in(0,1), it also holds that

|b(x,η,ρ)|K2(1+|x|+|η|+|ρ|),|b(x,\eta,\rho)|\leq K_{2}(1+|x|+|\eta|+|\rho|),

and

|σ(x)|K2(1+|x|),|\sigma(x)|\leq K_{2}(1+|x|),

for some K2>0K_{2}>0.

Assumption 2.1 holds obviously in our stochastic SIS model. To be specific, consider any x,y(0,1)x,y\in(0,1), one has

|b(x,η,ρ)b(y,η,ρ)|\displaystyle|b(x,\eta,\rho)-b(y,\eta,\rho)| =|(η2βηα(γ+ρ))(xy)η2β(x2y2)|\displaystyle=\left|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)(x-y)-\eta^{2}\beta(x^{2}-y^{2})\right|
|(η2βηα(γ+ρ))||xy|+η2β(x+y)|xy|\displaystyle\leq\left|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right||x-y|+\eta^{2}\beta(x+y)|x-y|
(|(η2βηα(γ+ρ))|+2η2β)|xy|,\displaystyle\leq\Big(\left|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right|+2\eta^{2}\beta\Big)|x-y|,

and

|σ(x)σ(y)|=|σ(xy)σ(x2y2)|σ(1+(x+y))|xy|3σ|xy|.|\sigma(x)-\sigma(y)|=|\sigma(x-y)-\sigma(x^{2}-y^{2})|\leq\sigma\Big(1+(x+y)\Big)|x-y|\leq 3\sigma|x-y|.

Hence, by letting K1=max{(|(η2βηα(γ+ρ))|+2η2β),3σ}K_{1}=\max\left\{\Big(\left|\left(\eta^{2}\beta-\eta\alpha-(\gamma+\rho)\right)\right|+2\eta^{2}\beta\Big),3\sigma\right\}, one arrives at (2.4). Moreover, take any x(0,1)x\in(0,1) and (η,ρ)U(\eta,\rho)\in U, we have

|b(x,η,ρ)|=|ηα(1x)+x[η2β(1x)(γ+ρ)]||η[α(1x)+xβ(1x)]|+|x(γ+ρ)|(α+β)|η|+γ|x|+|ρ|K1(1+|x|+|η|+|ρ|),\begin{split}|b(x,\eta,\rho)|&=\left|\eta\alpha(1-x)+x\left[\eta^{2}\beta(1-x)-(\gamma+\rho)\right]\right|\\ &\leq\left|\eta[\alpha(1-x)+x\beta(1-x)]\right|+\left|x(\gamma+\rho)\right|\\ &\leq(\alpha+\beta)\cdot|\eta|+\gamma|x|+|\rho|\\ &\leq K_{1}\cdot(1+|x|+|\eta|+|\rho|),\end{split}

and

|σ(x)|=|σx(1x)|K2|x|,|\sigma(x)|=|\sigma x(1-x)|\leq K_{2}\cdot|x|,

for some K2>0K_{2}>0.

Now, we are ready to define the cost functional associated with our cyber risk management and mitigation problem (2.3) as

J(x;η,ρ):=𝔼x[0eδtf(Xt,ηt,ρt)dt],x(0,1),(η,ρ)U,J(x;\eta,\rho):=\mathbb{E}_{x}\left[\int_{0}^{\infty}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t\right],\quad x\in(0,1),(\eta,\rho)\in U, (2.5)

where δ>0\delta>0 is the discounting factor, f:(0,1)×[0,1]×[0,)[0,)f:(0,1)\times[0,1]\times[0,\infty)\to[0,\infty) is a running cost function. In addition, without loss of generality, we restrict our study to the set of admissible controls given below. Let 𝒰\mathcal{U} denotes all progressively measurable random processes (u,ν)={(ut,νt),t0}(u,\nu)=\{(u_{t},\nu_{t}),t\geq 0\} taking values in UU, and define the set of admissible control processes by

𝒰0:={(u,ν)𝒰:|us|+|νs|Mu,ν<a.s. uniformly ins+}.\mathcal{U}_{0}:=\left\{(u,\nu)\in\mathcal{U}:|u_{s}|+|\nu_{s}|\leq M_{u,\nu}<\infty\;\text{a.s. uniformly in}\;s\in\mathbb{R}_{+}\right\}.

Then, the value function is defined as

V(x):=inf(η,ρ)𝒰0J(x;η,ρ),x(0,1).V(x):=\inf_{(\eta,\rho)\in\mathcal{U}_{0}}J(x;\eta,\rho),\qquad x\in(0,1). (2.6)

To show the well-posedness of the controlled diffusion process (2.3), we prove the following proposition.

Proposition 2.1

For any control (η,ρ)𝒰0(\eta,\rho)\in\mathcal{U}_{0} and initial value X0=x(0,1)X_{0}=x\in(0,1), the controlled SDE (2.3) has a unique global positive solution XtX_{t} for all t+t\in\mathbb{R}_{+} such that

x(Xt(0,1),t+)=1.\mathbb{P}_{x}(X_{t}\in(0,1),\forall t\in\mathbb{R}_{+})=1.

Proof. The proof is similar to the proof of Theorem 3.1 in Gray et al., (2011). Let τ\tau be the explosion time of the XtX_{t} driven by (2.3). Since the drift and diffusion coefficients are locally Lipschitz, there exists a unique local solution XtX_{t} for t[0,τ)t\in[0,\tau). For any initial value X0=x(0,1)X_{0}=x\in(0,1), consider a sufficiently large N>0N>0 such that x(1N,11N)x\in(\frac{1}{N},1-\frac{1}{N}). Then, for each nNn\geq N, define the first exit time

τn:=inf{t[0,τ):Xt(1n,11n)},\tau_{n}:=\inf\Big\{t\in[0,\tau):X_{t}\notin\left(\frac{1}{n},1-\frac{1}{n}\right)\Big\},

with the convention that inf{}=\inf\{\emptyset\}=\infty. Obviously, τn\tau_{n} is increasing in nn, hence we let τ=limnτn\tau_{\infty}=\lim_{n\to\infty}\tau_{n}, and ττ\tau_{\infty}\leq\tau almost surely. Then, the rest is to show τ=\tau_{\infty}=\infty almost surely. We prove the statement by the method of contradiction. Assume that there exists T>0T>0 and ϵ(0,1)\epsilon\in(0,1) such that

x(τT)>ϵ.\mathbb{P}_{x}(\tau_{\infty}\leq T)>\epsilon.

Then, there is a sufficiently large n~N\tilde{n}\geq N such that x(τnT)ϵ\mathbb{P}_{x}(\tau_{n}\leq T)\geq\epsilon for all nn~n\geq\tilde{n}. We define an auxiliary function v(z):=1z+11zv(z):=\frac{1}{z}+\frac{1}{1-z} for z(0,1)z\in(0,1), then for any t[0,T]t\in[0,T] and nn~n\geq\tilde{n}, an application of the Dynkin’s formula gives,

𝔼x(v(Xtτn))=v(x)+𝔼x[0tτnη,ρv(Xs)ds],\mathbb{E}_{x}(v(X_{t\wedge\tau_{n}}))=v(x)+\mathbb{E}_{x}\left[\int_{0}^{t\wedge\tau_{n}}\mathcal{L}^{\eta,\rho}v(X_{s})\operatorname*{\mathrm{d\!}}s\right],

where η,ρ\mathcal{L}^{\eta,\rho} is the infinitesimal generator of XX for any fixed η\eta and ρ\rho given as

η,ρv(x)=\displaystyle\mathcal{L}^{\eta,\rho}v(x)= ηα(1x2+1(1x)2)(1x)+x(1x2+1(1x)2)[η2β(1x)(γ+ρ)]\displaystyle\eta\alpha\left(-\frac{1}{x^{2}}+\frac{1}{(1-x)^{2}}\right)(1-x)+x\left(-\frac{1}{x^{2}}+\frac{1}{(1-x)^{2}}\right)[\eta^{2}\beta(1-x)-(\gamma+\rho)]
+σ2x2(1x)2(1x3+1(1x)3).\displaystyle\quad+\sigma^{2}x^{2}(1-x)^{2}\left(\frac{1}{x^{3}}+\frac{1}{(1-x)^{3}}\right).

With some simple algebra, one has

η,ρv(x)ηα+η2β1x+γ+ρx+σ2(1x+11x)C(η,ρ)v(x),for x(0,1),\mathcal{L}^{\eta,\rho}v(x)\leq\frac{\eta\alpha+\eta^{2}\beta}{1-x}+\frac{\gamma+\rho}{x}+\sigma^{2}\left(\frac{1}{x}+\frac{1}{1-x}\right)\leq C(\eta,\rho)v(x),\quad\text{for }x\in(0,1),

where C(η,ρ)=max{ηα+η2β,γ+ρ}+σ2C(\eta,\rho)=\max\{\eta\alpha+\eta^{2}\beta,\gamma+\rho\}+\sigma^{2}. Take a finite modification of random process C(ηsτ,ρsτ),s0C(\eta_{s\wedge\tau},\rho_{s\wedge\tau}),s\geq 0 by defining

C~(ηsτ,ρsτ):={C(ηsτ,ρsτ),ωΩ,Mu,ν,ωΩΩ,\widetilde{C}(\eta_{s\wedge\tau},\rho_{s\wedge\tau}):=\begin{cases}C(\eta_{s\wedge\tau},\rho_{s\wedge\tau}),&\omega\in\Omega^{\prime},\\[4.30554pt] M_{u,\nu},&\omega\in\Omega\setminus\Omega^{\prime},\end{cases}

where x(C~(ηsτ,ρsτ)=C(ηsτ,ρsτ))=1\mathbb{P}_{x}\left(\widetilde{C}(\eta_{s\wedge\tau},\rho_{s\wedge\tau})=C(\eta_{s\wedge\tau},\rho_{s\wedge\tau})\right)=1 for every s0s\geq 0. Then C~=C\widetilde{C}=C almost surely, but C~\widetilde{C} is bounded by Mu,ν<M_{u,\nu}<\infty everywhere. By Tonelli’s theorem applied to the nonnegative process C~(ηsτn,ρsτn)v(Xsτn)\widetilde{C}(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\cdot v(X_{s\wedge\tau_{n}}), we have

𝔼x[v(Xtτn)]=v(x)+0t𝔼x[C~(ηsτn,ρsτn)v(Xsτn)]dsv(x)+0t𝔼x[supωΩ{C~(ηsτn,ρsτn)}v(Xsτn)]dsv(x)+Mu,ν0t𝔼x[v(Xsτn)]ds,\begin{split}\mathbb{E}_{x}\big[v(X_{t\wedge\tau_{n}})\big]&=v(x)+\int_{0}^{t}\mathbb{E}_{x}\Big[\widetilde{C}(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\cdot v(X_{s\wedge\tau_{n}})\Big]\,\operatorname*{\mathrm{d\!}}s\\ &\leq v(x)+\int_{0}^{t}\mathbb{E}_{x}\Big[\sup_{\omega\in\Omega}\left\{\widetilde{C}(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\right\}\cdot v(X_{s\wedge\tau_{n}})\Big]\,\operatorname*{\mathrm{d\!}}s\\ &\leq v(x)+M_{u,\nu}\int_{0}^{t}\mathbb{E}_{x}\big[v(X_{s\wedge\tau_{n}})\big]\,\operatorname*{\mathrm{d\!}}s,\end{split}

where the last inequality follows from the boundedness of C~\widetilde{C} on all ωΩ\omega\in\Omega by the definition of the admissible control set 𝒰0\mathcal{U}_{0}. In particular, the joint measurability of C(ηsτn,ρsτn)C(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}}) and v(Xsτn)v(X_{s\wedge\tau_{n}}) w.r.t. product σ\sigma-algebra ([0,u])u,u0\mathcal{B}([0,u])\otimes\mathcal{F}_{u},\forall u\geq 0, together with the bounds

C(ηsτn,ρsτn)γ+σ2>0,v(Xsτn)1n>0a.e.,C(\eta_{s\wedge\tau_{n}},\rho_{s\wedge\tau_{n}})\geq\gamma+\sigma^{2}>0,\quad v(X_{s\wedge\tau_{n}})\geq\frac{1}{n}>0\quad\text{a.e.},

ensure the validity of applying Tonelli’s theorem.

Then, with the help of Grönwall’s inequality, one has

v(x)eC¯T𝔼x(v(XTτn)𝔼x(𝟏{τnT}v(Xτn))=nx(τnT)nϵ,v(x)e^{\overline{C}T}\geq\mathbb{E}_{x}(v(X_{T\wedge\tau_{n}})\geq\mathbb{E}_{x}\Big(\bm{1}_{\{\tau_{n}\leq T\}}v(X_{\tau_{n}})\Big)=n\mathbb{P}_{x}(\tau_{n}\leq T)\geq n\epsilon, (2.7)

where we applied the fact that XτnX_{\tau_{n}} equals to either 1n\frac{1}{n} or 11n1-\frac{1}{n}. Then by sending nn\to\infty in (2.7), one arrives at the contradiction >v(x)eC¯T=\infty>v(x)e^{\overline{C}T}=\infty, and completes the proof.  

Assumption 2.2

There exists a constant K>0K>0, and mm\in\mathbb{N} such that for any pair of controls (η,ρ)U(\eta,\rho)\in U, we have

|f(x,η,ρ)f(y,η,ρ)|K(1+|x|m+|y|m)|xy|,|f(x,\eta,\rho)-f(y,\eta,\rho)|\leq K(1+|x|^{m}+|y|^{m})|x-y|,
|f(x,η,ρ)|K(1+|x|+|η|2+|ρ|2),|f(x,\eta,\rho)|\leq K(1+|x|+|\eta|^{2}+|\rho|^{2}),

for all x,y(0,1)x,y\in(0,1). Additionally, let δ>0\delta>0, the cost function f(x,η,ρ)f(x,\eta,\rho) is bounded by a constant CfC_{f} such that

𝔼x[0eδt|f(Xt,ηt,ρt)|dt]<.\mathbb{E}_{x}\left[\int_{0}^{\infty}e^{-\delta t}|f(X_{t},\eta_{t},\rho_{t})|\operatorname*{\mathrm{d\!}}t\right]<\infty.
Remark 2.1

The typical form of the cost function we shall investigate later in this paper can be expressed as

f(x,η,ρ):=a0+aIx+amS(1η)2+(amIamS)x(1η)2+arxρ2,f(x,\eta,\rho):=a_{0}+a_{I}x+a^{S}_{m}(1-\eta)^{2}+(a^{I}_{m}-a^{S}_{m})x(1-\eta)^{2}+a_{r}x\rho^{2}, (2.8)

where a0>0a_{0}>0 is the underlying marginal costs of running the system, aI>0a_{I}>0 is the marginal cost generated from infected nodes, amSa^{S}_{m} and amIa^{I}_{m} are the marginal costs associated with cyber risk management control for susceptible nodes and cyber-infected nodes, respectively. We shall assume that amI>amS>0a^{I}_{m}>a^{S}_{m}>0, which means that costs associated with management for infected nodes are, in general, higher than those for functional and threat-free nodes. Finally, ar>0a_{r}>0 is the marginal cost of cyber risk mitigation control. Not that Assumption 2.2 is satisfied for the class of cost functions defined in (2.8).

To proceed, we show some important properties of the value function in the following proposition. Throughout the paper, we assume that the Assumptions 2.1 and 2.2 hold true.

Proposition 2.2
  1. (i)

    If the running cost function f(x,,)f(x,\cdot,\cdot) is nondecreasing in xx for each pair of admissible controls, then V(x)V(x) is nondecreasing in xx for all x(0,1)x\in(0,1).

  2. (ii)

    For any x,y(0,1),mx,y\in(0,1),m\in\mathbb{N}, there exists a constant C>0C>0 such that

    |V(x)V(y)|C(1+xm+ym)|xy|.|V(x)-V(y)|\leq C(1+x^{m}+y^{m})|x-y|. (2.9)

Proof. See Appendix A.  

3 Hamilton-Jacobi-Bellman equation and the viscosity solution

In this section, we first state the dynamic programming equation associated with our stochastic control problem and derive (heuristically) the corresponding Hamilton-Jacobi-Bellman (HJB) equation. Let θ\theta be any 𝔽\mathbb{F}-stopping time, for any x(0,1)x\in(0,1), we have

V(x)=inf(η,ρ)𝒰0𝔼x[0θeδtf(Xt,ηt,ρt)dt+eδθV(Xθ)]V(x)=\inf_{(\eta,\rho)\in\mathcal{U}_{0}}\mathbb{E}_{x}\left[\int_{0}^{\theta}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t+e^{-\delta\theta}V(X_{\theta})\right] (3.1)

The proof of (3.1) for controlled diffusion processes follows the classical arguments in the literature, see for example, Theorem 3.1.6 of Krylov, (1980), where the main point is the continuity of the value function, which is the present case in our study.

If the value function is sufficiently smooth, then by following the standard arguments with the help of the dynamic programming principle and Itô’s formula, we obtain the following HJB equation,

inf(ρ,η)U{b(x,η,ρ)ux(x)+12σ2(x)uxx(x)δu(x)+f(x,η,ρ)}=0,x(0,1).\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)u_{x}(x)+\frac{1}{2}\sigma^{2}(x)u_{xx}(x)-\delta u(x)+f(x,\eta,\rho)\right\}=0,\quad x\in(0,1). (3.2)

Note that the heuristic arguments verifying that value function VV is a classical solution to the HJB equation (3.2) assume in prior that VV is twice continuously differentiable on (0,1)(0,1), which is not the present case, where we only have Lipchitz continuity in VV. Hence, we adopt the concept of the viscosity solution introduced in Crandall and Lions, (1983) and characterize the optimal value function VV as the unique viscosity solution to the HJB equation (3.2).

Definition 3.1

Let u:(0,1)u:(0,1)\to\mathbb{R} be a locally Lipschitz continuous function.

  1. (i)

    uu is local viscosity supersolution of (3.2) at x(0,1)x\in(0,1), if for any φC2((0,1))\varphi\in C^{2}((0,1)) with uφu\geq\varphi, whenever uφu-\varphi attains a local minimum at xx, we have

    inf(ρ,η)U{b(x,η,ρ)φx(x)+12σ2(x)φxx(x)δφ(x)+f(x,η,ρ)}0.\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)\varphi_{x}(x)+\frac{1}{2}\sigma^{2}(x)\varphi_{xx}(x)-\delta\varphi(x)+f(x,\eta,\rho)\right\}\leq 0.
  2. (ii)

    uu is a local viscosity subsolution of (3.2) at x(0,1)x\in(0,1), if for any ϕC2((0,1))\phi\in C^{2}((0,1)) with uϕu\leq\phi, whenever uϕu-\phi attains a local maximum at xx, we have

    inf(ρ,η)U{b(x,η,ρ)ϕx(x)+12σ2(x)ϕxx(x)δϕ(x)+f(x,η,ρ)}0.\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)\phi_{x}(x)+\frac{1}{2}\sigma^{2}(x)\phi_{xx}(x)-\delta\phi(x)+f(x,\eta,\rho)\right\}\geq 0.
  3. (iv)

    uu is a viscosity solution of (3.2) on (0,1)(0,1) if it is both a viscosity supersolution and subsolution of (3.2) on all x(0,1)x\in(0,1).

We first provide the result regarding the existence of the viscosity solution for the HJB equation (3.2) on (0,1)(0,1).

Proposition 3.1

The value function VV is a viscosity solution of (3.2) on (0,1)(0,1).

Proof. See Appendix B.  

To prove the uniqueness, we introduce the following alternative definition of viscosity solution (for second-order differential equations), see for example Yong and Zhou, (1999). For any function uC((0,1))u\in C((0,1)) and x(0,1)x\in(0,1), the so-called second–order superdiffiential of uu at xx is defined as

Dx2,+u(x):={(p,q)×:limsuph0u(x+h)u(x)phq2h2h20},D^{2,+}_{x}u(x):=\Bigg\{(p,q)\in\mathbb{R}\times\mathbb{R}:\lim\sup_{h\to 0}\frac{u(x+h)-u(x)-ph-\frac{q}{2}h^{2}}{h^{2}}\leq 0\Bigg\},

and the second–order subdiffiential of uu at xx is defined as

Dx2,u(x):={(p,q)×:liminfh0u(x+h)u(x)phq2h2h20}.D^{2,-}_{x}u(x):=\Bigg\{(p,q)\in\mathbb{R}\times\mathbb{R}:\lim\inf_{h\to 0}\frac{u(x+h)-u(x)-ph-\frac{q}{2}h^{2}}{h^{2}}\geq 0\Bigg\}.

In addition, we let

F(x,u(x),ux(x),uxx(x)):=inf(ρ,η)U{b(x,η,ρ)ux(x)+12σ2(x)uxx(x)δu(x)+f(x,η,ρ)},F(x,u(x),u_{x}(x),u_{xx}(x)):=\inf_{(\rho,\eta)\in U}\left\{b(x,\eta,\rho)u_{x}(x)+\frac{1}{2}\sigma^{2}(x)u_{xx}(x)-\delta u(x)+f(x,\eta,\rho)\right\},

and rewrite (3.2) as

F(x,u(x),ux(x),uxx(x))=0,for x(0,1).F(x,u(x),u_{x}(x),u_{xx}(x))=0,\quad\text{for }x\in(0,1). (3.3)
Definition 3.2

A Lipschitz continuous function u:(0,1)u:(0,1)\to\mathbb{R} is called a viscosity supersolution of (3.3) at x(0,1)x\in(0,1) if

F(x,u(x),p,q)0,F(x,u(x),p,q)\leq 0,

for all (p,q)Dx2,u(x)(p,q)\in D^{2,-}_{x}u(x).

A Lipschitz continuous function u:(0,1)u:(0,1)\to\mathbb{R} is called a viscosity subsolution of (3.3) at x(0,1)x\in(0,1) if

F(x,u(x),p,q)0,F(x,u(x),p,q)\geq 0,

for all (p,q)Dx2,+u(x)(p,q)\in D^{2,+}_{x}u(x).

If uu is both a viscosity supersolution and a viscosity subsolution of (3.3) at x(0,1)x\in(0,1), then it is a viscosity solution at xx.

Proposition 3.2

(Comparison Principle) Let the increasing and Lipschitz continuous functions uu and vv be a viscosity supersolution and a viscosity subsolution of the HJB equation (3.2) (as well as (3.3)) respectively. For any closed interval 𝒪(0,1)\mathcal{O}\subset(0,1), if uvu\geq v on 𝒪\partial\mathcal{O}, then uvu\geq v on 𝒪\mathcal{O}.

Proof. See Appendix C.  

Proposition 3.3

Let vC((0,1))v\in C((0,1)) be any increasing and Lipschtiz continuous viscosity supersolution of (3.2), then v(x)V(x)v(x)\geq V(x) for all x(0,1)x\in(0,1).

Proof. The proof follows the standard arguments of using Itô’s formula with a density argument dealing with the nonsmoothness of any viscosity supersolution vv of (3.2), see, for example, Nguyen-Ngoc and Yor, (2004); Azcue and Muler, (2005). We omit the details here.  

By Proposition 3.3, for a sufficiently large closed interval 𝒪(0,1)\mathcal{O}\subset(0,1), we can characterize the value function VV as the viscosity solution of (3.2) with the smallest value on the boundary 𝒪\partial\mathcal{O} in the class of increasing and Lipschitz continuous viscosity solutions of (3.2). Let 𝒱\mathcal{V} denote the set of all increasing and Lipschitz continuous functions that are viscosity solutions of (3.2) on 𝒪\mathcal{O}, then, we characterize the value function VV as

V={v𝒱|v(x)h(x) for all h𝒱 and x𝒪}.V=\{v\in\mathcal{V}~|~v(x)\leq h(x)\text{ for all }h\in\mathcal{V}\text{ and }x\in\partial\mathcal{O}\}. (3.4)
Proposition 3.4

The value function VV characterized in (3.4) is the unique viscosity solution of (3.2) on 𝒪\mathcal{O}.

Proof. Consider another increasing and Lipschitz continuous viscosity solution φ\varphi of (3.2) on 𝒪\mathcal{O} satisfying (3.4). On one hand, φ\varphi is a viscosity supersolution, and by Proposition 3.3, we have φV\varphi\geq V on 𝒪\mathcal{O}. On the other hand, φ\varphi is also an increasing and Lipschitz continuous viscosity subsolution of (3.2) with the fact that φ(x)V(x)\varphi(x)\leq V(x) for x𝒪x\in\partial\mathcal{O} (since V𝒱V\in\mathcal{V}), then by the comparison principle given in Proposition 3.2, we have VφV\geq\varphi on 𝒪\mathcal{O}, whenever VV is considered as a viscosity supersolution and φ\varphi as a viscosity subsolution. Therefore, we obtain φ=V\varphi=V on 𝒪\mathcal{O} and complete the proof.  

4 Numerical algorithm and examples

4.1 Policy improvement algorithm

Obtaining a closed-form solution to (3.2) is seldom feasible; hence, in this paper, we apply a policy improvement algorithm, namely Bellman–Howard policy improvement/iteration algorithm (see Algorithm 1 below), to solve the cyber risk management and mitigation problem numerically. Iterative algorithms for solving optimal control problems can trace their origins to Bellman’s pioneering work Bellman, (1955, 1966), which introduced value iteration methods for finite space-time problems and established their convergence properties. Howard, (1960) later developed the policy improvement algorithm in the context of discrete space-time Markov decision processes (MDPs).

The proof of the convergence is of paramount importance in using policy improvement algorithms. Among the earliest convergence analyses for policy iteration in MDPs is the work of Puterman and Brumelle, (1979), which employed an abstract function–space framework applicable to both discrete and continuous settings. A key insight from their study is that policy iteration can be interpreted as a form of Newton’s method, inheriting similar convergence properties, i.e., whenever initialized near the true solution, the algorithm achieves quadratic convergence. Later Puterman, (1981) provides similar results on the convergence of the policy iteration algorithm for controlled diffusion processes. Further extensions were made by Santos and Rust, (2004), who examined discrete-time problems with continuous state and control spaces. Their work generalizes the results of Puterman and Brumelle, (1979), demonstrating global convergence while retaining local quadratic convergence rate under standard conditions and superlinear convergence rate under broader assumptions. More recently, under a setting of continuous space–time controlled diffusion processes, Kerimkulov et al., (2020) established a global rate of convergence and stability of the Bellman–Howard policy iteration algorithm with the help of techniques in Backward Stochastic Differential Equations (BSDEs). Therefore, we follow the main steps in Kerimkulov et al., (2020) when proving the convergence of our policy iteration algorithm, the main theorem is given in Theorem 4.1 below. Note that the study in Kerimkulov et al., (2020) focuses on finite-horizon stochastic control problems; a brief discussion of the infinite-horizon counterpart can be found in Remark 4.3 of Kerimkulov et al., (2020), which suggests that the convergence result still holds for some sufficiently large discount factor δ>0\delta>0, although no formal proof is provided.

Without loss of generality, we consider a sufficiently large closed interval 𝒪=[x¯,x¯](0,1)\mathcal{O}=[\underline{x},\overline{x}]\subset(0,1). Further, we let the running cost function ff be in the form of (2.8),

f(x,η,ρ)=a0+aIx+amS(1η)2+(amIamS)x(1η)2+arxρ2,(x,η,ρ)𝒪×U,f(x,\eta,\rho)=a_{0}+a_{I}x+a^{S}_{m}(1-\eta)^{2}+(a^{I}_{m}-a^{S}_{m})x(1-\eta)^{2}+a_{r}x\rho^{2},\quad(x,\eta,\rho)\in\mathcal{O}\times U,

where a0a_{0} is the baseline marginal costs associated with the management of the system; aIa_{I} is the (extra) marginal running costs incurred by cyber-infected nodes in the system; amIa_{m}^{I} and amSa_{m}^{S} denote the marginal costs associated with proactive risk management for cyber-infected nodes and susceptible (functional and threat-free) nodes, respectively; ara_{r} denotes the marginal costs generated by reactive risk mitigation (intervention) to enhance the recovery rate in the system. The Policy Improvement Algorithm (PIA) for solving (3.2) on 𝒪\mathcal{O} is given in Algorithm 1.

Algorithm 1 Policy improvement algorithm
1:Initialize: Set spatial discretizations {xk}k=1N\{x_{k}\}_{k=1}^{N} with NN grid points of [x¯,x¯][\underline{x},\overline{x}]. Choose an initial guess of control u0:=(η0,ρ0)𝒰0u^{0}:=(\eta^{0},\rho^{0})\in\mathcal{U}_{0}, constant for each xkx_{k}
2:repeat
3:  Solve the Bellman–type ODE:
12σ2Dxxvn+bunDxvn+funδvn=0,x[x¯,x¯].\frac{1}{2}\sigma^{2}D_{xx}v^{n}+b^{u^{n}}D_{x}v^{n}+f^{u^{n}}-\delta v^{n}=0,\quad x\in[\underline{x},\overline{x}]. (4.1)
4:  Control update by the first-order conditions
ηn+1(x)={1(α+βx)(1x)Dxvn2[(amIamS)x+amS]if in [0,1],0if below 0,1otherwise.\eta^{n+1}(x)=\begin{cases}1-\frac{(\alpha+\beta x)(1-x)D_{x}v^{n}}{2\left[(a_{m}^{I}-a_{m}^{S})x+a^{S}_{m}\right]}&\text{if in }[0,1],\\ 0&\text{if below 0},\\ 1&\text{otherwise}.\end{cases}
ρn+1(x)={Dxvn2arxif Dxvn>0,0otherwise.\rho^{n+1}(x)=\begin{cases}\frac{D_{x}v^{n}}{2a_{r}x}&\text{if }D_{x}v^{n}>0,\\ 0&\text{otherwise}.\end{cases}
5:until The normalized L2L^{2}-norm between two nearest iterations
vn+1vn2N<ϵ.\frac{\|v^{n+1}-v^{n}\|_{2}}{\sqrt{N}}<\epsilon.
6:return: vnv^{n} and un+1u^{n+1}.

We shall remark that to solve the Bellman-type ODE (4.1) during each iteration step of Algorithm 1, we need to impose boundary conditions which are not available in our problem. Hence, we impose an approximated Dirichlet condition at the left boundary (x¯\underline{x}) and a Neumann condition at the right boundary (x¯\overline{x}) based on the conventional Monte-Carlo simulation method for the performance function (2.5) under the current control strategy (extrapolated for all x(0,1)x\in(0,1)) at each iteration step. The effectiveness of Algorithm 1 has been validated for an example of an infinite-horizon stochastic control problem with analytical solutions, as shown in Asmussen and Taksar, (1997).

For completeness, we present the convergence theorem of the Algorithm 1 in Theorem 4.1, together with the policy improvement theorem (see Theorem 4.2) and algorithm stability under perturbations to the solution of the Bellman-type ODE (4.1) (see Theorem 4.3).

Theorem 4.1

Assume Assumptions 2.12.2D.1, and D.2 hold. Let vC1(𝒪)v\in C^{1}(\mathcal{O}) be the (viscosity) solution to the HJB equation (3.2) on 𝒪\mathcal{O}, and let (vn)nC2(𝒪)(v^{n})_{n\in\mathbb{N}}\in C^{2}(\mathcal{O}) be the sequence of smooth approximations generated by Algorithm 1. Then there exists q(0,1)q\in(0,1) and the initial guess v0v^{0}, such that for all x𝒪x\in\mathcal{O} there is a constant C=C(x,δ)C=C(x,\delta) (δ\delta is a given discounting rate) satisfying

|v(x)vn(x)|2C(x,δ)qn.|v(x)-v^{n}(x)|^{2}\leq C(x,\delta)q^{n}. (4.2)

The proof of Theorem 4.1 follows a similar argument to Theorem 4.1 in Kerimkulov et al., (2020). To be specific, at the nnth iteration, the solution vnv^{n} of the Bellman-type ODE (4.1) can be characterized as the solution to a corresponding BSDE YnY^{n}. By the uniqueness property of the infinite-horizon BSDE, the equivalence between vnv^{n} and YnY^{n} is thus established. In addition, the solution to the HJB equation (3.2) can also be represented by a BSDE, which follows directly from Lemma D.5, where the key ingredient is the comparison principle for BSDEs. Finally, applying the contraction property from Lemma D.4, the desired convergence result follows. For a more detailed explanation of the underlying idea of Algorithm 1, we refer the reader to Kerimkulov et al., (2020). Due to the complexity, we postpone the detailed proof together with some preliminary assumptions and lemmas to Appendix D at the end of the paper.

Moreover, the following theorem establishes the monotone improvement property of the policy improvement algorithm.

Theorem 4.2

Let Assumptions 2.1, 2.2, D.1, and D.2 hold, and fix nn\in\mathbb{N}. Let vnv^{n} and vn+1v^{n+1} be the solutions of (4.1) at steps nn and n+1n+1 of the Algorithm 1. Then, for all x(0,1)x\in(0,1), it holds that

vn+1(x)vn(x).v^{n+1}(x)\geq v^{n}(x).

Proof. The proof follows a similar argument as the proof of Theorem 5.1 of Kerimkulov et al., (2020) together with the comparison principle of infinite-horizon BSDE given in Theorem D.3.  

To finish this subsection, we provide a stability property of the policy improvement algorithm under the perturbations to the solution of the Bellman-type ODE (4.1)(note that the perturbations come from both the fact that Eq. (4.1) is only solved approximately and our approximated boundary conditions). Hence, updating (with the first-order condition) the controls η\eta and ρ\rho at each iteration step in Algorithm 1 is essentially performed only with the approximated solution, which can cumulate errors in the iteration.

Let ε\varepsilon be a set of parameters that determines the accuracy of the solution to the ODE (4.1). Let uεnu^{n}_{\varepsilon} be the policy at iteration nn obtained from an approximate solution to the ODE (4.1), let vεnv^{n}_{\varepsilon} be the solution of

12σ2Dxxvεn(x)+buεnDxvεn+fuεn(x)δvεn=0,x[x¯,x¯],\frac{1}{2}\sigma^{2}D_{xx}v^{n}_{\varepsilon}(x)+b^{u^{n}_{\varepsilon}}D_{x}v^{n}_{\varepsilon}+f^{u^{n}_{\varepsilon}}(x)-\delta v^{n}_{\varepsilon}=0,\quad x\in[\underline{x},\overline{x}], (4.3)

with true boundary conditions. And let v~εn\tilde{v}^{n}_{\varepsilon} be the approximate solution to

12σ2Dxxvεn(x)+buεnDxvεn+fuεn(x)δvεn\displaystyle\frac{1}{2}\sigma^{2}D_{xx}v^{n}_{\varepsilon}(x)+b^{u^{n}_{\varepsilon}}D_{x}v^{n}_{\varepsilon}+f^{u^{n}_{\varepsilon}}(x)-\delta v^{n}_{\varepsilon} =0,x(x¯,x¯),\displaystyle=0,\quad x\in(\underline{x},\overline{x}),
vεn(x¯)\displaystyle v^{n}_{\varepsilon}(\underline{x}) =m1uεn(x¯),\displaystyle=m^{u^{n}_{\varepsilon}}_{1}(\underline{x}),
Dxvεn(x¯)\displaystyle D_{x}v^{n}_{\varepsilon}(\overline{x}) =m2uεn(x¯),\displaystyle=m^{u^{n}_{\varepsilon}}_{2}(\underline{x}),

where m1uεn(x¯)m^{u^{n}_{\varepsilon}}_{1}(\underline{x}) and m2uεn(x¯)m^{u^{n}_{\varepsilon}}_{2}(\underline{x}) denote the approximate values of J(x¯;uεn)J(\underline{x};u^{n}_{\varepsilon}) and the left derivative of J(;uεn)J(\cdot;u^{n}_{\varepsilon}) at x¯\overline{x} respectively. Then, the policy function (see the definition of this function in Assumption D.1) for the next iteration step is given by

uεn+1=u(x,(σDxv~εn)(x))=argminuU{(buDxv~εn)(x)+fu(x)}.u^{n+1}_{\varepsilon}=u\bigl(x,(\sigma D_{x}\widetilde{v}^{n}_{\varepsilon})(x)\bigr)=\operatorname*{arg\,min}_{u\in U}\Bigl\{(b^{u}D_{x}\widetilde{v}^{n}_{\varepsilon})(x)+f^{u}(x)\Bigr\}.
Theorem 4.3

Let Assumptions 2.1, 2.2, D.1, and D.2 hold. Let (vn)n(v^{n})_{n\in\mathbb{N}} be the approximation sequence given by Algorithm 1. Let (vεn)n(v^{n}_{\varepsilon})_{n\in\mathbb{N}} be the approximation sequence given by (4.3). Let uu^{\ast} and Xx,uX^{x,u^{\ast}} be the optimal control process for (3.2) and the associated controlled diffusion started from x(0,1)x\in(0,1), respectively. Assume that Dxv~εnD_{x}\tilde{v}^{n}_{\varepsilon} is uniformly bounded. Then there exist q(0,1)q\in(0,1) and 0<γ<δ0<\gamma<\delta, such that for all x(0,1)x\in(0,1), there exists a constant C=C(x,δ)C=C(x,\delta) with

|vn(x)vεn(x)|2C(x,δ)qn+2k=1nqk(σ(DxvεkDxv~εk))(Xx,u)𝕃^γδ2,|v^{n}(x)-v_{\varepsilon}^{n}(x)|^{2}\leq C(x,\delta)q^{n}+2\sum_{k=1}^{n}q^{k}\left\|\Bigl(\sigma(D_{x}v_{\varepsilon}^{k}-D_{x}\tilde{v}_{\varepsilon}^{k})\Bigr)(X^{x,u^{\ast}})\right\|_{\widehat{\mathbb{L}}_{\gamma-\delta}^{2}},

where 𝕃^γδ2{\widehat{\mathbb{L}}_{\gamma-\delta}^{2}} is the 𝕃γδ2\mathbb{L}^{2}_{\gamma-\delta}-norm under probability measure ^\widehat{\mathbb{P}}, see Appendix D.

Proof. The proof follows the same idea as Theorem 6.1 of Kerimkulov et al., (2020), except that Lemma D.4 is used in its place. We omit the details here.  

4.2 Benchmark example

The following Example 4.1 presents a benchmark example in which the Algorithm 1 is applied to numerically solve the cyber risk management and mitigation problem. It is important to note that the parameter amSa^{S}_{m} shall be small (which is not unreasonable) to ensure the convergence of the PIA. Additionally, the initial guess for η\eta is set to zero. These choices are made to help obtain a smooth solution for the severely stiff ODE. Moreover, a combination of a Dirichlet condition at the left boundary and a Neumann condition at the right boundary of the computing interval {x¯,x¯}\{\underline{x},\overline{x}\} will be used to solve the stiff ODE (4.1). Those are simulated by the conventional Monte-Carlo method.

Example 4.1

We first provide a benchmark example for the cyber risk control problem (2.6). Let a0=0.5,aI=5,amI=2.5,amS=0.5,ar=5a_{0}=0.5,a_{I}=5,a_{m}^{I}=2.5,a_{m}^{S}=0.5,a_{r}=5. Furthermore, we set the discount factor δ=0.05\delta=0.05. Additionally, let the external cyber attacks rate be α=0.5\alpha=0.5, internal contagion rate β=0.5\beta=0.5, (unassisted) recovery rate γ=0.15\gamma=0.15, and the diffusion coefficient σ=0.3\sigma=0.3. Finally, we set x¯=0.01\underline{x}=0.01, x¯=0.99\overline{x}=0.99, N=1000N=1000 and ϵ=104\epsilon=10^{-4}.

Refer to caption
(a) Value function
Refer to caption
(b) Optimal control strategies
Refer to caption
(c) Convergence of the errors in the PIA.
Figure 4.1: Results of benchmark example 4.1

The results of the optimal strategy and value function are given in Figures 1(b) and 1(a), respectively; we also provide the convergence of the errors (in terms of normalized L2L^{2}-norm of the difference between two successive value function), which shows the computational efficiency of the algorithm, see Figure 1(c). The algorithm converges with an error less than 10410^{-4} within eight steps starting from the initial guess η0=0\eta^{0}=0 and ρ0=0\rho^{0}=0. To better understand the behavior of the optimal control (η,ρ)(\eta^{*},\rho^{*}), we observe that η\eta^{*} remains at zero (strong proactive control) when the ratio of cyber-infected nodes is considerably small, while ρ\rho^{*} decays fast from its maximum. This pattern indicates that when the current system has a small ratio of cyber-infected nodes, applying the risk mitigation control to enhance the recovery rate is less effective than implementing proactive management control to prevent the internal contagions and external cyber attacks. Consequently, the optimal strategy prioritizes risk management over risk mitigation at the outset.

Furthermore, Figure 1(b) shows that both the optimal controls η\eta^{*} (cyber risk management) and ρ\rho^{*} (cyber risk mitigation) decrease as the ratio of cyber-infected nodes xx increases. In particular, we have the following observations:

  • When the cyber-infected ratio is low (i.e., x0x\approx 0), both controls are set high, which reflects a strong incentive for the decision maker to invest in cyber risk prevention and mitigation in an early stage.

  • As the cyber-infected ratio rises (x1x\uparrow 1), both controls decline. This suggests that once the system is already heavily compromised, additional investments in prevention or mitigation have diminishing impact on reducing risk, especially for the proactive risk management control η\eta. Hence, the decision maker may want to prioritize resource allocation toward risk mitigation strategies.

The above observations align with the intuition of cyber risk control under limited resources: it is most effective to intervene early, when cyber-infected ratios are still small; while intervention becomes less valuable (and less cost-effective) when the system is already in a state of widespread infection.

4.3 Suboptimal control analysis

Example 4.2

In this example, we provide numerical analysis when we fix either the risk management control at zero (η()1\eta(\cdot)\equiv 1) or the risk mitigation control at zero (ρ()0\rho(\cdot)\equiv 0). The resulting optimal strategies (of the single control) and value functions are given in Figure 4.2.

Refer to caption
(a) Optimal η\eta^{*} with ρ0\rho\equiv 0.
Refer to caption
(b) Optimal ρ\rho^{*} with η1\eta\equiv 1.
Refer to caption
(c) Value function when only risk management control is present.
Refer to caption
(d) Value function when only risk mitigation control is present.
Figure 4.2: Optimal strategy and value function for suboptimal controls

When removing the reactive cyber risk mitigation control (i.e., ρ0\rho\equiv 0), the optimal proactive risk management control η\eta^{*} becomes noticeably stronger (see Figure 2(a)) compared to the benchmark example, which compensates for the absence of reactive mitigation controls. The value function remains nearly unchanged at small xx, but rises substantially (reaching about 5555 versus the benchmark level of 2828) when the cyber-infected ratio is close to one, see Figure 2(c). This suggests that as cyber-infection level increases, the marginal effectiveness of risk management control drops off, and proactive risk management alone cannot effectively substitute for reactive mitigation control in a highly infected system.

On the other hand, when we remove the risk management control (i.e., η1\eta\equiv 1), the optimal reactive control ρ\rho^{*} decreases relative to the benchmark (Figure 2(b)), reflecting the limited effectiveness of mitigation when proactive control from a risk management perspective is unavailable. In this case, the value function rises overall (Figure 2(d)), shifting from a range between 20 to 30 under the benchmark example to a range between 75 to 80. This shows that relying solely on reactive mitigation strategies results in higher expected costs, confirming that mitigation control cannot fully substitute for proactive management.

4.4 Sensitivity analysis

In this subsection, we numerically analyze the distinct roles of proactive control η\eta (risk management) and reactive control ρ\rho (risk mitigation) in shaping the optimal value function. In particular, we deviate by a series of small changes (uniformly in xx) for η\eta and ρ\rho, respectively, from the optimal strategy (η,ρ)(\eta^{*},\rho^{*}) in the benchmark example. The results are plotted in Figure 3(a)3(d).

Refer to caption
(a) Value functions for positive perturbations
on η\eta^{*}.
Refer to caption
(b) Value functions for negative perturbations
on η\eta^{*}.
Refer to caption
(c) Value functions for positive perturbations
on ρ\rho^{*}.
Refer to caption
(d) Value functions for negative perturbations
on ρ\rho^{*}.
Figure 4.3: Sensitivity analysis on small deviations from the optimal control.

We conclude this subsection with the following observations:

  • Proactive risk management control (η\eta): According to Figure 3(a), increasing η\eta produces a substantial and nearly uniform increase in value function (higher expected discounted costs) across all initial states of cyber-infected level, confirming the consistent effectiveness of proactive measures when managing cyber risks in the system. Conversely, decreasing η\eta has limited effects on the magnitude of the value function (see Figure 3(b)). Note that, since η(x)=0\eta^{*}(x)=0 for small values of xx, the decrease of η\eta cannot be applied in this case; hence, we do not observe any changes in the corresponding value function. But, when the cyber-infected ratio is high, the inability to sustain strong proactive control leads to a noticeable increase in the value function (but not comparable to the corresponding scenario when increasing η\eta).

  • Reactive risk mitigation control (ρ\rho): Both Figures 3(c) and 3(d) show that perturbations on the value of ρ\rho^{*} exert their strongest influence when the system is in a status with a high cyber-infected ratio. Such an observation indicates that the reactive mitigation control is most valuable once the system contains a large number of infected nodes. In addition, unlike the risk management control, we can observe an obvious non-uniform change in the value function with respect to xx. In particular, when we add positive perturbations to ρ\rho^{*} (i.e., adding redundant mitigation controls), the increase in the value function (i.e., expected discounted costs) is moderate and consistent. However, adding negative perturbations to ρ\rho^{*}, which refers to insufficient mitigation controls, can distort the shape of the value function and sharply increase expected discounted costs. One may also conjecture (see Figure 3(d)) that there exists a critical “threshold level” for cyber risk mitigation control, such that insufficient reactive mitigation control below the “level” can cause tremendous losses. We leave this interesting observation for future research.

  • Overall conclusion: The sensitivity analysis together with the suboptimal control analysis in Example 4.2 highlight an “asymmetry” between the two types of control strategies. Proactive risk management provides consistent and broad benefits, and can partially substitute for the absence of mitigation controls. By contrast, reactive mitigation is valuable only when the system is heavily compromised with a high ratio of cyber-infected nodes, and cannot substitute for missing proactive risk management control. In practice, this implies that effective cyber risk control strategies require front-loaded investment in proactive defense, with reactive mitigation serving as a complementary safeguard against severe system breakdown rather than a stand-alone strategy.

Remark 4.1

One may notice that each “value function” obtained (by solving the corresponding ODEs under perturbed control) in the above sensitivity analysis (Figure 3(a)3(d)) are not necessarily the objective function JJ given in (2.5) under perturbed control.

In this remark, we assume the solution to the Bellman ODE is twice continuously differentiable. While the numerical solution is not necessarily twice continuously differentiable, it approximates a C2((0,1))C^{2}((0,1)) solution under standard regularity assumptions and sufficient discretization accuracy. If we fixed the control by each perturbation above, such as η±Δη\eta\pm\Delta\eta and ρ±Δρ\rho\pm\Delta\rho, then the control space UU is reduced to a singleton. As the consequence, one could apply the Itô’s formula for the discounted process eδtu(Xtx)e^{-\delta t}u(X_{t}^{x}) between 0 and Tτn{T\wedge\tau_{n}} with a sequence of stopping times τn:=inf{t0:0TτneδtDxu(Xtx)σ(Xtx)dWtn}\tau_{n}:=\inf\{t\geq 0:\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}D_{x}u(X_{t}^{x})\sigma(X_{t}^{x})\operatorname*{\mathrm{d\!}}W_{t}\geq n\},

𝔼[eδTτnu(XTτnx)]=u(x)+𝔼[0Tτneδt[b(Xtx)Dxu(Xtx)+12σ2(Xtx)Dxxu(Xtx)δu(Xtx)]dt]+𝔼[0TτneδtDxu(Xtx)σ(Xtx)dWt]=u(x)+𝔼[0Tτneδt[b(Xtx)Dxu(Xtx)+12σ2(Xtx)Dxxu(Xtx)δu(Xtx)]dt],\begin{split}&\mathbb{E}\left[e^{-\delta{T\wedge\tau_{n}}}u\left(X_{T\wedge\tau_{n}}^{x}\right)\right]\\ &=u(x)+\mathbb{E}\left[\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}[b(X_{t}^{x})D_{x}u(X_{t}^{x})+\frac{1}{2}\sigma^{2}(X_{t}^{x})D_{xx}u(X_{t}^{x})-\delta u(X_{t}^{x})]\operatorname*{\mathrm{d\!}}t\right]\\ &\hskip 18.49988pt+\mathbb{E}\left[\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}D_{x}u(X_{t}^{x})\sigma(X_{t}^{x})\operatorname*{\mathrm{d\!}}W_{t}\right]\\ &=u(x)+\mathbb{E}\left[\int_{0}^{T\wedge\tau_{n}}e^{-\delta t}[b(X_{t}^{x})D_{x}u(X_{t}^{x})+\frac{1}{2}\sigma^{2}(X_{t}^{x})D_{xx}u(X_{t}^{x})-\delta u(X_{t}^{x})]\operatorname*{\mathrm{d\!}}t\right],\end{split}

where denote Xtx:=Xt0,x,η,ρX^{x}_{t}:=X^{0,x,\eta,\rho}_{t}, b(Xtx):=bη,ρ(Xtx)b(X^{x}_{t}):=b^{\eta,\rho}(X^{x}_{t}) and note the stochastic integral is a local martingale. Sending nn to infinity, then

𝔼[eδTu(XTx)]=u(x)+𝔼[0Teδt[b(Xtx)Dxu(Xtx)+12σ2(Xtx)Dxxu(Xtx)δu(Xtx)]dt]\mathbb{E}\left[e^{-\delta{T}}u\left(X_{T}^{x}\right)\right]=u(x)+\mathbb{E}\left[\int_{0}^{T}e^{-\delta t}[b(X_{t}^{x})D_{x}u(X_{t}^{x})+\frac{1}{2}\sigma^{2}(X_{t}^{x})D_{xx}u(X_{t}^{x})-\delta u(X_{t}^{x})]\operatorname*{\mathrm{d\!}}t\right]

holds by dominated convergence theorem. By using the fact that uu is a solution to the ODE stated above and then sending TT to infinity, we observe that

u(x)=𝔼[0eδtfη,ρ(x)dt]u(x)=\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}f^{\eta,\rho}(x)\operatorname*{\mathrm{d\!}}t\right]

coincides with the definition of objective function J(x,η,ρ)J(x,\eta,\rho).

4.5 Comparative statics

In this final subsection, we perform a comparative statics analysis across all model parameters, including α\alpha, β\beta, σ\sigma, and the marginal cost parameters (aI,amI,amS,ara_{I},a_{m}^{I},a_{m}^{S},a_{r}) in the cost function given in (2.8). Note that it is not necessary to investigate the parameter γ\gamma, which contributes additively to the mitigation control ρ\rho in numerical results.

Comparative statics for α\alpha and β\beta. We first compare different values of the external cyber attack rate α\alpha and the internal contagious rate β\beta, keeping other parameters unchanged in Example 4.1. The resulting optimal strategies are given in Figures 4(a), 4(c) (for α\alpha) and 4(b), 4(d) (for β\beta), respectively.

Refer to caption
(a) Optimal η\eta^{*} for different α\alpha
Refer to caption
(b) Optimal η\eta^{*} for different β\beta
Refer to caption
(c) Optimal ρ\rho^{*} for different α\alpha
Refer to caption
(d) Optimal ρ\rho^{*} for different β\beta
Figure 4.4: Comparison analysis for α\alpha(left) and β\beta(right).

Figures 4.4 show that an increase in both α\alpha (the rate of external cyber attacks) and β\beta (the rate of internal cyber risk propagation) requires stronger risk management and mitigation controls, with the optimal proactive management η\eta^{*} moving closer to 0 and the optimal reactive mitigation ρ\rho^{*} rising. This reflects that it is optimal to simultaneously reinforce both preventive measures and reactive responses when cyber risks escalate. The impact of α\alpha is more pronounced than that of β\beta, leading to sharper adjustments in both controls. From a cyber risk perspective, this distinction is natural: a surge in external attacks compels the decision maker to rapidly escalate defensive risk management measures and amplify mitigation controls. In contrast, internal risk propagation dynamics among susceptible and infected nodes induce a more gradual–though less pronounced–reinforcement of the two controls. Such a result thus demonstrates the necessity of preferential allocation of resources to the management of external cyber attacks.

Comparative statics for σ. We compare different values of the volatility parameter σ\sigma in the stochastic system, assuming other parameters remain the same as in Example 4.1. Figure 4.5 illustrates the impact of increasing the volatility parameter σ\sigma on the optimal control (η,ρ)(\eta^{*},\rho^{*}). As σ\sigma rises from 0.10.1 to 0.50.5, 11, and 22, both the proactive management η\eta and the reactive mitigation ρ\rho weaken slightly. Intuitively, higher volatility increases uncertainty in the evolution of the cyber-infected ratio in the system, making aggressive interventions less effective. The small magnitude (compared with the cases when changing α\alpha and β\beta) of the change indicates that the control policy is primarily driven by the system’s drift dynamics, with stochastic fluctuations playing a secondary role.

Refer to caption
(a) Optimal η\eta^{*} for different σ\sigma
Refer to caption
(b) Optimal ρ\rho^{*} for different σ\sigma
Figure 4.5: Comparison analysis for σ\sigma.

Comparative statics for amIa_{m}^{I} and amSa_{m}^{S}. We further compare different values of the marginal costs associated with cyber risk management for cyber-infected nodes and susceptible nodes, respectively. We plot the resulting optimal strategies in Figure 4.6.

Refer to caption
(a) Optimal η\eta^{*} for different amIa^{I}_{m}.
Refer to caption
(b) Optimal η\eta^{*} for different amSa^{S}_{m}.
Refer to caption
(c) Optimal ρ\rho^{*} for different amIa^{I}_{m}.
Refer to caption
(d) Optimal ρ\rho^{*} for different amSa^{S}_{m}.
Figure 4.6: Comparison analysis for amIa_{m}^{I}(left) and amSa_{m}^{S}(right).

It is reasonable to observe that when the marginal costs of proactive management control associated with either cyber-infected nodes amIa_{m}^{I} or susceptible nodes amSa_{m}^{S} increase, the optimal proactive management control η\eta^{*} weakens. On the other hand, the optimal reactive mitigation control ρ\rho^{*} becomes stronger when the marginal costs of management control incurred by cyber-infected nodes (amIa_{m}^{I}) increase, since the costs associated with mitigation control become relatively cheaper. However, when the marginal costs (amSa_{m}^{S}) increase, we observe a decreasing trend (rather uniform for all levels of infection ratios) in the optimal mitigation control ρ\rho^{*}; in fact, such a counter-intuitive result is not unreasonable, since with a higher value of amSa_{m}^{S}, it becomes more expensive for risk management control when the system has a large amount of susceptible nodes; then, the decision maker may choose to reduce (moderately) the mitigation control, which essentially reduce the transition intensity from cyber-infected state to susceptible state. In addition, the optimal reactive mitigation control is more sensitive to the change of amIa_{m}^{I} when the cyber-infected ratio in the system is low, but it is more sensitive to the change of amSa_{m}^{S} when the system is heavily compromised. Furthermore, one can observe that the optimal risk management control η(x)\eta^{*}(x), as a function of the cyber-infected ratio xx, exhibits a concave shape (see e.g., Figure 6(a)), and the concavity diminishes when amIa_{m}^{I} decreases. Such a phenomenon may be rooted in the interaction (or trade-off) between the marginal costs aIa_{I} and amIa_{m}^{I}. When aIa_{I} is (sufficiently) larger than amIa_{m}^{I}, that is, the costs associated with cyber-infected nodes under management control are negligible compared to the baseline management costs of all cyber-infected nodes, then the optimal management control strategy is “neutral” to the cyber-infected ratio, hence results in a linear form. But, when amIa_{m}^{I} is sufficiently larger than aIa_{I}, the optimal management control, as a function of cyber-infected ratio, becomes a concave function. It means that, to minimize the total expected discounted costs, the decision maker might want to reduce the strength of proactive management control aggressively when the cyber-infected ratio is at a moderate level, and the tendency declines when the cyber-infected ratio approaches one (hence, a concave form). One can observe a similar result when changing the value of aIa_{I} and keeping amIa_{m}^{I} fixed, see Figures 7(a) and 7(c).

Comparative statics for aIa_{I} and ara_{r}. We change the value of marginal costs (aIa_{I}) incurred by cyber-infected nodes in the system, or the marginal costs (ara_{r}) associated with reactive risk mitigation control, keeping other parameters unchanged in Example 4.1.

Refer to caption
(a) Optimal η\eta^{*} for different aIa_{I}
Refer to caption
(b) Optimal η\eta^{*} for different ara_{r}
Refer to caption
(c) Optimal ρ\rho^{*} for different aIa_{I}
Refer to caption
(d) Optimal η\eta^{*} for different ara_{r}
Figure 4.7: Comparison analysis for aIa_{I}(left) and ara_{r}(right).

When the marginal costs aIa_{I} increase, Figures 7(a) and 7(c) show a similar (but in the opposite way) result of the optimal management control η\eta^{*} as we observed in Figures 6(a) and 6(c), where a stronger prevention measure is achieved by expanding the zero-valued plateau (representing maximal prevention) for a larger range of cyber-infected ratios. This expansion occurs in an almost uniform, additive manner, where as aIa_{I} rises, the switching threshold at which η\eta^{*} departs from zero shifts rightward by roughly the same increment. In addition, with a small value in aIa_{I} (e.g., aI=1a_{I}=1), the optimal proactive management control η(x)\eta^{*}(x), as a function of cyber-infected ratio xx, exhibits a concave property; and the concavity diminishes when aIa_{I} increases and eventually exceeds the value of amIa_{m}^{I}. However, the optimal reactive mitigation ρ\rho^{*} adjusts in a significantly different way compared with what we obtained in Figure 6(c), especially when the cyber-infected ratio (xx) is close to one. To be specific, when the system is heavily compromised, for a large value of aIa_{I} (compared to amIa_{m}^{I}), it is optimal to increase the reactive mitigation control to reduce the number of infected nodes so that the expected discounted costs can be reduced significantly.

On the other hand, Figures  7(b) and 7(d) show that increasing the mitigation cost ara_{r} from 11 to 2.52.5, 55, and 7.57.5 leads to a systematic weakening of the mitigation strategy ρ\rho^{*}, while the management strategy η\eta^{*} is strengthened by expanding its zero-valued plateau (maximal prevention) so that strong prevention is applied earlier and more widely. The movement pattern of ρ\rho^{*} is similar to that observed in Figures 7(a) and 7(c), in the sense that the adjustment is roughly uniform across the state space, but the direction is opposite: a higher value of aIa_{I} lifts ρ\rho proportionally, whereas a higher mitigation cost ara_{r} pushes ρ\rho^{*} downward. This observation highlights that when the operating costs associated with cyber-infected nodes become more costly, it is optimal to reinforce both risk management and mitigation strategies. However, when mitigation becomes expensive, the optimal strategy reallocates effort towards risk management control with a reduced reliance on expensive reactive mitigation controls.

5 Conclusion and future outlook

In this paper, we model cyber risk management and mitigation as a stochastic optimal control problem within a stochastic Susceptible-Infected-Susceptible (SIS) epidemic framework. We introduce two dynamic controls to capture real-time risk management and mitigation strategies: 1) a proactive control (η\eta) that reduces external cyber attacks and internal contagion effects; 2) a reactive control (ρ\rho) that speeds up the recovery of infected nodes. We formulate this as a dual stochastic control problem governed by a general diffusion process. Theoretically, we establish the well-posedness of the controlled SIS model under these dual controls and prove that the associated value function is the unique increasing and Lipschitz-continuous viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation derived from the control problem.

For numerical implementation, we propose a Policy Improvement Algorithm (PIA) and demonstrate its convergence using Backward Stochastic Differential Equations (BSDEs). Our convergence result extends existing finite-horizon analyses to the infinite-horizon case. Then, we present a benchmark example that illustrates the optimal risk management and mitigation strategy, along with the corresponding value function, for a given model parameter set. We further examine suboptimal performance and sensitivity by: 1) removing one control entirely in the benchmark scenario; 2) introducing small perturbations to each optimal control; 3) conducting a comprehensive comparative statics analysis across all model parameters. The sensitivity and suboptimal control analyses reveal a fundamental asymmetry between the two control strategies, where proactive risk management control demonstrates consistent system-wide benefits and exhibits partial substitutability for reactive mitigation when absent; however, reactive risk mitigation control shows value only during high-infection scenarios and cannot compensate for missing proactive measures. Furthermore, some interesting observations are drawn from the comparative statics, including: the asymmetric impact of external attack frequency versus internal contagion rates on optimal control strategies underscores a possible critical policy implication; effective cyber defense requires prioritizing resource allocation toward external threat management; the optimal control strategy, particularly proactive risk management, exhibits significantly different behavioral patterns depending on the current infection ratio. This variation stems from the interaction between the operational costs of maintaining all infected nodes in the system and the marginal costs of implementing risk management controls on these compromised nodes.

Finally, we remark that our work lays a foundation for several natural extensions in the field. One direction is incorporating jump processes to model sudden, large-scale cyber attacks or system failures, which could better capture extreme events beyond the diffusion approximation. Another extension involves regime-switching dynamics, where the network environment or external threat landscape changes over time, influencing both infection propagation and optimal control strategies. Further research may also explore multi-layered or networked SIS models with heterogeneous nodes, time delays, and partial observation, enabling more realistic and granular cyber risk management strategies. These extensions could provide a richer theoretical framework and more practical insights for robust cyber defense policies under uncertainty and complex operational conditions. We left them for future research.

Statements and Declarations

No competing interests.

Acknowledgments

Zhuo Jin and Hailiang Yang were supported by the National Natural Science Foundation of China Grant [Grant 12471452]. Ran Xu was supported by the National Natural Science Foundation of China [Grants 12201506 and 12371468].

Appendix A Proof of Proposition 2.1

(i) To prove the assertion, we first show that the cost functional JJ defined in (2.8) is non-decreasing in xx if f(x,,)f(x,\cdot,\cdot) is non-decreasing in xx for each pair of admissible controls. This can be proved using the density argument and Itô’s formula. To be specific, let’s firstly argue the Yamada & Watanabe’s comparison principle of Itô’s diffusion (see, e.g., Karatzas and Shreve, (1991)). The diffusion term σ(x)\sigma(x) holds the locally Lipschitz property, and further observe that |σ(x)σ(y)|h(|xy|)|\sigma(x)-\sigma(y)|\leq h(|x-y|) by simply taking h(x):=3σx:=axh(x):=3\sigma x:=ax. There exists a strictly decreasing sequence {an}n0\{a_{n}\}_{n\in\mathbb{N}}\downarrow 0 with a0:=1a_{0}:=1 and [an1,an]h2(u)𝑑u=n\int_{[a_{n-1},a_{n}]}h^{-2}(u)du=n for every nn\in\mathbb{N}. To see this, one explicitly has [an1,an]h2(u)𝑑u=1a2(1an1an+1)=n\int_{[a_{n-1},a_{n}]}h^{-2}(u)du=\frac{1}{a^{2}}\cdot\left(\frac{1}{a_{n}}-\frac{1}{a_{n+1}}\right)=n, which gives an=22+a2n(n+1)a_{n}=\frac{2}{2+a^{2}n(n+1)} satisfying such properties as required for each nn\in\mathbb{N}. Moreover, we would like to take a nonnegative continuous function p(x)p(x) dominated by 2/(nh2(x)02/(nh^{2}(x)0, such that 1[an1,an]p(u)du[an1,an]2/(nh2(u))du=21\leq\int_{[a_{n-1},a_{n}]}p(u)\operatorname*{\mathrm{d\!}}u\leq\int_{[a_{n-1},a_{n}]}2/(nh^{2}(u))\operatorname*{\mathrm{d\!}}u=2, so that we get a normalized function ρn(x):(an1,an)\rho_{n}(x):(a_{n-1},a_{n})\to\mathbb{R} continuous in xx taking the form of

0ρn(x):=p(x)(an1,an)p(u)duI(an1,an)(x)p(x)2nh2(x).0\leq\rho_{n}(x):=\frac{p(x)}{\int_{(a_{n-1},a_{n})}p(u)\operatorname*{\mathrm{d\!}}u}\cdot I_{(a_{n-1},a_{n})}(x)\leq p(x)\leq\frac{2}{nh^{2}(x)}.

For example, one can take p(x)=1/(nh2(x))p(x)=1/(nh^{2}(x)). Notice also that a property

0|x|ρn(x)dx0p(x)(an1,an)p(u)duI(an1,an)(x)dx=(an1,an)p(x)(an1,an)p(u)𝑑udx=1\int_{0}^{|x|}\rho_{n}(x)\operatorname*{\mathrm{d\!}}x\leq\int_{0}^{\infty}\frac{p(x)}{\int_{(a_{n-1},a_{n})}p(u)\operatorname*{\mathrm{d\!}}u}\cdot I_{(a_{n-1},a_{n})}(x)\operatorname*{\mathrm{d\!}}x=\int_{(a_{n-1},a_{n})}\frac{p(x)}{\int_{(a_{n-1},a_{n})}p(u)du}\operatorname*{\mathrm{d\!}}x=1

holds for some real number xx. Next, assign the function

ψn(x):=0|x|0yρn(u)dudy, on \psi_{n}(x):=\int_{0}^{|x|}\int_{0}^{y}\rho_{n}(u)\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y,\;\text{ on }\;\mathbb{R}

which is even and twice continuously differentiable:

|Dxψn(x)|=|ddx(0|x|0yρn(u)𝑑u𝑑y)|=0|x|ρn(u)𝑑u1|D_{x}\psi_{n}(x)|=\left|\frac{d}{dx}\left(\int_{0}^{|x|}\int_{0}^{y}\rho_{n}(u)dudy\right)\right|=\int_{0}^{|x|}\rho_{n}(u)du\leq 1

by fundamental theorem of calculus, and

limnψn(x)=0|x|limn[0,y]ρn(u)dudy=0|x|limn(an1,an)(0,y)p(u)(an1,an)p(v)𝑑vdudy=0|x|limn(an1,an)p(u)(an1,an)p(v)dvdudy=0|x|1dy=|x|,\begin{split}\lim_{n\nearrow\infty}\psi_{n}(x)&=\int_{0}^{|x|}\lim_{n\nearrow\infty}\int_{[0,y]}\rho_{n}(u)\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y\\ &=\int_{0}^{|x|}\lim_{n\nearrow\infty}\int_{(a_{n-1},a_{n})\cap(0,y)}\frac{p(u)}{\int_{(a_{n-1},a_{n})}p(v)dv}\cdot\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y\\ &=\int_{0}^{|x|}\lim_{n\nearrow\infty}\int_{(a_{n-1},a_{n})}\frac{p(u)}{\int_{(a_{n-1},a_{n})}p(v)\operatorname*{\mathrm{d\!}}v}\cdot\operatorname*{\mathrm{d\!}}u\operatorname*{\mathrm{d\!}}y\\ &=\int_{0}^{|x|}1\cdot\operatorname*{\mathrm{d\!}}y=|x|,\end{split}

where observe that {0yρn(u)du}n1\{\int_{0}^{y}\rho_{n}(u)\operatorname*{\mathrm{d\!}}u\}_{n\in\mathbb{N}}\uparrow 1 is at least non-decreasing sequence for each y0y\geq 0, hence allowing us to apply the monotone convergence theorem. Consider two random processes (namely the solution to the corresponding controlled SDE (2.3)) Xtx,η,ρX^{x,\eta,\rho}_{t} and Xty,η,ρX^{y,\eta,\rho}_{t} for different initial data xyx\leq y, each of that has continuous trajectory for every individual ωΩ\omega\in\Omega. Now, apply the Itô’s formula for the random process φn(Xsx,η,ρXsy,η,ρ):=ψn(Xsx,η,ρXsy,η,ρ)𝟏(0,)(Xsx,η,ρXsy,η,ρ)\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s}):=\psi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\mathbf{1}_{(0,\infty)}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s}):

𝔼[φn(Xtx,η,ρXty,η,ρ)]𝔼[φn(Xtx,η,ρXty,η,ρ)](xy)=𝔼[0tDxφn(Xsx,η,ρXsy,η,ρ)[b(Xsx,η,ρ)b(Xsy,η,ρ)]ds]+12𝔼[0tDxxφn(Xsx,η,ρXsy,η,ρ)[σ(Xsx,η,ρ)σ(Xsy,η,ρ)]2ds]𝔼[0tDxφn(Xsx,η,ρXsy,η,ρ)[b(Xsx,η,ρ)b(Xsy,η,ρ)]ds]+tnK𝔼[0t[Xsx,η,ρXsy,η,ρ]+ds]+tn,\begin{split}\mathbb{E}\Big[\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big]\leq&\mathbb{E}\Big[\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big]-(x-y)\\ =&\mathbb{E}\Bigg[\int_{0}^{t}D_{x}\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\left[b(X^{x,\eta,\rho}_{s})-b(X^{y,\eta,\rho}_{s})\right]\operatorname*{\mathrm{d\!}}s\Bigg]\\ +&\frac{1}{2}\mathbb{E}\Bigg[\int_{0}^{t}D_{xx}\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\left[\sigma(X^{x,\eta,\rho}_{s})-\sigma(X^{y,\eta,\rho}_{s})\right]^{2}\operatorname*{\mathrm{d\!}}s\Bigg]\\ \leq&\mathbb{E}\Bigg[\int_{0}^{t}D_{x}\varphi_{n}(X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s})\cdot\left[b(X^{x,\eta,\rho}_{s})-b(X^{y,\eta,\rho}_{s})\right]\operatorname*{\mathrm{d\!}}s\Bigg]+\frac{t}{n}\\ \leq&K\cdot\mathbb{E}\Bigg[\int_{0}^{t}[X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s}]^{+}\operatorname*{\mathrm{d\!}}s\Bigg]+\frac{t}{n},\end{split} (A.1)

where the stochastic integral vanished due to the fact 𝔼(0t|σ(Xs)|2ds)<\mathbb{E}\Big(\int_{0}^{t}|\sigma(X_{s})|^{2}\operatorname*{\mathrm{d\!}}s\Big)<\infty for each t0t\geq 0; the second inequality holds by the established property of function hh and ψn\psi_{n}; and further by the Lipschitz continuity of b()b(\cdot) to reach the last line in (A.1). By sending nn\nearrow\infty on both sides of (A.1) with Lebesgue dominated convergence theorem yields

limn𝔼[φn(Xtx,η,ρXty,η,ρ)]\displaystyle\lim_{n\nearrow\infty}\mathbb{E}\Big[\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big] =𝔼[limnφn(Xtx,η,ρXty,η,ρ)]\displaystyle=\mathbb{E}\Big[\lim_{n\nearrow\infty}\varphi_{n}(X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t})\Big]
=𝔼[Xtx,η,ρXty,η,ρ]+K𝔼[0t[Xsx,η,ρXsy,η,ρ]+ds]\displaystyle=\mathbb{E}\left[X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t}\right]^{+}\leq K\cdot\mathbb{E}\Bigg[\int_{0}^{t}[X^{x,\eta,\rho}_{s}-X^{y,\eta,\rho}_{s}]^{+}\operatorname*{\mathrm{d\!}}s\Bigg]

As a consequence, the desired comparison principle follows by Grönwall’s inequality, i.e.

𝔼[Xtx,η,ρXty,η,ρ]+=0,t0.\mathbb{E}[X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t}]^{+}=0,\;\forall t\geq 0.

Secondly, let’s assume that 𝔼[f(Xtx,η,ρ,ηt,ρt)f(Xty,η,ρ,ηt,ρt)]+>0\mathbb{E}\left[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})\right]^{+}>0. However, this immediately turns out that there exists some positive measure set Ω+Ω\Omega_{+}\subset\Omega such that

Ω+[Xtx,η,ρXty,η,ρ]+(dω)>0\int_{\Omega_{+}}\left[X^{x,\eta,\rho}_{t}-X^{y,\eta,\rho}_{t}\right]^{+}\mathbb{P}(\operatorname*{\mathrm{d\!}}\omega)>0

by the non-decreasing property of cost functional. This apparently contradicts the result we just obtained.

Thirdly, we can thus claim that the comparison principle of two objective functions: J(x,η,ρ)J(y,η,ρ)J(x,\eta,\rho)\leq J(y,\eta,\rho) if xyx\leq y under each control (η,ρ)𝒰0(\eta,\rho)\in\mathcal{U}_{0}. To show this, observe that

𝔼[0eδt[f(Xtx,η,ρ,ηt,ρt)f(Xty,η,ρ,ηt,ρt)]+dt]\displaystyle\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})]^{+}\operatorname*{\mathrm{d\!}}t\right]
2supz{x,y}𝔼[0eδt|f(Xtz,η,ρ,ηt,ρt)|dt]<\displaystyle\quad\leq 2\sup_{z\in\{x,y\}}\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}|f(X^{z,\eta,\rho}_{t},\eta_{t},\rho_{t})|\operatorname*{\mathrm{d\!}}t\right]<\infty

by Assumption 2.2. That, together with the joint measurability of process f(Xtz,η,ρ,ηt,ρt)f(X^{z,\eta,\rho}_{t},\eta_{t},\rho_{t}) on product σ\sigma-field ([0,t])t\mathcal{B}([0,t])\otimes\mathcal{F}_{t} for every t0t\geq 0 and z{x,y}z\in\{x,y\}, ensures the applicability of Fubini’s theorem. Therefore, interchange the order of the two integrals

𝔼[0eδt[f(Xtx,η,ρ,ηt,ρt)f(Xty,η,ρ,ηt,ρt)]+dt]=0eδt𝔼x[f(Xtx,η,ρ,ηt,ρt)f(Xty,η,ρ,ηt,ρt)]+dt=0.\begin{split}&\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})]^{+}\operatorname*{\mathrm{d\!}}t\right]\\ &\quad=\int_{0}^{\infty}e^{-\delta t}\mathbb{E}_{x}[f(X^{x,\eta,\rho}_{t},\eta_{t},\rho_{t})-f(X^{y,\eta,\rho}_{t},\eta_{t},\rho_{t})]^{+}\operatorname*{\mathrm{d\!}}t=0.\end{split}

Hence, the non-decreasing property of VV is a straightforward argument. To be specific, we consider 0<xy<10<x\leq y<1, and for any ε>0\varepsilon>0, let (ηε(y),ρε(y))𝒰0(\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\in\mathcal{U}_{0} be an ε\varepsilon-optimal control for X0=yX_{0}=y such that J(y;ηε(y),ρε(y))V(y)+εJ(y;\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\leq V(y)+\varepsilon. Then, since J(x;,)J(x;\cdot,\cdot) is non-decreasing in xx for any given admissible control, we have

V(x)J(x;ηε(y),ρε(y))J(y;ηε(y),ρε(y))V(y)+ε,V(x)\leq J(x;\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\leq J(y;\eta^{\varepsilon}(y),\rho^{\varepsilon}(y))\leq V(y)+\varepsilon,

and by sending ε0\varepsilon\to 0, we complete the proof.

(ii) We first show a similar result as stated in (2.9) holds for the cost functional JJ for any given admissible control. For fixed (η,ρ)𝒰0(\eta,\rho)\in\mathcal{U}_{0}, and take any x,y(0,1)x,y\in(0,1), and a time 0<T<0<T<\infty, we have

|J(x;η,ρ)J(y;η,ρ)|\displaystyle|J(x;\eta,\rho)-J(y;\eta,\rho)|
𝔼[0eδt|f(Xtx,η,ρ,ηt,ρt)f(Xty,η,ρ,ηt,ρt)|dt]\displaystyle\leq\mathbb{E}\left[\int_{0}^{\infty}e^{-\delta t}\left|f(X_{t}^{x,\eta,\rho},\eta_{t},\rho_{t})-f(X_{t}^{y,\eta,\rho},\eta_{t},\rho_{t})\right|\operatorname*{\mathrm{d\!}}t\right]
K𝔼[0Teδt(1+|Xtx,η,ρ|m+|Xty,η,ρ|m)|Xtx,η,ρXty,η,ρ|dt]+2CfTeδtdt\displaystyle\leq K\mathbb{E}\left[\int_{0}^{T}e^{-\delta t}(1+|X_{t}^{x,\eta,\rho}|^{m}+|X_{t}^{y,\eta,\rho}|^{m})|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}|\operatorname*{\mathrm{d\!}}t\right]+2C_{f}\int_{T}^{\infty}e^{-\delta t}\operatorname*{\mathrm{d\!}}t
CK,m(𝔼[0Te(m+1)δt2m(1+|Xtx,η,ρ|m+1+|Xty,η,ρ|m+1)dt])mm+1\displaystyle\leq C_{K,m}\left(\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2m}}(1+|X_{t}^{x,\eta,\rho}|^{m+1}+|X_{t}^{y,\eta,\rho}|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]\right)^{\frac{m}{m+1}}
×(𝔼[0Te(m+1)δt2(|Xtx,η,ρXty,η,ρ|m+1)dt])1m+1+2CfeδT.\displaystyle\qquad\times\left(\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2}}(|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]\right)^{\frac{1}{m+1}}+2C_{f}e^{-\delta T}. (A.2)

where Xx,η,ρX^{x,\eta,\rho}_{\cdot} denotes the controlled process under fixed control (η,ρ)𝒰0(\eta,\rho)\in\mathcal{U}_{0} with initial value xx. Next, we estimate the following two terms

I1:=𝔼[0Te(m+1)δt2m(1+|Xtx,η,ρ|m+1+|Xty,η,ρ|m+1)dt]I_{1}:=\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2m}}(1+|X_{t}^{x,\eta,\rho}|^{m+1}+|X_{t}^{y,\eta,\rho}|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]

and

I2:=𝔼[0Te(m+1)δt2(|Xtx,η,ρXty,η,ρ|m+1)dt].I_{2}:=\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2}}(|X_{t}^{x,\eta,\rho}-X_{t}^{y,\eta,\rho}|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg].

Therefore, we can apply boundedness property of the moment of the controlled SDE within [0,T][0,T] for any T<T<\infty, there exists a constant N=N(m,K)N=N(m,K) such that

I1\displaystyle I_{1} 𝔼[0Te(m+1)δt2m(1+sup0tT|Xtx,η,ρ|m+1+sup0tT|Xty,η,ρ|m+1)dt]\displaystyle\leq\mathbb{E}\Bigg[\int_{0}^{T}e^{-\frac{(m+1)\delta t}{2m}}(1+\sup_{0\leq t\leq T}|X_{t}^{x,\eta,\rho}|^{m+1}+\sup_{0\leq t\leq T}|X_{t}^{y,\eta,\rho}|^{m+1})\operatorname*{\mathrm{d\!}}t\Bigg]
(1+NeNT(1+|x|m+1)+NeNT(1+|y|m+1))\displaystyle\leq\left(1+Ne^{NT}(1+|x|^{m+1})+Ne^{NT}(1+|y|^{m+1})\right)

and

I2NeNT|xy|m+1,\displaystyle I_{2}\leq Ne^{NT}|x-y|^{m+1},

which can be seen, for example, in Corollary 2.5.12 in Krylov, (1980). Thus, by combining I1,I2I_{1},I_{2} and equation (A) and applying Minkowski’s inequality, we attain that

|J(x;η,ρ)J(y;η,ρ)|C(T,m,δ)(1+|x|m+|y|m)|xy|.|J(x;\eta,\rho)-J(y;\eta,\rho)|\leq C(T,m,\delta)(1+|x|^{m}+|y|^{m})|x-y|.

Then, without loss of generality, we consider xyx\geq y such that V(x)V(y)V(x)\geq V(y). Take an ε\varepsilon-optimal control (ηε,ρε)𝒰0(\eta^{\varepsilon},\rho^{\varepsilon})\in\mathcal{U}_{0} such that V(y)J(y;ηϵ,ρϵ)εV(y)\geq J(y;\eta^{\epsilon},\rho^{\epsilon})-\varepsilon. Thus, we obtain the following inequality

|V(x)V(y)|=V(x)V(y)J(x;ηϵ,ρϵ)J(y;ηϵ,ρϵ)+εC(1+|x|m+|y|m)|xy|+ε.|V(x)-V(y)|=V(x)-V(y)\leq J(x;\eta^{\epsilon},\rho^{\epsilon})-J(y;\eta^{\epsilon},\rho^{\epsilon})+\varepsilon\leq C(1+|x|^{m}+|y|^{m})|x-y|+\varepsilon.

The desired result will be achieved by sending ε\varepsilon to zero. Furthermore, V(x)V(x) is also continuous uniformly in x(0,1)x\in(0,1). The Lipschitz and uniform continuity will follow again due to |x|m,|y|m(0,1)|x|^{m},|y|^{m}\in(0,1).

Appendix B Proof of Proposition 3.1

(i) Viscosity subsolution. We consider any test function ϕC2((0,1))\phi\in C^{2}((0,1)) with ϕV\phi\geq V such that ϕ(x)=V(x)\phi(x)=V(x) for any given x(0,1)x\in(0,1), we just need to show that

inf(ρ,η)U{(η,ρδ)ϕ(x)+f(x,η,ρ)}0,\inf_{(\rho,\eta)\in U}\Big\{(\mathcal{L}^{\eta,\rho}-\delta)\phi(x)+f(x,\eta,\rho)\Big\}\geq 0,

where

η,ρϕ(x)=b(x,η,ρ)ϕx(x)+12σ2(x)ϕxx(x)\mathcal{L}^{\eta,\rho}\phi(x)=b(x,\eta,\rho)\phi_{x}(x)+\frac{1}{2}\sigma^{2}(x)\phi_{xx}(x)

is the infinitesimal generator of the controlled diffusion given by (2.3) under a pair of fixed control (η,ρ)U(\eta,\rho)\in U. Fix any (η,ρ)U(\eta,\rho)\in U, consider the control ηtη,ρtρ\eta_{t}\equiv\eta,\rho_{t}\equiv\rho and the corresponding controlled diffusion XX. For a sufficiently small r>0r>0 such that (xr,x+r)(0,1)(x-r,x+r)\subset(0,1), consider the stopping time τ:=inf{s>0:Xs(xr,x+r)}\tau:=\inf\{s>0:X_{s}\notin(x-r,x+r)\}. Then, by applying Itô’s formula and the dynamic programming principle, we have

0\displaystyle 0 𝔼x[eδτV(Xτ)+0τeδtf(Xt,ηt,ρt)dtV(x)]\displaystyle\leq\mathbb{E}_{x}\left[e^{-\delta\tau}V(X_{\tau})+\int_{0}^{\tau}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t-V(x)\right]
𝔼x[eδτϕ(Xτ)+0τeδtf(Xt,ηt,ρt)dtϕ(x)]\displaystyle\leq\mathbb{E}_{x}\left[e^{-\delta\tau}\phi(X_{\tau})+\int_{0}^{\tau}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t-\phi(x)\right]
=𝔼x[0τeδt(η,ρδ)ϕ(Xt)dt+0τeδtf(Xt,ηt,ρt)dt].\displaystyle=\mathbb{E}_{x}\left[\int_{0}^{\tau}e^{-\delta t}(\mathcal{L}^{\eta,\rho}-\delta)\phi(X_{t})\operatorname*{\mathrm{d\!}}t+\int_{0}^{\tau}e^{-\delta t}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}t\right]. (B.1)

Now, we assume that L(x):=(η,ρδ)ϕ(x)+f(x,η,ρ)ϵ<0L(x):=(\mathcal{L}^{\eta,\rho}-\delta)\phi(x)+f(x,\eta,\rho)\leq-\epsilon<0 for any given (η,ρ)U(\eta,\rho)\in U. By the continuity of the function LL, there exists a h>0h>0 such that L(y)<ϵ/2L(y)<-\epsilon/2 for all y(xh,x+h)y\in(x-h,x+h). Then, let τ~:=inf{s>0:Xs(xh,x+h)}\tilde{\tau}:=\inf\{s>0:X_{s}\notin(x-h,x+h)\}, we have

𝔼x[0τ~eδtL(Xt)dt]<𝔼x[0τ~eδtϵ2dt]=𝔼x[ϵ(1eδτ~)2δ]<0,\mathbb{E}_{x}\Big[\int_{0}^{\tilde{\tau}}e^{-\delta t}L(X_{t})\operatorname*{\mathrm{d\!}}t\Big]<-\mathbb{E}_{x}\Big[\int_{0}^{\tilde{\tau}}e^{-\delta t}\frac{\epsilon}{2}\operatorname*{\mathrm{d\!}}t\Big]=-\mathbb{E}_{x}\Big[\frac{\epsilon(1-e^{-\delta\tilde{\tau}})}{2\delta}\Big]<0,

which is a contradiction to (B) if we choose r=hr=h above. Hence, by the arbitrariness of the control, we must have

inf(ρ,η)U{(η,ρδ)ϕ(x)+f(x,η,ρ)}0.\inf_{(\rho,\eta)\in U}\Big\{(\mathcal{L}^{\eta,\rho}-\delta)\phi(x)+f(x,\eta,\rho)\Big\}\geq 0.

(ii) Viscosity supersolution. For any x(0,1)x\in(0,1), let φC2((0,1))\varphi\in C^{2}((0,1)) be any test function such that φV\varphi-V attains a maximum value of zero at xx, we show that

inf(ρ,η)U{(η,ρδ)φ(x)+f(x,η,ρ)}0.\inf_{(\rho,\eta)\in U}\left\{(\mathcal{L}^{\eta,\rho}-\delta)\varphi(x)+f(x,\eta,\rho)\right\}\leq 0.

We prove this by contradiction. Assume that inf(ρ,η)U{(η,ρδ)φ(x)+f(x,η,ρ)}>0\inf_{(\rho,\eta)\in U}\left\{(\mathcal{L}^{\eta,\rho}-\delta)\varphi(x)+f(x,\eta,\rho)\right\}>0, then by the continuity of the function (η,ρδ)φ()(\mathcal{L}^{\eta,\rho}-\delta)\varphi(\cdot) and f(,η,ρ)f(\cdot,\eta,\rho) uniformly in the control, there exist ϵ>0\epsilon>0 and ϖ>0\varpi>0 such that

inf(ρ,η)U{(η,ρδ)φ(y)+f(y,η,ρ)}>ϵ,for all y(xϖ,x+ϖ)(0,1).\inf_{(\rho,\eta)\in U}\left\{(\mathcal{L}^{\eta,\rho}-\delta)\varphi(y)+f(y,\eta,\rho)\right\}>\epsilon,\quad\text{for all }y\in(x-\varpi,x+\varpi)\subset(0,1).

Then, for any fixed control (ηt,ρt)t0(η,ρ)U(\eta_{t},\rho_{t})_{t\geq 0}\equiv(\eta,\rho)\in U, let XtX_{t} be the controlled process with X0=xX_{0}=x. We define the first exit time of the interval (xϖ,x+ϖ)(x-\varpi,x+\varpi) as τϖ:=inf{t>0:Xt(xϖ,x+ϖ)}\tau^{\varpi}:=\inf\{t>0:X_{t}\notin(x-\varpi,x+\varpi)\}. Then, by applying Itô’s formula, we have

φ(x)\displaystyle\varphi(x) =𝔼x[eδ(tτϖ)φ(Xtτϖ)0tτϖeδs(η,ρδ)φ(Xs)ds]\displaystyle=\mathbb{E}_{x}\Bigg[e^{-\delta(t\wedge\tau^{\varpi})}\varphi(X_{t\wedge\tau^{\varpi}})-\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}(\mathcal{L}^{\eta,\rho}-\delta)\varphi(X_{s})\operatorname*{\mathrm{d\!}}s\Bigg]
𝔼x[eδ(tτϖ)V(Xtτϖ)+0tτϖeδsf(Xt,ηt,ρt)ds]\displaystyle\leq\mathbb{E}_{x}\Bigg[e^{-\delta(t\wedge\tau^{\varpi})}V(X_{t\wedge\tau^{\varpi}})+\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}s\Bigg]
𝔼x[(0tτϖeδs(η,ρδ)φ(Xs)ds+0tτϖeδsf(Xt,ηt,ρt)ds)]\displaystyle\quad-\mathbb{E}_{x}\Bigg[\Big(\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}(\mathcal{L}^{\eta,\rho}-\delta)\varphi(X_{s})\operatorname*{\mathrm{d\!}}s+\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}s\Big)\Bigg]
𝔼x[eδ(tτϖ)V(Xtτϖ)+0tτϖeδsf(Xt,ηt,ρt)ds]ϵ𝔼x[1etτϖδ].\displaystyle\leq\mathbb{E}_{x}\Bigg[e^{-\delta(t\wedge\tau^{\varpi})}V(X_{t\wedge\tau^{\varpi}})+\int_{0}^{t\wedge\tau^{\varpi}}e^{-\delta s}f(X_{t},\eta_{t},\rho_{t})\operatorname*{\mathrm{d\!}}s\Bigg]-\epsilon\mathbb{E}_{x}\Bigg[\frac{1-e^{-t\wedge\tau^{\varpi}}}{\delta}\Bigg].

Then, by the arbitrariness of control (η,ρ)U(\eta,\rho)\in U, the dynamic programming principle and the fact that τϖ>0\tau^{\varpi}>0 for all ωΩ\omega\in\Omega, we obtain that φ(x)<V(x)\varphi(x)<V(x), which is a contradiction since φ(x)=V(x)\varphi(x)=V(x).

Appendix C Proof of Proposition 3.4

We prove the comparison principle by the usual arguments of contradiction (see e.g. Touzi, (2012), Albrecher et al., (2022)). Assume that

0<M:=supx𝒪{v(x)u(x)},0<M:=\sup_{x\in\mathcal{O}}\{v(x)-u(x)\},

and x:=argmaxx𝒪{v(x)u(x)}x^{*}:=\operatorname*{arg\,max}_{x\in\mathcal{O}}\{v(x)-u(x)\}. Since both uu and vv are Lipschitz continuous on 𝒪(0,1)\mathcal{O}\subset(0,1), there exists a constant K>0K>0 such that

|u(x)u(y)|K|xy|,|v(x)v(y)|K|xy|,for x,y𝒪.|u(x)-u(y)|\leq K|x-y|,\quad|v(x)-v(y)|\leq K|x-y|,\quad\text{for }x,y\in\mathcal{O}.

To proceed, let us consider the following set

𝒮:={(x,y)𝒪×𝒪:xy}.\mathcal{S}:=\Big\{(x,y)\in\mathcal{O}\times\mathcal{O}:x\leq y\Big\}.

For all α>0\alpha>0, define two auxiliary functions,

Ψα(x,y)\displaystyle\Psi_{\alpha}(x,y) :=α2(xy)2+2Kα2(yx)+α,\displaystyle:=\frac{\alpha}{2}(x-y)^{2}+\frac{2K}{\alpha^{2}(y-x)+\alpha},
Hα(x,y)\displaystyle H_{\alpha}(x,y) :=v(x)u(y)Ψα(x,y).\displaystyle:=v(x)-u(y)-\Psi_{\alpha}(x,y).

Let Mα:=max(x,y)𝒮Hα(x,y)M_{\alpha}:=\max_{(x,y)\in\mathcal{\mathcal{S}}}H_{\alpha}(x,y), and (xα,yα):=argmax(x,y)𝒮Hα(x,y)(x_{\alpha},y_{\alpha}):=\operatorname*{arg\,max}_{(x,y)\in\mathcal{S}}H_{\alpha}(x,y). Then, we directly have MαHα(x,x)=M2KαM_{\alpha}\geq H_{\alpha}(x^{*},x^{*})=M-\frac{2K}{\alpha}, hence

lim infαMαM>0.\liminf_{\alpha\to\infty}M_{\alpha}\geq M>0.

Note that, by using the increasing and Lipschitz continuous property of uu and vv and the fact that uvu\geq v on 𝒪\partial\mathcal{O}, we can show that there exists a sufficiently large α¯\overline{\alpha} such that for all αα¯\alpha\geq\overline{\alpha}, (xα,yα)(x_{\alpha},y_{\alpha}) is an interior point of 𝒪\mathcal{O} (see, for example, (Wang et al., , 2024, Appendix B)).

Then by the following inequality,

Hα(xα,xα)+Hα(yα,yα)2Hα(xα,yα),H_{\alpha}(x_{\alpha},x_{\alpha})+H_{\alpha}(y_{\alpha},y_{\alpha})\leq 2H_{\alpha}(x_{\alpha},y_{\alpha}),

we arrive at

α|xαyα|2|u(xα)u(yα)|+|v(xα)v(yα)|+4K(yαxα)6K|xαyα|.\alpha|x_{\alpha}-y_{\alpha}|^{2}\leq|u(x_{\alpha})-u(y_{\alpha})|+|v(x_{\alpha})-v(y_{\alpha})|+4K(y_{\alpha}-x_{\alpha})\leq 6K|x_{\alpha}-y_{\alpha}|.

Therefore, by considering a sequence αn\alpha_{n}\to\infty as nn\to\infty such that (xαn,yαn)(x^,y^)𝒪(x_{\alpha_{n}},y_{\alpha_{n}})\to(\hat{x},\hat{y})\in\mathcal{O}, we have

|xαnyαn|6Kαn|x_{\alpha_{n}}-y_{\alpha_{n}}|\leq\frac{6K}{\alpha_{n}}

which yields x^=y^\hat{x}=\hat{y}.

Then, we construct two twice continuously differentiable functions

φ(x)\displaystyle\varphi(x) =Ψα(x,yα)Ψα(xα,yα)+v(xα),\displaystyle=\Psi_{\alpha}(x,y_{\alpha})-\Psi_{\alpha}(x_{\alpha},y_{\alpha})+v(x_{\alpha}),
ψ(y)\displaystyle\psi(y) =Ψα(xα,yα)Ψα(xα,y)+u(yα),\displaystyle=\Psi_{\alpha}(x_{\alpha},y_{\alpha})-\Psi_{\alpha}(x_{\alpha},y)+u(y_{\alpha}),

which are essentially test functions for the subsolution and supersolution of (3.2) at the points xαx_{\alpha} and yαy_{\alpha}, respectively. To simplify our proof here, we first assume that v(x)v(x) and u(y)u(y) are twice continuously differentiable at xαx_{\alpha} and yαy_{\alpha}, respectively; one can resort to a more general theorem to get a similar result when u,vu,v are not twice continuously differentiable at the point (see e.g. Crandall et al., (1992)). Since HαH_{\alpha} reaches a local maximum at (xα,yα)(x_{\alpha},y_{\alpha}) which is an interior point in 𝒪\mathcal{O}, hence we have

xHα(xα,yα)=yHα(xα,yα)=0.\frac{\partial}{\partial x}H_{\alpha}(x_{\alpha},y_{\alpha})=\frac{\partial}{\partial y}H_{\alpha}(x_{\alpha},y_{\alpha})=0.

Therefore, we arrive at

vx(xα)=xΨα(xα,yα)=yΨα(xα,yα)=uy(yα).v_{x}(x_{\alpha})=\frac{\partial}{\partial x}\Psi_{\alpha}(x_{\alpha},y_{\alpha})=-\frac{\partial}{\partial y}\Psi_{\alpha}(x_{\alpha},y_{\alpha})=u_{y}(y_{\alpha}).

In addition, the Hessian matrix is negative semi-definite,

(2x2Hα(xα,yα)2xyHα(xα,yα)2yxHα(xα,yα)2y2Hα(xα,yα))0\begin{pmatrix}\frac{\partial^{2}}{\partial x^{2}}H_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial x\partial y}H_{\alpha}(x_{\alpha},y_{\alpha})\\ \frac{\partial^{2}}{\partial y\partial x}H_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial y^{2}}H_{\alpha}(x_{\alpha},y_{\alpha})\end{pmatrix}\leq 0 (C.1)

Let A=vxx(xα),B=uyy(yα)A=v_{xx}(x_{\alpha}),B=u_{yy}(y_{\alpha}), and

M=(2x2Ψα(xα,yα)2xyΨα(xα,yα)2yxΨα(xα,yα)2y2Ψα(xα,yα),)M=\begin{pmatrix}\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial x\partial y}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\\ \frac{\partial^{2}}{\partial y\partial x}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&\frac{\partial^{2}}{\partial y^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha}),\end{pmatrix}

we can rewrite (C.1) as

(A2x2Ψα(xα,yα)2xyΨα(xα,yα)2yxΨα(xα,yα)B2y2Ψα(xα,yα))=(A00B)M0.\begin{pmatrix}A-\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&-\frac{\partial^{2}}{\partial x\partial y}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\\ -\frac{\partial^{2}}{\partial y\partial x}\Psi_{\alpha}(x_{\alpha},y_{\alpha})&-B-\frac{\partial^{2}}{\partial y^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\end{pmatrix}=\begin{pmatrix}A&0\\ 0&-B\end{pmatrix}-M\leq 0.

Then, according to (Crandall et al., , 1992, Theorem 3.2), for any ε>0\varepsilon>0, there exists Aε,BεA_{\varepsilon},B_{\varepsilon}\in\mathbb{R} such that,

(Aε00Bε)M+εM2,\begin{pmatrix}A_{\varepsilon}&0\\ 0&-B_{\varepsilon}\end{pmatrix}\leq M+\varepsilon M^{2}, (C.2)

and (φx(xα),Aε)Dxα2,+v(xα)(\varphi_{x}(x_{\alpha}),A_{\varepsilon})\in D^{2,+}_{x_{\alpha}}v(x_{\alpha}) and (ψx(xα),Bε)Dyα2,u(yα)(\psi_{x}(x_{\alpha}),B_{\varepsilon})\in D^{2,-}_{y_{\alpha}}u(y_{\alpha}), hence, we have

F(xα,v(xα),φx(xα),Aε)0, and F(yα,u(xα),ψy(yα),Bε)0.F(x_{\alpha},v(x_{\alpha}),\varphi_{x}(x_{\alpha}),A_{\varepsilon})\geq 0,\quad\text{ and }\quad F(y_{\alpha},u(x_{\alpha}),\psi_{y}(y_{\alpha}),B_{\varepsilon})\leq 0. (C.3)

In addition, by noting that M=2x2Ψα(xα,yα)(1111)M=\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}, we obtain from (C.2) that

AεBε\displaystyle A_{\varepsilon}-B_{\varepsilon} =(11)(Aε00Bε)(11)(11)(M+εM2)(11)\displaystyle=\begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix}A_{\varepsilon}&0\\ 0&-B_{\varepsilon}\end{pmatrix}\begin{pmatrix}1\\ 1\end{pmatrix}\leq\begin{pmatrix}1&1\end{pmatrix}(M+\varepsilon M^{2})\begin{pmatrix}1\\ 1\end{pmatrix}
=(11)[2x2Ψα(xα,yα)(1111)+ε(2x2Ψα(xα,yα))2(1111)](11)2\displaystyle=\begin{pmatrix}1&1\end{pmatrix}\Bigg[\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}+\varepsilon\Big(\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\Big)^{2}\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}\Bigg]\begin{pmatrix}1\\ 1\end{pmatrix}^{2}
=(2x2Ψα(xα,yα)+2ε(2x2Ψα(xα,yα))2)(11)(1111)(11)=0.\displaystyle=\Big(\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})+2\varepsilon\Big(\frac{\partial^{2}}{\partial x^{2}}\Psi_{\alpha}(x_{\alpha},y_{\alpha})\Big)^{2}\Big)\begin{pmatrix}1&1\end{pmatrix}\begin{pmatrix}1&-1\\ -1&1\end{pmatrix}\begin{pmatrix}1\\ 1\end{pmatrix}=0.

Therefore, we can derive from (C.3),

0\displaystyle 0 F(xα,v(xα),φx(xα),Aε)F(yα,u(xα),ψy(yα),Bε)\displaystyle\leq F(x_{\alpha},v(x_{\alpha}),\varphi_{x}(x_{\alpha}),A_{\varepsilon})-F(y_{\alpha},u(x_{\alpha}),\psi_{y}(y_{\alpha}),B_{\varepsilon})
inf(ρ,η)U{b(xα,η,ρ)φx(xα)b(yα,η,ρ)ψy(yα)+f(xα,η,ρ)f(yα,η,ρ)}\displaystyle\leq\inf_{(\rho,\eta)\in U}\Big\{b(x_{\alpha},\eta,\rho)\varphi_{x}(x_{\alpha})-b(y_{\alpha},\eta,\rho)\psi_{y}(y_{\alpha})+f(x_{\alpha},\eta,\rho)-f(y_{\alpha},\eta,\rho)\Big\}
+12(σ2(xα)Aεσ2(yα)Bε)δ(v(xα)u(yα)).\displaystyle+\frac{1}{2}\Big(\sigma^{2}(x_{\alpha})A_{\varepsilon}-\sigma^{2}(y_{\alpha})B_{\varepsilon}\Big)-\delta(v(x_{\alpha})-u(y_{\alpha})). (C.4)

Hence, by (C), we can obtain

0<\displaystyle 0< Mlim infαMαlimnMαn=limn(v(xαn)u(yαn))\displaystyle M\leq\liminf_{\alpha\to\infty}M_{\alpha}\leq\lim_{n\to\infty}M_{\alpha_{n}}=\lim_{n\to\infty}(v(x_{\alpha_{n}})-u(y_{\alpha_{n}}))
limn1δ(inf(ρ,η)U{b(xα,η,ρ)φx(xα)b(yα,η,ρ)ψy(yα)+f(xα,η,ρ)f(yα,η,ρ)}\displaystyle\leq\lim_{n\to\infty}\frac{1}{\delta}\Bigg(\inf_{(\rho,\eta)\in U}\Big\{b(x_{\alpha},\eta,\rho)\varphi_{x}(x_{\alpha})-b(y_{\alpha},\eta,\rho)\psi_{y}(y_{\alpha})+f(x_{\alpha},\eta,\rho)-f(y_{\alpha},\eta,\rho)\Big\}
+12(σ2(xα)Aεσ2(yα)Bε))\displaystyle\qquad\qquad+\frac{1}{2}\Big(\sigma^{2}(x_{\alpha})A_{\varepsilon}-\sigma^{2}(y_{\alpha})B_{\varepsilon}\Big)\Bigg)
=inf(ρ,η)U{b(x^,η,ρ)}1δ(φx(x^)ψy(x^))+σ2(x^)2δ(AεBε)\displaystyle=\inf_{(\rho,\eta)\in U}\{b(\hat{x},\eta,\rho)\}\frac{1}{\delta}(\varphi_{x}(\hat{x})-\psi_{y}(\hat{x}))+\frac{\sigma^{2}(\hat{x})}{2\delta}(A_{\varepsilon}-B_{\varepsilon})
inf(ρ,η)U{b(x^,η,ρ)}1δ(φx(x^)ψy(x^))=0,\displaystyle\leq\inf_{(\rho,\eta)\in U}\{b(\hat{x},\eta,\rho)\}\frac{1}{\delta}(\varphi_{x}(\hat{x})-\psi_{y}(\hat{x}))=0,

which is a contradiction, then we complete the proof.

Appendix D Infinite-Horizon BSDE and Convergence of PIA

D.1 Assumptions and notations

Fix a filtered probability space (Ω,t,𝔽:=(t)0t,)(\Omega,\mathcal{F}_{t},\mathbb{F}:=(\mathcal{F}_{t})_{0\leq t\leq\infty},\mathbb{P}) and let W=(Wt)0t<W=(W_{t})_{0\leq t<\infty} be a one-dimensional Wiener process on this space. For self-containment, we introduce the following notations:

  • (i)

    We first introduce the notations of some spaces that are involved in the later analysis: For any γ>0\gamma>0, let 𝕃2,γ(0,;)\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R}) be the set of all \mathbb{R}-valued 𝔽\mathbb{F}-adapted process ϕ\phi_{\cdot} such that

    𝔼[0e2γt|ϕt|2dt]<.\mathbb{E}\Bigg[\int_{0}^{\infty}e^{2\gamma t}|\phi_{t}|^{2}\operatorname*{\mathrm{d\!}}t\Bigg]<\infty.

    Let 𝕃2,γ(Ω;C[0,];)\mathbb{L}_{\mathcal{F}}^{2,\gamma}(\Omega;C[0,\infty];\mathbb{R}) be the set of all \mathbb{R}-valued 𝔽\mathbb{F}-adapted continuous process ϕ\phi_{\cdot} such that

    𝔼[supt[0,]e2γt|ϕt|2]<.\mathbb{E}\Big[\sup_{t\in[0,\infty]}e^{2\gamma t}|\phi_{t}|^{2}\Big]<\infty.

    Finally, let 𝕃τ2,γ(Ω;)\mathbb{L}_{\mathcal{F}_{\tau}}^{2,\gamma}(\Omega;\mathbb{R}) be the set of \mathbb{R}-valued τ\mathcal{F}_{\tau}-measurable random variables ξ\xi such that 𝔼[e2γτ|ξ|2]<\mathbb{E}\Big[e^{2\gamma\tau}|\xi|^{2}\Big]<\infty, where τ\tau is any 𝔽\mathbb{F}-stopping time taking values in [0,][0,\infty].

    Then, we define the space

    𝔹γ[0,]:={𝕃2,γ(Ω;C[0,];)𝕃2,γ(0,;)}×𝕃2,γ(0,;)\mathbb{B}_{\gamma}[0,\infty]:=\left\{\mathbb{L}_{\mathcal{F}}^{2,\gamma}(\Omega;C[0,\infty];\mathbb{R})\bigcap\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R})\right\}\times\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R}) (D.1)

    with the norm

    (Y,Z)𝔹γ[0,]:=(𝔼[supt[0,]e2γt|Yt|2+0e2γt|Yt|2dt+0e2γt|Zt|2dt])1/2,\displaystyle\|(Y_{\cdot},Z_{\cdot})\|_{\mathbb{B}_{\gamma}[0,\infty]}:=\Bigg(\mathbb{E}\Bigg[\sup_{t\in[0,\infty]}e^{2\gamma t}|Y_{t}|^{2}+\int_{0}^{\infty}e^{2\gamma t}|Y_{t}|^{2}\operatorname*{\mathrm{d\!}}t+\int_{0}^{\infty}e^{2\gamma t}|Z_{t}|^{2}\operatorname*{\mathrm{d\!}}t\Bigg]\Bigg)^{1/2},

    for any (Y,Z)𝔹γ[0,](Y_{\cdot},Z_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty]. Obviously, 𝔹γ[0,]\mathbb{B}_{\gamma}[0,\infty] with the norm is a Banach space.

  • (ii)

    For a constant γ>0\gamma>0, and predictable process (ϕt)t0(\phi_{t})_{t\geq 0}, we introduce the 𝕃γ2\mathbb{L}_{\gamma}^{2}-norm:

    ϕ𝕃γ2:=(𝔼0e2γt|ϕt|2dt)1/2.\|\phi\|_{\mathbb{L}^{2}_{\gamma}}:=\left(\mathbb{E}\int_{0}^{\infty}e^{2\gamma t}|\phi_{t}|^{2}\operatorname*{\mathrm{d\!}}t\right)^{1/2}.
  • (iii)

    For adapted process ϕ\phi_{\cdot} such that 𝔼[0|ϕt|2dt]<\mathbb{E}\left[\int_{0}^{\infty}|\phi_{t}|^{2}\operatorname*{\mathrm{d\!}}t\right]<\infty, we define

    (ϕW)t:=0tϕsdWs,fort0.(\phi\bullet W)_{t}:=\int_{0}^{t}\phi_{s}\operatorname*{\mathrm{d\!}}W_{s},\quad\text{for}\;t\geq 0.
  • (iv)

    For any continuous local martingale MM let (Mt)t0(\langle M\rangle_{t})_{t\geq 0} denote the quadratic variation process and let

    (M)t:=exp{Mt12Mt},fort0\mathcal{E}(M)_{t}:=\exp\left\{M_{t}-\frac{1}{2}\langle M\rangle_{t}\right\},\quad\text{for}\;t\geq 0

    denotes the Doléans-Dade exponential of MtM_{t} given the initial data M0=0M_{0}=0.

Assumption D.1

For each fixed (x,z)(0,1)×(x,z)\in(0,1)\times\mathbb{R}, we assume the function

u(x,z):=argminuU{(buσ1)(x)z+fu(x)},u(x,z):=\operatorname*{arg\,min}_{u\in U}\left\{\big(b^{u}\sigma^{-1}\big)(x)z+f^{u}(x)\right\},

is measurable.

For notation simplicity, we write u(x,z)=(η,ρ)(x,z)Uu(x,z)=(\eta,\rho)(x,z)\in U, where U=[0,1]×[0,)U=[0,1]\times[0,\infty) is the action space of our cyber risk control problem (2.5), and bu(x)=b(x,η,ρ)b^{u}(x)=b(x,\eta,\rho), and fu(x)=f(x,η,ρ)f^{u}(x)=f(x,\eta,\rho), which are the drift term and cost function in (3.2), σ1(x)=1/σ(x)\sigma^{-1}(x)=1/\sigma(x). One can refer to Kerimkulov et al., (2020) for a short discussion on the validity of the measurability.

Assumption D.2

There are constants K,θ0K,\theta\geq 0 such that the following hold:

  1. (i)

    For x(0,1)x\in(0,1), u,uUu,u^{\prime}\in U,

    |bu(x)bu(x)|θ|uu|,|b^{u}(x)-b^{u^{\prime}}(x)|\leq\sqrt{\theta}|u-u^{\prime}|,

    and for all x(0,1)x\in(0,1), uUu\in U, we have

    |(buσ1)(x)|<K.|\big(b^{u}\sigma^{-1}\big)(x)|<K.
  2. (ii)

    For all x,x(0,1),z,zx,x^{\prime}\in(0,1),z,z^{\prime}\in\mathbb{R}, and uUu\in U we have that

    |u(x,z)u(x,z)|\displaystyle|u(x,z)-u(x^{\prime},z)| K|xx|,\displaystyle\leq K|x-x^{\prime}|,
    |u(x,z)u(x,z)|\displaystyle|u(x,z)-u(x,z^{\prime})| θ|zz|.\displaystyle\leq\sqrt{\theta}|z-z^{\prime}|.
  3. (iii)

    For x(0,1)x\in(0,1), u,uUu,u^{\prime}\in U,

    |fu(x)fu(x)|θ|uu|.|f^{u}(x)-f^{u^{\prime}}(x)|\leq\sqrt{\theta}|u-u^{\prime}|.

D.2 Some preliminary lemmas

The following lemma is a straightforward result of Girsanov’s theorem under the random process ((buσ1)(Xs))s0\Big(\big(b^{u}\sigma^{-1}\big)(X_{s})\Big)_{s\geq 0}, which is bounded (according to Assumption D.2). This result will be helpful in the construction of a contraction mapping when proving Theorem 4.1 under a new measure ^\widehat{\mathbb{P}} on the probability space.

Lemma D.1

Let 0t0\leq t\leq\infty, x(0,1)x\in(0,1), and X:=Xx,uX:=X^{x,u^{*}} be the unique solution to the SDE (2.3), started from time 0 with initial data X0=xX_{0}=x, and controlled by the optimal control process u=(ut)t0u^{*}=(u^{*}_{t})_{t\geq 0} (with a abuse of notation, we use u=(η.,ρ.)𝒰0u=(\eta_{.},\rho_{.})\in\mathcal{U}_{0} to denote the control process as well). Then, d^:=((buσ1(X)W)d\operatorname*{\mathrm{d\!}}\widehat{\mathbb{P}}:=\mathcal{E}\Big(\big(b^{u^{*}}\sigma^{-1}\big(X)\bullet W\Big)_{\infty}\operatorname*{\mathrm{d\!}}\mathbb{P} is equivalent to the probability measure \mathbb{P}, and the process

W^t:=Wt+0tbus(Xs)σ1(Xs)ds\widehat{W}_{t}:=W_{t}+\int_{0}^{t}b^{u^{*}_{s}}(X_{s})\sigma^{-1}(X_{s})\operatorname*{\mathrm{d\!}}s

is a ^\widehat{\mathbb{P}}-Wiener process.

Proof. See Theorem 6.8.8 of Krylov, (2002).  

Assumption D.3

For all (z,Z)2(z,Z)\in\mathbb{R}^{2}, Fs(z,Z)F_{s}(z,Z) for s[0,]s\in[0,\infty] is progressively measurable. And, there exist constants L1,L2,γ>0L_{1},L_{2},\gamma>0 such that F(0,0)𝕃2,γ(0,;)F_{\cdot}(0,0)\in\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R}), and for any z,Z,z¯,Z¯z,Z,\bar{z},\bar{Z}\in\mathbb{R}, s[0,)s\in[0,\infty),

|Fs(z,Z)Fs(z¯,Z¯)|L1|zz¯|+L2|ZZ¯|,-a.s.,|F_{s}(z,Z)-F_{s}(\bar{z},\bar{Z})|\leq L_{1}|z-\bar{z}|+L_{2}|Z-\bar{Z}|,\quad\mathbb{P}\text{-}a.s., (D.2)

and

γ>L12+L222.\gamma>\frac{L^{2}_{1}+L_{2}^{2}}{2}. (D.3)

The following lemma states the unique solution of an infinite-horizon BSDE associated with our infinite-horizon stochastic control problem.

Lemma D.2

Let Assumption D.3 hold. For any given z𝕃2,γ(0,;)z_{\cdot}\in\mathbb{L}_{\mathcal{F}}^{2,\gamma}(0,\infty;\mathbb{R}), the following BSDE

Yt=tFs(zs,Zs)dstZsdWs,t[0,),Y_{t}=\int_{t}^{\infty}F_{s}(z_{s},Z_{s})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}Z_{s}\operatorname*{\mathrm{d\!}}W_{s},\quad t\in[0,\infty), (D.4)

admits a unique solution (Y,Z)𝔹γ[0,](Y_{\cdot},Z_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty]. Further, there exists a constant C>0C>0 such that if (Y¯,Z¯)𝔹γ[0,](\overline{Y}_{\cdot},\overline{Z}_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty] is the solution of (D.4) with FF replaced by F¯\overline{F}, where F¯\overline{F} satisfies Assumption D.3, then

(Y,Z)(Y¯,Z¯)𝔹γ[0,]\displaystyle\|(Y,Z)-(\overline{Y},\overline{Z})\|_{\mathbb{B}_{\gamma}[0,\infty]}
\displaystyle\leq C(0e2γs|Fs(zs,Zs)|F¯s(zs,Zs)|2ds)1/2\displaystyle C\Bigg(\int_{0}^{\infty}e^{2\gamma s}|F_{s}(z_{s},Z_{s})|-\overline{F}_{s}(z_{s},Z_{s})|^{2}\operatorname*{\mathrm{d\!}}s\Bigg)^{1/2} (D.5)

Proof. See, for example,(Yong and Zhou, , 1999, Theorem 7.3.6).  

Moreover, the comparison principle for infinite-horizon BSDE was also well established, see, e.g. (Hamadène et al., , 1999, Theorem 2.2 ).

Lemma D.3

Let (Yi,Zi)𝔹γ[0,](Y^{i},Z^{i})\in\mathbb{B}_{\gamma}[0,\infty] be the solution to the following BSDE

Yti=ξi+tFsi(Zsi)dstZsidWs,t[0,],i=1,2,Y^{i}_{t}=\xi^{i}+\int_{t}^{\infty}F^{i}_{s}(Z_{s}^{i})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}Z^{i}_{s}\operatorname*{\mathrm{d\!}}W_{s},~t\in[0,\infty],~i=1,2,

where ξi𝕃2,γ(Ω;)\xi^{i}\in\mathbb{L}_{\mathcal{F}_{\infty}}^{2,\gamma}(\Omega;\mathbb{R}) and FiF^{i} satisfy the Assumption D.3 (with zz removed) for i=1,2i=1,2, respectively. If further ξ1ξ2a.s.\xi^{1}\geq\xi^{2}~a.s., and Fs1(Zs2)Fs2(Zs2)a.s.F^{1}_{s}(Z^{2}_{s})\geq F^{2}_{s}(Z^{2}_{s})~a.s. for all s0s\geq 0, then Yt1Yt2-a.s.Y^{1}_{t}\geq Y^{2}_{t}~\mathbb{P}\text{-}a.s. for all t0t\geq 0.

Now, we are ready to present the following Lemma D.4, which plays a key role in the proof of Theorem 4.1. Lemma D.4 follows a similar idea to Lemma A.5 of Kerimkulov et al., (2020), see also Lemma 3.2 of Fuhrman and Tessitore, (2004).

Lemma D.4

Let F:Ω×[0,)××F:\Omega\times[0,\infty)\times\mathbb{R}\times\mathbb{R}\to\mathbb{R} be a measurable function that satisfies Assumption D.3. Fix δ>(L1+L2)2/2\delta>(L_{1}+L_{2})^{2}/2. Let (Y,Z)𝔹δ[0,](Y_{\cdot},Z_{\cdot})\in\mathbb{B}_{\delta}[0,\infty] be the unique solution to (D.4) for any given z𝕃2,δ(0,;)z_{\cdot}\in\mathbb{L}_{\mathcal{F}}^{2,\delta}(0,\infty;\mathbb{R}). Moreover, assume that for z1,z2𝕃2,δ(0,;)z^{1}_{\cdot},z^{2}_{\cdot}\in\mathbb{L}_{\mathcal{F}}^{2,\delta}(0,\infty;\mathbb{R}) the following condition is satisfied:

|Ft(zt1,Zt1)Ft(zt2,Zt2)|L1|zt1zt2|+L2|Zt1Zt2|,t[0,],-a.s.|F_{t}(z^{1}_{t},Z^{1}_{t})-F_{t}(z^{2}_{t},Z^{2}_{t})|\leq L_{1}|z^{1}_{t}-z^{2}_{t}|+L_{2}|Z^{1}_{t}-Z^{2}_{t}|,\quad t\in[0,\infty],~\mathbb{P}\textbf{-}a.s.

Then there is 0<γ<δ0<\gamma<\delta and q(0,1)q\in(0,1) such that for any t0t\geq 0 we have

𝔼[e2γt|Yt1Yt2|2]+Z1Z2𝕃γ22qz1z2𝕃γ22,\mathbb{E}\Big[e^{2\gamma t}|Y^{1}_{t}-Y^{2}_{t}|^{2}\Big]+\|Z^{1}-Z^{2}\|^{2}_{\mathbb{L}^{2}_{\gamma}}\leq q\|z^{1}-z^{2}\|^{2}_{\mathbb{L}^{2}_{\gamma}}, (D.6)

where (Yi,Zi)𝔹δ[0,](Y^{i}_{\cdot},Z^{i}_{\cdot})\in\mathbb{B}_{\delta}[0,\infty] is the unique solution to (D.4) corresponding to zi𝕃2,δ(0,;)z_{i}\in\mathbb{L}_{\mathcal{F}}^{2,\delta}(0,\infty;\mathbb{R}) for i=1,2i=1,2, respectively. Furthermore, one can choose γ\gamma sufficiently large and δγ>0\delta-\gamma>0 sufficiently small such that q(0,1/2)q\in(0,1/2), and the above results hold as well.

Proof. Denote ΔY:=Y1Y2,ΔZ:=Z1Z1,\Delta Y:=Y^{1}-Y^{2},\Delta Z:=Z^{1}-Z^{1}, and Δz:=z1z2\Delta z:=z^{1}-z^{2}. Then, apply Itô’s formula to e2γs|ΔYs|2e^{2\gamma s}|\Delta Y_{s}|^{2}:

e2γt|ΔYt|2+te2γs|ΔZs|2ds=te2γs[2ΔYs(Fs(zs1,Zs1)Fs(zs2,Zs2))2γ|ΔYs|2]ds2te2γs|ΔYs||ΔZs|dWs.\begin{split}e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s&=\int_{t}^{\infty}e^{2\gamma s}[2\Delta Y_{s}\big(F_{s}(z^{1}_{s},Z^{1}_{s})-F_{s}(z^{2}_{s},Z^{2}_{s})\big)-2\gamma|\Delta Y_{s}|^{2}]\operatorname*{\mathrm{d\!}}s\\ &-2\int_{t}^{\infty}e^{2\gamma s}|\Delta Y_{s}||\Delta Z_{s}|\operatorname*{\mathrm{d\!}}W_{s}.\end{split}

By taking the expectation of the equation above, we get

𝔼[e2γt|ΔYt|2+te2γs|ΔZs|2ds]=𝔼[te2γs[2ΔYs(Fs(zs1,Zs1)Fs(zs2,Zs2))2γ|ΔYs|2]ds].\begin{split}&\mathbb{E}\left[e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]\\ &\qquad=\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}[2\Delta Y_{s}(F_{s}(z^{1}_{s},Z^{1}_{s})-F_{s}(z^{2}_{s},Z^{2}_{s}))-2\gamma|\Delta Y_{s}|^{2}]\operatorname*{\mathrm{d\!}}s\right].\end{split} (D.7)

By the Lipschitz property of the generator and by Young’s inequality, we observe that, for any ϵ>0\epsilon>0,

𝔼[te2γs(2ΔYs(Fs(zs1,Zs1)Fs(zs2,Zs2))2γ|ΔYt|2)ds]\displaystyle\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\big(2\Delta Y_{s}(F_{s}(z^{1}_{s},Z^{1}_{s})-F_{s}(z^{2}_{s},Z^{2}_{s}))-2\gamma|\Delta Y_{t}|^{2}\big)\operatorname*{\mathrm{d\!}}s\right]
\displaystyle\leq 𝔼[te2γs(2ΔYs(L1|Δzs|+L2|ΔZs|)2γ|ΔYt|2)ds]\displaystyle\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\big(2\Delta Y_{s}(L_{1}|\Delta z_{s}|+L_{2}|\Delta Z_{s}|)-2\gamma|\Delta Y_{t}|^{2}\big)\operatorname*{\mathrm{d\!}}s\right]
\displaystyle\leq 𝔼[te2γs((L1+L2)ϵΔ|Ys|2+L1ϵ1|Δzs|2+L2ϵ1|ΔZs|22γ|ΔYt|2)ds]\displaystyle\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\big((L_{1}+L_{2})\epsilon\Delta|Y_{s}|^{2}+L_{1}\epsilon^{-1}|\Delta z_{s}|^{2}+L_{2}\epsilon^{-1}|\Delta Z_{s}|^{2}-2\gamma|\Delta Y_{t}|^{2}\big)\operatorname*{\mathrm{d\!}}s\right]
\displaystyle\leq max{(L1+L2)L12γ,(L1+L2)L22γ}𝔼[te2γs(|Δzs|2+|ΔZs|2)ds],\displaystyle\max\left\{\frac{(L_{1}+L_{2})L_{1}}{2\gamma},\frac{(L_{1}+L_{2})L_{2}}{2\gamma}\right\}\cdot\mathbb{E}\left[\int_{t}^{\infty}e^{2\gamma s}\Big(|\Delta z_{s}|^{2}+|\Delta Z_{s}|^{2}\Big)\operatorname*{\mathrm{d\!}}s\right], (D.8)

where the last inequality holds by choosing ϵ=2γ/(L1+L2)\epsilon=2\gamma/(L_{1}+L_{2}). Then, take γ\gamma sufficiently large such that δγ>0\delta-\gamma>0 sufficiently small, we have

q~:=max{(L1+L2)L12γ,(L1+L2)L22γ}(0,12),\tilde{q}:=\max\left\{\frac{(L_{1}+L_{2})L_{1}}{2\gamma},\frac{(L_{1}+L_{2})L_{2}}{2\gamma}\right\}\in\left(0,\frac{1}{2}\right),

and subtracting q~𝔼[te2γs|ΔZs|2ds]\tilde{q}\cdot\mathbb{E}\big[\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\big] on both sides of (D.7) together with (D.2), we have

𝔼[e2γt|ΔYt|2+(1q~)te2γs|ΔZs|2ds]q~𝔼[te2γs|Δzs|2ds].\mathbb{E}\left[e^{2\gamma t}|\Delta Y_{t}|^{2}+(1-\tilde{q})\cdot\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]\leq\tilde{q}\cdot\mathbb{E}\Big[\int_{t}^{\infty}e^{2\gamma s}|\Delta z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\Big]. (D.9)

Dividing by 1q~1-\tilde{q} on both sides of (D.9), we have

𝔼[e2γt|ΔYt|2+te2γs|ΔZs|2ds]𝔼[11q~e2γt|ΔYt|2+te2γs|ΔZs|2ds]q𝔼[te2γs(|Δzs|2)ds],\begin{split}\mathbb{E}\left[e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]&\leq\mathbb{E}\left[\frac{1}{1-\tilde{q}}\cdot e^{2\gamma t}|\Delta Y_{t}|^{2}+\int_{t}^{\infty}e^{2\gamma s}|\Delta Z_{s}|^{2}\operatorname*{\mathrm{d\!}}s\right]\\ &\leq q\cdot\mathbb{E}\Big[\int_{t}^{\infty}e^{2\gamma s}\left(|\Delta z_{s}|^{2}\right)\operatorname*{\mathrm{d\!}}s\Big],\end{split}

where the first inequality holds due to 11q~>2\frac{1}{1-\tilde{q}}>2, and the second holds by setting q:=q~1q~(0,1)q:=\frac{\tilde{q}}{1-\tilde{q}}\in(0,1). If one further chooses γ\gamma large such that q~(0,1/3)\tilde{q}\in(0,1/3), hence one get q(0,1/2)q\in(0,1/2), and complete the proof.  

Lemma D.5

The solution Y0xY_{0}^{x} to the following BSDE

Y0x=0Fs(Zs,Zs)ds0ZsdW^s,Y_{0}^{x}=\int_{0}^{\infty}F_{s}(Z_{s},Z_{s})\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}Z_{s}\operatorname*{\mathrm{d\!}}\widehat{W}_{s}, (D.10)

with FF given by (D.14) which satisfies Assumption D.3, is deterministic and continuous function on (0,1)(0,1), and is a viscosity solution to to the HJB equation (3.2).

Proof. We change the measure back to \mathbb{P} and rewrite the BSDE (D.10) as

Y0x=0eδsfu(Xs,Zs)(Xs)ds0ZsdWs.Y_{0}^{x}=\int_{0}^{\infty}e^{-\delta s}f^{u(X_{s},Z_{s})}(X_{s})\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}Z_{s}\operatorname*{\mathrm{d\!}}W_{s}.

By the definition of infinite-horizon BSDE, YtY_{t} is a t\mathcal{F}_{t}-adapted process for t0t\geq 0. Obviously, Y0xY^{x}_{0} is measurable w.r.t. trivial σ\sigma-field 0\mathcal{F}_{0}, hence deterministic. The continuity of Y0xY^{x}_{0} for x(0,1)x\in(0,1) follows the inequality (D.2) combined with the continuity of FF and Xx,uX^{x,u}_{\cdot} in xx. Let’s then prove that Y0xY_{0}^{x} is the viscosity solution to the HJB equation (3.2). We only show the case for viscosity supersolution, and the subsolution property will follow from the same idea. As we have proved Y0xY^{x}_{0} is deterministic, consider the discounted process Ysx=eδsv(Xs)Y^{x}_{s}=e^{-\delta s}v(X_{s}) with v(x)=Y0xv(x)=Y_{0}^{x}. Take φC2(0,1)\varphi\in C^{2}(0,1) being a test function such that vφv-\varphi attains its (local) minimum value of zero at any x(0,1)x\in(0,1), we shall have the supersolution inequality holds. We prove the assertion by contradiction. Assume that

infuU{(buDxφ)(x)+12(σ2Dxxφ)(x)δφ(x)+fu(x)}>0.\inf_{u\in U}\Big\{\big(b^{u}D_{x}\varphi\big)(x)+\frac{1}{2}\big(\sigma^{2}D_{xx}\varphi\big)(x)-\delta\varphi(x)+f^{u}(x)\Big\}>0.

Since φ\varphi is smooth enough and ff is continuous, there exists ϵ>0\epsilon>0, such that for any uUu\in U, y(xϵ,x+ϵ)y\in(x-\epsilon,x+\epsilon), we have v(y)φ(y)v(y)\geq\varphi(y) and

(buDxφ)(y)+12(σ2Dxxφ)(y)δφ(y)+fu(y)>0.(b^{u}D_{x}\varphi\big)(y)+\frac{1}{2}\big(\sigma^{2}D_{xx}\varphi\big)(y)-\delta\varphi(y)+f^{u}(y)>0. (D.11)

Let τ:=inf{s0:Xs(xϵ,x+ϵ)}t\tau:=\inf\{s\geq 0:X_{s}\notin(x-\epsilon,x+\epsilon)\}\wedge t for any t0t\geq 0, and consider the pair of stopped processes

(Ysτx,Zs𝟏[0,τ]),s[0,t],(Y^{x}_{s\wedge\tau},Z_{s}\mathbf{1}_{[0,\tau]}),\quad s\in[0,t],

which solves the following finite-horizon BSDE

v(x)=ξ1+0eδsfu(Xs,Zs)(Xs)𝟏[0,τ]ds0Zs𝟏[0,τ]dWs,v(x)=\xi_{1}+\int_{0}^{\infty}e^{-\delta s}f^{u(X_{s},Z_{s})}(X_{s})\mathbf{1}_{[0,\tau]}\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}Z_{s}\mathbf{1}_{[0,\tau]}\operatorname*{\mathrm{d\!}}W_{s},

where ξ1:=eδτv(Xτ)\xi_{1}:=e^{-\delta\tau}v(X_{\tau}). On the other hand, consider another pair of stopped processes

(eδsφ(Xsτ),eδs(σDxφ)(Xs)𝟏[0,τ]),s[0,t],\left(e^{-\delta s}\varphi(X_{s\wedge\tau}),e^{-\delta s}(\sigma D_{x}\varphi)(X_{s})\mathbf{1}_{[0,\tau]}\right),\quad s\in[0,t],

Apply the Itô’s formula to eδsφ(Xs)e^{-\delta s}\varphi(X_{s}), then

φ(x)=ξ20𝟏[0,τ]eδs(buDxφ+12σ2Dxxφδφ)(Xs)ds0eδs(σDxφ)(Xs)𝟏[0,τ]dWs,\varphi(x)=\xi_{2}-\int_{0}^{\infty}\mathbf{1}_{[0,\tau]}e^{-\delta s}\Big(b^{u^{\prime}}D_{x}\varphi+\frac{1}{2}\sigma^{2}D_{xx}\varphi-\delta\varphi\Big)(X_{s})\operatorname*{\mathrm{d\!}}s-\int_{0}^{\infty}e^{-\delta s}(\sigma D_{x}\varphi)(X_{s})\mathbf{1}_{[0,\tau]}\operatorname*{\mathrm{d\!}}W_{s},

where ξ2:=eδτφ(Xτ)\xi_{2}:=e^{-\delta\tau}\varphi(X_{\tau}). Then, since v(Xs)φ(Xs)v(X_{s})\geq\varphi(X_{s}) for all x[0,τ]x\in[0,\tau], we have ξ1ξ2\xi_{1}\geq\xi_{2}. In addition, by (D.11), one has

fu(Xs,(σDxφ)(Xs))(Xs)>(buDxφ+12σ2Dxxφδφ)(Xs),s[0,τ).f^{u\big(X_{s},(\sigma D_{x}\varphi)(X_{s})\big)}(X_{s})>-\Big(b^{u}D_{x}\varphi+\frac{1}{2}\sigma^{2}D_{xx}\varphi-\delta\varphi\Big)(X_{s}),\quad s\in[0,\tau).

Hence, by Theorem D.3 (with a further argument of strict comparison principle, see e.g. (Pham, , 2009, Theorem 6.2.2)), we have v(x)>φ(x)v(x)>\varphi(x), which is a contradiction. Hence, we have the supersolution inequality.  

D.3 Proof of Theorem 4.1

Let vnv^{n} be the smooth solution to the Bellman-type ODE (4.1), and recall the updated control at nnth iteration

un(x):=argminuU{(bu(x)Dxvn1(x)+fu(x)}=u(x,σ(x)Dxvn1(x)).u^{n}(x):=\operatorname*{arg\,min}_{u\in U}\big\{(b^{u}(x)D_{x}v^{n-1}(x)+f^{u}(x)\big\}=u\big(x,\sigma(x)D_{x}v^{n-1}(x)\big).

Define X:=Xx,uX:=X^{x,u^{*}} as the solution to the SDE (2.3) started from X0=xX_{0}=x and controlled by the optimal control policy uu^{*}. Apply the Itô’s formula to vn(Xs)v^{n}(X_{s}), we have

dvn(Xs)=(12(σ2Dxxvn)(Xs)+(buDxvn)(Xs))ds+(σDxvn)(Xs)dWs=((buDxvn)(Xs)(bunDxvn)(Xs)fun(Xs)+δvn(Xs))ds+(σDxvn)(Xs)dWs,\begin{split}\operatorname*{\mathrm{d\!}}v^{n}(X_{s})&=\left(\frac{1}{2}(\sigma^{2}D_{xx}v^{n})(X_{s})+(b^{u^{*}}D_{x}v^{n})(X_{s})\right)\operatorname*{\mathrm{d\!}}s+(\sigma D_{x}v^{n})(X_{s})\operatorname*{\mathrm{d\!}}W_{s}\\ &=\left((b^{u^{*}}D_{x}v^{n})(X_{s})-(b^{u^{n}}D_{x}v^{n})(X_{s})-f^{u^{n}}(X_{s})+\delta v^{n}(X_{s})\right)\operatorname*{\mathrm{d\!}}s+(\sigma D_{x}v^{n})(X_{s})\operatorname*{\mathrm{d\!}}W_{s},\end{split} (D.12)

where the second equality holds since vnv^{n} is a solution to the ODE (4.1). Let’s define for,

Ysn\displaystyle Y^{n}_{s} :=vn(Xs)\displaystyle:=v^{n}(X_{s})
Zsn\displaystyle Z_{s}^{n} :=(σDxvn)(Xs),\displaystyle:=\big(\sigma D_{x}v^{n}\big)(X_{s}),
Fs(z,Z)\displaystyle F_{s}(z,Z) :=(bu(Xs,z)σ1)(Xs)Z+fu(Xs,z)(Xs).\displaystyle:=(b^{u(X_{s},z)}\sigma^{-1})(X_{s})Z+f^{u(X_{s},z)}(X_{s}).

Hence, we can write (D.12) as

dYsn=((buDxvn)(Xs)Fs(Zsn1,Zsn)+δYsn)ds+ZsndWs.\operatorname*{\mathrm{d\!}}Y^{n}_{s}=\Big((b^{u^{*}}D_{x}v^{n})(X_{s})-F_{s}(Z^{n-1}_{s},Z^{n}_{s})+\delta Y^{n}_{s}\Big)\operatorname*{\mathrm{d\!}}s+Z^{n}_{s}\operatorname*{\mathrm{d\!}}W_{s}. (D.13)

Let δ>0\delta>0 be a discounting rate, and define

Y~sn:=eδsYsn,Z~s:=eδsZs,F~s(z,Z):=eδsFs(eδsz,eδsZ).\widetilde{Y}^{n}_{s}:=e^{-\delta s}Y^{n}_{s},\quad\widetilde{Z}^{\cdot}_{s}:=e^{-\delta s}Z^{\cdot}_{s},\quad\widetilde{F}_{s}(z,Z):=e^{-\delta s}F_{s}(e^{\delta s}z,e^{\delta s}Z). (D.14)

Then, we can rewrite the above BSDE (D.13) as

dY~sn=(eδs(buDxvn)(Xs)F~s(Z~sn1,Z~sn))ds+Z~sndWs.\operatorname*{\mathrm{d\!}}\widetilde{Y}^{n}_{s}=\Big(e^{-\delta s}(b^{u^{*}}D_{x}v^{n})(X_{s})-\widetilde{F}_{s}(\widetilde{Z}^{n-1}_{s},\widetilde{Z}^{n}_{s})\Big)\operatorname*{\mathrm{d\!}}s+\widetilde{Z}^{n}_{s}\operatorname*{\mathrm{d\!}}W_{s}.

Moreover, one observes that

(limtY~tn=0)=(limteδtvn(Xt)=0)=1,\mathbb{P}\Big(\lim_{t\to\infty}\widetilde{Y}^{n}_{t}=0\Big)=\mathbb{P}\Big(\lim_{t\to\infty}e^{-\delta t}v^{n}(X_{t})=0\Big)=1,

since the value function is finite and we have verified that Xt(0,1)X_{t}\in(0,1) almost surely for any t0t\geq 0 (see, Proposition 2.1). Hence, we have

Y~tn=t(F~s(Z~sn1,Z~sn)(buσ1)(Xs)Z~sn)dstZ~sndWs.\widetilde{Y}^{n}_{t}=\int_{t}^{\infty}\Big(\widetilde{F}_{s}(\widetilde{Z}^{n-1}_{s},\widetilde{Z}^{n}_{s})-(b^{u^{*}}\sigma^{-1})(X_{s})\widetilde{Z}^{n}_{s}\Big)\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}\widetilde{Z}^{n}_{s}\operatorname*{\mathrm{d\!}}W_{s}. (D.15)

By Lemma D.1, we change the measure to ^\widehat{\mathbb{P}}, and (D.15) can be rewritten as

Y~tn=tF~s(Z~sn1,Z~sn)dstZ~sndW^s.\widetilde{Y}^{n}_{t}=\int_{t}^{\infty}\widetilde{F}_{s}(\widetilde{Z}^{n-1}_{s},\widetilde{Z}^{n}_{s})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}\widetilde{Z}^{n}_{s}\operatorname*{\mathrm{d\!}}\widehat{W}_{s}. (D.16)

Similarly, consider the following BSDE with vnv^{n} and vn1v^{n-1} replaced by the value function of our stochastic control problem vv (2.6) in (D.16):

Y~t=tF~s(Z~s,Z~s)dstZ~sdW^s.\widetilde{Y}_{t}=\int_{t}^{\infty}\widetilde{F}_{s}(\widetilde{Z}_{s},\widetilde{Z}_{s})\operatorname*{\mathrm{d\!}}s-\int_{t}^{\infty}\widetilde{Z}_{s}\operatorname*{\mathrm{d\!}}\widehat{W}_{s}. (D.17)

Note that for γ<δ\gamma<\delta, we have (Y~n,Z~n),(Y~,Z~)𝔹γ[0,](\widetilde{Y}^{n}_{\cdot},\widetilde{Z}^{n}_{\cdot}),\,(\widetilde{Y}_{\cdot},\widetilde{Z}_{\cdot})\in\mathbb{B}_{\gamma}[0,\infty], and by Assumption D.2,

|F~s(Zs,Zs)\displaystyle|\widetilde{F}_{s}(Z_{s},Z_{s})- F~s(Zsn1,Zsn)|\displaystyle\widetilde{F}_{s}(Z_{s}^{n-1},Z_{s}^{n})|
=|(bu(Xs,eδsZs)σ1)(Xs)Zs+eδsfu(Xs,eδsZs)(Xs)\displaystyle=\Big|(b^{u(X_{s},e^{\delta s}Z_{s})}\sigma^{-1})(X_{s})\cdot Z_{s}+e^{-\delta s}f^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})
(bu(Xs,eδsZsn1)σ1)(Xs)Zsn+eδsfu(Xs,eδsZsn1)(Xs)|\displaystyle\hskip 18.49988pt-(b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}\sigma^{-1})(X_{s})\cdot Z^{n}_{s}+e^{-\delta s}f^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big|
|σ1(Xs)Zs||bu(Xs,eδsZs)(Xs)bu(Xs,eδsZsn1)(Xs)|+|bu(Xs,eδsZsn1)(Xs)||ZsZsn|\displaystyle\leq\Big|\sigma^{-1}(X_{s})Z_{s}\Big|\Big|b^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})-b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big|+\Big|b^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big|\Big|Z_{s}-Z^{n}_{s}\Big|
+eδs|fu(Xs,eδsZs)(Xs)fu(Xs,eδsZsn1)(Xs)|\displaystyle\hskip 18.49988pt+e^{-\delta s}\Big|f^{u(X_{s},e^{\delta s}Z_{s})}(X_{s})-f^{u(X_{s},e^{\delta s}Z^{n-1}_{s})}(X_{s})\Big|
eδsCθ|eδsZseδsZsn1|+K|ZsZsn|+eδsθ|eδsZseδsZsn1|\displaystyle\leq e^{-\delta s}C\theta|e^{\delta s}Z_{s}-e^{\delta s}Z_{s}^{n-1}|+K|Z_{s}-Z_{s}^{n}|+e^{-\delta s}\theta|e^{\delta s}Z_{s}-e^{\delta s}Z_{s}^{n-1}|
=θ(C+1)|ZsZsn1|+K|ZsZsn|,\displaystyle=\theta(C+1)|Z_{s}-Z_{s}^{n-1}|+K|Z_{s}-Z_{s}^{n}|,

where we have applied the fact that there is a constant C>0C>0, |σ1(Xs)Zs|=eδs|Dxv(Xs)|eδsC|\sigma^{-1}(X_{s})Z_{s}|=e^{-\delta s}|D_{x}v(X_{s})|\leq e^{-\delta s}C (given the derivative exists) by the property of the value function given in Proposition 2.2. Hence, one may choose δ\delta large enough such that Assumption D.3 holds for F~\widetilde{F}. Then, by applying Lemmas  D.4 and D.5, there is 0<γ<δ0<\gamma<\delta,

𝔼^[e2γt|YtYtn|2]+ZZn𝕃^γ22qZZn1𝕃^γ22,\widehat{\mathbb{E}}\Big[e^{2\gamma t}|Y_{t}-Y^{n}_{t}|^{2}\Big]+\|Z-Z^{n}\|^{2}_{\widehat{\mathbb{L}}^{2}_{\gamma}}\leq q\|Z-Z^{n-1}\|^{2}_{\widehat{\mathbb{L}}^{2}_{\gamma}}, (D.18)

for any t0t\geq 0, where 𝔼^\widehat{\mathbb{E}} and 𝕃^γ2\widehat{\mathbb{L}}^{2}_{\gamma} denote the expectation and 𝕃γ2\mathbb{L}^{2}_{\gamma}-norm under measure ^\widehat{\mathbb{P}}. In addition, by noting that the solution Y0xY^{x}_{0} and Y0x,nY^{x,n}_{0} are the value function v(x)v(x) and approximation sequence at nnth iteration vn(x)v^{n}(x) in the Algorithm 1 (with a recursive argument, see e.g. Kerimkulov et al., (2020)), we have

|v(x)vn(x)|2qn𝔼^[0e2(γδ)s|σ(Xs)|2|Dxv(Xs)Dxv0(Xs)|2ds].|v(x)-v^{n}(x)|^{2}\leq q^{n}\widehat{\mathbb{E}}\Bigg[\int_{0}^{\infty}e^{2(\gamma-\delta)s}\Big|\sigma(X_{s})\Big|^{2}\,\Big|D_{x}v(X_{s})-D_{x}v^{0}(X_{s})\Big|^{2}\operatorname*{\mathrm{d\!}}s\Bigg].

Hence, we complete the proof.

References

  • Albrecher et al., (2022) Albrecher, H., Azcue, P., and Muler, N. (2022). Optimal ratcheting of dividends in a Brownian risk model. SIAM Journal on Financial Mathematics, 13(3):657–701.
  • Amin et al., (2009) Amin, S., Cárdenas, A. A., and Sastry, S. S. (2009). Safe and secure networked control systems under denial-of-service attacks. In International workshop on hybrid systems: computation and control, pages 31–45. Springer.
  • Antonio et al., (2021) Antonio, Y., Indratno, S. W., and Saputro, S. W. (2021). Pricing of cyber insurance premiums using a Markov-based dynamic model with clustering structure. PLoS One, 16(10):e0258867.
  • Asmussen and Taksar, (1997) Asmussen, S. and Taksar, M. (1997). Controlled diffusion models for optimal dividend pay-out. Insurance: Mathematics and Economics, 20(1):1–15.
  • Azcue and Muler, (2005) Azcue, P. and Muler, N. (2005). Optimal reinsurance and dividend distribution policies in the Cramér-Lundberg model. Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, 15(2):261–308.
  • Bakdash et al., (2018) Bakdash, J. Z., Hutchinson, S., Zaroukian, E. G., Marusich, L. R., Thirumuruganathan, S., Sample, C., Hoffman, B., and Das, G. (2018). Malware in the future? Forecasting of analyst detection of cyber events. Journal of Cybersecurity, 4(1):tyy007.
  • Barnett et al., (2023) Barnett, M., Buchak, G., and Yannelis, C. (2023). Epidemic responses under uncertainty. Proceedings of the National Academy of Sciences, 120(2):e2208111120.
  • Bellman, (1955) Bellman, R. (1955). Functional equations in the theory of dynamic programming. v. positivity and quasi-linearity. Proceedings of the National Academy of Sciences, 41(10):743–746.
  • Bellman, (1966) Bellman, R. (1966). Dynamic programming. Science, 153(3731):34–37.
  • Böhme and Kataria, (2006) Böhme, R. and Kataria, G. (2006). Models and measures for correlation in cyber-insurance. In Weis, volume 2(1), page 3.
  • Böhme et al., (2010) Böhme, R., Schwartz, G., et al. (2010). Modeling cyber-insurance: towards a unifying framework. In WEIS.
  • Boukanjime et al., (2021) Boukanjime, B., El-Fatini, M., Laaribi, A., Taki, R., and Wang, K. (2021). A Markovian regime-switching stochastic hybrid time-delayed epidemic model with vaccination. Automatica, 133:109881.
  • Cárdenas et al., (2011) Cárdenas, A. A., Amin, S., Lin, Z.-S., Huang, Y.-L., Huang, C.-Y., and Sastry, S. (2011). Attacks against process control systems: risk assessment, detection, and response. In Proceedings of the 6th ACM symposium on information, computer and communications security, pages 355–366.
  • Cebula and Young, (2010) Cebula, J. L. and Young, L. R. (2010). A taxonomy of operational cyber security risks. Technical report, Carnegie Mellon University. CMU/SEI-2010-TN-028.
  • Chen et al., (2025) Chen, J., Feng, K., Freddi, L., Goreac, D., and Li, J. (2025). Optimality of vaccination for prevalence-constrained SIRS epidemics. Applied Mathematics & Optimization, 91(1):1–26.
  • Crandall et al., (1992) Crandall, M. G., Ishii, H., and Lions, P.-L. (1992). User’s guide to viscosity solutions of second order partial differential equations. Bulletin of the American mathematical society, 27(1):1–67.
  • Crandall and Lions, (1983) Crandall, M. G. and Lions, P.-L. (1983). Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American mathematical society, 277(1):1–42.
  • Cremer et al., (2022) Cremer, F., Sheehan, B., Fortmann, M., Kia, A. N., Mullins, M., Murphy, F., and Materne, S. (2022). Cyber risk and cybersecurity: a systematic review of data availability. The Geneva papers on risk and insurance. Issues and practice, 47(3):698.
  • Dou et al., (2020) Dou, W., Tang, W., Wu, X., Qi, L., Xu, X., Zhang, X., and Hu, C. (2020). An insurance theory based optimal cyber-insurance contract against moral hazard. Information Sciences, 527:576–589.
  • Eling and Jung, (2018) Eling, M. and Jung, K. (2018). Copula approaches for modeling cross-sectional dependence of data breach losses. Insurance: Mathematics and Economics, 82:167–180.
  • Eling and Loperfido, (2017) Eling, M. and Loperfido, N. (2017). Data breaches: Goodness of fit, pricing, and risk measurement. Insurance: Mathematics and Economics, 75:126–136.
  • Fahrenwaldt et al., (2018) Fahrenwaldt, M. A., Weber, S., and Weske, K. (2018). Pricing of cyber insurance contracts in a network model. ASTIN Bulletin: The Journal of the IAA, 48(3):1175–1218.
  • Fawzi et al., (2014) Fawzi, H., Tabuada, P., and Diggavi, S. (2014). Secure estimation and control for cyber-physical systems under adversarial attacks. IEEE Transactions on Automatic control, 59(6):1454–1467.
  • Federico et al., (2024) Federico, S., Ferrari, G., and Torrente, M.-L. (2024). Optimal vaccination in a SIRS epidemic model. Economic Theory, 77(1):49–74.
  • Fuhrman and Tessitore, (2004) Fuhrman, M. and Tessitore, G. (2004). Infinite horizon backward stochastic differential equations and elliptic equations in Hilbert spaces. The Annals of Probability, 32(1B):607 – 660.
  • Garcia-Teodoro et al., (2009) Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G., and Vázquez, E. (2009). Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28(1-2):18–28.
  • Gil et al., (2014) Gil, S., Kott, A., and Barabási, A.-L. (2014). A genetic epidemiology approach to cyber-security. Scientific Reports, 4(1):5659.
  • Gordon et al., (2003) Gordon, L. A., Loeb, M. P., and Sohail, T. (2003). A framework for using insurance for cyber-risk management. Communications of the ACM, 46(3):81–85.
  • Gray et al., (2011) Gray, A., Greenhalgh, D., Hu, L., Mao, X., and Pan, J. (2011). A stochastic differential equation SIS epidemic model. SIAM Journal on Applied Mathematics, 71(3):876–902.
  • Hamadène et al., (1999) Hamadène, S., Lepeltier, J.-P., and Wu, Z. (1999). Infinite horizon reflected backward stochastic differential equations and applications in mixed control and game problems. Probability and Mathematical Statistics, 19(2):211–234.
  • He et al., (2024) He, R., Jin, Z., and Li, J. S.-H. (2024). Modeling and management of cyber risk: a cross-disciplinary review. Annals of Actuarial Science, 18(2):270–309.
  • Herath and Herath, (2011) Herath, H. and Herath, T. (2011). Copula-based actuarial model for pricing cyber-insurance policies. Insurance markets and companies: analyses and actuarial computations, 2(1):7–20.
  • Hillairet and Lopez, (2021) Hillairet, C. and Lopez, O. (2021). Propagation of cyber incidents in an insurance portfolio: counting processes combined with compartmental epidemiological models. Scandinavian Actuarial Journal, 2021(8):671–694.
  • Hillairet et al., (2022) Hillairet, C., Lopez, O., d’Oultremont, L., and Spoorenberg, B. (2022). Cyber-contagion model with network structure applied to insurance. Insurance: Mathematics and Economics, 107:88–101.
  • Howard, (1960) Howard, R. A. (1960). Dynamic programming and Markov processes. John Wiley.
  • Jang-Jaccard and Nepal, (2014) Jang-Jaccard, J. and Nepal, S. (2014). A survey of emerging threats in cybersecurity. Journal of Computer and System Sciences, 80(5):973–993.
  • Karatzas and Shreve, (1991) Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer, New York, NY, 2 edition.
  • Kerimkulov et al., (2020) Kerimkulov, B., Siska, D., and Szpruch, L. (2020). Exponential convergence and stability of howard’s policy improvement algorithm for controlled diffusions. SIAM Journal on Control and Optimization, 58(3):1314–1340.
  • Krylov, (1980) Krylov, N. V. (1980). Controlled diffusion processes. Springer.
  • Krylov, (2002) Krylov, N. V. (2002). Introduction to the Theory of Random Processes. American Mathematical Society, Providence, Rhode Island.
  • Liu et al., (2025) Liu, W., Li, L., Sun, J., Deng, F., Wang, G., and Chen, J. (2025). Data-driven control against false data injection attacks. Automatica, 179:112399.
  • Liu et al., (2016) Liu, Y., Dong, M., Ota, K., and Liu, A. (2016). Activetrust: Secure and trustable routing in wireless sensor networks. IEEE Transactions on Information Forensics and Security, 11(9):2013–2027.
  • Malavasi et al., (2022) Malavasi, M., Peters, G. W., Shevchenko, P. V., Trück, S., Jang, J., and Sofronov, G. (2022). Cyber risk frequency, severity and insurance viability. Insurance: Mathematics and Economics, 106:90–114.
  • Mishra and Pandey, (2014) Mishra, B. K. and Pandey, S. K. (2014). Dynamic model of worm propagation in computer network. Applied Mathematical Modelling, 38(7-8):2173–2179.
  • Moore et al., (2006) Moore, D., Shannon, C., Brown, D. J., Voelker, G. M., and Savage, S. (2006). Inferring internet denial-of-service activity. ACM Transactions on Computer Systems (TOCS), 24(2):115–139.
  • Mukhopadhyay et al., (2013) Mukhopadhyay, A., Chatterjee, S., Saha, D., Mahanti, A., and Sadhukhan, S. K. (2013). Cyber-risk decision models: To insure it or not? Decision Support Systems, 56:11–26.
  • Nguyen-Ngoc and Yor, (2004) Nguyen-Ngoc, L. and Yor, M. (2004). Some martingales associated to reflected Lévy processes. In Séminaire de Probabilités XXXVIII, pages 42–69. Springer.
  • Öğüt et al., (2011) Öğüt, H., Raghunathan, S., and Menon, N. (2011). Cyber security risk management: Public policy implications of correlated risk, imperfect ability to prove loss, and observability of self-protection. Risk Analysis: An International Journal, 31(3):497–512.
  • Pasqualetti et al., (2013) Pasqualetti, F., Dörfler, F., and Bullo, F. (2013). Attack detection and identification in cyber-physical systems. IEEE Transactions on Automatic Control, 58(11):2715–2729.
  • Paté-Cornell et al., (2018) Paté-Cornell, M.-E., Kuypers, M., Smith, M., and Keller, P. (2018). Cyber risk management for critical infrastructure: a risk analysis model and three case studies. Risk Analysis, 38(2):226–241.
  • Pham, (2009) Pham, H. (2009). Continuous-time stochastic control and optimization with financial applications, volume 61. Springer Science & Business Media.
  • Puterman, (1981) Puterman, M. (1981). On the convergence of policy iteration for controlled diffusions. Journal of Optimization Theory and Applications, 33(1):137–144.
  • Puterman and Brumelle, (1979) Puterman, M. L. and Brumelle, S. L. (1979). On the convergence of policy iteration in stationary dynamic programming. Mathematics of Operations Research, 4(1):60–69.
  • Santos and Rust, (2004) Santos, M. S. and Rust, J. (2004). Convergence properties of policy iteration. SIAM Journal on Control and Optimization, 42(6):2094–2115.
  • Sonveaux and Winkin, (2023) Sonveaux, C. and Winkin, J. J. (2023). State feedback control law design for an age-dependent SIR model. Automatica, 158:111297.
  • Stoneburner et al., (2002) Stoneburner, G., Goguen, A., Feringa, A., et al. (2002). Risk management guide for information technology systems. Nist special publication, 800(30):800–30.
  • Touzi, (2012) Touzi, N. (2012). Optimal stochastic control, stochastic target problems, and backward SDE, volume 29. Springer Science & Business Media.
  • Tran and Yin, (2021) Tran, K. and Yin, G. (2021). Optimal control and numerical methods for hybrid stochastic SIS models. Nonlinear Analysis: Hybrid Systems, 41:101051.
  • Wakaiki et al., (2019) Wakaiki, M., Cetinkaya, A., and Ishii, H. (2019). Stabilization of networked control systems under DoS attacks and output quantization. IEEE Transactions on Automatic Control, 65(8):3560–3575.
  • Wang et al., (2024) Wang, W., Xu, R., and Yan, K. (2024). Optimal ratcheting of dividends with capital injection. Mathematics of Operations Research.
  • Xu and Hua, (2019) Xu, M. and Hua, L. (2019). Cybersecurity insurance: Modeling and pricing. North American Actuarial Journal, 23(2):220–249.
  • Yong and Zhou, (1999) Yong, J. and Zhou, X. Y. (1999). Stochastic controls: Hamiltonian systems and HJB equations, volume 43. Springer Science & Business Media.
  • Zhan et al., (2015) Zhan, Z., Xu, M., and Xu, S. (2015). Predicting cyber attack rates with extreme values. IEEE Transactions on Information Forensics and Security, 10(8):1666–1677.