European Journal of Operational Research 137 (2002) 558–573
                                                                                                         www.elsevier.com/locate/dsw



                                                 Stochastics and Statistics

      Nonlinear stochastic programming by Monte-Carlo estimators
                                                Leonidas L. Sakalauskas               *

          Department of Statistical Modelling, Institute of Mathematics and Informatics, Akademijos 4, Vilnius 2600, Lithuania
                                         Received 18 January 2000; accepted 26 February 2001




Abstract

   Methods for solving stochastic programming (SP) problems by a finite series of Monte-Carlo samples are considered.
The accuracy of solution is treated in a statistical manner, testing the hypothesis of optimality according to statistical
criteria. The rule for adjusting the Monte-Carlo sample size is introduced to ensure the convergence and to find the
solution of the SP problem using a reasonable number of Monte-Carlo trials. Issues of implementation of the developed
approach in decision making and other fields of application are considered too. © 2002 Elsevier Science B.V. All rights
reserved.

Keywords: Monte-Carlo method; Stochastic programming; Kuhn–Tucker conditions; Statistical hypothesis




1. Introduction

   We consider the standard stochastic programming (SP) problem with a single inequality constraint

     F_0(x) = E f_0(x, ω) → min,
     F_1(x) = E f_1(x, ω) ≤ 0,                                                  (1)
     x ∈ R^n,

where the functions f_i : R^n × Ω → R, i = 0, 1, satisfy certain continuity and differentiability conditions, ω ∈ Ω is an elementary event in a probability space (Ω, Σ, P_x), and E is the symbol of mathematical expectation. Assume that we can obtain finite sequences of realizations of ω and that the values of the functions f_0, f_1 are available at any point x ∈ R^n for any realization ω ∈ Ω.
   Constrained optimization with objective and constraint functions given as expectations occurs in many applied problems of engineering, statistics, finance, business management, etc. Stochastic procedures for solving problems of this kind have been widely studied, and several concepts have been proposed to ensure and improve


  *
      Tel.: +370-2-729312; fax: +370-2-729209.
      E-mail address: sakal@ktl.mii.lt (L.L. Sakalauskas).

0377-2217/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0377-2217(01)00109-6


the convergence behavior of developed methods (Ermolyev and Wets, 1988; Shapiro, 1989; Kall and
Wallace, 1994; Prekopa, 1995). The concept of stochastic approximation is quite well studied, when the
convergence is ensured by varying certain step-length multipliers in a scheme of stochastic gradient search
(see, e.g., Ermolyev, 1976; Mikhalevitch et al., 1987; Ermolyev and Wets, 1988; Uriasyev, 1990; Kall and
Wallace, 1994; Dippon, 1998). However, the following obstacles are often mentioned in the implementation
of stochastic approximation:
• It is not so clear when to stop the process of stochastic approximation.
• The methods of stochastic approximation converge rather slowly.
   Improving convergence behavior by using more accurate estimators is a well-known concept,
which has already found application in semi-stochastic approximation (Ermolyev and Wets, 1988;
Uriasyev, 1990; Kall and Wallace, 1994). However, the obstacles mentioned above occur here too. In
this paper, we attempt to develop this concept starting from a theoretical scheme of methods with a
relative stochastic error (see Polyak, 1983). The main requirement of this scheme is that the variance of
the stochastic gradient be varied during the optimization procedure so that it would remain propor-
tional to the square of the gradient norm. We show that such an approach offers an opportunity to
develop implementable algorithms of stochastic programming, using a finite series of Monte-Carlo
estimators for the construction of gradient-type methods, where the accuracy of estimators is adjusted
in a special way.
   We briefly survey stochastic differentiation techniques and also introduce a set of Monte-Carlo esti-
mators for SP. The accuracy of the solution is treated in a statistical manner, testing the hypothesis of
optimality according to statistical criteria. Although we pay special attention to the case of the Gaussian
measure Px because of its importance in applications, the technique developed could be successfully ex-
tended to other distributions as well as to other cases of SP, that differ from those considered here. The
developed methods are studied numerically in solving test examples and problems in practice (see also
Sakalauskas, 1992, 1997, 2000; Sakalauskas and Steishunas, 1993).


2. Stochastic differentiation and Monte-Carlo estimators

   The gradient search is the most often used way of constructing methods for numerical optimization.
Since the mathematical expectations in (1) can be computed explicitly only in rare cases, it is all the more complicated to compute the gradients of these functions analytically. The Monte-Carlo method is a universal and convenient tool for estimating such expectations, and it can be applied to estimate derivatives too. Procedures for gradient evaluation are often constructed by expressing a gradient as an expectation and then evaluating this expectation by means of statistical simulation (see Rubinstein, 1983; Kall and Wallace, 1994; Prekopa, 1995; Ermolyev and Norkin, 1995).
   Let us introduce a set of Monte-Carlo estimators needed for construction of a stochastic optimization
procedure. Assume, for simplicity, the measure P_x to be absolutely continuous, i.e., defined by a density function p : R^n × Ω → R_+. Assume also that the values of p are available at any point x ∈ R^n for any realization ω ∈ Ω. First, let us consider the expectation
     F(x) = E f(x, ω) ≡ ∫_{R^n} f(x, y) · p(x, y) dy,                           (2)


where the function f and the density function p are differentiable with respect to x in the entire space R^n. Let
us denote the support of the measure P_x as

     S(x) = {y | p(x, y) > 0},    x ∈ R^n.


Then it is not difficult to see that the column vector of the gradient of this function can be expressed as

     ∇F(x) = E(∇_x f(x, ω) + f(x, ω) · ∇_x ln p(x, ω))                          (3)

(we assume ∇_x ln p(x, y) = 0 for y ∉ S(x)). From the equation

     E ∇_x ln p(x, ω) = 0

(obtained by differentiating the identity ∫_Ω p(x, y) dy = 1), various expressions of the gradient follow. For instance, the formula

     ∇F(x) = E(∇_x f(x, ω) + (f(x, ω) − f(x, Eω)) · ∇_x ln p(x, ω))             (4)

serves as an example of such an expression.
   Thus, the expectation and its gradient can both be expressed as linear operators over the same probability space. Hence, the quantities in (2)–(4) can be estimated by means of the same Monte-Carlo sample. Which formula, (3) or (4), is preferable depends on the task being solved. For instance, expression (4) can yield smaller variances of the gradient components than (3) if the variances of the components of ω are small.
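To make the comparison concrete, here is a small numerical sketch of estimators (3) and (4). The setup is ours, not the paper's: P_x is taken as N(x, σ²I), so that ∇_x ln p(x, y) = (y − x)/σ², and f is chosen to depend on y only (so ∇_x f = 0).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the paper): P_x = N(x, sigma^2 I), so that
# grad_x ln p(x, y) = (y - x) / sigma^2; f depends on y only, so grad_x f = 0.
n, sigma, N = 3, 0.1, 200_000
x = np.array([1.0, -0.5, 2.0])

def f(y):
    return np.sum(y * y, axis=-1)               # f(y) = |y|^2

Y = x + sigma * rng.standard_normal((N, n))     # Monte-Carlo sample (5)
score = (Y - x) / sigma**2                      # grad_x ln p(x, y^j)

g3 = f(Y)[:, None] * score                      # estimator (3)
g4 = (f(Y) - f(x))[:, None] * score             # estimator (4); here E y = x

grad_true = 2.0 * x                             # grad F(x) for F(x) = |x|^2 + n sigma^2
print(g3.mean(axis=0))                          # both averages approximate grad F(x)
print(g4.mean(axis=0))
print(g3.var(axis=0).sum(), g4.var(axis=0).sum())
```

Both averages estimate ∇F(x) = 2x, but the centred form (4) removes the large f(x)·score term, so its component variances are far smaller when σ is small.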
   Now, assume a Monte-Carlo sample to be given for some x ∈ D ⊂ R^n:

     Y = (y^1, y^2, ..., y^N),                                                  (5)

where the y^j are independent random vectors identically distributed with the density p(x, ·) : Ω → R_+. We have the estimates

     F̃_i(x) = (1/N) Σ_{j=1}^{N} f_i(x, y^j),                                   (6)

     D̃²F_i(x) = (1/(N − 1)) Σ_{j=1}^{N} (f_i(x, y^j) − F̃_i(x))²,   i = 0, 1.  (7)
If we introduce the Lagrange function

     L(x, λ) = F_0(x) + λ · F_1(x),                                             (8)

which may be treated as the expectation of the stochastic Lagrange function

     l(x, λ, ω) = f_0(x, ω) + λ · f_1(x, ω),

then the estimate of the gradient of the Lagrange function

     ∇̃_x L(x, λ) = (1/N) Σ_{j=1}^{N} G^j,                                      (9)

can be considered, according to (3), as the average of the identically distributed independent vectors

     G^j = ∇_x l(x, λ, y^j) + l(x, λ, y^j) · ∇_x ln p(x, y^j),

where EG^j = ∇L(x, λ), j = 1, ..., N. The sampling covariance matrix

     A = (1/N) Σ_{j=1}^{N} (G^j − ∇̃_x L) · (G^j − ∇̃_x L)′                     (10)

will also be used later on.
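As a concrete illustration, the estimators (6)–(10) can be assembled in a few lines of Python. The integrands f0, f1 and the score function below are stand-ins of our own choosing (with P_x = N(x, I)); only the estimator formulas themselves follow the text.

```python
import numpy as np

def mc_estimators(x, lam, Y):
    """Compute estimates (6), (7), (9) and (10) from a sample Y of shape (N, n).

    The integrands are hypothetical stand-ins; the score corresponds to the
    assumed density p(x, .) of N(x, I), so grad_x ln p(x, y) = y - x.
    """
    N = Y.shape[0]
    score = Y - x                                    # grad_x ln p(x, y)
    f0 = np.sum((Y - 1.0) ** 2, axis=1)              # stand-in f0(x, y)
    f1 = np.sum(Y, axis=1) - 1.0                     # stand-in f1(x, y)

    F = np.array([f0.mean(), f1.mean()])             # estimates (6)
    D2 = np.array([f0.var(ddof=1), f1.var(ddof=1)])  # sampling variances (7)

    l = f0 + lam * f1                                # stochastic Lagrange function
    G = l[:, None] * score                           # vectors G^j (grad_x l = 0 here)
    gradL = G.mean(axis=0)                           # estimate (9)
    A = (G - gradL).T @ (G - gradL) / N              # covariance matrix (10)
    return F, D2, gradL, A

rng = np.random.default_rng(1)
x = np.zeros(3)
Y = x + rng.standard_normal((1000, 3))
F, D2, gradL, A = mc_estimators(x, 0.5, Y)
```

For this stand-in, ∇L(x, λ) = 2(x − 1) + λ·1 = (−1.5, −1.5, −1.5) at x = 0, λ = 0.5, so the estimate (9) should land near that vector, and A comes out symmetric positive semi-definite as a sample covariance must.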


3. The rule for iterative adjustment of sample size

    The Monte-Carlo sample size N must be chosen in order to compute the estimators introduced in the previous section. Sometimes this sample size is taken as fixed and sufficiently large to ensure the required accuracy of the estimates in all iterations of the optimization process. Very often this size is about 1000–1500 trials or more, and, if the number of optimization steps is large, the solution of an SP problem can require a great amount of computation (Shao, 1989). Besides, it is well known that a fixed sample size, however large, suffices only to ensure convergence to some neighborhood of the optimal point (see, e.g., Polyak, 1983; Sakalauskas, 1997).
    Note that there is no great need to estimate the functions with high accuracy at the start of the optimization, because at that stage it suffices to evaluate only an approximate direction leading to the optimum. Therefore the samples should not be large at the beginning of optimization; their sizes are adjusted so as to obtain estimates of the objective and constraint functions with the desired accuracy only at the moment of deciding whether the solution has been found. As mentioned above, the concept of using more accurate estimators has already found application in semi-stochastic approximation methods. We develop this concept here in an iterative scheme of methods with a relative stochastic error (see Polyak, 1983), choosing at each iteration a sample size inversely proportional to the square of the norm of the gradient estimate from the current iteration. It can be proved, under certain conditions, that such a rule enables us to construct convergent stochastic methods of SP.
    Let x^+ ∈ R^n now be the solution of SP problem (1). By virtue of the Kuhn–Tucker theorem (see, e.g., Bertsekas, 1982) there exist values λ^+ ≥ 0, ∇F_0(x^+), ∇F_1(x^+) such that

     ∇F_0(x^+) + λ^+ · ∇F_1(x^+) = 0,     λ^+ · F_1(x^+) = 0.                   (11)

    In SP, well-known gradient-type procedures of constrained optimization can be applied in the iterative search for the Kuhn–Tucker solution (Kall and Wallace, 1994; Ermolyev and Wets, 1988; Mikhalevitch et al., 1987). We analyze the peculiarities of implementing such methods by constructing a stochastic version of the well-known Arrow–Hurwicz–Uzawa procedure.

Theorem 1. Let the functions F_i : R^n → R, i = 0, 1, expressed as expectations (2), where ω ∈ Ω is an event from the probability space (Ω, Σ, P_x) and P_x is an absolutely continuous measure with density function p : R^n × Ω → R_+, be convex and twice differentiable. Assume that all the eigenvalues of the matrix of second derivatives ∇²_{xx} L(x, λ^+), ∀x ∈ R^n, are uniformly bounded and belong to the interval [m, M], m > 0, and, besides,

     |∇F_1(x) − ∇F_1(x^+)| ≤ (m/M) · |∇F_1(x^+)|   ∀x ∈ R^n,   |∇F_1(x^+)| > 0,

where (x^+, λ^+) is a point satisfying the Kuhn–Tucker conditions (11).
   Assume, in addition, that for any x ∈ R^n and any number N ≥ 1 there is a possibility to obtain the estimates (7) and

     ∇̃_x F_i(x) = (1/N) Σ_{j=1}^{N} G_i(x, y^j),

where (y^j)_{j=1}^{N} is a sample of independent vectors identically distributed with the density p(x, ·) : Ω → R_+, EG_i(x, ω) = ∇_x F_i(x), and the following conditions of uniform boundedness of the variances hold:

     E(f_1(x, ω) − F_1(x))² < d,    E|G_i(x, ω) − ∇F_i(x)|² < K,   ∀x ∈ R^n,  i = 0, 1.


   Let now an initial point x^0 ∈ R^n, a scalar λ^0, and an initial sample size N^0 be given, and let the random sequence (x^t, λ^t)_{t=0}^∞ be defined according to

     x^{t+1} = x^t − ρ · ∇̃_x L(x^t, λ^t),
                                                                                (12)
     λ^{t+1} = λ^t + ρ · a · F̃_1(x^t),

where the estimates (6), (9) are used, varying the sample size N^t according to the rule

     N^{t+1} ≥ C / (ρ · |∇̃_x L(x^t, λ^t)|²),                                   (13)

and ρ > 0, a > 0, C > 0 are certain constants. Then there exist positive values ρ̄ and C̄ such that

     E(|x^t − x^+|² + |λ^t − λ^+|²/a + 1/N^t) < B · β^t,   t = 1, 2, ...,

for certain values 0 < β < 1, B > 0, whenever ρ < ρ̄ and C < C̄.

      The proof is given in Appendix A.

Remark. An interesting comment follows. As we can see, the error |x^t − x^+|² and the sample size N^t change at a linear rate, determined by the value 0 < β < 1, which depends mostly on the constant C in expression (13) and on the conditioning of the Hessian of the Lagrange function. It follows from the proof of the theorem that N^t / N^{t+1} ≈ β for large t. Then, by virtue of the formula for the sum of a geometric progression,

     Σ_{i=0}^{t} N^i ≈ N^t · Q / (1 − β),

where Q is a certain constant.

   Note now that the stochastic error of the estimates depends on the sample size at the moment of the stopping decision, say N^t, which in turn determines the accuracy of the solution. Thus, the ratio of the total number of computations Σ_{i=0}^{t} N^i to the sample size at the stopping moment N^t can be considered as bounded by a certain constant that does not depend on this accuracy. Hence, if we have a certain resource for computing one value of the objective or constraint function with an admissible accuracy, the optimization requires in fact only several times more computation. This enables us to construct stochastic methods for SP that are reasonable from a computational viewpoint. The procedure (12) and the rule (13) are developed further in Section 4.
   The choice of the best metric for computing the norm in (13), or of other parameters in stochastic gradient procedures of the kind considered, requires a separate study. The theorem proved is important in principle, and the proof can be extended without much difficulty to the case of unconstrained optimization as well as to optimization with several constraints (see also Sakalauskas, 1997).
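In code, rule (13), together with the practical clipping recommended in Section 4, reduces to one line of arithmetic; the constants below (ρ, C, N_min, N_max) are illustrative choices, not values from the paper.

```python
import math

def next_sample_size(grad_norm_sq, rho=2.0, C=100.0, n_min=50, n_max=10_000):
    """Rule (13): N^{t+1} >= C / (rho * |grad estimate|^2), clipped to [n_min, n_max].

    All constants are illustrative; the paper leaves their tuning open.
    """
    if grad_norm_sq <= 0.0:
        return n_max               # a (numerically) zero gradient demands maximal accuracy
    N = math.ceil(C / (rho * grad_norm_sq))
    return min(max(N, n_min), n_max)

print(next_sample_size(100.0))     # far from the optimum: the minimal sample suffices
print(next_sample_size(1e-6))      # near the optimum: the sample grows to the cap
```

The inverse-square dependence is exactly what keeps the relative stochastic error of the gradient roughly constant along the iterations.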


4. The algorithm for SP by the finite series of Monte-Carlo samples

   While implementing the method developed in the previous section, we have to keep in mind that the performance and stopping of procedure (12), (13) can be based only on finite series of Monte-Carlo samples (5). In such a case, applying methods of the theory of statistical decisions is a constructive way of building an implementable method for SP.
   A possible decision on having found the optimal solution should be examined at each iteration of the optimization process. Namely, this decision can be made if, first, there is no reason to reject the hypothesis on the validity of the Kuhn–Tucker conditions (11) and, second, the objective and constraint functions are estimated with admissible accuracy. Since the distribution of the introduced estimators (6) and (9) can be approximated by the one- and multivariate Gaussian law when the Monte-Carlo sample size is sufficiently large (see, e.g., Bhattacharya and Rao, 1976; Box and Watson, 1962; Bentkus and Gotze, 1999), it is convenient to use the well-known methods of normal sample analysis for our purposes.
   Thus, the validity of the first optimality condition in (11) can be tested by means of the well-known multidimensional Hotelling T²-statistic (see, e.g., Krishnaiah and Lee, 1980). Namely, the optimality hypothesis can be accepted at some point x with significance μ if the following condition holds:

     (N − n) · (∇̃_x L)′ · A⁻¹ · (∇̃_x L) / n ≤ Fish(μ, n, N − n),              (14)

where Fish(μ, n, N − n) is the μ-quantile of the Fisher distribution with (n, N − n) degrees of freedom, ∇̃_x L = ∇̃_x L(x, λ) is the estimate (9) of the gradient of the Lagrange function, A is the normalizing matrix (10) estimated at the point (x, λ), and N is the size of sample (5). In the opposite case this hypothesis is rejected.
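Test (14) is straightforward to implement once the estimates (9) and (10) are at hand; a sketch using SciPy's Fisher (F) distribution:

```python
import numpy as np
from scipy.stats import f as fisher

def optimality_not_rejected(gradL, A, N, mu=0.95):
    """Hotelling T^2 test (14) of the first Kuhn-Tucker condition in (11).

    gradL is the estimate (9), A the covariance matrix (10), N the sample size.
    Returns True when the optimality hypothesis cannot be rejected at level mu.
    """
    n = gradL.size
    T2 = (N - n) * gradL @ np.linalg.solve(A, gradL) / n
    return T2 <= fisher.ppf(mu, n, N - n)

# A nearly zero gradient estimate passes the test; a large one does not.
print(optimality_not_rejected(np.array([1e-3, -2e-3]), np.eye(2), 100))
print(optimality_not_rejected(np.array([1.0, 1.0]), np.eye(2), 100))
```

Solving A·v = ∇̃_x L instead of forming A⁻¹ explicitly is the standard numerically safer way to evaluate the quadratic form.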
   Now we make two important remarks related to the statistical error of the estimator (6) of the objective and constraint functions. Since the real values of these functions remain unknown to us and we can operate only with their estimators produced by statistical simulation, first, the upper confidence bound of the constraint function estimate has to be used in performing (12) and in testing the second optimality condition in (11), and, second, the sample size N has to be sufficiently large in order to estimate the confidence intervals of the objective and constraint functions with an admissible accuracy. It is convenient here to use the asymptotic normality and to approximate the respective confidence bounds by means of the sampling variance (7).
   Summarizing these considerations, we propose the following algorithm for SP by finite Monte-Carlo series. Assume some initial value λ^0 ≥ 0 and some initial point x^0 ∈ R^n to be given, a random sample (5) of a certain initial size N^0 to be generated at this point, and the Monte-Carlo estimates (6), (7), (9), and (10) to be computed. We seek a solution such that the statistical hypothesis on the validity of the first Kuhn–Tucker condition in (11) is not rejected and the objective and constraint functions are evaluated with permissible confidence intervals.
   Then, the following version of procedure (12) can be applied to the stochastic solution of (1):

     x^{t+1} = x^t − ρ · ∇̃_x L(x^t, λ^t),
                                                                                (15)
     λ^{t+1} = max[0, λ^t + ρ · a · (F̃_1(x^t) + g_β · D̃F_1(x^t))],

where ρ, a are normalizing multipliers and g_β is the β-quantile of the standard normal distribution. Note that we introduce an estimate of the upper confidence bound F̃_1(x^t) + g_β · D̃F_1(x^t), taking into account the statistical error of the estimate (6) of the constraint function and using the normal approximation.
   As mentioned above, we may choose the constant C and the metric for computing the norm when rule (13) is implemented. A possible way is to compute the estimate of the gradient norm in the metric induced by the sampling covariance matrix (10) and to set C = n · Fish(γ, n, N^t − n). This is convenient for interpretation because, in such a case, the random error of the gradient does not exceed the gradient norm with probability approximately 1 − γ. Such a modification of (13) is as follows:

     N^{t+1} ≥ n · Fish(γ, n, N^t − n) / (ρ · (∇̃_x L^t)′ · (A^t)⁻¹ · (∇̃_x L^t)).   (16)


In numerical implementation it is recommended to introduce minimal and maximal values N_min (usually
about 20–50) and N_max to avoid great fluctuations of the sample size between iterations.
   Thus, procedure (15) is iterated, adjusting the sample size according to (13) or (16) and testing the optimality conditions at each iteration. If:
   (a) criterion (14) does not contradict the hypothesis that the gradient of the Lagrange function is equal to 0 (the first condition in (11));
   (b) the constraint condition (the second condition in (11)) holds with a given probability β:

     F̃_1(x^t) + g_β · D̃F_1(x^t) ≤ 0;

   (c) the estimated lengths of the confidence intervals of the objective and constraint functions do not exceed the admissible accuracies ε_i:

     2 g_{β_i} · D̃F_i / √N ≤ ε_i,   i = 0, 1,

where g_{β_i} is the β_i-quantile of the standard normal distribution, then there is no reason to reject the hypothesis that the optimum has been found. Therefore, there is a basis to stop the optimization and to decide that the optimum has been found with an admissible accuracy. If at least one of the three conditions is not satisfied, the next sample is generated and the optimization is continued. As follows from Section 3, the optimization process should stop after generating a finite number of samples (5).
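Putting the pieces together, the following sketch runs iteration (15), the sample-size rule (16), and stopping tests (a)–(c) on a small stand-in problem of our own (not the paper's test example): minimize F_0(x) = E|x + w|² subject to F_1(x) = E(Σ_i (x_i + w_i)) + 1 ≤ 0 with w ~ N(0, d²I), whose solution is x^+ = (−1/2, −1/2), λ^+ = 1. The gradient uses the centred form (4), the confidence bounds use the standard error D̃F_i/√N, and all constants are illustrative.

```python
import numpy as np
from scipy.stats import f as fisher, norm

rng = np.random.default_rng(3)

n, d = 2, 0.5
rho, a = 0.5, 1.0                      # step multipliers in (15)
mu = gamma = beta = 0.95               # confidence levels in (14), (16), (b)
eps0, eps1 = 0.1, 0.1                  # admissible interval lengths in (c)
g = norm.ppf(beta)                     # beta-quantile of the standard normal
N_min, N_max = 50, 5000

x, lam, N = np.full(n, 2.0), 0.0, N_min
stopped = False
for t in range(300):
    Y = x + d * rng.standard_normal((N, n))           # sample (5)
    score = (Y - x) / d**2                            # grad_x ln p(x, y)
    f0, f1 = np.sum(Y**2, axis=1), np.sum(Y, axis=1) + 1.0
    l = f0 + lam * f1                                 # stochastic Lagrangian
    l_c = np.sum(x**2) + lam * (np.sum(x) + 1.0)      # its value at y = E y = x
    G = (l - l_c)[:, None] * score                    # vectors G^j, centred form (4)
    gradL = G.mean(axis=0)                            # estimate (9)
    A = (G - gradL).T @ (G - gradL) / N               # matrix (10)
    F1_hat = f1.mean()
    se0, se1 = f0.std(ddof=1) / N**0.5, f1.std(ddof=1) / N**0.5

    quad = gradL @ np.linalg.solve(A, gradL)
    if ((N - n) * quad / n <= fisher.ppf(mu, n, N - n)       # (a): test (14)
            and F1_hat + g * se1 <= 0.0                      # (b): feasibility bound
            and 2 * g * se0 <= eps0 and 2 * g * se1 <= eps1):  # (c): accuracy
        stopped = True
        break

    x = x - rho * gradL                               # iteration (15)
    lam = max(0.0, lam + rho * a * (F1_hat + g * se1))
    N = int(min(max(n * fisher.ppf(gamma, n, N - n) / (rho * quad), N_min), N_max))  # rule (16)

print(t, N, np.round(x, 3), round(lam, 3), stopped)
```

The run typically stops within a few dozen iterations with x close to (−0.5, −0.5); the sample size starts at N_min and grows toward N_max as the iterates approach the optimum, which is the qualitative behavior reported in Section 5.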
   Note also that, since the statistical testing of the optimality hypothesis is grounded in the convergence of the distribution of the sampling estimates to the Gaussian law, additional standard checks could be introduced by considering the rate of convergence to the normal law following from the Berry–Esseen or large-deviation theorems.
   An extension of the developed approach to the case of several inequality constraints is related, first, to the consideration of joint confidence statements when the statistical error of the constraint estimates is taken into account. The simplest way is to consider a vectorial version of (12) or (15) and to compute, separately for each constraint, the corresponding confidence bounds in (15) and condition (c). The case of equality constraints is reduced to the one considered here by replacing each equality with two inequalities. Such remarks should also be taken into consideration when applying our approach to other cases of SP as well as to constrained optimization procedures differing from that of Arrow–Hurwicz–Uzawa.


5. Numerical study of the stochastic optimization algorithm

   Let us discuss a numerical experiment to study the proposed algorithm. Since functions arising in practice are often of a quadratic character with some nonlinear disturbance in a neighborhood of the optimal point, we consider a test example of SP with functions of this kind:

     F(x) ≡ E f_0(x + ω) → min,
                                                                                (17)
     P(f_1(x + ω) ≤ 0) − p ≥ 0,

where

     f_0(y) = Σ_{i=1}^{n} (a_i y_i² + b_i · (1 − cos(c_i · y_i))),    f_1 = Σ_{i=1}^{n} (y_i + 0.5),

y_i = x_i + ω_i, the ω_i are random and normally N(0, d²) distributed, d = 0.5, a = (8.00, 5.00, 4.30, 9.10, 1.50, 5.00, 4.00, 4.70, 8.40, 10.00), b = (3.70, 1.00, 2.10, 0.50, 0.20, 4.00, 2.00, 2.10, 5.80, 5.00), c = (0.45, 0.50, 0.10, 0.60, 0.35, 0.50, 0.25, 0.15, 0.40, 0.50), and n ≤ 10.
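For reference, the integrands of test example (17) are easy to reproduce; the coefficient vectors below are the paper's, while the Monte-Carlo estimator follows (6).

```python
import numpy as np

# Coefficients of test example (17); the first n components are used (n <= 10).
a = np.array([8.00, 5.00, 4.30, 9.10, 1.50, 5.00, 4.00, 4.70, 8.40, 10.00])
b = np.array([3.70, 1.00, 2.10, 0.50, 0.20, 4.00, 2.00, 2.10, 5.80, 5.00])
c = np.array([0.45, 0.50, 0.10, 0.60, 0.35, 0.50, 0.25, 0.15, 0.40, 0.50])
d = 0.5                                      # std of the Gaussian disturbance

def f0(y):
    """Objective integrand: quadratic with a cosine disturbance."""
    n = y.shape[-1]
    return np.sum(a[:n] * y**2 + b[:n] * (1.0 - np.cos(c[:n] * y)), axis=-1)

def f1(y):
    """Constraint integrand."""
    return np.sum(y + 0.5, axis=-1)

def estimate_F0(x, N, rng):
    """Monte-Carlo estimate (6) of F(x) = E f0(x + w), w ~ N(0, d^2 I)."""
    Y = x + d * rng.standard_normal((N, x.size))
    return f0(Y).mean()

rng = np.random.default_rng(0)
print(estimate_F0(np.zeros(2), 200_000, rng))
```

For n = 2 and x = 0 the expectation is available in closed form, E f_0(ω) = Σ_i (a_i d² + b_i(1 − e^{−c_i²d²/2})) ≈ 3.373, which gives a handy check of the estimator.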


    Such an example may be treated as a stochastic version of the deterministic mathematical programming task with objective function f_0 and constraint function f_1, where the controlled variables are measured under some Gaussian error with variance d². Note that our task is not convex.
    The stopping criteria introduced above depend essentially on the rate of convergence of the distribution of the estimates (7) and (9) to the Gaussian law (see also Sakalauskas and Steishunas, 1993). We study the distribution of the statistic in criterion (14) by means of statistical simulation, solving the test task with p = 0, n = 2. The optimal point is known in this case: x^+ = 0. Let the gradient of the considered functions be evaluated by the Monte-Carlo estimator following from (3). Thus, 400 Monte-Carlo samples of size N = (50, 100, 200, 500, 1000) were generated, and the T²-statistic in (14) was computed for each sample. The hypothesis on the difference of the empirical distribution of this statistic from the Fisher distribution was tested according to the criteria ω² and Ω² (see Bolshev and Smirnov, 1983). The value of the first criterion for N = 50 is ω² = 0.2746 at the optimal point, against the critical value 0.46 (p = 0.05), and that of the second is Ω² = 1.616, against the critical value 2.49 (p = 0.05). Besides, the hypothesis on the coincidence of the empirical distribution of the considered statistic with the Fisher distribution was rejected at points differing from the optimal one according to the criteria ω² and Ω² (if r = |x − x^+| ≥ 0.1). So the distribution of the multidimensional T²-statistic can, in our case, be approximated sufficiently well by the Fisher distribution even with rather small samples (N ≅ 50).
    Further, the dependence of the stopping probability according to criterion (14) on the distance r = |x − x^+| to the optimal point was studied. These dependencies are presented in Fig. 1 (for confidence μ = 0.95). The same dependencies are given in Fig. 2 for N = 1000 and μ = (0.90, 0.95, 0.99). We thus see that, by adjusting the sample size, we are able to test the optimality hypothesis in a statistical way and to evaluate the objective and constraint functions with a desired accuracy.




                                  Fig. 1. Stopping probability according to (14), μ = 0.95.




                                 Fig. 2. Stopping probability according to (14), N = 1000.


    Now let us consider the results obtained for this example by iterating procedure (15) and changing the sample size according to rule (16), where N^0 = N_min = 50 and N_max is chosen according to the restrictions on the confidence bounds of the objective and constraint functions. The initial data were chosen or varied as follows: n = (2, 5, 10); the probability in (17), p = (0.0, 0.3, 0.6, 0.9); the initial point x^0 = (x^0_1, x^0_2, ..., x^0_n), x^0_i = −1, 1 ≤ i ≤ n; the multipliers in (15), a = 0.1, ρ = 20; the probabilities in (b), (c), and (15), β = β_0 = β_1 = 0.95; the confidence in (14), μ = 0.95; the probability in (16), γ = 0.95; the lengths of the confidence intervals in (c), ε_0 = 5% (of F̃(x^t)), ε_1 = 0.05. The optimization was repeated 400 times. Conditions (a)–(c) were satisfied at some time for all paths of optimization. Thus, a decision could be made on finding the optimum with an admissible accuracy for all paths (the sampling frequency of stopping after t iterations, with confidence intervals, is presented in Fig. 3 (p = 0.6, n = 2)). The minimal, average, and maximal numbers of iterations and of total Monte-Carlo trials needed to solve the optimization task (n = 2), depending on the probability p, are presented in Table 1. The numbers of iterations and of total Monte-Carlo trials needed for stopping are presented in Table 2, depending on the dimension n (p = 0, n ≥ 2). The sampling estimate of the stopping point is x̃^stop = (0.006 ± 0.053, −0.003 ± 0.026) for p = 0, n = 2 (compare with x^opt = (0, 0)). These results illustrate that the proposed algorithm can be successfully applied even when the objective and constraint functions are convex and smooth only in a neighborhood of the optimum.




                                   Fig. 3. Frequency of stopping (with confidence interval).


Table 1
n = 2

  p      Number of iterations, t            Total number of trials (Σ_t N^t)
         Min      Mean          Max         Min       Mean            Max
  0.0     6       11.5 ± 0.2     19         1029       2842 ± 90       7835
  0.3     4       11.2 ± 0.3     27         1209       4720 ± 231     18 712
  0.6     7       12.5 ± 0.3     29         1370       4984 ± 216     15 600
  0.9    10       31.5 ± 1.1     73         1360      13100 ± 629     37 631


Table 2
p = 0, n ≥ 2

  n      Number of iterations, t            Total number of trials (Σ_t N^t)
         Min      Mean          Max         Min       Mean            Max
   2      6       11.5 ± 0.2     19         1029      2842 ± 90       7835
   5      6       12.2 ± 0.2     21         1333      3696 ± 104      9021
  10      7       13.2 ± 0.2     27         1405      3930 ± 101      8668

   The averaged dependencies of the objective function EF̃_0^t, the constraint EF̃_1^t, the sample size EN^t, the Lagrange multiplier Eλ^t, and the solution point Ex^t on the iteration number t are given in Figs. 4–8 to illustrate the convergence and the behavior of the optimization process (p = 0.6, n = 2). One path of realization of the optimization process is also shown in these figures to illustrate its stochastic character.
Fig. 4. Change of the objective function.

Fig. 5. Change of the constraint function.

Fig. 6. Change of the sample size.

Fig. 7. Change of the Lagrange multiplier.

Fig. 8. Change of the point of iteration: (×) average, (∗) path.

   Thus, the theoretical analysis and the numerical experiment show that the proposed approach enables us to solve SP problems with sufficient accuracy using a reasonable amount of computation (5–20 iterations and 3000–10 000 total Monte-Carlo trials). If we keep in mind that the Monte-Carlo procedure usually requires 1000–2000 trials for the statistical simulation and estimation of one value of the function, optimization by our approach may require only 3–5 times more computation.



6. Application to decision support in investment planning

    Investment planning in many fields involves controlling risk factors under uncertainty. It is often possible to describe this uncertainty by probabilistic models and thus to reduce the decision-making problem in financial planning to an SP problem. Namely, the objective function may represent average losses, to be minimized under a constraint on the probability of a desired scenario of events.
    Let us consider, as a typical example, investment planning in gas delivery (see Guldman, 1983; Ermolyev and Wets, 1988). Omitting some details, the task is as follows. Choose x = (x_1, …, x_12) and z to minimize the expected costs of purchases, storage operations, and the increment of storage capacity,

    F(x, z) = Σ_{t=1}^{12} c_t · x_t + c_0 · E Σ_{t=1}^{12} |x_t − ω_t| + c_13 · max_t {x_t} + c_14 · z,


where x_t is the amount of gas ordered from the pipeline in month t = 1, 2, …, 12, ω_t is the random demand in month t, and z is the increment of the gas storage capacity, subject to the condition on the probability of a reliable supply,

    Φ(x, z) = p − P(A) ≡ p − P{ (ω_t − x_t) − a_1 · Σ_{s=1}^{t−1} (x_s − ω_s) ≤ b_1 + a_2 · z,
                               −(ω_t − x_t) − a_3 · Σ_{s=1}^{t−1} (x_s − ω_s) ≤ b_2 + a_4 · z,
                               t = 1, …, 12 } ≤ 0.
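To make this model concrete, both the expected cost F(x, z) and the reliability constraint Φ(x, z) can be estimated by direct Monte-Carlo simulation. The sketch below uses hypothetical cost coefficients, constraint parameters, and a lognormal demand model — none of these numbers come from the paper; only the structure of F and Φ follows the two formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cost coefficients and constraint parameters (illustrative only).
c = np.full(12, 2.0)          # purchase cost per month, c_1..c_12
c0, c13, c14 = 0.5, 1.0, 3.0  # storage-operation, peak-order and capacity costs
a1, a2, a3, a4 = 0.1, 1.0, 0.1, 1.0
b1, b2 = 5.0, 5.0
p = 0.95                      # required reliability level

def simulate_demand(n_trials):
    # Hypothetical demand model: independent lognormal monthly demands omega_t.
    return rng.lognormal(mean=2.0, sigma=0.3, size=(n_trials, 12))

def estimates(x, z, n_trials=2000):
    """Monte-Carlo estimates of the cost F(x, z) and the constraint Phi(x, z)."""
    w = simulate_demand(n_trials)                 # omega, shape (N, 12)
    cost = (c @ x + c0 * np.abs(x - w).sum(axis=1).mean()
            + c13 * x.max() + c14 * z)
    # Cumulative surplus sum_{s<t}(x_s - omega_s) entering both inequalities.
    surplus = np.cumsum(x - w, axis=1)
    prev = np.hstack([np.zeros((n_trials, 1)), surplus[:, :-1]])
    ok_upper = (w - x) - a1 * prev <= b1 + a2 * z
    ok_lower = -(w - x) - a3 * prev <= b2 + a4 * z
    reliable = (ok_upper & ok_lower).all(axis=1)  # event A holds in all 12 months
    return cost, p - reliable.mean()              # Phi <= 0 means feasible

x = np.full(12, 8.0)          # a trial ordering plan
cost, phi = estimates(x, z=10.0)
```

A plan (x, z) is accepted as feasible when the estimate of Φ, corrected by its upper confidence bound as in Section 4, is nonpositive.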

Nonlinear stochastic programming by Monte-Carlo estimators

Assume that we have a possibility to get finite sequences of realizations of ω and that values of the functions f_0, f_1 are available at any point x ∈ R^n for any realization of ω ∈ Ω. Constrained optimization with mean-valued objective and constraint functions occurs in many applied problems of engineering, statistics, finance, business management, etc.
* Tel.: +370-2-729312; fax: +370-2-729209. E-mail address: sakal@ktl.mii.lt (L.L. Sakalauskas).
0377-2217/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0377-2217(01)00109-6

Stochastic procedures for solving problems of this kind are often considered, and several concepts have been proposed to ensure and improve
the convergence behaviour of the developed methods (Ermolyev and Wets, 1988; Shapiro, 1989; Kall and Wallace, 1994; Prekopa, 1995). The concept of stochastic approximation, in which convergence is ensured by varying certain step-length multipliers in a scheme of stochastic gradient search, is quite well studied (see, e.g., Ermolyev, 1976; Mikhalevitch et al., 1987; Ermolyev and Wets, 1988; Uriasyev, 1990; Kall and Wallace, 1994; Dippon, 1998). However, the following obstacles are often mentioned in the implementation of stochastic approximation:
• It is not clear when to stop the process of stochastic approximation.
• The methods of stochastic approximation converge rather slowly.
Improving the convergence behaviour by means of more accurate estimators is a well-known concept, which has already found application in semi-stochastic approximation (Ermolyev and Wets, 1988; Uriasyev, 1990; Kall and Wallace, 1994). However, the obstacles mentioned above occur here too. In this paper, we develop this concept starting from a theoretical scheme of methods with a relative stochastic error (see Polyak, 1983). The main requirement of this scheme is that the variance of the stochastic gradient be varied during the optimization procedure so that it remains proportional to the square of the gradient norm. We show that such an approach offers an opportunity to develop implementable algorithms of stochastic programming, using a finite series of Monte-Carlo estimators for the construction of gradient-type methods, where the accuracy of the estimators is adjusted in a special way. We briefly survey stochastic differentiation techniques and introduce a set of Monte-Carlo estimators for SP. The accuracy of the solution is treated in a statistical manner, testing the hypothesis of optimality according to statistical criteria.
Although we pay special attention to the case of the Gaussian measure P_x because of its importance in applications, the technique developed can be extended to other distributions as well as to other cases of SP that differ from those considered here. The developed methods are studied numerically on test examples and practical problems (see also Sakalauskas, 1992, 1997, 2000; Sakalauskas and Steishunas, 1993).

2. Stochastic differentiation and Monte-Carlo estimators

Gradient search is the most frequently used way of constructing methods for numerical optimization. Since the mathematical expectations in (1) can be computed explicitly only in rare cases, it is all the more complicated to compute analytically the gradients of functions containing such expressions. The Monte-Carlo method is a universal and convenient tool for estimating these expectations, and it can be applied to estimate derivatives too. Procedures of gradient evaluation are often constructed by expressing a gradient as an expectation and then evaluating this expectation by means of statistical simulation (see Rubinstein, 1983; Kall and Wallace, 1994; Prekopa, 1995; Ermolyev and Norkin, 1995). Let us introduce the set of Monte-Carlo estimators needed for the construction of a stochastic optimization procedure. Assume, without loss of simplicity, the measure P_x to be absolutely continuous, i.e., defined by a density function p : R^n × Ω → R_+. Assume also that values of p are available at any point x ∈ R^n for any realization of ω ∈ Ω. First, let us consider the expectation

    F(x) = E f(x, ω) ≡ ∫_{R^n} f(x, y) · p(x, y) dy,                          (2)

where the function f and the density p are differentiable with respect to x in the entire space R^n. Let us denote the support of the measure P_x as

    S(x) = { y | p(x, y) > 0 },  x ∈ R^n.
Then it is not difficult to see that the column vector of the gradient of this function can be expressed as

    ∇F(x) = E( ∇_x f(x, ω) + f(x, ω) · ∇_x ln p(x, ω) )                      (3)

(we assume ∇_x ln p(x, y) = 0 for y ∉ S(x)). We also see from the equation

    E ∇_x ln p(x, ω) = 0

(obtained by differentiating the identity ∫ p(x, y) dy = 1) that various expressions of the gradient follow. For instance, the formula

    ∇F(x) = E( ∇_x f(x, ω) + (f(x, ω) − f(x, Eω)) · ∇_x ln p(x, ω) )         (4)

serves as an example of such an expression. Thus, the expectation and its gradient can be expressed through linear operators on the same probability space; hence, (2)–(4) can be estimated by means of the same Monte-Carlo sample. Which formula, (3) or (4), is better to use depends on the task solved. For instance, expression (4) can provide smaller variances of the gradient components than (3) if the variances of the components of ω are small. Now, assume a Monte-Carlo sample to be given for some x ∈ D ⊂ R^n:

    Y = (y^1, y^2, …, y^N),                                                   (5)

where the y^j are independent random vectors identically distributed with density p(x, ·) : Ω → R_+. We have the estimates

    F̃_i(x) = (1/N) Σ_{j=1}^N f_i(x, y^j),                                    (6)

    D̃_i²(x) = (1/(N − 1)) Σ_{j=1}^N ( f_i(x, y^j) − F̃_i(x) )²,  i = 0, 1.   (7)

If we introduce the Lagrange function

    L(x, λ) = F_0(x) + λ · F_1(x),                                            (8)

which may be treated as the expectation of the stochastic Lagrange function

    l(x, λ, ω) = f_0(x, ω) + λ · f_1(x, ω),

then the estimate of the gradient of the Lagrange function

    ∇̃_x L(x, λ) = (1/N) Σ_{j=1}^N G^j                                        (9)

can be considered, according to (3), as the average of the identically distributed independent vectors

    G^j = ∇_x l(x, λ, y^j) + l(x, λ, y^j) · ∇_x ln p(x, y^j),

where E G^j = ∇L(x, λ), j = 1, …, N. The sampling covariance matrix

    Ã = (1/N) Σ_{j=1}^N (G^j − ∇̃_x L) · (G^j − ∇̃_x L)′                      (10)

will be used later on too.
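For the Gaussian case emphasized in the paper, p(x, ·) is the N(x, d²I) density, so the score is ∇_x ln p(x, y) = (y − x)/d², and estimators (6), (7), (9), and (10) reduce to simple sample averages. A minimal sketch with an illustrative integrand f of our own choosing (here f does not depend on x directly, so the ∇_x f term in (3) vanishes):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 0.5  # standard deviation of the Gaussian measure, as in the test example

def f(y):
    # Illustrative integrand (not from the paper): squared Euclidean norm.
    return np.sum(y**2, axis=-1)

def mc_estimates(x, N=1000):
    """Estimates (6), (7) of F(x) = E f(y), y ~ N(x, d^2 I), plus the
    score-function gradient estimate (3)/(9) and the covariance matrix (10)."""
    y = x + d * rng.standard_normal((N, x.size))
    vals = f(y)
    F = vals.mean()                    # estimate (6)
    D2 = vals.var(ddof=1)              # estimate (7)
    score = (y - x) / d**2             # grad_x ln p(x, y) for the Gaussian density
    G = vals[:, None] * score          # per-trial gradient vectors G^j (grad_x f = 0)
    grad = G.mean(axis=0)              # estimate (9): sample average of G^j
    A = np.cov(G, rowvar=False)        # sampling covariance matrix (10)
    return F, D2, grad, A

F, D2, grad, A = mc_estimates(np.array([0.0, 0.0]), N=4000)
```

At x = 0 the exact values are F(0) = n·d² = 0.5 and ∇F(0) = 0, so the estimates above should be close to these for a moderate sample size.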
3. The rule for iterative adjustment of sample size

The Monte-Carlo sample size N should be chosen to compute the estimators introduced in the previous section. Sometimes this sample size is fixed and sufficiently large to ensure the required accuracy of the estimates in all iterations of the optimization process. Very often this size is about 1000–1500 trials or more, and, if the number of optimization steps is large, the solution of an SP problem can require a great amount of computation (Jun Shao, 1989). Besides, it is well known that a fixed sample size, however large, suffices only to ensure convergence to some neighbourhood of the optimal point (see, e.g., Polyak, 1983; Sakalauskas, 1997). Note that there is no great need to estimate the functions with high accuracy at the start of the optimization, because then it suffices to evaluate only an approximate direction leading to the optimum. Therefore the samples should not be large at the beginning of optimization; their sizes should be adjusted so as to obtain estimates of the objective and constraint functions with the desired accuracy only at the moment of deciding that a solution has been found. As mentioned above, the concept of using more accurate estimators has already found application in semi-stochastic approximation methods. We develop this concept here in an iterative scheme of methods with a relative stochastic error (see Polyak, 1983), choosing at every iteration the next sample size inversely proportional to the square of the norm of the gradient estimate from the current iteration. It can be proved, under some conditions, that such a rule enables us to construct convergent stochastic methods of SP. Let x^+ ∈ R^n now be the solution of SP problem (1).
By virtue of the Kuhn–Tucker theorem (see, e.g., Bertsekas, 1982) there exist values λ^+ ≥ 0, ∇F_0(x^+), ∇F_1(x^+) such that

    ∇F_0(x^+) + λ^+ · ∇F_1(x^+) = 0,   λ^+ · F_1(x^+) = 0.                   (11)

In SP, the well-known gradient-type procedures of constrained optimization can be applied in the iterative search for the Kuhn–Tucker solution (Kall and Wallace, 1994; Ermolyev and Wets, 1988; Mikhalevitch et al., 1987). We analyze the peculiarities of realizing such methods by constructing a stochastic version of the well-known Arrow–Hurwicz–Uzawa procedure.

Theorem 1. Let the functions F_i : R^n → R, i = 0, 1, expressed as expectations (2), where ω ∈ Ω is an event from the probability space (Ω, Σ, P_x) and P_x is an absolutely continuous measure with density function p : R^n × Ω → R_+, be convex and twice differentiable. Assume that all the eigenvalues of the matrix of second derivatives ∇²_{xx} L(x, λ^+), ∀x ∈ R^n, are uniformly bounded and belong to the interval [m, M], m > 0, and, besides,

    |∇F_1(x) − ∇F_1(x^+)| ≤ (m/M) · |∇F_1(x^+)|,  ∀x ∈ R^n,  |∇F_1(x^+)| > 0,

where (x^+, λ^+) is a point satisfying the Kuhn–Tucker conditions (11). Assume, in addition, that for any x ∈ R^n and any number N ≥ 1 there is a possibility to obtain the estimates (7) and

    ∇̃_x F_i(x) = (1/N) Σ_{j=1}^N G_i(x, y^j),

where {y^j}_1^N is a sample of independent vectors identically distributed with density p(x, ·) : Ω → R_+, E G_i(x, ω) = ∇_x F_i(x), and the conditions of uniform boundedness of the variances hold:

    E( f_1(x, ω) − F_1(x) )² < d,   E| G_i(x, ω) − ∇F_i(x) |² < K,  ∀x ∈ R^n,  i = 0, 1.
Let now an initial point x^0 ∈ R^n, a scalar λ^0, and an initial sample size N^0 be given, and let the random sequence {x^t, λ^t}_{t=0}^∞ be defined according to

    x^{t+1} = x^t − ρ · ∇̃_x L(x^t, λ^t),
    λ^{t+1} = λ^t + ρ · a · F̃_1(x^t),                                        (12)

where the estimates (6), (9) are used, varying the sample size N^t according to the rule

    N^{t+1} ≥ C / ( ρ · |∇̃_x L(x^t, λ^t)|² ),                                (13)

ρ > 0, a > 0, C > 0 being certain constants. Then there exist positive values ρ̄, C̄ such that

    E( |x^t − x^+|² + |λ^t − λ^+|²/a + 1/N^t ) ≤ B · β^t,  t = 1, 2, …,

for certain values 0 < β < 1, B > 0, when ρ < ρ̄, C < C̄. The proof is given in Appendix A.

Remark. An interesting comment follows. As we can see, the error |x^t − x^+|² and the sample size N^t change at a linear rate, conditioned by the value 0 < β < 1, which depends mostly on the constant C in expression (13) and on the conditioning of the Hessian of the Lagrange function. It follows from the proof of the theorem that N^t / N^{t+1} ≈ β for large t. Then, by virtue of the formula for a geometric progression,

    Σ_{i=0}^t N^i ≈ N^t · Q / (1 − β),

where Q is a certain constant. Note now that the stochastic error of the estimates depends on the sample size at the moment of the stopping decision, say N^t, and, in its turn, so does the accuracy of the solution. Thus, the ratio of the total number of computations Σ_{i=0}^t N^i to the sample size at the stopping moment N^t can be considered as bounded by a certain constant that does not depend on this accuracy. Hence, if we have a certain resource to compute one value of the objective or constraint function with an admissible accuracy, then the optimization in fact requires only several times more computation. This enables us to construct stochastic methods for SP that are reasonable from a computational viewpoint. The procedure (12) and the rule (13) are developed further in Section 4.
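The rule (13) is straightforward to implement; the sketch below also clips the result to [N_min, N_max], as recommended for numerical realization in Section 4. The constants ρ, C, N_min, N_max here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def next_sample_size(grad_estimate, rho, C=100.0, n_min=50, n_max=10000):
    """Sample-size rule (13): N^{t+1} >= C / (rho * |grad L|^2),
    clipped to [n_min, n_max]. All constants are illustrative."""
    g2 = float(np.dot(grad_estimate, grad_estimate))
    if g2 == 0.0:
        return n_max  # vanishing gradient estimate: take the largest sample
    return int(min(max(np.ceil(C / (rho * g2)), n_min), n_max))
```

Far from the optimum the gradient estimate is large and small samples suffice; as the gradient estimate shrinks near the optimum, the rule forces ever larger samples, which is exactly what makes the optimality test at the stopping moment reliable.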
The choice of the best metric for computing the norm in (13), and of other parameters in stochastic gradient procedures of the kind considered, requires a separate study. The theorem proved is important in principle, and the proof can be extended without much difficulty to the case of unconstrained optimization as well as to optimization with several constraints (see also Sakalauskas, 1997).

4. The algorithm for SP by finite series of Monte-Carlo samples

While implementing the method developed in the previous section, we have to keep in mind that the performance and stopping of the computation in (12) and (13) can be decided only on the basis of finite series of
Monte-Carlo samples (5). In such a case, applying methods of the theory of statistical decisions is a constructive way of building an implementable method for SP. A possible decision on having found the optimal solution should be examined at each iteration of the optimization process. Namely, the decision could be made if, first, there is no reason to reject the hypothesis on the validity of the Kuhn–Tucker conditions (11) and, second, the objective and constraint functions are estimated with admissible accuracy. Since the distributions of the introduced estimators (6) and (9) can be approximated by the univariate and multivariate Gaussian laws when the Monte-Carlo sample size is sufficiently large (see, e.g., Bhattacharya and Rao, 1976; Box and Watson, 1962; Bentkus and Gotze, 1999), it is convenient to use the well-known methods of normal sample analysis for our purposes. Thus, the validity of the first optimality condition in (11) can be tested by means of the well-known multidimensional Hotelling T²-statistic (see, e.g., Krishnaiah and Lee, 1980). Namely, the optimality hypothesis can be accepted for some point x with significance μ if the following condition holds:

    (N − n) · (∇̃_x L)′ · Ã^{−1} · (∇̃_x L) / n ≤ Fish(μ, n, N − n),          (14)

where Fish(μ, n, N − n) is the μ-quantile of the Fisher distribution with (n, N − n) degrees of freedom, ∇̃_x L = ∇̃_x L(x, λ) is the estimate (9) of the Lagrange function gradient, Ã is the normalizing matrix (10) estimated at the point (x, λ), and N is the size of sample (5). Otherwise this hypothesis is rejected. Now we make two important remarks related to the statistical error of estimator (6) of the objective and constraint functions.
Since the real values of these functions remain unknown to us and we can operate only with their estimators, produced by statistical simulation, first, the upper confidence bound of the constraint function estimate has to be used for the performance of (12) and for testing the second optimality condition in (11), and, second, the sample size N has to be sufficiently large in order to estimate the confidence intervals of the objective and constraint functions with an admissible accuracy. It is convenient here to use the asymptotic normality and to approximate the respective confidence bounds by means of the sampling variance (7). Summarizing these considerations, we propose the following algorithm for SP by finite Monte-Carlo series. Assume some initial value λ^0 ≥ 0 and some initial point x^0 ∈ R^n to be given, a random sample (5) of a certain initial size N^0 to be generated at this point, and the Monte-Carlo estimates (6), (7), (9), and (10) to be computed. Let us find a solution such that the statistical hypothesis on the validity of the first Kuhn–Tucker condition (11) would not be rejected and the objective and constraint functions would be evaluated with permissible confidence intervals. Then, the following version of procedure (12) can be applied in the stochastic solving of (1):

    x^{t+1} = x^t − ρ · ∇̃_x L(x^t, λ^t),
    λ^{t+1} = max[ 0, λ^t + ρ · a · ( F̃_1(x^t) + η_β · D̃_1(x^t) ) ],        (15)

where ρ, a are the normalizing multipliers and η_β is the β-quantile of the standard normal distribution. Note that we introduce an estimate of the upper confidence bound F̃_1(x^t) + η_β · D̃_1(x^t), taking into account the statistical error of estimate (6) of the constraint function and using the normal approximation. As mentioned above, we can choose the constant C and the metric for computing the norm when the rule (13) is implemented. A possible way is to compute the estimate of the gradient norm in the metric induced by the sampling covariance matrix (10) and to set C = n · Fish(γ, n, N^t − n).
This is convenient for interpretation because, in such a case, the random error of the gradient does not exceed the gradient norm with probability approximately 1 − γ. Such a modification of (13) is as follows:

    N^{t+1} ≥ n · Fish(γ, n, N^t − n) / ( ρ · (∇̃_x L^t)′ · (Ã^t)^{−1} · (∇̃_x L^t) ).   (16)
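The optimality test (14) and the covariance-metric sample-size rule (16) can be sketched as follows, using `scipy.stats.f.ppf` for the Fisher quantile Fish(·); the defaults ρ, γ, μ, N_min, N_max are illustrative assumptions rather than values prescribed by the paper.

```python
import numpy as np
from scipy.stats import f as fisher

def optimality_test(grad, A, N, mu=0.95):
    """Hotelling T^2 test (14): accept the optimality hypothesis when
    (N - n) * grad' A^{-1} grad / n <= Fish(mu, n, N - n)."""
    n = grad.size
    t2 = (N - n) * grad @ np.linalg.solve(A, grad) / n
    return bool(t2 <= fisher.ppf(mu, n, N - n))

def sample_size_rule16(grad, A, N, rho, gamma=0.95, n_min=50, n_max=10000):
    """Sample-size rule (16): the gradient norm is computed in the metric
    induced by the sampling covariance matrix (10), with C = n * Fish(...)."""
    n = grad.size
    quad = grad @ np.linalg.solve(A, grad)
    n_new = n * fisher.ppf(gamma, n, N - n) / (rho * quad)
    return int(min(max(np.ceil(n_new), n_min), n_max))
```

At each iteration the gradient estimate (9) and matrix (10) are plugged in: a small T² value means the estimated gradient is statistically indistinguishable from zero, so stopping can be considered, while a large T² both rejects optimality and, through (16), keeps the next sample small.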
In numerical realization it is recommended to introduce minimal and maximal values N_min (usually ~20–50) and N_max to avoid large fluctuations of the sample size between iterations. Thus, procedure (15) is iterated, adjusting the sample size according to (13) or (16) and testing the optimality conditions at each iteration. If:
(a) criterion (14) does not contradict the hypothesis that the gradient of the Lagrange function equals 0 (first condition in (11));
(b) the constraint condition (second condition in (11)) holds with a given probability β:

    F̃_1(x^t) + η_β · D̃_1(x^t) ≤ 0;

(c) the estimated lengths of the confidence intervals of the objective and constraint functions do not exceed the admissible accuracies ε_i:

    2 η_{β_i} · D̃_i / √N ≤ ε_i,  i = 0, 1,

where η_{β_i} is the β_i-quantile of the standard normal distribution, then there are no reasons to reject the hypothesis that the optimum has been found. Therefore, there is a basis to stop the optimization and to make the decision that the optimum has been found with an admissible accuracy. If at least one of the three conditions is not satisfied, then the next sample is generated and the optimization is continued. As follows from Section 3, the optimization process should stop after generating a finite number of samples (5). Also note that, since the statistical testing of the optimality hypothesis is grounded on the convergence of the distribution of the sampling estimates to the Gaussian law, additional standard points could be introduced, considering the rate of convergence to the normal law following from the Berry–Esseen or large-deviation theorems. An extension of the developed approach to the case of several inequality constraints is related, first, with the consideration of joint confidence statements when the statistical error of the constraint estimates is taken into account.
The simplest way is to consider a vectorial case of (12) or (15) and, separately for each constraint, to compute the corresponding confidence bounds in (15) and in condition (c). The case of equality constraints reduces to the one considered here by replacing each equality with two inequalities. Such remarks should also be taken into consideration when applying our approach to other cases of SP, as well as to constrained optimization procedures differing from that of Arrow–Hurwicz–Uzawa.

5. Numerical study of the stochastic optimization algorithm

Let us discuss a numerical experiment to study the proposed algorithm. Since, in practice, typical functions are often of a quadratic character with some nonlinear disturbance in a neighbourhood of the optimal point, we consider a test example of SP with functions of this kind:

    F(x) ≡ E f_0(x + ω) → min,
    P( f_1(x + ω) ≤ 0 ) − p ≥ 0,                                              (17)

where

    f_0(y) = Σ_{i=1}^n ( a_i y_i² + b_i · (1 − cos(c_i · y_i)) ),   f_1(y) = Σ_{i=1}^n (y_i + 0.5),

y_i = x_i + ω_i, the ω_i are random and normally N(0, d²) distributed, d = 0.5, a = (8.00, 5.00, 4.30, 9.10, 1.50, 5.00, 4.00, 4.70, 8.40, 10.00), b = (3.70, 1.00, 2.10, 0.50, 0.20, 4.00, 2.00, 2.10, 5.80, 5.00), c = (0.45, 0.50, 0.10, 0.60, 0.35, 0.50, 0.25, 0.15, 0.40, 0.50), and n ≤ 10.
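The test example (17) is easy to reproduce. The sketch below codes f_0 and f_1 with the coefficient vectors given above and estimates F(x) and the constraint probability by plain Monte-Carlo simulation; only the simulation setup (sample sizes, seed) is our own choice.

```python
import numpy as np

rng = np.random.default_rng(2)

# Coefficients of test example (17) for n = 10 (truncated for smaller n).
a = np.array([8.00, 5.00, 4.30, 9.10, 1.50, 5.00, 4.00, 4.70, 8.40, 10.00])
b = np.array([3.70, 1.00, 2.10, 0.50, 0.20, 4.00, 2.00, 2.10, 5.80, 5.00])
c = np.array([0.45, 0.50, 0.10, 0.60, 0.35, 0.50, 0.25, 0.15, 0.40, 0.50])
d = 0.5

def f0(y):
    n = y.shape[-1]
    return np.sum(a[:n] * y**2 + b[:n] * (1 - np.cos(c[:n] * y)), axis=-1)

def f1(y):
    return np.sum(y + 0.5, axis=-1)

def estimate(x, N=1000):
    """Monte-Carlo estimates of F(x) = E f0(x + omega) and
    P(f1(x + omega) <= 0), with omega_i ~ N(0, d^2)."""
    y = x + d * rng.standard_normal((N, x.size))
    return f0(y).mean(), (f1(y) <= 0).mean()

F, prob = estimate(np.zeros(2))   # x+ = 0 is the optimal point for p = 0, n = 2
```

For n = 2 and x = 0 the exact value is F(0) = (a_1 + a_2)·d² + Σ b_i(1 − e^{−c_i²d²/2}) ≈ 3.37, which gives a quick sanity check on the simulation.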
Such an example may be treated as a stochastic version of the deterministic mathematical programming task with the objective function $f_0$ and the constraint function $f_1$, where the controlled variables are measured under some Gaussian error with variance $d^2$. Note that our task is not convex. The stopping criteria introduced above essentially depend on the rate of convergence of the distribution of the estimates (7) and (11) to the Gaussian law (see also Sakalauskas and Steishunas, 1993). We study the distribution of the statistic in criterion (14) by means of statistical simulation, solving the test task with $p = 0$, $n = 2$; the optimal point is known in this case: $x^+ = 0$. Let the gradients of the considered functions be evaluated numerically by the Monte-Carlo estimator following from (3). Thus, 400 Monte-Carlo samples of each size $N = (50, 100, 200, 500, 1000)$ were generated and the $T^2$-statistic in (14) was computed for each sample. The hypothesis on the difference of the empirical distribution of this statistic from the Fisher distribution was tested according to the criteria $\omega^2$ and $\Omega^2$ (see Bolshev and Smirnov, 1983). The value of the first criterion for $N = 50$ at the optimal point is $\omega^2 = 0.2746$ against the critical value 0.46 ($p = 0.05$), and that of the second one is $\Omega^2 = 1.616$ against the critical value 2.49 ($p = 0.05$). Besides, the hypothesis on the coincidence of the empirical distribution of the considered statistic with the Fisher distribution was rejected, according to the criteria $\omega^2$ and $\Omega^2$, at points differing from the optimal one (if $r = |x - x^+| \ge 0.1$). So the distribution of the multidimensional $T^2$-statistic can, in our case, be approximated sufficiently well by the Fisher distribution even with rather small samples ($N \approx 50$). Further, the dependencies of the stopping probability according to criterion (14) on the distance $r = |x - x^+|$ to the optimal point were studied.
These dependencies are presented in Fig. 1 (for confidence $\mu = 0.95$). The same dependencies are given in Fig. 2 for $N = 1000$ and $\mu = (0.90, 0.95, 0.99)$. So we see that, by adjusting the sample size, we are able to test the optimality hypothesis in a statistical way and to evaluate the objective and constraint functions with a desired accuracy.

Fig. 1. Stopping probability according to (14), $\mu = 0.95$.
Fig. 2. Stopping probability according to (14), $N = 1000$.
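The sample-size regulation behind rule (16) takes the next sample size inversely proportional to the squared norm of the current Monte-Carlo gradient estimate, clipped to $[N_{\min}, N_{\max}]$. A minimal sketch of that clipping logic (the scaling constant `rho` is a hypothetical tuning parameter, not the paper's notation):

```python
import numpy as np

def next_sample_size(grad_norm_sq, n_min=50, n_max=10000, rho=1.0):
    """Sketch of rule (16): sample size inversely proportional to the
    squared gradient-estimate norm, clipped to [n_min, n_max]."""
    n = int(np.ceil(rho / max(grad_norm_sq, 1e-12)))  # avoid division by zero
    return min(max(n, n_min), n_max)
```

Far from the optimum (large gradient) the rule keeps samples small and iterations cheap; near the optimum the gradient estimate shrinks and the sample grows, sharpening the statistical tests.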
Now, let us consider the results obtained for this example by iterating procedure (15) and changing the sample size according to rule (16), where $N^0 = N_{\min} = 50$ and $N_{\max}$ is chosen according to the restrictions on the confidence bounds of the objective and constraint functions. The initial data were chosen or varied as follows: $n = (2, 5, 10)$; probability in (17), $p = (0.0, 0.3, 0.6, 0.9)$; initial point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$, $x_i^0 = -1$, $1 \le i \le n$; multipliers in (15), $a = 0.1$, $q = 20$; probabilities in (b), (c) and (15), $\beta = \beta_0 = \beta_1 = 0.95$; confidence in (14), $\mu = 0.95$; probability in (16), $\gamma = 0.95$; lengths of the confidence intervals in (c), $\varepsilon_0 = 5\%$ (of $\tilde F(x^t)$) and $\varepsilon_1 = 0.05$. The optimization was repeated 400 times. Conditions (a)–(c) were eventually satisfied for all paths of optimization. Thus, a decision could be made on finding the optimum with an admissible accuracy for all paths (the sampling frequency of stopping after $t$ iterations, with confidence intervals, is presented in Fig. 3 for $p = 0.6$, $n = 2$). Minimal, average and maximal values of the number of iterations and of the total number of Monte-Carlo trials needed to solve the optimization task ($n = 2$) are presented in Table 1, depending on the probability $p$. The number of iterations and the total number of Monte-Carlo trials needed for stopping are presented in Table 2, depending on the dimension $n$ ($p = 0$, $n \ge 2$). The sampling estimate of the stopping point is $\tilde x^{\mathrm{stop}} = (0.006 \pm 0.053,\ -0.003 \pm 0.026)$ for $p = 0$, $n = 2$ (compare with $x^{\mathrm{opt}} = (0, 0)$). These results illustrate that the proposed algorithm can be successfully applied even when the objective and constraint functions are convex and smooth only in a neighborhood of the optimum.

Fig. 3. Frequency of stopping (with confidence interval).
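The overall loop combines an Arrow–Hurwicz-type primal-dual step with the adaptive sample size. The following sketch illustrates the idea on a generic problem under our own naming and parameter choices; it is not the paper's exact procedure (15)–(16):

```python
import numpy as np

def sp_iterate(grad_f0, grad_f1, f1, x0, rho=0.1, alpha=1.0, iters=200,
               n0=50, n_max=2000, rng=None):
    """Illustrative Arrow-Hurwicz-type loop with adaptive Monte-Carlo
    sample size: gradients are averaged over N noise draws, and N grows
    as the gradient estimate shrinks.  grad_f0, grad_f1, f1 take (x, w)
    for one noise realization w."""
    rng = rng or np.random.default_rng(0)
    x, lam, N = np.asarray(x0, float), 0.0, n0
    for _ in range(iters):
        w = rng.normal(0.0, 0.5, size=(N, len(x)))       # noise sample
        gL = np.mean([grad_f0(x, wi) + lam * grad_f1(x, wi) for wi in w],
                     axis=0)                              # Lagrangian gradient
        F1 = np.mean([f1(x, wi) for wi in w])             # constraint estimate
        x = x - rho * gL                                  # primal step
        lam = max(0.0, lam + rho * alpha * F1)            # projected dual step
        # next sample size ~ 1 / (rho * |gL|^2), clipped:
        N = int(min(max(n0, 1.0 / max(rho * (gL @ gL), 1e-9)), n_max))
    return x, lam
```

On a toy problem, minimizing $E(x+\omega)^2$ subject to the inactive constraint $E(x+\omega) - 1 \le 0$, the iterates approach $x = 0$ with the multiplier staying at zero.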
Table 1. $n = 2$.

  p      Number of iterations, t          Total number of trials, $\sum_t N_t$
         Min    Mean          Max         Min      Mean            Max
  0.0    6      11.5 ± 0.2    19          1029     2842 ± 90       7835
  0.3    4      11.2 ± 0.3    27          1209     4720 ± 231      18 712
  0.6    7      12.5 ± 0.3    29          1370     4984 ± 216      15 600
  0.9    10     31.5 ± 1.1    73          1360     13 100 ± 629    37 631

Table 2. $p = 0$, $n \ge 2$.

  n      Number of iterations, t          Total number of trials, $\sum_t N_t$
         Min    Mean          Max         Min      Mean            Max
  2      6      11.5 ± 0.2    19          1029     2842 ± 90       7835
  5      6      12.2 ± 0.2    21          1333     3696 ± 104      9021
  10     7      13.2 ± 0.2    27          1405     3930 ± 101      8668
The averaged dependencies of the objective function $E\tilde F_0^t$, the constraint $E\tilde F_1^t$, the sample size $EN^t$, the Lagrange multiplier $E\lambda^t$ and the solution point $Ex^t$ on the iteration number $t$ are given in Figs. 4–8 to illustrate the convergence and the behavior of the optimization process ($p = 0.6$, $n = 2$). One path of the optimization process is also shown in these figures to illustrate its stochastic character. Thus, the theoretical analysis and the numerical experiment show that the proposed approach enables us to solve SP problems with a sufficient accuracy using a reasonable amount of computations (5–20 iterations and 3000–10 000 total Monte-Carlo trials). If we keep in mind that the application of the Monte-Carlo procedure usually requires 1000–2000 trials for the statistical simulation and estimation of one value of the function, the optimization by our approach can require only 3–5 times more computation.

Fig. 4. Change of the objective function.
Fig. 5. Change of the constraint function.
Fig. 6. Change of the sample size.
Fig. 7. Change of the Lagrange multiplier.
Fig. 8. Change of the point of iteration: (×) average, (*) path.

6. Application to decision support in investment planning

Investment planning in many fields is related to the control of risk factors under uncertainty. It is often possible to describe this uncertainty by probabilistic models and thus to reduce the decision making problem in finance planning to SP problems. Namely, the objective function can have the meaning of average losses, which should be optimized under a constraint on the probability of the desired scenario of events. Let us consider, as a typical example, investment planning in gas delivery (see Guldman, 1983; Ermolyev and Wets, 1988). Omitting some details, the task is as follows. Choose $x = (x_1, \ldots, x_{12})$ and $z$ to minimize the expected costs of purchases, storage operations and the increment of storage capacity,
$$F(x, z) = \sum_{t=1}^{12} c_t \cdot x_t + c_0 \cdot E\sum_{t=1}^{12} |x_t - \omega_t| + c_{13} \cdot \max_t\{x_t\} + c_{14} \cdot z,$$
where $x_t$ is the amount of gas ordered from the pipeline in month $t = 1, 2, \ldots, 12$ and $z$ is the increment of the gas storage, subject to the condition on the probability of a reliable supply,
$$
\Phi(x, z) = p - P(A) = p - P\left\{\omega:\;
\begin{array}{l}
(x_t - \omega_t) - a_1\sum_{s=1}^{t-1}(x_s - \omega_s) \le b_1 + a_2\cdot z,\\
-(x_t - \omega_t) - a_3\sum_{s=1}^{t-1}(x_s - \omega_s) \le b_2 + a_4\cdot z,\\
0 \le \sum_{s=1}^{t}(x_s - \omega_s) \le b_3 + a_5\cdot z,
\end{array}
\; t = 1, 2, \ldots, 12\right\} \le 0,
$$
where $\omega_t$ is the actual gas consumption in month $t$, considered independent and normal $N(\mu_t, \sigma_t)$, $t = 1, 2, \ldots, 12$; $p$ is the probability of a reliable supply, $a_j > 0$, $b_i > 0$, $j = 1, 2, 3, 4, 5$, $i = 1, 2, 3$. More details and data can be found in Guldman (1983) or Ermolyev and Wets (1988). Using the technique of Section 1 we express $F$ and $\Phi$ as integrals and differentiate them:
$$\frac{\partial F}{\partial x_t} = c_t + c_{13}\cdot u_t + c_0\cdot E\left[\frac{(\omega_t - \mu_t)\cdot\sum_{s=1}^{12}|x_s - \omega_s|}{\sigma_t^2}\right],$$
$$\frac{\partial \Phi}{\partial x_t} = E\left[(\omega_t - \mu_t)\cdot(p - \chi_A(\omega))/\sigma_t^2\right],$$
where
$$u_t = \begin{cases} 1 & \text{if } x_t = \max_s\{x_s\},\\ 0 & \text{in the opposite case.}\end{cases}$$
The derivative of the objective function with respect to $z$ is found trivially. Using the method of simplicial approximation (see, e.g., Director, 1977), we can make sure that the derivative of the constraint function with respect to $z$ can be expressed as
$$\frac{\partial \Phi}{\partial z} = E\left[H(\omega)\cdot\chi_A(\omega)\cdot\left(T - \sum_{t=1}^{T}\frac{(x_t - \omega_t)(\omega_t - \mu_t)}{\sigma_t^2}\right)\right],$$
$T = 12$, $H(\omega) = 1/(a_j z + b_i)$, where $a_j$, $b_i$, $j = 2, 4, 5$, $i = 1, 2, 3$, are taken such that the expression
$$\max_t\left\{\max\left\{\frac{(x_t - \omega_t) - a_1\sum_{s=1}^{t-1}(x_s - \omega_s)}{a_2 z + b_1},\;
\frac{-(x_t - \omega_t) - a_3\sum_{s=1}^{t-1}(x_s - \omega_s)}{a_4 z + b_2},\;
\frac{\sum_{s=1}^{t}(x_s - \omega_s)}{a_5 z + b_3}\right\}\right\}$$
achieves its maximum for the respective $a_j$, $b_i$. Hence, the objective and the constraint functions and their gradients are expressed as expectations. Now we can estimate these expectations by the Monte-Carlo method and apply algorithm (15), (16) with the stopping rules (a)–(c).
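Both expectations are direct to estimate by simulation. A hedged sketch of the cost objective and of the paper's constraint-gradient expression (all function names are ours; `chi_A` stands for the indicator of the reliable-supply event and, in the real task, would close over $x$ and $z$):

```python
import numpy as np

def gas_cost(x, z, mu, sigma, c, c0, c13, c14, N, rng):
    """Monte-Carlo sketch of the expected-cost objective F(x, z):
    purchase costs, expected storage operations c0 * E sum_t |x_t - w_t|,
    peak charge c13 * max_t x_t and capacity cost c14 * z.  Monthly
    demands w_t are independent N(mu_t, sigma_t^2)."""
    w = rng.normal(mu, sigma, size=(N, len(x)))   # simulated consumption
    storage = np.abs(x - w).sum(axis=1).mean()
    return c @ x + c0 * storage + c13 * np.max(x) + c14 * z

def grad_constraint_x(mu, sigma, chi_A, p, N, rng):
    """Sketch of the estimator of dPhi/dx_t =
    E[(w_t - mu_t) * (p - chi_A(w)) / sigma_t^2], with chi_A a
    vectorized indicator mapping an (N, T) sample to N zeros/ones."""
    w = rng.normal(mu, sigma, size=(N, len(mu)))
    ind = chi_A(w).astype(float)
    return ((w - mu) * (p - ind)[:, None] / sigma**2).mean(axis=0)
```

In a one-dimensional check with $A = \{\omega \le 0\}$ and standard normal demand, the estimator concentrates near the normal density value $\varphi(0) \approx 0.399$.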
The results of the application are presented in Table 3 (costs and supply reliability are evaluated by the Monte-Carlo method; costs are in thousands of USD; the confidence interval of the cost estimates does not exceed 0.01, i.e., 10,000 USD).

Table 3

  Desired reliability, p    0.8              0.85             0.9              0.95             0.99
  Initial costs             550,337          553,730          558,001          564,340          576,200
  Reliability               (0.76, 0.78)     (0.82, 0.84)     (0.89, 0.90)     (0.94, 0.95)     (0.98, 0.99)
  Optimal costs             547,686          552,990          556,340          563,459          576,148
  Reliability               (0.806, 0.827)   (0.859, 0.877)   (0.901, 0.916)   (0.951, 0.961)   (0.991, 0.996)
  Probability of event B    (0.983, 0.987)   (0.985, 0.989)   (0.979, 0.985)   (0.984, 0.990)   (0.990, 0.993)

The initial approximation of the variables $x_t$, $z$ was taken from Ermolyev and Wets (1988), obtained under the following assumption (see Guldman, 1983):
$$P\{B\} \ge 0.999, \qquad \text{where } B = \left\{\omega:\; x_t \ge \omega_t,\ t = 1, 2, \ldots, 7;\ x_t \le \omega_t,\ t = 8, 9, 10, 11, 12\right\}.$$
This assumption allowed us to linearize the objective function and obtain an approximate solution analytically. Thus, we see from the table that the proposed SP method enables us to improve the initial analytical solution obtained under the linearity assumption: stochastic optimization makes it possible to decrease the costs and, at the same time, to guarantee the reliability of gas delivery. Note that our examination by the Monte-Carlo method has shown that this assumption is not acceptable at the point of the optimal solution (the last row of the table).

7. Discussion and conclusions

A stochastic iterative method has been developed to solve SP problems by a finite sequence of Monte-Carlo samples. The method is grounded on a stopping procedure and a rule for the iterative regulation of the size of the Monte-Carlo samples, taking the stochastic model risk into consideration. The proposed stopping procedure allows us to test the optimality hypothesis and to evaluate the confidence intervals of the objective and constraint functions in a statistical way. The numerical experiment has shown the acceptability of this procedure when the Monte-Carlo sample size is $N \ge 50$. The adjustment of the sample size, taken inversely proportional to the square of the norm of the Monte-Carlo estimate of the gradient, guarantees convergence a.s. at a linear rate. The numerical study and a practical example corroborate the theoretical conclusions and show that the procedures developed make it possible to solve SP problems with an admissible accuracy by means of an acceptable amount of computations (5–20 iterations and 3000–10 000 total Monte-Carlo trials). If we keep in mind that the application of the Monte-Carlo procedure usually requires 1000–2000 trials for the statistical simulation and estimation of one value of the function, the optimization by our approach can require only 3–5 times more computation.
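The convergence argument rests on Lemma 1 in the appendix that follows: the saddle-point matrix built from the Hessian block $A$ and the gradient vectors $b$, $c$ has eigenvalues with positive real parts whenever $|b - c| \le (m/M)|c|$. A numerical sanity check of that claim on a small instance (illustrative only, not part of the proof):

```python
import numpy as np

def saddle_eig_real_parts(A, b, c, alpha=1.0):
    """Assemble W = [[A, sqrt(alpha)*b], [-sqrt(alpha)*c^T, 0]] as in
    Lemma 1 and return the real parts of its eigenvalues."""
    n = len(c)
    W = np.zeros((n + 1, n + 1))
    W[:n, :n] = A
    W[:n, n] = np.sqrt(alpha) * np.asarray(b)
    W[n, :n] = -np.sqrt(alpha) * np.asarray(c)
    return np.linalg.eigvals(W).real
```

With $A = \mathrm{diag}(1, 2)$ (so $m = 1$, $M = 2$), $c = (1, 0)$ and any $b$ within distance $0.5$ of $c$, all real parts come out positive, as the lemma asserts.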
Appendix A

Lemma 1. Let the $(n+1)\times(n+1)$ matrix
$$W = \begin{pmatrix} A & \sqrt{\alpha}\cdot b \\ -\sqrt{\alpha}\cdot c^{\mathrm T} & 0 \end{pmatrix}$$
be given, where all the eigenvalues of the symmetric $n\times n$ matrix $A$ belong to the interval $[m, M]$, $0 < m \le M$, and $b, c \in R^n$ are vectors such that $|b - c| \le (m/M)|c|$, $|c| \ne 0$. Then the real part of any eigenvalue of $W$ is positive.

Proof. First, let us consider the case of a diagonal matrix: $A = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_n)$, where the values $\beta_i$ are arranged in increasing order, i.e., $\beta_1 = m$, $\beta_n = M$. By virtue of the well-known theorem on the determinant of a block matrix (Gantmacher, 1988),
$$\det(W - \beta\cdot I_{n+1}) = \left(-\beta + \alpha\cdot\sum_{i=1}^{n}\frac{b_i\cdot c_i}{\beta_i - \beta}\right)\cdot\prod_{i=1}^{n}(\beta_i - \beta).$$
Hence, the eigenvalues of $W$ are roots of the equation
$$\beta = \alpha\cdot\sum_{i=1}^{n}\frac{b_i\cdot c_i}{\beta_i - \beta}.$$
For $\beta \le 0$ we have
$$\sum_{i=1}^{n}\frac{b_i\cdot c_i}{\beta_i - \beta} = \sum_{i=1}^{n}\frac{c_i^2}{\beta_i - \beta} - \sum_{i=1}^{n}\frac{c_i\cdot(c_i - b_i)}{\beta_i - \beta} \ge \sum_{i=1}^{n}\frac{c_i^2}{\beta_i - \beta}\cdot\left(1 - \frac{M - \beta}{m - \beta}\cdot\frac{|c - b|}{|c|}\right) \ge 0.$$
Hence all the real eigenvalues of $W$ are positive. It is also simply proved that the real part of an eigenvalue is positive when its imaginary part is nonzero. Since any symmetric matrix can be transformed to a diagonal one by an orthonormal transformation without changing the eigenvalues and the vector norm, the proof of the lemma can be considered completed. □

Proof of Theorem 1. Let us introduce the Lyapunov function
$$V(x, \lambda, N) = |x - x^+|^2 + \frac{(\lambda - \lambda^+)^2}{\alpha} + \frac{1}{N}.$$
We have from the Lagrange formula (Dieudonne, 1960) that
$$\nabla_x L(x, \lambda) = (x - x^+)\cdot D(x) + (\lambda - \lambda^+)\cdot\nabla_x F_1(x), \tag{A.1}$$
where
$$D(x) = \int_0^1 \nabla_{xx}^2 L\left(x^+ + s\cdot(x - x^+), \lambda^+\right)\mathrm{d}s.$$
Let us consider the matrix
$$W(x) = \begin{pmatrix} D(x) & \sqrt{\alpha}\cdot\nabla_x F_1(x) \\ -\sqrt{\alpha}\cdot\int_0^1 \nabla_x F_1\left(x^+ + s\cdot(x - x^+)\right)\mathrm{d}s & 0 \end{pmatrix}.$$
It follows from Lemma 1 and the assumptions of the theorem that all the eigenvalues of the matrix $W(x)$ are positive and uniformly bounded. Let us introduce their minimal and maximal values: $0 < c_- \le c_+$. Then
$$\|I_{n+1} - \rho\cdot W(x)\| \le 1 - \rho\cdot c_- \tag{A.2}$$
for any $x \in R^n$, if $\rho \le 1/c_+$. Next, by virtue of the theorem assumption on the variances of the estimates of the gradients of the objective and constraint functions, we have
$$E\left|\widetilde{\nabla_x L}(x, \lambda) - \nabla_x L(x, \lambda)\right|^2 \le \frac{K\cdot(1 + \lambda)^2}{N} \le \frac{2\cdot K\cdot\left((1 + \lambda^+)^2 + (\lambda - \lambda^+)^2\right)}{N}. \tag{A.3}$$
Now, let $\{\mathcal F_t\}_{t=0}^{\infty}$ be the stream of $\sigma$-algebras generated by the sequence $\{x^t\}_{t=0}^{\infty}$ (Shyryajev, 1989) and denote
$$z^t = \left(x^t - x^+,\; \frac{\lambda^t - \lambda^+}{\sqrt{\alpha}}\right).$$
It follows from (A.1)–(A.3), (12), (13) and the assumptions of the theorem that
$$
\begin{aligned}
E\left(V(x^{t+1}, \lambda^{t+1}, N^{t+1})\mid\mathcal F_t\right)
&= E\left(\left|x^t - x^+ - \rho\cdot\widetilde{\nabla_x L}(x^t, \lambda^t)\right|^2
+ \frac{\left(\lambda^t - \lambda^+ + \rho\cdot\alpha\cdot\tilde F_1(x^t)\right)^2}{\alpha}
+ \rho\cdot\frac{|\widetilde{\nabla_x L}(x^t, \lambda^t)|^2}{C}\right)\\
&\le \left|\left(I_{n+1} - \rho\cdot W(x^t)\right)\cdot z^t\right|^2
+ \frac{\rho}{C}\cdot|\nabla_x L(x^t, \lambda^t)|^2
+ \rho\cdot\left(\rho + \frac{1}{C}\right)\cdot E\left|\widetilde{\nabla_x L}(x^t, \lambda^t) - \nabla_x L(x^t, \lambda^t)\right|^2\\
&\quad + \rho^2\cdot\alpha\cdot E\left(\tilde F_1(x^t) - F_1(x^t)\right)^2\\
&\le (1 - \rho\cdot c_-)^2\cdot|z^t|^2 + \frac{\rho\cdot c_+^2}{C}\cdot|z^t|^2
+ \frac{2K\cdot\rho\cdot(\rho + 1/C)\cdot(\lambda^t - \lambda^+)^2}{N^t}
+ \frac{\rho\cdot\left(2K\cdot(\rho + 1/C)\cdot(1 + \lambda^+)^2 + \alpha\cdot\rho\cdot d\right)}{N^t}\\
&\le \left(1 - \rho\cdot\left(2c_- - \rho\cdot A_1 - \frac{A_2}{C}\right)\right)\cdot V(x^t, \lambda^t, N^t),
\end{aligned}
$$
where
$$A_1 = c_-^2 + \alpha\cdot d + 2K\left(1 + (1 + \lambda^+)^2\right), \qquad A_2 = c_+^2 + 2K\cdot\left(1 + (1 + \lambda^+)^2\right).$$
It is easy to see that
$$\xi = 1 - \rho\cdot\left(2c_- - \rho\cdot A_1 - \frac{A_2}{C}\right) \le 1 - \rho\cdot c_- < 1, \qquad \text{if } 0 < \rho \le \bar\rho,\ C \ge \bar C,$$
where
$$\bar\rho = \min\left\{\frac{1}{c_+}, \frac{c_-}{2A_1}\right\}, \qquad \bar C = \frac{2A_2}{c_-}.$$
The proof of the theorem is completed. □

References

Bentkus, V., Gotze, F., 1999. Optimal bounds in non-Gaussian limit theorems for U-statistics. Annals of Probability 27 (1), 454–521.
Bertsekas, D.P., 1982. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York.
Bhattacharya, R.N., Rao, R.R., 1976. Normal Approximation and Asymptotic Expansions. Wiley, New York.
Bolshev, L.N., Smirnov, N.V., 1983. Tables of Mathematical Statistics. Nauka, Moscow (in Russian).
Box, G., Watson, G., 1962. Robustness to non-normality of regression tests. Biometrika 49, 93–106.
Dieudonne, J., 1960. Foundations of Modern Analysis. Academic Press, New York.
Dippon, J., 1998. Globally convergent stochastic optimization with optimal asymptotic distribution. Journal of Applied Probability 35 (2), 395–402.
Director, S., 1977. The simplicial approximation to design centering. IEEE Transactions on Circuits and Systems 24 (7), 363–372.
Ermolyev, Ju., 1976. Methods of Stochastic Programming. Nauka, Moscow (in Russian).
Ermolyev, Yu., Norkin, I., 1995.
On nonsmooth problems of stochastic systems optimization. WP-95-96, IIASA, A-2361 Laxenburg, Austria.
Ermolyev, Yu., Wets, R., 1988. Numerical Techniques for Stochastic Optimization. Springer, Berlin.
Gantmacher, F.R., 1988. Matrix Theory. Nauka, Moscow (in Russian).
Guldman, J., 1983. Supply, storage and service reliability decisions by gas distribution utilities: A chance-constrained approach. Management Science 29 (8), 884–906.
Kall, P., Wallace, S.W., 1994. Stochastic Programming. Wiley, Chichester.
Krishnaiah, P.R., Lee, J.C., 1980. Handbook of Statistics, vol. 1: Analysis of Variance. North-Holland, Amsterdam.
Mikhalevitch, V.S., Gupal, A.M., Norkin, V.I., 1987. Methods of Nonconvex Optimization. Nauka, Moscow (in Russian).
Polyak, B.T., 1983. Introduction to Optimization. Nauka, Moscow (in Russian).
Prekopa, A., 1995. Stochastic Programming. Kluwer Academic Publishers, Dordrecht.
Rubinstein, R., 1983. Smoothed functionals in stochastic optimization. Mathematics of Operations Research 8, 26–33.
Sakalauskas, L., 1992. System for statistical simulation and optimization of linear hybrid circuits. In: Proceedings of the 6th European Conference on Mathematics in Industry (ECMI'91), August 27–31, 1991, Limerick, Ireland. Teubner, Stuttgart, pp. 259–262.
Sakalauskas, L., 1997. A centering by the Monte-Carlo method. Stochastic Analysis and Applications 15 (4), 615–627.
Sakalauskas, L., 2000. Nonlinear stochastic optimization by the Monte-Carlo method. Informatica 11 (4), 455–468.
Sakalauskas, L.L., Steishunas, S., 1993. Stochastic optimization method based on the Monte-Carlo simulation. In: Proceedings of the International AMSE Conference on Applied Modelling and Simulation, Lviv (Ukraine), September 30–October 2, 1993. AMSE, New York, pp. 19–23.
Shao, J., 1989. Monte-Carlo approximations in Bayesian decision theory. Journal of the American Statistical Association 84 (407), 727–732.
Shapiro, A., 1989. Asymptotic properties of statistical estimators in stochastic programming. The Annals of Statistics 17 (2), 841–858.
Shyryajev, A., 1989. Probability. Nauka, Moscow (in Russian).
Uriasyev, S.P., 1990. Adaptive Algorithms of Stochastic Optimization and Theory of Games. Nauka, Moscow (in Russian).