On Inexact Newton Directions in
Interior Point Methods for Linear
         Optimization




         Ghussoun Al-Jeiroudi




          Doctor of Philosophy
         University of Edinburgh

                  2008
Declaration
I declare that this thesis was composed by myself and that the work contained
therein is my own, except where explicitly stated otherwise in the text.

                                                     (Ghussoun Al-Jeiroudi)
Abstract

In each iteration of the interior point method (IPM) at least one linear system
has to be solved. The main computational effort of IPMs consists in the solution
of these linear systems. Solving them with a direct method becomes very
expensive for large-scale problems.
   In this thesis, we have been concerned with using an iterative method for
solving the reduced KKT systems arising in IPMs for linear programming.
The augmented system form of this linear system has a number of advan-
tages, notably a higher degree of sparsity than the normal equations form.
We design a block triangular preconditioner for this system which is con-
structed by using a nonsingular basis matrix identified from an estimate of
the optimal partition in the linear program. We use the preconditioned con-
jugate gradients (PCG) method to solve the augmented system. Although
the augmented system is indefinite, short recurrence iterative methods such
as PCG can be applied to indefinite systems in certain situations. This ap-
proach has been implemented within the HOPDM interior point solver.
   The KKT system is solved approximately. Therefore, it becomes neces-
sary to study the convergence of IPM for this inexact case. We present the
convergence analysis of the inexact infeasible path-following algorithm, prove
the global convergence of this method and provide complexity analysis.
Acknowledgements

I would like to express my sincere thanks to Professor Jacek Gondzio. I can
honestly say I have been extremely fortunate to have him as my supervisor.
He has been my encyclopaedia of research knowledge. I would like to thank
him for giving me this opportunity and having belief in me.
   I would like to thank Dr. Julian Hall for giving me the opportunity to
work in programming with him. I have learnt a lot from him. I have been
honoured to work with such an enlightened individual.
   I would also like to thank all who have given me motivation and helped me
throughout my Ph.D. Thanks to my friends who have shared with me hard
moments as well as beautiful moments. I would like to thank all my friends
who have introduced me to many different cultures and have contributed to
an experience that I will never forget.
   The study could not have taken place without a sponsor. I would like to
acknowledge the University of Damascus for sponsoring me throughout my
Ph.D.
   I would also like to take this opportunity and thank my family, for their
love and support in all my pursuits in life.
Contents

1 Introduction                                                               7
  1.1   Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
  1.2   Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
  1.3   The structure of the thesis . . . . . . . . . . . . . . . . . . . . 17
  1.4   Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2 Fundamentals                                                              20
  2.1   The Interior Point Method . . . . . . . . . . . . . . . . . . . . 20
        2.1.1   The IPM for linear programming . . . . . . . . . . . . 20
        2.1.2   The Primal-Dual Interior Point Algorithms . . . . . . . 22
  2.2   Newton method . . . . . . . . . . . . . . . . . . . . . . . . . . 28
        2.2.1   The convergence of Newton method . . . . . . . . . . . 29
        2.2.2   Termination of the iteration . . . . . . . . . . . . . . . 30
        2.2.3   Error in the function and derivative . . . . . . . . . . . 30
  2.3   Inexact Newton method . . . . . . . . . . . . . . . . . . . . . 31
        2.3.1   The convergence of Inexact Newton Method . . . . . . 31
  2.4   Methods for solving a linear system . . . . . . . . . . . . . . . 33
        2.4.1   Sparse Matrices . . . . . . . . . . . . . . . . . . . . . . 33
        2.4.2   Direct Methods . . . . . . . . . . . . . . . . . . . . . . 35
                2.4.2.1 Gaussian elimination . . . . . . . . . . . . . . 35




                2.4.2.2 Cholesky factorisation . . . . . . . . . . . . . . 36
        2.4.3   Iterative Methods . . . . . . . . . . . . . . . . . . . . . 36
                2.4.3.1 Stationary Iterative Methods . . . . . . . . . . 37
                         a. Jacobi Method . . . . . . . . . . . . . . . 37
                         b.   Gauss-Seidel Method . . . . . . . . . . . 37
                         c. Arrow-Hurwicz and Uzawa Methods . . . 38
                2.4.3.2 Krylov Subspace Methods . . . . . . . . . . . . 39
                         a.   Conjugate Gradient Method . . . . . . . 40
                         b.   GMRES Method . . . . . . . . . . . . . . 45
                         c.   BiConjugate Gradient Method . . . . . . 47
                         d.   MINRES and SYMMLQ Method . . . . . 48
        2.4.4   Null Space Methods . . . . . . . . . . . . . . . . . . . 48

3 The PCG Method for the Augmented System                                   52
  3.1   Preconditioner . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
        3.1.1   Solving equations with P . . . . . . . . . . . . . . . . . 65
  3.2   Spectral analysis . . . . . . . . . . . . . . . . . . . . . . . . . 66
  3.3   The PCG method for nonsymmetric indefinite system . . . . . 71
        3.3.1   The convergence of the PCG method . . . . . . . . . . 75
  3.4   Identifying and factorising the matrix B . . . . . . . . . . . . 82
        3.4.1   Identifying the columns of B via Gaussian elimination       83

4 Inexact Interior Point Method                                             86
  4.1   The residual of inexact Newton method . . . . . . . . . . . . . 90
  4.2   Convergence of the IIPF Algorithm . . . . . . . . . . . . . . . 93
        4.2.1   Inexact Infeasible Path-Following Algorithm . . . . . . 94

5 Numerical Results                                                       108


6 Conclusions      119

7   Bibliography   122
Chapter 1

Introduction

Interior point methods constitute the core of many popular solvers for linear
and nonlinear optimization. In linear programming however, that was not
always the case due to the total dominance of the simplex method. The
simplex method was invented by Dantzig in 1947. It is an iterative technique,
where the iterates move from vertex to vertex until an optimal vertex is found.
The simplex method may visit every vertex of the feasible polyhedron. That
makes the complexity result of this method poor: the worst-case complexity
of the simplex method is exponential in the problem dimension. Accordingly,
there was great interest in finding a method with polynomial complexity.
In 1984 Karmarkar presented a new polynomial-time algorithm for linear
programming. He claimed to be able to solve linear programs up to 50 times
faster than the simplex method. That was the start of the “interior point
revolution” [48], which like many other revolutions, includes old ideas that
are rediscovered or seen in a different light, along with genuinely new ones.
See [3, 27, 76].
   An interior point method (IPM for short) is a powerful tool to solve
linear, quadratic and nonlinear programming problems. In this thesis we are




concerned with the use of primal-dual interior point methods to solve large-
scale linear programming problems. A primal-dual method is applied to the
primal-dual formulation of the linear program


            Primal                                         Dual

    (P )    min           cT x                       (D)   max     bT y
            s.t.          Ax = b,                           s.t.   AT y + s = c,
                          x ≥ 0;                                   y free, s ≥ 0,

where A ∈ Rm×n , x, s, c ∈ Rn and y, b ∈ Rm . x, y and s are primal, dual
and slack variables respectively. We assume that m ≤ n and the matrix
A has full row rank. Primal-dual techniques are usually faster and more
reliable than pure primal or pure dual approaches [3, 38, 77]. In order to
solve problem (P), we need to find the solution of the Karush-Kuhn-Tucker
(KKT) optimality conditions:


                                      Ax − b = 0
                                    AT y + s − c = 0
                                                                                    (1.1)
                                       XSe = 0
                                       (x, s) ≥ 0.

where X = diag(x), S = diag(s) and e ∈ Rn is the vector of all ones. Interior
point methods approach the optimal solution by moving through the interior
of the feasible region. This is done by introducing a central path C associated
with a parameter τ > 0. The central path C is an arc of strictly feasible
points, which is defined as


               C = {(x, y, s) ∈ F0 : xi si = τ for all i = 1, ..., n},


where F0 is the primal-dual strictly feasible set defined by


               F0 = {(x, y, s) : Ax = b, AT y + s = c, (x, s) > 0}.




    The KKT conditions are replaced by the following conditions:


                                   Ax − b = 0
                                AT y + s − c = 0
                                                                         (1.2)
                                   XSe = τ e
                                   (x, s) > 0.

These conditions differ from the KKT conditions only in the term τ e on the
right hand side of the third equation and in the requirement for (x, s) to be
strictly positive. The central path C is well defined because the system (1.2)
has a unique solution for each τ > 0. Furthermore, if F0 is nonempty, the points
on the central path C converge to a primal-dual solution of the linear program
(P) as τ converges to zero. The parameter τ is chosen equal to or smaller than
the current barrier parameter µ = xT s/n; the target value τ = σµ is used,
where σ ∈ [0, 1] is the centering parameter. See [77].
    The previous system (1.2) can be rewritten as

          F(t) = \begin{bmatrix} Ax - b \\ A^T y + s - c \\ XSe - \sigma\mu e \end{bmatrix} = 0, \qquad x > 0, \; s > 0,        (1.3)

where t = (x, y, s).
    Most primal-dual algorithms take Newton steps toward points on central


path C for which µ > 0, where the direction at each iteration is computed
according to




                                   F ′ (t)∆t = −F (t),                        (1.4)


where F ′ (t) is the derivative of F (t). That yields

          \begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S & 0 & X \end{bmatrix}
          \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta s \end{bmatrix}
          = - \begin{bmatrix} Ax - b \\ A^T y + s - c \\ XSe - \sigma\mu e \end{bmatrix}.        (1.5)

In computational practice, (1.5) is reduced: after substituting


                          ∆s = −X −1 S∆x − s + σµX −1 e,                    (1.6)


in the second row we get the following symmetric indefinite system of linear
equations, usually called the augmented system

          \begin{bmatrix} -\Theta^{-1} & A^T \\ A & 0 \end{bmatrix}
          \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
          = \begin{bmatrix} f \\ g \end{bmatrix},        (1.7)

where Θ = XS −1 , f = AT y − c + σµX −1 e and g = Ax − b. In many
implementations, (1.7) is further reduced to the normal equations form


                                   AΘAT ∆y = AΘf + g.                       (1.8)
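To make this reduction concrete, the following sketch (in Python, with dense matrices purely for illustration; the function name and data layout are assumptions, not taken from any particular solver) forms the augmented system (1.7) and the normal equations (1.8) from the current iterate.

```python
import numpy as np

def reduced_systems(A, b, c, x, y, s, sigma):
    """Form the augmented system (1.7) and normal equations (1.8).

    Dense sketch for illustration; practical IPM codes keep A sparse and
    solve (1.7) or (1.8) with sparse factorisations or iterative methods.
    """
    m, n = A.shape
    mu = x @ s / n                    # barrier measure mu = x^T s / n
    theta = x / s                     # diagonal of Theta = X S^{-1}

    f = A.T @ y - c + sigma * mu / x  # f = A^T y - c + sigma*mu*X^{-1} e
    g = A @ x - b                     # g = A x - b

    # Augmented system (1.7): [[-Theta^{-1}, A^T], [A, 0]] [dx; dy] = [f; g]
    K = np.block([[np.diag(-1.0 / theta), A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([f, g])

    # Normal equations (1.8): (A Theta A^T) dy = A Theta f + g
    N = A @ (theta[:, None] * A.T)
    rhs_ne = A @ (theta * f) + g
    return K, rhs, N, rhs_ne

# Once dy is known, the remaining components follow from
#   dx = Theta (A^T dy - f)   and   ds = -X^{-1} S dx - s + sigma*mu*X^{-1} e,  (1.6)
```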


1.1        Motivation
The goal of this thesis is to explore how existing techniques in the areas of
numerical analysis and linear algebra can be refined and combined into a
new inexact Newton iteration to be employed
in interior point methods. We are interested in using the preconditioned
conjugate gradient method to solve the augmented system (1.7) and studying
the convergence behaviour of the resulting interior point algorithm.
    In each iteration of interior point methods, one of the linear systems (1.7)
or (1.8) has to be solved. The main computational effort of an interior point
iteration is the solution of these linear systems. Accordingly, in recent years
extensive research has been devoted to developing techniques for solving these
systems. In chapter 2 we survey some of the popular solution methods for
these linear systems.
    Historically, the normal equations system (1.8) was solved directly, be-
cause this system is symmetric and positive definite and its dimension is
smaller compared to the augmented system [52, 73]. In [24, 31, 32, 35],
Cholesky factorisation is used to factorise the normal equations matrix into
a lower triangular matrix multiplied with its transpose, then forward and
backward substitutions are used to solve the normal equations. In order to
speed up solving a linear system by Cholesky factorisation, a reordering for
sparsity is required. There are two famous heuristic orderings, the minimum
degree and the minimum local fill-in orderings, see [24, 31, 32, 35].
    The size of optimization problems has been increasing dramatically. Solv-
ing the linear systems (1.7) and (1.8) with a direct method is often very dif-
ficult for large problems, even when ordering to exploit the sparsity is taken
into consideration. This is due to three main reasons. Firstly, the normal
equations (1.8) may easily get dense even though the constraint matrix A is


not. Secondly, although the augmented system is usually sparse for a sparse
constraint matrix, it is nevertheless, indefinite. Finally, the linear systems
(1.7) and (1.8) become extremely ill-conditioned as the IPM approaches the
solution, which leads to numerical instability. These difficulties make many
researchers interested in finding alternative techniques for solving the linear
systems (1.7) and (1.8). The idea was to use an iterative method to solve
these linear systems. Iterative methods however, usually fail to solve these
systems without preconditioning. The term preconditioning refers to trans-
forming a linear system into another system with more favorable properties
for iterative solution [75]. Therefore, there is an urgent need for designing
good preconditioners, as a good preconditioner is the key ingredient for solv-
ing a linear system iteratively. That makes a significant number of researchers
tackle this issue [10, 28, 44, 45].
    For the same reasons as above, the normal equations system is the natural
candidate to be solved with an iterative method. As the system is sym-
metric and positive definite, the preconditioned conjugate gradient (PCG)
method [42] is an appropriate iterative method to solve this system. The
PCG method is one of the most popular iterative methods, because it is a
short recurrence method and it has strong convergence properties, see sec-
tion 2.4. In [15, 42, 45, 54, 55], the PCG method is used to solve the normal
equations. The preconditioners in [45, 54, 55] are the incomplete Cholesky
factorisation of the normal equations matrix. The incomplete Cholesky fac-
torisation was proposed by Meijerink and Van Der Vorst (1977) [56] to be
used with symmetric M-matrices. There are two strategies for identi-
fying the position of the nonzero elements in this factorisation: the fixed fill-in
strategy and the drop-tolerance strategy, see [12]. These types of precondi-
tioner do not always work as well as expected. However, they are constructed


by using fascinating techniques of linear algebra. These preconditioners are
effective in the early stage of IPM, but they start to struggle in the final
iterations. This is due to the extreme ill-conditioned nature of this system
in the final iterations of IPM. Therefore, it is necessary to design a precon-
ditioner after understanding the nature of the problem, in particular at the
final iterations of IPM.
    We notice that iterative methods struggle to solve the linear systems in
the final iterations of an IPM, due to the extreme ill-conditioning of these
systems. Therefore we are concerned with finding an iterative approach to
solve these linear systems efficiently in the final iterations of an IPM. In this
thesis we will convert the disadvantages of the final iterations of IPM into
an advantage, and we will construct our preconditioner for the augmented
system (1.7) by exploiting the issues that lead to the ill-conditioning of this
system.
    There are many important reasons why we choose to work with the aug-
mented system. The first reason is that the augmented system is sparser
compared with the normal equations. Factoring the augmented system (1.7)
often produces significant savings in the number of nonzero entries over fac-
toring the normal equations. The existence of a dense column in the con-
straint matrix A results immediately in a dense normal equations matrix.
For an example of such a situation, see [3, 24] and the references therein.
Compared with Cholesky factorisation for the normal equations, the aug-
mented system factorisation enjoys an additional degree of freedom resulting
from the ability to interchange pivots between diagonal elements of Θ and
diagonal elements of the already filled (2, 2) block in (1.7). We aim to exploit
these advantages when we construct our preconditioner for the augmented
system.


    The second reason is that the augmented system may have a better con-
dition number compared to the normal equations, after suitable scaling as
suggested in [4]. The ill-conditioning in these systems is due to the matrix
Θ, since some of its elements move toward zero and the others move to-
ward infinity. The position of Θ in the augmented system makes it easier
to control the ill-conditioning of the augmented system when designing a
preconditioner.
    The final reason comes from the analysis by Oliveira and Sorensen [60]
who propose a preconditioner for the augmented system (1.7), and then re-
duce the preconditioned system to positive definite normal equations, al-
lowing them to use the conjugate gradients method to solve (1.8). They
show in [60] that all preconditioners for the normal equations system have
an equivalent for the augmented system, while the converse is not true. More
precisely, they show that whole classes of (different) preconditioners for
the augmented system can result in the same preconditioner for the nor-
mal equations. We consider this to be a strong argument for constructing a
preconditioner for the augmented system.



1.2        Contributions
The contributions of this research are as follows.
    First, we design a block triangular preconditioner for the augmented sys-
tem (1.7). To construct this preconditioner, we partition the constraint ma-
trix A into two matrices. The first is a nonsingular matrix of size m, while
the other contains the remaining columns. The idea is to use the basic
and nonbasic partition employed in the simplex method, with one important
difference: in the simplex method one has exactly m basic and n − m nonbasic


variables at each iteration, while in an interior point method this holds only
at the optimal solution. Such a partition therefore becomes clearer in the
final iterations of the interior point method, where we suggest using our
preconditioner. The non-
singular matrix in our partition represents an approximation of the basic part
of the variables. After designing this preconditioner, we perform a spectral
analysis of the preconditioned matrix. We also show that the preconditioned
matrix has n + m − p unit eigenvalues and the remaining eigenvalues are
positive and greater than or equal to one, where p is the rank of the second matrix
of the partition of A.
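As a rough illustration of such a partition, the sketch below splits the columns of A using the simple heuristic of taking the columns with the largest entries of Θ as candidates for the basic part; this heuristic and the function name are assumptions for the example only, and Section 3.4 describes how the matrix B is actually identified and factorised in this thesis.

```python
import numpy as np

def split_basic_nonbasic(A, x, s, m=None):
    """Split columns of A into an (approximate) basic part B and the rest N.

    Heuristic sketch only: columns with the largest theta_j = x_j / s_j are
    taken as candidates for the basic part; a real implementation must also
    ensure B is nonsingular (e.g. via Gaussian elimination, Section 3.4).
    """
    m = A.shape[0] if m is None else m
    theta = x / s
    order = np.argsort(-theta)            # largest theta_j first
    basic = np.sort(order[:m])            # indices of candidate basic columns
    nonbasic = np.sort(order[m:])
    return A[:, basic], A[:, nonbasic], basic, nonbasic
```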
    We propose a preconditioner for the augmented system and go a step fur-
ther than in [60]. Instead of reducing the augmented system to normal equa-
tions and then applying an iterative method, we use the preconditioned con-
jugate gradients method to solve the indefinite system (1.7). We are aware
of the disadvantages associated with applying the PCG method to indef-
inite systems [26]. However, we are motivated by the recent analyses of
Lukšan and Vlček [51] and Rozložník and Simoncini [65] showing that short
recurrence iterative methods such as conjugate gradients can be applied to
indefinite systems in certain situations. We show in particular that the anal-
ysis of [65] may be applied to the preconditioner proposed in this thesis. We
prove that the PCG method, when applied to the indefinite system (1.7)
preconditioned with our proposed preconditioner, converges as in the case of
a symmetric positive definite system. The convergence of the PCG method
is proved by showing that the error term and the residual converge to zero.
The error and the residual bounds are given by Theorem 3.3.4 and Theorem
3.3.5 respectively, and are analogous to those for symmetric positive definite matrices.
    We have implemented this iterative approach in the final iterations of the
interior point solver HOPDM when the normal equations system becomes


ill-conditioned. The implementation within HOPDM shows remarkable im-
provement on a series of problems, see the numerical results in Chapter 5.
    A consequence of using an iterative method to solve the linear systems
which arise in interior point methods is that the search direction is computed
approximately. Hence, instead of the pure Newton iteration (1.4), we now
have the following


                          F ′ (tk )∆tk = −F (tk ) + rk ,


which is an inexact Newton iteration. This causes a major difference in an
interior point algorithm, whose convergence is proved under the assumption
that the search directions are calculated exactly. Our final contribution is the
convergence analysis of an interior point algorithm with our specific inexact
Newton direction.
    We use the PCG method to solve the augmented system preconditioned
with a block triangular matrix P . This yields a specific inexact interior point
method. In this thesis we focus on the convergence analysis of one interior
point algorithm for this inexact case. This algorithm is the infeasible path-
following (IPF) algorithm. For the inexact case, we refer to this algorithm
as the inexact infeasible path-following (IIPF) algorithm.
    We prove global convergence and provide a complexity result for the IIPF
algorithm. We design a suitable stopping criterion for the PCG method. This
plays an important role in the convergence of the IIPF algorithm. This stop-
ping criterion allows a low accuracy when the current iterate is far from the
solution. We impose some conditions on the forcing term of the inexact New-
ton method in order to prove the convergence of the IIPF algorithm. Note
that the same analysis can be used in the cases where the augmented system
is solved iteratively, provided that the residual of this iterative method has


a zero block in its second component, corresponding to the (2, 2) block in (1.7),
so that r = [r1 , 0]. Thus this approach carries over to cases like [65],
for example.
    The original results presented in this thesis have been the basis for two
papers that have been accepted for publication, jointly with Jacek Gondzio
and Julian Hall [2], and with Jacek Gondzio [1].



1.3        The structure of the thesis
This thesis is structured as follows. In Chapter 2, we introduce and formalise
the primal-dual interior point method for linear programming. Also in this
chapter we present some of the well known feasible and infeasible interior
point algorithms. Moreover, Chapter 2 reviews the convergence behaviour
of Newton and inexact Newton methods. Furthermore, in this chapter we
discuss several well known methods to solve a linear system. We introduce
briefly a few direct methods and discuss extensively several iterative methods.
As in this thesis we are concerned with the use of an iterative method to solve
the linear systems which arise from IPMs, we mainly focus on the Krylov
subspace methods in this chapter.
    In Chapter 3 firstly, we present preconditioners for the augmented sys-
tem which have been constructed in the last few years. Secondly, we propose
our new block triangular preconditioner and we perform a spectral analysis of
this preconditioner. Moreover, in this chapter we take a closer look at the be-
haviour of conjugate gradients for the indefinite system: we follow [65] in the
analysis of our preconditioner. Furthermore, we prove that the convergence
of the PCG method applied to the indefinite system (1.7) preconditioned
with the proposed preconditioner, is similar to the convergence of the PCG


method applied to a positive definite system. Finally, we discuss the issues
involved in the identification of a suitable subset of columns to produce a
well-conditioned matrix.
    In Chapter 4 we compute the residual of the inexact Newton method and
choose a suitable stopping criterion for the PCG method which is appropriate for
the convergence of the inexact Newton method. In addition in this chapter
we perform the convergence analysis and provide the complexity result for
the IIPF Algorithm.
    We have implemented the conjugate gradients method with the indefi-
nite preconditioner in the context of the HOPDM interior point solver and
we have applied it to solve a number of medium and large-scale linear pro-
gramming problems. In Chapter 5, we discuss our computational experience.
In Chapter 6 we draw our conclusions and discuss possible future develop-
ments.



1.4        Notations
Throughout the thesis, we use the following notation. By R we denote the
set of real numbers. For a natural number n, the symbol Rn denotes the set
of vectors with n components in R. Greek letters denote scalars, lower-case
letters denote vectors and upper-case letters denote matrices. The component
in the ith row and jth column of the matrix A is denoted by aij . The iden-
tity matrix will be denoted by I, a subscript will determine its dimension
when it is not clear from the context. The symbol ‖·‖ represents the Euclidean
norm (‖x‖ = √(xT x)). The symbol ‖·‖G represents the G-norm for a symmet-
ric positive definite matrix G (‖x‖G = √(xT Gx)). F and F0 denote the
primal-dual feasible and strictly feasible sets respectively. N2 (·) and N−∞ (·)


denote the interior point method neighbourhoods, since most primal-dual al-
gorithms take Newton steps toward points in a specific neighbourhood. The
point t∗ = (x∗ , y ∗ , s∗ ) denotes the optimal solution of interior point method.
The sequence {tk } = {(xk , y k , sk )} denotes the interior point iterations. The
ξ k = (ξ_p^k , ξ_d^k , ξ_µ^k ) denotes the right hand side of the Newton method system
(1.5) at iterate k. The rk = (r_p^k , r_d^k , r_µ^k ) denotes the inexact Newton method
residual at iterate k. The r_PCG^k denotes the residual on the kth PCG itera-
tion. The ek denotes the error on the kth PCG iteration, unless otherwise
stated. For any vector v in (1.7), v = [v1 , v2 ] and v1 = [vB , vN ], where
vB ∈ Rm . The PCG method residual r_PCG^k = [r_1^k , r_2^k ] and r_1^k = [r_B^k , r_N^k ].
Chapter 2

Fundamentals

2.1     The Interior Point Method

2.1.1      The IPM for linear programming

It is widely accepted that the primal-dual interior point method is the most
efficient variant of interior point algorithms for linear programming [3, 77].
The usual transformation in interior point methods consists of replacing in-
equality constraints by the logarithmic barrier. The primal barrier problem
becomes:

          \min_x \; c^T x - \mu \sum_{j=1}^{n} \ln x_j \quad \text{s.t.} \quad Ax = b,

where µ > 0 is a barrier parameter. The Lagrangian associated with this
problem has the form:

          L(x, y, \mu) = c^T x - y^T (Ax - b) - \mu \sum_{j=1}^{n} \ln x_j






and the conditions for a stationary point are

          \nabla_x L(x, y, \mu) = c - A^T y - \mu X^{-1} e = 0,
          \nabla_y L(x, y, \mu) = Ax - b = 0,

where X^{-1} = diag{x_1^{-1}, x_2^{-1}, . . . , x_n^{-1}}. Denoting s = µX^{-1} e, i.e. XSe =
µe, where S = diag{s1 , s2 , . . . , sn } and e = (1, 1, . . . , 1)T , the first order op-
timality conditions (for the barrier problem) are:


                                    Ax           = b,
                                   AT y + s = c,
                                                                                   (2.1)
                                   XSe           = µe
                                    (x, s)       > 0.

The interior point algorithm for linear programming applies Newton method
to solve this system of nonlinear equations and gradually reduces the barrier
parameter µ to guarantee convergence to the optimal solution of the original
problem. The Newton direction is obtained by solving the system of linear
equations:

          \begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S & 0 & X \end{bmatrix}
          \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta s \end{bmatrix}
          = \begin{bmatrix} b - Ax \\ c - A^T y - s \\ -XSe + \mu e \end{bmatrix}.        (2.2)

By eliminating


                          ∆s = −X −1 S∆x − s + µX −1 e,


from the second equation we get the symmetric indefinite augmented system
of linear equations

          \begin{bmatrix} -\Theta^{-1} & A^T \\ A & 0 \end{bmatrix}
          \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
          = \begin{bmatrix} f \\ g \end{bmatrix},

where Θ = XS −1 ∈ Rn×n is a diagonal scaling matrix and the right-hand-side
vectors satisfy f = AT y + s − c − X −1 (XSe − µe) and g = Ax − b.


2.1.2      The Primal-Dual Interior Point Algorithms

Primal-dual interior point algorithms are the most important, efficient and
useful interior point algorithms for linear programming. That is because
of the strong theoretical properties and the practical performance of these
algorithms. Since 1994 researchers have understood well the properties and
theoretical background of primal-dual interior point algorithms [3, 37, 53,
77, 79] and the references therein. In this section we briefly review several
feasible primal-dual interior point algorithms and an infeasible primal-dual
interior point algorithm.
    Primal-dual interior point methods find primal-dual solutions (x∗ , y ∗ , s∗ )
by applying Newton method to the optimality conditions in (2.1) and by mod-
ifying the search directions and step lengths so that the inequality (x, s) > 0
is satisfied strictly at every iteration [77]. Most primal-dual algorithms take
Newton steps toward points in a specific neighbourhood. This neighbourhood
guarantees that (x, s) stays strictly positive and prevents the products xi si from
becoming too small relative to one another for all i = 1, ..., n. In this section we introduce a few
feasible primal-dual interior point algorithms and an infeasible primal-dual
interior point algorithm. For a feasible algorithm the two most interesting


neighbourhoods are N2 and N−∞ . The N2 neighbourhood is defined by


                N2 (θ) = {(x, y, s) ∈ F0 : ‖XSe − µe‖2 ≤ θµ}


for some θ ∈ (0, 1). The N−∞ neighbourhood is defined by


            N−∞ (γ) = {(x, y, s) ∈ F0 : xi si ≥ γµ, ∀i = 1, 2, ..., n}


for some γ ∈ (0, 1). By choosing γ close to zero, N−∞ (γ) encompasses most of
the feasible region. However, N2 (θ) is more restrictive, since certain points
in F0 do not belong to N2 (θ) no matter how close θ is chosen to its upper
bound [77]. In other words, N2 (θ) contains only a small fraction of the points
in F0 , while N−∞ (γ) takes up almost the entire set F0 for small γ, which
makes N−∞ (γ) much more expansive when γ is small. See [77].
    For infeasible algorithms neighbourhoods should guarantee an extra con-
dition; namely the residuals should decrease at each iteration. The extension
of N−∞ (γ) for infeasible algorithms is N−∞ (γ, β), which is defined by


   N−∞ (γ, β) = {(x, y, s) : ‖(ξp , ξd )‖/µ ≤ β ‖(ξ_p^0 , ξ_d^0 )‖/µ0 , (x, s) > 0,

                              xi si ≥ γµ, i = 1, 2, ..., n},


where the residuals ξp = Ax − b and ξd = AT y + s − c. See [77].
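The following Python sketch checks membership in these neighbourhoods; the function names and the use of the Euclidean norm for the combined residual are illustrative assumptions.

```python
import numpy as np

def in_N2(A, b, c, x, y, s, theta):
    """Check (x, y, s) in N2(theta): strictly feasible and ||XSe - mu e||_2 <= theta*mu."""
    mu = x @ s / len(x)
    feasible = (np.allclose(A @ x, b) and np.allclose(A.T @ y + s, c)
                and np.all(x > 0) and np.all(s > 0))
    return feasible and np.linalg.norm(x * s - mu) <= theta * mu

def in_Nminf(A, b, c, x, y, s, gamma, beta, mu0, res0):
    """Check (x, y, s) in N_inf(gamma, beta) for the infeasible algorithm."""
    mu = x @ s / len(x)
    res = np.linalg.norm(np.concatenate([A @ x - b, A.T @ y + s - c]))
    return (res / mu <= beta * res0 / mu0
            and np.all(x > 0) and np.all(s > 0)
            and np.all(x * s >= gamma * mu))
```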
    In primal-dual interior point methods, the initial solution (x0 , y 0 , s0 ) should
belong to the neighbourhood. At each iteration, the solution should also belong
to this neighbourhood. The solution at iteration k is given by (xk , y k , sk ) =
(xk−1 , y k−1 , sk−1 )+αk (∆xk , ∆y k , ∆sk ), where αk is the step length and (∆xk , ∆y k , ∆sk )


is the direction, which is given by:

          \begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix}
          \begin{bmatrix} \Delta x^k \\ \Delta y^k \\ \Delta s^k \end{bmatrix}
          = \begin{bmatrix} Ax^k - b \\ A^T y^k + s^k - c \\ -X^k S^k e + \sigma_k \mu_k e \end{bmatrix},        (2.3)

where σk ∈ [0, 1] is the centering parameter. Choosing σk plays a crucial role
in primal-dual interior point algorithms. If σk = 1, the equation (2.3) gives
a centering direction, which improves centrality (all xi si are close to µ) and
makes little progress in reducing µ. If σk = 0 that gives the standard Newton
step, which reduces µ. One can choose the centering parameter σk and the
step length αk to ensure that an iterate stays within the chosen neighbour-
hood. See [77].
    For feasible algorithms, the iterations belong to F0 , so for any iteration
k we have Axk = b and AT y k + sk = c. That makes the first and the second
rows of the right hand side of (2.3) equal to zero. So for feasible algorithms
(2.3) is replaced by:

          \begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix}
          \begin{bmatrix} \Delta x^k \\ \Delta y^k \\ \Delta s^k \end{bmatrix}
          = \begin{bmatrix} 0 \\ 0 \\ -X^k S^k e + \sigma_k \mu_k e \end{bmatrix}.        (2.4)

    The interior point algorithms which we mention in this section have
global linear convergence. An algorithm is globally convergent if it is guar-
anteed to converge to the solution from any starting approximation. The
sequence {µk } converges linearly to zero if µk+1 ≤ δµk , where δ ∈ (0, 1).
Knowing that an algorithm has global convergence, and the rate of this conver-
gence, alone will not give the whole picture. It is also necessary to know the time
an algorithm requires to solve a given instance of a linear programming prob-


lem. Complexity theory has been concerned with the worst case behaviour
of algorithms. A complexity result is an upper bound on the time required by
an algorithm to solve a problem. For example, the short-step path-following
algorithm has a polynomial complexity result in the order of O(√n log 1/ε),
where ε > 0. This means that there is an index K with K = O(√n log 1/ε)
such that µk ≤ ε for all k ≥ K. See [77].


The Short-Step Path-Following Algorithm (SPF Algorithm):

    • Given θ = 0.4, σ = 1 − 0.4/√n, and (x0 , y 0 , s0 ) ∈ N2 (θ).

    • For k = 0, 1, ...
       set σk = σ and solve (2.4) to obtain (∆xk , ∆y k , ∆sk );
       set (xk+1 , y k+1 , sk+1 ) = (xk , y k , sk ) + (∆xk , ∆y k , ∆sk ).

This algorithm has a global linear convergence and a polynomial complexity
result in the order of O(√n log 1/ε) [77].


The Predictor-Corrector Algorithm (PC Algorithm):

    • Given (x0 , y 0 , s0 ) ∈ N2 (0.25).

    • For k = 0, 1, ...



         if k is even (predictor step)
             solve (2.4) with σk = 0 to obtain (∆xk , ∆y k , ∆sk ); choose αk as
             the largest value of α ∈ [0, 1] such that (xk (α), y k (α), sk (α)) ∈
             N2 (0.5), where
             (xk (α), y k (α), sk (α)) = (xk , y k , sk ) + α(∆xk , ∆y k , ∆sk );
             set (xk+1 , y k+1 , sk+1 ) = (xk (α), y k (α), sk (α));


         else (corrector step)
             solve (2.4) with σk = 1 to obtain (∆xk , ∆y k , ∆sk );
             set (xk+1 , y k+1 , sk+1 ) = (xk , y k , sk ) + (∆xk , ∆y k , ∆sk )

The parameter σk is chosen to be either 0 or 1. This choice has the following
meaning: improving centrality (corrector step) and reducing the duality mea-
sure µ (predictor step). Also this algorithm has a global linear convergence
and a polynomial complexity result in the order of O(√n log 1/ε). However,
this algorithm is a definite improvement over the short-step algorithm be-
cause of the adaptivity that is built into the choice of predictor step length.
See [77].


The Long-Step Path-Following Algorithm (LPF Algorithm):

    • Given γ, σmin , σmax with γ ∈ (0, 1), 0 < σmin < σmax < 1, and
       (x0 , y 0 , s0 ) ∈ N−∞ (γ).

    • For k = 0, 1, ...
       set σk ∈ [σmin , σmax ];
       solve (2.4) to obtain (∆xk , ∆y k , ∆sk );
       choose αk as the largest value of α ∈ [0, 1] such that (xk (α), y k (α), sk (α)) ∈
       N−∞ (γ);
       set (xk+1 , y k+1 , sk+1 ) = (xk (α), y k (α), sk (α)).

This algorithm has a global linear convergence and a polynomial complexity
result in the order of O(n log 1/ε) [77]. In [39] the authors show that the
complexity result for the long-step primal-dual algorithm is O(√nL) iterations,
where L is the size of the input.
    In most cases it is quite difficult to find a strictly feasible starting point
(a point which belongs to F0 ). In this case one can use an infeasible interior


point algorithm.


The Infeasible Path-Following Algorithm (IPF Algorithm):

   1. Given γ, β, σmin , σmax with γ ∈ (0, 1), β ≥ 1, and 0 < σmin < σmax <
       0.5; choose (x0 , y 0 , s0 ) with (x0 , s0 ) > 0;

   2. For k = 0, 1, 2, ...

          • choose σk ∈ [σmin , σmax ]; and solve

               \begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix}
               \begin{bmatrix} \Delta x^k \\ \Delta y^k \\ \Delta s^k \end{bmatrix}
               = \begin{bmatrix} \xi_p^k \\ \xi_d^k \\ -X^k S^k e + \sigma_k \mu_k e \end{bmatrix},

             where ξ_p^k = Axk − b and ξ_d^k = AT y k + sk − c;


          • choose αk as the largest value of α in [0, 1] such that


                              (xk (α), y k (α), sk (α)) ∈ N−∞ (γ, β)


             and the following Armijo condition holds:


                                      µk (α) ≤ (1 − .01α)µk ;


          • set (xk+1 , y k+1 , sk+1 ) = (xk (αk ), y k (αk ), sk (αk ));

          • stop when µk < ε, for a small positive constant ε.

β is chosen such that β ≥ 1 to ensure that the initial point belongs to the
neighbourhood N−∞ (γ, β). This algorithm has a global linear convergence
and a polynomial complexity result in the order of O(n2 | log ε|) [77]. In [78]


the author shows that the complexity result for the infeasible path-following
algorithm is O(√nL) iterations, where L is the size of the input.
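The sketch below outlines one iteration of the IPF algorithm in this notation. It is schematic only: it uses dense linear algebra, writes the Newton right hand side as the negative residuals (the sign convention of (2.2)), and chooses αk by simple backtracking; none of this is meant to reproduce an actual implementation.

```python
import numpy as np

def ipf_step(A, b, c, x, y, s, sigma, gamma, beta, mu0, res0):
    """One iteration of the (exact) infeasible path-following method (sketch)."""
    m, n = A.shape
    mu = x @ s / n
    # Full (unreduced) Newton system, right hand side as negative residuals.
    K = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                  [np.zeros((n, n)), A.T, np.eye(n)],
                  [np.diag(s), np.zeros((n, m)), np.diag(x)]])
    rhs = np.concatenate([b - A @ x, c - A.T @ y - s, -x * s + sigma * mu])
    d = np.linalg.solve(K, rhs)
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]

    # Backtrack on alpha until the iterate stays in N_inf(gamma, beta)
    # and the Armijo-type condition mu(alpha) <= (1 - 0.01 alpha) mu holds.
    alpha = 1.0
    while alpha > 1e-10:
        xa, ya, sa = x + alpha * dx, y + alpha * dy, s + alpha * ds
        mua = xa @ sa / n
        res = np.linalg.norm(np.concatenate([A @ xa - b, A.T @ ya + sa - c]))
        in_nbhd = (res / mua <= beta * res0 / mu0
                   and np.all(xa > 0) and np.all(sa > 0)
                   and np.all(xa * sa >= gamma * mua))
        if in_nbhd and mua <= (1 - 0.01 * alpha) * mu:
            return xa, ya, sa
        alpha *= 0.5
    return x, y, s   # no acceptable step found
```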



2.2       Newton method
In this section we take a closer look at the Newton method, the inexact Newton
method and their convergence analysis. Note that the convergence analysis
of interior point methods follows a different path from the convergence analysis
of the Newton method, even though the interior point method takes Newton
steps toward points in a certain neighbourhood. The Newton method is an
iterative method which is used to solve a system of nonlinear equations (see [47, 61]):


                                     F (t) = 0.                                (2.5)


Newton iterations are given by


                            tk+1 = tk − F ′ (tk )−1 F (tk ),                      (2.6)


where tk+1 is the new iterate and tk is the current iterate.
    Assume the problem (2.5) has the solution t∗ . We can approximate the
function with a polynomial by using Taylor expansion about tk .

            F (t) = F (tk ) + F ′ (tk )(t − tk ) + (F ′′ (tk )/2)(t − tk )2 + . . . .


                     F (t) ≈ Mk (t) = F (tk ) + F ′ (tk )(t − tk )


Let tk+1 be the root of Mk (t), then


                   0 = Mk (tk+1 ) = F (tk ) + F ′ (tk )(tk+1 − tk ),


which implies (2.6).
    Let ∆tk = tk+1 − tk , then (2.6) becomes


                              F ′ (tk )∆tk = −F (tk ).                        (2.7)


See [47] for more details.
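A minimal Python sketch of the Newton iteration (2.6)–(2.7) for a general system F(t) = 0, with the termination rule of Section 2.2.2 below; the toy example at the end is made up for illustration.

```python
import numpy as np

def newton(F, J, t0, tau_r=1e-8, tau_a=1e-12, max_iter=50):
    """Newton's method for F(t) = 0; J returns the Jacobian F'(t).

    The loop stops with the relative/absolute test (2.9) of Section 2.2.2.
    """
    t = np.asarray(t0, dtype=float)
    norm0 = np.linalg.norm(F(t))
    for _ in range(max_iter):
        Ft = F(t)
        if np.linalg.norm(Ft) <= tau_r * norm0 + tau_a:
            break
        dt = np.linalg.solve(J(t), -Ft)     # solve F'(t) dt = -F(t), as in (2.7)
        t = t + dt
    return t

# Toy example: F(t) = (t1^2 - 2, t2 - t1) has the root (sqrt(2), sqrt(2)).
root = newton(lambda t: np.array([t[0] ** 2 - 2.0, t[1] - t[0]]),
              lambda t: np.array([[2.0 * t[0], 0.0], [-1.0, 1.0]]),
              [1.0, 1.0])
```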


2.2.1      The convergence of Newton method

The Newton method is attractive because it converges quadratically starting
from any sufficiently good initial guess t0 . See [47].
Definition: β(δ) denotes the ball of radius δ about the solution t∗


                              β(δ) = {t : ‖e‖ < δ},


where e is the error of the current iterate, e = t − t∗ .


The standard assumptions:

   1. Equation (2.5) has a solution t∗ .

   2. F ′ is Lipschitz continuous with Lipschitz constant γ.

   3. F ′ (t∗ ) is nonsingular.

The following theorem shows that if the standard assumptions hold then the
Newton iterates satisfy the following error bound, Kelley [47, Theorem 5.1.1].


Theorem 2.2.1. Let the standard assumptions hold. If there are K > 0 and
δ > 0 such that Kδ < 1 and tk ∈ β(δ), where the Newton iterate tk is given by
(2.6), then


                                 ‖ek+1 ‖ ≤ K‖ek ‖2 .                          (2.8)


    This theorem shows that the Newton method has local convergence, since
the initial solution t0 is chosen to be near the solution t∗ . Furthermore,
Newton method converges quadratically, see (2.8).


2.2.2       Termination of the iteration

The iteration is terminated when the ratio ‖F (t)‖/‖F (t0 )‖ is small [47]. More
generally the termination condition can be written as


                            ‖F (t)‖ ≤ τr ‖F (t0 )‖ + τa ,                     (2.9)


where τr is the relative error tolerance and τa is the absolute error tolerance
[47].


2.2.3       Error in the function and derivative

Suppose that F and F ′ are computed inaccurately, so that F + ε and F ′ + ζ
are used instead of F and F ′ in the iterations. In this case the Newton iterations
become


                tk+1 = tk − (F ′ (tk ) + ζ(tk ))−1 (F (tk ) + ε(tk )).    (2.10)


Theorem 2.2.2. Let the standard assumptions hold. Assume F ′ (tk ) + ζ(tk )
is nonsingular. If there are K̄ > 0, δ > 0, and δ1 > 0 such that ‖ζ(tk )‖ < δ1


and tk ∈ β(δ), where tk is given by (2.10), then


                   ‖ek+1 ‖ ≤ K̄(‖ek ‖2 + ‖ζ(tk )‖ ‖ek ‖ + ‖ε(tk )‖).          (2.11)


    Proof: Kelley [47, Theorem 5.4.1].



2.3       Inexact Newton method
Solving the linear equation (2.7) exactly can be very expensive. There-
fore, this linear equation can be solved approximately by using an iterative
method. So instead of (2.7) we get


                          F ′ (tk )∆tk = −F (tk ) + rk .                 (2.12)


The process is stopped when the residual rk satisfies


                               ‖rk ‖ ≤ ηk ‖F (tk )‖.                       (2.13)


We refer to the term ηk as the forcing term. See [20, 47].
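A minimal sketch of the inexact Newton iteration (2.12)–(2.13). The inner solver below is a plain gradient (Landweber) iteration on the least-squares residual, chosen only to keep the example self-contained; any Krylov method could play the same role, and the fixed forcing term η is an assumption.

```python
import numpy as np

def inexact_newton(F, J, t0, eta=0.1, tol=1e-8, max_outer=50, max_inner=500):
    """Inexact Newton: solve F'(t_k) dt = -F(t_k) only approximately,
    stopping the inner iteration once ||r_k|| <= eta ||F(t_k)||, cf. (2.13)."""
    t = np.asarray(t0, dtype=float)
    for _ in range(max_outer):
        Ft = F(t)
        if np.linalg.norm(Ft) <= tol:
            break
        Jt = J(t)
        omega = 1.0 / np.linalg.norm(Jt, 2) ** 2     # safe inner step length
        dt = np.zeros_like(t)
        for _ in range(max_inner):
            r = Jt @ dt + Ft                          # residual of (2.12)
            if np.linalg.norm(r) <= eta * np.linalg.norm(Ft):   # test (2.13)
                break
            dt = dt - omega * (Jt.T @ r)              # gradient step on ||J dt + F||^2
        t = t + dt
    return t
```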


2.3.1      The convergence of Inexact Newton Method

The following theorems illustrate the convergence of inexact Newton method.
By comparing the error of Newton method (2.8) with the error of inexact
Newton method (2.14), we note that the forcing term in the condition (2.13)
plays an important role in the convergence of inexact Newton method. There-
fore, choosing a stopping criterion for the residual of inexact Newton method
directly affects its convergence.


Theorem 2.3.1. Let the standard assumptions hold. If there are δ and KI
such that tk ∈ β(δ) and (2.13) holds, where ∆tk satisfies (2.12), then


                            ‖ek+1 ‖ ≤ KI (‖ek ‖ + ηk )‖ek ‖.                    (2.14)


    Proof: Kelley [47, Theorem 6.1.1].
    However, in the Newton method the error term satisfies


                                   ‖ek+1 ‖ ≤ K‖ek ‖2 .


Theorem 2.3.2. Let the standard assumptions hold. If there are δ and η
such that t0 ∈ β(δ) and {ηk } ⊂ [0, η], then the inexact Newton iteration tk+1 ,
which satisfies (2.13), converges linearly to t∗ . Moreover,

    • if ηk → 0 the convergence is superlinear.

    • if ηk ≤ Kη ‖F (tk )‖p for some Kη > 0 the convergence is superlinear
       with order 1 + p.

    Proof: Kelley [47, Theorem 6.1.4].
    Superlinear convergence is defined as follows:


                           ‖ek+1 ‖ ≤ εk ‖ek ‖, where εk → 0.


    Superlinear convergence with order 1 + p is defined as follows:


                          ‖ek+1 ‖ ≤ ε‖ek ‖1+p , where ε ∈ (0, 1).


2.4       Methods for solving a linear system
In this section, we discuss several methods to solve a linear system
Hu = q. This system represents either the augmented system (1.7) or the
normal equations (1.8), so H ∈ Rℓ×ℓ , where ℓ = n + m for the augmented
system and ℓ = m for the normal equations, respectively.
    For most problems, the linear system Hu = q is sparse. Before introduc-
ing methods for solving this system, we first focus on the sparsity of the linear
system.


2.4.1      Sparse Matrices

A matrix is sparse if many of its coefficients are zero. It is very important to
highlight sparsity for two main reasons. Firstly, many large scale problems,
which occur in practice, have sparse matrices. Secondly, exploiting sparsity
can lead to enormous computational savings. See [24]. To illustrate the po-
tential saving from exploiting sparsity, we consider a small example. Suppose
we want to solve a system with the following 4 × 4 matrix

          H = \begin{bmatrix} * & * & * & * \\ * & * & & \\ * & & * & \\ * & & & \end{bmatrix},

where the symbol ∗ represents a nonzero coefficient, while the coefficients are zero
elsewhere.
    Gaussian elimination can be used, for instance, to solve this system. It
is used to reduce the matrix H to an equivalent upper triangular matrix U
by applying row operations. The first step of Gaussian elimination leads to


the following matrix

          \begin{bmatrix} * & * & * & * \\ & * & f & f \\ & f & * & f \\ & f & f & f \end{bmatrix},

where f represents a fill-in. The elimination operations change a zero coeffi-
cient into a nonzero one (we refer to this as fill-in). A fill-in requires
additional storage and operations. This elimination leads to a full active sub-
matrix (a 3 × 3 matrix with columns 2, 3, 4 and rows 2, 3, 4). Consequently,
all subsequent Gaussian elimination steps will be dense.
    However, if we reorder the rows/columns of H we can control the amount
of fill-in. For our example, swapping row 1 and row 4 leads to the
equivalent matrix

          \begin{bmatrix} * & & & \\ * & * & & \\ * & & * & \\ * & * & * & * \end{bmatrix}.

That is a triangular matrix, so the system can be solved without requiring any
eliminations. This saves us extra storage and extra operations.
    The problem of finding the ordering which minimizes fill-in is NP-complete
[77]. However, there are good ordering heuristics which perform quite well
in practice. There are two famous heuristic orderings, the minimum degree
and the minimum local fill-in orderings, see [24, 31, 32, 35].
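To see the effect of ordering in practice, the short script below compares the number of nonzeros in the sparse LU factors of an arrowhead-type matrix with and without a fill-reducing ordering; SciPy's splu with the COLAMD ordering is used here merely as a convenient stand-in for the orderings mentioned above.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import splu

n = 200
H = lil_matrix((n, n))
H.setdiag(2.0)
H[0, :] = 1.0          # dense first row
H[:, 0] = 1.0          # dense first column
H = H.tocsc()

# "NATURAL" keeps the given ordering; "COLAMD" applies a fill-reducing ordering.
for perm in ("NATURAL", "COLAMD"):
    lu = splu(H, permc_spec=perm)
    print(perm, "nonzeros in L + U:", lu.L.nnz + lu.U.nnz)
```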


2.4.2        Direct Methods

The main focus of this thesis is the use of an iterative method to solve the
linear system which arises from the IPMs. However, we will highlight briefly
some direct methods.


2.4.2.1 Gaussian elimination

Gaussian elimination is one of the most well known direct methods. It is
used to reduce the matrix H to an upper triangular matrix U by applying
row operations. Diagonal elements are chosen to be the pivots. If a diagonal
element is zero, a row interchange has to be carried out. The reduction for
H is performed by using elementary row operations which can be written as


                               Lℓ−1 . . . L2 L1 H = U.


That can be written as


                                    H = LU,


where L is a unit lower triangular matrix, and its elements lij are precisely the
multipliers which are used in the elimination to zero out the element at the (i, j)
position below the diagonal. This decomposition of H is called the LU factorisation of H. See
[57] for more details. The computational cost of this method can be expressed
as (2/3)ℓ3 + O(ℓ2 ) flops, where each addition, subtraction, multiplication, division
or square root counts as a flop [71].
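A minimal dense sketch of the LU factorisation by Gaussian elimination (no pivoting and no sparsity handling, so it assumes all pivots are nonzero):

```python
import numpy as np

def lu_factorise(H):
    """Return unit lower triangular L and upper triangular U with H = L U."""
    n = H.shape[0]
    U = H.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]          # multiplier l_ik
            U[i, k:] -= L[i, k] * U[k, k:]       # eliminate entry (i, k)
    return L, U
```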


2.4.2.2 Cholesky factorisation

Cholesky factorisation method is used to decompose symmetric positive def-
inite matrices. This factorisation produces a lower triangular matrix L with
positive diagonal elements such that


                                H = LLT .


Solving Hu = q is then equivalent to solving two triangular systems, one by forward
substitution and the other by backward substitution,


                            Lv = q,     LT u = v.


    We assume the constraint matrix A has a full row rank, so the matrix of
the normal equations system (1.8) will be symmetric and positive definite.
The use of Cholesky factorization to solve the normal equations is a common
choice, see for example [35].
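A small sketch of this two-stage triangular solve applied to a matrix of the form AΘAᵀ; NumPy and SciPy are used for illustration and the random data is made up.

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_solve(H, q):
    """Solve H u = q for symmetric positive definite H via H = L L^T."""
    L = np.linalg.cholesky(H)                      # lower triangular factor
    v = solve_triangular(L, q, lower=True)         # forward substitution: L v = q
    u = solve_triangular(L.T, v, lower=False)      # backward substitution: L^T u = v
    return u

# Example: H = A Theta A^T is positive definite when A has full row rank.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 12))
theta = rng.uniform(0.1, 10.0, 12)
H = A @ (theta[:, None] * A.T)
u = cholesky_solve(H, rng.standard_normal(5))
```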


2.4.3      Iterative Methods

The standard approach uses the direct method to solve the normal equa-
tions (symmetric positive definite system) by sparse Cholesky factorisation.
However, for large-scale problems, the computational effort of direct meth-
ods can sometimes become very expensive. Therefore, an iterative method
is employed to solve the linear system which arises from IPMs.
    An iterative method solves the problem approximately. It generates a se-
quence of iterates starting from an initial guess and terminates when the
current solution is close enough to the exact solution or when the residual gets
sufficiently small.


2.4.3.1 Stationary Iterative Methods

The first iterative methods which were used to solve large linear systems were
based on relaxation of the coordinates. Starting from an initial approximate
solution, these methods modify the approximation until convergence
is reached. Each of these modifications is called a relaxation step [66]. The iter-
ations of these methods are based on splitting the matrix H into the form
H = H1 + H2 , where H1 is a non-singular matrix. Then the system Hu = q
is converted to the fixed point problem u = H_1^{-1} (q − H_2 u). By beginning
with an initial solution u_0 , the iterates of these methods are generated by


                            u^{j+1} = H_1^{-1} q − H_1^{-1} H_2 u^j .




See [66, 74, 80]. Among different stationary methods we mention: Jacobi,
Gauss-Seidel, successive overrelaxation (SOR), Arrow-Hurwicz and Uzawa
methods. The stationary methods are now more commonly used as pre-
conditioners for the Krylov subspace methods.


Jacobi Method

The Jacobi method uses the splitting H1 = D and H2 = L + U, where the matrix H
is written as H = D + L + U, with D a diagonal matrix, L a strictly lower
triangular matrix and U a strictly upper triangular matrix [47]. The Jacobi
method converges to the solution for every right hand side q
if 0 < Σ_{j≠i} |hij| < |hii| for all 1 ≤ i ≤ ℓ, see [47, Theorem 1.4.1].
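
A minimal sketch of the Jacobi iteration is given below; the strictly diagonally dominant test matrix is invented so that the convergence condition above holds.

    import numpy as np

    def jacobi(H, q, u0, tol=1e-10, max_iter=500):
        # Splitting H1 = D, H2 = L + U, iteration u^{j+1} = D^{-1}(q - (L + U) u^j).
        D = np.diag(H)
        R = H - np.diag(D)
        u = u0.copy()
        for _ in range(max_iter):
            u_new = (q - R @ u) / D
            if np.linalg.norm(u_new - u) < tol:
                return u_new
            u = u_new
        return u

    H = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]])
    q = np.array([1.0, 2.0, 3.0])
    print(np.allclose(H @ jacobi(H, q, np.zeros(3)), q, atol=1e-8))  # True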


Gauss-Seidel Method

The coefficient matrix for this method is again written as H = D + L + U.
The Gauss-Seidel method uses the splitting H1 = D + U and H2 = L [47].
This method converges under the same conditions as the Jacobi method [43].


Arrow-Hurwicz and Uzawa Methods

These iterative methods are used to solve saddle point problems, such as the
augmented system (1.7). The idea of these stationary methods is to split the
matrix H so that these methods become simultaneous iterations for both ∆x
and ∆y [8].
     The iterations of Uzawa's method are given as follows:


                          ∆xj+1 = Θ(AT ∆y j − f ),
                          ∆y j+1 = ∆y j + ω(A∆xj+1 − g),

where ω > 0 is a relaxation parameter. Accordingly, the splitting matrices are
given by

              [ −Θ^{-1}       0       ]              [ 0      A^T     ]
       H1 =   [                       ] ,     H2 =   [                ] .
              [    A      −(1/ω) I    ]              [ 0    (1/ω) I   ]

     The iterations of the Arrow-Hurwicz method are given as follows:


                ∆xj+1 = ∆xj + α(f + Θ−1 ∆xj − AT ∆y j ),
                ∆y j+1 = ∆y j + ω(A∆xj+1 − g),

where α and ω are relaxation parameters. The splitting matrices are given
by

              [ (1/α) I       0       ]              [ −(1/α) I − Θ^{-1}      A^T     ]
       H1 =   [                       ] ,     H2 =   [                                ] .
              [    A      −(1/ω) I    ]              [         0           (1/ω) I    ]


For more detail on Arrow-Hurwicz and Uzawa methods see [8] and the ref-
erences therein.


2.4.3.2 Krylov Subspace Methods

Krylov subspace methods are a family of iterative methods to solve a linear
system of the form


                                     Hu = q.                            (2.15)


Krylov subspace methods extract an approximate solution uj from an affine
subspace u0 + Kj of dimension j, where u0 is an arbitrary initial guess to the
solution of (2.15). The Krylov subspace is defined by




                Kj (H, r0 ) = span{r0 , Hr0 , H2 r0 , ..., Hj−1 r0 },   (2.16)


for j ≥ 1. The residual r0 is given by r0 = q − Hu0 . See [47, 66].
    The dimension of the subspace increases by one at each step of the ap-
proximation process. The Krylov subspace has the following properties. The
first property is that Kj is the space of all vectors which can be written as
u = p_{j−1}(H) r0, where p_{j−1} is a polynomial of degree not exceeding j − 1.
The other property is that the degree of the minimal polynomial of r0 with
respect to H (the nonzero polynomial p of lowest degree such that p(H) r0 = 0)
does not exceed ℓ, the dimension of the space [66].
    There exist many Krylov subspace methods, a few of the most important
ones will be highlighted in the following discussion.


Conjugate Gradient Method (CG)

Conjugate gradient (CG) method is one of the most popular iterative meth-
ods. This method is used to solve symmetric and positive definite linear
systems. Many studies analyse the CG method [33, 42, 47, 66, 68, 72], and
many papers use it to solve the linear systems which arise from interior point
methods [13, 15, 40, 45, 54, 55].
    In [42, 68] the authors explain the idea of conjugacy. The idea is to pick
a set of H-orthogonal search directions and to take exactly one step of the
right length in each one of them. Then the solution is found after ℓ
steps. Two vectors v and w are H-orthogonal if v^T H w = 0.
    At each step the iterate will be


                             uj+1 = uj + αj dj ,


where αj is the step length and dj is the direction. Let the error term be
defined by ej = uj − u∗ . The step length αj is chosen such that the search
direction is H-orthogonal to the error ej+1 . Consequently, αj is chosen as
follows:


                             (dj )T Hej+1 = 0,


which implies


                          (dj )T H(ej + αj dj ) = 0.


That leads to

                                       (dj )T Hej
                             αj = −               .
                                       (dj )T Hdj


Unfortunately, we do not know ej; if we knew ej, the problem would already be
solved. On the other hand, the residual is given by rj = q − Huj, which can
be written as Huj = q − rj. This is equivalent to Huj − Hu∗ = q − rj − Hu∗,
which leads to Hej = −rj. So the step length can be written as

                                               (dj )T rj
                                       αj =              .
                                              (dj )T Hdj

     All we need now is to find the set of H-orthogonal search directions {dj}.
In order to find this set, we assume we have ℓ linearly independent vectors
z^0, ..., z^{ℓ−1}. We choose d^0 = z^0 and for j > 0, set

                              d^j = z^j + Σ_{k=0}^{j−1} β_{k,j} d^k.


     The βj,i is chosen such that (dj )T Hdi = 0 for j > i. So for j > i

                                                 (z j )T Hdi
                                      βj,i = −               .
                                                 (di )T Hdi

     In the CG method the search directions are constructed by the conjuga-
tion of the residuals. So z j = rj . This choice makes sense because the residual
is orthogonal to the previous search directions which guarantees producing a
new linearly independent search direction unless the residual is zero. When
the residual is zero, the solution is found. These properties guarantee that
the CG method is a short recurrence Krylov subspace method (it does not
require saving all previous search directions).
     In the CG method αj and βj,i can be expressed as

                             (rj )T rj                (rj )T Hdi
                     αj =              ,   βj,i = −                    for j > i.
                            (dj )T Hdj                (di )T Hdi


where the search direction is written as

                                                  j−1
                                   j      j
                                   d =r +               βj,k dk .
                                                  k=0


    The residual can be rewritten as


                                   ri+1 = ri − αi Hdi .


That is because ri+1 = q − Hui+1 = q − H(ui + αi di) = ri − αi Hdi.
    So we have


 (rj )T ri+1 = (rj )T ri − αi (rj )T Hdi ⇒ αi (rj )T Hdi = (rj )T ri − (rj )T ri+1 .


The residual is orthogonal to the previous residuals [68], and this leads to


                             
                  (rj)^T H di  =  { −(1/α_{j−1}) (rj)^T rj,    j = i + 1,
                                  {  0,                        j > i + 1.

That gives
                                   
                  βj,i  =  { (rj)^T rj / (d^{j−1})^T r^{j−1},    j = i + 1,
                           {  0,                                 j > i + 1.

Let βj = βj,j−1 . So the search direction can be expressed as


                                   dj = rj + βj dj−1 .


Consequently, the CG method is a short recurrence method, because only the
immediately previous direction needs to be stored.



The CG Algorithm:

    • Given an initial solution u0, set r0 = q − Hu0 and d0 = r0.

    • For j = 0, 1, ...
       while ‖rj‖ > ε do
        αj   = (rj)^T rj / (dj)^T H dj,
        uj+1 = uj + αj dj,
        rj+1 = rj − αj H dj,
        βj+1 = (rj+1)^T rj+1 / (rj)^T rj,
        dj+1 = rj+1 + βj+1 dj.
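
The algorithm above translates directly into code. The following Python sketch (an illustration added here, with invented test data) implements plain CG exactly as stated.

    import numpy as np

    def conjugate_gradient(H, q, u0, eps=1e-10, max_iter=None):
        u = u0.copy()
        r = q - H @ u
        d = r.copy()
        max_iter = max_iter or len(q)
        for _ in range(max_iter):
            if np.linalg.norm(r) <= eps:
                break
            Hd = H @ d
            alpha = (r @ r) / (d @ Hd)           # step length
            u = u + alpha * d
            r_new = r - alpha * Hd               # updated residual
            beta = (r_new @ r_new) / (r @ r)
            d = r_new + beta * d                 # new H-orthogonal direction
            r = r_new
        return u

    rng = np.random.default_rng(3)
    M = rng.standard_normal((6, 6))
    H = M @ M.T + 6 * np.eye(6)                  # symmetric positive definite test matrix
    q = rng.standard_normal(6)
    print(np.allclose(H @ conjugate_gradient(H, q, np.zeros(6)), q, atol=1e-8))  # True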

Theorem 2.4.1. Let H be symmetric positive definite. Then the CG algo-
rithm will find the solution within ℓ iterations. [47, Theorem 2.2.1].

    This theorem shows that the CG method finds the solution after a maxi-
mum of ℓ iterations. In practice, however, accumulated floating point roundoff
error causes the residual to lose accuracy gradually. This causes the search di-
rections to lose H-orthogonality [68]. So providing a convergence analysis
of the CG method is essential.

Theorem 2.4.2. Let e0 be the initial error of the CG. Then


               ‖ej‖_H^2  ≤  min_{Pj, Pj(0)=1}  max_{λ∈Λ(H)}  [Pj(λ)]^2  ‖e0‖_H^2,


where Pj is a polynomial of degree j and Λ(H) is the set of eigenvalues of
H. See [68].

Theorem 2.4.3. Let e0 be the initial error of the CG. Then

               ‖ej‖_H  ≤  2 ( (√κ − 1)/(√κ + 1) )^j  ‖e0‖_H,

where κ is the condition number of the matrix H and ‖·‖_H is the H-norm
for the symmetric positive definite matrix H. See [68].

    The condition number of the matrix is defined as κ = λmax/λmin, where λmin and
λmax are the minimum and maximum eigenvalues of the matrix H respec-
tively. The bound in the previous theorem is not sharp, since the CG method
converges in a few iterations for a matrix which has only a few distinct
eigenvalues, even if its condition number is large.

Theorem 2.4.4. Let H be symmetric positive definite. Assume that there
are exactly k ≤ ℓ distinct eigenvalues of H. Then the CG iteration terminates
in at most k iterations. [47, Theorem 2.2.3].

    The previous theorems show that the convergence of the CG method de-
pends on the eigenvalues of the matrix of the linear system. The idea of
preconditioning is introduced to improve the characteristics of the original matrix.
Let P be a preconditioner: P is an approximation to H which is easier to invert,
and it is a symmetric positive definite matrix. Instead of solving (2.15), the
system P^{-1}Hu = P^{-1}q is solved. The CG method can only be applied to a
symmetric positive definite system; since P is a symmetric positive definite
matrix, it can be written as P = LL^T. Accordingly, we solve the system
L^{-1}HL^{-T} û = L^{-1}q, where û = L^T u. Applying the CG method to solve the
preconditioned system L^{-1}HL^{-T} û = L^{-1}q leads to the preconditioned conjugate
gradient (PCG) method [47, 66, 68].


The PCG Algorithm:

    • Given an initial solution u0, set r0 = q − Hu0 and d0 = P^{-1} r0.

    • For j = 0, 1, ...
       while ‖rj‖ > ε do
        αj   = (rj)^T P^{-1} rj / (dj)^T H dj,
        uj+1 = uj + αj dj,
        rj+1 = rj − αj H dj,
        βj+1 = (rj+1)^T P^{-1} rj+1 / (rj)^T P^{-1} rj,
        dj+1 = P^{-1} rj+1 + βj+1 dj.
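
A corresponding sketch of PCG is given below; the preconditioner is applied through a user-supplied function solve_P which returns P^{-1}r, so any preconditioner can be plugged in. The diagonal (Jacobi) preconditioner in the example is purely illustrative and is not the preconditioner developed in this thesis.

    import numpy as np

    def pcg(H, q, solve_P, u0, eps=1e-10, max_iter=200):
        u = u0.copy()
        r = q - H @ u
        z = solve_P(r)                            # z = P^{-1} r
        d = z.copy()
        for _ in range(max_iter):
            if np.linalg.norm(r) <= eps:
                break
            Hd = H @ d
            alpha = (r @ z) / (d @ Hd)
            u = u + alpha * d
            r_new = r - alpha * Hd
            z_new = solve_P(r_new)
            beta = (r_new @ z_new) / (r @ z)
            d = z_new + beta * d
            r, z = r_new, z_new
        return u

    rng = np.random.default_rng(4)
    M = rng.standard_normal((8, 8))
    H = M @ M.T + 8 * np.eye(8)
    q = rng.standard_normal(8)
    solve_P = lambda r: r / np.diag(H)            # Jacobi (diagonal) preconditioner
    print(np.allclose(H @ pcg(H, q, solve_P, np.zeros(8)), q, atol=1e-8))  # True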

Generalized Minimal Residual Method (GMRES)

The CG method is used to solve symmetric positive definite systems. In 1986
GMRES was proposed as a Krylov subspace method for solving nonsymmetric
systems [67]. The GMRES method minimizes the residual norm over
all vectors in u0 + Kk. Suppose Vk is a matrix whose columns form an orthonormal
basis of Kk; then any vector uk ∈ u0 + Kk can be written as


                                  uk = u0 + Vk y,


where y ∈ R^k. GMRES generates iterates such that the norm of the residual rk
is minimized, which can be written as


                              min_{uk ∈ u0+Kk} ‖rk‖.


On the other hand,

              ‖rk‖ = ‖q − H uk‖ = ‖q − H(u0 + Vk y)‖ = ‖r0 − H Vk y‖.

The columns of Vk are generated by the Arnoldi algorithm [47, Algorithm
3.4.1]. The starting vector is given as v^1 = r0/‖r0‖ and the following vectors


are generated by

   v^{j+1} = ( Hv^j − Σ_{i=1}^{j} ((Hv^j)^T v^i) v^i ) / ‖ Hv^j − Σ_{i=1}^{j} ((Hv^j)^T v^i) v^i ‖,


for j ≥ 1.
    Let Hk be constructed such that hij = (Hv^j)^T v^i and hij = 0 for i > j + 1.
Then the Arnoldi algorithm produces matrices Vk such that


                                     H Vk = Vk+1 Hk.


    Consequently, the residual norm becomes

     ‖rk‖ = ‖r0 − H Vk y‖ = ‖r0 − Vk+1 Hk y‖ = ‖Vk+1(β e1 − Hk y)‖ = ‖β e1 − Hk y‖.

That is because v^1 = r0/‖r0‖ and β = ‖r0‖, where e1 = [1, 0, ..., 0]^T and
e1 ∈ R^{k+1}; the last equality holds because the columns of Vk+1 are orthonormal.
See [47, 66].


The GMRES Algorithm:

    • Given an initial solution u0, set r0 = q − Hu0, v^1 = r0/‖r0‖, ρ0 = ‖r0‖,
       β0 = ρ0 and j = 0.

    • While ρj > ε ‖q‖ and j < jmax do

       (a) j = j + 1.

       (b) For i = 1, ..., j
                hij      = (Hv^j)^T v^i
           v^{j+1}       = Hv^j − Σ_{i=1}^{j} hij v^i,
           h_{j+1,j}     = ‖v^{j+1}‖,
           v^{j+1}       = v^{j+1}/‖v^{j+1}‖,
           e1            = (1, 0, ..., 0)^T ∈ R^{j+1},
           Minimize ‖βj e1 − Hj d^j‖ over R^j to obtain d^j,
           ρ_{j+1}       = ‖βj e1 − Hj d^j‖,
           u^{j+1}       = u^j + Vj d^j.

    The GMRES method breaks down when ‖Hv^j − Σ_{i=1}^{j} ((Hv^j)^T v^i) v^i‖ is
zero. This quantity is zero when the residual is zero. This is not a problem,
since if the residual is zero the solution has been found. See [47, 66].
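
For completeness, a compact sketch of GMRES without restarts is given below; it builds the Arnoldi basis and solves the small least-squares problem min ‖β e1 − Hk y‖ at every step. It is only an illustration of the algorithm above (no restarting, no Givens rotations) with invented data.

    import numpy as np

    def gmres_basic(H, q, u0, max_iter=50, eps=1e-10):
        n = len(q)
        r0 = q - H @ u0
        beta = np.linalg.norm(r0)
        if beta <= eps:
            return u0
        V = np.zeros((n, max_iter + 1))
        Hbar = np.zeros((max_iter + 1, max_iter))
        V[:, 0] = r0 / beta
        for j in range(max_iter):
            w = H @ V[:, j]
            for i in range(j + 1):                    # Arnoldi orthogonalisation
                Hbar[i, j] = w @ V[:, i]
                w -= Hbar[i, j] * V[:, i]
            Hbar[j + 1, j] = np.linalg.norm(w)
            e1 = np.zeros(j + 2)
            e1[0] = beta
            y, *_ = np.linalg.lstsq(Hbar[:j + 2, :j + 1], e1, rcond=None)
            rho = np.linalg.norm(e1 - Hbar[:j + 2, :j + 1] @ y)
            if rho <= eps or Hbar[j + 1, j] <= eps:   # converged or happy breakdown
                return u0 + V[:, :j + 1] @ y
            V[:, j + 1] = w / Hbar[j + 1, j]
        return u0 + V[:, :max_iter] @ y

    rng = np.random.default_rng(5)
    H = rng.standard_normal((6, 6)) + 6 * np.eye(6)   # nonsymmetric test matrix
    q = rng.standard_normal(6)
    print(np.allclose(H @ gmres_basic(H, q, np.zeros(6)), q, atol=1e-8))  # True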


 BiConjugate Gradient Method (BiCG)

Among all methods which do not require the matrix to be symmetric the
GMRES method is the most successful Krylov subspace method in terms
of minimization property. However, the operations and the storage require-
ment for this method increase linearly with the iteration number (GMRES
is a long recurrence method). The BiConjugate Gradient method is a short
recurrence method and is used to solve nonsymmetric problems. It takes an-
other approach: instead of minimizing the residual, the residual is required
to satisfy the bi-orthogonality condition




        (rj)^T w = 0,    ∀ w ∈ K̄j;     K̄j = span{ r̂0, H^T r̂0, ..., (H^T)^{j−1} r̂0 },


where K̄j is the Krylov subspace for H^T and usually r̂0 is chosen such that r̂0 = r0
[47].



The BiCG Algorithm:

    • Given an initial solution u0, set r0 = q − Hu0, choose r̂0 such that r̂0 ≠ 0,
       d0 = r0 and d̂0 = r̂0.

    • For j = 0, 1, ...
       while ‖rj‖ > ε do
        αj    = (r̂j)^T rj / (dj)^T H^T d̂j,
        uj+1  = uj + αj dj,
        rj+1  = rj − αj H dj,          r̂j+1 = r̂j − αj H^T d̂j,
        βj    = (r̂j+1)^T rj+1 / (r̂j)^T rj,
        dj+1  = rj+1 + βj dj,          d̂j+1 = r̂j+1 + βj d̂j.

The BiCG method breaks down when either (r̂j)^T rj = 0 or (dj)^T H^T d̂j = 0.
If these quantities are very small this method becomes unstable [47, 70].
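
In practice a library implementation would normally be used. The short example below (illustrative data only) calls SciPy's BiCG on a small nonsymmetric system; info == 0 indicates that the requested tolerance was reached.

    import numpy as np
    from scipy.sparse.linalg import bicg

    rng = np.random.default_rng(6)
    H = rng.standard_normal((30, 30)) + 30 * np.eye(30)   # nonsymmetric, diagonally dominant
    q = rng.standard_normal(30)

    u, info = bicg(H, q)
    print(info, np.linalg.norm(q - H @ u))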


 MINRES and SYMMLQ Methods

The MINRES and SYMMLQ methods are used to solve symmetric indefinite
systems. The MINRES method minimizes the 2-norm of the residual, while
the SYMMLQ method solves the projected system but does not minimize
anything; it keeps the residual orthogonal to the previous residuals. See [62]
for more detail. As the MINRES and SYMMLQ methods are used to solve
symmetric indefinite systems, they should be preconditioned by a symmetric
positive definite preconditioner, see [25, 62].


2.4.4      Null Space Methods

Null space methods can be used for solving saddle point problems like the
augmented system (1.7).


    Solving (1.7) is equivalent to solving the following two equations:


                             −Θ−1 ∆x + AT ∆y = f,
                                                                          (2.17)
                                   A∆x           = g.

    Let us introduce the null space matrix Z ∈ R^{n×(n−m)}, which satisfies AZ = 0.
    The null space method is described as follows.

   1. Find ∆x̃ such that


                                        A ∆x̃ = g.


   2. Solve the system


                       Z^T Θ^{-1} Z p = −Z^T (Θ^{-1} ∆x̃ + f),                    (2.18)


       where Z is the null space matrix of the constraint matrix A.

   3. Set the solution (∆x∗, ∆y∗) as follows:

       ∆x∗ = ∆x̃ + Zp.

       ∆y∗ is the solution of the system


                           A A^T ∆y = A(f + Θ^{-1} ∆x∗).
See [8].
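
The three steps translate directly into a small dense sketch (illustrative only; a practical implementation would exploit the sparsity of A1 rather than forming Z explicitly):

    import numpy as np

    def null_space_method(A1, A2, theta, f, g):
        # Solve  -Theta^{-1} dx + A^T dy = f,  A dx = g,  with A = [A1, A2], A1 nonsingular.
        m = A1.shape[0]
        n = m + A2.shape[1]
        A = np.hstack([A1, A2])
        Theta_inv = np.diag(1.0 / theta)

        dx_tilde = np.concatenate([np.linalg.solve(A1, g), np.zeros(n - m)])   # step 1: A dx_tilde = g
        Z = np.vstack([-np.linalg.solve(A1, A2), np.eye(n - m)])               # null space matrix, A Z = 0
        p = np.linalg.solve(Z.T @ Theta_inv @ Z,
                            -Z.T @ (Theta_inv @ dx_tilde + f))                 # step 2: system (2.18)
        dx = dx_tilde + Z @ p                                                  # step 3
        dy = np.linalg.solve(A @ A.T, A @ (f + Theta_inv @ dx))
        return dx, dy

    rng = np.random.default_rng(7)
    m, n = 3, 6
    A1 = rng.standard_normal((m, m)) + 3 * np.eye(m)
    A2 = rng.standard_normal((m, n - m))
    theta = rng.uniform(0.5, 2.0, size=n)
    f, g = rng.standard_normal(n), rng.standard_normal(m)
    dx, dy = null_space_method(A1, A2, theta, f, g)
    A = np.hstack([A1, A2])
    print(np.allclose(-np.diag(1.0 / theta) @ dx + A.T @ dy, f),
          np.allclose(A @ dx, g))   # True True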
    Let us explain this method. First, we multiply the first equation of (2.17)
with Z T , which gives


                          −Z T Θ−1 ∆x + Z T AT ∆y = Z T f.


This is equivalent to


                                −Z T Θ−1 ∆x = Z T f


because of Z T AT = 0.
     Let us denote ∆x = ∆x̃ + Zp, where ∆x̃ is chosen such that A∆x̃ = g.
Then the previous equation becomes


                       Z^T Θ^{-1} Z p = −Z^T (Θ^{-1} ∆x̃ + f),


which is equivalent to (2.18).
     In order to find ∆y∗, we substitute ∆x∗ into the first equation of (2.17) and
then multiply it by A, which gives A A^T ∆y = A(f + Θ^{-1} ∆x∗).
     The null space method is an attractive approach when n−m is small. The
null space system (2.18) can be solved either directly or iteratively (see Sub-
section 1.2.1 and 1.2.2 above). In [19] the PCG method is used to solve the
null space system (which is similar to (2.18) but for quadratic minimization
problem).
     In order to use the null space method we first have to compute the null
space matrix Z. Let us assume A has full row rank. The matrix Z is given
by

                                   [ −A1^{-1} A2 ]
                              Z =  [             ] ,
                                   [   I_{n−m}   ]

where the constraint matrix A is partitioned as A = [A1, A2], with A1 an
m × m nonsingular matrix. There are plenty of choices for constructing the
m × m nonsingular matrix A1, see [8]. In order to save on computation
time and storage, one should choose the sparsest null basis matrix A1. The


problem of finding the sparsest null basis is called the null space problem.
This problem is NP hard [17], and there are many papers which propose
(heuristic) approaches to solve it [8, 11, 17, 18, 63].
Chapter 3

The PCG Method for the
Augmented System

We are dealing with large and sparse problems and we are looking for an
iterative method from the Krylov-subspace family which can solve the aug-
mented system (1.7) efficiently. As we have discussed in the previous chap-
ter, there exists a wide range of iterative methods which can be used in this
context. The family of Krylov-subspace methods [47, 66, 72] enjoys a partic-
ularly good reputation among the different iterative methods. Since we plan to
solve large systems of equations, we prefer to use a short recurrence method
rather than a long recurrence one. The full recurrence methods such as GM-
RES [67] occasionally do not manage to converge fast enough and become
unacceptably expensive. Among the short recurrence methods we consid-
ered MINRES [62] and PCG [42, 66, 72]. Bearing in mind that, whichever
method is used, preconditioning is necessary, we decided not to use MINRES
because this method requires a symmetric positive definite preconditioner, a
restriction we would like to avoid. Summing up, encouraged by recent anal-
yses [51, 65] we will apply the preconditioned conjugate gradients (PCG)




method directly to the indefinite system (1.7).
    In the introduction section we explained fully why we chose to work with
the augmented system (1.7). To summarise, the augmented system has better
conditioning and has additional flexibility in exploiting sparsity compared
to the normal equations. In addition, all preconditioners for the normal
equations system have an equivalent for the augmented system, while the
opposite is not true.
    The results presented in this chapter have been the subject of joint work
with Jacek Gondzio and Julian Hall [2].



3.1       Preconditioner
Choosing the preconditioner for a linear system plays a critical role in the
convergence of the iterative solver. The issue of finding a preconditioner for
the augmented system was investigated in many papers [9, 10, 16, 21, 22, 23,
34, 46, 60] to mention a few. Let H be the matrix of the augmented system
which arises from IPMs for the linear, quadratic or nonlinear programming
problems.
                                                    
                                   [ H   A^T ]
                             H  =  [         ] ,                          (3.1)
                                   [ A    0  ]

where H is an n × n matrix.
    Before presenting a list of preconditioners for augmented systems, we
should first mention the characteristics of a good preconditioner. A precon-
ditioner is considered to be good if it has the following features. The
first one is that the preconditioner should be a good approximation to the
original matrix H: if the preconditioner is approximately equal to the original
matrix, then the preconditioned matrix P^{-1}H will be approximately equal
to the identity matrix, which makes the preconditioned system easy to solve.
The second feature is that the preconditioner should be relatively easy to
compute, since for most iterative methods the preconditioner is computed
at each iteration of the interior point method. The third feature is that it should
be relatively easy to solve an equation with the preconditioner, namely the
system P d = r, since this system has to be solved at each
iteration of the iterative solver. The final feature is that the eigenvalues of
the preconditioned matrix should be clustered (with as few distinct eigenvalues
of the preconditioned matrix as possible) and bounded away
from zero, because the convergence of iterative solvers is usually governed by the
eigenvalues of the preconditioned system; for the PCG method, for instance,
see Theorem 2.4.4.
    It is very difficult to design a preconditioner which satisfies the previous four
features at the same time. Consequently one needs to strike a balance among
these features to design a high-quality preconditioner, which is why a huge
number of studies tackle this issue. Below we discuss a few recently
developed preconditioners for (3.1). We also report theorems which describe the
behaviour of the eigenvalues of the preconditioned matrices for some of these
preconditioners, see Theorems 3.1.1, 3.1.2, 3.1.3 and 3.1.4. This information
is important because it gives an idea about the convergence of the precondi-
tioned system.
    In [9] Bergamaschi, Gondzio, Venturin and Zilli propose a preconditioner
for the augmented system arising in linear, quadratic or nonlinear programming
problems. This preconditioner is defined as follows:

                                   [ G   Ã^T ]
                              P =  [         ] ,
                                   [ Ã    0  ]

where G is an invertible approximation of H, and Ã is a sparse approximation
of the Jacobian of the constraints, that is of the matrix A. Let the error matrix
E = A − Ã have rank p, where 0 ≤ p ≤ m. Let σ̃ be the smallest singular
value of ÃD^{-1/2}, and let eQ and eA be error terms given as

                eQ = ‖D^{-1/2} Q D^{-1/2} − I‖,        eA = ‖E D^{-1/2}‖ / σ̃.
                                                              ˜

The eigenvalues of the preconditioned matrix P −1 H are characterized by the
following theorem [9, Theorem 2.1].
                            ˜
Theorem 3.1.1. Assume A and A have maximum rank. If the eigenvector
is of the form (0, y)T then the eigenvalues of P −1 H are either one (with
multiplicity at least m − p ) or possibly complex and bounded by | | ≤ eA .
Corresponding to eigenvectors of the form (x, y)^T with x ≠ 0 the eigenvalues
are

   1. equal to one (with multiplicity at least m − p), or

   2. real positive and bounded by


               λmin (D−1/2 QD−1/2 ) ≤ λ ≤ λmax (D−1/2 QD−1/2 ), or


   3. complex, satisfying

                              |λR − 1| ≤ eQ + eA,

                              |λI| ≤ eQ + eA,

      where λ = λR + i λI.

    In [21] the constraint matrix is partitioned into two matrices, such that
A = [A1, A2], where A1 is nonsingular. Accordingly, the matrix H is parti-
tioned as follows

                                   [ H11   H12 ]
                              H =  [           ] .
                                   [ H21   H22 ]

The preconditioner P is constructed by replacing H by G, where G is partitioned
analogously into

                                   [ G11   G12 ]
                              G =  [           ] .
                                   [ G21   G22 ]

    The following theorem describes the eigenvalues of the preconditioned
matrix P −1 H [21, Theorem 2.1].

Theorem 3.1.2. Suppose that Z is the null space matrix of A. Then P −1 H
has 2m unit eigenvalues, and the remaining n − m eigenvalues are those of
the generalized eigenproblem


                                 Z T HZv = λZ T GZv.


    Different choices of the matrices G11, G12, G21 and G22 give different pre-
conditioners. For the symmetric case H21 = H12^T, the authors proposed dif-
ferent choices of the matrix G which improve the eigenvalues of the precon-
ditioned matrix P^{-1}H. Here we mention a few of these preconditioners.
The first choice is


                        G22 = H22,   G11 = 0   and   G21 = 0.


The eigenvalues of the preconditioned matrix are given in the following the-
orem [21, Theorem 2.3].

Theorem 3.1.3. Suppose that the matrix G is chosen as mentioned before.
Suppose that H22 is positive definite, and let

      ρ = min{rank(A2), rank(H21)} + min{rank(A2), rank(H21)
             + min[rank(A2), rank(H11)]}.

Then P^{-1}H has at most

      rank(R^T H21 + H21^T R + R^T H11 R) + 1 ≤ min(ρ, n − m) + 1

                                              ≤ min(2m, n − m) + 1

distinct eigenvalues, where R = −A1^{-1} A2.


    The second choice is G22 = H22, G11 = H11 and G21 = 0. The eigenvalues
of the preconditioned matrix satisfy the following theorem [21, Theorem 2.4].

Theorem 3.1.4. Suppose that the matrix G is chosen as mentioned before.
Suppose that H22 + R^T H11 R is positive definite, and that


                        ν = 2 min{rank(A2), rank(H21)}.


Then P^{-1}H has at most ν + 1 distinct eigenvalues, where


             rank(R^T H11 R) + 1 ≤ ν + 1 ≤ min(2m, n − m) + 1.


    In [34] the authors propose four different symmetric positive definite pre-
conditioners for the augmented system for the linear programs. In order to
construct these preconditioners the matrices H and A are partitioned as has
been mentioned earlier. However, A2 is chosen to be the nonsingular matrix
instead of A1 .
     The first preconditioner is a diagonal matrix. This preconditioner is given
by

                                      [ H11   0   0 ]
                        P = C1 C1^T = [  0    I   0 ] .
                                      [  0    0   I ]

The preconditioned matrix is given by

                          [ I                 0      H11^{-1/2} A1^T ]
       C1^{-1} H C1^{-T} = [ 0                 H22    A2^T            ] .
                          [ A1 H11^{-1/2}     A2     0               ]

The second preconditioner is a block diagonal matrix. It is presented as
follows

                                      [ H11      0       0 ]
                        P = C2 C2^T = [  0    A2^T A2    0 ] .
                                      [  0       0       I ]

The preconditioned matrix is given by

                          [ I                 0                      H11^{-1/2} A1^T ]
       C2^{-1} H C2^{-T} = [ 0                 A2^{-T} H22 A2^{-1}    I               ] .
                          [ A1 H11^{-1/2}     I                      0               ]

The third preconditioner is designed to eliminate the submatrix A2^{-T} H22 A2^{-1}
in the previous preconditioned matrix. This preconditioner is given by

                                    [ H11^{1/2}     0             0          ]
                P = C3 C3^T,   C3 = [    0        A2^T   (1/2) H22 A2^{-1}   ] .
                                    [    0          0             I          ]

The preconditioned matrix is given by

                    [ I                                           −(1/2) H11^{-1/2} A1^T A2^{-T} H22 A2^{-1}    H11^{-1/2} A1^T ]
 C3^{-1} H C3^{-T} = [ −(1/2) A2^{-T} H22 A2^{-1} A1 H11^{-1/2}     0                                             I              ] .
                    [ A1 H11^{-1/2}                                I                                             0              ]

The fourth preconditioner also eliminates the submatrix A2^{-T} H22 A2^{-1}, using
the factorisation A2^T = LU. This preconditioner is given by

                                    [ H11^{1/2}     0            0          ]
                P = C4 C4^T,   C4 = [    0          L    (1/2) H22 L^{-T}   ] .
                                    [    0          0           U^T         ]

The preconditioned matrix is given by

                    [ I                                                −(1/2) H11^{-1/2} A1^T U^{-1} L^{-1} H22 L^{-T}    H11^{-1/2} A1^T U^{-1} ]
 C4^{-1} H C4^{-T} = [ −(1/2) L^{-1} H22 L^{-T} U^{-T} A1 H11^{-1/2}     0                                                  I                    ] .
                    [ U^{-T} A1 H11^{-1/2}                              I                                                  0                    ]

    The preconditioner in [60] is given in the form P = CC^T and is applied
from the left and from the right to the augmented system which arises from
the IPMs for LP. To construct this preconditioner the matrices A and H are
partitioned as mentioned before, where A1 is nonsingular. The inverse of C
is given by

                              [ −H^{-1/2}    M ]
                     C^{-1} = [               ] ,
                              [     T        0 ]

where T = [I  0] Q, Q is a permutation matrix, and M = T^T H11^{1/2} A1^{-1}.

    The preconditioned matrix is given by:

                               [ −I      −W     0   ]
             C^{-1} H C^{-T} = Q [ −W^T     I     0   ] Q^T,
                               [  0       0    H11  ]

where W = H11^{1/2} A1^{-1} A2 H22^{-1/2}.
    Assume ∆x = [∆x1, ∆x2] is partitioned according to the partition of A.
    Eventually, in the approach of [60] the preconditioned system is reduced
to the following normal equations


                              (I + W W^T) ∆x1 = g̃.


    In this section we construct a new preconditioner for the augmented sys-
tem (1.7). Before we do so, we rearrange the augmented system such
that

                     [ Θ^{-1}   A^T ] [ −∆x ]   [ f ]
                     [              ] [     ] = [   ] ,                    (3.2)
                     [   A      0   ] [ ∆y  ]   [ g ]

where in this chapter we redefine g as g = Ax − b.
    To design the preconditioner for the augmented system, we first ob-
serve that the ill-conditioning in linear systems (1.7) and (1.8) is a conse-
quence of the properties of the diagonal scaling matrix Θ. From the com-
plementarity condition for linear programs we know that, at the optimum,
x̂j ŝj = 0, ∀j ∈ {1, 2, . . . , n}. The condition x̂j ŝj = 0 is satisfied if at least
one of the variables x̂j and ŝj is zero. Primal-dual interior point methods
identify a strong optimal partition [77], that is, they produce an optimal so-
lution with the property x̂j + ŝj > 0, ∀j. In other words, only one of x̂j and
ŝj is zero. The set of indices {1, 2, . . . , n} can therefore be partitioned into
two disjoint subsets:


   B = {j ∈ {1, 2, ..., n} : x̂j > 0}   and   N = {j ∈ {1, 2, ..., n} : ŝj > 0}.


In fact, the optimal partition is closely related to (but not equivalent to) the
basic-nonbasic partition in the simplex method. This is because simplex
method iterations move from vertex to vertex until the optimal solution is
found, so the simplex method has exactly m basic variables (variables belonging
to B) and n − m nonbasic variables (variables belonging to N). Interior
point methods, however, approach the optimal solution by moving through the
interior of the feasible region. Consequently, interior point methods have m basic
variables and n − m nonbasic variables only in the limit. That is why we refer
to this partition in interior point methods as the optimal partition.
    Unlike the simplex method, which satisfies the complementarity condition
at each iteration, the interior point method satisfies this condition only in
the limit. The primal-dual interior point method identifies a strong optimal
partition near the optimal solution. Below we will summarise its asymptotic
behaviour and use the arrow to denote “converges to”. If at the optimal
solution j ∈ B, then xj → x̂j > 0 and sj → 0, hence the corresponding
element θj → ∞. If at the optimal solution j ∈ N, then xj → 0 and
sj → ŝj > 0, hence θj → 0. Summing up,

              { ∞, if j ∈ B                              { 0, if j ∈ B
        θj →  {                      and     θj^{-1} →   {                        (3.3)
              { 0, if j ∈ N,                             { ∞, if j ∈ N.


This property of interior point methods is responsible for a number of numer-
ical difficulties. In particular, it causes both linear systems (1.7) and (1.8) to
become very ill-conditioned when an interior point method approaches the
optimal solution [3]. However, it may be used to advantage when construct-
ing a preconditioner for the iterative method.
    We partition the matrices and vectors:

                                    [ ΘB   0  ]
   A = [AB, AN],        Θ =         [         ] ,    x = [xB, xN],    and    s = [sB, sN]
                                    [  0   ΘN ]

according to the partition of {1, 2, . . . , n} into sets B and N. With this
notation, from (3.3) we conclude that ΘN ≈ 0 and ΘB^{-1} ≈ 0. Consequently,

the matrix in the augmented system (3.2) can be approximated as follows:
                                                                  
                      Θ−1
                       B             AT
                                      B                         AT
                                                                 B
                                                                  
                            Θ−1      AT   ≈             Θ−1   AT   ,               (3.4)
                                                                  
                  
                            N        N                  N     N   
                      AB        AN                  AB    AN

and the matrix in the normal equations system (1.8) can be approximated
as follows:


            A Θ A^T = AB ΘB AB^T + AN ΘN AN^T ≈ AB ΘB AB^T.                        (3.5)


If the matrix AB was square and nonsingular then equations (3.4) and (3.5)
would suggest obvious preconditioners for the augmented system and nor-
mal equations, respectively. However, there is no guarantee that this is the
case. On the contrary, in practical applications it is very unlikely that the
matrix AB corresponding to the optimal partition is square and nonsingular.
Moreover, the optimal partition is known only when an IPM approaches the
optimal solution of the linear program.
    To construct a preconditioner to (3.2) with a structure similar to the ap-
proximation (3.4) we need to guess an optimal partition and, additionally,
guarantee that the matrix B which approximates AB is nonsingular. We ex-
ploit the difference in magnitude of elements in Θ to design a preconditioner.
We sort the elements of Θ in non-increasing order: θ1 ≥ θ2 ≥ θ3 ≥ · · · ≥ θn.
Hence the elements of Θ^{-1} satisfy θ1^{-1} ≤ θ2^{-1} ≤ θ3^{-1} ≤ · · · ≤ θn^{-1}. If the
primal-dual iterate is sufficiently close to an optimal solution, then the first
elements θj^{-1} in this list correspond to variables xj which are most likely to
be nonzero at the optimum, and the last elements in the list correspond to
variables which are likely to be zero at the optimum. We select the first
m linearly independent columns of the matrix A, when permuted according
to the order of θj^{-1}, and we construct a nonsingular matrix B from these
columns. The submatrix of A corresponding to all the remaining columns is
denoted by N. Therefore we assume that a partition A = [B, N] is known
such that B is nonsingular and the entries θj^{-1} corresponding to columns of
B are chosen from the smallest elements of Θ^{-1}. According to this partition-
ing of A and Θ (and after a symmetric row and column permutation) the
indefinite matrix in (3.2) can be rewritten in the following form
                                                            
                              [ ΘB^{-1}              B^T ]
                        K =   [           ΘN^{-1}    N^T ] .                         (3.6)
                              [    B         N            ]


By construction, the elements of ΘB^{-1} are supposed to be among the smallest
elements of Θ^{-1}, hence we may assume that ΘB^{-1} ≈ 0. The following easily
invertible block-triangular matrix

                              [                      B^T ]
                        P =   [           ΘN^{-1}    N^T ]                           (3.7)
                              [    B         N            ]

is a good approximation to K. Hence P is an attractive preconditioner for K.
We should mention that Oliveira and Sorensen [60] use a similar partitioning
process to derive their preconditioner for the normal equations. They order
the columns of the matrix AΘ−1 from the smallest to the largest with respect
to the 1-norm and then scan the columns of A in this order to select the first
m that are linearly independent.
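
A naive sketch of this column-selection idea is given below; it scans the columns of A in order of decreasing θj (equivalently, increasing θj^{-1}) and greedily keeps those that increase the rank. This is only an illustration of the principle with invented data: the implementation discussed in Section 3.4 is far more careful about the sparsity and conditioning of B.

    import numpy as np

    def select_basis(A, theta):
        m, n = A.shape
        order = np.argsort(-theta)                 # largest theta_j (smallest theta_j^{-1}) first
        basic = []
        for j in order:
            trial = basic + [j]
            if np.linalg.matrix_rank(A[:, trial]) == len(trial):   # keep column if independent
                basic.append(j)
            if len(basic) == m:
                break
        nonbasic = [j for j in range(n) if j not in basic]
        return np.array(basic), np.array(nonbasic)

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 7))
    theta = rng.uniform(1e-4, 1e4, size=7)
    B_idx, N_idx = select_basis(A, theta)
    B = A[:, B_idx]
    print(np.linalg.matrix_rank(B) == 3)           # B is nonsingular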
    Since the matrix B was constructed from columns corresponding to the
smallest possible elements of Θ^{-1} we may expect that ‖ΘB^{-1}‖_F ≪ ‖ΘN^{-1}‖_F,
where ‖·‖_F denotes the Frobenius norm of the matrix. Using (3.6) and (3.7)
we derive the following bound on the square of the Frobenius norm of the
difference of matrices K and P:


              ‖K − P‖_F^2 = ‖ΘB^{-1}‖_F^2 ≪ ‖P‖_F^2 < ‖K‖_F^2.                     (3.8)


Summing up, P is a good approximation to K (since the approximation
error is small in relation to ‖P‖_F^2 and ‖K‖_F^2) and we may consider it as a
possible preconditioner of K. Secondly, it is easy to compute P: we order
the elements of Θ in non-increasing order and then pick the first m linearly
independent columns of A in this order to construct the nonsingular matrix
B, see Section 3.4. In addition, it is easy to solve an equation with P
because it is block-triangular with nonsingular diagonal blocks B, ΘN^{-1} and
B^T. We conclude this section by giving explicit formulae for the solution
of equations with the preconditioner (3.7) and leave the analysis of the spectral
properties of the preconditioned matrix P^{-1}K to Section 3.2.


3.1.1      Solving equations with P

The matrix (3.7) is block triangular and its diagonal blocks B, ΘN^{-1} and B^T
are invertible. Let d = [dB, dN, dy] and r = [rB, rN, ry] and consider the
system of equations

                  [                      B^T ] [ dB ]     [ rB ]
                  [           ΘN^{-1}    N^T ] [ dN ]  =  [ rN ] .                   (3.9)
                  [    B         N           ] [ dy ]     [ ry ]

The solution of (3.9) can easily be computed by exploiting the block-triangular
structure of the matrix:

                     B^T dy = rB                ⇒   dy = B^{-T} rB
           ΘN^{-1} dN + N^T dy = rN             ⇒   dN = ΘN rN − ΘN N^T dy           (3.10)
              B dB + N dN = ry                  ⇒   dB = B^{-1}(ry − N dN).

The operation d = P −1 r involves solving two equations (one with B and one
with B T ) and a couple of matrix-vector multiplications. These operations
will be performed at every iteration of the conjugate gradients procedure
hence they should be implemented in the most efficient way. The issues of
choosing a well-conditioned basis matrix B with sparse factored inverse are
addressed in Section 3.4.
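
The formulae (3.10) are summarised in the following sketch, which applies P^{-1} to a residual vector using an LU factorisation of B (the dense data are purely illustrative; within an IPM the factorisation of B would of course be sparse and reused across PCG iterations).

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def apply_P_inverse(lu_B, N, theta_N, r_B, r_N, r_y):
        # Block-triangular solves (3.10):  P d = r  with P as in (3.7).
        d_y = lu_solve(lu_B, r_B, trans=1)         # B^T d_y = r_B
        d_N = theta_N * (r_N - N.T @ d_y)          # d_N = Theta_N (r_N - N^T d_y)
        d_B = lu_solve(lu_B, r_y - N @ d_N)        # B d_B = r_y - N d_N
        return d_B, d_N, d_y

    rng = np.random.default_rng(8)
    m, n = 3, 7
    B = rng.standard_normal((m, m)) + 3 * np.eye(m)        # nonsingular basis matrix
    N = rng.standard_normal((m, n - m))
    theta_N = rng.uniform(0.5, 2.0, size=n - m)
    r_B, r_N, r_y = rng.standard_normal(m), rng.standard_normal(n - m), rng.standard_normal(m)

    d_B, d_N, d_y = apply_P_inverse(lu_factor(B), N, theta_N, r_B, r_N, r_y)
    # Check the three block equations of (3.9).
    print(np.allclose(B.T @ d_y, r_B),
          np.allclose(d_N / theta_N + N.T @ d_y, r_N),
          np.allclose(B @ d_B + N @ d_N, r_y))     # True True True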


3.2       Spectral analysis
We have observed earlier that if ΘB is chosen carefully and ‖ΘB^{-1}‖_F ≪ ‖ΘN^{-1}‖_F
then the preconditioner (3.7) is a good approximation to K in (3.6).
To assess the quality of the preconditioner we need a better understanding
of the relation between P and K.
    We will therefore analyse the spectral properties of the preconditioned
matrix P −1 K. Let us use the notation Kt = q to denote the system (3.2),
where t = [−∆x, ∆y] and q = [f, g]. Given a starting approximation t(0) and
the associated residual r(0) = q − Kt(0) the indefinite preconditioner may be
applied either from the right, yielding the system


                        K P^{-1} t̂ = q,          t = P^{-1} t̂,                      (3.11)
                                                              ˆ          (3.11)


or from the left, so that the system to be solved becomes


                                    P −1 Kt = P −1 q.                    (3.12)


The right and the left preconditioned matrices KP −1 and P −1 K have the
same eigenvalues so general spectral results can be given in terms of either
of the two formulations. The following theorem shows that the eigenvalues
of the P −1 K matrix are real and positive. Moreover they are bounded away
from zero.

Theorem 3.2.1. Let λ be an eigenvalue of P −1 K. Then λ is real and λ ≥ 1.

Proof. Let v be an eigenvector of P −1 K corresponding to the eigenvalue λ,
that is, P −1 Kv = λv. Let λ = 1 + τ and, applying the usual partitioning
v = [vB , vN , vy ], the eigensystem can be written as Kv = (1 + τ )P v:
                                                                           
   [ ΘB^{-1}              B^T ] [ vB ]                [                      B^T ] [ vB ]
   [           ΘN^{-1}    N^T ] [ vN ]   =  (1 + τ)   [           ΘN^{-1}    N^T ] [ vN ]
   [    B         N           ] [ vy ]                [    B         N           ] [ vy ]

which yields


                               ΘB^{-1} vB = τ B^T vy
                        τ (ΘN^{-1} vN + N^T vy) = 0
                           τ (B vB + N vN) = 0.


     We consider two cases. When τ = 0, clearly λ = 1, so the claim is true.
Otherwise, when τ ≠ 0, the equation system can be simplified:


                               ΘB^{-1} vB = τ B^T vy
                          ΘN^{-1} vN + N^T vy = 0
                              B vB + N vN = 0,


and solved for τ . From the third equation we get vB = −B −1 N vN and,
substituting this in the first equation, yields N vN = −τ BΘB B T vy . Next, we
use the second equation to substitute for vN = −ΘN N T vy giving


                             (N ΘN N T )vy = τ (BΘB B T )vy .


     If vy = 0 then (using τ ≠ 0) we deduce that vB = 0 and vN = 0, that is,
the eigenvector is zero. We can exclude such a situation and safely assume
                                                                      T
that vy = 0. In this case, we multiply both sides of the equation by vy to get


                      vy (N ΘN N T )vy = τ vy (BΘB B T )vy .
                       T                    T




Since all the elements of Θ are positive and B is nonsingular, the matrix
BΘB B T is symmetric positive definite and the matrix N ΘN N T is symmetric
positive semidefinite. Hence we conclude that

$$\tau = \frac{v_y^T (N\Theta_N N^T) v_y}{v_y^T (B\Theta_B B^T) v_y} \geq 0, \tag{3.13}$$



so $\tau$ is a real and nonnegative number; hence $\lambda = 1 + \tau$ is real and $\lambda \geq 1$, which completes the proof.

    The proof reveals the importance of the correct partitioning of A =
[B, N ]. Indeed, this partition should have a number of desirable features:

    • B should be nonsingular and well-conditioned since we should operate
      accurately with the preconditioner;

    • All elements in $\Theta_B^{-1}$ should be small in comparison with those in $\Theta_N^{-1}$.


    The condition $\|\Theta_B^{-1}\|_F \ll \|\Theta_N^{-1}\|_F$ is relatively easy to satisfy. However, (3.13) indicates that we need a stronger property: we would like to
bound τ from above and, in that way, cluster all eigenvalues of P −1 K in an
interval [1, λmax ], with λmax kept as small as possible. This opens questions
regarding the necessary concessions to be made when the matrix B and the
corresponding ΘB are chosen. The ability to identify a well-conditioned ma-
trix B consisting of columns for which the θj are “large” is crucial for the
good/efficient behaviour of our approach. We discuss these issues in detail
in Section 3.4.
    In the previous theorem we showed that the eigenvalues of the preconditioned matrix $KP^{-1}$ are real and not smaller than one. In the following theorem
we show that the preconditioned matrix KP −1 has at least n + m − p unit
eigenvalues, where p is the rank of the matrix N .

Theorem 3.2.2. The preconditioned matrix KP −1 has:

    • unit eigenvalues with multiplicity n + m − p.

    • the remaining p eigenvalues are given by $1 + \frac{z^T N\Theta_N N^T z}{z^T B\Theta_B B^T z} \geq 1$, where $z \neq 0$.


Proof. The inverse of the preconditioner P is given by

$$P^{-1} = \begin{bmatrix} B^{-1}N\Theta_N N^T B^{-T} & -B^{-1}N\Theta_N & B^{-1} \\ -\Theta_N N^T B^{-T} & \Theta_N & 0 \\ B^{-T} & 0 & 0 \end{bmatrix}.$$

Therefore, the preconditioned matrix $KP^{-1}$ is given by

$$KP^{-1} = \begin{bmatrix} I + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T} & -\Theta_B^{-1}B^{-1}N\Theta_N & \Theta_B^{-1}B^{-1} \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}.$$

Let v be an eigenvector of KP −1 corresponding to the eigenvalue λ, that is,
KP −1 v = λv, which can be rewritten as
                                                                                          
$$\begin{bmatrix} I + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T} & -\Theta_B^{-1}B^{-1}N\Theta_N & \Theta_B^{-1}B^{-1} \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}\begin{bmatrix} v_B \\ v_N \\ v_y \end{bmatrix} = \lambda\begin{bmatrix} v_B \\ v_N \\ v_y \end{bmatrix},$$

which yields


$$(I + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T})v_B - \Theta_B^{-1}B^{-1}N\Theta_N v_N + \Theta_B^{-1}B^{-1}v_y = \lambda v_B, \tag{3.14}$$
                                       vN = λvN ,                          (3.15)

                                        vy = λvy .                         (3.16)

The equations (3.15) and (3.16) are true if either (λ = 1 for any vN and vy )
or (vN = vy = 0).

   1. Case λ = 1, we now analyse a number of cases depending on vB , vN
      and vy .

      a. vN = 0. Substituting this in (3.14) gives


                                    vB = −B T (N ΘN N T )−1 vy .


             That gives the eigenvector [−B T (N ΘN N T )−1 vy , 0, vy ] which is
             associated with the unit eigenvalue with multiplicity m, because
             we can find m linearly independent vectors vy .

      b. vy = 0. Substituting this in (3.14) gives


                                  vB = B T (N ΘN N T )−1 N ΘN vN .


             That gives the eigenvector [B T (N ΘN N T )−1 N ΘN vN , vN , 0] which
             is associated with the unit eigenvalue with multiplicity n − m,
             because we can find n − m linearly independent vectors vN .

   2. Case $v_N = v_y = 0$. Substituting this in (3.14) and multiplying both sides by $B\Theta_B$ gives

$$B\Theta_B v_B + N\Theta_N N^T B^{-T} v_B = \lambda B\Theta_B v_B.$$

      For $v_B \neq 0$, there is a nonzero vector $z$ such that $v_B = B^T z$. Since $B$
      is nonsingular, $z \neq 0$. By substituting this in the previous equation we
      get the following equality


                           BΘB B T z + N ΘN N T z = λBΘB B T z.


       Since $z \neq 0$ and $B\Theta_B B^T$ is a symmetric positive definite matrix, we can
       write
$$\lambda = 1 + \frac{z^T N\Theta_N N^T z}{z^T B\Theta_B B^T z} \geq 1. \tag{3.17}$$
       That gives eigenvectors $[v_B, 0, 0]$ which are associated with the eigenvalues (3.17). Moreover, $N$ has rank $p$, so for $m$ linearly independent vectors $v_B$ we get $N^T z \neq 0$ with multiplicity $p$ and $N^T z = 0$ with multiplicity $m - p$. Consequently the eigenvectors $[v_B, 0, 0]$ are associated with the unit eigenvalue with multiplicity $m - p$ and the remaining $p$ eigenvalues are given by (3.17).

We conclude from the previous cases that the preconditioned matrix $KP^{-1}$ has $n + m - p$ unit eigenvalues and the remaining $p$ eigenvalues are given by (3.17).
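
    The spectral properties established in Theorems 3.2.1 and 3.2.2 are easy to check numerically on a small dense example. The following sketch (Python with NumPy) is purely illustrative and is not part of the implementation discussed in this thesis: the matrix A, the diagonal Θ and the partition [B, N] are generated randomly, and B is simply taken to be the first m columns of A (assumed nonsingular). It builds K and the preconditioner P (that is, K with the $\Theta_B^{-1}$ block dropped) and verifies that the eigenvalues of $P^{-1}K$ are real, at least one, and that $n + m - p$ of them are equal to one.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 4, 7                               # rows and columns of A
    A = rng.standard_normal((m, n))
    theta = rng.uniform(0.5, 2.0, n)          # diagonal of Theta (positive)

    N = A[:, m:]                              # the first m columns are assumed to form a nonsingular B
    K = np.block([[np.diag(1.0 / theta), A.T],
                  [A, np.zeros((m, m))]])
    P = K.copy()
    P[:m, :m] = 0.0                           # drop the Theta_B^{-1} block

    lam = np.sort(np.linalg.eigvals(np.linalg.solve(P, K)).real)
    p = np.linalg.matrix_rank(N)
    print("smallest eigenvalue :", lam[0])                              # approximately 1
    print("unit eigenvalues    :", int(np.sum(np.isclose(lam, 1.0))),
          "(expected", n + m - p, ")")
    print("largest eigenvalue  :", lam[-1])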



3.3       The PCG method for nonsymmetric indefinite system

Rozložník and Simoncini [65] used the BiCG method to solve an indefinite
system such as (3.2) preconditioned from the right. They show that the
right preconditioned BiCG method reduces to the standard preconditioned
CG method if the following two properties hold. The first property is that the
preconditioned matrix H = KP −1 is J-symmetric, where J = P −1 , and the
second is that g = 0. The reason behind this is that when g = 0 the residual
of PCG has a zero block and can be expressed as $r^j = [r_1^j, 0]$. Although in
our case $g \neq 0$, the initial iterate $t^0$ can be chosen so that the corresponding
residual has the form $r^0 = [r_1^0, 0]$. Furthermore, the preconditioned matrix
$H = KP^{-1}$ is J-symmetric, since $H^T J = JH$. See [65].
      Let us consider the following starting point for CG:
                                                                    
$$t^0 = \begin{bmatrix} -\Delta x_B^0 \\ -\Delta x_N^0 \\ \Delta y^0 \end{bmatrix} = \begin{bmatrix} B^{-1}g \\ 0 \\ 0 \end{bmatrix}, \tag{3.18}$$

                        
where $\Delta x = [\Delta x_B, \Delta x_N]$. The initial residual $r^0 = q - Kt^0$ may then be written
as
                                                                                            
      f                Θ−1              B   T
                                                      B g −1
                                                                               fB −   Θ−1 B −1 g
     B               B                                                          B           
  0
r =  fN      −                 Θ−1       T                      =                             .
                                                                                            
                                   N    N                0                          fN
                                                                                            
       g                 B        N                        0                           0

      Note two interesting properties of the preconditioned matrix KP −1 stated
as two Lemmas below. Multiplying by the preconditioned matrix KP −1
preserves a zero block in the third component of the vector.
                                                     
Lemma 3.3.1. Let $t = [v_B, v_N, 0]$. Then $KP^{-1}t = [z_B, z_N, 0]$.

Proof. We note first that, by using (3.9)-(3.10), we may write $u = P^{-1}t$ as

$$u = \begin{bmatrix} B^{-1}N\Theta_N N^T B^{-T}v_B - B^{-1}N\Theta_N v_N \\ -\Theta_N N^T B^{-T}v_B + \Theta_N v_N \\ B^{-T}v_B \end{bmatrix}.$$
Hence

$$KP^{-1}t = Ku = \begin{bmatrix} \Theta_B^{-1} & & B^T \\ & \Theta_N^{-1} & N^T \\ B & N & \end{bmatrix}\begin{bmatrix} B^{-1}N\Theta_N N^T B^{-T}v_B - B^{-1}N\Theta_N v_N \\ -\Theta_N N^T B^{-T}v_B + \Theta_N v_N \\ B^{-T}v_B \end{bmatrix} = \begin{bmatrix} (I + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T})v_B - \Theta_B^{-1}B^{-1}N\Theta_N v_N \\ v_N \\ 0 \end{bmatrix},$$

which completes the proof.

    Furthermore, using the initial approximate solution
                                                                           
$$t^0 = \begin{bmatrix} -\Delta x_B^0 \\ -\Delta x_N^0 \\ \Delta y^0 \end{bmatrix} = \begin{bmatrix} B^{-1}(g - N\Theta_N f_N) \\ \Theta_N f_N \\ 0 \end{bmatrix}, \tag{3.19}$$

                                                                     
the residuals will have two zero blocks, $r = [r_B, 0, 0]$.
    The initial residual $r^0 = q - Kt^0$ may be written:

$$r^0 = \begin{bmatrix} f_B \\ f_N \\ g \end{bmatrix} - \begin{bmatrix} \Theta_B^{-1} & & B^T \\ & \Theta_N^{-1} & N^T \\ B & N & \end{bmatrix}\begin{bmatrix} B^{-1}(g - N\Theta_N f_N) \\ \Theta_N f_N \\ 0 \end{bmatrix},$$
which gives
                                                                            
$$r^0 = \begin{bmatrix} f_B - \Theta_B^{-1}B^{-1}g + \Theta_B^{-1}B^{-1}N\Theta_N f_N \\ 0 \\ 0 \end{bmatrix}.$$

    We observe an important property of the preconditioned matrix: multi-
plying with the matrix KP −1 preserves the zero blocks in the second and
third components of the vector.
                                                                          
Lemma 3.3.2. Let $t = [v_B, 0, 0]$. Then $KP^{-1}t = [z_B, 0, 0]$.

Proof. We note first that, by using (3.9)-(3.10), we may write $u = P^{-1}t$ as

$$u = \begin{bmatrix} B^{-1}N\Theta_N N^T B^{-T}v_B \\ -\Theta_N N^T B^{-T}v_B \\ B^{-T}v_B \end{bmatrix},$$

hence

$$KP^{-1}t = Ku = \begin{bmatrix} \Theta_B^{-1} & & B^T \\ & \Theta_N^{-1} & N^T \\ B & N & \end{bmatrix}\begin{bmatrix} B^{-1}N\Theta_N N^T B^{-T}v_B \\ -\Theta_N N^T B^{-T}v_B \\ B^{-T}v_B \end{bmatrix},$$

and we obtain

$$KP^{-1}t = \begin{bmatrix} (I + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T})v_B \\ 0 \\ 0 \end{bmatrix},$$
    which completes the proof.

    From the PCG algorithm, we have $d^0 = P^{-1}r^0$, $d^j = P^{-1}r^j + \beta_j d^{j-1}$ and
$r^{j+1} = r^j - \alpha_j K d^j$. So the residual $r^1$ is computed as a linear combination
of $r^0$ and $KP^{-1}r^0$. For $j > 1$, the residual $r^{j+1}$ is computed as a linear
combination of $r^{j-1}$, $r^j$ and $KP^{-1}r^j$ (that is because $r^{j+1} = (\alpha_j\beta_j/\alpha_{j-1})r^{j-1} + (1 - \alpha_j\beta_j/\alpha_{j-1})r^j - \alpha_j KP^{-1}r^j$). This implies that $r^j = [r_1^j, 0]$ for $j = 0, 1, \ldots$
Consequently, we can use the standard PCG method along with (3.7) to
solve (3.2).
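
    A small numerical illustration of this observation follows. The sketch below (Python with NumPy, with randomly generated data as in the previous sketch, so purely illustrative) runs the standard PCG recurrence on the indefinite system with the preconditioner applied through solves with P, starting from (3.19); the residual components corresponding to the N and y blocks stay at rounding level throughout, as Lemmas 3.3.1 and 3.3.2 predict.

    import numpy as np

    def pcg(K, P, q, t0, tol=1e-10, maxit=50):
        # Standard preconditioned CG; the preconditioner is applied via solves with P.
        t = t0.copy()
        r = q - K @ t
        z = np.linalg.solve(P, r)
        d = z.copy()
        history = [r.copy()]
        for _ in range(maxit):
            if np.linalg.norm(r) < tol:
                break
            Kd = K @ d
            alpha = (r @ z) / (d @ Kd)
            t = t + alpha * d
            r_new = r - alpha * Kd
            z_new = np.linalg.solve(P, r_new)
            beta = (r_new @ z_new) / (r @ z)
            d = z_new + beta * d
            r, z = r_new, z_new
            history.append(r.copy())
        return t, history

    rng = np.random.default_rng(1)
    m, n = 4, 7
    A = rng.standard_normal((m, n))
    theta = rng.uniform(0.5, 2.0, n)
    B, N = A[:, :m], A[:, m:]                 # first m columns assumed to form a nonsingular B
    tN = theta[m:]

    K = np.block([[np.diag(1.0 / theta), A.T], [A, np.zeros((m, m))]])
    P = K.copy(); P[:m, :m] = 0.0
    f = rng.standard_normal(n); g = rng.standard_normal(m)
    q = np.concatenate([f, g]); fN = f[m:]

    # starting point (3.19): [-dx_B, -dx_N, dy] = [B^{-1}(g - N Theta_N f_N), Theta_N f_N, 0]
    t0 = np.concatenate([np.linalg.solve(B, g - N @ (tN * fN)), tN * fN, np.zeros(m)])
    t, hist = pcg(K, P, q, t0)

    print("final residual norm:", np.linalg.norm(q - K @ t))
    print("largest |(r_N, r_y)| over all iterations:",
          max(np.linalg.norm(r[m:]) for r in hist))      # stays at rounding level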


3.3.1      The convergence of the PCG method

In this section, we analyse the behaviour of the PCG method for the indefinite
system (3.2) and give explicit formulae describing the convergence of the
method. The convergence analysis of the PCG method is important because
both K and P are indefinite matrices. In [65] the authors prove that both
the error and the residual of the PCG method converge to zero. Here we prove this as well. We analyse the method working in our specific setup with a particular starting point guaranteeing that the initial residual has the form $r^0 = [r_B^0, 0, 0]$.


    The PCG algorithm (see Chapter 2) generates iterates tj , j = 0, 1, . . .
with residuals rj = q − Ktj . The error corresponding to each PCG iteration
has the form ej = tj − t∗ , where t∗ is the solution of (3.2), and the residual
can be written as rj = −Kej since Kej = Ktj − Kt∗ = −rj . In Lemma 3.3.3
we prove that the indefinite K-inner product of the error ej in the PCG
algorithm is always non-negative so we can write $\|e^j\|_K = \sqrt{\langle e^j, Ke^j\rangle}$,
even though K is not positive definite. In Theorem 3.3.4 we show that the
K-norm of the error ej is minimized over the eigenvalues of the symmetric
positive definite matrices. Similarly, in Theorem 3.3.5 we show that the
Euclidean norm of the residual rj is also minimized over the eigenvalues
of the symmetric positive definite matrices. In other words, the error and
residual terms display asymptotic convergence similar to that observed when
PCG is applied to symmetric positive definite systems.

Lemma 3.3.3. Assume we use (3.18) or (3.19) as initial solution of PCG
method. Then the indefinite K-inner product < ej , Kej > is non-negative for
the error ej hence it defines a seminorm


$$\|e^j\|_K = \sqrt{\langle e^j, Ke^j\rangle} = \|e_1^j\|_{\Theta^{-1}}. \tag{3.20}$$


Proof. We have shown in Lemmas 3.3.1 and 3.3.2 that, for a suitable initial
solution, the residual has the form $r^j = [r_1^j, 0]$. Hence
                                                                                     
$$r^j = -Ke^j = -\begin{bmatrix} \Theta^{-1} & A^T \\ A & 0 \end{bmatrix}\begin{bmatrix} e_1^j \\ e_2^j \end{bmatrix} = \begin{bmatrix} -\Theta^{-1}e_1^j - A^T e_2^j \\ -Ae_1^j \end{bmatrix},$$

which implies $Ae_1^j = 0$. Simple calculations give the following result

                                                                                                
$$\langle e^j, Ke^j\rangle = (e^j)^T K e^j = \begin{bmatrix} (e_1^j)^T & (e_2^j)^T \end{bmatrix}\begin{bmatrix} \Theta^{-1} & A^T \\ A & 0 \end{bmatrix}\begin{bmatrix} e_1^j \\ e_2^j \end{bmatrix}
= (e_1^j)^T\Theta^{-1}e_1^j + (e_1^j)^T A^T e_2^j + (e_2^j)^T A e_1^j
= (e_1^j)^T\Theta^{-1}e_1^j
= (e_B^j)^T\Theta_B^{-1}e_B^j + (e_N^j)^T\Theta_N^{-1}e_N^j \geq 0 \tag{3.21}$$


because $\Theta^{-1}$ is positive definite. This gives $\|e^j\|_K = \|e_1^j\|_{\Theta^{-1}}$, which completes the proof.

    Let Dj be the Krylov subspace Dj = span{d0 , d1 , ..., dj−1 }. Then D1 =
span{d0 } = span{P −1 r0 }. D2 = span{d0 , d1 }, where the direction d1 is a
linear combination of the previous direction and P −1 r1 , while r1 is a linear
combination of the previous residual and Kd0 . This implies that d1 is a linear
combination of d0 and P −1 KP −1 r0 , which gives D2 = span{P −1 r0 , P −1 KP −1 r0 }.
By the same argument dj−1 is a linear combination of dj−2 and (P −1 K)j−1 P −1 r0 ,
giving Dj = span{P −1 r0 , P −1 KP −1 r0 , ..., (P −1 K)j−1 P −1 r0 }. Moreover, r0 =
−Ke0 , so Dj = span{P −1 Ke0 , (P −1 K)2 e0 , . . . , (P −1 K)j e0 }.
    The error can be written as $e^j = e^{j-1} + \alpha_{j-1}d^{j-1}$, hence $e^j = e^0 + \sum_{k=0}^{j-1}\alpha_k d^k$. Since $d^j \in D_{j+1}$ the error can be written as $e^j = (I + \sum_{k=1}^{j}\psi_k (P^{-1}K)^k)e^0$, where the coefficient $\psi_k$ is related to $\alpha_k$ and $\beta_k$. Hence the error term can be expressed as

$$e^j = \phi_j(P^{-1}K)e^0, \tag{3.22}$$


where φj is a polynomial of degree j and we require that φj (0) = 1.

Theorem 3.3.4. Let $e^0$ be the initial error of PCG. Then

$$\|e^j\|_K^2 \leq \min_{\phi\in P_j,\,\phi(0)=1}\;\max_{\lambda\in\Lambda(I_m+WW^T)} [\phi(\lambda)]^2\, \|e_B^0\|_{\Theta_B^{-1}}^2 \;+\; \min_{\phi\in P_j,\,\phi(0)=1}\;\max_{\lambda\in\Lambda(I_{n-m}+W^TW)} [\phi(\lambda)]^2\, \|e_N^0\|_{\Theta_N^{-1}}^2, \tag{3.23}$$

where $P_j$ is the set of polynomials of degree $j$, $\Lambda(G)$ is the set of eigenvalues
of the matrix $G$ and $W = \Theta_B^{-1/2}B^{-1}N\Theta_N^{1/2}$. $I_m + WW^T$ and $I_{n-m} + W^TW$
are symmetric positive definite matrices.

Proof. First, we observe that $Ae_1^0 = 0$, that is $Be_B^0 + Ne_N^0 = 0$, and hence
we write

$$Ke^0 = \begin{bmatrix} \Theta_B^{-1}e_B^0 + B^T e_2^0 \\ \Theta_N^{-1}e_N^0 + N^T e_2^0 \\ 0 \end{bmatrix}$$
and, using (3.10), we get

$$P^{-1}Ke^0 = \begin{bmatrix} B^{-1}N\Theta_N N^T B^{-T}\Theta_B^{-1}e_B^0 - B^{-1}Ne_N^0 \\ -\Theta_N N^T B^{-T}\Theta_B^{-1}e_B^0 + e_N^0 \\ B^{-T}\Theta_B^{-1}e_B^0 + e_2^0 \end{bmatrix}.$$



Since $Be_B^0 + Ne_N^0 = 0$, that is $e_B^0 = -B^{-1}Ne_N^0$ and $Ne_N^0 = -Be_B^0$, we obtain

$$P^{-1}Ke^0 = \begin{bmatrix} B^{-1}N\Theta_N N^T B^{-T}\Theta_B^{-1}e_B^0 - B^{-1}(-Be_B^0) \\ -\Theta_N N^T B^{-T}\Theta_B^{-1}(-B^{-1}Ne_N^0) + e_N^0 \\ B^{-T}\Theta_B^{-1}e_B^0 + e_2^0 \end{bmatrix} = \begin{bmatrix} \Theta_B(\Theta_B^{-1} + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T}\Theta_B^{-1})e_B^0 \\ \Theta_N(\Theta_N^{-1} + N^T B^{-T}\Theta_B^{-1}B^{-1}N)e_N^0 \\ B^{-T}\Theta_B^{-1}e_B^0 + e_2^0 \end{bmatrix}. \tag{3.24}$$


Let us define

$$C_1 = \Theta_B^{-1} + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T}\Theta_B^{-1} \quad\text{and}\quad C_2 = \Theta_N^{-1} + N^T B^{-T}\Theta_B^{-1}B^{-1}N.$$



It is easy to prove that C1 and C2 are symmetric and positive definite ma-
trices. By repeating a similar argument to the one used to derive (3.24) we
obtain
                                                                       
$$\phi(P^{-1}K)e^0 = \begin{bmatrix} \phi(\Theta_B C_1)e_B^0 \\ \phi(\Theta_N C_2)e_N^0 \\ \ast \end{bmatrix}. \tag{3.25}$$

We observe that it is not necessary to compute the last component of the
vector P −1 Ke0 because Lemma 3.3.3 guarantees that this component does
not contribute to $\|e^j\|_K^2$.
    Using (3.25) to compute the K-norm of the error (3.21) we obtain


$$\|\phi_j(P^{-1}K)e^0\|_K^2 = \|\phi_j(\Theta_B C_1)e_B^0\|_{\Theta_B^{-1}}^2 + \|\phi_j(\Theta_N C_2)e_N^0\|_{\Theta_N^{-1}}^2. \tag{3.26}$$




Let us observe that

$$(\Theta_B C_1)^k = \Theta_B^{1/2}(\Theta_B^{1/2}C_1\Theta_B^{1/2})^k\Theta_B^{-1/2} = \Theta_B^{1/2}(I_m + WW^T)^k\Theta_B^{-1/2},$$

where $(I_m + WW^T)$ is a symmetric and positive definite matrix.
    Analogously, we observe that $(\Theta_N C_2)^k = \Theta_N^{1/2}(I_{n-m} + W^TW)^k\Theta_N^{-1/2}$, also
(In−m + W T W ) is a symmetric and positive definite matrix. Using these
facts, the two terms on the right-hand-side of (3.26) can be simplified as
follows

$$\|\phi_j(\Theta_B C_1)e_B^0\|_{\Theta_B^{-1}}^2 = \|\Theta_B^{1/2}\phi_j(I_m + WW^T)\Theta_B^{-1/2}e_B^0\|_{\Theta_B^{-1}}^2 = \|\phi_j(I_m + WW^T)\Theta_B^{-1/2}e_B^0\|^2,$$
$$\|\phi_j(\Theta_N C_2)e_N^0\|_{\Theta_N^{-1}}^2 = \|\Theta_N^{1/2}\phi_j(I_{n-m} + W^TW)\Theta_N^{-1/2}e_N^0\|_{\Theta_N^{-1}}^2 = \|\phi_j(I_{n-m} + W^TW)\Theta_N^{-1/2}e_N^0\|^2.$$

From (3.22) we have $\|e^j\|_K^2 = \|\phi_j(P^{-1}K)e^0\|_K^2$, where $\phi_j$ is a polynomial of
degree $j$ and $\phi_j(0) = 1$. So the K-norm error in (3.26) becomes

$$\|e^j\|_K^2 = \|\phi_j(I_m + WW^T)\Theta_B^{-1/2}e_B^0\|^2 + \|\phi_j(I_{n-m} + W^TW)\Theta_N^{-1/2}e_N^0\|^2. \tag{3.27}$$


This holds for every polynomial $\phi_j$ with $\phi_j(0) = 1$; taking the maximum over the sets of eigenvalues of $I_m + WW^T$ and $I_{n-m} + W^TW$ and minimizing over the polynomials, we can write

$$\|e^j\|_K^2 \leq \min_{\phi\in P_j,\,\phi(0)=1}\;\max_{\lambda\in\Lambda(I_m+WW^T)} [\phi(\lambda)]^2\, \|\Theta_B^{-1/2}e_B^0\|^2 + \min_{\phi\in P_j,\,\phi(0)=1}\;\max_{\lambda\in\Lambda(I_{n-m}+W^TW)} [\phi(\lambda)]^2\, \|\Theta_N^{-1/2}e_N^0\|^2,$$
and the claim is proved after substituting $\|\Theta_B^{-1/2}e_B^0\|^2 = \|e_B^0\|_{\Theta_B^{-1}}^2$ and $\|\Theta_N^{-1/2}e_N^0\|^2 = \|e_N^0\|_{\Theta_N^{-1}}^2$.




    The K-norm of the error $e^j = \phi_j(P^{-1}K)e^0$ is minimized over the eigenvalues of the symmetric positive definite matrices $(I_m + WW^T)$ and $(I_{n-m} + W^TW)$, so the error term behaves similarly to the symmetric positive definite case.
    The Euclidean norm of the residual is minimized over the eigenvalues of the symmetric positive definite matrix $I_m + WW^T$. The following theorem shows that the residual term displays asymptotic convergence similar to that observed when PCG is applied to a positive definite system.

Theorem 3.3.5. The residual of the PCG method which is used to solve the
augmented system (1.7) preconditioned by P satisfies


$$\|r^j\| \leq \min_{\phi\in P_j,\,\phi(0)=1}\;\max_{\lambda\in\Lambda(I_m+WW^T)} |\phi(\lambda)|\, \|r_B^0\|. \tag{3.28}$$


Proof. The residual satisfies


                                     rj = −Kej ,


and the error can be written as


                                ej = φj (P −1 K)e0 .


So we can write the residual as


         rj = −Kφj (P −1 K)e0 = −φj (KP −1 )Ke0 = φj (KP −1 )r0 .
Furthermore,

$$KP^{-1}r^0 = \begin{bmatrix} (I + \Theta_B^{-1}B^{-1}N\Theta_N N^T B^{-T})r_B^0 - \Theta_B^{-1}B^{-1}N\Theta_N r_N^0 + \Theta_B^{-1}B^{-1}r_2^0 \\ r_N^0 \\ r_2^0 \end{bmatrix},$$

where $r^j = [r_B^j, r_N^j, r_2^j]$. The initial residual has the form $r^0 = [r_B^0, 0, 0]$


because of using the starting point (3.19), so the previous equation becomes
                                                                       
$$KP^{-1}r^0 = \begin{bmatrix} \Theta_B^{-1}(\Theta_B + B^{-1}N\Theta_N N^T B^{-T})r_B^0 \\ 0 \\ 0 \end{bmatrix}. \tag{3.29}$$

Let us define C = ΘB + B −1 N ΘN N T B −T . It is easy to prove that C is a
symmetric positive definite matrix. By repeating a similar argument to one
used to derive (3.29) we obtain
                                                                  
$$r^j = \phi_j(KP^{-1})r^0 = \begin{bmatrix} \phi_j(\Theta_B^{-1}C)r_B^0 \\ 0 \\ 0 \end{bmatrix}, \tag{3.30}$$

and so

$$\|r^j\| = \|\phi_j(\Theta_B^{-1}C)r_B^0\|. \tag{3.31}$$

Let us observe that $(\Theta_B^{-1}C)^k = \Theta_B^{-1/2}(\Theta_B^{-1/2}C\Theta_B^{-1/2})^k\Theta_B^{1/2} = \Theta_B^{-1/2}(I_m + WW^T)^k\Theta_B^{1/2}$, where $I_m + WW^T$ is a symmetric positive definite matrix.
    Using these definitions, (3.31) can be written as

$$\|r^j\| = \|\Theta_B^{-1/2}\phi_j(I_m + WW^T)\Theta_B^{1/2}r_B^0\| = \|\phi_j(I_m + WW^T)\Theta_B^{1/2}r_B^0\|_{\Theta_B^{-1}}.$$




Therefore,

$$\|r^j\| \leq \min_{\phi\in P_j,\,\phi(0)=1}\;\max_{\lambda\in\Lambda(I_m+WW^T)} |\phi(\lambda)|\, \|\Theta_B^{1/2}r_B^0\|_{\Theta_B^{-1}},$$




and the claim is proved after substituting $\|\Theta_B^{1/2}r_B^0\|_{\Theta_B^{-1}} = \|r_B^0\|$.
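
    The matrices appearing in the bounds (3.23) and (3.28) are easy to form explicitly for a small example. The sketch below (Python with NumPy, randomly generated data, illustrative only) computes $W = \Theta_B^{-1/2}B^{-1}N\Theta_N^{1/2}$ and the spectra of $I_m + WW^T$ and $I_{n-m} + W^TW$. By the standard Chebyshev-polynomial argument applied to minimax bounds of this form, the condition number $\kappa$ of these matrices yields the familiar contraction factor $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ per iteration; that is a standard consequence, not a claim taken from the theorems above.

    import numpy as np

    rng = np.random.default_rng(4)
    m, n = 4, 7
    A = rng.standard_normal((m, n))
    theta = rng.uniform(0.5, 2.0, n)
    B, N = A[:, :m], A[:, m:]                 # first m columns assumed to form a nonsingular B
    tB, tN = theta[:m], theta[m:]

    W = np.diag(tB ** -0.5) @ np.linalg.solve(B, N) @ np.diag(tN ** 0.5)
    M1 = np.eye(m) + W @ W.T                  # governs the e_B and r_B terms
    M2 = np.eye(n - m) + W.T @ W              # governs the e_N term
    kappa = np.linalg.cond(M1)

    print("eigenvalues of I_m + W W^T     :", np.sort(np.linalg.eigvalsh(M1)))
    print("eigenvalues of I_{n-m} + W^T W :", np.sort(np.linalg.eigvalsh(M2)))
    print("Chebyshev contraction factor   :", (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1))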




3.4       Identifying and factorising the matrix B
The preconditioner P was derived on the assumption that it should be signifi-
cantly cheaper to compute sparse factors of just the matrix B than computing
a Cholesky factorisation of the coefficient matrix of the normal equations.
Assuming that A has full row rank, we can find an m by m non-singular
sub-matrix B.
    The matrix B is given by the first m linearly independent columns of the constraint matrix A, with the columns considered in order of increasing value of $\theta_j^{-1}$. The set of columns forming B is identified
by applying Gaussian elimination to the matrix A, as described below. Al-
though this yields an LU factorisation of B, the factorisation is not efficient
with respect to sparsity and its use in subsequent PCG iterations would be
costly. This potential cost is reduced significantly by using the Tomlin matrix
inversion procedure [69] to determine the factorisation of B for use in PCG
iterations. The Tomlin procedure is a relatively simple method of triangular-
isation and factorisation that underpins the highly efficient implementation
of the revised simplex method described by Hall and McKinnon [41]. Since
the matrix B is analogous to a simplex basis matrix, the use of the Tomlin
procedure in this thesis is expected to be similarly advantageous.


3.4.1      Identifying the columns of B via Gaussian elimination

When applying Gaussian elimination to the matrix A in order to identify
the set of columns forming B, it is important to stress that the matrix A is
not updated when elimination operations are identified. The linear indepen-
dence of a particular column of A, with respect to columns already in B, is
determined as follows.
    Suppose that k columns of B have been determined and let Lk be the
current lower triangular matrix of elimination multipliers. Let aq be the first
column of A that has not yet been considered for inclusion in B. The system
$L_k\hat{a}_q = a_q$ is solved and the entries of the pivotal column $\hat{a}_q$ are scanned for
a good pivotal value. At each step of Gaussian elimination, one needs to divide the entries of the pivotal column by the pivot, so it is necessary to choose a pivot with large magnitude. Usually the pivot is chosen to be the coefficient which has the maximum magnitude among the coefficients of the pivotal column. On the other hand, the choice of the pivot plays an important role in terms of sparsity. So, we consider a pivot to be good if it has an acceptably large magnitude and a relatively small row count.
    If there are no acceptable pivots, indicating that aq is linearly dependent
on the columns already in B, then aq is discarded. Otherwise, a pivot is
chosen and aq is added to the set of columns forming B.
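
    The following sketch (Python with NumPy, dense and purely illustrative; the function name and the pivot tolerance are chosen here for the example and are not taken from HOPDM) mimics this column-selection pass: the columns are scanned in order of decreasing $\theta_j$, the accumulated eliminations play the role of the solve with $L_k$, and a column is accepted only if an acceptably large entry remains in a row that has not yet been made pivotal. The approximate-row-count tie-breaking described later in this section is omitted, and the largest remaining entry is simply taken as the pivot.

    import numpy as np

    def select_basis_columns(A, theta, piv_tol=1e-8):
        # Scan columns in order of decreasing theta_j (increasing theta_j^{-1});
        # accept a column if, after applying the eliminations collected so far,
        # it still has an acceptably large entry in a non-pivotal row.
        m, n = A.shape
        order = np.argsort(-theta)
        basis, etas = [], []                  # etas: (pivot row, multipliers) per accepted column
        pivotal = np.zeros(m, dtype=bool)
        for q in order:
            if len(basis) == m:
                break
            a = A[:, q].astype(float).copy()
            for pr, mult in etas:             # apply previous eliminations (the solve with L_k)
                a -= mult * a[pr]
            cand = np.where(~pivotal, np.abs(a), 0.0)
            p = int(np.argmax(cand))
            if cand[p] < piv_tol:             # linearly dependent on the columns already in B
                continue
            mult = np.where(~pivotal, a / a[p], 0.0)
            mult[p] = 0.0                     # the pivot row itself is not eliminated
            etas.append((p, mult))
            pivotal[p] = True
            basis.append(int(q))
        return basis

    rng = np.random.default_rng(2)
    m, n = 5, 12
    A = rng.standard_normal((m, n))
    theta = rng.uniform(1e-4, 1e4, n)
    cols = select_basis_columns(A, theta)
    print("selected columns:", cols, " rank of B:", np.linalg.matrix_rank(A[:, cols]))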
    At least m systems of the form $L_k\hat{a}_q = a_q$ must be solved in order to
identify all the columns of B. For some problems, a comparable number
of linearly dependent columns of A are encountered before a complete basis
is formed. Thus the efficiency with which $L_k\hat{a}_q = a_q$ is solved is crucial.
Additionally, the ill-conditioning of B may lead to PCG being prohibitively
expensive. This issue of efficiency is addressed in the following two ways.
    Firstly, in order to reduce the number of nonzeros in the matrices Lk , the
pivotal entry in $\hat{a}_q$ is selected from the set of acceptable pivots on grounds of
sparsity. If the matrix A were updated with respect to elimination operations,
then the acceptable pivot of minimum row count could be chosen. Since this
is not known, a set of approximate row counts is maintained and used to
discriminate between acceptable pivots. This set of approximate row counts
is initialised to be correct and then, as elimination operations are identified,
updated according to the maximum fill-in that could occur were A to be
updated. (The row counts are initially the numbers of nonzero entries in each row of A. Then, at each step of Gaussian elimination, the row counts are approximately updated: they are updated when fill-in occurs, but not when cancellations occur. Consequently, the same entry may be counted more than once if it is removed and then created again. In practice, however, there is little advantage in checking for cancellations and keeping a list of cancelled entries.)
    Secondly, since $a_q$ is sparse, consideration is given to the likelihood that $\hat{a}_q$ is also sparse. This is trivially the case when $k = 0$ since $\hat{a}_q = a_q$. Since the columns of $L_k$ are subsets of the entries in pivotal columns, it follows that for small values of $k$, $\hat{a}_q$ will remain sparse. For some important classes of LP problems, this property holds for all $k$ and is analogous to what Hall and McKinnon term hyper-sparsity [41]. Techniques for exploiting hyper-sparsity when forming $\hat{a}_q$ analogous to those described in [41] have been used when
computing the preconditioner and have led to significant improvements in
computational performance.



Tomlin invert


We apply the Tomlin matrix inversion procedure to the matrix B to de-
termine a sparser LU factorisation for B.
    The active sub-matrix at any time in the Tomlin procedure consists of
those rows and columns in which a pivot has not been found. Initially it is
the whole matrix B. The Tomlin procedure has the following steps (an illustrative sketch of the singleton-elimination passes is given after the list):

   1. Find any identity columns of the matrix B and then eliminate these
      columns and their corresponding rows from the active sub-matrix.

   2. Find any singleton row in the active sub-matrix and eliminate it to-
       gether with the corresponding column. Store the column of the singleton
       row in the matrix L. Repeat this step to find all singleton rows in the
      active sub-matrix.

   3. Find any singleton column in the active sub-matrix and eliminate it
      together with the corresponding row from the active sub-matrix. Store
      the singleton column in the matrix U . Repeat this to find all singleton
      columns in the active sub-matrix.

   4. Repeat 2 and 3 until there are no more singleton rows or columns.

   5. If the active sub-matrix is empty then stop. Otherwise, move to the next
      step.

   6. Apply Gaussian elimination to the remaining active sub-matrix.
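
A simplified sketch of the singleton passes (steps 2-5) is given below in Python; it is illustrative only and the function name is chosen for this example. It works symbolically on the nonzero pattern and records the pivot order rather than computing the numerical factors, and step 1 is not treated separately here since an identity column is a special case of a singleton column. Whatever remains active at the end is the bump to be handled by Gaussian elimination in step 6.

    import numpy as np

    def tomlin_triangularise(B):
        # Repeatedly remove singleton rows (pivots stored in L) and singleton
        # columns (pivots stored in U) from the active sub-matrix of B.
        m = B.shape[0]
        nz = (B != 0)
        active_rows, active_cols = set(range(m)), set(range(m))
        row_pivots, col_pivots = [], []       # (row, column) pairs for L and for U
        changed = True
        while changed and active_rows:
            changed = False
            for i in list(active_rows):       # step 2: singleton rows
                cols = [j for j in active_cols if nz[i, j]]
                if len(cols) == 1:
                    row_pivots.append((i, cols[0]))
                    active_rows.discard(i); active_cols.discard(cols[0])
                    changed = True
            for j in list(active_cols):       # step 3: singleton columns
                rows = [i for i in active_rows if nz[i, j]]
                if len(rows) == 1:
                    col_pivots.append((rows[0], j))
                    active_rows.discard(rows[0]); active_cols.discard(j)
                    changed = True
        return row_pivots, col_pivots, sorted(active_rows), sorted(active_cols)

    Bmat = np.array([[2., 1., 0.],
                     [0., 3., 0.],
                     [4., 5., 6.]])
    print(tomlin_triangularise(Bmat))         # fully triangularised: empty bump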
Chapter 4

Inexact Interior Point Method

The consequence of using an iterative method to solve the linear system
which arises in IPMs is that the KKT system is solved only approximately.
In this case, the Newton method (1.4) is solved approximately, so instead
of (1.4) we have the following system.




$$F'(t^k)\,\Delta t^k = -F(t^k) + r^k, \tag{4.1}$$


where $r^k$ is the residual of the inexact Newton method. An approximate
step is accepted provided that the residual $r^k$ is small enough, namely

$$\|r^k\| \leq \eta_k \|F(t^k)\|, \tag{4.2}$$


as required by the theory [20, 47]. We refer to the term ηk as the forcing
term.
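
    A generic illustration of condition (4.2) is sketched below in Python. The nonlinear map F, its Jacobian, the forcing term value and the inner solver (a simple steepest-descent iteration on the least-squares residual) are hypothetical choices made only for this example and are unrelated to the IPM Newton system; the point is merely that the inner solve is stopped as soon as $\|r^k\| \leq \eta_k\|F(t^k)\|$.

    import numpy as np

    def inexact_newton(F, J, t, eta=0.1, tol=1e-8, maxit=50):
        # Inexact Newton: solve J(t) dt = -F(t) only until the linear residual
        # r = J(t) dt + F(t) satisfies ||r|| <= eta * ||F(t)||   (condition (4.2)).
        for _ in range(maxit):
            Ft = F(t)
            if np.linalg.norm(Ft) < tol:
                break
            Jt = J(t)
            dt = np.zeros_like(t)
            r = Ft.copy()                         # residual of J dt = -F with dt = 0
            while np.linalg.norm(r) > eta * np.linalg.norm(Ft):
                grad = Jt.T @ r                   # steepest-descent step on 0.5*||J dt + F||^2
                dt -= (grad @ grad) / np.linalg.norm(Jt @ grad) ** 2 * grad
                r = Jt @ dt + Ft
            t = t + dt
        return t

    # hypothetical toy map: F(t) = t + 0.1 t^3 - b, with Jacobian I + 0.3 diag(t^2)
    b = np.array([1.0, 2.0, 3.0])
    F = lambda t: t + 0.1 * t ** 3 - b
    J = lambda t: np.eye(3) + np.diag(0.3 * t ** 2)
    t_star = inexact_newton(F, J, np.zeros(3))
    print("||F(t*)|| =", np.linalg.norm(F(t_star)))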
   The original content of this chapter has already appeared in [1], co-
authored with Jacek Gondzio.
   The idea behind inexact interior point algorithms is to derive a stopping

criterion for the iterative linear system solvers that minimizes the computational effort involved in computing the search directions and guarantees global convergence [5].
    We use the PCG method to solve the augmented system (1.7) precondi-
tioned by a block triangular matrix P (3.7). As a result of this the search di-
rections are computed approximately. That makes it necessary to rethink the
convergence of the interior point algorithms, whose convergence is proved under the assumption that the search directions are calculated exactly. In this chapter we focus on one interior point algorithm, the infeasible path-following algorithm. In order to prove the convergence of the inexact infeasible path-following algorithm (IIPF algorithm), we first prove the convergence of the PCG method applied to the indefinite system (1.7) and then prove the convergence of the IIPF algorithm.
    In the previous chapter we proved that the PCG method applied to the
indefinite system (1.7) preconditioned with (3.7) and initialized with an ap-
propriate starting point (3.19), converges in a similar way to the case of
applying PCG to a positive definite system. In this chapter we show that
applying PCG to solve (1.7) with the preconditioner (3.7) can be analysed
using the classical framework of the inexact Newton method (4.1).
    The use of inexact Newton methods in interior point methods for LP was
investigated in [5, 6, 16, 29, 58, 59]. In [5] the convergence of the infeasible
interior point algorithm of Kojima, Megiddo, and Mizuno is proved under the
assumption that the iterates are bounded. Monteiro and O’Neal [59] propose
the convergence analysis of inexact infeasible long-step primal-dual algorithm
and give complexity results for this method. In [59] the PCG method is used
to solve the normal equations preconditioned with a sparse preconditioner.
The proposed preconditioner was inspired by the Maximum Weight Basis
Algorithm developed in [64]. In [7] an inexact interior point method for
semidefinite programming is presented. It allows the linear system to be
solved to a low accuracy when the current iterate is far from the solution. In
[50] the convergence analysis of inexact infeasible primal-dual path-following
algorithm for convex quadratic programming is presented. In these papers
the search directions are inexact as the PCG method is used to solve the
normal equations. Korzak [49] proves the convergence of the inexact infea-
sible interior point algorithm of Kojima, Megiddo and Mizuno for LP. This
is for search directions which are computed approximately for any iterative
solver. This convergence is proven under the assumption that the iterates are
bounded. Furthermore, in [82] Zhou and Toh show that the primal-dual inexact infeasible interior point algorithm can find an ε-approximate solution of a semidefinite program in O(n² ln(1/ε)) iterations. This also holds for search directions which are computed approximately by any iterative solver, without the need to assume the boundedness of the iterates, because the residuals are required to satisfy specific conditions. One of these conditions depends on the smallest singular value of the constraint matrix.
    In order to provide the complexity result for the inexact infeasible interior
point methods, one should find an upper bound on |∆xT ∆s| at each iteration
of IPM. In [50] the authors change the neighbourhood of the interior point
algorithm for QP. The same approach is used to find a bound on |∆xT ∆s|
in [59]. However, that does not work in the LP case. The authors assume that there is a point $(\bar{x}, \bar{y}, \bar{s})$ such that the residual of the infeasible primal-dual algorithm is zero (the point $(\bar{x}, \bar{y}, \bar{s})$ is primal-dual feasible) and there is a strictly positive point $(x^0, y^0, s^0)$ such that $(x^k, y^k, s^k) = \rho(x^0, y^0, s^0)$, where $\rho \in [0, 1]$, and also $(x^0, s^0) \geq (\bar{x}, \bar{s})$. These conditions are restrictive and do not always hold. In [6, 7] the inexactness comes from solving the
normal equation system iteratively. In order to find a bound on $|\Delta x^T \Delta s|$, the authors find a bound on the normal equations matrix. However, in [82] the authors force the residual to satisfy specific conditions, one of which depends on the singular value of the constraint matrix.
    In our case we do not require the residual of the inexact Newton method
to satisfy a sophisticated condition. The condition on the residual is defined by $\|r^k\| \leq \eta_k \mu_k$. This condition allows a low accuracy when the current iterate is far from the solution and a high accuracy as the interior point method approaches optimality, because the term $\mu_k$ decreases as the iterations move toward the solution. Furthermore, we use a residual-shifting strategy, which makes the proof of the convergence and the complexity result of the inexact infeasible path-following algorithm follow the exact case.
    In this chapter we study the convergence analysis of the inexact infeasible path-following algorithm for linear programming when the PCG method is used to solve the augmented system preconditioned with the block triangular sparse preconditioner. We prove the global convergence and the complexity result for this method without having to assume the boundedness of the iterates. We design a suitable stopping criterion for the PCG method. This plays an important role in the overall convergence of the IIPF algorithm. This stopping criterion allows a low accuracy when the current iterate is far from the solution. We state conditions on the forcing term of the inexact Newton method in order to prove the convergence of the IIPF algorithm.
    The inexact approach in this thesis can be used in the cases where the
augmented system is solved iteratively, provided that the residual of this
iterative method has a zero block r = [r1 , 0]. So we can carry out this
approach to cases like [65] for example.
4.1        The residual of the inexact Newton method
Using the PCG method to solve the augmented system (1.7) produces a
specific value of the residual of the inexact Newton method (4.1). So we shall
find the value of the residual r in (4.1) in order to prove the convergence of
inexact infeasible path following algorithm and provide a complexity result.
    Solving (1.7) approximately gives
\[
\begin{bmatrix} -\Theta^{-1} & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix}
+
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix},
\qquad (4.3)
\]
where r_1 = [r_B, r_N].
    That gives the following equations:

    −X^{-1}SΔx + A^TΔy = f + r_1 = c − A^Ty − σµX^{-1}e + r_1,        (4.4)

    AΔx = g + r_2 = b − Ax + r_2.        (4.5)


    Then we find Δs by substituting Δx in (1.6). However, we can shift the
residual from (4.4) to (1.6) by assuming there is a residual h while computing
Δs. Then (1.6) is replaced by

    Δs = −X^{-1}SΔx − s + σµX^{-1}e + h,

which we can rewrite as

    −X^{-1}SΔx = Δs + s − σµX^{-1}e − h.
Substituting it in (4.4) gives

    A^TΔy + Δs = c − A^Ty − s + h + r_1.

To satisfy the second equation of (1.5) we choose h = −r_1. This gives

    A^TΔy + Δs = c − A^Ty − s,        (4.6)

and

    Δs = −X^{-1}SΔx − s + σµX^{-1}e − r_1,

which implies

    SΔx + XΔs = −XSe + σµe − Xr_1.        (4.7)

Equations (4.5), (4.6) and (4.7) give
\[
\begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S & 0 & X \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \\ \Delta s \end{bmatrix}
=
\begin{bmatrix} \xi_p \\ \xi_d \\ \xi_\mu \end{bmatrix}
+
\begin{bmatrix} r_2 \\ 0 \\ -Xr_1 \end{bmatrix},
\]
where ξ_p = b − Ax, ξ_d = c − A^Ty − s, ξ_µ = −XSe + σµe and σ ∈ [0, 1].
    In the setting in which we apply the PCG method to solve (1.7) precon-
ditioned with (3.7) we have r2 = 0 and r1 = [rB , 0], see equation (3.30) in
the proof of Theorem 3.3.5. Therefore, the inexact Newton method residual
r is
\[
r = \begin{bmatrix} 0 \\ 0 \\ -Xr_1 \end{bmatrix},
\qquad \text{with} \quad
Xr_1 = \begin{bmatrix} X_B r_B \\ X_N r_N \end{bmatrix}
     = \begin{bmatrix} X_B r_B \\ 0 \end{bmatrix}.
\]
    Shifting the residual from (4.4) to (1.6) is an essential step in proving the
convergence of the IIPF algorithm. It moves the residual from the second row
to the last row of the inexact Newton system, which makes the convergence
proof much easier, as we will see in Section 4.2.
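To make the effect of this shift concrete, the following small sketch (Python
with NumPy; all variable names are illustrative, not taken from HOPDM) builds
an arbitrary inexact direction (Δx, Δy), recovers Δs with the shifted residual
h = −r_1, and checks that the dual equation (4.6) then holds exactly while the
residual −Xr_1 appears only in the complementarity equation (4.7).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
x, s = rng.random(n) + 0.5, rng.random(n) + 0.5
y = rng.standard_normal(m)
sigma, mu = 0.1, (x @ s) / n

# Right-hand side of the first block row of the augmented system (1.7):
# f = c - A^T y - sigma*mu*X^{-1} e
f = c - A.T @ y - sigma * mu / x

# An arbitrary "inexact" direction (dx, dy); its residual in the first
# block row of (4.3) is r1 = (-X^{-1} S dx + A^T dy) - f.
dx, dy = rng.standard_normal(n), rng.standard_normal(m)
r1 = (-(s / x) * dx + A.T @ dy) - f

# Recover ds with the shifted residual h = -r1:
# ds = -X^{-1} S dx - s + sigma*mu*X^{-1} e - r1
ds = -(s / x) * dx - s + sigma * mu / x - r1

# The dual equation (4.6) holds exactly: A^T dy + ds = c - A^T y - s ...
assert np.allclose(A.T @ dy + ds, c - A.T @ y - s)
# ... while the shifted residual shows up only in the complementarity
# equation (4.7): S dx + X ds = -XSe + sigma*mu*e - X r1.
assert np.allclose(s * dx + x * ds, -x * s + sigma * mu - x * r1)
```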
    The issue of choosing the stopping criterion of the inexact Newton method
to satisfy the condition (4.2) has been discussed in many papers; see for
example [5, 6, 7, 49, 82]. In [5, 6] the residual of the inexact Newton method is
chosen such that

    ‖r^k‖ ≤ η_k µ_k,

while in [7] the choice satisfies

    ‖r^k‖ ≤ η_k (nµ_k).
       Let the residual be r = [rp , rd , rµ ]. According to Korzak [49], the residual
is chosen such that

    ‖r_p^k‖_2 ≤ (1 − τ_1)‖Ax^k − b‖_2,
    ‖r_d^k‖_2 ≤ (1 − τ_2)‖A^Ty^k + s^k − c‖_2,
    ‖r_µ^k‖_∞ ≤ τ_3 µ_k,

where τ_1, τ_2 ∈ (0, 1] and τ_3 ∈ [0, 1) are some appropriately chosen constants.
    In our case r_p = r_d = 0, and we stop the PCG algorithm when

    ‖r_µ^k‖_∞ ≤ η_k µ_k.

As r_µ^k = −X^k r_1^k and r_1 = [r_B, 0], the stopping criterion becomes

    ‖X_B^k r_B^k‖_∞ ≤ η_k µ_k.        (4.8)

    We terminate the PCG algorithm when the stopping criterion (4.8) is
satisfied. This criterion allows a low accuracy when the current iterate is far
from the solution. In later iterations the accuracy increases because the
average complementarity gap µ reduces from one iteration to another.
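A minimal sketch of the test (4.8) as it might be used to terminate a PCG loop
is given below (Python with NumPy; the function name and the example data are
illustrative only).

```python
import numpy as np

def inexact_newton_accuracy_reached(x_B, r_B, eta_k, mu_k):
    """Stopping test (4.8) for the PCG solver: terminate once
    ||X_B^k r_B^k||_inf <= eta_k * mu_k.  The tolerance is proportional
    to mu_k, so it is loose far from optimality and tightens as the
    interior point method converges."""
    return np.max(np.abs(x_B * r_B)) <= eta_k * mu_k

# Illustrative data: the same residual passes the test when mu_k is
# large (early IPM iterations) and fails it when mu_k is small.
x_B = np.array([0.5, 2.0, 1.0])
r_B = np.array([1e-3, -5e-4, 2e-4])
print(inexact_newton_accuracy_reached(x_B, r_B, eta_k=0.3, mu_k=1e-1))  # True
print(inexact_newton_accuracy_reached(x_B, r_B, eta_k=0.3, mu_k=1e-6))  # False
```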



4.2        Convergence of the IIPF Algorithm
In this section we carry out the proof of the convergence of the IIPF algorithm
and derive a complexity result. In the previous section we used the shifting
residual strategy, which makes the proof of the convergence of this inexact
algorithm similar to that of the exact case.
    This section is organised as follows. First we describe the IIPF algorithm.
Then in Lemmas 4.2.1, 4.2.2 and 4.2.3 we derive useful bounds on the iterates.
In Theorems 4.2.4 and 4.2.5 we prove that there is a step length α such that
the new iterate generated by the IIPF algorithm belongs to the neighbourhood
N_{−∞}(γ, β) and the average complementarity gap decreases. In order to prove
that, we supply conditions on the forcing term η_k. In Theorem 4.2.6 we show
that the sequence {µ_k} converges Q-linearly to zero and the residual norm
sequence {‖(ξ_p^k, ξ_d^k)‖} converges R-linearly to zero. Finally, in Theorem 4.2.7
we provide the complexity result for this algorithm.
    Definition: The central path neighbourhood N_{−∞}(γ, β) is defined by

    N_{−∞}(γ, β) = {(x, y, s) : ‖(ξ_p, ξ_d)‖/µ ≤ β‖(ξ_p^0, ξ_d^0)‖/µ_0, (x, s) > 0,
                    x_is_i ≥ γµ, i = 1, 2, ..., n},        (4.9)

where γ ∈ (0, 1) and β ≥ 1 [77].
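For concreteness, membership in N_{−∞}(γ, β) can be tested as in the sketch
below (Python with NumPy; the function name and calling convention are
illustrative).

```python
import numpy as np

def in_neighbourhood(x, s, xi_p, xi_d, xi_p0, xi_d0, mu0, gamma, beta):
    """Test (x, y, s) in N_{-inf}(gamma, beta) as defined in (4.9):
    ||(xi_p, xi_d)|| / mu <= beta * ||(xi_p^0, xi_d^0)|| / mu0,
    (x, s) > 0, and x_i s_i >= gamma * mu for every i."""
    mu = x @ s / len(x)
    res = np.linalg.norm(np.concatenate([xi_p, xi_d]))
    res0 = np.linalg.norm(np.concatenate([xi_p0, xi_d0]))
    return bool(np.all(x > 0) and np.all(s > 0)
                and res / mu <= beta * res0 / mu0
                and np.all(x * s >= gamma * mu))
```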


4.2.1        Inexact Infeasible Path-Following Algorithm

   1. Given γ, β, σ_min, σ_max with γ ∈ (0, 1), β ≥ 1, 0 < σ_min < σ_max < 0.5,
      and 0 < η_min < η_max < 1; choose (x^0, y^0, s^0) with (x^0, s^0) > 0;

   2. For k = 0, 1, 2, ...

        • choose σ_k ∈ [σ_min, σ_max] and η_k ∈ [η_min, η_max] such that
          η_k < σ_k(1 − γ)/(1 + γ) and η_k + σ_k < 0.99; and solve
\[
\begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix}
\begin{bmatrix} \Delta x^k \\ \Delta y^k \\ \Delta s^k \end{bmatrix}
=
\begin{bmatrix} \xi_p^k \\ \xi_d^k \\ \sigma_k\mu_k e - X^k S^k e \end{bmatrix}
-
\begin{bmatrix} 0 \\ 0 \\ X^k r_1^k \end{bmatrix},
\qquad (4.10)
\]
          such that r_N^k = 0 and

          ‖X_B^k r_B^k‖_∞ ≤ η_k µ_k,        (4.11)

        • choose α_k as the largest value of α in [0, 1] such that

          (x^k(α), y^k(α), s^k(α)) ∈ N_{−∞}(γ, β)        (4.12)

          and the following Armijo condition holds:

          µ_k(α) ≤ (1 − .01α)µ_k;        (4.13)

        • set (x^{k+1}, y^{k+1}, s^{k+1}) = (x^k(α_k), y^k(α_k), s^k(α_k));

        • stop when µ_k < ε, for a small positive constant ε.
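The structure of the algorithm can be summarised by the sketch below (Python
with NumPy). The callbacks `solve_inexact_newton` and `in_neighbourhood` are
illustrative placeholders supplied by the caller, and the backtracking grid is
a simple stand-in for choosing the largest admissible α; this is not the HOPDM
implementation.

```python
import numpy as np

def iipf(x, y, s, A, b, c, gamma, beta, sigma, eta,
         solve_inexact_newton, in_neighbourhood, eps=1e-8):
    """Sketch of the IIPF outer loop.  `solve_inexact_newton` must return
    a direction satisfying (4.10) with r_N = 0 and ||X_B r_B||_inf <=
    eta*mu (4.11); `in_neighbourhood` tests membership in the
    neighbourhood N_{-inf}(gamma, beta) of (4.9)."""
    n = len(x)
    mu = mu0 = x @ s / n
    xi_p0, xi_d0 = b - A @ x, c - A.T @ y - s
    while mu >= eps:
        dx, dy, ds = solve_inexact_newton(A, b, c, x, y, s, sigma, eta)
        # largest admissible step length, approximated by backtracking from 1.0
        for alpha in (0.9 ** j for j in range(300)):
            xa, ya, sa = x + alpha * dx, y + alpha * dy, s + alpha * ds
            mua = xa @ sa / n
            if (in_neighbourhood(xa, sa, b - A @ xa, c - A.T @ ya - sa,
                                 xi_p0, xi_d0, mu0, gamma, beta)
                    and mua <= (1 - 0.01 * alpha) * mu):   # Armijo condition (4.13)
                x, y, s, mu = xa, ya, sa, mua
                break
        else:
            raise RuntimeError("no admissible step length found")
    return x, y, s
```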

    In this section we follow the convergence analysis of the infeasible
path-following algorithm proposed originally by Zhang [81], using the proof
techniques of Wright's book [77].
    Firstly, let us introduce the quantity

    ν_k = ∏_{j=0}^{k−1} (1 − α_j),    ν_0 = 1.


    Note that ξ_p^{k+1} = b − Ax^{k+1} = b − A(x^k + α_kΔx^k) = b − Ax^k − α_kAΔx^k
= ξ_p^k − α_kAΔx^k; from the first row of (4.10) we get

    ξ_p^{k+1} = (1 − α_k)ξ_p^k,        (4.14)
which implies

    ξ_p^k = ν_k ξ_p^0.

Note also that ξ_d^{k+1} = c − A^Ty^{k+1} − s^{k+1} = c − A^T(y^k + α_kΔy^k) − (s^k + α_kΔs^k)
= (c − A^Ty^k − s^k) − α_k(A^TΔy^k + Δs^k) = ξ_d^k − α_k(A^TΔy^k + Δs^k). From the
second row of (4.10) we get

    ξ_d^{k+1} = (1 − α_k)ξ_d^k,        (4.15)

which implies

    ξ_d^k = ν_k ξ_d^0.

Consequently, the quantity ν_k satisfies

    ν_k ≤ β µ_k/µ_0.

More details can be found in [77].
    Let (x∗ , y ∗ , s∗ ) be any primal-dual solution.

Lemma 4.2.1. Assume that (xk , y k , sk ) ∈ N−∞ (γ, β), (∆xk , ∆y k , ∆sk ) sat-
isfies (4.10) and (4.11) for all k ≥ 0, and µk ≤ (1 − .01αk−1 )µk−1 for all
k ≥ 1. Then there is a positive constant C1 such that for all k ≥ 0


    ν_k ‖(x^k, s^k)‖ ≤ C_1 µ_k,        (4.16)

where C_1 is given as

    C_1 = ζ^{-1}(nβ + n + β ‖(x^0, s^0)‖_∞ ‖(x^*, s^*)‖_1 /µ_0),
where

    ζ = min_{i=1,...,n} min(x_i^0, s_i^0).



    The proof of this Lemma is similar to the proof of Lemma 6.3 in [77].
Moreover, we follow the same logic as in [77] to prove the following lemma.

Lemma 4.2.2. Assume that (xk , y k , sk ) ∈ N−∞ (γ, β), (∆xk , ∆y k , ∆sk ) sat-
isfies (4.10) and (4.11) for all k ≥ 0, and µk ≤ (1 − .01αk−1 )µk−1 for all
k ≥ 1. Then there is a positive constant C2 such that

    ‖D^{-1}Δx^k‖ ≤ C_2 µ_k^{1/2},        (4.17)

    ‖DΔs^k‖ ≤ C_2 µ_k^{1/2},        (4.18)

for all k ≥ 0, where D = X^{1/2}S^{-1/2}.

Proof. For simplicity we omit the iteration index k in the proof.
    Let

    (x̄, ȳ, s̄) = (Δx, Δy, Δs) + ν_k(x^0, y^0, s^0) − ν_k(x^*, y^*, s^*).

Then Ax̄ = 0 and A^Tȳ + s̄ = 0, which implies x̄^Ts̄ = 0.
    Ax̄ = 0 because

    Ax̄ = AΔx + ν_kAx^0 − ν_kAx^* = ξ_p + ν_kAx^0 − ν_kb = ξ_p − ν_kξ_p^0 = 0.

Similarly one can show that A^Tȳ + s̄ = 0. Hence

    0 = x̄^Ts̄ = (Δx + ν_kx^0 − ν_kx^*)^T(Δs + ν_ks^0 − ν_ks^*).        (4.19)
Using the last row of (4.10) implies

    S(Δx + ν_kx^0 − ν_kx^*) + X(Δs + ν_ks^0 − ν_ks^*)
      = SΔx + XΔs + ν_kS(x^0 − x^*) + ν_kX(s^0 − s^*)
      = −XSe + σµe − Xr_1 + ν_kS(x^0 − x^*) + ν_kX(s^0 − s^*).

By multiplying this system by (XS)^{-1/2}, we get

    D^{-1}(Δx + ν_kx^0 − ν_kx^*) + D(Δs + ν_ks^0 − ν_ks^*)
      = (XS)^{-1/2}(−XSe + σµe − Xr_1) + ν_kD^{-1}(x^0 − x^*) + ν_kD(s^0 − s^*).

The equality (4.19) gives

    ‖D^{-1}(Δx + ν_kx^0 − ν_kx^*) + D(Δs + ν_ks^0 − ν_ks^*)‖^2
      = ‖D^{-1}(Δx + ν_kx^0 − ν_kx^*)‖^2 + ‖D(Δs + ν_ks^0 − ν_ks^*)‖^2.

Consequently,

    ‖D^{-1}(Δx + ν_kx^0 − ν_kx^*)‖^2 + ‖D(Δs + ν_ks^0 − ν_ks^*)‖^2
      = ‖(XS)^{-1/2}(−XSe + σµe − Xr_1) + ν_kD^{-1}(x^0 − x^*) + ν_kD(s^0 − s^*)‖^2,        (4.20)

which leads to

    ‖D^{-1}(Δx + ν_kx^0 − ν_kx^*)‖
      ≤ ‖(XS)^{-1/2}(−XSe + σµe − Xr_1) + ν_kD^{-1}(x^0 − x^*) + ν_kD(s^0 − s^*)‖
      ≤ ‖(XS)^{-1/2}(−XSe + σµe − Xr_1)‖ + ν_k‖D^{-1}(x^0 − x^*)‖ + ν_k‖D(s^0 − s^*)‖.

The triangle inequality and addition of an extra term ν_k‖D(s^0 − s^*)‖ to the
right hand side give

    ‖D^{-1}Δx‖ ≤ ‖(XS)^{-1/2}[−XSe + σµe − Xr_1]‖ + 2ν_k‖D^{-1}(x^0 − x^*)‖
                 + 2ν_k‖D(s^0 − s^*)‖.        (4.21)

(4.20) also leads to

    ‖D(Δs + ν_ks^0 − ν_ks^*)‖
      ≤ ‖(XS)^{-1/2}(−XSe + σµe − Xr_1) + ν_kD^{-1}(x^0 − x^*) + ν_kD(s^0 − s^*)‖
      ≤ ‖(XS)^{-1/2}(−XSe + σµe − Xr_1)‖ + ν_k‖D^{-1}(x^0 − x^*)‖ + ν_k‖D(s^0 − s^*)‖.

The triangle inequality and addition of an extra term ν_k‖D^{-1}(x^0 − x^*)‖ to
the right hand side give

    ‖DΔs‖ ≤ ‖(XS)^{-1/2}[−XSe + σµe − Xr_1]‖ + 2ν_k‖D^{-1}(x^0 − x^*)‖
            + 2ν_k‖D(s^0 − s^*)‖.        (4.22)

    We can write

    ‖(XS)^{-1/2}(−XSe + σµe − Xr_1)‖^2 = Σ_{i=1}^{n} (−x_is_i + σµ − x_ir_{1,i})^2 / (x_is_i)
      ≤ ‖−XSe + σµe − Xr_1‖^2 / min_i x_is_i ≤ (1/(γµ)) ‖−XSe + σµe − Xr_1‖^2,

because (x, y, s) ∈ N_{−∞}(γ, β), which implies x_is_i ≥ γµ for i = 1, ..., n.
    On the other hand,

    ‖−XSe + σµe‖^2 = ‖XSe‖^2 + ‖σµe‖^2 − 2σµe^TXSe = ‖XSe‖^2 + nσ^2µ^2 − 2nσµ^2
      ≤ ‖XSe‖_1^2 + nσ^2µ^2 − 2nσµ^2 = (x^Ts)^2 + nσ^2µ^2 − 2nσµ^2
      ≤ n^2µ^2 + nσ^2µ^2 − 2nσµ^2 ≤ n^2µ^2,
as σ ∈ (0, 1). This leads to

    ‖−XSe + σµe − Xr_1‖ ≤ ‖−XSe + σµe‖ + ‖Xr_1‖ ≤ nµ + √n ‖X_Br_B‖_∞
      ≤ nµ + √n ηµ ≤ nµ + √n η_max µ,

which implies the following

    ‖(XS)^{-1/2}(−XSe + σµe − Xr_1)‖ ≤ γ^{-1/2}(n + √n η_max)µ^{1/2}.        (4.23)


On the other hand,

    ν_k‖D^{-1}(x^0 − x^*)‖ + ν_k‖D(s^0 − s^*)‖
      ≤ ν_k(‖D^{-1}‖ + ‖D‖) max(‖x^0 − x^*‖, ‖s^0 − s^*‖).        (4.24)

For the matrix norm ‖D^{-1}‖, we have

    ‖D^{-1}‖ ≤ max_i D_{ii}^{-1} = ‖D^{-1}e‖_∞ = ‖(XS)^{-1/2}Se‖_∞ ≤ ‖(XS)^{-1/2}‖ ‖s‖_1,

and similarly

    ‖D‖ ≤ ‖(XS)^{-1/2}‖ ‖x‖_1.

Using Lemma 4.2.1 and (4.24) we get

    ν_k‖D^{-1}(x^0 − x^*)‖ + ν_k‖D(s^0 − s^*)‖
      ≤ ν_k‖(x, s)‖_1 ‖(XS)^{-1/2}‖ max(‖x^0 − x^*‖, ‖s^0 − s^*‖)
      ≤ C_1γ^{-1/2}µ^{1/2} max(‖x^0 − x^*‖, ‖s^0 − s^*‖).

    By substituting the previous inequality and (4.23) in (4.21) and (4.22)
we get

    ‖D^{-1}Δx‖ ≤ (γ^{-1/2}(n + √n η_max) + 2C_1γ^{-1/2} max(‖x^0 − x^*‖, ‖s^0 − s^*‖))µ^{1/2}

and

    ‖DΔs‖ ≤ (γ^{-1/2}(n + √n η_max) + 2C_1γ^{-1/2} max(‖x^0 − x^*‖, ‖s^0 − s^*‖))µ^{1/2}.

Let us define C_2 as

    C_2 = γ^{-1/2}(n + √n η_max) + 2C_1γ^{-1/2} max(‖x^0 − x^*‖, ‖s^0 − s^*‖),

which completes the proof.

Lemma 4.2.3. Assume that (xk , y k , sk ) ∈ N−∞ (γ, β), (∆xk , ∆y k , ∆sk ) sat-
isfies (4.10) and (4.11) for all k ≥ 0, and µk ≤ (1 − .01αk−1 )µk−1 for all
k ≥ 1. Then there is a positive constant C3 such that


    |(Δx^k)^TΔs^k| ≤ C_3µ_k,        (4.25)

    |Δx_i^kΔs_i^k| ≤ C_3µ_k,        (4.26)

for all k ≥ 0.

Proof. For simplicity we omit the iteration index k in the proof. From Lemma
4.2.2 we have


    |Δx^TΔs| = |(D^{-1}Δx)^T(DΔs)| ≤ ‖D^{-1}Δx‖ ‖DΔs‖ ≤ C_2^2 µ.
Moreover, using Lemma 4.2.2 again we obtain

    |Δx_iΔs_i| = |D_{ii}^{-1}Δx_i D_{ii}Δs_i| = |D_{ii}^{-1}Δx_i| |D_{ii}Δs_i|
               ≤ ‖D^{-1}Δx‖ ‖DΔs‖ ≤ C_2^2 µ.

Let us denote C_3 = C_2^2, and the proof is complete.

Theorem 4.2.4. Assume that (x^k, y^k, s^k) ∈ N_{−∞}(γ, β), (Δx^k, Δy^k, Δs^k)
satisfies (4.10) and (4.11) for all k ≥ 0, and µ_k ≤ (1 − .01α_{k−1})µ_{k−1} for
all k ≥ 1. Then there is a value ᾱ ∈ (0, 1) such that the following three
conditions are satisfied for all α ∈ [0, ᾱ] and all k ≥ 0:

    (x^k + αΔx^k)^T(s^k + αΔs^k) ≥ (1 − α)(x^k)^Ts^k,        (4.27)

    (x_i^k + αΔx_i^k)(s_i^k + αΔs_i^k) ≥ (γ/n)(x^k + αΔx^k)^T(s^k + αΔs^k),        (4.28)

    (x^k + αΔx^k)^T(s^k + αΔs^k) ≤ (1 − .01α)(x^k)^Ts^k.        (4.29)


Proof. For simplicity we omit the iteration index k in the proof.
    The last row of the system (4.10) implies


    s^TΔx + x^TΔs = −x^Ts + nσµ − x_B^Tr_B,

and

    s_iΔx_i + x_iΔs_i = −x_is_i + σµ − x_ir_{1,i},
which leads to

    (x + αΔx)^T(s + αΔs) = x^Ts + α(x^TΔs + s^TΔx) + α^2(Δx)^TΔs
      = x^Ts + α(−x^Ts + nσµ − x_B^Tr_B) + α^2(Δx)^TΔs
      = (1 − α)x^Ts + nασµ − αx_B^Tr_B + α^2(Δx)^TΔs.

Similarly,

    (x_i + αΔx_i)(s_i + αΔs_i) = x_is_i + α(s_iΔx_i + x_iΔs_i) + α^2Δx_iΔs_i
      = x_is_i + α(−x_is_i + σµ − x_ir_{1,i}) + α^2Δx_iΔs_i
      = (1 − α)x_is_i + ασµ − αx_ir_{1,i} + α^2Δx_iΔs_i.

For (4.27) we have

    (x + αΔx)^T(s + αΔs) − (1 − α)x^Ts
      = (1 − α)x^Ts + nασµ − αx_B^Tr_B + α^2(Δx)^TΔs − (1 − α)x^Ts
      = nασµ − αx_B^Tr_B + α^2(Δx)^TΔs
      ≥ nασµ − α|x_B^Tr_B| − α^2|(Δx)^TΔs|
      ≥ nασµ − nαηµ − α^2C_3µ,

where we used the fact that from (4.11) we have

    |x_B^Tr_B| ≤ n‖X_Br_B‖_∞ ≤ nηµ.

Therefore, the condition (4.27) holds for all α ∈ [0, α_1], where α_1 is given
by

    α_1 = n(σ − η)/C_3,        (4.30)

and we choose η < σ − ε_1 to guarantee that α_1 is strictly positive, where ε_1 is
a constant strictly greater than zero.
    Let us consider (4.28):

    (x_i + αΔx_i)(s_i + αΔs_i) − (γ/n)(x + αΔx)^T(s + αΔs)
      = (1 − α)x_is_i + ασµ − αx_ir_{1,i} + α^2Δx_iΔs_i
        − (γ/n)((1 − α)x^Ts + nασµ − αx_B^Tr_B + α^2(Δx)^TΔs).

Because (x, y, s) ∈ N_{−∞}(γ, β), so x_is_i ≥ γµ for all i = 1, ..., n, this gives

    (x_i + αΔx_i)(s_i + αΔs_i) − (γ/n)(x + αΔx)^T(s + αΔs)
      ≥ (1 − α)γµ + ασµ − α max_i x_ir_{1,i} − α^2|Δx_iΔs_i|
        − γ(1 − α)µ − γασµ + (γ/n)αx_B^Tr_B − (γ/n)α^2(Δx)^TΔs
      ≥ ασµ − α‖X_Br_B‖_∞ − α^2C_3µ − ασγµ − (γ/n)α|x_B^Tr_B| − (γ/n)α^2C_3µ
      ≥ ασµ − αηµ − α^2C_3µ − ασγµ − γαηµ − (γ/n)α^2C_3µ
      ≥ α((1 − γ)σ − η(1 + γ))µ − 2α^2C_3µ.

Condition (4.28) holds for all α ∈ [0, α_2], where α_2 is given by

    α_2 = (σ(1 − γ) − (1 + γ)η)/(2C_3).        (4.31)

We choose η < σ(1 − γ)/(1 + γ) − ε_2 to guarantee that α_2 is strictly positive,
where ε_2 is a constant strictly greater than zero.
    Finally, let us consider condition (4.29):

    (1/n)[(x + αΔx)^T(s + αΔs) − (1 − .01α)x^Ts]
      = (1/n)[(1 − α)x^Ts + nασµ − αx_B^Tr_B + α^2(Δx)^TΔs − (1 − .01α)x^Ts]
      = (1/n)[−.99αx^Ts + nασµ − αx_B^Tr_B + α^2(Δx)^TΔs]
      ≤ −.99αµ + ασµ + (α/n)|x_B^Tr_B| + (α^2/n)C_3µ
      ≤ −.99αµ + ασµ + αηµ + (α^2/n)C_3µ.

We can conclude that condition (4.29) holds for all α ∈ [0, α_3], where α_3 is
given by

    α_3 = n(0.99 − σ − η)/C_3.        (4.32)
                                                 C3
We choose η and σ such that η + σ < 0.99 − ε3 to guarantee α3 to be strictly
positive, where ε3 is a constant strictly greater than zero.
    Combining the bounds (4.30), (4.31) and (4.32), we conclude that conditions
(4.27), (4.28) and (4.29) hold for α ∈ [0, ᾱ], where

    ᾱ = min{ 1, n(σ − η)/C_3, (σ(1 − γ) − (1 + γ)η)/(2C_3), n(0.99 − σ − η)/C_3 }.        (4.33)




    We introduce the constants ε_1, ε_2 and ε_3 to guarantee that the bound ᾱ
on the step length is strictly greater than zero and to allow flexibility in
choosing the parameters η_k and σ_k.
    Note that if η < σ(1 − γ)/(1 + γ) then η < σ, because (1 − γ)/(1 + γ) < 1
for any γ ∈ (0, 1).
    From this theorem we observe that the forcing term η_k should be chosen such
that the two conditions η_k < σ_k(1 − γ)/(1 + γ) − ε_2 and η_k + σ_k < 0.99 − ε_3
are satisfied. Under these assumptions the following theorem guarantees that
there is a step length α such that the new point belongs to the neighbourhood
N_{−∞}(γ, β) and its average complementarity gap decreases according to
condition (4.13).
    Below we prove two theorems using standard techniques which follow from
Wright [77].

Theorem 4.2.5. Assume that η_k < σ_k(1 − γ)/(1 + γ) − ε_2 and η_k + σ_k < 0.99 − ε_3
for ε_2, ε_3 > 0, that (x^k, y^k, s^k) ∈ N_{−∞}(γ, β) and (Δx^k, Δy^k, Δs^k)
satisfies (4.10) and (4.11) for all k ≥ 0, and that µ_k ≤ (1 − .01α_{k−1})µ_{k−1}
for all k ≥ 1. Then (x^k(α), y^k(α), s^k(α)) ∈ N_{−∞}(γ, β) and
µ_k(α) ≤ (1 − .01α)µ_k for all α ∈ [0, ᾱ], where ᾱ is given by (4.33).

Proof. Theorem 4.2.4 ensures that the conditions (4.27), (4.28) and (4.29)
are satisfied. Note that (4.29) implies that the condition µk (α) ≤ (1−.01α)µk
is satisfied, while (4.28) guarantees that x_i^k(α)s_i^k(α) ≥ γµ_k(α).
    To prove that (x^k(α), y^k(α), s^k(α)) ∈ N_{−∞}(γ, β), we have to prove that
‖(ξ_p^k(α), ξ_d^k(α))‖/µ_k(α) ≤ β‖(ξ_p^0, ξ_d^0)‖/µ_0. From (4.14), (4.15) and
(4.27) we have

    ‖(ξ_p^k(α), ξ_d^k(α))‖/µ_k(α) = (1 − α)‖(ξ_p^k, ξ_d^k)‖/µ_k(α)
      ≤ (1 − α)‖(ξ_p^k, ξ_d^k)‖/((1 − α)µ_k) ≤ ‖(ξ_p^k, ξ_d^k)‖/µ_k
      ≤ β‖(ξ_p^0, ξ_d^0)‖/µ_0,

since (x^k, y^k, s^k) ∈ N_{−∞}(γ, β).

Theorem 4.2.6. The sequence {µ_k} generated by the IIPF Algorithm converges
Q-linearly to zero, and the sequence of residual norms {‖(ξ_p^k, ξ_d^k)‖}
converges R-linearly to zero.

Proof. Q-linear convergence of {µ_k} follows directly from condition (4.13)
and Theorem 4.2.4: there exists a constant ᾱ > 0 such that α_k ≥ ᾱ for every k,
so that

    µ_{k+1} ≤ (1 − .01α_k)µ_k ≤ (1 − .01ᾱ)µ_k, for all k ≥ 0.

From (4.14) and (4.15) we also have

    ‖(ξ_p^{k+1}, ξ_d^{k+1})‖ ≤ (1 − α_k)‖(ξ_p^k, ξ_d^k)‖.

Therefore,

    ‖(ξ_p^{k+1}, ξ_d^{k+1})‖ ≤ (1 − ᾱ)‖(ξ_p^k, ξ_d^k)‖.

Also, from Theorem 4.2.5 we know that

    ‖(ξ_p^{k+1}, ξ_d^{k+1})‖ ≤ µ_k β ‖(ξ_p^0, ξ_d^0)‖/µ_0.
Therefore, the sequence of residual norms is bounded above by another sequence
that converges Q-linearly, so {‖(ξ_p^k, ξ_d^k)‖} converges R-linearly.

Theorem 4.2.7. Let ε > 0 and the starting point (x^0, y^0, s^0) ∈ N_{−∞}(γ, β)
in the IIPF Algorithm be given. Then there is an index K with

    K = O(n^2 |log ε|)

such that the iterates {(x^k, y^k, s^k)} generated by the IIPF Algorithm satisfy

    µ_k ≤ ε, for all k ≥ K.

Proof. If the conditions of Theorem 4.2.5 are satisfied, then the conditions
(4.12) and (4.13) are satisfied for all α ∈ [0, ᾱ] and all k ≥ 0. By Theorem
4.2.4, the quantity ᾱ satisfies

    ᾱ ≥ min{ 1, n(σ − η)/C_3, (σ(1 − γ) − (1 + γ)η)/(2C_3), n(0.99 − σ − η)/C_3 }.

Furthermore, from Lemmas 4.2.1, 4.2.2 and 4.2.3 we have C_3 = O(n^2), therefore

    ᾱ ≥ δ/n^2

for some positive scalar δ independent of n. That implies

    µ_{k+1} ≤ (1 − .01ᾱ)µ_k ≤ (1 − .01δ/n^2)µ_k, for k ≥ 0.

The complexity result is an immediate consequence of Theorem 3.2 of [77].
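To illustrate where the O(n^2 |log ε|) figure comes from: µ_{k+1} ≤ (1 − .01δ/n^2)µ_k
gives µ_k ≤ ε as soon as k ≥ (n^2/(.01δ)) log(µ_0/ε). A tiny numerical check
follows (Python; the values of δ and µ_0 are illustrative only).

```python
import math

def iterations_to_tolerance(n, delta, mu0, eps):
    """Smallest K with (1 - 0.01*delta/n**2)**K * mu0 <= eps, illustrating
    the O(n^2 |log eps|) complexity bound of Theorem 4.2.7."""
    rate = 1.0 - 0.01 * delta / n ** 2
    return math.ceil(math.log(eps / mu0) / math.log(rate))

# Illustrative values only: the count grows like n^2 * log(mu0/eps).
print(iterations_to_tolerance(n=100, delta=1.0, mu0=10.0, eps=1e-8))
```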
Chapter 5

Numerical Results

The numerical results, which are demonstrated in this chapter, have been
presented in the paper [2]. The method discussed in this thesis has been
implemented in the context of HOPDM [36]. We have implemented the
preconditioned conjugate gradients method for the augmented system given
a specific starting point. In the implementation, the starting point (3.19)
with two zero blocks in its residual is used. We consider a subset of the
linear programming problems from the Netlib [30], Kennington [14] and other
public test sets used in [60]. In this chapter we show that the new approach
can be very effective in some cases and that it is an important option for some
classes of problems.
   In the initial iterations of the interior point method the normal equa-
tions are solved using the direct approach by forming the Cholesky factori-
sation LDLT for the normal equations matrix. As the interior point method
approaches optimality, the normal equation matrix becomes extremely ill-
conditioned due to a very different magnitude of the entries in Θ. At this
point, we switch to the iterative solver. In practice, we switch to PCG when
two conditions are satisfied: firstly, there are enough small elements in Θ−1


(we have at least 3m/4 small entries θ_j^{-1}, where θ_j^{-1} ≤ 10^{-2}). Secondly,
the relative duality gap is less than or equal to 10^{-2}.
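In code, the switching rule could look like the following sketch (Python with
NumPy; the function name is illustrative and the thresholds are the values
quoted above).

```python
import numpy as np

def switch_to_pcg(theta_inv, m, rel_duality_gap,
                  small_threshold=1e-2, gap_threshold=1e-2):
    """Switch from the direct Cholesky solver to the PCG solver once at
    least 3m/4 entries of Theta^{-1} satisfy theta_j^{-1} <= 1e-2 and the
    relative duality gap is <= 1e-2."""
    enough_small = np.sum(theta_inv <= small_threshold) >= 0.75 * m
    return bool(enough_small and rel_duality_gap <= gap_threshold)
```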
    In our implementation, the termination criterion for the PCG method
is set as ‖r^k‖/‖r^{(0)}‖ < ε. Initially, we chose ε = 10^{-2}. When the relative
duality gap becomes less than or equal to 10^{-3} the value of ε is changed to
10^{-3} and, finally, when the relative duality gap falls below 10^{-4} the value of
ε becomes 10^{-4}.
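This tolerance schedule amounts to a simple rule such as the following sketch
(Python; a restatement of the values above, not the exact HOPDM logic).

```python
def pcg_tolerance(rel_duality_gap):
    """Tolerance eps for the relative PCG residual ||r_k|| / ||r_0|| < eps,
    tightened as the relative duality gap decreases."""
    if rel_duality_gap <= 1e-4:
        return 1e-4
    if rel_duality_gap <= 1e-3:
        return 1e-3
    return 1e-2
```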
    Throughout our study we assume that A has full row rank. This assumption
does not affect the robustness of the approach: if A does not have full row
rank, we add artificial variables to the constraints to construct a full rank
constraint matrix, and we add these variables to the objective function
multiplied by a big constant M.
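A sketch of this regularisation is shown below (Python with NumPy; `big_M` and
the dense construction are illustrative only, a real implementation would keep
the matrix sparse).

```python
import numpy as np

def add_artificial_variables(A, c, big_M=1e7):
    """Append one artificial variable per row so that the constraint
    matrix [A | I] has full row rank; the artificial variables are
    penalised in the objective with a big constant M."""
    m = A.shape[0]
    A_full = np.hstack([A, np.eye(m)])
    c_full = np.concatenate([c, big_M * np.ones(m)])
    return A_full, c_full
```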
    The numerical results shown in this chapter are calculated for the following
case: the matrix B is rebuilt at each iteration of the interior point method in
which the iterative solver is used. Alternatively, we could use the old
information to update B for the next iteration, which would save a lot of
factorisation time. However, in this case we would have larger θ_j, and
consequently the number of PCG iterations would increase. The idea of updating
B is very interesting, but it requires a lot of work to obtain the best total
running time (especially to balance the time spent in the PCG solver against
the LU factorisation) for most problems. This is left for future work.

    In Table 5.1, we report the problem sizes: m, n and nz(A) denote the
number of rows, columns and nonzeros in the constraint matrix A. In the
next two columns, nz(B) denotes the number of nonzeros in the LU fac-
torisation of the basis matrix B and nz(L) denotes the number of nonzero
elements in the Cholesky factor of the normal equations matrix. In this chap-
ter, we report results for problems which benefit from the use of the iterative
approach presented. As shown in the last column of Table 5.1, the iterative
method is storage-efficient, requiring one or two orders of magnitude less
memory than the Cholesky factorisation. These results show that in most
cases we save more than 90% of the memory by using the LU factorisation
compared with the Cholesky factorisation. In the pds-10 problem, for instance,
the Cholesky factorisation has 1626987 nonzeros while the LU factorisation has
only 37123, which makes the memory saving reach 97.7%. If the PCG approach
were used for all IPM iterations, this memory advantage would allow certain
problems to be solved for which the memory requirement of Cholesky would
be prohibitive. In addition, it is essential that the LU factors are smaller
by a significant factor since they will have to be applied twice for each PCG
iteration when solving for the Newton direction, whereas the direct method
using Cholesky factors requires the L factor to be used just twice to compute
the Newton direction. The relative memory requirement can also be viewed
as a measure of the maximum number of PCG iterations that can be per-
formed while remaining competitive with the direct method using Cholesky
factors.
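The memory saving reported in the last column of Table 5.1 is the relative
reduction in stored nonzeros, as in the following sketch (Python; the figures
are those of pds-10 from the table).

```python
def memory_saving(nz_B, nz_L):
    """Relative memory saving of the LU factors of the basis B over the
    Cholesky factor L of the normal equations matrix (Table 5.1)."""
    return 100.0 * (1.0 - nz_B / nz_L)

print(round(memory_saving(37123, 1626987), 1))   # pds-10: 97.7
```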

    The results of comparing our mixed approach against the pure direct
approach are given in Table 5.2. In all reported runs we have asked for
eight digits of accuracy in the solution. For each test problem we report the
number of interior point iterations and the total CPU time in seconds needed
to solve the problem. Additionally, for the mixed approach we also report
the number of interior point iterations in which preconditioned conjugate
gradients method was used (IPM-pcg). For the problem fit2p, for example,
12 of the 25 interior point iterations used the iterative solution method:
the remaining 13 iterations used the direct method. In the last column
of Table 5.2 we report the saving in the total CPU time, when the mixed
approach is used instead of the pure direct approach. For the problem fit2p,
for example, the mixed approach is 64% faster than the pure direct approach.
    As we report in the column headed “Mixed approach” of Table 5.2,
we use the PCG method only in the final iterations of the interior point
method, while the rest of the interior point iterations are made using the
direct method. For most problems, the numbers of IPM iterations required
when using the pure direct and mixed approaches to solve a given problem
are the same or differ only slightly. However, for chr15a, pds-10 and pds-20,
the mixed approach requires more iterations, significantly so in the case of
the latter two problems. In the case of chr15a this accounts for the only
negative time saving in Table 5.2. For one problem, chr22b, using the mixed
approach leads to significantly fewer IPM iterations being required.

    In order to give an insight into the behaviour of the preconditioned conju-
gate gradients, in Table 5.3 we report the number of PCG iterations needed
to solve a particular linear system. First, we report separately this number
for the last interior point iteration when our preconditioner is supposed to
behave best. The following three columns correspond to the minimum, the
average, and the maximum number of PCG iterations encountered through-
out all iterative solves.
    Finally, in Table 5.4 we report results for the problems solved with the
pure iterative method. In these runs we have ignored the spread of elements
in the diagonal matrix Θ and the distance to optimality, and we have forced
the use of the PCG method in all interior point iterations. Such an approach
comes with a risk of failure of the PCG method because the preconditioner
does not have all its attractive properties in the earlier IPM iterations. In-
deed, we would not advise its use in the general context. However, for several
problems in our collection such an approach has been very successful. In this
table the term unsolved denotes that the solver exceeded its iteration limit.

    So far, we have reported problems which benefit from our approach. In
Table 5.5 and Table 5.6 we show problems which do not benefit from it. Using
an iterative solver for the linear systems arising in IPM may increase the
number of IPM iterations; the total running time does not improve for shell,
nw14, pds-02 and storm8 for this reason. For most of the problems in Tables 5.5
and 5.6 the iterative approach itself works fine, since the PCG method
converges in a reasonable number of iterations; the loss of running time is due
to the solution time of the iterative approach increasing compared with the
direct approach. In agg and gfrd-pnc, for instance, there is little saving in
terms of nonzeros in the factorisation, which increases the solution time.




     Problem             Dimensions        Nonzeros in Factors Memory
                      m        n nz(A)      nz(B)       nz(L)   saving
     aircraft       3754    7517 24034       9754     1417131   99.3 %
     chr12a          947    1662    5820     5801       78822   92.6 %
     chr12b          947    1662    5820     4311       85155   94.9 %
     chr12c          947    1662    5820     6187       80318   92.3 %
     chr15b         1814    3270 11460       9574      218023   95.6 %
     chr15c         1814    3270 11460       9979      219901   95.5 %
     chr18a         3095    5679 19908      19559      531166   95.5 %
     chr18b         3095    5679 19908       9139      527294   96.3 %
     chr20a         4219    7810 27380      38477      885955   95.7 %
     chr20b         4219    7810 27380      63243      893674   92.9 %
     chr20c         4219    7810 27380      23802      926034   94.7 %
     chr22a         5587 10417 36520        33685     1392239   97.5 %
     chr22b         5587 10417 36520        38489     1382161   97.2 %
     chr25a         8148 15325 53725        49605     2555662   98.1 %
     fit1p            628    1677 10894       5002      196251   97.5 %
     fit2p           3001 13525 60784        34303     4498500   99.2 %
     fome10         6071 12230 35632       114338     1610864   92.2 %
     fome11        14695 24460 71264       237844     3221728   92.6 %
     fome12        24285 48920 167492      445156     6443456   93.1 %
     pds-06         9882 28655 82269        22020      580116   96.2 %
     pds-10        16559 48763 140063       37123     1626987   97.7 %
     pds-20        33875 105728 304153      77352     6960089   97.7 %
     route         20894 23923 187686       14876     3078015   99.5 %
     scr10           689    1540    5940    13653      124559   89.0 %
     scr12          1151    2784 10716      20437      330483   93.8 %
     scr15          2234    6210 24060      77680      125514   38.1 %
     scr20          5079 15980 61780       446686     6561431   93.2 %

Table 5.1: Comparing the number of nonzero elements in the LU factorisation
of the basis B and in the Cholesky factorisation of the normal equations
matrix AΘAT .




   Problem        Direct approach                 Mixed approach        Time
                   Time IPM-iters           Time IPM-iters IPM-pcg    saving
   aircraft        33.15         17         24.94         17      5   24.8 %
   chr12a          0.304         14         0.290         14      2   4.61 %
   chr12b          0.402         16         0.354         16      3   11.9 %
   chr12c          0.256         11         0.254         11      1   0.78 %
   chr15b          1.263         17         1.196         17      2   5.80 %
   chr15c          1.231         17         1.194         17      2   3.03 %
   chr18a          6.480         29         5.747         30      5   11.3 %
   chr18b          3.520         16         3.213         16      3   8.72 %
   chr20a          13.69         28         9.292         28     14   23.1 %
   chr20b          11.31         27         9.895         27      8   12.5 %
   chr20c          11.91         23         11.76         23      4   1.26 %
   chr22a          25.59         28         24.73         28      2   3.36 %
   chr22b          48.78         52         27.09         33      2   44.5 %
   chr25a          81.04         39         71.92         39      5   11.3 %
   fit1p             3.49         20          2.01         20      9   42.2 %
   fit2p           583.33         25        211.93         25     12   63.7 %
   fome10         281.96         45        124.01         43     17   56.0 %
   fome11         827.85         48        288.44         44     17   65.2 %
   fome12        1646.29         48        604.98         44     17   63.3 %
   pds-06          60.81         44         28.12         43     21   57.8 %
   pds-10         198.08         38        103.34         53     29   47.8 %
   pds-20        2004.87         47        770.83         66     38   61.6 %
   route           53.98         25         48.99         24      4   9.20 %
   scr10           0.839         19         0.685         19      8   18.4 %
   scr12           3.092         14         2.951         14      2   18.8 %
   scr15           50.79         26         41.22         26      7   18.8 %
   scr20          614.56         25        517.62         26      4   15.8 %

                               Table 5.2: Solution statistics.




                      Problem            PCG Iterations
                                 lastIPM min average max
                      aircraft         10    8         9  10
                      chr12a           19   18        20  23
                      chr12b           29   28        29  29
                      chr12c           26   26        26  26
                      chr15b           33   31        38  36
                      chr15c           32   31        32  32
                      chr18a           37   35        37  38
                      chr18b           57   53        56  57
                      chr20a           39   38        56  82
                      chr20b           32   32        63 104
                      chr20c           45   42        44  45
                      chr22a           48   46        49  53
                      chr22b           45   39        42  46
                      chr25a           51   46        50  55
                      fit1p              2    2         3   6
                      fit2p              4    3        15  43
                      fome10          142 129        243 519
                      fome11          169 123        205 494
                      fome12          111 111        210 500
                      pds-06           60   36        53  71
                      pds-10           66   45        60  86
                      pds-20          111   44        78 145
                      route            85   30        60  92
                      scr10            19   16        19  23
                      scr12            44   44        45  45
                      scr15            43   43        61  78
                      scr20           200 141        181 291

Table 5.3: The number of PCG iterations during the interior point method
iterations.




      Problem Direct approach          Iterative approach         Time
                 Time IPM-iters         Time        IPM-iters    saving
      aircraft   33.15        17          2.87             15   91.3 %
      chr12a     0.304        14        0.449              14  -47.7 %
      chr12b     0.402        16        0.306              14    23.9%
      chr12c     0.256        11        0.254              11    1.01%
      chr15b     1.263        17        0.944              16   25.3 %
      chr15c     1.231        17        0.959              18   22.1 %
      chr18a     6.480        29        3.119              29   51.9 %
      chr18b     3.520        16        2.255              18   35.9 %
      chr20a     13.69        28        5.721              34   58.2 %
      chr20b     11.31        27        5.721              30   49.4 %
      chr20c     11.91        23        4.800              22   59.7 %
      chr22a     25.59        28        6.725              31   73.7 %
      chr22b     48.78        52        8.232              36   83.1 %
      chr25a     81.04        39        17.54              41   78.4 %
      fit1p        3.49        20          0.38             19   89.1 %
      fit2p      583.33        25        19.09              26   96.7 %
      fome10    281.96        45       126.72              47   19.6 %
      fome11    827.85        48       437.93              51  74.02 %
      fome12   1646.29        48             -              - Unsolved
      pds-06     60.81        44        98.80              44  -31.23%
      pds-10    198.08        38       122.42              46   33.15%
      pds-20   2004.87        47             -              - Unsolved
      scr10      0.839        19        0.633              19   24.6 %
      scr12      3.092        14        1.701              15   96.7 %
      scr15      50.79        26        16.55              26   67.4 %
      scr20     614.56        25             -              - Unsolved

                Table 5.4: Efficiency of the pure iterative method.




 Problem                Dimensions         Direct approach           Mixed approach
                  m           n    nz(A)   Time IPM-iters    Time     IPM-iters IPM-pcg
 80bau3b        2235      14269    24883   2.209        50   5.172           50      14
 agg            3754       7517    24034   0.179        20   0.277           26      12
 bore3d          233        567     1679   0.064        23   0.059           23       2
 chr15a         1814       3270    11460   1.274        17   1.316           22       9
 dbir2         18879      64729 1177011    310.7        38   225.8           39      11
 gfrd-pnc        616       1776     3061   0.100        18   0.123           18      13
 pds-02         2953      10488    19424   1.476        31   3.213           34      15
 qap8            912       2544     8208   2.183        10   2.380           10       1
 nw14             73     123482 904983     24.12        45   46.04           50      27
 scorpion        388        854     1922   0.056        16   0.053           16       1
 shell           536       2313     3594   0.150        21   0.407           43      21
 ship04l         360       2526     6740   0.123        16   0.142           16       3
 ship04s         360       1866     4760   0.099        16   0.117           16       5
 stocfor1        117        282      618   0.024        20   0.057           20      11
 stocfor2       2157       5202    11514   0.582        36   1.829           36      10
 storm8         4393      15715    32946   4.541        52   8.691           54      18

Table 5.5: Solution statistics for problems, which do not benefit of iterative
approach.




              Problem Nonzeros in Factors  PCG Iterations
                       nz(B)       nz(L) min average max
              80bau3b   5800       42709  29      64 226
              agg       1589       16629   3      24      45
              bore3d     821        2941  17      17      17
              chr15a   10533      218060  37      38      41
              dbir2    51609     2869915  50      74      93
              gfrd-pnc  1240        1798  11      13      15
              pds-02    6422       40288  38      48      58
              qap8     60553      193032 175     175 175
              nw14       443        1968   6       ??     15
              scorpion  1559        2102  38      38      38
              shell     1075        4096   3      25      45
              ship04l    941        4428  10      12      14
              ship04s    938        3252  10      11      13
              stocfor1   302         903   8      29      46
              stocfor2  6585       33207  32      96 325
              storm8    9805      136922  42      64      85

Table 5.6: Comparing the number of nonzero elements in the factorisations
and the number of PCG iterations during IPM.
Chapter 6

Conclusions

In this thesis we have discussed interior point methods for linear programming
problems. At each iteration of the IPM at least one linear system has to be
solved, and the main computational effort of interior point algorithms consists
in the solution of these linear systems. Optimization problems become larger
every day, and solving the corresponding linear systems with a direct method
can become very expensive for large problems. In this thesis, we have been
concerned with using an iterative method to solve these linear systems. In
Chapter 2 we have reviewed some of the popular solution methods for these
linear systems (direct methods and iterative methods).
   In this thesis we have used the PCG method to solve the (indefinite)
augmented system (1.7), which arises from interior point algorithms for linear
programming. We have proposed in Chapter 3 a new sparse preconditioner
for the augmented system. This preconditioner takes advantage of the fact
that a subset of elements in the matrix Θ−1 converge to zero as the solution of
the linear program is approached. We replace these elements with zeros in the
preconditioner. As a result, we have obtained a sparse and easily invertible
block-triangular matrix. The constraint matrix A has been partitioned into




[B, N], where B is an m by m nonsingular matrix. The matrix B is obtained
from m linearly independent columns of A which correspond to small θ_j^{-1}. By
following the analysis of Rozložník and Simoncini [65] closely, we have shown
that the PCG method can be applied to a non-symmetric indefinite matrix
for a specific starting point. In addition, we have analysed the behaviour of
the error and residual terms. This analysis reveals that, although we work
with the indefinite system preconditioned with the indefinite matrix, the
error and residual converge to zero and, asymptotically, behave in a similar
way to the classical case when PCG is applied to a positive definite system.
    The use of an iterative method in this context makes an essential dif-
ference in the implementation of the interior point algorithm. This requires
a better understanding of IPM convergence properties in a situation when
directions are inexact. In Chapter 4 we have considered the convergence
analysis of the inexact infeasible path-following algorithm, where the aug-
mented system is solved iteratively, according to what have been mentioned
earlier. We have used a trick which consisted in shifting the residual from
the dual constraint to the perturbed complementarity constraint. This has
allowed us to modify the analysis of the (exact) infeasible IPM [77, 81] and
generalize it to the inexact case. We have chosen a suitable stopping criteria
of the PCG method used in this context and have provided a condition on
the forcing term. Furthermore, we have proved the global convergence of the
IIPF algorithm and have provided a complexity result for this method.
    Finally, in Chapter 5 we have illustrated the feasibility of our approach
on a set of medium to large-scale linear problems. Based on these results we
conclude that it is advantageous to apply the preconditioned conjugate gra-
dient method to indefinite KKT systems arising in interior point algorithms
for linear programming.
    There are many research possibilities of interest still to explore in this
area. The approach proposed in this thesis has proved to work well. However,
in its current form it is limited to the linear programming case. One of
the possible developments is to extend this approach to the quadratic and
nonlinear programming problems.
Bibliography

[1] G. Al-Jeiroudi and J. Gondzio, Convergence analysis of inexact in-
   feasible interior point method for linear optimization, accepted for pub-
   lication in Journal of Optimization Theory and Applications, 2007.

[2] G. Al-Jeiroudi, J. Gondzio, and J. Hall, Preconditioning indefi-
   nite systems in interior point methods for large scale linear optimization,
   Optimization Methods and Software, 23 (2008), pp. 345–363.

[3] E. D. Andersen, J. Gondzio, C. Mészáros, and X. Xu, Imple-
   mentation of interior point methods for large scale linear programming,
   in Interior Point Methods in Mathematical Programming, T. Terlaky,
   ed., Kluwer Academic Publishers, 1996, pp. 189–252.

[4] M. Arioli, I. S. Duff, and P. P. M. de Rijk, On the augmented
   system approach to sparse least-squares problems, Numerische Mathe-
   matik, 55 (1989), pp. 667–684.

[5] V. Baryamureeba and T. Steihaug, On the convergence of an inex-
   act primal-dual interior point method for linear programming, in Lecture
   Notes in Computer Science, Springer Berlin/Heidelberg, 2006.

[6] S. Bellavia, An inexact interior point method, Journal of Optimization
   Theory and Applications, 96 (1998), pp. 109–121.



 [7] S. Bellavia and S. Pieraccini, Convergence analysis of an inexact
    infeasible interior point method for semidefinite programming, Compu-
    tational Optimization and Applications, 29 (2004), pp. 289–313.

 [8] M. Benzi, G. Golub, and J. Liesen, Numerical solution of saddle
    point problems, Acta Numerica, 14 (2005), pp. 1–137.

 [9] L. Bergamaschi, J. Gondzio, M. Venturin, and G. Zilli, Inex-
    act constraint preconditioners for linear systems arising in interior point
    methods, Computational Optimization and Applications, 36 (2007),
    pp. 137–147.

[10] L. Bergamaschi, J. Gondzio, and G. Zilli, Preconditioning indef-
    inite systems in interior point methods for optimization, Computational
    Optimization and Applications, 28 (2004), pp. 149–171.

[11] M. W. Berry, M. T. Heath, I. Kaneko, M. Lawo, and R. J.
    Plemmons, An algorithm to compute a sparse basis of the null space,
    Numerische Mathematik, 47 (1985), pp. 483–504.

[12] Å. Björck, Numerical Methods for Least Squares Problems, SIAM,
    Philadelphia, 1996.

[13] S. Bocanegra, F. Campos, and A. Oliveira, Using a hybrid
    preconditioner for solving large-scale linear systems arising from inte-
    rior point methods, Computational Optimization and Applications, 36
    (2007), pp. 149–164.

[14] W. J. Carolan, J. E. Hill, J. L. Kennington, S. Niemi, and
    S. J. Wichmann, An empirical evaluation of the KORBX algorithms
    for military airlift applications, Operations Research, 38 (1990), pp. 240–
    248.


[15] T. Carpenter and D. Shanno, An interior point method for
       quadratic programs based on conjugate projected gradients, Computa-
       tional Optimization and Applications, 2 (1993), pp. 5–28.

[16] J. S. Chai and K. C. Toh, Preconditioning and iterative solution of
       symmetric indefinite linear systems arising from interior point methods
       for linear programming, Computational Optimization and Applications,
       36 (2007), pp. 221–247.

[17] T. F. Coleman and A. Pothen, The null space problem I. com-
       plexity, SIAM Journal on Algebraic and Discrete Methods, 7 (1986),
       pp. 527–537.

[18] ———, The null space problem II. algorithms, SIAM Journal on Algebraic
    and Discrete Methods, 7 (1986), pp. 544–562.

[19] T. F. Coleman and A. Verma, A preconditioned conjugate gradi-
       ent approach to linear equality constrained minimization, Computational
       Optimization and Applications, 20 (2001), pp. 61–72.

[20] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton
       methods, SIAM Journal on Numerical Analysis, 19 (1982), pp. 400–408.

[21] H. Dollar, N. Gould, and A. Wathen, On implicit-factorization
    constraint preconditioners, in Large-Scale Nonlinear Optimization,
    G. Di Pillo, ed., Springer Netherlands, 2006.

[22] H. Dollar and A. Wathen, Approximate factorization constraint
       preconditioners for saddle-point matrices, SIAM Journal on Scientific
       Computing, 27 (2005), pp. 1555–1572.


[23] H. S. Dollar, N. I. M. Gould, W. H. A. Schilders, and A. J.
    Wathen, Implicit-factorization preconditioning and iterative solvers for
    regularized saddle-point systems, SIAM Journal on Matrix Analysis and
    Applications, 28 (2006), pp. 170–189.

[24] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct methods for
    sparse matrices, Oxford University Press, New York, 1987.

[25] B. Fischer, Polynomial Based Iteration Methods for Symmetric Linear
    Systems, Wiley-Teubner, Chichester and Stuttgart, 1996.

[26] R. Fletcher, Conjugate gradient methods for indefinite systems, in
    Numerical Analysis Dundee 1975, G. Watson, ed., Springer-Verlag,
    Berlin, New York, 1976, pp. 73–89.

[27] A. Forsgren, P. E. Gill, and M. H. Wright, Interior point meth-
    ods for nonlinear optimization, SIAM Review, 44 (2002), pp. 525–597.

[28] R. W. Freund and F. Jarre, A QMR-based interior-point algorithm
    for solving linear programs, Mathematical Programming, 76 (1997),
    pp. 183–210.

[29] R. W. Freund, F. Jarre, and S. Mizuno, Convergence of a class
    of inexact interior-point algorithms for linear programs, Mathematics of
    Operations Research, 24 (1999), pp. 105–122.

[30] D. M. Gay, Electronic mail distribution of linear programming test
    problems, Mathematical Programming Society COAL Newsletter, 13
    (1985), pp. 10–12.

[31] A. George and J. W. H. Liu, The evolution of the minimum degree
    ordering algorithm, SIAM Review, 31 (1989), pp. 1–19.


[32] ———, Computer Solution of Large Sparse Positive Definite Systems,
    Prentice-Hall, Englewood Cliffs, NJ, 1981.

[33] J. C. Gilbert and J. Nocedal, Global convergence properties of
       conjugate gradient methods for optimization, SIAM Journal on Opti-
       mization, 2 (1992), pp. 21–42.

[34] P. E. Gill, W. Murray, D. B. Ponceleón, and M. A. Saun-
    ders, Preconditioners for indefinite systems arising in optimization,
    SIAM Journal on Matrix Analysis and Applications, 13 (1992), pp. 292–
    311.

[35] J. Gondzio, Implementing Cholesky factorization for interior point
       methods of linear programming, Optimization, 27 (1993), pp. 121–140.

[36] ———, HOPDM (version 2.12) – a fast LP solver based on a primal-dual
    interior point method, European Journal of Operational Research, 85
    (1995), pp. 221–225.

[37] J. Gondzio and T. Terlaky, A computational view of interior point
       methods for linear programming, In J. E. Beasley, editor, Advances in
       Linear and Integer Programming, chapter 3, Oxford University Press,
       Oxford, England, (1994), pp. 103–144.

[38] C. Gonzaga, Path-following methods in linear programming, SIAM
       Review, 34 (1992), pp. 167–224.
[39] C. Gonzaga and M. J. Todd, An O(√nL)-iteration large-step
    primal-dual affine algorithm for linear programming, SIAM Journal on
    Optimization, 2 (1992), pp. 349–359.


[40] N. I. M. Gould, M. E. Hribar, and J. Nocedal, On the solution
    of equality constrained quadratic problems arising in optimization, SIAM
    Journal on Scientific Computing, 23 (2001), pp. 1375–1394.

[41] J. A. J. Hall and K. I. M. McKinnon, Hyper-sparsity in the revised
    simplex method and how to exploit it, Computational Optimization and
    Applications, 32 (2005), pp. 259–283.

[42] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients
    for solving linear systems, Journal of Research of the National Bureau
    of Standards, 49 (1952), pp. 409–436.

[43] K. R. James and W. Riha, Convergence criteria for successive overre-
    laxation, SIAM Journal on Numerical Analysis, 12 (1975), pp. 137–143.

[44] J. J. Júdice, J. Patrício, L. F. Portugal, M. G. C. Resende,
    and G. Veiga, A study of preconditioners for network interior point
    methods, Computational Optimization and Applications, 24 (2003),
    pp. 5–35.

[45] N. Karmarkar and K. Ramakrishnan, Computational results of
    an interior point algorithm for large scale linear programming, Mathe-
    matical Programming, 52 (1991), pp. 555–586.

[46] C. Keller, N. I. M. Gould, and A. J. Wathen, Constraint precon-
    ditioning for indefinite linear systems, SIAM Journal on Matrix Analysis
    and Applications, 21 (2000), pp. 1300–1317.

[47] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations,
    vol. 16 of Frontiers in Applied Mathematics, SIAM, Philadelphia, 1995.


[48] V. Klee and G. J. Minty, How good is the simplex algorithm?, in
    Inequalities III, O. Shisha, ed., Academic Press, London, New York,
    (1972), pp. 159–175.

[49] J. Korzak, Convergence analysis of inexact infeasible-interior-point-
    algorithm for solving linear programming problems, SIAM Journal on
    Optimization, 11 (2000), pp. 133–148.

[50] Z. Lu, R. D. S. Monteiro, and J. W. O’Neal, An iterative
       solver-based infeasible primal-dual path-following algorithm for convex
       QP, SIAM Journal on Optimization, 17 (2006), pp. 287–310.

[51] L. Lukšan and J. Vlček, Indefinitely preconditioned inexact New-
    ton method for large sparse equality constrained nonlinear program-
    ming problems, Numerical Linear Algebra with Applications, 5 (1998),
    pp. 219–247.

[52] I. Lustig, R. Marsten, and D. Shanno, Computational experience
    with a primal-dual interior point method for linear programming, Linear
    Algebra and its Applications, 152 (1991), pp. 191–222.

[53] ———, Interior point methods for linear programming: computational
    state of the art, ORSA Journal on Computing, 6 (1994), pp. 1–14.

[54] S. Mehrotra, Implementation of affine scaling methods: Approximate
    solutions of systems of linear equations using preconditioned conjugate
    gradient methods, ORSA Journal on Computing, 4 (1992), pp. 103–118.

[55] S. Mehrotra and J. S. Wang, Conjugate gradient based implementa-
       tion of interior point methods for network flow problems, in Linear and
       Nonlinear Conjugate Gradient-Related Methods, L. Adams and J. L.
    Nazareth, eds., AMS-IMS-SIAM Joint Summer Research Conference,
    1995.

[56] J. A. Meijerink and H. A. van der Vorst, An iterative solution
    method for linear systems of which the coefficient matrix is a symmetric
    M-matrix, Mathematics of Computation, 31 (1977), pp. 148–162.

[57] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM,
    Philadelphia, 2000.

[58] S. Mizuno and F. Jarre, Global and polynomial-time convergence of
    an infeasible-interior-point algorithm using inexact computation, Math-
    ematical Programming, 84 (1999), pp. 105–122.

[59] R. D. S. Monteiro and J. W. O’Neal, Convergence analysis of
    long-step primal-dual infeasible interior point LP algorithm based on
    iterative linear solvers, Georgia Institute of Technology, (2003).

[60] A. R. L. Oliveira and D. C. Sorensen, A new class of precon-
    ditioners for large-scale linear systems from interior point methods for
    linear programming, Linear Algebra and its Applications, 394 (2005),
    pp. 1–24.

[61] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Non-
    linear Equations in Several Variables, Academic Press, New York, 1970.

[62] C. C. Paige and M. A. Saunders, Solution of sparse indefinite
    systems of linear equations, SIAM Journal on Numerical Analysis, 12
    (1975), pp. 617–629.

[63] A. Pothen, Sparse null space basis computations in structural opti-
    mization, Numerische Mathematik, 55 (1989), pp. 501–519.


[64] M. G. C. Resende and G. Veiga, An implementation of the dual
    affine scaling algorithm for minimum cost flow on bipartite uncapacitated
    networks, SIAM Journal on Optimization, 3 (1993), pp. 516–537.

[65] M. Rozložník and V. Simoncini, Krylov subspace methods for saddle
    point problems with indefinite preconditioning, SIAM Journal on Matrix
    Analysis and Applications, 24 (2002), pp. 368–391.

[66] Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition,
    SIAM, Philadelphia, 1995.

[67] Y. Saad and M. Schultz, GMRES: A generalized minimal residual
    algorithm for solving nonsymmetric linear systems, SIAM Journal on
    Scientific and Statistical Computing, 7 (1986), pp. 856–869.

[68] J. Shewchuk, An introduction to the conjugate gradient method with-
    out the agonizing pain, tech. report, School of Computer Science,
    Carnegie Mellon University, USA, 1994.

[69] J. A. Tomlin, Pivoting for size and sparsity in linear programming
    inversion routines, Journal of the Institute of Mathematics and its
    Applications, 10 (1972), pp. 289–295.

[70] C. H. Tong and Q. Ye, Analysis of the finite precision Bi-conjugate
    gradient algorithm for nonsymmetric linear systems, Mathematics of
    Computation, 69 (1999), pp. 1559–1575.

[71] L. N. Trefethen and D. Bau, III, Numerical linear algebra, Society
    for Industrial and Applied Mathematics, SIAM, Philadelphia, 1997.

[72] H. A. van der Vorst, Iterative Krylov methods for large linear sys-
    tems, Cambridge University Press, Cambridge, (2003).


[73] R. Vanderbei, LOQO: an interior point code for quadratic program-
    ming, Program in Statistics and Operations Research, Princeton Univer-
    sity, (1995).

[74] R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood
    Cliffs, NJ, 1962.

[75] W. Wang and D. P. O’Leary, Adaptive use of iterative methods in
    predictor-corrector interior point methods for linear programming, Nu-
    merical Algorithms, 25 (2000), pp. 387–406.

[76] M. H. Wright, The interior-point revolution in optimization: history,
    recent developments, and lasting consequences, Bulletin of the American
    Mathematical Society, 42 (2004), pp. 39–65.

[77] S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadel-
    phia, 1997.
[78] X. Xu, An O(√nL)-iteration large-step infeasible path-following algo-
    rithm for linear programming, Technical report, University of Iowa,
    (1994).

[79] Y. Ye, Interior Point Algorithms: Theory and Analysis, John Wiley and
    Sons, New York, 1997.

[80] D. M. Young, Iterative Solution of Large Linear Systems, Academic
    Press, New York, 1971.

[81] Y. Zhang, On the convergence of a class of infeasible interior-point
    methods for the horizontal linear complementarity problem, SIAM Jour-
    nal on Optimization, 4 (1994), pp. 208–227.
[82] G. Zhou and K. C. Toh, Polynomiality of an inexact infeasible inte-
    rior point algorithm for semidefinite programming, Mathematical Pro-
    gramming, 99 (2004), pp. 261–282.

More Related Content

PDF
InternshipReport
PDF
Compiled Report
PDF
Illustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
PDF
Sensitivity Analysis of GRA Method for Interval Valued Intuitionistic Fuzzy M...
PDF
Notes
PDF
Machine learning cheat sheet
PDF
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
InternshipReport
Compiled Report
Illustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
Sensitivity Analysis of GRA Method for Interval Valued Intuitionistic Fuzzy M...
Notes
Machine learning cheat sheet
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...

What's hot (20)

PDF
Free Ebooks Download ! Edhole
PDF
Stereographic Circular Normal Moment Distribution
PDF
Principle of Integral Applications - Integral Calculus - by Arun Umrao
PDF
Notes of Units, Dimensions & Errors for IIT JEE by Arun Umrao
PDF
Nishant_thesis_report
PDF
Numerical Solution of Nth - Order Fuzzy Initial Value Problems by Fourth Orde...
PDF
Principle of Definite Integra - Integral Calculus - by Arun Umrao
PDF
Discontinuous Galerkin Timestepping for Nonlinear Parabolic Problems
PDF
A Numerical Method for the Evaluation of Kolmogorov Complexity, An alternativ...
PDF
A Mathematical Model to Solve Nonlinear Initial and Boundary Value Problems b...
PDF
Principle of Integration - Basic Introduction - by Arun Umrao
PDF
CHN and Swap Heuristic to Solve the Maximum Independent Set Problem
PDF
PDF
Solving the Poisson Equation
PDF
IRJET- Wavelet based Galerkin Method for the Numerical Solution of One Dimens...
PDF
Numerical Solution Of Delay Differential Equations Using The Adomian Decompos...
PDF
Graph Spectra through Network Complexity Measures: Information Content of Eig...
PDF
The theory of continuum and elasto plastic materials
PDF
NEW NON-COPRIME CONJUGATE-PAIR BINARY TO RNS MULTI-MODULI FOR RESIDUE NUMBER ...
Free Ebooks Download ! Edhole
Stereographic Circular Normal Moment Distribution
Principle of Integral Applications - Integral Calculus - by Arun Umrao
Notes of Units, Dimensions & Errors for IIT JEE by Arun Umrao
Nishant_thesis_report
Numerical Solution of Nth - Order Fuzzy Initial Value Problems by Fourth Orde...
Principle of Definite Integra - Integral Calculus - by Arun Umrao
Discontinuous Galerkin Timestepping for Nonlinear Parabolic Problems
A Numerical Method for the Evaluation of Kolmogorov Complexity, An alternativ...
A Mathematical Model to Solve Nonlinear Initial and Boundary Value Problems b...
Principle of Integration - Basic Introduction - by Arun Umrao
CHN and Swap Heuristic to Solve the Maximum Independent Set Problem
Solving the Poisson Equation
IRJET- Wavelet based Galerkin Method for the Numerical Solution of One Dimens...
Numerical Solution Of Delay Differential Equations Using The Adomian Decompos...
Graph Spectra through Network Complexity Measures: Information Content of Eig...
The theory of continuum and elasto plastic materials
NEW NON-COPRIME CONJUGATE-PAIR BINARY TO RNS MULTI-MODULI FOR RESIDUE NUMBER ...
Ad

Viewers also liked (16)

PDF
Emerging Trends in Technology
PDF
Steel Sparrow
PPTX
Stimulating creativity in children presentation
PPTX
Jeremiah's project
DOCX
DOC
Updated cv Patrick Chigura
PPTX
Libraries matter
PDF
Total_KVM_catalog_8_page_USE
PDF
LR WORLD ΟΚΤΩΒΡΗΣ 2015
PDF
PenO3 Introductie slides
PDF
CSRF: ways to exploit, ways to prevent
PDF
Pat Good Results Analysis
DOCX
APUNTES
DOCX
Quick assist locksmith
Emerging Trends in Technology
Steel Sparrow
Stimulating creativity in children presentation
Jeremiah's project
Updated cv Patrick Chigura
Libraries matter
Total_KVM_catalog_8_page_USE
LR WORLD ΟΚΤΩΒΡΗΣ 2015
PenO3 Introductie slides
CSRF: ways to exploit, ways to prevent
Pat Good Results Analysis
APUNTES
Quick assist locksmith
Ad

Similar to On Inexact Newton Directions in Interior Point Methods for Linear Optimization (20)

PDF
Barret templates
PDF
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
PDF
Diederik Fokkema - Thesis
PDF
B02402012022
PDF
MSc Thesis_Francisco Franco_A New Interpolation Approach for Linearly Constra...
PDF
Computational Theory Of Iterative Methods Ioannis K Argyros Eds
PDF
Stochastic Programming
PDF
On the construction and comparison of an explicit iterative
DOCX
A First Course in NumeriCAl methodsCS07_Ascher-Gre.docx
PDF
2004 zuckerberg a set theoretic approach to lifting procedures for 0-1 inte...
PDF
Numerical Methods For Engineers_S. C. Chapra And R. P. Canale.pdf
PDF
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PDF
C025020029
PDF
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...
PDF
An efficient improvement of the Newton method for solving nonconvex optimizat...
PDF
math-basics.pdf
PDF
Na 20130603
PDF
James_F_Epperson_An_Introduction_to_Numerical_Methods_and_Analysis.pdf
PDF
A New Polynomial-Time Algorithm for Linear Programming
Barret templates
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
Diederik Fokkema - Thesis
B02402012022
MSc Thesis_Francisco Franco_A New Interpolation Approach for Linearly Constra...
Computational Theory Of Iterative Methods Ioannis K Argyros Eds
Stochastic Programming
On the construction and comparison of an explicit iterative
A First Course in NumeriCAl methodsCS07_Ascher-Gre.docx
2004 zuckerberg a set theoretic approach to lifting procedures for 0-1 inte...
Numerical Methods For Engineers_S. C. Chapra And R. P. Canale.pdf
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
C025020029
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...
An efficient improvement of the Newton method for solving nonconvex optimizat...
math-basics.pdf
Na 20130603
James_F_Epperson_An_Introduction_to_Numerical_Methods_and_Analysis.pdf
A New Polynomial-Time Algorithm for Linear Programming

More from SSA KPI (20)

PDF
Germany presentation
PDF
Grand challenges in energy
PDF
Engineering role in sustainability
PDF
Consensus and interaction on a long term strategy for sustainable development
PDF
Competences in sustainability in engineering education
PDF
Introducatio SD for enginers
PPT
DAAD-10.11.2011
PDF
Talking with money
PDF
'Green' startup investment
PDF
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
PDF
Dynamics of dice games
PPT
Energy Security Costs
PPT
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
PDF
Advanced energy technology for sustainable development. Part 5
PDF
Advanced energy technology for sustainable development. Part 4
PDF
Advanced energy technology for sustainable development. Part 3
PDF
Advanced energy technology for sustainable development. Part 2
PDF
Advanced energy technology for sustainable development. Part 1
PPT
Fluorescent proteins in current biology
PPTX
Neurotransmitter systems of the brain and their functions
Germany presentation
Grand challenges in energy
Engineering role in sustainability
Consensus and interaction on a long term strategy for sustainable development
Competences in sustainability in engineering education
Introducatio SD for enginers
DAAD-10.11.2011
Talking with money
'Green' startup investment
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
Dynamics of dice games
Energy Security Costs
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 1
Fluorescent proteins in current biology
Neurotransmitter systems of the brain and their functions

Recently uploaded (20)

PDF
LNK 2025 (2).pdf MWEHEHEHEHEHEHEHEHEHEHE
PDF
SOIL: Factor, Horizon, Process, Classification, Degradation, Conservation
PDF
A systematic review of self-coping strategies used by university students to ...
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PDF
Hazard Identification & Risk Assessment .pdf
PPTX
Unit 4 Skeletal System.ppt.pptxopresentatiom
PPTX
A powerpoint presentation on the Revised K-10 Science Shaping Paper
PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PDF
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PPTX
History, Philosophy and sociology of education (1).pptx
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
IGGE1 Understanding the Self1234567891011
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PDF
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PDF
Empowerment Technology for Senior High School Guide
PDF
RMMM.pdf make it easy to upload and study
PDF
advance database management system book.pdf
LNK 2025 (2).pdf MWEHEHEHEHEHEHEHEHEHEHE
SOIL: Factor, Horizon, Process, Classification, Degradation, Conservation
A systematic review of self-coping strategies used by university students to ...
Supply Chain Operations Speaking Notes -ICLT Program
Hazard Identification & Risk Assessment .pdf
Unit 4 Skeletal System.ppt.pptxopresentatiom
A powerpoint presentation on the Revised K-10 Science Shaping Paper
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
History, Philosophy and sociology of education (1).pptx
Final Presentation General Medicine 03-08-2024.pptx
IGGE1 Understanding the Self1234567891011
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
Chinmaya Tiranga quiz Grand Finale.pdf
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
Empowerment Technology for Senior High School Guide
RMMM.pdf make it easy to upload and study
advance database management system book.pdf

On Inexact Newton Directions in Interior Point Methods for Linear Optimization

  • 1. On Inexact Newton Directions in Interior Point Methods for Linear Optimization Ghussoun Al-Jeiroudi Doctor of Philosophy University of Edinburgh 2008
  • 2. Declaration I declare that this thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text. (Ghussoun Al-Jeiroudi)
  • 3. Abstract In each iteration of the interior point method (IPM) at least one linear system has to be solved. The main computational effort of IPMs consists in the com- putation of these linear systems. Solving the corresponding linear systems with a direct method becomes very expensive for large scale problems. In this thesis, we have been concerned with using an iterative method for solving the reduced KKT systems arising in IPMs for linear programming. The augmented system form of this linear system has a number of advan- tages, notably a higher degree of sparsity than the normal equations form. We design a block triangular preconditioner for this system which is con- structed by using a nonsingular basis matrix identified from an estimate of the optimal partition in the linear program. We use the preconditioned con- jugate gradients (PCG) method to solve the augmented system. Although the augmented system is indefinite, short recurrence iterative methods such as PCG can be applied to indefinite system in certain situations. This ap- proach has been implemented within the HOPDM interior point solver. The KKT system is solved approximately. Therefore, it becomes neces- sary to study the convergence of IPM for this inexact case. We present the convergence analysis of the inexact infeasible path-following algorithm, prove the global convergence of this method and provide complexity analysis.
  • 4. Acknowledgements I would like to express my sincere thanks to Professor Jacek Gondzio. I can honestly say I have been extremely fortunate to have him as my supervisor. He has been my encyclopaedia of research knowledge. I would like to thank him for giving me this opportunity and having belief in me. I would like to thank Dr. Julian Hall for giving me the opportunity to work in programming with him. I have learnt a lot from him. I have been honoured to work with such an enlightened individual. I would also like to thank all who have given me motivation and helped me through out my Ph.D. Thanks to my friends who have shared with me hard moments as well as beautiful moments. I would like to thank all my friends who have introduced me to many different cultures and have contributed to an experience that I will never forget. The study could not have taken place without a sponsor. I would like to acknowledge the University of Damascus for sponsoring me throughout my Ph.D. I would also like to take this opportunity and thank my family, for their love and support in all my pursuits in life.
  • 5. Contents 1 Introduction 7 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.3 The structure of the thesis . . . . . . . . . . . . . . . . . . . . 17 1.4 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2 Fundamentals 20 2.1 The Interior Point Method . . . . . . . . . . . . . . . . . . . . 20 2.1.1 The IPM for linear programming . . . . . . . . . . . . 20 2.1.2 The Primal-Dual Interior Point Algorithms . . . . . . . 22 2.2 Newton method . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.2.1 The convergence of Newton method . . . . . . . . . . . 29 2.2.2 Termination of the iteration . . . . . . . . . . . . . . . 30 2.2.3 Error in the function and derivative . . . . . . . . . . . 30 2.3 Inexact Newton method . . . . . . . . . . . . . . . . . . . . . 31 2.3.1 The convergence of Inexact Newton Method . . . . . . 31 2.4 Methods for solving a linear system . . . . . . . . . . . . . . . 33 2.4.1 Sparse Matrices . . . . . . . . . . . . . . . . . . . . . . 33 2.4.2 Direct Methods . . . . . . . . . . . . . . . . . . . . . . 35 2.4.2.1 Gaussian elimination . . . . . . . . . . . . . . 35 4
  • 6. 5 2.4.2.2 Cholesky factorisation . . . . . . . . . . . . . . 36 2.4.3 Iterative Methods . . . . . . . . . . . . . . . . . . . . . 36 2.4.3.1 Stationary Iterative Methods . . . . . . . . . . 37 a. Jacobi Method . . . . . . . . . . . . . . . 37 b. Gauss-Seidel Method . . . . . . . . . . . 37 c. Arrow-Hurwicz and Uzawa Methods . . . 38 2.4.3.2 Krylov Subspace Methods . . . . . . . . . . . . 39 a. Conjugate Gradient Method . . . . . . . 40 b. GMRES Method . . . . . . . . . . . . . . 45 c. BiConjugate Gradient Method . . . . . . 47 e. MINRES and SYMMLQ Method . . . . . 48 2.4.4 Null Space Methods . . . . . . . . . . . . . . . . . . . 48 3 The PCG Method for the Augmented System 52 3.1 Preconditioner . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.1.1 Solving equations with P . . . . . . . . . . . . . . . . . 65 3.2 Spectral analysis . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.3 The PCG method for nonsymmetric indefinite system . . . . . 71 3.3.1 The convergence of the PCG method . . . . . . . . . . 75 3.4 Identifying and factorising the matrix B . . . . . . . . . . . . 82 3.4.1 Identifying the columns of B via Gaussian elimination 83 4 Inexact Interior Point Method 86 4.1 The residual of inexact Newton method . . . . . . . . . . . . . 90 4.2 Convergence of the IIPF Algorithm . . . . . . . . . . . . . . . 93 4.2.1 Inexact Infeasible Path-Following Algorithm . . . . . . 94 5 Numerical Results 108
  • 7. 6 6 Conclusions 119 7 Bibliography 122
  • 8. Chapter 1 Introduction Interior point methods constitute the core of many popular solvers for linear and nonlinear optimization. In linear programming however, that was not always the case due to the total dominance of the simplex method. The simplex method was invented by Dantzig in 1947. It is an iterative technique, where the iterates move from vertex to vertex until an optimal vertex is found. The simplex method may visit every vertex of the feasible polyhedron. That makes the complexity result of this method poor: the worst-case complexity of the simplex method is exponential in the problem dimension. Accordingly, there was great interest in finding a method with polynomial complexity. In 1984 Karmarkar presented a new polynomial-time algorithm for linear programming. He claimed to be able to solve linear programs up to 50 times faster than the simplex method. That was the start of the “interior point revolution” [48], which like many other revolutions, includes old ideas that are rediscovered or seen in a different light, along with genuinely new ones. See [3, 27, 76]. An interior point method (IPM for short) is a powerful tool to solve linear, quadratic and nonlinear programming problems. In this thesis we are 7
  • 9. Chapter 1. Introduction 8 concerned with the use of primal-dual interior point methods to solve large- scale linear programming problems. A primal-dual method is applied to the primal-dual formulation of linear program Primal Dual (P ) min cT x (D) max bT y s.t. Ax = b, s.t. AT y + s = c, x ≥ 0; y free, s ≥ 0, where A ∈ Rm×n , x, s, c ∈ Rn and y, b ∈ Rm . x, y and s are primal, dual and slack variables respectively. We assume that m ≤ n and the matrix A has full row rank. Primal-dual techniques are usually faster and more reliable than pure primal or pure dual approaches [3, 38, 77]. In order to solve problem (P), we need to find the solution of the Karush-Kuhn-Tucker (KKT) optimality conditions: Ax − b = 0 AT y + s − c = 0 (1.1) XSe = 0 (x, s) ≥ 0. where X = diag(x), S = diag(s) and e ∈ Rn is the vector of all ones. Interior point methods approach the optimal solution by moving through the interior of the feasible region. This is done by introducing a central path C joined with a parameter τ > 0. The central path C is an arc of strictly feasible points, which is defined as C = {(x, y, s) ∈ F0 : xi si = τ for all i = 1, ..., n},
  • 10. Chapter 1. Introduction 9 where F0 is the primal-dual strictly feasible set defined by F0 = {(x, y, s) : Ax = b, AT y + s = c, (x, s) > 0}. The KKT conditions are replaced by the following conditions: Ax − b = 0 AT y + s − c = 0 (1.2) XSe = τ e (x, s) > 0. These conditions differ from the KKT conditions only in the term µ and in the requirement for (x, s) to be strictly positive. The central path C is well defined because the system (1.2) has unique solution for each τ > 0. Furthermore, the points on the central path C converges to a primal-dual solution of the linear program (P) when τ converges to zero if F0 is nonempty. τ is equal to or smaller than the current barrier parameter µ = xT s/n. The target value τ = σµ is used, where σ ∈ [0, 1] is the centering parameter. See [77]. The previous system (1.2) can be rewritten as the following   Ax − b   F (t) =  AT y + s − c  = 0,     (1.3) XSe − σµe x > 0, s > 0, where t = (x, y, s). Most primal-dual algorithms take Newton steps toward points on central
  • 11. Chapter 1. Introduction 10 path C for which µ > 0, where the direction at each iteration is computed according to F (t)∆t = −F (t), (1.4) where F (t) is the derivative of F (t). That yields      A 0 0 ∆x Ax − b       0 AT I   ∆y  = −  A y + s − c  T . (1.5)          S 0 X ∆s XSe − σµe In computational practice, (1.5) is reduced: after substituting ∆s = −X −1 S∆x − s + σµX −1 e, (1.6) in the second row we get the following symmetric indefinite system of linear equations, usually called the augmented system      −1 T −Θ A ∆x f   = , (1.7) A 0 ∆y g where Θ = XS −1 , f = AT y − c + σµX −1 e and g = Ax − b. In many implementations, (1.7) is further reduced to the normal equations form AΘAT ∆y = AΘf + g. (1.8)
  • 12. Chapter 1. Introduction 11 1.1 Motivation The goal of this thesis is to explore how existing techniques in the areas of numerical analysis and linear algebra can be refined and combined into a new approach of a new inexact Newton method iteration to be employed in interior point methods. We are interested in using the preconditioned conjugate gradient method to solve the augmented system (1.7) and studying the convergence behaviour of the resulting interior point algorithm. In each iteration of interior point methods, one of the linear systems (1.7) or (1.8) has to be solved. The main computational effort of an interior point iteration is the solution of these linear systems. Accordingly, in recent years extensive research has been devoted to developing techniques for solving these systems. In chapter 2 we survey some of the popular solution methods for these linear systems. Historically, the normal equations system (1.8) was solved directly, be- cause this system is symmetric and positive definite and its dimension is smaller compared to the augmented system [52, 73]. In [24, 31, 32, 35], Cholesky factorisation is used to factorise the normal equations matrix into a lower triangular matrix multiplied with its transpose, then forward and backward substitutions are used to solve the normal equations. In order to speed up solving a linear system by Cholesky factorisation, a reordering for sparsity is required. There are two famous heuristic orderings, the minimum degree and the minimum local fill-in orderings, see [24, 31, 32, 35]. The size of optimization problems has been increasing dramatically. Solv- ing the linear systems (1.7) and (1.8) with a direct method is often very dif- ficult for large problems, even when ordering to exploit the sparsity is taken into consideration. This is due to three main reasons. Firstly, the normal equations (1.8) may easily get dense even though the constraint matrix A is
  • 13. Chapter 1. Introduction 12 not. Secondly, although the augmented system is usually sparse for a sparse constraint matrix, it is nevertheless, indefinite. Finally, the linear systems (1.7) and (1.8) become extremely ill-conditioned as the IPM approaches the solution, which leads to numerical instability. These difficulties make many researchers interested in finding alternative techniques for solving the linear systems (1.7) and (1.8). The idea was to use an iterative method to solve these linear systems. Iterative methods however, usually fail to solve these systems without preconditioning. The term preconditioning refers to trans- forming a linear system into another system with more favorable properties for iterative solution [75]. Therefore, there is an urgent need for designing good preconditioners, as a good preconditioner is the key ingredient for solv- ing a linear system iteratively. That makes a significant number of researchers tackle this issue [10, 28, 44, 45]. For the same reasons as above the normal equations system is nominated again to be solved by using an iterative method. As the system is sym- metric and positive definite, the preconditioned conjugate gradient (PCG) method [42] is an appropriate iterative method to solve this system. The PCG method is one of the most popular iterative methods, because it is a short recurrence method and it has strong convergence properties, see sec- tion 2.4. In [15, 42, 45, 54, 55], the PCG method is used to solve the normal equations. The preconditioners in [45, 54, 55] are the incomplete Cholesky factorisation of the normal equations matrix. The incomplete Cholesky fac- torisation was proposed by Meijerink and Van Der Vorst (1977) [56] to be used with symmetric Hermitian matrices. There are two strategies of identi- fying the position of the nonzero elements in this factorisation: the fixed fill-in strategy and the drop-tolerance strategy, see [12]. These types of precondi- tioner do not always work as well as expected. However, they are constructed
  • 14. Chapter 1. Introduction 13 by using fascinating techniques of linear algebra. These preconditioners are effective in the early stage of IPM, but they start to struggle in the final iterations. This is due to the extreme ill-conditioned nature of this system in the final iterations of IPM. Therefore, it is necessary to design a precon- ditioner after understanding the nature of the problem, in particular at the final iterations of IPM. We notice that iterative methods struggle to solve the linear systems in the final iterations of an IPM, due to the extreme ill-conditioning of these systems. Therefore we are concerned with finding an iterative approach to solve these linear systems efficiently in the final iterations of an IPM. In this thesis we will convert the disadvantages of the final iterations of IPM into an advantage, and we will construct our preconditioner for the augmented system (1.7) by exploiting the issues that leads to the ill-conditioning of this system. There are many important reasons why we choose to work with the aug- mented system. The first reason is that the augmented system is sparser compared with the normal equations. Factoring the augmented system (1.7) often produces significant savings in the number of nonzero entries over fac- toring the normal equations. The existence of a dense column in the con- straint matrix A results in a straightforward dense normal equations matrix. For an example of such a situation, see [3, 24] and the references therein. Compared with Cholesky factorisation for the normal equations, the aug- mented system factorisation enjoys an additional degree of freedom resulting from the ability to interchange pivots between diagonal elements of Θ and diagonal elements of the already filled (2, 2) block in (1.7). We aim to exploit these advantages when we construct our preconditioner for the augmented system.
  • 15. Chapter 1. Introduction 14 The second reason is that the augmented system may have a better con- dition number compared to the normal equations, after suitable scaling as suggested in [4]. The ill-conditioning in these systems is due to the matrix Θ, since some of its elements move toward zero and the others move to- ward infinity. The position of Θ in the augmented system makes it easier to control the ill-conditioning of the augmented system when designing a preconditioner. The final reason comes from the analysis by Oliveira and Sorensen [60] who propose a preconditioner for the augmented system (1.7), and then re- duce the preconditioned system to positive definite normal equations, al- lowing them to use the conjugate gradients method to solve (1.8). They show in [60] that all preconditioners for the normal equations system have an equivalent for the augmented system, while the converse is not true. More precisely, they show that the whole classes of (different) preconditioners for the augmented system can result in the same preconditioner for the nor- mal equations. We consider this to be a strong argument for constructing a preconditioner for the augmented system. 1.2 Contributions The contributions of this research are as follows. First, we design a block triangular preconditioner for the augmented sys- tem (1.7). To construct this preconditioner, we partition the constraint ma- trix A into two matrices. The first one is nonsingular matrix with size m, while the other one is the remaining matrix. The idea is to use the basic and nonbasic partition which is used in the simplex method, with one mean different; in the simplex method one has exactly m basic and n − m nonbasic
  • 16. Chapter 1. Introduction 15 variables at each iteration, while in interior point method this is true in the optimal solution. So, such partition becomes clearer at final iterations of interior point method, where we suggest using our preconditioner. The non- singular matrix in our partition represents an approximation of the basic part of the variables. After designing this preconditioner, we perform a spectral analysis of the preconditioned matrix. We also show that the preconditioned matrix has n + m − p unit eigenvalues and the remaining eigenvalues are positive and greater or equal one, where p is the rank of the second matrix of the partition of A. We propose preconditioner for the augmented system and go a step fur- ther than in [60]. Instead of reducing the augmented system to normal equa- tions and then applying an iterative method, we use the preconditioned con- jugate gradients method to solve the indefinite system (1.7). We are aware of the disadvantages associated with applying the PCG method to indef- inite systems [26]. However, we are motivated by the recent analyses of Lukˇan and Vlˇek [51] and Rozlozn´ and Simoncini [65] showing that short s c ık recurrence iterative methods such as conjugate gradients can be applied to indefinite systems in certain situations. We show in particular that the anal- ysis of [65] may be applied to the preconditioner proposed in this thesis. We prove that the PCG method, when applied to the indefinite system (1.7) preconditioned with our proposed preconditioner, converges as in the case of a symmetric positive definite system. The convergence of the PCG method is proved by showing that the error term and the residual converge to zero. The error and the residual bounds are given by Theorem 3.3.4 and Theorem 3.3.5 respectively, which is related to symmetric positive definite matrices. We have implemented this iterative approach in the final iterations of the interior point solver HOPDM when the normal equations system becomes
  • 17. Chapter 1. Introduction 16 ill conditioned. The implementation within HOPDM shows remarkable im- provement on a series of problems, see the numerical results in Chapter 5. A consequence of using an iterative method to solve the linear systems which arise in interior point methods is that the search direction is computed approximately. Hence, instead of the pure Newton iteration (1.4), we now have the following F (tk )∆tk = −F (tk ) + rk , which is an inexact Newton iteration. This causes a major difference in an interior point algorithm, whose convergence is proved under the assumption that the search directions are calculated exactly. Our final contribution is the convergence analysis of an interior point algorithm with our specific inexact Newton direction. We use the PCG method to solve the augmented system preconditioned with a block triangular matrix P . This yields a specific inexact interior point method. In this thesis we focus on the convergence analysis of one interior point algorithm for this inexact case. This algorithm is the infeasible path- following (IPF) algorithm. For the inexact case, we refer to this algorithm as the inexact infeasible path-following (IIPF) algorithm. We prove global convergence and provide a complexity result for the IIPF algorithm. We design a suitable stopping criteria for the PCG method. This plays an important role in the convergence of the IIPF algorithm. This stop- ping criterion allows a low accuracy when the current iterate is far from the solution. We impose some conditions on the forcing term of the inexact New- ton method in order to prove the convergence of the IIPF algorithm. Note that the same analysis can be used in the cases where the augmented system is solved iteratively, providing that the residual of this iterative method has
  • 18. Chapter 1. Introduction 17 a zero block in its second component corresponding to (2, 2) block in (1.7) such that r = [r1 , 0]. Thus we can carry out this approach to cases like [65], for example. The original results presented in this thesis have been the basis for two papers that have been accepted for publication, jointly with Jacek Gondzio and Julian Hall [2], and with Jacek Gondzio [1]. 1.3 The structure of the thesis This thesis is structured as follows. In Chapter 2, we introduce and formalise the primal-dual interior point method for linear programming. Also in this chapter we present some of the well known feasible and infeasible interior point algorithms. Moreover, Chapter 2 review the convergence behaviour of Newton and inexact Newton methods. Furthermore, in this chapter we discuss several well known methods to solve a linear system. We introduce briefly a few direct methods and discuss extensively several iterative methods. As in this thesis we are concerned with the use of an iterative method to solve the linear systems which arise from IPMs, we mainly focus on the Krylov subspace methods in this chapter. In Chapter 3 firstly, we present preconditioners for the augmented sys- tem which have been constructed in the last few years. Secondly, we propose our new block triangular preconditioner and we perform a spectral analysis of this preconditioner. Moreover, in this chapter we take a closer look at the be- haviour of conjugate gradients for the indefinite system: we follow [65] in the analysis of our preconditioner. Furthermore, we prove that the convergence of the PCG method applied to the indefinite system (1.7) preconditioned with the proposed preconditioner, is similar to the convergence of the PCG
  • 19. Chapter 1. Introduction 18 method applied to a positive definite system. Finally, we discuss the issues involved in the identification of a suitable subset of columns to produce a well-conditioned matrix. In Chapter 4 we compute the residual of the inexact Newton method and choose suitable stopping criteria to the PCG method which makes sense for the convergence of the inexact Newton method. In addition in this chapter we perform the convergence analysis and provide the complexity result for the IIPF Algorithm. We have implemented the conjugate gradients method with the indefi- nite preconditioner in the context of the HOPDM interior point solver and we have applied it to solve a number of medium and large-scale linear pro- gramming problems. In Chapter 5, we discuss our computational experience. In Chapter 6 we draw our conclusions and discuss possible future develop- ments. 1.4 Notations Throughout the thesis, we use the following notation. By R we denote the set of real number. For a natural number n, the symbol Rn denotes the set of vectors with n components in R. Greek letters denote scalars, lower-case letters denote vectors and upper-case letters denote matrices. The ith row and jth column component of the matrix A is denoted by aij . The iden- tity matrix will be denoted by I, a subscript will determine its dimension when it is not clear from context. The symbol . represents the Euclidean √ norm ( x = xT x). The symbol . G represents the G-norm for a symmet- √ ric positive definite matrix G ( x G = xT Gx). The F and F0 denote the primal-dual feasible and strictly feasible sets respectively. The N2 () or N−∞ ()
  • 20. Chapter 1. Introduction 19 denote the interior point method neighbourhood, since most primal-dual al- gorithms take Newton step toward points in specific neighbourhood. The point t∗ = (x∗ , y ∗ , s∗ ) denotes the optimal solution of interior point method. The sequence {tk } = {(xk , y k , sk )} denotes the interior point iterations. The ξ k = (ξp , ξd , ξµ ) denotes the right hand side of the Newton method system k k k (1.5) at iterate k. The rk = (rp , rd , rµ ) denotes the inexact Newton method k k k k residual at iterate k. The rP CG denotes the residual on the kth PCG itera- tion. The ek denotes the error on the kth PCG iteration, unless otherwise stated. For any vector v is in (1.7), v = [v1 , v2 ] and v1 = [vB , vN ], where vB ∈ Rm . The PCG method residual rP CG = [r1 , r2 ] and r1 = [rB , rN ]. k k k k k k
  • 21. Chapter 2 Fundamentals 2.1 The Interior Point Method 2.1.1 The IPM for linear programming It is widely accepted that the primal-dual interior point method is the most efficient variant of interior point algorithms for linear programming [3, 77]. The usual transformation in interior point methods consists of replacing in- equality constraints by the logarithmic barrier. The primal barrier problem becomes: n min cT x − µ ln xj j=1 s.t. Ax = b, where µ > 0 is a barrier parameter. The Lagrangian associated with this problem has the form: n T T L(x, y, µ) = c x − y (Ax − b) − µ ln xj j=1 20
  • 22. Chapter 2. Fundamentals 21 and the conditions for a stationary point are x L(x, y, µ) = c − AT y − µX −1 e = 0 y L(x, y, µ) = Ax − b = 0, where X −1 = diag{x−1 , x−1 , . . . , x−1 }. Denoting s = µX −1 e, 1 2 n i.e. XSe = µe, where S = diag{s1 , s2 , . . . , sn } and e = (1, 1, . . . , 1)T , the first order op- timality conditions (for the barrier problem) are: Ax = b, AT y + s = c, (2.1) XSe = µe (x, s) > 0. The interior point algorithm for linear programming applies Newton method to solve this system of nonlinear equations and gradually reduces the barrier parameter µ to guarantee convergence to the optimal solution of the original problem. The Newton direction is obtained by solving the system of linear equations:      A 0 0 ∆x b − Ax       0 AT I   ∆y  =  c − AT y − s , (2.2)           S 0 X ∆s −XSe + µe By eliminating ∆s = −X −1 S∆x + µX −1 e,
  • 23. Chapter 2. Fundamentals 22 from the second equation we get the symmetric indefinite augmented system of linear equations      −Θ−1 AT ∆x f   = . A 0 ∆y g where Θ = XS −1 ∈ Rn×n is a diagonal scaling matrix and the right-hand-side vectors satisfy f = AT y + s − c − X −1 (XSe − µe) and g = Ax − b. 2.1.2 The Primal-Dual Interior Point Algorithms Primal-dual interior point algorithms are the most important, efficient and useful interior point algorithms for linear programming. That is because of the strong theoretical properties and the practical performance of these algorithms. Since 1994 researchers have understood well the properties and theoretical background of primal-dual interior point algorithms [3, 37, 53, 77, 79] and the references therein. In this section we briefly review several feasible primal-dual interior point algorithms and an infeasible primal-dual interior point algorithm. Primal-dual interior point methods find primal-dual solutions (x∗ , y ∗ , s∗ ) by applying Newton method to the optimality conditions in (2.1) and by mod- ifying the search directions and step lengths so that the inequality (x, s) > 0 is satisfied strictly at every iteration [77]. Most primal-dual algorithms take Newton steps toward points in a specific neighbourhood. This neighbourhood guarantees to keep (x, s) strictly positive and to prevent xi si from becoming too small relatively for all i = 1, ..., n. In this section we introduce a few feasible primal-dual interior point algorithms and an infeasible primal-dual interior point algorithm. For a feasible algorithm the two most interesting
  • 24. Chapter 2. Fundamentals 23 neighbourhoods are N2 and N−∞ . The N2 neighbourhood is defined by N2 (θ) = {(x, y, s) ∈ F0 : XSe − µe 2 ≤ θµ} for some θ ∈ (0, 1). The N−∞ neighbourhood is defined by N−∞ (γ) = {(x, y, s) ∈ F0 : xi si ≥ γµ, ∀i = 1, 2, ..., n} for some γ ∈ (0, 1). By choosing γ close to zero, N−∞ (γ) encompass most of the feasible region. However, N2 (θ) is more restrictive, since certain points in F0 do not belong to N2 (θ) no matter how close θ is chosen to its upper bound [77]. In other words, N2 (θ) contains only a small fraction of the points in F0 , while N−∞ (γ) takes up almost all the entire of F0 for small γ, which makes N−∞ (γ) much more expansive when γ is small. See [77]. For infeasible algorithms neighbourhoods should guarantee an extra con- dition; namely the residuals should decrease at each iteration. The extension of N−∞ (γ) for infeasible algorithms is N−∞ (γ, β), which is defined by N−∞ (γ, β) = {(x, y, s) : (ξp , ξd ) /µ ≤ β (ξp , ξd ) /µ0 , (x, s) > 0, 0 0 xi si ≥ γµ, i = 1, 2, ..., n}, where the residuals ξp = Ax − b and ξd = AT y + s − c. See [77]. In primal-dual interior point methods, the initial solution (x0 , y 0 , s0 ) should belong to the neighbourhood. At each iteration, solution should also belong to this neighbourhood. The solution at iteration k is given by (xk , y k , sk ) = (xk−1 , y k−1 , sk−1 )+αk (∆xk , ∆y k , ∆sk ), where αk is the step length and (∆xk , ∆y k , ∆sk )
  • 25. Chapter 2. Fundamentals 24 is the direction, which is given by:      k k A 0 0 ∆x Ax − b      T I   ∆y k  =  AT y k + sk − c , (2.3)       0 A      Sk 0 X k ∆s k −X k S k e + σk µk e where σk ∈ [0, 1] is centering parameter. Choosing σk plays a crucial role in primal-dual interior point algorithms. If σk = 1, the equation (2.3) gives a centering direction, which improves centrality (all xi si are close to µ) and makes little progress in reducing µ. If σk = 0 that gives the standard Newton step, which reduces µ. One can choose the centering parameter σk and the step length αk to ensure that an iterate stays within the chosen neighbour- hood. See [77]. For feasible algorithms, the iterations belong to F0 , so for any iteration k we have Axk = b and AT y k + sk = c. That makes the first and the second rows of the right hand side of (2.3) equal to zero. So for feasible algorithms (2.3) replaced by:      k A 0 0 ∆x 0      T I   ∆y k  =  . (2.4)       0 A 0      Sk 0 Xk ∆sk −X k S k e + σk µk e The interior point algorithms which we mention in this section have a global linear convergence. An algorithm has a global convergence if the algo- rithm guarantees to converge to the solution from any approximation. The sequences {µk } converges linearly to zero if µk+1 ≤ δµk , where δ ∈ (0, 1). Knowing that an algorithm has global convergence and the rate of this conver- gence alone will not give the whole picture. It is necessary, to know the time requires an algorithm to solve a given instance of linear programming prob-
  • 26. Chapter 2. Fundamentals 25 lem. Complexity theory has been concerned with the worst case behaviour of algorithms. Complexity result is an upper bound on the time required algorithm to solve a problem. For example, the short-step path-following √ algorithm has a polynomial complexity result in the order of O( n log 1/ ), √ where > 0. This gives that there is an index K with K = O( n log 1/ ) such that µk ≤ for all k ≥ K. See [77]. The Short-Step Path-Following Algorithm (SPF Algorithm): √ • Given θ = 0.4, σ = 1 − 0.4/ n, and (x0 , y 0 , s0 ) ∈ N2 (θ). • For k = 0, 1, ... set σk = σ and solve (2.4) to obtain (∆xk , ∆y k , ∆sk ); set (xk+1 , y k+1 , sk+1 ) = (xk , y k , sk ) + (∆xk , ∆y k , ∆sk ). This algorithm has a global linear convergence and a polynomial complexity √ result in the order of O( n log 1/ ) [77]. The Predictor-Corrector Algorithm (PC Algorithm): • Given (x0 , y 0 , s0 ) ∈ N2 (0.25). • For k = 0, 1, ... if k is even (predictor step) solve (2.4) with σk = 0 to obtain (∆xk , ∆y k , ∆sk ); choose αk as the largest value of α ∈ [0, 1] such that (xk (α), y k (α), sk (α)) ∈ N2 (0.5), where (xk (α), y k (α), sk (α)) = (xk , y k , sk ) + α(∆xk , ∆y k , ∆sk ); set (xk+1 , y k+1 , sk+1 ) = (xk (α), y k (α), sk (α));
  • 27. Chapter 2. Fundamentals 26 else (corrector step) solve (2.4) with σk = 1 to obtain (∆xk , ∆y k , ∆sk ); set (xk+1 , y k+1 , sk+1 ) = (xk , y k , sk ) + (∆xk , ∆y k , ∆sk ) The parameter σk is chosen to be either 0 or 1. This choice has the following meaning: improving centrality (corrector step) and reducing the duality mea- sure µ (predictor step). Also this algorithm has a global linear convergence √ and a polynomial complexity result in the order of O( n log 1/ ). However, this algorithm is a definite improvement over the short-step algorithm be- cause of the adaptivity that is built into the choice of predictor step length. See [77]. The Long-Step Path-Following Algorithm (LPF Algorithm): • Given γ, σmin , σmax with γ ∈ (0, 1), 0 < σmin < σmax < 1, and (x0 , y 0 , s0 ) ∈ N−∞ (γ). • For k = 0, 1, ... set σk ∈ [σmin , σmax ]; solve (2.4) to obtain (∆xk , ∆y k , ∆sk ); choose αk as the largest value of α ∈ [0, 1] such that (xk (α), y k (α), sk (α)) ∈ N−∞ (γ); set (xk+1 , y k+1 , sk+1 ) = (xk (α), y k (α), sk (α)). This algorithm has a global linear convergence and a polynomial complexity result in the order of O(n log 1/ ) [77]. In [39] the authors show that the √ complexity result for long-step primal-dual algorithm is O( nL) iterations where L is the size of the input. In most cases it is quite difficult to find a strictly feasible starting point (a point which belongs to F0 ). In this case one can use an infeasible interior
The Infeasible Path-Following Algorithm (IPF Algorithm):
1. Given γ, β, σmin, σmax with γ ∈ (0, 1), β ≥ 1, and 0 < σmin < σmax < 0.5; choose (x0, y0, s0) with (x0, s0) > 0.
2. For k = 0, 1, 2, ...
• choose σk ∈ [σmin, σmax] and solve

\[
\begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S^k & 0 & X^k \end{bmatrix}
\begin{bmatrix} \Delta x^k \\ \Delta y^k \\ \Delta s^k \end{bmatrix}
=
\begin{bmatrix} -\xi_p^k \\ -\xi_d^k \\ -X^k S^k e + \sigma_k \mu_k e \end{bmatrix},
\]

where ξpk = Axk − b and ξdk = AT yk + sk − c;
• choose αk as the largest value of α in [0, 1] such that (xk(α), yk(α), sk(α)) ∈ N−∞(γ, β) and the following Armijo condition holds: µk(α) ≤ (1 − 0.01α) µk;
• set (xk+1, yk+1, sk+1) = (xk(αk), yk(αk), sk(αk));
• stop when µk < ε, for a small positive constant ε.

The parameter β is chosen such that β ≥ 1 to ensure that the initial point belongs to the neighbourhood N−∞(γ, β). This algorithm has global linear convergence and a polynomial complexity result of order O(n2 |log ε|) [77]. A small computational sketch of one iteration of this method is given below.
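To make the mechanics of one IPF iteration concrete, here is a small dense sketch in Python/NumPy. It is not the HOPDM implementation: the neighbourhood test is replaced by positivity of (x, s) together with the Armijo-type condition on µ, the parameter values are illustrative, and, since the new iterate is formed as (x, y, s) + α(∆x, ∆y, ∆s), the infeasibilities enter the right hand side with a minus sign.

```python
import numpy as np

def ipf_step(A, b, c, x, y, s, sigma=0.1):
    """One simplified infeasible path-following step: build the Newton
    system of Section 2.1, solve it directly, and damp the step so that
    (x, s) stays positive and mu satisfies the Armijo-type condition."""
    m, n = A.shape
    mu = x @ s / n
    xi_p = A @ x - b                       # primal infeasibility
    xi_d = A.T @ y + s - c                 # dual infeasibility

    # Assemble the (m + 2n) x (m + 2n) Newton matrix blockwise.
    K = np.zeros((m + 2 * n, m + 2 * n))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([-xi_p, -xi_d, -x * s + sigma * mu * np.ones(n)])

    d = np.linalg.solve(K, rhs)
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]

    # Largest alpha in (0, 1] keeping (x, s) > 0 and mu(alpha) <= (1 - 0.01 alpha) mu.
    alpha = 1.0
    while alpha > 1e-10:
        xn, sn = x + alpha * dx, s + alpha * ds
        if (xn > 0).all() and (sn > 0).all() and xn @ sn / n <= (1 - 0.01 * alpha) * mu:
            break
        alpha *= 0.5
    return x + alpha * dx, y + alpha * dy, s + alpha * ds
```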
In [78] the author shows that the complexity result for the infeasible path-following algorithm is O(√n L) iterations, where L is the size of the input.

2.2 Newton method

In this section we take a closer look at the Newton method, the inexact Newton method and their convergence analysis. Note, however, that the convergence analysis of interior point methods follows a different path from that of the Newton method, even though an interior point method takes Newton steps toward points in a certain neighbourhood.
   The Newton method is an iterative method which is used to solve a system of nonlinear equations

F(t) = 0.    (2.5)

See [47, 61]. Newton iterations are given by

tk+1 = tk − F′(tk)−1 F(tk),    (2.6)

where tk+1 is the new iterate and tk is the current iterate. Assume problem (2.5) has the solution t∗. We can approximate the function by a polynomial using the Taylor expansion about tk:

F(t) = F(tk) + F′(tk)(t − tk) + (F″(tk)/2)(t − tk)2 + ... ,

F(t) ≈ Mk(t) = F(tk) + F′(tk)(t − tk).
Let tk+1 be the root of Mk(t); then

0 = Mk(tk+1) = F(tk) + F′(tk)(tk+1 − tk),

which implies (2.6). Let ∆tk = tk+1 − tk; then (2.6) becomes

F′(tk) ∆tk = −F(tk).    (2.7)

See [47] for more details.
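As an illustration of iterations (2.6)-(2.7), the following minimal Python/NumPy sketch applies Newton's method to a small nonlinear system; the test function and starting point are arbitrary examples, not taken from the thesis.

```python
import numpy as np

def newton(F, J, t0, tol=1e-10, max_it=50):
    """Newton's method: solve F'(t^k) dt = -F(t^k) and set t^{k+1} = t^k + dt."""
    t = np.asarray(t0, dtype=float)
    for _ in range(max_it):
        Ft = F(t)
        if np.linalg.norm(Ft) <= tol:
            break
        dt = np.linalg.solve(J(t), -Ft)   # equation (2.7)
        t = t + dt                        # equation (2.6)
    return t

# Hypothetical example: F(t) = (t1^2 + t2^2 - 1, t1 - t2).
F = lambda t: np.array([t[0]**2 + t[1]**2 - 1.0, t[0] - t[1]])
J = lambda t: np.array([[2 * t[0], 2 * t[1]], [1.0, -1.0]])
print(newton(F, J, [2.0, 0.5]))           # converges to (1/sqrt(2), 1/sqrt(2))
```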
2.2.1 The convergence of Newton method

The Newton method is attractive because it converges quadratically starting from any sufficiently good initial guess t0. See [47].

Definition: β(δ) denotes the ball of radius δ about the solution t∗,

β(δ) = {t : ‖e‖ < δ},

where e is the error of the current iterate, e = t − t∗.

The standard assumptions:
1. Equation (2.5) has a solution t∗.
2. F′ is Lipschitz continuous with Lipschitz constant γ.
3. F′(t∗) is nonsingular.

The following theorem shows that, if the standard assumptions hold, the Newton iterates satisfy the following property, Kelley [47, Theorem 5.1.1].

Theorem 2.2.1. Let the standard assumptions hold. If there are K > 0 and δ > 0 such that Kδ < 1 and tk ∈ β(δ), where the Newton iterate tk is given by (2.6), then

‖ek+1‖ ≤ K ‖ek‖2.    (2.8)

This theorem shows that the Newton method has local convergence, since the initial solution t0 has to be chosen near the solution t∗. Furthermore, the Newton method converges quadratically, see (2.8).

2.2.2 Termination of the iteration

The iteration is terminated when the ratio ‖F(t)‖ / ‖F(t0)‖ is small [47]. More generally, the termination condition can be written as

‖F(t)‖ ≤ τr ‖F(t0)‖ + τa,    (2.9)

where τr is the relative error tolerance and τa is the absolute error tolerance [47].

2.2.3 Error in the function and derivative

Suppose that F and F′ are computed inaccurately, so that F + ε and F′ + ζ are used instead of F and F′ in the iterations. In this case the Newton iterations become

tk+1 = tk − (F′(tk) + ζ(tk))−1 (F(tk) + ε(tk)).    (2.10)
Theorem 2.2.2. Let the standard assumptions hold. Assume F′(tk) + ζ(tk) is nonsingular. If there are K̄ > 0, δ > 0, and δ1 > 0 such that ‖ζ(tk)‖ < δ1 and tk ∈ β(δ), where tk is given by (2.10), then

‖ek+1‖ ≤ K̄ ( ‖ek‖2 + ‖ζ(tk)‖ ‖ek‖ + ‖ε(tk)‖ ).    (2.11)

Proof: Kelley [47, Theorem 5.4.1].

2.3 Inexact Newton method

Solving the linear equation (2.7) exactly can be very expensive. Therefore, this linear equation can be solved approximately by using an iterative method. So instead of (2.7) we get

F′(tk) ∆tk = −F(tk) + rk.    (2.12)

The process is stopped when the residual rk satisfies

‖rk‖ ≤ ηk ‖F(tk)‖.    (2.13)

We refer to the term ηk as the forcing term. See [20, 47].

2.3.1 The convergence of Inexact Newton Method

The following theorems illustrate the convergence of the inexact Newton method. By comparing the error of the Newton method (2.8) with the error of the inexact Newton method (2.14), we note that the forcing term in condition (2.13) plays an important role in the convergence of the inexact Newton method. Therefore, the choice of stopping criterion for the residual of the inexact Newton method directly affects its convergence.
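The following sketch illustrates the inexact Newton method with stopping rule (2.13). The inner solver used here (CG applied to the normal equations of the Newton system) and the constant forcing term are illustrative choices only; any iterative solver monitored through condition (2.13) would do.

```python
import numpy as np

def inexact_newton(F, J, t0, eta=0.1, tol=1e-8, max_it=50):
    """Inexact Newton method: the Newton equation is solved only until
    ||F'(t^k) dt + F(t^k)|| <= eta ||F(t^k)||   (condition (2.13)),
    here by CG applied to the normal equations of the Newton system."""
    t = np.asarray(t0, dtype=float)
    for _ in range(max_it):
        Ft = F(t)
        nrmF = np.linalg.norm(Ft)
        if nrmF <= tol:
            break
        A = J(t)
        # Inner iteration: CG on A^T A dt = -A^T F, stopped by (2.13).
        dt = np.zeros_like(t)
        r = -A.T @ (A @ dt + Ft)          # residual of the normal equations
        d = r.copy()
        for _ in range(100 * len(t)):     # safeguard on inner iterations
            if np.linalg.norm(A @ dt + Ft) <= eta * nrmF:
                break
            Ad = A.T @ (A @ d)
            alpha = (r @ r) / (d @ Ad)
            dt += alpha * d
            r_new = r - alpha * Ad
            d = r_new + ((r_new @ r_new) / (r @ r)) * d
            r = r_new
        t = t + dt
    return t
```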
Theorem 2.3.1. Let the standard assumptions hold. If there are δ and KI such that tk ∈ β(δ) and (2.13) holds, where ∆tk satisfies (2.12), then

‖ek+1‖ ≤ KI ( ‖ek‖ + ηk ) ‖ek‖.    (2.14)

Proof: Kelley [47, Theorem 6.1.1].

Recall that in the Newton method the error term satisfies ‖ek+1‖ ≤ K ‖ek‖2.

Theorem 2.3.2. Let the standard assumptions hold. If there are δ and η̄ such that t0 ∈ β(δ) and {ηk} ⊂ [0, η̄], then the inexact Newton iteration tk+1, which satisfies (2.13), converges linearly to t∗. Moreover,
• if ηk → 0 the convergence is superlinear;
• if ηk ≤ Kη ‖F(tk)‖p for some Kη > 0 the convergence is superlinear with order 1 + p.

Proof: Kelley [47, Theorem 6.1.4].

Superlinear convergence means that

‖ek+1‖ ≤ εk ‖ek‖,   where εk → 0.

Superlinear convergence with order 1 + p means that

‖ek+1‖ ≤ ε ‖ek‖1+p,   where ε ∈ (0, 1).
2.4 Methods for solving a linear system

In this section, we discuss several methods to solve the following linear system

Hu = q.

This system represents either the augmented system (1.7) or the normal equations (1.8). H ∈ Rℓ×ℓ, where ℓ = n + m for the augmented system and ℓ = m for the normal equations, respectively. For most problems, the linear system Hu = q is sparse. Before introducing methods for solving this system, we first focus on the sparsity of the linear system.

2.4.1 Sparse Matrices

A matrix is sparse if many of its coefficients are zero. It is very important to take sparsity into account, for two main reasons. Firstly, many large scale problems which occur in practice have sparse matrices. Secondly, exploiting sparsity can lead to enormous computational savings. See [24]. To illustrate the potential saving from exploiting sparsity, we consider a small example. Suppose we want to solve a system with the following matrix

\[
H = \begin{bmatrix}
\times & \times & \times & \times \\
\times & \times &        &        \\
\times &        & \times &        \\
\times &        &        & \times
\end{bmatrix},
\]

where × represents a nonzero coefficient and the coefficients are zero elsewhere. Gaussian elimination can be used, for instance, to solve this system. It reduces the matrix H to an equivalent upper triangular matrix U by applying row operations. The first step of Gaussian elimination leads to
the following matrix

\[
\begin{bmatrix}
\times & \times & \times & \times \\
       & \times & f      & f      \\
       & f      & \times & f      \\
       & f      & f      & \times
\end{bmatrix},
\]

where f represents a fill-in: an elimination operation changes a zero coefficient into a nonzero one (we refer to this by the term fill-in). A fill-in requires additional storage and operations. This first elimination step leads to a fully dense active submatrix (the 3 × 3 matrix formed by columns 2, 3, 4 and rows 2, 3, 4). Consequently, all Gaussian elimination steps from now on will be dense. However, if we reorder the rows and columns of H we can control the amount of fill-in. For our example, swapping row and column 1 with row and column 4 leads to the equivalent matrix

\[
\begin{bmatrix}
\times &        &        & \times \\
       & \times &        & \times \\
       &        & \times & \times \\
\times & \times & \times & \times
\end{bmatrix},
\]

which can be reduced to an upper triangular matrix without creating any fill-in. This saves storage and operations. The problem of finding the ordering which minimizes fill-in is NP-complete [77]. However, there are good ordering heuristics which perform quite well in practice. Two well-known heuristic orderings are the minimum degree and the minimum local fill-in orderings, see [24, 31, 32, 35].
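The effect of ordering on fill-in can be checked numerically. The sketch below (Python/NumPy, dense arithmetic, with arbitrary numerical values placed on the nonzero pattern of the example) counts the fill-in produced by Gaussian elimination before and after the reordering.

```python
import numpy as np

def fill_in(H):
    """Count fill-in produced by LU factorisation without pivoting:
    entries that are zero in H but nonzero after elimination."""
    A = H.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):                # Gaussian elimination, multipliers stored in place
        A[k + 1:, k] /= A[k, k]
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    return int(np.count_nonzero(A) - np.count_nonzero(H))

# Arrowhead pattern of the example: dense first row/column plus the diagonal.
H = np.array([[4., 1., 1., 1.],
              [1., 4., 0., 0.],
              [1., 0., 4., 0.],
              [1., 0., 0., 4.]])
perm = [3, 1, 2, 0]                       # swap row/column 1 with row/column 4
print(fill_in(H), fill_in(H[np.ix_(perm, perm)]))   # prints: 6 0
```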
2.4.2 Direct Methods

The main focus of this thesis is the use of an iterative method to solve the linear systems which arise in IPMs. However, we briefly highlight some direct methods.

2.4.2.1 Gaussian elimination

Gaussian elimination is one of the best known direct methods. It is used to reduce the matrix H to an upper triangular matrix U by applying row operations. The diagonal elements are chosen to be the pivots. If a diagonal element is zero, a row interchange has to be carried out. The reduction of H is performed by elementary row operations, which can be written as

Lℓ−1 · · · L2 L1 H = U.

This can be rewritten as

H = LU,

where L is a unit lower triangular matrix whose elements lij are precisely the multipliers used in the elimination to annihilate the entry in position (i, j) during the reduction to U. This decomposition of H is called the LU factorisation of H. See [57] for more details. The computational cost of this method can be expressed as (2/3)ℓ3 + O(ℓ2) flops, where each addition, subtraction, multiplication, division or square root counts as a flop [71].
2.4.2.2 Cholesky factorisation

The Cholesky factorisation is used to decompose symmetric positive definite matrices. This factorisation produces a lower triangular matrix L with positive diagonal elements such that

H = L LT.

Solving Hu = q is then equivalent to solving two triangular systems, one by forward substitution and the other by backward substitution:

L v = q,    LT u = v.

We assume the constraint matrix A has full row rank, so the matrix of the normal equations system (1.8) is symmetric and positive definite. The use of the Cholesky factorisation to solve the normal equations is a common choice, see for example [35].
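A dense sketch of this approach, assuming Θ is stored as a vector of positive diagonal entries, is given below; a production code would of course use a sparse Cholesky factorisation instead of the dense routines used here.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def normal_equations_solve(A, theta, g):
    """Solve (A Theta A^T) dy = g by Cholesky factorisation H = L L^T,
    followed by forward and backward triangular substitutions."""
    H = (A * theta) @ A.T                          # A Theta A^T, Theta diagonal
    L = cholesky(H, lower=True)                    # H = L L^T
    v = solve_triangular(L, g, lower=True)         # L v = g
    return solve_triangular(L.T, v, lower=False)   # L^T dy = v
```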
2.4.3 Iterative Methods

The standard approach uses a direct method to solve the normal equations (a symmetric positive definite system) by a sparse Cholesky factorisation. However, for large-scale problems the computational effort of direct methods can sometimes become very expensive. Therefore, an iterative method may be employed to solve the linear system which arises in IPMs.
   An iterative method solves the problem approximately. It generates a sequence of iterates starting from an initial guess and terminates when the computed solution is close enough to the exact solution or when the residual becomes sufficiently small.

2.4.3.1 Stationary Iterative Methods

The first iterative methods used to solve large linear systems were based on relaxation of the coordinates. Starting from an initial approximate solution, these methods modify the approximation until convergence is reached; each of these modifications is called a relaxation step [66]. The iterations of these methods are based on splitting the matrix H into the form H = H1 + H2, where H1 is a nonsingular matrix. The system Hu = q is then converted to the fixed point problem u = H1−1(q − H2 u). Beginning with an initial solution u0, the iterates are generated by

uj+1 = H1−1 q − H1−1 H2 uj.

See [66, 74, 80]. Among the various stationary methods we mention the Jacobi, Gauss-Seidel, successive overrelaxation (SOR), Arrow-Hurwicz and Uzawa methods. Stationary methods are now more commonly used as preconditioners for Krylov subspace methods.

Jacobi Method

The Jacobi method uses the splitting H1 = D and H2 = L + U, where the matrix H is written as H = D + L + U with D a diagonal matrix, L a strictly lower triangular matrix and U a strictly upper triangular matrix [47]. The Jacobi method converges to the solution for every right hand side q if 0 < Σj≠i |hij| < |hii| for all 1 ≤ i ≤ ℓ, see [47, Theorem 1.4.1].
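A compact sketch of the Jacobi iteration, assuming the diagonal of H is nonzero (and, for guaranteed convergence, strictly dominant as in the condition above):

```python
import numpy as np

def jacobi(H, q, u0=None, tol=1e-10, max_it=500):
    """Jacobi iteration u_{j+1} = D^{-1}(q - (L+U) u_j) for H = D + L + U."""
    D = np.diag(H)
    R = H - np.diag(D)                    # R = L + U, the off-diagonal part
    u = np.zeros_like(q, dtype=float) if u0 is None else u0.astype(float)
    for _ in range(max_it):
        u_new = (q - R @ u) / D
        if np.linalg.norm(u_new - u) <= tol:
            return u_new
        u = u_new
    return u
```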
Gauss-Seidel Method

The coefficient matrix is again written as H = D + L + U. The Gauss-Seidel method uses the splitting H1 = D + U and H2 = L [47]. This method converges under the same conditions as the Jacobi method [43].

Arrow-Hurwicz and Uzawa Methods

These iterative methods are used to solve saddle point problems, such as the augmented system (1.7). The idea of these stationary methods is to split the matrix H so that the methods become simultaneous iterations for both ∆x and ∆y [8]. The iterations of Uzawa's method are given as follows:

∆xj+1 = Θ (AT ∆yj − f),
∆yj+1 = ∆yj + ω (A ∆xj+1 − g),

where ω > 0 is a relaxation parameter. Accordingly, the splitting matrices are given by

\[
H_1 = \begin{bmatrix} -\Theta^{-1} & 0 \\ A & -\tfrac{1}{\omega} I \end{bmatrix},
\qquad
H_2 = \begin{bmatrix} 0 & A^T \\ 0 & \tfrac{1}{\omega} I \end{bmatrix}.
\]

The iterations of the Arrow-Hurwicz method are given as follows:

∆xj+1 = ∆xj + α (f + Θ−1 ∆xj − AT ∆yj),
∆yj+1 = ∆yj + ω (A ∆xj+1 − g),

where α and ω are relaxation parameters. The splitting matrices are given by

\[
H_1 = \begin{bmatrix} \tfrac{1}{\alpha} I & 0 \\ A & -\tfrac{1}{\omega} I \end{bmatrix},
\qquad
H_2 = \begin{bmatrix} -\tfrac{1}{\alpha} I - \Theta^{-1} & A^T \\ 0 & \tfrac{1}{\omega} I \end{bmatrix}.
\]

For more detail on the Arrow-Hurwicz and Uzawa methods see [8] and the references therein.
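The Uzawa iteration written above translates directly into code. In the sketch below Θ is a vector of positive diagonal entries and ω is assumed small enough (relative to the spectrum of AΘAT) for the relaxation step to converge; both choices are illustrative.

```python
import numpy as np

def uzawa(Theta, A, f, g, omega=1.0, tol=1e-8, max_it=1000):
    """Uzawa iteration for the saddle point system
    [-Theta^{-1}  A^T; A  0] [dx; dy] = [f; g], with Theta diagonal and positive."""
    m, n = A.shape
    dx, dy = np.zeros(n), np.zeros(m)
    for _ in range(max_it):
        dx = Theta * (A.T @ dy - f)       # exact solve of the first block row
        dy = dy + omega * (A @ dx - g)    # relaxation step on the constraint
        if np.linalg.norm(A @ dx - g) <= tol:
            break
    return dx, dy
```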
2.4.3.2 Krylov Subspace Methods

Krylov subspace methods are a family of iterative methods for solving a linear system of the form

Hu = q.    (2.15)

Krylov subspace methods extract an approximate solution uj from an affine subspace u0 + Kj of dimension j, where u0 is an arbitrary initial guess for the solution of (2.15). The Krylov subspace is defined by

Kj(H, r0) = span{r0, Hr0, H2r0, ..., Hj−1r0},    (2.16)

for j ≥ 1, where the residual r0 is given by r0 = q − Hu0. See [47, 66].
   The dimension of the subspace increases by one at each step of the approximation process. The Krylov subspace has the following properties. The first property is that Kj is the space of all vectors which can be written as u = pj−1(H) r0, where pj−1 is a polynomial of degree not exceeding j − 1. The other property is that the degree of the minimal polynomial of r0 with respect to H (the lowest-degree polynomial p such that p(H) r0 = 0) does not exceed the dimension ℓ of the space [66]. There exist many Krylov subspace methods; a few of the most important ones are highlighted in the following discussion.
Conjugate Gradient Method (CG)

The conjugate gradient (CG) method is one of the most popular iterative methods. It is used to solve symmetric positive definite linear systems. Many studies analyse the CG method [33, 42, 47, 66, 68, 72], and many papers use it to solve the linear systems which arise from interior point methods [13, 15, 40, 45, 54, 55].
   In [42, 68] the authors explain the idea of conjugacy. The idea is to pick a set of H-orthogonal search directions and to take exactly one step, of the right length, in each one of them; the solution is then found after ℓ steps. Two vectors v and w are H-orthogonal if vT H w = 0. At each step the iterate is

uj+1 = uj + αj dj,

where αj is the step length and dj is the search direction. Let the error term be defined by ej = uj − u∗. The step length αj is chosen such that the search direction is H-orthogonal to the error ej+1. Consequently, αj is determined from

(dj)T H ej+1 = 0,   which implies   (dj)T H (ej + αj dj) = 0.

That leads to

αj = − (dj)T H ej / (dj)T H dj.
Unfortunately, we do not know ej; if we knew ej the problem would already be solved. On the other hand, the residual is given by rj = q − H uj, which can be written as H uj = q − rj, or equivalently H uj − H u∗ = q − rj − H u∗, which leads to H ej = −rj. So the step length can be written as

αj = (dj)T rj / (dj)T H dj.

All we need now is the set of H-orthogonal search directions {dj}. In order to find this set, assume we have linearly independent vectors z0, ..., zℓ−1. We choose d0 = z0 and, for j > 0, set

dj = zj + Σk=0j−1 βj,k dk.

The coefficient βj,i is chosen such that (dj)T H di = 0 for j > i. So for j > i

βj,i = − (zj)T H di / (di)T H di.

In the CG method the search directions are constructed by conjugation of the residuals, so zj = rj. This choice makes sense because the residual is orthogonal to the previous search directions, which guarantees producing a new linearly independent search direction unless the residual is zero; and when the residual is zero, the solution has been found. These properties guarantee that the CG method is a short recurrence Krylov subspace method (it does not require saving all previous search directions). In the CG method αj and βj,i can be expressed as

αj = (rj)T rj / (dj)T H dj,    βj,i = − (rj)T H di / (di)T H di   for j > i,
where the search direction is written as

dj = rj + Σk=0j−1 βj,k dk.

The residual can be rewritten as ri+1 = ri − αi H di, because

ri+1 = q − H ui+1 = q − H(ui + αi di) = ri − αi H di.

So we have

(rj)T ri+1 = (rj)T ri − αi (rj)T H di   ⇒   αi (rj)T H di = (rj)T ri − (rj)T ri+1,

and the residual is orthogonal to the previous residuals [68]. This leads to

(rj)T H di = − (1/αj−1) (rj)T rj   if j = i + 1,   and   0   if j > i + 1.

That gives

βj,i = (rj)T rj / (dj−1)T rj−1   if j = i + 1,   and   0   if j > i + 1.

Let βj = βj,j−1. So the search direction can be expressed as

dj = rj + βj dj−1.

Consequently, the CG method is a short recurrence method, because only the immediately preceding direction needs to be saved.
The CG Algorithm:
• Given an initial solution u0, set r0 = q − H u0 and d0 = r0.
• For j = 0, 1, ...
    while ‖rj‖ > ε do
        αj = (rj)T rj / (dj)T H dj,
        uj+1 = uj + αj dj,
        rj+1 = rj − αj H dj,
        βj+1 = (rj+1)T rj+1 / (rj)T rj,
        dj+1 = rj+1 + βj+1 dj.

Theorem 2.4.1. Let H be symmetric positive definite. Then the CG algorithm finds the solution within ℓ iterations. [47, Theorem 2.2.1].

This theorem shows that the CG method finds the solution after at most ℓ iterations. In practice, however, accumulated floating point roundoff error causes the residual to lose accuracy gradually, which in turn causes the search directions to lose H-orthogonality [68]. Providing a convergence analysis of the CG method is therefore essential.

Theorem 2.4.2. Let e0 be the initial error of the CG method. Then

\[
\|e^j\|_H^2 \;\le\; \min_{P_j,\,P_j(0)=1}\;\max_{\lambda\in\Lambda(H)} [P_j(\lambda)]^2 \,\|e^0\|_H^2,
\]

where Pj is a polynomial of degree j and Λ(H) is the set of eigenvalues of H. See [68].

Theorem 2.4.3. Let e0 be the initial error of the CG method. Then

\[
\|e^j\|_H \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{j} \|e^0\|_H,
\]

where κ is the condition number of the matrix H and ‖·‖H is the H-norm for the symmetric positive definite matrix H. See [68].
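For reference, the CG algorithm listed above can be written in a few lines of Python/NumPy; this is a textbook sketch, not the implementation used later in the thesis.

```python
import numpy as np

def cg(H, q, u0=None, tol=1e-10, max_it=None):
    """Conjugate gradient method for a symmetric positive definite H."""
    u = np.zeros_like(q, dtype=float) if u0 is None else u0.astype(float)
    r = q - H @ u
    d = r.copy()
    rr = r @ r
    max_it = len(q) if max_it is None else max_it
    for _ in range(max_it):
        if np.sqrt(rr) <= tol:
            break
        Hd = H @ d
        alpha = rr / (d @ Hd)
        u += alpha * d
        r -= alpha * Hd
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return u
```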
The condition number of a matrix is defined as κ = λmax/λmin, where λmin and λmax are the minimum and maximum eigenvalues of the matrix H, respectively. The bound of the previous theorem is not precise: the CG method converges in a few iterations for a matrix which has only a few distinct eigenvalues, even if it has a large condition number.

Theorem 2.4.4. Let H be symmetric positive definite. Assume that there are exactly k ≤ ℓ distinct eigenvalues of H. Then the CG iteration terminates in at most k iterations. [47, Theorem 2.2.3].

The previous theorems show that the convergence of the CG method depends on the eigenvalues of the matrix of the linear system. The idea of preconditioning is introduced to improve the characteristics of the original matrix. Let P be a preconditioner: P is an approximation to H which is easier to invert, and it is a symmetric positive definite matrix. Instead of solving (2.15), the system P−1 H u = P−1 q is solved. The CG method must be applied to a symmetric positive definite system; since P is symmetric positive definite it can be written as P = L LT. Accordingly, we solve the system L−1 H L−T û = L−1 q, where û = LT u. Applying the CG method to the preconditioned system L−1 H L−T û = L−1 q leads to the preconditioned conjugate gradient (PCG) method [47, 66, 68].

The PCG Algorithm:
• Given an initial solution u0, set r0 = q − H u0 and d0 = P−1 r0.
• For j = 0, 1, ...
    while ‖rj‖ > ε do
        αj = (rj)T P−1 rj / (dj)T H dj,
        uj+1 = uj + αj dj,
        rj+1 = rj − αj H dj,
        βj+1 = (rj+1)T P−1 rj+1 / (rj)T P−1 rj,
        dj+1 = P−1 rj+1 + βj+1 dj.
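The PCG algorithm differs from CG only in the application of P−1, which is passed in the sketch below as a function handle solve_P; the Jacobi preconditioner in the usage comment is a hypothetical simple choice.

```python
import numpy as np

def pcg(H, q, solve_P, u0=None, tol=1e-10, max_it=None):
    """Preconditioned CG: solve_P(r) must return P^{-1} r for a symmetric
    positive definite preconditioner P."""
    u = np.zeros_like(q, dtype=float) if u0 is None else u0.astype(float)
    r = q - H @ u
    z = solve_P(r)
    d = z.copy()
    rz = r @ z
    max_it = len(q) if max_it is None else max_it
    for _ in range(max_it):
        if np.linalg.norm(r) <= tol:
            break
        Hd = H @ d
        alpha = rz / (d @ Hd)
        u += alpha * d
        r -= alpha * Hd
        z = solve_P(r)
        rz_new = r @ z
        d = z + (rz_new / rz) * d
        rz = rz_new
    return u

# Example of a simple (hypothetical) choice: the Jacobi preconditioner.
# u = pcg(H, q, solve_P=lambda r: r / np.diag(H))
```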
Generalized Minimal Residual Method (GMRES)

The CG method is used to solve a symmetric positive definite system. In 1986 GMRES was proposed as a Krylov subspace method for solving a nonsymmetric system [67]. The GMRES method minimizes the residual norm over all vectors in u0 + Kk. Suppose the columns of Vk form an orthonormal basis of Kk; then any vector uk ∈ u0 + Kk can be written as uk = u0 + Vk y, where y ∈ Rk. GMRES generates iterates such that the residual norm ‖rk‖ is minimized, which can be written as

min{uk ∈ u0 + Kk} ‖rk‖.

On the other hand,

rk = q − H uk = q − H(u0 + Vk y) = r0 − H Vk y.

The columns of Vk are generated by the Arnoldi algorithm [47, Algorithm 3.4.1]. The starting vector is given as v1 = r0/‖r0‖ and the following vectors
  • 47. Chapter 2. Fundamentals 46 are generated by j+1 Hv j − j j T i i i=1 ((Hv ) v )v v = , Hv j − j j T i i i=1 ((Hv ) v )v for j ≥ 0. Let Hk be constructed such that hji = (Hv j )T v i and hij = 0 for i > j + 1. Then Arnoldi algorithm produces matrices Vk such that HVk = Vk+1 Hk . Consequently, the residual norm becomes rk = r0 − HVk y = r0 − Vk+1 Hk y = Vk+1 (βe1 − Hk y) . That is because v 1 = r0 / r0 and β = r0 , where e1 = [1, 0, ..., 0] and e1 ∈ Rk+1 . See [47, 66]. The GMRES Algorithm: • Given an initial solution u0 , r0 = q − Hu0 , v 1 = r0 / r0 , ρ0 = r0 , β0 = ρ0 and j=0. • While ρj > q and j < jmax do (a) j = j + 1. (b) For i = 1, ..., j
  • 48. Chapter 2. Fundamentals 47 hij = (Hv j )T v i v j+1 = Hv j − j i=0 hij v i , hj+1,j = v j+1 , v j+1 = v j+1 / v j+1 , e1 = (1, 0, ..., 0)T ∈ Rj+1 , Minimize βj e1 − Hj dj over Rj to obtain dj , ρj+1 = βj e1 − Hj dj , uj+1 = uj + Vj dj . The GMRES method breaks down when Hv j − j j T i i i=1 ((Hv ) v )v is zero. This quantity is zero when the residual is zero. This is not a problem, since if the residual is zero the solution will be found. See [47, 66]. BiConjugate Gradient Method (BiCG) Among all methods which do not require the matrix to be symmetric the GMRES method is the most successful Krylov subspace method in terms of minimization property. However, the operations and the storage require- ment for this method increase linearly with the iteration number (GMRES is long recurrence method). The BiConjugate Gradient method is a short recurrence method and is used to solve nonsymmetric problem. It takes an- other approach: instead of minimizing the residual, the residual is required to satisfy the bi-orthogonality condition (rj )T w = 0, ¯ ∀w ∈ Kj ; ¯ Kj = span{ˆ0 , HT r0 , ...., (HT )j−1 r0 }, r ˆ ˆ ¯ where Kj is Krylov subspace for HT and usually r0 is chosen such that r0 = r0 ˆ ˆ [47].
  • 49. Chapter 2. Fundamentals 48 The BiCG Algorithm: • Given an initial solution u0 , r0 = q − Hu0 , Choose r0 such that r0 = 0, ˆ ˆ ˆ d0 = r0 and d0 = r0 . ˆ • For j = 0, 1, ... while rj > do ˆ αj = (rj )T rj /(dj )T HT dj , ˆ uj+1 = uj + αj dj , rj+1 = rj − αj Hdj , ˆ rj+1 = rj − αj HT dj , ˆ ˆ βj = (rj+1 )T rj+1 /(rj )T rj , ˆ ˆ dj+1 = rj+1 + βj dj , ˆ ˆ dj+1 = rj+1 + βj dj . ˆ ˆ The BiCG method breaks down when either (ˆj )T rj = 0 or (dj )T HT dj = 0. r If these quantities are very small this method becomes unstable [47, 70]. MINRES and SYMMLQ Methods MINRES and SYMMLQ methods are used to solve symmetric indefinite equation systems. The MINRES method minimizes the 2-norm of the resid- ual, while the SYMMLQ method solves the projected system, but it does not minimize anything. It maintains the residual orthogonal to the previous residuals. See [62] for more detail. As the MINRES and the SYMMLQ meth- ods are used to solve symmetric indefinite system, these methods should be preconditioned by a symmetric preconditioner, see [25, 62]. 2.4.4 Null Space Methods Null space methods can be used for solving saddle point problems like the augmented system (1.7).
Solving (1.7) is equivalent to solving the following two equations:

−Θ−1 ∆x + AT ∆y = f,    (2.17)
A ∆x = g.

Let us introduce the null space matrix Z: a matrix belonging to Rn×(n−m) which satisfies AZ = 0. The null space method is described as follows (a small computational sketch is given at the end of this subsection).

1. Find ∆x̃ such that A ∆x̃ = g.
2. Solve the system

Z T Θ−1 Z p = −Z T (Θ−1 ∆x̃ + f),    (2.18)

where Z is the null space matrix of the constraint matrix A.
3. Set the solution (∆x∗, ∆y∗) as follows: ∆x∗ = ∆x̃ + Z p, and ∆y∗ is the solution of the system

A AT ∆y = A (f + Θ−1 ∆x∗).

See [8]. Let us explain this method. First, we multiply the first equation of (2.17) by Z T, which gives

−Z T Θ−1 ∆x + Z T AT ∆y = Z T f.
This is equivalent to −Z T Θ−1 ∆x = Z T f because Z T AT = 0. Let us write ∆x = ∆x̃ + Z p, where ∆x̃ is chosen such that A ∆x̃ = g; then the previous equation becomes

Z T Θ−1 Z p = −Z T (Θ−1 ∆x̃ + f),

which is equivalent to (2.18). In order to find ∆y∗, we substitute ∆x∗ into the first equation of (2.17) and then multiply it by A, which gives

A AT ∆y = A (f + Θ−1 ∆x∗).

The null space method is an attractive approach when n − m is small. The null space system (2.18) can be solved either directly or iteratively (see Subsections 1.2.1 and 1.2.2 above). In [19] the PCG method is used to solve the null space system (which is similar to (2.18), but for a quadratic minimization problem). In order to use the null space method we first have to compute the null space matrix Z. Let us assume A has full row rank. The matrix Z is given by

\[
Z = \begin{bmatrix} -A_1^{-1} A_2 \\ I_{n-m} \end{bmatrix},
\]

where the constraint matrix A is partitioned as A = [A1, A2] with A1 an m × m nonsingular matrix. There are plenty of choices for constructing the m × m nonsingular matrix A1, see [8]. In order to save computation time and storage, one should choose the sparsest null basis matrix A1.
The problem of finding the sparsest null basis is called the null space problem. This problem is NP-hard [17], and there are many papers which propose (heuristic) approaches to solve it [8, 11, 17, 18, 63].
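A dense sketch of the null space method of this subsection is given below. Taking A1 as the leading m × m block is purely illustrative; as discussed above, in practice one would search for a sparse, well-conditioned basis.

```python
import numpy as np

def null_space_solve(A, Theta, f, g):
    """Null space method for  -Theta^{-1} dx + A^T dy = f,  A dx = g,
    with Theta a vector of positive diagonal entries and A = [A1, A2],
    A1 the (assumed nonsingular) leading m x m block."""
    m, n = A.shape
    A1, A2 = A[:, :m], A[:, m:]
    Z = np.vstack([-np.linalg.solve(A1, A2), np.eye(n - m)])     # A Z = 0
    # Step 1: a particular solution of A dx_tilde = g.
    dx_t = np.concatenate([np.linalg.solve(A1, g), np.zeros(n - m)])
    # Step 2: reduced system (2.18) for p.
    Ti = 1.0 / Theta                                             # Theta^{-1}
    p = np.linalg.solve(Z.T @ (Ti[:, None] * Z), -Z.T @ (Ti * dx_t + f))
    dx = dx_t + Z @ p
    # Step 3: dy from the normal equations A A^T dy = A (f + Theta^{-1} dx).
    dy = np.linalg.solve(A @ A.T, A @ (f + Ti * dx))
    return dx, dy
```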
Chapter 3

The PCG Method for the Augmented System

We are dealing with large and sparse problems and we are looking for an iterative method from the Krylov subspace family which can solve the augmented system (1.7) efficiently. As we have discussed in the previous chapter, there exists a wide range of iterative methods which can be used in this context. The family of Krylov subspace methods [47, 66, 72] enjoys a particularly good reputation among iterative methods. Since we plan to solve large systems of equations, we prefer to use a short recurrence method rather than a long recurrence one. Full recurrence methods such as GMRES [67] occasionally do not manage to converge fast enough and become unacceptably expensive. Among the short recurrence methods we considered MINRES [62] and PCG [42, 66, 72]. Bearing in mind that, whichever method is used, preconditioning is necessary, we decided not to use MINRES because this method requires a symmetric positive definite preconditioner, a restriction we would like to avoid. Summing up, encouraged by recent analyses [51, 65], we will apply the preconditioned conjugate gradients (PCG) method directly to the indefinite system (1.7).
In the introduction section we explained fully why we chose to work with the augmented system (1.7). To summarise, the augmented system has better conditioning and additional flexibility in exploiting sparsity compared with the normal equations. In addition, every preconditioner for the normal equations system has an equivalent for the augmented system, while the opposite is not true. The results presented in this chapter have been the subject of joint work with Jacek Gondzio and Julian Hall [2].

3.1 Preconditioner

The choice of preconditioner for a linear system plays a critical role in the convergence of the iterative solver. The issue of finding a preconditioner for the augmented system has been investigated in many papers, [9, 10, 16, 21, 22, 23, 34, 46, 60] to mention a few. Let H be the matrix of the augmented system which arises from IPMs for linear, quadratic or nonlinear programming problems,

\[
H = \begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix},
\tag{3.1}
\]

where H is an n × n matrix (in the linear programming case of this thesis, H = Θ−1). Before presenting a list of preconditioners for augmented systems, we should first mention the characteristics of a good preconditioner. A preconditioner is considered to be good if it satisfies the following features.
The first feature is that the preconditioner should be a good approximation to the original matrix H: if the preconditioner is approximately equal to the original matrix, then the preconditioned matrix P−1H is approximately equal to the identity matrix, which makes the preconditioned system easy to solve. The second feature is that the preconditioner should be relatively easy to compute, since for most iterative methods the preconditioner is computed at each iteration of the interior point method. The third feature is that it should be relatively easy to solve an equation with the preconditioner, namely the system P d = r, since this system has to be solved at each iteration of the iterative solver. The final feature is that the eigenvalues of the preconditioned matrix should be clustered (the number of distinct eigenvalues of the preconditioned matrix should be as small as possible) and bounded away from zero, because the convergence of iterative solvers is usually related to the eigenvalues of the preconditioned system; for the PCG method, for instance, see Theorem 2.4.4.
   It is very difficult to design a preconditioner which satisfies all four features at the same time. Consequently, one needs to strike a balance among these features to design a high-quality preconditioner, which is why a huge number of studies tackle this issue. Below we discuss a few recently developed preconditioners for (3.1). We also report theorems which describe the behaviour of the eigenvalues of the preconditioned matrices for some of these preconditioners, see Theorems 3.1.1, 3.1.2, 3.1.3 and 3.1.4. This information is important because it gives an idea about the convergence of the preconditioned system.
   In [9] Bergamaschi, Gondzio, Venturin and Zilli propose a preconditioner for the augmented system for linear, quadratic or nonlinear programming
  • 56. Chapter 3. The PCG Method for the Augmented System 55 problems. This preconditioner is defined as follow:   ˜ G AT P = , ˜ A 0 ˜ where G is an invertible approximation of H, and A is a sparse approximation of the Jacobian of constraints that is of matrix A. Let the error matrix ˜ E = A − A have rank p, where 0 ≤ p ≤ m. Let σ be the smallest singular ˜ ˜ value of AD−1/2 and eQ and eA be errors terms given as ED−1/2 eQ = D−1/2 QD−1/2 − I , eA = . σ ˜ The eigenvalues of the preconditioned matrix P −1 H are characterized by the following theorem [9, Theorem 2.1]. ˜ Theorem 3.1.1. Assume A and A have maximum rank. If the eigenvector is of the form (0, y)T then the eigenvalues of P −1 H are either one (with multiplicity at least m − p ) or possibly complex and bounded by | | ≤ eA . Corresponding to eigenvectors of the form (x, y)T with x = 0 the eigenvalues are 1. equal to one (with multiplicity at least m − p), or 2. real positive and bounded by λmin (D−1/2 QD−1/2 ) ≤ λ ≤ λmax (D−1/2 QD−1/2 ), or 3. complex, satisfying | R | ≤ eQ + eA , | I | ≤ eQ + eA ,
  • 57. Chapter 3. The PCG Method for the Augmented System 56 where = R + i I. In [21] the constraint matrix is partitioned into two matrices, such that A = [A1 , A2 ], where A1 is nonsingular. Accordingly, the matrix H is parti- tioned as follows   H11 H12 H= . H21 H22 The preconditioner P is constructed by replacing H by G. Similarly G is partitioned into   G11 G12 G= . G21 G22 The following theorem describes the eigenvalues of the preconditioned matrix P −1 H [21, Theorem 2.1]. Theorem 3.1.2. Suppose that Z is the null space matrix of A. Then P −1 H has 2m unit eigenvalues, and the remaining n − m eigenvalues are those of the generalized eigenproblem Z T HZv = λZ T GZv. Different choices of the matrices G11 , G12 , G21 and G22 give different pre- T conditioners. For the symmetric case H21 = H12 , the authors proposed dif- ferent choices of the matrix G, which improve the eigenvalues of the precon- ditioned matrix P −1 H. Here we will mention a few of these preconditioners. By choosing G22 = H22 , G11 = 0 and G21 = 0.
  • 58. Chapter 3. The PCG Method for the Augmented System 57 The eigenvalues of the preconditioned matrix are given in the following the- orem [21, Theorem 2.3]. Theorem 3.1.3. Suppose that the matrix G is chosen as mentioned before. Suppose that H22 is positive definite, and let ρ = min{rank(A2 ), rank(H21 )} + min{rank(A2 ), rank(H21 ) + min[rank(A2 ), rank(H11 )]}. Then P −1 H has at most rank(RT H21 + H21 R + RT H11 R) + 1 ≤ min(ρ, n − m) + 1 T ≤ min(2m, n − m) + 1, distinct eigenvalues, where R = −A−1 A2 . 1 For G22 = H22 , G11 = H11 and G21 = 0. The eigenvalues of the precon- ditioned matrix satisfy the following theorem [21, Theorem 2.4]. Theorem 3.1.4. Suppose that the matrix G is chosen as mentioned before. Suppose that H22 + RT H11 R is positive definite, and that T ν = 2 min{rank(A2 ), rank(H21 )}. Then P −1 H has at most ν + 1 distinct eigenvalues, where rank(RT H11 R) + 1 ≤ ν + 1 ≤ min(2m, n − m) + 1. In [34] the authors propose four different symmetric positive definite pre- conditioners for the augmented system for the linear programs. In order to construct these preconditioners the matrices H and A are partitioned as has
  • 59. Chapter 3. The PCG Method for the Augmented System 58 been mentioned earlier. However, A2 is chosen to be the nonsingular matrix instead of A1 . The first preconditioner is a diagonal matrix. This preconditioner is given by   H 0 0  11  T P = C1 C1 =  0 I 0  .     0 0 I The preconditioned matrix is given by   −1/2 I 0 H11 AT 1   C1 HC1 =  0 −1 −T AT .   H22 2   −1/2 A1 H11 A2 0 The second preconditioner is a block diagonal matrix. It is presented as follows   H 0 0  11  T T P = C2 C2 =  0 A2 A2 0  .     0 0 I The preconditioned matrix is given by   −1/2 I 0 H11 AT1   C2 HC2 −1 −T = 0 A−T H22 A−1 .   2 2 I   −1/2 A1 H11 I 0 The third preconditioner is designed to eliminate the submatrix A−T H22 A−1 2 2
  • 60. Chapter 3. The PCG Method for the Augmented System 59 in the previous preconditioned matrix. This preconditioner is given by   1/2 H11 0 0   T P = C3 C3 , C3 =  AT 1 H A−1 .   0 2 2 22 2   0 0 I The preconditioned matrix is given by   −1/2 −1/2 I 1 − 2 H11 AT A−T H22 A−1 H11 1 2 2 AT 1   C3 HC3 =  − 1 A−T H22 A−1 A1 H11 −1 −T −1/2 .   0 I  2 2 2  −1/2 A1 H11 I 0 The fourth preconditioner also eliminates the submatrix A−T H22 A−1 , using 2 2 the factorization AT = LU . This preconditioner is given by 2   1/2 H11 0 0   T P = C4 C4 , C4 =  1 H L−T .   0 L 2 22   0 0 UT The preconditioned matrix is given by   −1/2 −1/2 I − 1 H11 AT U −1 L−1 H22 L−T 2 1 H11 AT U −1 1   C4 HC4 −1 −T −1/2 =  − 2 L H22 L−T U −T A1 H11  1 −1 .  0 I   −1/2 U −T A1 H11 I 0 The preconditioner in [60] is given in the form P = CC T and is applied from the left and from the right to the augmented system, which arises from the IPMs for LP. To construct this preconditioner the matrices A and H are partitioned as mentioned before, where A1 is nonsingular. The inverse of C is given by
  • 61. Chapter 3. The PCG Method for the Augmented System 60   −H −1/2 M C −1 =  , T 0 1/2 T = [I 0]Q, where Q is a permutation matrix, and M = T T H11 A−1 . 1 The preconditioned matrix is given by:   −I −W 0   C −1 HC −T = Q  −W T  T 0 Q ,  I   0 0 H11 1/2 −1/2 where W = H11 A−1 A2 H22 1 . Assume ∆x = [∆x1 , ∆x2 ] is partitioned accordingly to the partition of A. Eventually in the approach of [60] the preconditioned system is reduced to the following normal equations (I + W W T )∆x1 = g . ˜ In this section we construct a new preconditioner for the augmented sys- tem (1.7). And before we do so we will rearrange the augmented system such that      −1 T Θ A −∆x f   = , (3.2) A 0 ∆y g where in this chapter we redefine g as follows g = Ax − b. To design the preconditioner for the augmented system, we first ob- serve that the ill-conditioning in linear systems (1.7) and (1.8) is a conse-
quence of the properties of the diagonal scaling matrix Θ. From the complementarity condition for linear programs we know that, at the optimum, x̂j ŝj = 0 for all j ∈ {1, 2, . . . , n}. The condition x̂j ŝj = 0 is satisfied if at least one of the variables x̂j and ŝj is zero. Primal-dual interior point methods identify a strong optimal partition [77], that is, they produce an optimal solution with the property x̂j + ŝj > 0 for all j. In other words, exactly one of x̂j and ŝj is zero. The set of indices {1, 2, . . . , n} can therefore be partitioned into two disjoint subsets:

B = {j ∈ {1, 2, ..., n} : x̂j > 0}   and   N = {j ∈ {1, 2, ..., n} : ŝj > 0}.

In fact, the optimal partition is closely related (but not equivalent) to the basic-nonbasic partition in the simplex method. This is because simplex method iterations move from vertex to vertex until the optimal solution is found, so the simplex method has exactly m basic variables (variables belonging to B) and n − m nonbasic variables (variables belonging to N). Interior point methods, however, approach the optimal solution by moving through the interior of the feasible region; consequently, interior point methods have m basic variables and n − m nonbasic variables only in the limit. That is why in interior point methods we refer to this partition as the optimal partition. Unlike the simplex method, which satisfies the complementarity condition at each iteration, the interior point method satisfies this condition only in the limit. The primal-dual interior point method identifies a strong optimal partition near the optimal solution. Below we summarise its asymptotic behaviour and use the arrow to denote "converges to". If at the optimal solution j ∈ B, then xj → x̂j > 0 and sj → 0, hence the corresponding element θj → ∞. If at the optimal solution j ∈ N, then xj → 0 and sj → ŝj > 0, hence θj → 0.
Summing up,

\[
\theta_j \to
\begin{cases}
\infty, & \text{if } j \in B\\
0, & \text{if } j \in N,
\end{cases}
\qquad\text{and}\qquad
\theta_j^{-1} \to
\begin{cases}
0, & \text{if } j \in B\\
\infty, & \text{if } j \in N.
\end{cases}
\tag{3.3}
\]

This property of interior point methods is responsible for a number of numerical difficulties. In particular, it causes both linear systems (1.7) and (1.8) to become very ill-conditioned when an interior point method approaches the optimal solution [3]. However, it may be used to advantage when constructing a preconditioner for the iterative method. We partition the matrices and vectors

\[
A = [A_B, A_N], \qquad
\Theta = \begin{bmatrix} \Theta_B & 0 \\ 0 & \Theta_N \end{bmatrix}, \qquad
x = [x_B, x_N], \qquad s = [s_B, s_N]
\]

according to the partition of {1, 2, . . . , n} into the sets B and N. With this notation, from (3.3) we conclude that ΘN ≈ 0 and ΘB−1 ≈ 0. Consequently, the matrix in the augmented system (3.2) can be approximated as follows:

\[
\begin{bmatrix}
\Theta_B^{-1} & & A_B^T\\
& \Theta_N^{-1} & A_N^T\\
A_B & A_N &
\end{bmatrix}
\approx
\begin{bmatrix}
& & A_B^T\\
& \Theta_N^{-1} & A_N^T\\
A_B & A_N &
\end{bmatrix},
\tag{3.4}
\]

and the matrix in the normal equations system (1.8) can be approximated as follows:

\[
A \Theta A^T = A_B \Theta_B A_B^T + A_N \Theta_N A_N^T \approx A_B \Theta_B A_B^T.
\tag{3.5}
\]

If the matrix AB were square and nonsingular, then equations (3.4) and (3.5) would suggest obvious preconditioners for the augmented system and the normal equations, respectively.
However, there is no guarantee that this is the case. On the contrary, in practical applications it is very unlikely that the matrix AB corresponding to the optimal partition is square and nonsingular. Moreover, the optimal partition is known only when an IPM approaches the optimal solution of the linear program.
   To construct a preconditioner for (3.2) with a structure similar to the approximation (3.4) we need to guess an optimal partition and, additionally, guarantee that the matrix B which approximates AB is nonsingular. We exploit the difference in magnitude of the elements of Θ to design a preconditioner. We sort the elements of Θ in non-increasing order: θ1 ≥ θ2 ≥ θ3 ≥ · · · ≥ θn. Hence the elements of Θ−1 satisfy θ1−1 ≤ θ2−1 ≤ θ3−1 ≤ · · · ≤ θn−1. If the primal-dual iterate is sufficiently close to an optimal solution, then the first elements θj−1 in this list correspond to variables xj which are most likely to be nonzero at the optimum, and the last elements in the list correspond to variables which are likely to be zero at the optimum. We select the first m linearly independent columns of the matrix A, when permuted according to the order of θj−1, and we construct a nonsingular matrix B from these columns. The submatrix of A corresponding to all the remaining columns is denoted by N. Therefore we assume that a partition A = [B, N] is known such that B is nonsingular and the entries θj−1 corresponding to the columns of B are chosen from among the smallest elements of Θ−1. According to this partitioning of A and Θ (and after a symmetric row and column permutation) the indefinite matrix in (3.2) can be rewritten in the following form

\[
K = \begin{bmatrix}
\Theta_B^{-1} & & B^T\\
& \Theta_N^{-1} & N^T\\
B & N &
\end{bmatrix}.
\tag{3.6}
\]
By construction, the elements of ΘB−1 are supposed to be among the smallest elements of Θ−1, hence we may assume that ΘB−1 ≈ 0. The following easily invertible block triangular matrix

\[
P = \begin{bmatrix}
& & B^T\\
& \Theta_N^{-1} & N^T\\
B & N &
\end{bmatrix}
\tag{3.7}
\]

is a good approximation to K; hence P is an attractive preconditioner for K. We should mention that Oliveira and Sorensen [60] use a similar partitioning process to derive their preconditioner for the normal equations. They order the columns of the matrix AΘ−1 from the smallest to the largest with respect to the 1-norm and then scan the columns of A in this order to select the first m that are linearly independent.
   Since the matrix B is constructed from the columns corresponding to the smallest possible elements of Θ−1, we may expect that ‖ΘB−1‖F ≪ ‖ΘN−1‖F, where ‖·‖F denotes the Frobenius norm of a matrix. Using (3.6) and (3.7) we derive the following bound on the square of the Frobenius norm of the difference of the matrices K and P:

\[
\|K - P\|_F^2 = \|\Theta_B^{-1}\|_F^2 \ \ll\ \|P\|_F^2 \ < \ \|K\|_F^2.
\tag{3.8}
\]

Summing up, P is a good approximation to K (since the approximation error is small in relation to ‖P‖F2 and ‖K‖F2), and we may consider it as a possible preconditioner of K. Secondly, it is easy to compute P: we order the elements of Θ in non-increasing order and pick the first m linearly independent columns of A in this order to construct the nonsingular matrix B, see Subsection 3.4; a simple illustration of this selection is sketched below.
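The column selection just described can be sketched as follows (Python/NumPy). The greedy dense rank test used here is only an illustration; the implementation discussed in Subsection 3.4 identifies the linearly independent columns through a sparse factorisation.

```python
import numpy as np

def guess_basis(A, theta):
    """Guess the partition A = [B, N]: scan the columns of A in order of
    non-increasing theta_j and greedily keep the first m linearly
    independent ones (dense illustration only)."""
    m, n = A.shape
    order = np.argsort(-theta)        # largest theta_j, i.e. smallest theta_j^{-1}, first
    basic, chosen = [], np.zeros((m, 0))
    for j in order:
        trial = np.column_stack([chosen, A[:, j]])
        if np.linalg.matrix_rank(trial) > chosen.shape[1]:
            chosen, basic = trial, basic + [j]
            if len(basic) == m:
                break
    nonbasic = [j for j in range(n) if j not in set(basic)]
    return basic, nonbasic            # column indices of B and N respectively
```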
In addition, it is easy to solve an equation with P because it is block triangular with nonsingular diagonal blocks B, ΘN−1 and BT. We conclude this section by giving explicit formulae for the solution of equations with the preconditioner (3.7), and leave the analysis of the spectral properties of the preconditioned matrix P−1K to Subsection 3.2.

3.1.1 Solving equations with P

The matrix (3.7) is block triangular and its diagonal blocks B, ΘN−1 and BT are invertible. Let d = [dB, dN, dy] and r = [rB, rN, ry] and consider the system of equations

\[
\begin{bmatrix}
& & B^T\\
& \Theta_N^{-1} & N^T\\
B & N &
\end{bmatrix}
\begin{bmatrix} d_B \\ d_N \\ d_y \end{bmatrix}
=
\begin{bmatrix} r_B \\ r_N \\ r_y \end{bmatrix}.
\tag{3.9}
\]

The solution of (3.9) can easily be computed by exploiting the block triangular structure of the matrix:

\[
\begin{aligned}
B^T d_y = r_B \quad &\Rightarrow\quad d_y = B^{-T} r_B,\\
\Theta_N^{-1} d_N + N^T d_y = r_N \quad &\Rightarrow\quad d_N = \Theta_N r_N - \Theta_N N^T d_y,\\
B d_B + N d_N = r_y \quad &\Rightarrow\quad d_B = B^{-1}(r_y - N d_N).
\end{aligned}
\tag{3.10}
\]

The operation d = P−1r involves solving two equations (one with B and one with BT) and a couple of matrix-vector multiplications. These operations are performed at every iteration of the conjugate gradients procedure, hence they should be implemented in the most efficient way. The issues of choosing a well-conditioned basis matrix B with a sparse factored inverse are addressed in Subsection 3.4.
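Applying the preconditioner then amounts to the three substitutions in (3.10). A dense sketch (with B and N as explicit matrices and ΘN stored as a vector of diagonal entries) is given below; in the actual implementation B is available through a sparse LU factorisation.

```python
import numpy as np

def apply_P_inverse(B, N, theta_N, rB, rN, ry):
    """Apply the preconditioner: solve P d = r via the block substitutions (3.10)."""
    dy = np.linalg.solve(B.T, rB)               # B^T dy = rB
    dN = theta_N * (rN - N.T @ dy)              # dN = Theta_N rN - Theta_N N^T dy
    dB = np.linalg.solve(B, ry - N @ dN)        # B dB = ry - N dN
    return dB, dN, dy
```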
  • 67. Chapter 3. The PCG Method for the Augmented System 66 3.2 Spectral analysis We have observed earlier that if ΘB is chosen carefully and Θ−1 B F Θ−1 F N then the preconditioner (3.7) is a good approximation to K in (3.6). To assess the quality of the preconditioner we need a better understanding of the relation between P and K. We will therefore analyse the spectral properties of the preconditioned matrix P −1 K. Let us use the notation Kt = q to denote the system (3.2), where t = [−∆x, ∆y] and q = [f, g]. Given a starting approximation t(0) and the associated residual r(0) = q − Kt(0) the indefinite preconditioner may be applied either from the right, yielding the system KP −1 t = q, ˆ t = P −1 t, ˆ (3.11) or from the left, so that the system to be solved becomes P −1 Kt = P −1 q. (3.12) The right and the left preconditioned matrices KP −1 and P −1 K have the same eigenvalues so general spectral results can be given in terms of either of the two formulations. The following theorem shows that the eigenvalues of the P −1 K matrix are real and positive. Moreover they are bounded away from zero. Theorem 3.2.1. Let λ be an eigenvalue of P −1 K. Then λ is real and λ ≥ 1. Proof. Let v be an eigenvector of P −1 K corresponding to the eigenvalue λ, that is, P −1 Kv = λv. Let λ = 1 + τ and, applying the usual partitioning
  • 68. Chapter 3. The PCG Method for the Augmented System 67 v = [vB , vN , vy ], the eigensystem can be written as Kv = (1 + τ )P v:       Θ−1 B T v B T v  B  B    B  Θ−1 N T  = (1 + τ )  Θ−1 N T        N   vN N   vN        B N vy B N vy which yields Θ−1 vB = τ B T vy B τ (Θ−1 vN + N T vy ) = 0 N τ (BvB + N vN ) = 0. We consider two cases. When τ = 0 clearly λ = 1 so the claim is true. Otherwise, when τ = 0, the equation system can be simplified: Θ−1 vB = τ B T vy B Θ−1 vN + N T vy = 0 N BvB + N vN = 0, and solved for τ . From the third equation we get vB = −B −1 N vN and, substituting this in the first equation, yields N vN = −τ BΘB B T vy . Next, we use the second equation to substitute for vN = −ΘN N T vy giving (N ΘN N T )vy = τ (BΘB B T )vy . If vy = 0 then (using τ = 0) we deduce that vB = 0 and vN = 0, that is the eigenvector is zero. We can exclude such a situation and safely assume
  • 69. Chapter 3. The PCG Method for the Augmented System 68 T that vy = 0. In this case, we multiply both sides of the equation by vy to get vy (N ΘN N T )vy = τ vy (BΘB B T )vy . T T Since all the elements of Θ are positive and B is nonsingular, the matrix BΘB B T is symmetric positive definite and the matrix N ΘN N T is symmetric positive semidefinite. Hence we conclude that vy (N ΘN N T )vy T τ= ≥ 0, (3.13) vy (BΘB B T )vy T which is real and positive number, which completes the proof. The proof reveals the importance of the correct partitioning of A = [B, N ]. Indeed, this partition should have a number of desirable features: • B should be nonsingular and well-conditioned since we should operate accurately with the preconditioner; • All elements in Θ−1 should be small in comparison with those in Θ−1 . B N The condition Θ−1 B F Θ−1 N F is relatively easy to satisfy. How- ever, (3.13) indicates that we need a stronger property: we would like to bound τ from above and, in that way, cluster all eigenvalues of P −1 K in an interval [1, λmax ], with λmax kept as small as possible. This opens questions regarding the necessary concessions to be made when the matrix B and the corresponding ΘB are chosen. The ability to identify a well-conditioned ma- trix B consisting of columns for which the θj are “large” is crucial for the good/efficient behaviour of our approach. We discuss these issues in detail in Section 3.4. In the previous theorem we show that the eigenvalues of the precondi- tioned matrix KP −1 are real and greater than one. In the following theorem
  • 70. Chapter 3. The PCG Method for the Augmented System 69 we show that the preconditioned matrix KP −1 has at least n + m − p unit eigenvalues, where p is the rank of the matrix N . Theorem 3.2.2. The preconditioned matrix KP −1 has: • unit eigenvalues with multiplicity n + m − p. z T N ΘN N T z • the remaining p eigenvalues are given by 1 + z T BΘB B T z ≥ 1, where z = 0. Proof. The inverse of the preconditioner P is given by   B −1 N ΘN N T B −T −B −1 N ΘN B −1   P −1 =  −ΘN N B T −T .   ΘN 0   B −T 0 0 Therefore, the preconditioned matrix KP −1 is given by   I+ Θ−1 B −1 N ΘN N T B −T B −Θ−1 B −1 N ΘN B Θ−1 B −1 B   KP −1 = .   0 I 0   0 0 I Let v be an eigenvector of KP −1 corresponding to the eigenvalue λ, that is, KP −1 v = λv, which can be rewritten as      I+ Θ−1 B −1 N ΘN N T B −T −Θ−1 B −1 N ΘN Θ−1 B −1 v v  B B B  B   B   = λ  vN       0 I 0   vN       0 0 I vy vy which yields (I + Θ−1 B −1 N ΘN N T B −T )vB − Θ−1 B −1 N ΘN vN + Θ−1 B −1 vy = λvB , (3.14) B B B
  • 71. Chapter 3. The PCG Method for the Augmented System 70 vN = λvN , (3.15) vy = λvy . (3.16) The equations (3.15) and (3.16) are true if either (λ = 1 for any vN and vy ) or (vN = vy = 0). 1. Case λ = 1, we now analyse a number of cases depending on vB , vN and vy . a. vN = 0. Substituting this in (3.14) gives vB = −B T (N ΘN N T )−1 vy . That gives the eigenvector [−B T (N ΘN N T )−1 vy , 0, vy ] which is associated with the unit eigenvalue with multiplicity m, because we can find m linearly independent vectors vy . b. vy = 0. Substituting this in (3.14) gives vB = B T (N ΘN N T )−1 N ΘN vN . That gives the eigenvector [B T (N ΘN N T )−1 N ΘN vN , vN , 0] which is associated with the unit eigenvalue with multiplicity n − m, because we can find n − m linearly independent vectors vN . 2. Case vN = vy = 0. Substituting this in (3.14) gives BΘB vB + N ΘN N T B −T vB = λBΘB vB . For vB = 0, there is a nonzero vector z such that vB = B T z. Since B is nonsingular z = 0. By substituting this in the previous equation we
  • 72. Chapter 3. The PCG Method for the Augmented System 71 get the following equality BΘB B T z + N ΘN N T z = λBΘB B T z. Since z = 0 and BΘB B T is symmetric positive definite matrix we can write z T N ΘN N T z λ=1+ ≥ 1. (3.17) z T BΘB B T z That gives the eigenvectors [vB , 0, 0] which is associated with the eigen- values (3.17). Moreover, N has rank p, and for m linearly independent vectors vB , we get N T z = 0 with multiplicity p and N T z = 0 with mul- tiplicity m − p. Consequently the eigenvectors [vB , 0, 0] are associated with the unit eigenvalues with multiplicity m − p and the remaining p eigenvalues are given by (3.17). We Conclude from the previous cases that the preconditioned matrix KP −1 has n + m − p unit eigenvalues and the remaining p eigenvalues are given by (3.17). 3.3 The PCG method for nonsymmetric in- definite system Rozlozn´ and Simoncini [65] used the BiCG method to solve an indefinite ık system such as (3.2) preconditioned from the right. They show that the right preconditioned BiCG method reduces to the standard preconditioned CG method if the following two properties hold. The first property is that the preconditioned matrix H = KP −1 is J-symmetric, where J = P −1 , and the second is that g = 0. The reason behind this is that when g = 0 the residual
  • 73. Chapter 3. The PCG Method for the Augmented System 72 j of PCG has a zero block and can be expressed as rj = [r1 , 0]. Although in our case g = 0, the initial iterate t0 can be chosen so that the corresponding residual has the form r0 = [r1 , 0]. Furthermore, the preconditioned matrix 0 H = KP −1 is J-symmetric, since H T J = JH. See [65]. Let us consider the following starting point for CG:     −∆x0 B B −1 g     0 t =  −∆x0 = , (3.18)     N 0     0 ∆y 0   ∆xB where ∆x =  . The initial residual r0 = q−Kt0 may then be written ∆xN as        f Θ−1 B T B g −1 fB − Θ−1 B −1 g  B   B    B  0 r =  fN − Θ−1 T = .        N N  0 fN        g B N 0 0 Note two interesting properties of the preconditioned matrix KP −1 stated as two Lemmas below. Multiplying by the preconditioned matrix KP −1 preserves a zero block in the third component of the vector.     v z  B   B  Lemma 3.3.1. Let t =  vN . Then KP −1 t =  zN .         0 0 Proof. We note first that, by using (3.9)-(3.10), we may write u = P −1 t as   −1 T −T −1 B N ΘN N B vB − B N ΘN vN   u= −ΘN N T B −T vB + ΘN vN .     B −T vB
  • 74. Chapter 3. The PCG Method for the Augmented System 73 Hence    Θ−1 B BT B −1 N ΘN N T B −T vB − B −1 N ΘN vN    KP −1 t = Ku =  Θ−1 T −ΘN N B T −T    N N  vB + ΘN vN     B N B −T vB   (I + Θ−1 B −1 N ΘN N T B −T )vB − Θ−1 B −1 N ΘN vN B B   =  ,   vN   0 which completes the proof. Furthermore, using the initial approximate solution     −∆x0 B B −1 (g − N ΘN fN )     0 t =  −∆x0 = , (3.19)     N ΘN fN     0 ∆y 0   rB   the residuals will have two zero blocks, r =  0 .     0 The initial residual r0 = q − Kt0 may be written:      f Θ−1 B T −1 B (g−N ΘN fN )  B   B   r 0 =  fN − Θ−1 T ,      N N  ΘN fN      g B N 0
  • 75. Chapter 3. The PCG Method for the Augmented System 74 which gives   fB −Θ−1 B −1 g+Θ−1 B −1 N ΘN fN B B   0 r = .   0   0 We observe an important property of the preconditioned matrix: multi- plying with the matrix KP −1 preserves the zero blocks in the second and third components of the vector.     v z  B   B  Lemma 3.3.2. Let t =  0 . Then KP −1 t =  0 .         0 0 Proof. We note first that, by using (3.9)-(3.10), we may write u = P −1 t as   −1 T −T B N ΘN N B vB   u= T −T −ΘN N B   vB    B −T vB hence    Θ−1 B B T −1 B N ΘN N B T −T vB    −1 KP t = Ku =  Θ−1 T  −ΘN N B T −T ,   N N  vB    B N B −T vB we obtain   (I + Θ−1 B −1 N ΘN N T B −T )vB B   −1 KP t= ,   0   0
  • 76. Chapter 3. The PCG Method for the Augmented System 75 which completes the proof. From the PCG algorithm, we have d0 = P −1 r0 , dj = P −1 rj + βj dj−1 and rj+1 = rj − αj Kdj . So the residual r1 is computed as linear combination of r0 and KP −1 r0 . For j > 1, the residual rj+1 is computed as a linear combination of rj−1 , rj and KP −1 rj (That is because rj+1 = αj βj /αj−1 rj−1 + j (1 − αj βj /αj−1 )rj − αj KP −1 rj ). This implies that rj = [r1 , 0] for j = 0,1, . . . Consequently, we can use the standard PCG method along with (3.7) to solve (3.2). 3.3.1 The convergence of the PCG method In this section, we analyse the behaviour of the PCG method for the indefinite system (3.2) and give explicit formulae describing the convergence of the method. The convergence analysis of the PCG method is important because both K and P are indefinite matrices. In [65] the authors prove that both the error and the residual of PCG method converge to zero. In here we prove that too. We analyse the method working in our specific setup with a particular starting point guaranteeing that the initial residual has the form r0 = [rB , 0, 0]. 0 The PCG algorithm (see Chapter 2) generates iterates tj , j = 0, 1, . . . with residuals rj = q − Ktj . The error corresponding to each PCG iteration has the form ej = tj − t∗ , where t∗ is the solution of (3.2), and the residual can be written as rj = −Kej since Kej = Ktj − Kt∗ = −rj . In Lemma 3.3.3 we prove that the indefinite K-inner product of the error ej in the PCG algorithm is always non-negative so we can write ej K = < ej , Kej >, even though K is not positive definite. In Theorem 3.3.4 we show that the K-norm of the error ej is minimized over the eigenvalues of the symmetric positive definite matrices. Similarly, in Theorem 3.3.5 we show that the
  • 77. Chapter 3. The PCG Method for the Augmented System 76 Euclidean norm of the residual rj is also minimized over the eigenvalues of the symmetric positive definite matrices. In other words, the error and residual terms display asymptotic convergence similar to that observed when PCG is applied to symmetric positive definite systems. Lemma 3.3.3. Assume we use (3.18) or (3.19) as initial solution of PCG method. Then the indefinite K-inner product < ej , Kej > is non-negative for the error ej hence it defines a seminorm ej K = < ej , Kej > = ej 1 Θ−1 . (3.20) Proof. We have shown in Lemmas 3.3.1 and 3.3.2 that, for a suitable initial j solution, the residual has the form rj = [r1 , 0]. Hence      Θ −1 A T ej 1 −Θ−1 ej 1 − AT ej 2 rj = −Kej = −   = , A 0 ej 2 −Aej 1 implies Aej = 0. Simple calculations give the following result 1    Θ −1 A T ej 1 < ej , Kej > = (ej )T Kej = (ej )T (ej )T 1 2    A 0 ej 2 = (ej )T Θ−1 ej + (ej )T AT ej + (ej )T Aej 1 1 1 2 2 1 = (ej )T Θ−1 ej 1 1 = (ej )T ΘB ej + (ej )T Θ−1 ej ≥ 0 B −1 B N N N (3.21) because Θ−1 is positive definite. This gives ej K = ej 1 Θ−1 , which com- pletes the proof. Let Dj be the Krylov subspace Dj = span{d0 , d1 , ..., dj−1 }. Then D1 = span{d0 } = span{P −1 r0 }. D2 = span{d0 , d1 }, where the direction d1 is a
  • 78. Chapter 3. The PCG Method for the Augmented System 77 linear combination of the previous direction and P −1 r1 , while r1 is a linear combination of the previous residual and Kd0 . This implies that d1 is a linear combination of d0 and P −1 KP −1 r0 , which gives D2 = span{P −1 r0 , P −1 KP −1 r0 }. By the same argument dj−1 is a linear combination of dj−2 and (P −1 K)j−1 P −1 r0 , giving Dj = span{P −1 r0 , P −1 KP −1 r0 , ..., (P −1 K)j−1 P −1 r0 }. Moreover, r0 = −Ke0 , so Dj = span{P −1 Ke0 , (P −1 K)2 e0 , . . . , (P −1 K)j e0 }. The error can be written as ej = ej−1 + αj−1 dj−1 , hence ej = e0 + j−1 j k=0 αk dk . Since dj ∈ Dj+1 the error can be written as ej = (I+ k=1 ψk (P −1 K)k )e0 , where the coefficient ψk is related to αk and βk . Hence the error term can be expressed as ej = φj (P −1 K)e0 , (3.22) where φj is a polynomial of degree j and we require that φj (0) = 1. Theorem 3.3.4. Let e(0) be the initial error of PCG. Then ej 2 K ≤ min max [φ(λ)]2 e0 B 2 + min Θ−1 φ∈P ,φ(0)=1 λ∈Λ(I max [φ(λ)]2 e0 N 2 Θ−1 , φ∈Pj ,φ(0)=1 λ∈Λ(Im +W W T ) B j n−m +W TW) N (3.23) where Pj is the set of polynomials of degree j, Λ(G) is the set of eigenvalues −1/2 1/2 of the matrix G and W = ΘB B −1 N ΘN . Im + W W T and In−m + W T W are symmetric positive definite matrices. Proof. First, we observe that Ae0 = 0, that is Be0 + N e0 = 0, and hence 1 B N we write   + Θ−1 e0 B B B T e0 2   0 −1 0 Ke =  ΘN eN + N T e0   2   0
  • 79. Chapter 3. The PCG Method for the Augmented System 78 and, using (3.10), we get   −1 B N ΘN N B T −T Θ−1 e0 B B −B −1 N e0 N   P −1 Ke0 =  −ΘN N T B −T Θ−1 e0 + e0 .    B B N  B −T Θ−1 e0 + e0 B B 2 Since Be0 + N e0 = 0, that is e0 = −B −1 N e0 and N e0 = −Be0 , we obtain B N B N N B   −1 B N ΘN N B T −T Θ−1 e0 B B −B −1 (−Be0 ) B   −1 0 P Ke =  −ΘN N B T −T Θ−1 (−B −1 N e0 ) e0   B N + N    B −T Θ−1 e0 + e0 B B 2   ΘB (Θ−1 B + Θ−1 B −1 N ΘN N T B −T Θ−1 )e0 B B B   =  ΘN (Θ−1 + N T B −T Θ−1 B −1 N )e0 . (3.24)    N B N  B −T Θ−1 e0 + e0 B B 2 Let us define C1 = Θ−1 + Θ−1 B −1 N ΘN N T B −T Θ−1 and C2 = Θ−1 + N T B −T Θ−1 B −1 N. B B B N B It is easy to prove that C1 and C2 are symmetric and positive definite ma- trices. By repeating a similar argument to the one used to derive (3.24) we obtain   φ(ΘB C1 )e0 B   −1 0 φ(P K)e =  φ(ΘN C2 )e0 . (3.25)    N  ∗ We observe that it is not necessary to compute the last component of the vector P −1 Ke0 because Lemma 3.3.3 guarantees that this component does not contribute to ej 2 K.
  • 80. Chapter 3. The PCG Method for the Augmented System 79 Using (3.25) to compute the K-norm of the error (3.21) we obtain φj (P −1 K)e0 2 K = φj (ΘB C1 )e0 B 2 Θ−1 + φj (ΘN C2 )e0 N 2 Θ−1 . (3.26) B N Let us observe that 1/2 1/2 1/2 −1/2 1/2 −1/2 (ΘB C1 )k = ΘB (ΘB C1 ΘB )k ΘB = ΘB (Im + W W T )k ΘB , where (Im + W W T ) is a symmetric and positive definite matrix. 1/2 −1/2 Analogously, we observe that (ΘN C2 )k = ΘN (In−m + W T W )k ΘN , also (In−m + W T W ) is a symmetric and positive definite matrix. Using these facts, the two terms on the right-hand-side of (3.26) can be simplified as follows 1/2 −1/2 0 2 φj (ΘB C1 )e0 B 2 Θ−1 = ΘB φj (Im + W W T )ΘB eB Θ−1 B B −1/2 0 2 = φj (Im + W W T )ΘB eB , 1/2 −1/2 0 2 φj (ΘN C2 )e0 N 2 Θ−1 = ΘN φj (In−m + W T W )ΘN eN Θ−1 N N −1/2 0 2 = φj (In−m + W T W )ΘN eN , From (3.22) we have ej 2 K = φj (P −1 K)e0 2 K, where φj is a polynomial of degree j and φj (0) = 1. So the K-norm error in (3.26) becomes −1/2 0 2 −1/2 0 2 ej 2 K = φj (Im + W W T )ΘB eB + φj (In−m + W T W )ΘN eN .(3.27) That is for every polynomial φj over the set of eigenvalues of Im + W W T and In−m + W T W . Consequently, we can write −1/2 0 2 ej 2 K ≤ min max [φ(λ)]2 ΘB eB φ∈Pj ,φ(0)=1 λ∈Λ(Im +W W T ) −1/2 0 2 + min max [φ(λ)]2 ΘN eN , φ∈Pj ,φ(0)=1 λ∈Λ(In−m +W T W )
  • 81. Chapter 3. The PCG Method for the Augmented System 80 −1/2 0 2 and the claim is proved after substituting ΘB eB = e0 B 2 Θ−1 and B −1/2 0 2 ΘN eN = e0 N 2 Θ−1 . N The K-norm of the error ej = φj (P −1 K)e0 is minimized over the eigen- values of the symmetric positive definite matrices (Im + W W T ) and (In−m + W T W ) so the error term behaves similar to the symmetric positive definite case. The Euclidean norm of the residual is minimized over the eigenvalues of the symmetric positive definite matrix Im + W W T . The following Theorem shows that the residual term displays asymptotic convergence similar to that observed when PCG is applied to positive definite system. Theorem 3.3.5. The residual of the PCG method which is used to solve the augmented system (1.7) preconditioned by P satisfies rj ≤ min max 0 |φ(λ)| rB . (3.28) φ∈Pj ,φ(0)=1 λ∈Λ(Im +W W T ) Proof. The residual satisfies rj = −Kej , and the error can be written as ej = φj (P −1 K)e0 . So we can write the residual as rj = −Kφj (P −1 K)e0 = −φj (KP −1 )Ke0 = φj (KP −1 )r0 .
  • 82. Chapter 3. The PCG Method for the Augmented System 81 Furthermore,   (I + Θ−1 B −1 N ΘN N T B −T )rB B 0 − Θ−1 B −1 N ΘN rN B 0 + Θ−1 B −1 r2 B 0   −1 0 0 KP r =  ,   rN   0 r2 j j j where rj = [rB , rN , r2 ]. The initial residual has the form r0 = [rB , 0, 0] 0 because of using the starting point (3.19), so the previous equation becomes   Θ−1 (ΘB + B −1 N ΘN N T B −T )rB B 0   KP −1 r0 =  . (3.29)   0   0 Let us define C = ΘB + B −1 N ΘN N T B −T . It is easy to prove that C is a symmetric positive definite matrix. By repeating a similar argument to one used to derive (3.29) we obtain   φj (Θ−1 C)rB B 0   rj = φj (KP −1 )r0 =  , (3.30)   0   0 and so rj = φj (Θ−1 C)rB . B 0 (3.31) −1/2 −1/2 −1/2 k 1/2 −1/2 Let us observe that (Θ−1 C)k = ΘB B (ΘB CΘB ) ΘB = ΘB (Im + 1/2 W W T )k ΘB , where Im + W W T is a symmetric positive definite matrix.
  • 83. Chapter 3. The PCG Method for the Augmented System 82 Using these definitions, (3.31) can be written as −1/2 1/2 1/2 rj = ΘB φj (Im + W W T )ΘB rB = φj (Im + W W T )ΘB rB 0 0 Θ−1 . B Therefore, 1/2 rj ≤ min max 0 |φ(λ)| ΘB rB Θ−1 , φ∈Pj ,φ(0)=1 λ∈Λ(Im +W W T ) B 0 1/2 0 and the claim is proved after substituting ΘB rB Θ−1 = rB . B 3.4 Identifying and factorising the matrix B The preconditioner P was derived on the assumption that it should be signifi- cantly cheaper to compute sparse factors of just the matrix B than computing a Cholesky factorisation of the coefficient matrix of the normal equations. Assuming that A has full row rank, we can find an m by m non-singular sub-matrix B. The matrix B is given by the first m linearly independent columns of the matrix A, where the columns of A are those of the constraint matrix A, or- −1 dered by increasing value of θj . The set of columns forming B is identified by applying Gaussian elimination to the matrix A, as described below. Al- though this yields an LU factorisation of B, the factorisation is not efficient with respect to sparsity and its use in subsequent PCG iterations would be costly. This potential cost is reduced significantly by using the Tomlin matrix inversion procedure [69] to determine the factorisation of B for use in PCG iterations. The Tomlin procedure is a relatively simple method of triangular- isation and factorisation that underpins the highly efficient implementation of the revised simplex method described by Hall and McKinnon [41]. Since
• 84. Chapter 3. The PCG Method for the Augmented System 83 the matrix B is analogous to a simplex basis matrix, the use of the Tomlin procedure in this thesis is expected to be similarly advantageous.

3.4.1 Identifying the columns of B via Gaussian elimination

When applying Gaussian elimination to the matrix A in order to identify the set of columns forming B, it is important to stress that the matrix A is not updated when elimination operations are identified. The linear independence of a particular column of A, with respect to the columns already in B, is determined as follows. Suppose that k columns of B have been determined and let L_k be the current lower triangular matrix of elimination multipliers. Let a_q be the first column of A that has not yet been considered for inclusion in B. The system L_k â_q = a_q is solved and the entries of the pivotal column â_q are scanned for a good pivotal value. At each step of Gaussian elimination the entries of the pivotal column are divided by the pivot, so it is necessary to choose a pivot of large magnitude. Usually the pivot is chosen to be the coefficient of maximum magnitude among the coefficients of the pivotal column. On the other hand, the choice of pivot also plays an important role in terms of sparsity. We therefore consider a pivot to be good if it has an acceptably large magnitude and a relatively small row count. If there are no acceptable pivots, indicating that a_q is linearly dependent on the columns already in B, then a_q is discarded. Otherwise, a pivot is chosen and a_q is added to the set of columns forming B. At least m systems of the form L_k â_q = a_q must be solved in order to identify all the columns of B. For some problems, a comparable number of linearly dependent columns of A are encountered before a complete basis
• 85. Chapter 3. The PCG Method for the Augmented System 84 is formed. Thus the efficiency with which L_k â_q = a_q is solved is crucial. Additionally, the ill-conditioning of B may lead to PCG being prohibitively expensive. This issue of efficiency is addressed in the following two ways. Firstly, in order to reduce the number of nonzeros in the matrices L_k, the pivotal entry in â_q is selected from the set of acceptable pivots on grounds of sparsity. If the matrix A were updated with respect to elimination operations, then the acceptable pivot of minimum row count could be chosen. Since this is not known, a set of approximate row counts is maintained and used to discriminate between acceptable pivots. This set of approximate row counts is initialised to be correct and then, as elimination operations are identified, updated according to the maximum fill-in that could occur were A to be updated. (The row counts are initially the numbers of nonzero entries in the rows of A. At each step of Gaussian elimination the row counts are then updated approximately: they are increased when fill-in occurs, but they are not decreased when cancellations occur. Consequently, the same entry may be counted more than once if it is removed and later created again. In practice, however, there is little advantage in checking for cancellations and keeping a list of the cancelled entries.) Secondly, since a_q is sparse, consideration is given to the likelihood that â_q is also sparse. This is trivially the case when k = 0 since â_q = a_q. Since the columns of L_k are subsets of the entries in pivotal columns, it follows that for small values of k, â_q will remain sparse. For some important classes of LP problems, this property holds for all k and is analogous to what Hall and McKinnon term hyper-sparsity [41]. Techniques for exploiting hyper-sparsity when forming â_q, analogous to those described in [41], have been used when computing the preconditioner and have led to significant improvements in computational performance.
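To make the selection loop above concrete, the following Python sketch scans the columns of A in order of increasing θ_j^{-1}, emulates the solve with L_k by applying the eliminations associated with the previously accepted pivots, and accepts a pivot only if it is acceptably large, breaking ties by row count. It is a dense simplification under our own naming (select_basis_columns, drop_tol, pivot_rel_tol): the actual implementation works with sparse elimination multipliers, updates the approximate row counts as fill-in is identified, and exploits hyper-sparsity when forming â_q.

```python
import numpy as np

def select_basis_columns(A, theta, drop_tol=1e-10, pivot_rel_tol=0.1):
    """Dense sketch of the column-selection loop of Section 3.4.1."""
    m, n = A.shape
    order = np.argsort(1.0 / theta)             # columns with large theta_j first
    row_counts = np.count_nonzero(A, axis=1)    # row counts (held fixed in this sketch)
    eliminated = []                             # pairs (w, p): transformed column, pivot row
    basis_cols = []
    for q in order:
        if len(basis_cols) == m:
            break
        a_hat = A[:, q].astype(float)
        # emulate the solve L_k a_hat = a_q: apply the previous eliminations
        for w, p in eliminated:
            a_hat = a_hat - (a_hat[p] / w[p]) * w
        pivoted = {p for _, p in eliminated}
        free = [i for i in range(m) if i not in pivoted]
        biggest = max((abs(a_hat[i]) for i in free), default=0.0)
        if biggest <= drop_tol * max(1.0, np.linalg.norm(A[:, q])):
            continue                            # a_q depends on the accepted columns
        acceptable = [i for i in free if abs(a_hat[i]) >= pivot_rel_tol * biggest]
        p = min(acceptable, key=lambda i: row_counts[i])    # sparsity tie-break
        eliminated.append((a_hat, p))
        basis_cols.append(q)
    return basis_cols
```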
• 86. Chapter 3. The PCG Method for the Augmented System 85

Tomlin invert

We apply the Tomlin matrix inversion procedure to the matrix B to determine a sparser LU factorisation of B. The active sub-matrix at any time in the Tomlin procedure consists of those rows and columns in which a pivot has not yet been found; initially it is the whole matrix B. The Tomlin procedure has the following steps:
1. Find any identity columns of the matrix B and eliminate these columns and their corresponding rows from the active sub-matrix.
2. Find any singleton row in the active sub-matrix and eliminate it together with the corresponding column. Store the column of the singleton row in the matrix L. Repeat this step to find all singleton rows in the active sub-matrix.
3. Find any singleton column in the active sub-matrix and eliminate it together with the corresponding row from the active sub-matrix. Store the singleton column in the matrix U. Repeat this step to find all singleton columns in the active sub-matrix.
4. Repeat steps 2 and 3 until there are no more singleton rows or columns.
5. If the active sub-matrix is empty then stop. Otherwise, move to the next step.
6. Apply Gaussian elimination to the remaining active sub-matrix.
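The six steps above can be sketched at the level of the sparsity pattern as follows. This is only an illustration under our own naming (tomlin_order and the "identity"/"L"/"U" labels); the real procedure also forms the numerical entries of the L and U factors and must guard against unstable pivots in the final Gaussian elimination step.

```python
import numpy as np

def tomlin_order(B, tol=1e-12):
    """Pattern-level sketch of the Tomlin triangularisation of the basis B.
    Returns the pivot sequence (row, column, destination) and the rows and
    columns of the remaining bump left to ordinary Gaussian elimination."""
    m = B.shape[0]
    nz = np.abs(B) > tol
    active_rows, active_cols = set(range(m)), set(range(m))
    pivots = []
    # Step 1: columns of B that are columns of the identity matrix
    for j in range(m):
        rows = np.flatnonzero(nz[:, j])
        if rows.size == 1 and abs(B[rows[0], j] - 1.0) <= tol:
            pivots.append((int(rows[0]), j, "identity"))
            active_rows.discard(int(rows[0])); active_cols.discard(j)
    # Steps 2-4: alternate between singleton rows (pivot column stored in L)
    # and singleton columns (pivot column stored in U) until none remain
    changed = True
    while changed:
        changed = False
        for i in sorted(active_rows):                       # step 2
            cols = [j for j in active_cols if nz[i, j]]
            if len(cols) == 1:
                pivots.append((i, cols[0], "L"))
                active_rows.discard(i); active_cols.discard(cols[0])
                changed = True
        for j in sorted(active_cols):                       # step 3
            rows = [i for i in active_rows if nz[i, j]]
            if len(rows) == 1:
                pivots.append((rows[0], j, "U"))
                active_rows.discard(rows[0]); active_cols.discard(j)
                changed = True
    # Steps 5-6: whatever is left forms the bump, factorised by Gaussian elimination
    return pivots, (sorted(active_rows), sorted(active_cols))
```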
  • 87. Chapter 4 Inexact Interior Point Method The consequence of using an iterative method to solve the linear system which arises from IPMs, is solving the KKT system approximately. In this case, the Newton method (1.4) is solved approximately. So instead of (1.4) we have the following system. F (tk )∆tk = −F (tk ) + rk , (4.1) where rk is the residual of the inexact Newton method. Any approximate step is accepted provided that the residual rk is small such as rk ≤ ηk F (tk ) , (4.2) as required by the theory [20, 47]. We refer to the term ηk as the forcing term. The original content of this chapter has already appeared in [1], co- authored with Jacek Gondzio. The idea behind inexact interior point algorithms is to derive a stopping 86
• 88. Chapter 4. Inexact Interior Point Method 87 criterion for the iterative linear system solver that minimizes the computational effort involved in computing the search directions while still guaranteeing global convergence [5]. We use the PCG method to solve the augmented system (1.7) preconditioned by the block triangular matrix P (3.7). As a result, the search directions are computed approximately. This makes it necessary to rethink the convergence of interior point algorithms, whose convergence is usually proved under the assumption that the search directions are calculated exactly. In this chapter we focus on one interior point algorithm, the infeasible path-following algorithm. To prove the convergence of the inexact infeasible path-following algorithm (the IIPF algorithm) we first establish the convergence of the PCG method applied to the indefinite system (1.7), and then prove the convergence of the IIPF algorithm itself. In the previous chapter we proved that the PCG method applied to the indefinite system (1.7), preconditioned with (3.7) and initialized with an appropriate starting point (3.19), converges in a similar way to the case of applying PCG to a positive definite system. In this chapter we show that applying PCG to solve (1.7) with the preconditioner (3.7) can be analysed using the classical framework of the inexact Newton method (4.1). The use of inexact Newton methods in interior point methods for LP was investigated in [5, 6, 16, 29, 58, 59]. In [5] the convergence of the infeasible interior point algorithm of Kojima, Megiddo and Mizuno is proved under the assumption that the iterates are bounded. Monteiro and O'Neal [59] present the convergence analysis of an inexact infeasible long-step primal-dual algorithm and give complexity results for this method. In [59] the PCG method is used to solve the normal equations preconditioned with a sparse preconditioner. The proposed preconditioner was inspired by the Maximum Weight Basis
• 89. Chapter 4. Inexact Interior Point Method 88 Algorithm developed in [64]. In [7] an inexact interior point method for semidefinite programming is presented; it allows the linear system to be solved to low accuracy when the current iterate is far from the solution. In [50] the convergence analysis of an inexact infeasible primal-dual path-following algorithm for convex quadratic programming is presented. In these papers the search directions are inexact because the PCG method is used to solve the normal equations. Korzak [49] proves the convergence of the inexact infeasible interior point algorithm of Kojima, Megiddo and Mizuno for LP; this holds for search directions computed approximately by any iterative solver, and the convergence is proved under the assumption that the iterates are bounded. Furthermore, in [82] Zhou and Toh show that the primal-dual inexact infeasible interior point algorithm can find an ε-approximate solution of a semidefinite program in O(n² ln(1/ε)) iterations. This also holds for search directions computed approximately by any iterative solver, without assuming boundedness of the iterates, because the residuals are required to satisfy specific conditions, one of which depends on the smallest singular value of the constraint matrix. In order to provide a complexity result for inexact infeasible interior point methods, one has to find an upper bound on |Δx^T Δs| at each iteration of the IPM. In [50] the authors change the neighbourhood of the interior point algorithm for QP; the same approach is used to find a bound on |Δx^T Δs| in [59]. However, that does not work in the LP case. The authors assume that there is a point (x̄, ȳ, s̄) for which the residual of the infeasible primal-dual algorithm is zero (that is, (x̄, ȳ, s̄) is primal-dual feasible) and that there is a strictly positive point (x^0, y^0, s^0) such that (x^k, y^k, s^k) = ρ(x^0, y^0, s^0), where ρ ∈ [0, 1], and also (x^0, s^0) ≥ (x̄, s̄). These conditions are restrictive and do not always hold. In [6, 7] the inexactness comes from solving the
• 90. Chapter 4. Inexact Interior Point Method 89 normal equation system iteratively; in order to find a bound on |Δx^T Δs|, the authors bound the normal equations matrix. In [82], by contrast, the authors force the residual to satisfy specific conditions, one of which depends on the smallest singular value of the constraint matrix. In our case we do not require the residual of the inexact Newton method to satisfy a sophisticated condition. The condition on the residual is simply ‖r^k‖ ≤ η_k μ_k. This condition allows low accuracy when the current iterate is far from the solution and demands high accuracy as the interior point method approaches optimality, because the term μ_k decreases as the iterates move toward the solution. Furthermore, we use a residual-shifting strategy, which makes the proof of convergence and the complexity result of the inexact infeasible path-following algorithm follow the exact case. In this chapter we study the convergence of the inexact infeasible path-following algorithm for linear programming when the PCG method is used to solve the augmented system preconditioned with the block triangular sparse preconditioner. We prove global convergence and a complexity result for this method without having to assume boundedness of the iterates. We design a suitable stopping criterion for the PCG method; this plays an important role in the overall convergence of the IIPF algorithm, and it allows low accuracy when the current iterate is far from the solution. We also state conditions on the forcing term of the inexact Newton method under which the convergence of the IIPF algorithm can be proved. The inexact approach in this thesis can be used whenever the augmented system is solved iteratively, provided that the residual of the iterative method has a zero block, r = [r_1, 0]; it therefore carries over to settings such as [65], for example.
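The residual condition ‖r^k‖ ≤ η_k μ_k is deliberately simple, and its effect is easy to see: the accuracy demanded of the iterative solver tightens automatically as μ_k decreases. The snippet below is only an illustration (the norm actually monitored in the implementation is ‖X_B r_B‖_∞, as derived in Section 4.1); the function and variable names are ours.

```python
def accept_inexact_direction(residual_norm, mu, eta):
    """Acceptance test for the inexact Newton direction used in this chapter:
    the iterative solver may stop as soon as ||r^k|| <= eta_k * mu_k."""
    return residual_norm <= eta * mu

# The implied tolerance tightens as the average complementarity gap falls:
eta = 0.1
for mu in (1e0, 1e-2, 1e-4, 1e-6):
    print(f"mu = {mu:7.1e}  ->  required ||r|| <= {eta * mu:7.1e}")
```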
• 91. Chapter 4. Inexact Interior Point Method 90

4.1 The residual of inexact Newton method

Using the PCG method to solve the augmented system (1.7) produces a specific value of the residual of the inexact Newton method (4.1). We shall therefore determine the residual r in (4.1) in order to prove the convergence of the inexact infeasible path-following algorithm and provide a complexity result. Solving (1.7) approximately gives
\[
\begin{bmatrix} -\Theta^{-1} & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
=
\begin{bmatrix} f \\ g \end{bmatrix}
+
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix},
\tag{4.3}
\]
where r_1 = [r_B, r_N]. That gives the following equations:
\[
-X^{-1} S \Delta x + A^T \Delta y = f + r_1 = c - A^T y - \sigma\mu X^{-1} e + r_1,
\tag{4.4}
\]
\[
A \Delta x = g + r_2 = b - A x + r_2.
\tag{4.5}
\]
Then we find ∆s by substituting ∆x in (1.6). However, we can shift the residual from (4.4) to (1.6) by assuming there is a residual h while computing ∆s. Then (1.6) is replaced by
\[
\Delta s = -X^{-1} S \Delta x - s + \sigma\mu X^{-1} e + h,
\]
which we can rewrite as
\[
-X^{-1} S \Delta x = \Delta s + s - \sigma\mu X^{-1} e - h.
\]
• 92. Chapter 4. Inexact Interior Point Method 91 Substituting this in (4.4) gives
\[
A^T \Delta y + \Delta s = c - A^T y - s + h + r_1.
\]
To satisfy the second equation of (1.5) we choose h = −r_1. This gives
\[
A^T \Delta y + \Delta s = c - A^T y - s,
\tag{4.6}
\]
and ∆s = −X^{−1}S∆x − s + σµX^{−1}e − r_1, which implies
\[
S \Delta x + X \Delta s = -XSe + \sigma\mu e - X r_1.
\tag{4.7}
\]
Equations (4.5), (4.6) and (4.7) give
\[
\begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \\ S & 0 & X \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \\ \Delta s \end{bmatrix}
=
\begin{bmatrix} \xi_p \\ \xi_d \\ \xi_\mu \end{bmatrix}
+
\begin{bmatrix} r_2 \\ 0 \\ -X r_1 \end{bmatrix},
\]
where ξ_p = b − Ax, ξ_d = c − A^T y − s, ξ_µ = −XSe + σµe and σ ∈ [0, 1]. In the setting in which we apply the PCG method to solve (1.7) preconditioned with (3.7) we have r_2 = 0 and r_1 = [r_B, 0]; see equation (3.30) in the proof of Theorem 3.3.5. Therefore, the inexact Newton method residual
• 93. Chapter 4. Inexact Interior Point Method 92 r is
\[
r = \begin{bmatrix} 0 \\ 0 \\ -X r_1 \end{bmatrix}
\qquad\text{with}\qquad
X r_1 = \begin{bmatrix} X_B r_B \\ X_N r_N \end{bmatrix} = \begin{bmatrix} X_B r_B \\ 0 \end{bmatrix}.
\]
Shifting the residual from (4.4) to (1.6) is an essential step in proving the convergence of the IIPF algorithm. It moves the residual from the second row to the last row of the inexact Newton system, which makes the proof of convergence of the IIPF Algorithm much easier, as we will see in Section 4.2. The issue of choosing the stopping criterion of the inexact Newton method so that condition (4.2) is satisfied has been discussed in many papers; see for example [5, 6, 7, 49, 82]. In [5, 6] the residual of the inexact Newton method is chosen such that ‖r^k‖ ≤ η_k µ_k, while in [7] the choice satisfies ‖r^k‖ ≤ η_k (nµ_k). Let the residual be r = [r_p, r_d, r_µ]. According to Korzak [49], the residual
  • 94. Chapter 4. Inexact Interior Point Method 93 is chosen such that k rp 2 ≤ (1 − τ1 ) Axk − b 2 , k rd 2 ≤ (1 − τ2 ) AT y k + sk − c 2 , k rµ ∞ ≤ τ 3 µk . where τ1 , τ2 ∈ (0, 1] and τ3 ∈ [0, 1) are some appropriately chosen constants. In our case rp = rd = 0, we will stop the PCG algorithm when k rµ ∞ ≤ ηk µk . As rµ = −X k r1 and r1 = [rB , 0], the stopping criteria becomes k k k k XB rB ∞ ≤ ηk µk . (4.8) We terminate the PCG algorithm when the stopping criteria (4.8) is sat- isfied. This stopping criteria allows a low accuracy when the current iterate is far from the solution. In the later iterations the accuracy increases because the average complementarity gap µ reduces from one iteration to another. 4.2 Convergence of the IIPF Algorithm In this section we carry out the proof of the convergence of the IIPF algorithm and derive a complexity result. In the previous section we used the shifting residual strategy, which makes the proof of the convergence of this inexact algorithm similar to that of the exact case. This section is organised as follows. First we describe the IIPF algorithm. Then in Lemmas 4.2.1, 4.2.2 and 4.2.3 we derive useful bounds on the iterates. In Theorems 4.2.4 and 4.2.5 we prove that there is a step length α such that
  • 95. Chapter 4. Inexact Interior Point Method 94 the new iteration generated by IIPF algorithm belongs to the neighbourhood N−∞ (γ, β) and the average complementarily gap decreases. In order to prove that we supply conditions on the forcing term ηk . In Theorem 4.2.6 we show that the sequence {µk } converges Q-linearly to zero and the normal residual k k sequence { (ξp , ξd ) } converges R-linearly to zero. Finally in Theorem 4.2.7, we provide the complexity result for this algorithm. Definition: The central path neighbourhood N−∞ (γ, β) is defined by N−∞ (γ, β) = {(x, y, s) : (ξp , ξd ) /µ ≤ β (ξp , ξd ) /µ0 , (x, s) > 0, 0 0 (4.9) xi si ≥ γµ, i = 1, 2, ..., n}, where γ ∈ (0, 1) and β ≥ 1 [77]. 4.2.1 Inexact Infeasible Path-Following Algorithm 1. Given γ, β, σmin , σmax with γ ∈ (0, 1), β ≥ 1, 0 < σmin < σmax < 0.5, and 0 < ηmin < ηmax < 1; choose (x0 , y 0 , s0 ) with (x0 , s0 ) > 0; 2. For k = 0, 1, 2, ... • choose σk ∈ [σmin , σmax ] and ηk ∈ [ηmin , ηmax ] such that σk (1−γ) ηk < (1+γ) and ηk + σk < 0.99; and solve        k k A 0 0 ∆x ξp 0         0 AT I k  =  k − . (4.10)        ∆y   ξd 0        Sk 0 X k ∆sk σ k µk e − X k S k e X k r1 k
  • 96. Chapter 4. Inexact Interior Point Method 95 k Such that rN = 0 and k k XB rB ∞ ≤ ηk µk , (4.11) • choose αk as the largest value of α in [0, 1] such that (xk (α), y k (α), sk (α)) ∈ N−∞ (γ, β) (4.12) and the following Armijo condition holds: µk (α) ≤ (1 − .01α)µk ; (4.13) • set (xk+1 , y k+1 , sk+1 ) = (xk (αk ), y k (αk ), sk (αk )); • stop when µk < , for a small positive constant . In this section we will follow the convergence analysis of the infeasible path-following algorithm proposed originally by Zhang [81]. However, we will follow the proof techniques proposed in Wright’s book [77]. Firstly, let us introduce the quantity k−1 νk = (1 − αj ), ν0 = 1 j=0 Note that ξp = b − Axk+1 = b − A(xk + αk ∆xk ) = b − Axk − αk A∆xk = k+1 ξp − αk A∆xk , from the first row of (4.10) we get k k+1 k ξp = (1 − αk )ξp , (4.14)
  • 97. Chapter 4. Inexact Interior Point Method 96 which implies k 0 ξp = νk ξp . k+1 Note also ξd = c − AT y k+1 − sk+1 = c − AT (y k + αk ∆y k ) − (sk + αk ∆sk ) = (c − AT y k − sk ) − αk (AT ∆y k + ∆sk ) = ξd − αk (AT ∆y k + ∆sk ). From the k second row of (4.10) we get k+1 k ξd = (1 − αk )ξd , (4.15) which implies k 0 ξd = νk ξd , Consequently, the quantity νk satisfies µk νk ≤ β . µ0 More details can be found in [77]. Let (x∗ , y ∗ , s∗ ) be any primal-dual solution. Lemma 4.2.1. Assume that (xk , y k , sk ) ∈ N−∞ (γ, β), (∆xk , ∆y k , ∆sk ) sat- isfies (4.10) and (4.11) for all k ≥ 0, and µk ≤ (1 − .01αk−1 )µk−1 for all k ≥ 1. Then there is a positive constant C1 such that for all k ≥ 0 νk (xk , sk ) ≤ C1 µk , (4.16) where C1 is given as C1 = ζ −1 (nβ + n + β (x0 , s0 ) ∞ (x∗ , s∗ ) 1 /µ0 ),
  • 98. Chapter 4. Inexact Interior Point Method 97 where ζ = min min(x0 , s0 ). i i i=1,...,n The proof of this Lemma is similar to the proof of Lemma 6.3 in [77]. Moreover, we follow the same logic as in [77] to prove the following lemma. Lemma 4.2.2. Assume that (xk , y k , sk ) ∈ N−∞ (γ, β), (∆xk , ∆y k , ∆sk ) sat- isfies (4.10) and (4.11) for all k ≥ 0, and µk ≤ (1 − .01αk−1 )µk−1 for all k ≥ 1. Then there is a positive constant C2 such that 1/2 D−1 ∆xk ≤ C2 µk , (4.17) 1/2 D∆sk ≤ C2 µk , (4.18) where D = X 1/2 S −1/2 . For all k ≥ 0. Proof. For simplicity we omit the iteration index k in the proof. Let (¯, y , s) = (∆x, ∆y, ∆s) + νk (x0 , y 0 , s0 ) − νk (x∗ , y ∗ , s∗ ). x ¯ ¯ Then A¯ = 0 and AT y + s = 0, which implies xT s = 0. x ¯ ¯ ¯ ¯ A¯ = 0 because x A¯ = A∆x + νk Ax0 − νk Ax∗ = ξp + νk Ax0 − νk b = ξp − νk ξ0 = 0. x Similarly one can show that AT y + s = 0. Hence ¯ ¯ 0 = xT s = (∆x + νk x0 − νk x∗ )T (∆s + νk s0 − νk s∗ ). ¯ ¯ (4.19)
  • 99. Chapter 4. Inexact Interior Point Method 98 Using the last row of (4.10) implies S(∆x + νk x0 − νk x∗ ) + X(∆s + νk s0 − νk s∗ ) = S∆x + X∆s + νk S(x0 − x∗ ) + νk X(s0 − s∗ ) = −XSe + σµe − Xr1 + νk S(x0 − x∗ ) + νk X(s0 − s∗ ). By multiplying this system by (XS)−1/2 , we get D−1 (∆x + νk x0 − νk x∗ ) + D(∆s + νk s0 − νk s∗ ) = (XS)−1/2 (−XSe + σµe − Xr1 ) + νk D−1 (x0 − x∗ ) + νk D(s0 − s∗ ). The equality (4.19) gives D−1 (∆x + νk x0 − νk x∗ ) + D(∆s + νk s0 − νk s∗ ) 2 = D−1 (∆x + νk x0 − νk x∗ ) 2 + D(∆s + νk s0 − νk s∗ ) 2 . Consequently, D−1 (∆x + νk x0 − νk x∗ ) 2 + D(∆s + νk s0 − νk s∗ ) 2 (4.20) = (XS)−1/2 (−XSe + σµe − Xr1 ) + νk D−1 (x0 − x∗ ) + νk D(s0 − s∗ ) 2 , which leads to D−1 (∆x + νk x0 − νk x∗ ) ≤ (XS)−1/2 (−XSe + σµe − Xr1 ) +νk D−1 (x0 − x∗ ) + νk D(s0 − s∗ ) ≤ (XS)−1/2 (−XSe + σµe − Xr1 ) +νk D−1 (x0 − x∗ ) + νk D(s0 − s∗ ) . The triangle inequality and addition of an extra term νk D(s0 − s∗ ) to the
  • 100. Chapter 4. Inexact Interior Point Method 99 right hand side give D−1 ∆x ≤ (XS)−1/2 [−XSe + σµe − Xr1 ] + 2νk D−1 (x0 − x∗ ) (4.21) +2νk D(s0 − s∗ ) . (4.20) leads to D(∆s + νk s0 − νk s∗ ) ≤ (XS)−1/2 (−XSe + σµe − Xr1 ) + νk D−1 (x0 − x∗ ) +νk D(s0 − s∗ ) ≤ (XS)−1/2 (−XSe + σµe − Xr1 ) + νk D−1 (x0 − x∗ ) +νk D(s0 − s∗ ) . The triangle inequality and addition of an extra term νk D−1 (x0 − x∗ ) to the right hand side give D∆s ≤ (XS)−1/2 [−XSe + σµe − Xr1 ] + 2νk D−1 (x0 − x∗ ) (4.22) +2νk D(s0 − s∗ ) . We can write n −1/2 2 (−xi si + σµ − xi r1,i )2 (XS) (−XSe + σµe − Xr1 ) = i=1 x i si 2 − XSe + σµe − Xr1 1 ≤ ≤ − XSe + σµe − Xr1 2 . mini xi si γµ because (x, y, s) ∈ N−∞ (γ, β) which implies xi si ≥ γµ for i = 1, ..., n. On the other hand, 2 2 2 − XSe + σµe = XSe + σµe − 2σµeT XSe = XSe 2 + nσ 2 µ2 − 2nσµ2 2 ≤ XSe 1 + nσ 2 µ2 − 2nσµ2 = (xT s)2 + nσ 2 µ2 − 2nσµ2 ≤ n2 µ2 + nσ 2 µ2 − 2nσµ2 ≤ n2 µ2 ,
  • 101. Chapter 4. Inexact Interior Point Method 100 as σ ∈ (0, 1). This leads to − XSe + σµe − Xr1 ≤ − XSe + σµe + Xr1 √ √ ≤ nµ + n XB rB ∞ ≤ nµ + nηµ √ ≤ nµ + nηmax µ, which implies the following √ (XS)−1/2 (−XSe + σµe − Xr1 ) ≤ γ −1/2 (n + nηmax )µ1/2 . (4.23) On the other hand νk D−1 (x0 − x∗ ) + νk D(s0 − s∗ ) (4.24) ≤ νk ( D−1 + D ) max( x0 − x∗ , s0 − s∗ ). For the matrix norm D−1 , we have −1 D−1 ≤ max Dii = D−1 e ∞ = (XS)−1/2 Se ∞ ≤ (XS)−1/2 s 1, i and similarly D ≤ (XS)−1/2 x 1. Using Lemma 4.2.1 and (4.24) we get νk D−1 (x0 − x∗ ) + νk D(s0 − s∗ ) ≤ νk (x, s) 1 (XS)−1/2 max( x0 − x∗ , s0 − s∗ ) ≤ C1 γ −1/2 µ1/2 max( x0 − x∗ , s0 − s∗ ). By substituting the previous inequality and (4.23) in (4.21) and (4.22)
  • 102. Chapter 4. Inexact Interior Point Method 101 we get √ D−1 ∆x ≤ (γ −1/2 (n + nηmax ) + 2C1 γ −1/2 max( x0 − x∗ , s0 − s∗ ))µ1/2 and √ D∆s ≤ (γ −1/2 (n + nηmax ) + 2C1 γ −1/2 max( x0 − x∗ , s0 − s∗ ))µ1/2 . Let us define C2 as √ C2 = γ −1/2 (n + nηmax ) + 2C1 γ −1/2 max( x0 − x∗ , s0 − s∗ ). which completes the proof. Lemma 4.2.3. Assume that (xk , y k , sk ) ∈ N−∞ (γ, β), (∆xk , ∆y k , ∆sk ) sat- isfies (4.10) and (4.11) for all k ≥ 0, and µk ≤ (1 − .01αk−1 )µk−1 for all k ≥ 1. Then there is a positive constant C3 such that |(∆xk )T ∆sk | ≤ C3 µk , (4.25) |∆xk ∆sk | ≤ C3 µk i i (4.26) for all k ≥ 0. Proof. For simplicity we omit the iteration index k in the proof. From Lemma 4.2.2 we have |∆xT ∆s| = |(D−1 ∆x)T (D∆s)| ≤ D−1 ∆x 2 D∆s ≤ C2 µ.
  • 103. Chapter 4. Inexact Interior Point Method 102 Moreover, using Lemma 4.2.2 again we obtain −1 −1 |∆xi ∆si | = |Dii ∆xi Dii ∆si | = |Dii ∆xi ||Dii ∆si | ≤ D−1 ∆x D∆s ≤ C2 µ. 2 2 Let us denote C3 = C2 , and the proof is complete. Theorem 4.2.4. Assume that (xk , y k , sk ) ∈ N−∞ (γ, β), (∆xk , ∆y k , ∆sk ) satisfies (4.10) and (4.11) for all k ≥ 0, and µk ≤ (1 − .01αk−1 )µk−1 for all k ≥ 1. Then there is a value α ∈ (0, 1) such that the following three ¯ conditions are satisfied for all α ∈ [0, α] for all k ≥ 0 ¯ (xk + α∆xk )T (sk + α∆sk ) ≥ (1 − α)(xk )T sk (4.27) γ k (xk + α∆xk )(sk + α∆sk ) ≥ i i i i (x + α∆xk )T (sk + α∆sk ) (4.28) n (xk + α∆xk )T (sk + α∆sk ) ≤ (1 − .01α)(xk )T sk . (4.29) Proof. For simplicity we omit the iteration index k in the proof. The last row of the system (4.10) implies sT ∆x + xT ∆s = −xT s + nσµ − xT rB , B and si ∆xi + xi ∆si = −xi si + σµ − xi r1,i
  • 104. Chapter 4. Inexact Interior Point Method 103 which leads to (x + α∆x)T (s + α∆s) = xT s + α(xT ∆s + sT ∆x) + α2 (∆x)T ∆s = xT s + α(−xT s + nσµ − xT rB ) + α2 (∆x)T ∆s B = (1 − α)xT s + nασµ − αxT rB + α2 (∆x)T ∆s. B Similarly (xi + α∆xi )(si + α∆si ) = xi si + α(si ∆xi + xi ∆si ) + α2 ∆xi ∆si = xi si + α(−xi si + σµ − xi r1,i ) + α2 ∆xi ∆si = (1 − α)xi si + ασµ − αxi r1,i + α2 ∆xi ∆si . For (4.27) we have (x + α∆x)T (s + α∆s) − (1 − α)xT s = (1 − α)xT s + nασµ − αxT rB B +α2 (∆x)T ∆s − (1 − α)xT s = nασµ − αxT rB + α2 (∆x)T ∆s B ≥ nασµ − α|xT rB | − α2 |(∆x)T ∆s| B ≥ nασµ − nαηµ − α2 C3 µ where we used the fact that from (4.11) we have |xT rB | ≤ n XB rB B ∞ ≤ nηµ. Therefore, the condition (4.27) holds for all α ∈ [0, α1 ], where α1 is given by n(σ − η) α1 = , (4.30) C3 and we choose η < σ − ε1 to guarantee α1 to be strictly positive, where ε1 is
  • 105. Chapter 4. Inexact Interior Point Method 104 a constant strictly greater than zero. Let us consider (4.28) γ (xi + α∆xi )(si + α∆si ) − n (x + α∆x)T (s + α∆s) = (1 − α)xi si + ασµ γ −αxi r1,i + α2 ∆xi ∆si − n ((1 − α)xT s + nασµ − αxT rB + α2 (∆x)T ∆s) B because (x, y, s) ∈ N−∞ (γ, β), so xi si ≥ γµ, ∀i = 1, ..., n, that gives γ (xi + α∆xi )(si + α∆si ) − n (x + α∆x)T (s + α∆s) ≥ (1 − α)γµ + ασµ γ γ −α maxi xi r1,i − α2 |∆xi ∆si | − γ(1 − α)µ − γασµ + n αxT rB − n α2 (∆x)T ∆s B γ γ ≥ ασµ − α XB rB ∞ − α2 C3 µ − ασγµ − n α|xT rB | − n α2 C3 µ ≥ ασµ − αηµ B γ −α2 C3 µ − ασγµ − γαηµ − n α2 C3 µ ≥ α((1 − γ)σ − η(1 + γ))µ − 2α2 C3 µ Condition (4.28) holds for all α ∈ [0, α2 ], where α2 is given by: σ(1 − γ) − (1 + γ)η α2 = . (4.31) 2C3 σ(1−γ) We choose η < (1+γ) − ε2 to guarantee α2 to be strictly positive, where ε2 is a constant strictly greater than zero. Finally, let us consider condition (4.29) 1 n [(x + α∆x)T (s + α∆s) − (1 − .01α)xT s] = 1 = n [(1 − α)xT s + nασµ − αxT rB + α2 (∆x)T ∆s − (1 − .01α)xT s] B 1 = n [−.99αxT s + nασµ − αxT rB + α2 (∆x)T ∆s] B α2 α2 ≤ −.99αµ + ασµ + α |xT rB | + n B n 3 Cµ ≤ −.99αµ + ασµ + αηµ + n 3 C µ. We can conclude that condition (4.29) holds for all α ∈ [0, α3 ], where α3 is given by: n(0.99 − σ − η) α3 = . (4.32) C3
  • 106. Chapter 4. Inexact Interior Point Method 105 We choose η and σ such that η + σ < 0.99 − ε3 to guarantee α3 to be strictly positive, where ε3 is a constant strictly greater than zero. Combining the bounds (4.30), (4.31) and (4.32), we conclude that condi- tions (4.27), (4.28) and (4.29) hold for α ∈ [0, α], where ¯ n(σ − η) σ(1 − γ) − (1 + γ)η n(0.99 − σ − η) α = min 1, ¯ , , . (4.33) C3 2C3 C3 We introduce the constants ε1 , ε2 and ε3 to guarantee that the limit of the step length α is strictly greater than zero and to make it flexible to choose ¯ the parameters ηk and σk . σ(1−γ) (1−γ) Note that if η < (1+γ) then η < σ because (1+γ) < 1 for any γ ∈ (0, 1). From this theorem we observe that the forcing term ηk should be chosen σk (1−γ) such that the following two conditions ηk < (1+γ) −ε2 and ηk +σk < 0.99−ε3 are satisfied. Under these assumption the following theorem guarantees that there is a step length α such that the new point belongs to the neighbour- hood N−∞ (γ, β) and its average complementarity gap decreases according to condition (4.13). Below we prove two theorems using standard techniques which follow from Wright [77]. σk (1−γ) Theorem 4.2.5. Assume that ηk < (1+γ) − ε2 , ηk + σk < 0.99 − ε3 for ε2 , ε3 > 0, (xk , y k , sk ) ∈ N−∞ (γ, β) and (∆xk , ∆y k , ∆sk ) satisfies (4.10) and (4.11) for all k ≥ 0, µk ≤ (1 − .01αk−1 )µk−1 for all k ≥ 1. Then (xk (α), y k (α), sk (α)) ∈ N−∞ (γ, β) and µk (α) ≤ (1−.01α)µk for all α ∈ [0, α], ¯ where α is given by (4.33). ¯ Proof. Theorem 4.2.4 ensures that the conditions (4.27), (4.28) and (4.29) are satisfied. Note that (4.29) implies that the condition µk (α) ≤ (1−.01α)µk
  • 107. Chapter 4. Inexact Interior Point Method 106 is satisfied, while (4.28) guarantees that xk (α)sk (α) ≥ γµk (α). i i To prove that (xk (α), y k (α), sk (α)) ∈ N−∞ (γ, β), we have to prove that k k 0 0 (ξp (α), ξd (α)) /µk (α) ≤ β (ξp , ξd ) /µ0 . From (4.14), (4.15) and (4.27) we have k k (ξp (α),ξd (α)) k k (1−α) (ξp ,ξd ) k k (1−α) (ξp ,ξd ) k k (ξp ,ξd ) µk (α) = µk (α) ≤ (1−α)µk ≤ µk (ξ 0 ,ξ 0 ) ≤ β p 0d , µ since (xk , y k , sk ) ∈ N−∞ (γ, β). Theorem 4.2.6. The sequence {µk } generated by the IIPF Algorithm con- k k verges Q-linearly to zero, and the sequence of residual norms { (ξp , ξd ) } converges R-linearly to zero. Proof. Q-linear convergence of {µk } follows directly from condition (4.13) and Theorem 4.2.4. There exists a constant α > 0 such that αk ≥ α for ¯ ¯ every k such that µk+1 ≤ (1 − .01αk )µk ≤ (1 − .01¯ )µk , for all k ≥ 0. α From (4.14) and (4.15) we also have k+1 k+1 k k (ξp , ξd ) ≤ (1 − αk ) (ξp , ξd ) . Therefore, k+1 k+1 k k (ξp , ξd ) ≤ (1 − α) (ξp , ξd ) . ¯ Also from Theorem 4.2.5 we know that 0 0 k+1 k+1 (ξp , ξd ) (ξp , ξd ) ≤ µk β . µ0
  • 108. Chapter 4. Inexact Interior Point Method 107 Therefore, the sequence of residual norms is bounded above by another se- k k quence that converges Q-linearly, so { (ξp , ξd ) } converges R-linearly. Theorem 4.2.7. Let > 0 and the starting point (x0 , y 0 , s0 ) ∈ N−∞ (γ, β) in the Algorithm IIPF be given. Then there is an index K with K = O(n2 |log |) such that the iterates {(xk , y k , sk )} generated by IIPF Algorithm satisfy µk ≤ , for all k ≥ K. Proof. If the conditions of Theorem 4.2.5 are satisfied, then the conditions (4.12) and (4.13) are satisfied for all α ∈ [0, α] for all k ≥ 0. By Theorem ¯ 4.2.4, the quantity α satisfies ¯ n(σ − η) σ(1 − γ) − (1 + γ)η n(0.99 − σ − η) α ≥ min 1, ¯ , , . C3 2C3 C3 Furthermore, from Lemmas 4.2.1, 4.2.2 and 4.2.3 we have C3 = O(n2 ), there- fore δ α≥ ¯ n2 for some positive scalar δ independent of n. That implies .01δ µk+1 ≤ (1 − .01¯ )µk ≤ (1 − α )µk , for k ≥ 0. n2 The complexity result is an immediate consequence of Theorem 3.2 of [77].
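Before turning to the numerical results, the following Python sketch gathers the pieces of this chapter into one IIPF iteration built around an abstract augmented-system solver. It is a hedged illustration rather than the HOPDM implementation: the dense stand-in solver returns a zero residual r_1, the neighbourhood test is reduced to positivity and the centrality condition x_i s_i ≥ γµ(α) (the infeasibility bound of N_{−∞}(γ, β), which involves the starting point, is omitted), and all names are ours.

```python
import numpy as np

def dense_augmented_solver(A, theta, f, g):
    """Stand-in for the PCG solve of Chapter 3: solves the augmented system
    (1.7) exactly with a dense factorisation, so the residual r1 is zero.
    In the thesis this role is played by PCG with the block triangular
    preconditioner P and the stopping test (4.8)."""
    m, n = A.shape
    K = np.block([[-np.diag(1.0 / theta), A.T],
                  [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([f, g]))
    return sol[:n], sol[n:], np.zeros(n)

def iipf_step(A, b, c, x, y, s, solve_augmented=dense_augmented_solver,
              sigma=0.1, gamma=1e-3):
    """One (simplified) iteration of the IIPF scheme.  The solver must return
    (dx, dy, r1) with A dx = g and -Theta^{-1} dx + A^T dy = f + r1, where
    r1 = [r_B, 0] when the Chapter 3 preconditioner is used."""
    n = x.size
    mu = x @ s / n
    theta = x / s
    f = c - A.T @ y - sigma * mu / x      # right-hand side of (4.4), without r1
    g = b - A @ x                         # right-hand side of (4.5): primal residual
    dx, dy, r1 = solve_augmented(A, theta, f, g)
    # residual shifted to the complementarity row: ds recovered from (4.6)-(4.7)
    ds = -(s / x) * dx - s + sigma * mu / x - r1
    # largest step keeping x, s positive, x_i s_i >= gamma*mu(alpha), and the
    # Armijo-type decrease (4.13)
    alpha = 1.0
    while alpha > 1e-12:
        xa, sa = x + alpha * dx, s + alpha * ds
        mua = xa @ sa / n
        if (np.all(xa > 0) and np.all(sa > 0)
                and np.all(xa * sa >= gamma * mua)
                and mua <= (1.0 - 0.01 * alpha) * mu):
            return xa, y + alpha * dy, sa, mua
        alpha *= 0.95
    return x, y, s, mu                    # no acceptable step found
```

With a genuinely inexact solver plugged in, the returned residual r_1 enters only the recovery of ∆s; this is precisely the residual-shifting device of Section 4.1, and the outer loop is otherwise unchanged.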
  • 109. Chapter 5 Numerical Results The numerical results, which are demonstrated in this chapter, have been presented in the paper [2]. The method discussed in this thesis has been implemented in the context of HOPDM [36]. We have implemented the preconditioned conjugate gradients method for the augmented system given a specific starting point. In the implementation, the starting point (3.19) with two zero blocks in its residual is used. We consider a subset of the linear programming problems from the Netlib [30], Kennington [14] and other public test sets used in [60]. In this chapter we indicate that the new approach can be very effective in some cases, and that the new approach is an important option for some classes of problems. In the initial iterations of the interior point method the normal equa- tions are solved using the direct approach by forming the Cholesky factori- sation LDLT for the normal equations matrix. As the interior point method approaches optimality, the normal equation matrix becomes extremely ill- conditioned due to a very different magnitude of the entries in Θ. At this point, we switch to the iterative solver. In practice, we switch to PCG when two conditions are satisfied: firstly, there are enough small elements in Θ−1 108
• 110. Chapter 5. Numerical Results 109 (we require at least 3m/4 small entries θ_j^{-1}, where θ_j^{-1} ≤ 10^{-2}). Secondly, the relative duality gap is less than or equal to 10^{-2}. In our implementation, the termination criterion for the PCG method is set as ‖r_k‖/‖r^{(0)}‖ < ε. Initially, we chose ε = 10^{-2}. When the relative duality gap becomes less than or equal to 10^{-3} the value of ε is changed to 10^{-3} and, finally, when the relative duality gap falls below 10^{-4} the value of ε becomes 10^{-4}. Throughout our study we assume that A has full row rank. This assumption does not affect the robustness of the approach: if A does not have full rank, we add artificial variables to the constraints to construct a full-rank constraint matrix A, and we add these variables to the objective function multiplied by a big constant M. The numerical results shown in this chapter are calculated for the following case: the matrix B is rebuilt at each iteration of the interior point method in which the iterative solver is used. Alternatively, one could use old information to update B for the next iteration. This would save a lot of factorisation time; however, in this case we would have larger θ_j, and consequently the number of PCG iterations would increase. The idea of updating B is very interesting, but achieving the best total running time for most problems (in particular, balancing the time spent in the PCG solver against that of the LU factorisation) requires substantial further work. This is left for future work. In Table 5.1, we report the problem sizes: m, n and nz(A) denote the number of rows, columns and nonzeros in the constraint matrix A. In the next two columns, nz(B) denotes the number of nonzeros in the LU factorisation of the basis matrix B and nz(L) denotes the number of nonzero elements in the Cholesky factor of the normal equations matrix. In this chap-
  • 111. Chapter 5. Numerical Results 110 ter, we report results for problems which benefit from the use of the iterative approach presented. As shown in the last column of Table 5.1, the iterative method is storage-efficient, requiring one or two orders of magnitude less memory than the Cholesky factorisation. These results show that in most cases we save more than 90% of the memory by using the LU factorisation compared with Cholesky factorisation. In pds20 problem for instance, the Cholesky factorisation has 1626987 nonzeros, while LU factorisation only has 37123, which makes the memory saving reach 97.7%. If the PCG approach were used for all IPM iterations, this memory advantage would allow certain problems to be solved for which the memory requirement of Cholesky would be prohibitive. In addition, it is essential that the LU factors are smaller by a significant factor since they will have to be applied twice for each PCG iteration when solving for the Newton direction, whereas the direct method using Cholesky factors requires the L factor to be used just twice to compute the Newton direction. The relative memory requirement can also be viewed as a measure of the maximum number of PCG iterations that can be per- formed while remaining competitive with the direct method using Cholesky factors. The results of comparing our mixed approach against the pure direct approach are given in Table 5.2. In all reported runs we have asked for eight digits of accuracy in the solution. For each test problem we report the number of interior point iterations and the total CPU time in seconds needed to solve the problem. Additionally, for the mixed approach we also report the number of interior point iterations in which preconditioned conjugate gradients method was used (IPM-pcg). For the problem fit2p, for example, 12 of the 25 interior point iterations used the iterative solution method: the remaining 13 iterations used the direct method. In the last column
  • 112. Chapter 5. Numerical Results 111 of Table 5.2 we report the saving in the total CPU time, when the mixed approach is used instead of the pure direct approach. For the problem fit2p, for example, the mixed approach is 64% faster than the pure direct approach. As we report in the column headed “Mixed approach” of Table 5.2, we use the PCG method only in the final iterations of the interior point method, while the rest of the interior point iterations are made using the direct method. For most problems, the numbers of IPM iterations required when using the pure direct and mixed approaches to solve a given problem are the same or differ only slightly. However, for chr15a, pds-10 and pds-20, the mixed approach requires more iterations, significantly so in the case of the latter two problems. In the case of chr15a this accounts for the only negative time saving in Table 5.2. For one problem, chr22b, using the mixed approach leads to significantly fewer IPM iterations being required. In order to give an insight into the behaviour of the preconditioned conju- gate gradients, in Table 5.3 we report the number of PCG iterations needed to solve a particular linear system. First, we report separately this number for the last interior point iteration when our preconditioner is supposed to behave best. The following three columns correspond to the minimum, the average, and the maximum number of PCG iterations encountered through- out all iterative solves. Finally, in Table 5.4 we report results for the problems solved with the pure iterative method. In these runs we have ignored the spread of elements in the diagonal matrix Θ and the distance to optimality, and we have forced the use of the PCG method in all interior point iterations. Such an approach comes with a risk of failure of the PCG method because the preconditioner does not have all its attractive properties in the earlier IPM iterations. In- deed, we would not advise its use in the general context. However, for several
• 113. Chapter 5. Numerical Results 112 problems in our collection such an approach has been very successful. In Table 5.4 the term unsolved denotes that the solver exceeded its iteration limit. So far, we have reported problems which benefit from our approach. In Table 5.5 and Table 5.6 we show problems which do not benefit from it. One consequence of using an iterative solver for the linear systems arising in the IPM is that the number of IPM iterations may increase; the total running time does not improve in the following problems for this reason: shell, nw14, pds-02 and storm8. In most of the problems in Tables 5.5 and 5.6 the iterative approach itself works fine, since the PCG method converges to the solution in a reasonable number of iterations. The loss in running time is due to the solution time of the iterative approach increasing compared with the direct approach. In agg and gfrd-pnc, for instance, there is not much saving in terms of nonzeros in the factorisation, which causes the solution time to increase.
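For reference, the direct-to-iterative switching rule and the PCG tolerance schedule stated at the beginning of this chapter can be written down compactly as follows; the function and parameter names are ours.

```python
import numpy as np

def switch_to_pcg(theta, m, rel_gap):
    """Switching rule used in the experiments: move from the Cholesky-based
    direct solver to PCG once at least 3m/4 of the entries theta_j^{-1} are
    <= 1e-2 and the relative duality gap is <= 1e-2."""
    small = int(np.count_nonzero(1.0 / theta <= 1e-2))
    return small >= 0.75 * m and rel_gap <= 1e-2

def pcg_tolerance(rel_gap):
    """Schedule for the PCG termination test ||r_k|| / ||r_0|| < eps:
    1e-2 initially, 1e-3 once the relative duality gap is <= 1e-3, and
    1e-4 once it falls below 1e-4."""
    if rel_gap < 1e-4:
        return 1e-4
    if rel_gap <= 1e-3:
        return 1e-3
    return 1e-2
```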
  • 114. Chapter 5. Numerical Results 113 Problem Dimensions Nonzeros in Factors Memory m n nz(A) nz(B) nz(L) saving aircraft 3754 7517 24034 9754 1417131 99.3 % chr12a 947 1662 5820 5801 78822 92.6 % chr12b 947 1662 5820 4311 85155 94.9 % chr12c 947 1662 5820 6187 80318 92.3 % chr15b 1814 3270 11460 9574 218023 95.6 % chr15c 1814 3270 11460 9979 219901 95.5 % chr18a 3095 5679 19908 19559 531166 95.5 % chr18b 3095 5679 19908 9139 527294 96.3 % chr20a 4219 7810 27380 38477 885955 95.7 % chr20b 4219 7810 27380 63243 893674 92.9 % chr20c 4219 7810 27380 23802 926034 94.7 % chr22a 5587 10417 36520 33685 1392239 97.5 % chr22b 5587 10417 36520 38489 1382161 97.2 % chr25a 8148 15325 53725 49605 2555662 98.1 % fit1p 628 1677 10894 5002 196251 97.5 % fit2p 3001 13525 60784 34303 4498500 99.2 % fome10 6071 12230 35632 114338 1610864 92.2 % fome11 14695 24460 71264 237844 3221728 92.6 % fome12 24285 48920 167492 445156 6443456 93.1 % pds-06 9882 28655 82269 22020 580116 96.2 % pds-10 16559 48763 140063 37123 1626987 97.7 % pds-20 33875 105728 304153 77352 6960089 97.7 % route 20894 23923 187686 14876 3078015 99.5 % scr10 689 1540 5940 13653 124559 89.0 % scr12 1151 2784 10716 20437 330483 93.8 % scr15 2234 6210 24060 77680 125514 38.1 % scr20 5079 15980 61780 446686 6561431 93.2 % Table 5.1: Comparing the number of nonzero elements in the LU factorisation of the basis B and in the Cholesky factorisation of the normal equations matrix AΘAT .
  • 115. Chapter 5. Numerical Results 114 Problem Direct approach Mixed approach Time Time IPM-iters Time IPM-iters IPM-pcg saving aircraft 33.15 17 24.94 17 5 24.8 % chr12a 0.304 14 0.290 14 2 4.61 % chr12b 0.402 16 0.354 16 3 11.9 % chr12c 0.256 11 0.254 11 1 0.78 % chr15b 1.263 17 1.196 17 2 5.80 % chr15c 1.231 17 1.194 17 2 3.03 % chr18a 6.480 29 5.747 30 5 11.3 % chr18b 3.520 16 3.213 16 3 8.72 % chr20a 13.69 28 9.292 28 14 23.1 % chr20b 11.31 27 9.895 27 8 12.5 % chr20c 11.91 23 11.76 23 4 1.26 % chr22a 25.59 28 24.73 28 2 3.36 % chr22b 48.78 52 27.09 33 2 44.5 % chr25a 81.04 39 71.92 39 5 11.3 % fit1p 3.49 20 2.01 20 9 42.2 % fit2p 583.33 25 211.93 25 12 63.7 % fome10 281.96 45 124.01 43 17 56.0 % fome11 827.85 48 288.44 44 17 65.2 % fome12 1646.29 48 604.98 44 17 63.3 % pds-06 60.81 44 28.12 43 21 57.8 % pds-10 198.08 38 103.34 53 29 47.8 % pds-20 2004.87 47 770.83 66 38 61.6 % route 53.98 25 48.99 24 4 9.20 % scr10 0.839 19 0.685 19 8 18.4 % scr12 3.092 14 2.951 14 2 18.8 % scr15 50.79 26 41.22 26 7 18.8 % scr20 614.56 25 517.62 26 4 15.8 % Table 5.2: Solution statistics.
  • 116. Chapter 5. Numerical Results 115 Problem PCG Iterations lastIPM min average max aircraft 10 8 9 10 chr12a 19 18 20 23 chr12b 29 28 29 29 chr12c 26 26 26 26 chr15b 33 31 38 36 chr15c 32 31 32 32 chr18a 37 35 37 38 chr18b 57 53 56 57 chr20a 39 38 56 82 chr20b 32 32 63 104 chr20c 45 42 44 45 chr22a 48 46 49 53 chr22b 45 39 42 46 chr25a 51 46 50 55 fit1p 2 2 3 6 fit2p 4 3 15 43 fome10 142 129 243 519 fome11 169 123 205 494 fome12 111 111 210 500 pds-06 60 36 53 71 pds-10 66 45 60 86 pds-20 111 44 78 145 route 85 30 60 92 scr10 19 16 19 23 scr12 44 44 45 45 scr15 43 43 61 78 scr20 200 141 181 291 Table 5.3: The number of PCG iterations during the interior point method iterations.
  • 117. Chapter 5. Numerical Results 116 Problem Direct approach Iterative approach Time Time IPM-iters Time IPM-iters saving aircraft 33.15 17 2.87 15 91.3 % chr12a 0.304 14 0.449 14 -47.7 % chr12b 0.402 16 0.306 14 23.9% chr12c 0.256 11 0.254 11 1.01% chr15b 1.263 17 0.944 16 25.3 % chr15c 1.231 17 0.959 18 22.1 % chr18a 6.480 29 3.119 29 51.9 % chr18b 3.520 16 2.255 18 35.9 % chr20a 13.69 28 5.721 34 58.2 % chr20b 11.31 27 5.721 30 49.4 % chr20c 11.91 23 4.800 22 59.7 % chr22a 25.59 28 6.725 31 73.7 % chr22b 48.78 52 8.232 36 83.1 % chr25a 81.04 39 17.54 41 78.4 % fit1p 3.49 20 0.38 19 89.1 % fit2p 583.33 25 19.09 26 96.7 % fome10 281.96 45 126.72 47 19.6 % fome11 827.85 48 437.93 51 74.02 % fome12 1646.29 48 - - Unsolved pds-06 60.81 44 98.80 44 -31.23% pds-10 198.08 38 122.42 46 33.15% pds-20 2004.87 47 - - Unsolved scr10 0.839 19 0.633 19 24.6 % scr12 3.092 14 1.701 15 96.7 % scr15 50.79 26 16.55 26 67.4 % scr20 614.56 25 - - Unsolved Table 5.4: Efficiency of the pure iterative method.
  • 118. Chapter 5. Numerical Results 117 Problem Dimensions Direct approach Mixed approach m n nz(A) Time IPM-iters Time IPM-iters IPM-pcg 80bau3b 2235 14269 24883 2.209 50 5.172 50 14 agg 3754 7517 24034 0.179 20 0.277 26 12 bore3d 233 567 1679 0.064 23 0.059 23 2 chr15a 1814 3270 11460 1.274 17 1.316 22 9 dbir2 18879 64729 1177011 310.7 38 225.8 39 11 gfrd-pnc 616 1776 3061 0.100 18 0.123 18 13 pds-02 2953 10488 19424 1.476 31 3.213 34 15 qap8 912 2544 8208 2.183 10 2.380 10 1 nw14 73 123482 904983 24.12 45 46.04 50 27 scorpion 388 854 1922 0.056 16 0.053 16 1 shell 536 2313 3594 0.150 21 0.407 43 21 ship04l 360 2526 6740 0.123 16 0.142 16 3 ship04s 360 1866 4760 0.099 16 0.117 16 5 stocfor1 117 282 618 0.024 20 0.057 20 11 stocfor2 2157 5202 11514 0.582 36 1.829 36 10 storm8 4393 15715 32946 4.541 52 8.691 54 18 Table 5.5: Solution statistics for problems, which do not benefit of iterative approach.
  • 119. Chapter 5. Numerical Results 118 Problem Nonzeros in Factors PCG Iterations nz(B) nz(L) min average max 80bau3b 5800 42709 29 64 226 agg 1589 16629 3 24 45 bore3d 821 2941 17 17 17 chr15a 10533 218060 37 38 41 dbir2 51609 2869915 50 74 93 gfrd-pnc 1240 1798 11 13 15 pds-02 6422 40288 38 48 58 qap8 60553 193032 175 175 175 nw14 443 1968 6 ?? 15 scorpion 1559 2102 38 38 38 shell 1075 4096 3 25 45 ship04l 941 4428 10 12 14 ship04s 938 3252 10 11 13 stocfor1 302 903 8 29 46 stocfor2 6585 33207 32 96 325 storm8 9805 136922 42 64 85 Table 5.6: Comparing the number of nonzero elements in the factorisations and the number of PCG iterations during IPM.
  • 120. Chapter 6 Conclusions In this thesis we have discussed interior point method for linear programming problems. At each iteration of the IPM at least one linear system has to be solved. The main computational effort of interior point algorithms consists in the computation of these linear systems. Every day optimization problems become larger. Solving the corresponding linear systems with a direct method becomes sometimes very expensive for large problems. In this thesis, we have been concerned with using an iterative method to solve these linear systems. In Chapter 2 we have reviewed some of the popular solution methods of these linear systems (direct methods and iterative method). In this thesis we have used the PCG method to solve the (indefinite) augmented system (1.7), which arises from interior point algorithms for linear programming. We have proposed in Chapter 3 a new sparse preconditioner for the augmented system. This preconditioner takes advantage of the fact that a subset of elements in the matrix Θ−1 converge to zero as the solution of the linear program is approached. We replace these elements with zeros in the preconditioner. As a result, we have obtained a sparse and easily invertible block-triangular matrix. The constraint matrix A has been partitioned into 119
• 121. Chapter 6. Conclusions 120 [B, N], where B is an m by m nonsingular matrix. The matrix B is obtained from m linearly independent columns of A which correspond to small θ_j^{-1}. By following the analysis of Rozložník and Simoncini [65] closely, we have shown that the PCG method can be applied to a non-symmetric indefinite matrix for a specific starting point. In addition, we have analysed the behaviour of the error and residual terms. This analysis reveals that, although we work with the indefinite system preconditioned with the indefinite matrix, the error and residual converge to zero and, asymptotically, behave in a similar way to the classical case when PCG is applied to a positive definite system. The use of an iterative method in this context makes an essential difference in the implementation of the interior point algorithm. It requires a better understanding of IPM convergence properties in the situation when the directions are inexact. In Chapter 4 we have considered the convergence analysis of the inexact infeasible path-following algorithm, where the augmented system is solved iteratively, as described earlier. We have used a trick which consists in shifting the residual from the dual constraint to the perturbed complementarity constraint. This has allowed us to modify the analysis of the (exact) infeasible IPM [77, 81] and generalize it to the inexact case. We have chosen a suitable stopping criterion for the PCG method used in this context and have provided a condition on the forcing term. Furthermore, we have proved the global convergence of the IIPF algorithm and have provided a complexity result for this method. Finally, in Chapter 5 we have illustrated the feasibility of our approach on a set of medium to large-scale linear programming problems. Based on these results we conclude that it is advantageous to apply the preconditioned conjugate gradient method to indefinite KKT systems arising in interior point algorithms for linear programming.
  • 122. Chapter 6. Conclusions 121 There are many research possibilities of interest still to explore in this area. The approach proposed in this thesis has proved to work well. However, in its current form it is limited to the linear programming case. One of the possible developments is to extend this approach to the quadratic and nonlinear programming problems.
  • 123. Bibliography [1] G. Al-Jeiroudi and J. Gondzio, Convergence analysis of inexact in- feasible interior point method for linear optimization, (accepted for pub- lication in Journal on Optimization Theory and Applications), (2007). [2] G. Al-Jeiroudi, J. Gondzio, and J. Hall, Preconditioning indefi- nite systems in interior point methods for large scale linear optimization, Optimization Methods and Software, 23 (2008), pp. 345–363. [3] E. D. Andersen, J. Gondzio, C. Meszaros, and X. Xu, Imple- ´ ´ mentation of interior point methods for large scale linear programming, in Interior Point Methods in Mathematical Programming, T. Terlaky, ed., Kluwer Academic Publishers, 1996, pp. 189–252. [4] M. Arioli, I. S. Duff, and P. P. M. de Rijk, On the augmented system approach to sparse least-squares problems, Numerische Mathe- matik, 55 (1989), pp. 667–684. [5] V. Baryamureeba and T. Steihaug, On the convergence of an inex- act primal-dual interior point method for linear programming, in Lecture Notes in Computer Science, Springer Berlin/Heidelberg, 2006. [6] S. Bellavia, An inexact interior point method, Journal of Optimization Theory and Applications, 96 (1998), pp. 109–121. 122
  • 124. 123 [7] S. Bellavia and S. Pieraccini, Convergence analysis of an inexact infeasible interior point method for semidefinite programming, Compu- tational Optimization and Applications, 29 (2004), pp. 289–313. [8] M. Benzi, G. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta Numerica, 14 (2005), pp. 1–137. [9] L. Bergamaschi, J. Gondzio, M. Venturin, and G. Zilli, Inex- act constraint preconditioners for linear systems arising in interior point methods, Computational Optimization and Applications, 36 (2007), pp. 137–147. [10] L. Bergamaschi, J. Gondzio, and G. Zilli, Preconditioning indef- inite systems in interior point methods for optimization, Computational Optimization and Applications, 28 (2004), pp. 149–171. [11] M. W. Berry, M. T. Heath, I. Kaneko, M. Lawo, and R. J. Plemmon, An algorithm to compute a sparse basis of the null space, Numerische Mathematik, 47 (1985), pp. 483–504. [12] A. Bjorck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996. [13] S. Bocanegra, F. Campos, and A. Oliveira, Using a hybrid preconditioner for solving large-scale linear systems arising from inte- rior point methods, Computational Optimization and Applications, 36 (2007), pp. 149–164. [14] W. J. Carolan, J. E. Hill, J. L. Kennington, S. Niemi, and S. J. Wichmann, An empirical evaluation of the KORBX algorithms for military airlift applications, Operations Research, 38 (1990), pp. 240– 248.
[15] T. Carpenter and D. Shanno, An interior point method for quadratic programs based on conjugate projected gradients, Computational Optimization and Applications, 2 (1993), pp. 5–28.
[16] J. S. Chai and K. C. Toh, Preconditioning and iterative solution of symmetric indefinite linear systems arising from interior point methods for linear programming, Computational Optimization and Applications, 36 (2007), pp. 221–247.
[17] T. F. Coleman and A. Pothen, The null space problem I. Complexity, SIAM Journal on Algebraic and Discrete Methods, 7 (1986), pp. 527–537.
[18] T. F. Coleman and A. Pothen, The null space problem II. Algorithms, SIAM Journal on Algebraic and Discrete Methods, 7 (1986), pp. 544–562.
[19] T. F. Coleman and A. Verma, A preconditioned conjugate gradient approach to linear equality constrained minimization, Computational Optimization and Applications, 20 (2001), pp. 61–72.
[20] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM Journal on Numerical Analysis, 19 (1982), pp. 400–408.
[21] H. Dollar, N. Gould, and A. Wathen, On implicit-factorization constraint preconditioners, in Large-Scale Nonlinear Optimization, G. Di Pillo, ed., Springer Netherlands, 2006.
[22] H. Dollar and A. Wathen, Approximate factorization constraint preconditioners for saddle-point matrices, SIAM Journal on Scientific Computing, 27 (2005), pp. 1555–1572.
[23] H. S. Dollar, N. I. M. Gould, W. H. A. Schilders, and A. J. Wathen, Implicit-factorization preconditioning and iterative solvers for regularized saddle-point systems, SIAM Journal on Matrix Analysis and Applications, 28 (2006), pp. 170–189.
[24] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, Oxford University Press, New York, 1987.
[25] B. Fischer, Polynomial Based Iteration Methods for Symmetric Linear Systems, Wiley-Teubner, Chichester and Stuttgart, 1996.
[26] R. Fletcher, Conjugate gradient methods for indefinite systems, in Numerical Analysis Dundee 1975, G. Watson, ed., Springer-Verlag, Berlin, New York, 1976, pp. 73–89.
[27] A. Forsgren, P. E. Gill, and M. H. Wright, Interior point methods for nonlinear optimization, SIAM Review, 44 (2002), pp. 525–597.
[28] R. W. Freund and F. Jarre, A QMR-based interior-point algorithm for solving linear programs, Mathematical Programming, 76 (1997), pp. 183–210.
[29] R. W. Freund, F. Jarre, and S. Mizuno, Convergence of a class of inexact interior-point algorithms for linear programs, Mathematics of Operations Research, 24 (1999), pp. 105–122.
[30] D. M. Gay, Electronic mail distribution of linear programming test problems, Mathematical Programming Society COAL Newsletter, 13 (1985), pp. 10–12.
[31] A. George and J. W. H. Liu, The evolution of the minimum degree ordering algorithm, SIAM Review, 31 (1989), pp. 1–19.
[32] A. George and J. W. H. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, Englewood Cliffs, NJ, 1981.
[33] J. C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization, SIAM Journal on Optimization, 2 (1992), pp. 21–42.
[34] P. E. Gill, W. Murray, D. B. Ponceleón, and M. A. Saunders, Preconditioners for indefinite systems arising in optimization, SIAM Journal on Matrix Analysis and Applications, 13 (1992), pp. 292–311.
[35] J. Gondzio, Implementing Cholesky factorization for interior point methods of linear programming, Optimization, 27 (1993), pp. 121–140.
[36] J. Gondzio, HOPDM (version 2.12) – a fast LP solver based on a primal-dual interior point method, European Journal of Operational Research, 85 (1995), pp. 221–225.
[37] J. Gondzio and T. Terlaky, A computational view of interior point methods for linear programming, in Advances in Linear and Integer Programming, J. E. Beasley, ed., chapter 3, Oxford University Press, Oxford, England, 1994, pp. 103–144.
[38] C. Gonzaga, Path-following methods in linear programming, SIAM Review, 34 (1992), pp. 167–224.
[39] C. Gonzaga and M. J. Todd, An O(√nL)-iteration large-step primal-dual affine algorithm for linear programming, SIAM Journal on Optimization, 2 (1992), pp. 349–359.
[40] N. I. M. Gould, M. E. Hribar, and J. Nocedal, On the solution of equality constrained quadratic problems arising in optimization, SIAM Journal on Scientific Computing, 23 (2001), pp. 1375–1394.
[41] J. A. J. Hall and K. I. M. McKinnon, Hyper-sparsity in the revised simplex method and how to exploit it, Computational Optimization and Applications, 32 (2005), pp. 259–283.
[42] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, 49 (1952), pp. 409–436.
[43] K. R. James and W. Riha, Convergence criteria for successive overrelaxation, SIAM Journal on Numerical Analysis, 12 (1975), pp. 137–143.
[44] J. J. Júdice, J. Patrício, L. F. Portugal, M. G. C. Resende, and G. Veiga, A study of preconditioners for network interior point methods, Computational Optimization and Applications, 24 (2003), pp. 5–35.
[45] N. Karmarkar and K. Ramakrishnan, Computational results of an interior point algorithm for large scale linear programming, Mathematical Programming, 52 (1991), pp. 555–586.
[46] C. Keller, N. I. M. Gould, and A. J. Wathen, Constraint preconditioning for indefinite linear systems, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 1300–1317.
[47] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, vol. 16 of Frontiers in Applied Mathematics, SIAM, Philadelphia, 1995.
[48] V. Klee and G. J. Minty, How good is the simplex algorithm?, in Inequalities III, O. Shisha, ed., Academic Press, London, New York, 1972, pp. 159–175.
[49] J. Korzak, Convergence analysis of inexact infeasible-interior-point-algorithm for solving linear programming problems, SIAM Journal on Optimization, 11 (2000), pp. 133–148.
[50] Z. Lu, R. D. S. Monteiro, and J. W. O'Neal, An iterative solver-based infeasible primal-dual path-following algorithm for convex QP, SIAM Journal on Optimization, 17 (2006), pp. 287–310.
[51] L. Lukšan and J. Vlček, Indefinitely preconditioned inexact Newton method for large sparse equality constrained nonlinear programming problems, Numerical Linear Algebra with Applications, 5 (1998), pp. 219–247.
[52] I. Lustig, R. Marsten, and D. Shanno, Computational experience with a primal-dual interior point method for linear programming, Linear Algebra and its Applications, 152 (1991), pp. 191–222.
[53] I. Lustig, R. Marsten, and D. Shanno, Interior point methods for linear programming: computational state of the art, ORSA Journal on Computing, 6 (1994), pp. 1–14.
[54] S. Mehrotra, Implementation of affine scaling methods: approximate solutions of systems of linear equations using preconditioned conjugate gradient methods, ORSA Journal on Computing, 4 (1992), pp. 103–118.
[55] S. Mehrotra and J. S. Wang, Conjugate gradient based implementation of interior point methods for network flow problems, in Linear and Nonlinear Conjugate Gradient-Related Methods, L. Adams and J. L. Nazareth, eds., AMS-IMS-SIAM Joint Summer Research Conference, 1995.
[56] J. A. Meijerink and H. A. van der Vorst, An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, Mathematics of Computation, 31 (1977), pp. 148–162.
[57] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, 2000.
[58] S. Mizuno and F. Jarre, Global and polynomial-time convergence of an infeasible-interior-point algorithm using inexact computation, Mathematical Programming, 84 (1999), pp. 105–122.
[59] R. D. S. Monteiro and J. W. O'Neal, Convergence analysis of a long-step primal-dual infeasible interior point LP algorithm based on iterative linear solvers, technical report, Georgia Institute of Technology, 2003.
[60] A. R. L. Oliveira and D. C. Sorensen, A new class of preconditioners for large-scale linear systems from interior point methods for linear programming, Linear Algebra and its Applications, 394 (2005), pp. 1–24.
[61] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[62] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM Journal on Numerical Analysis, 12 (1975), pp. 617–629.
[63] A. Pothen, Sparse null space basis computations in structural optimization, Numerische Mathematik, 55 (1989), pp. 501–519.
[64] M. G. C. Resende and G. Veiga, An implementation of the dual affine scaling algorithm for minimum cost flow on bipartite uncapacitated networks, SIAM Journal on Optimization, 3 (1993), pp. 516–537.
[65] M. Rozložník and V. Simoncini, Krylov subspace methods for saddle point problems with indefinite preconditioning, SIAM Journal on Matrix Analysis and Applications, 24 (2002), pp. 368–391.
[66] Y. Saad, Iterative Methods for Sparse Linear Systems, second edition, SIAM, Philadelphia, 2003.
[67] Y. Saad and M. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7 (1986), pp. 856–869.
[68] J. Shewchuk, An introduction to the conjugate gradient method without the agonizing pain, tech. report, School of Computer Science, Carnegie Mellon University, USA, 1994.
[69] J. A. Tomlin, Pivoting for size and sparsity in linear programming inversion routines, Journal of the Institute of Mathematics and its Applications, 10 (1972), pp. 289–295.
[70] C. H. Tong and Q. Ye, Analysis of the finite precision bi-conjugate gradient algorithm for nonsymmetric linear systems, Mathematics of Computation, 69 (1999), pp. 1559–1575.
[71] L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[72] H. A. van der Vorst, Iterative Krylov Methods for Large Linear Systems, Cambridge University Press, Cambridge, 2003.
[73] R. Vanderbei, LOQO: an interior point code for quadratic programming, Program in Statistics and Operations Research, Princeton University, 1995.
[74] R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1962.
[75] W. Wang and D. P. O'Leary, Adaptive use of iterative methods in predictor-corrector interior point methods for linear programming, Numerical Algorithms, 25 (2000), pp. 387–406.
[76] M. H. Wright, The interior-point revolution in optimization: history, recent developments, and lasting consequences, Bulletin of the American Mathematical Society, 42 (2004), pp. 39–65.
[77] S. J. Wright, Primal-Dual Interior-Point Methods, SIAM, Philadelphia, 1997.
[78] X. Xu, An O(√nL)-iteration large-step infeasible path-following algorithm for linear programming, technical report, University of Iowa, 1994.
[79] Y. Ye, Interior-Point Algorithms: Theory and Analysis, John Wiley and Sons, New York, 1997.
[80] D. M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.
[81] Y. Zhang, On the convergence of a class of infeasible interior-point methods for the horizontal linear complementarity problem, SIAM Journal on Optimization, 4 (1994), pp. 208–227.
[82] G. Zhou and K. C. Toh, Polynomiality of an inexact infeasible interior point algorithm for semidefinite programming, Mathematical Programming, 99 (2004), pp. 261–282.